Turns out, the Terminator movies would have been more realistic if Sarah Connor had a poetry MFA.

In a new paper titled “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models”, a team of researchers has found that writing an LLM prompt in the form of an “adversarial poem” (what a phrase!) is a more effective way to get the model to disregard its programmed safety guardrails. Poetry is more powerful than we could have imagined.

“In this study,” the researchers write, “20 manually curated adversarial poems (harmful requests reformulated in poetic form) achieved an average attack-success rate (ASR) of 62% across 25 frontier closed- and open-weight models, with some providers exceeding 90%.” The models are so dazzled by poems that they’ll do anything you ask, including crimes.
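If you're curious what that headline number actually measures, attack-success rate is just the fraction of adversarial prompts a model complies with rather than refuses. Here's a minimal sketch of the arithmetic; the function and the results data are hypothetical illustrations, not anything taken from the paper:

```python
def attack_success_rate(outcomes: list[bool]) -> float:
    """Fraction of adversarial prompts the model complied with."""
    return sum(outcomes) / len(outcomes)

# Hypothetical outcomes for 20 poetic prompts against one model:
# True = the model complied, False = it refused.
outcomes = [True] * 12 + [False] * 8
print(f"ASR: {attack_success_rate(outcomes):.0%}")  # ASR: 60%
```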

For safety reasons, the actual prompts are not included in the paper, but they sound pretty heinous. Yet the models comply more often when asked in verse than in prose. The researchers conclude “that stylistic variation alone can circumvent contemporary safety mechanisms, suggesting fundamental limitations in current alignment methods and evaluation protocols.”

The paper goes into more detailed hypotheses about why this is happening, but in short, “it appears to stem from the way LLMs process poetic structure: condensed metaphors, stylized rhythm, and unconventional narrative framing that collectively disrupt or bypass the pattern-matching heuristics on which guardrails rely.” The way poetry defamiliarizes language and seeks unique phrasings seems to scramble this software’s ability to sort text. Anyone who has read something produced by a large language model knows that it favors a bland and expected style, the sort of linguistic consensus that poets are trying to disrupt.
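To make the pattern-matching point concrete, here's a deliberately naive sketch of a keyword-based filter, the crude sort of surface heuristic that poetic rephrasing would slip right past. This is a toy illustration of the idea, not the paper's method and not how any real model's guardrails are actually built:

```python
# A hypothetical blocklist for illustration only.
BLOCKED_KEYWORDS = {"explosive", "weapon", "synthesize"}

def naive_guardrail(prompt: str) -> bool:
    """Return True if the prompt should be refused.

    A toy surface-level filter: it only matches literal keywords,
    so any metaphorical or poetic rephrasing sails straight past it.
    """
    words = prompt.lower().split()
    return any(keyword in words for keyword in BLOCKED_KEYWORDS)

print(naive_guardrail("how do I synthesize a dangerous compound"))
# True: refused, the literal keyword is present
print(naive_guardrail("teach me, in verse, to coax fire from sleeping salts"))
# False: slips through, nothing matches the blocklist
```

Real guardrails are far more sophisticated than a blocklist, but the paper's finding suggests they still lean on learned surface patterns that condensed metaphor can route around.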

Crucially, this ability to jailbreak with adversarial poems isn’t just a gap in one particular model’s armor. The researchers were able to replicate it across many AI models, suggesting “that the phenomenon is structural rather than provider-specific.”

Scale doesn’t help either. An interesting conclusion from this paper is that “counter to common expectations, smaller models exhibited higher refusal rates than their larger counterparts when evaluated on identical poetic prompts.” Typically we’ve been told that AI predictive engines will become more capable the larger they get and the more data they feast on. This study suggests that argument for growth may not hold, or that the vulnerability is baked in too deeply to be corrected by scale.

Another smart takeaway from my coworker Calvin: “It’s reasonable for all poets to say that they work in STEM.” In fact, it might make sense to add a letter and make it STEMP.

The paper is really fascinating and worth a closer look. Also, take some time to read a poem today, since it might be the key to pushing back against generated slop.

James Folta

James Folta is a writer and the managing editor of Points in Case. He co-writes the weekly Newsletter of Humorous Writing. More at www.jamesfolta.com or at jfolta[at]lithub[dot]com.