Why AI Narrators Will Never Be Able to Tell a Real Human Story

I grew up with nary a TV to be seen. My mother’s parking lot come-to-Jesus moment and subsequent Evangelical faith translated to an aversion to He-Man and The Smurfs (an episode featuring Smurfette and a Voodoo doll put the nail in that one). Instead we read books, piles and piles of books. Every week I’d lug home a stack of books from the library, mostly science fiction and fantasy from the Golden Age: the Foundation series, along with Heinlein and Niven, formed my earliest literary tastes.

I reveled in the speculative. I imagined a future in which automation would replace drudgery, freeing us for creative pursuits. I never imagined machines would come for my books and my voice.

My parents divorced when I was 18 months old, and my father moved to California. An actor and radio disc jockey, he was an early adopter of technology and nascent home studio equipment. He would record himself reading books and mail them cross country to small-town rural Illinois on cassette tapes. I’d eagerly hunch over my Fisher Price tape player in my room, listening to the voice of my father, disembodied but somehow also magically present, transporting me to other worlds. The power of the human voice, even transcribed imperfectly onto magnetic tape and played over a tinny speaker, was life-changing. Two components: a human soul, and a story, combined to create something new and meaningful.

Inspired to pursue my dream of telling stories, I discovered theatre and the joy of living stories on stage in front of people, swimming in a sea of eyes. In high school I joined the speech team and specialized in “prose interpretation.” To my mother’s not-so-mild distress I chose acting in college, studying theatre, and after moving to Chicago upon graduation I pursued whatever form of storytelling I could: commercial and film auditions, voiceover and theatre.

In 2005 I auditioned for my first audiobook, a grand 500-page historical fiction retelling of the Robin Hood legend. I realized there was a whole industry out there featuring the beautiful yet simple combination of a voice and a story—I was hooked.

I’ve narrated 800 titles now, for publishers large and small, in every genre and style. It’s exhilarating, exhausting, and I would gladly live this literary life till its end, keeping alive the dream of my father, though he is no longer alive to hear it. In 1993, the summer before my freshman year of high school, my father and his wife were killed in a car accident. Due to time and ungentle childhood handling, my cassette collection has dwindled to a single tape that bears his voice. The tape is a yellowed plastic Maxell with a handwritten label. It’s a corny, spooky story that I believe he wrote called “The Beckon of Hampton Court,” featuring peak 80s sound effects, theremin wails, and nothing else but his voice.

For years leading up to the 2020s, the audiobook industry experienced rumblings of the coming robopocalypse. Amazon’s Kindle featured a read-aloud function that could narrate any eBook in its synthetic voice. Various small companies experimented with machine-generated voices, but they never really took off. A major saving grace was that Amazon’s audiobook arm, Audible, the #1 distributor and sales-point of audiobooks, did not allow nonhuman voices on its platform. Indeed, since its inception, Audible claimed to amplify the power of human storytelling. The pessimistic among us suspected that they were only enforcing this policy until they could figure out a way to monetize their own robot voices. We were right.

In November of 2023 the robotic floodgates opened. For the first time, Audible allowed machine voices onto their platform (only their own). Dubbing them “Virtual Voices,” Audible offered Kindle Direct Publishing authors the ability to create audiobooks at the click of a button (called “AI audiobooks” by some). Within a month, over 10,000 robot books were uploaded and for sale. That number grew incredibly fast, within a few months surpassing 50,000 soulless “performances.” At the time of this writing, it hovers around 60,000.

I often teach classes on audiobooks to students looking to enter the industry, and one of the first things I impress upon them is the honor of performing this work. The ability to tell stories is what makes us human. The first Homo sapiens 300,000 years ago were telling stories around the fire. In a term coined by Jonathan Gottschall, we are Homo fictus, or “storytelling animals.” Narrating audiobooks today is the closest thing to that primal art form. One person, one voice, spinning a tale for another.

The corporate urge to replace art-making humans with machines is profoundly anti-human. It’s an old story: we have always tried to mechanize creativity and to remove messy human emotions. In the parable of John Henry a man’s livelihood is threatened by a machine, and he works himself to death in defense.

A man ain’t nothing but a man
But before I let your steam drill beat me down
I’d die with a hammer in my hand.

John Henry wins that last competition at terrible cost, but the listener knows, and John Henry knows deep down, that it is our last gasp. The machine, however, does not need to breathe; it will witness John’s last breath, and it will not mourn, for machines are not built to mourn.

Now the man that invented the steam drill
Thought he was mighty fine.

ElevenLabs is one of the tech startups leading the charge into artificial voice. Given a short uploaded sample, they can spit out a convincing-sounding clone. Would an audio clone of my father’s voice mean the same to me? As a child, when I received a tape it was as if I was receiving my father’s presence. His voice, pure and isolated, reached into my solitude and spoke to me. Against all odds, a series of sound waves hit my tympanic membrane and created meaning. More than meaning, it created relationship.

I struggle with the thought of doing this. When I teach audiobooks, I tell students that the words on the page are the low-hanging fruit; the actual treasure and goal of every narrator/actor is the subtext: that which is not said, not written, but instead absorbed through the silences between the words.

When a book is narrated aloud, a double art form is achieved: the book is interpreted by the narrator, who acts as a lens for the listener. Just as there are many productions of Ibsen’s A Doll House, and each is unique to the actors and theatre performing it, every recitation of a book is a singular performance. When preparing a manuscript, I tell my students to ask “where’s the love?” Every book is born out of love, every story told for a reason. I tell them that even in a third-person novel, that third person is a person. Who are they? Name them and why they’re telling this story, who they’re telling it to, and why now. In our world there is no such thing as an omniscient narrator.

A fellow narrator once shared the example of the dry and technical book she narrated about a rare form of cancer. She was having trouble finding the “in,” and resorted to reading the acknowledgements at the end of the book (a section we rarely include in the audio version). After thanking family and agents, at the very end the author drops the fact that as a child, her mother died from this cancer. There’s the heart and the subtext.

Listening to my father, I heard the subtext. Sometimes it would surprise me when a well-known story such as “The Night Before Christmas” was rendered special by his particular vocal flair. He ended the poem by changing the line: “Happy Christmas to all, and to Adam a good night!” At all times, though, I understood what was beneath the words, what breathed meaning to me. The hidden message on every cassette, invariably, was the subtext every little boy longs to hear in their father’s voice: “I love you.”

My father told me stories. He loved me and he told me stories. His story ended abruptly at high speed against a tree on a lonely California highway. I was awakened early in the morning to the phone ringing upstairs, rushed footsteps, and intense whispered conversations. I remember hearing my little brother exclaiming, “I’ll go tell Adam!” and my mother stopping him. I ascended the stairs to see her silhouetted at the top.

Later she lay in bed with me and reached over to feel my eyes in the dark. They were dry. He left me too early, both in leaving my mother and our home, creating wounds that to this day won’t seal over properly. Wounds that I tell stories to myself about, to live with and to understand them. Stories that give it meaning. For a decade any phone call in the night created panic. To my deepest shame, and the reason I thought for decades I was irreparably broken in my interior life, the first thought that passed through my head was “no more Christmas presents.”

As we went through his possessions in his apartment, I wish I had had the maturity to save the important things. I did not. I saved his old comic books and magazines, and movies taped off HBO on VHS. The magazines were mostly old horror-movie fanzines (which would quickly dissolve into a pulpy pile due to a humidifier accident); the movies were the likes of The Money Pit and Short Circuit. The stack of comic books I was convinced was worth gold, and indeed it might have been if he hadn’t drawn mustaches and beards on most of the super-heroes.

A particular early issue of Silver Surfer elicited a disappointed clucking from our local comic shop owner due to the Groucho Marx facial hair scribbled on the cover’s titular character (a store that to this day my mother refers to as “the devil store”). Mere knick-knacks, which have all disappeared with the vicissitudes of years. Though I have no memory of it, my mother reminds me of the bag of white powder I withdrew from his nightstand. Later I would learn that the story of falling asleep at the wheel is likely not true; there were assuredly other factors at play. At age 46 all I have of him is hazy memories and a cassette tape.

Could a machine tell the story of a father and son, and fill it with meaning? In acting and narrating we talk about intention. The intent of the author as well as our own in narrating. A machine cannot have intent. A machine cannot tell a story because, among many other things, it does not have a body. Stories arise from embodied wisdom. The seminal story of hubristic man’s creation of intelligence, Mary Shelley’s Frankenstein, arose from the voice of a 19-year-old woman.

The editors of a 200th-anniversary edition claim “only Mary, with her bodily experience and embodied wisdom, could have written Frankenstein with such profundity.” Her tale grew from the debates around the “vitalist controversy on the definition of life,” a debate which has remained evergreen. Today programmers are claiming their AI creations have achieved sentience; Google fired a senior developer who claimed their chatbot had gained self-awareness. They surge electricity through their cobbled-together creations consisting of parts stolen from elsewhere and cry “It’s alive!”

In a recent narration project, Matthew, the main character I voiced in a multi-cast production, loses his son in a pool-party drowning accident. While narrating, I was caught off guard by a father’s grief. I let the tears roll, narrating through them. My own father and children merged into the author’s story, and my emotional life expanded. The fears of my imagination, the possibility of losing a child due to one inattentive moment, all combined to leave me gasping for breath. My voice strained in a half-whisper. The microphone records it all.

Could my stutter-step breathing and thick-ridden voice be imitated by a machine? Perhaps, but no machine can imitate my imagination or the memories that pour through me, or feel the fear of extinguishment. No machine can be changed by what it reads. The beeping devices and clicking bellows trying vainly in the ER to resuscitate Matthew’s son fail to give him back his breath.

Stories abound in folklore of the unnatural taking the place of the natural. Of the changeling child found ensconced in the family crib. The Japanese scientist Masahiro Mori coined the term “uncanny valley” in 1970 to describe a phenomenon in the early robotic industry. When an attempted replica looks and/or acts just like a human, but not quite, the uncanny valley has been entered.

In 1769, the Austrian Wolfgang von Kempelen built and toured a device he dubbed the Mechanical Turk around Europe, delighting courts and the nobility with an automaton that could play and even defeat humans at chess. Later, of course, it was revealed that a quite human dwarf was hidden within the machine, directing the Turk’s gameplay.

While the Turk is relatively well known (even lending its name to Amazon’s program for remote grunt workers), what is less known is that part of the same technological exhibition was a mechanical larynx, composed of a series of rubber flanges and bellows. Prefiguring Mori’s uncanny valley, attendees claimed “an uneasy feeling” filled them upon hearing the strangely human-like sounds. The viewers “looked at each other in silence and consternation and we all had gooseflesh produced by horror in the first moments.”

Technically, an AI voice algorithm can create something new, but it contains no interior self to express. Author Shannon Vallor captures the difference in her book The AI Mirror:

An AI tool can create a new sea shanty or a new sculpture or a new abstract shape. But what can it express through these? To express is to have something inside oneself that needs to come out. It pushes its way out: of your mouth, your diaphragm, your gesture, your rhythmic sway. Or you pull it out— because it resists translation, resists articulation. A generative AI model has nothing it needs to say, only an instruction to add some statistical noise to bend an existing pattern in a new direction. It has no physical, emotional, or intellectual experience of the world or self to express.

As far back as Aristotle, we have known that voice and soul are intertwined: “Now voice is a kind of sound of an ensouled thing. What produces the impact must have soul in it and must be accompanied by an act of imagination, for voice is a sound with a meaning…”

Philosopher Mladen Dolar notes this phenomenon in his book A Voice and Nothing More. He describes the “acousmatic voice” as “simply a voice whose source one cannot see, a voice whose origin cannot be identified, a voice one cannot place. It is a voice in search of an origin, in search of a body…we can immediately see that the voice without a body is inherently uncanny. The voice is the flesh of the soul, its ineradicable materiality, by which the soul can never be rid of the body.”

In early 2024, three months after Audible inflicted its Virtual Voice upon the public, I came down with a case of COVID-19 while attending a funeral, and while the infection cleared up quickly, it was followed by a strange tightening of my throat. Over the course of a single day, what little sound I could produce came out in a strained, thready whisper, and only a certain octave in my mid-tone was affected.

I realized quickly this was not the usual cold-induced loss of voice. There was no congestion or froggy-sounding croaks. My throat simply felt as if it were being squeezed by an invisible hand. I set up an appointment with an ENT who specializes in working with actors and singers. After I snorted lidocaine to numb the passages, he slid the scope down my throat. I could see the tunnel-vision video of my pink throat on the monitor beside us. I inhaled deeply and he brought the scope right down to my vocal folds. He had me sustain a few vowel notes and hum some tones, which seemed to confirm his suspicions. “There’s some nerve damage to your vocal folds, particularly one side.” He showed me the video of my folds vibrating together, one side clearly not in sync.

Immediately the familiar freelancer fear rose up. While some books have flexible schedules and could be pushed back, there’s a certain minimum I need to record every month to pay our bills. What if the publishers stopped hiring me?

The pain and discomfort I was feeling was all the little muscles in my throat trying to “force” the sound, to overcompensate for the lack of proper vocal cord functioning. The prognosis was vague. It was nerve related; there could be complete recovery in a month, or it could take years. The ENT prescribed speech therapy, which entailed a regimen of humming through straws of various diameters and massaging my throat with a small hand-held vibrator (which, to judge from the Amazon reviews, was not its intended purpose).

Frustratingly slowly, this helped. But even now, two years later, pain remains. Long days of narration pummel my throat, a throb that never really goes away. I narrate with a limp.

Unlike my frail biological instrument, an AI voice will never experience sickness. It will never suffer paralysis of its vocal cords; it will never falter in its energy or ability to narrate a text 24/7. It will also never learn from suffering. It will never experience thrill or sadness. It will keep driving its steam hammer incessantly, though humans die in droves by its side. Voice teacher J. Clifford Turner writes:

The narrator is the link between the author and the listener. His or her voice is the means by which the author’s work is bodied forth, and is the main channel along which thought and feeling are to flow. His voice, in fact, is an instrument, a highly specialized instrument, which is activated and played upon by the narrator’s intelligence and feeling, both of which have been stimulated by the imaginative power he is able to bring to bear upon the author’s creation. The narrator not only has a most rigorous standard of integrity to which he must adhere, but bears a definite responsibility to the author…and his listeners.

Only a human is capable of feeling responsible for the creative work of another. Audible’s launch of Virtual Voices was followed within a year by a beta-testing of their voice cloning program. Narrators on their ACX platform could opt in to have their voices cloned, then authors or rights-holders could choose to use that narrator’s clone instead of the human themselves, whether for price reasons or scheduling. At the time of this writing, the compensation model is unclear. There is brutal business logic to this, from both sides.

If a certain narrator is much in demand within a certain genre and the author knows their name alone sells books, this narrator’s voice clone could be correspondingly popular. Most full-time narrators can record around fifty books a year. A voice clone would allow Mr. or Mrs. Well-Known Narrator to produce hundreds of titles a year. Indeed, there is no theoretical limit to the number of titles they could produce. The author gets the famous narrator’s name on her books, and the narrator gets a steady stream of residual income.

As a full-time narrator, I’m usually scheduled out two to four months, so I often turn down books and auditions that are short turnaround. Those projects go to other actors, and I have often benefited from projects others have had to pass on. In the new AI economy, however, there is no reason for a cloned narrator to ever turn down work. Because capitalism will capitalize, well-known and celebrity narrators will slowly gain the lion’s share of work, squeezing the middle-class narrators out of income and jobs.

Not only will I be competing against other human narrators and their clones, but even against the voices of the dead. It only takes 30 minutes of clear audio to clone a voice, and narrators who have passed on have thousands of hours of pure audio for AI to mine. In a famous case, actor Edward Herrmann (known from his recurring role in Gilmore Girls and a prolific narrator of titles such as The Boys in the Boat and Unbroken) died in 2014. His estate cloned his voice and is now producing titles with a zombie narrator. As older narrators pass on, the undead host will only grow.

Rarely have humans walked away from automation. In his magisterial history of the Luddite movement Blood in the Machine, Brian Merchant writes of the inevitability of automation:

The logic of unfettered capitalism ensures that any labor-saving, cost-reducing, or control-enabling device will eventually be put to use, regardless of the composition of the societies those technologies will disrupt. Consider it the iron law of profit-seeking automation: once an alluring way to eliminate costs with a machine or program emerges, it will be deployed.

Merchant writes primarily of the automation of manual labor, the power looms and shuttles of 19th-century England. The same profit-making logic will appeal to publishers: synthetic voices will lower costs, enable control, and save labor. The difference between manual labor and artistic creation, however, is hard for the capitalist machine to quantify. Ironically, I had to turn down the audition to narrate Blood in the Machine for Hachette Audio due to my human limitations of time and scheduling.

In my community of narrators, there are two oft-repeated arguments: we merely need to adapt to this new technology, and AI is “just a tool.” The first states that performers have always needed to adapt to technological change. Proponents often trot out comparisons to past transitions, talking of Vaudevillian actors transitioning to the new-fangled silent cinema, silent film actors making the difficult move into talkies, all the way up to animation voiceover and motion capture.

What this fails to take into account is that we are not experiencing a new medium; we are witnessing a sea change, not a refinement but a replacement. There is no new technologically driven type of film or style of performance to which actors merely need to adapt their creativity, learning new styles and modes of performing while the primal acting core of vulnerability and expression underneath remains the same. Voice cloning may be a business adaptation, but it is not a performance adaptation.

Along with “adapting,” the other major trope often tossed about is that AI is “just a tool,” merely one of many in the actor’s toolbox. To which I answer: indeed, I could imagine a narrator using AI tools to smooth out their invoicing process, AI add-ons for their recording software that help improve noise reduction, or advanced algorithms to analyze text and create character summaries. But replacing a narrator’s voice with a broken shadow does not count as “tool use.” It is in a different category: a commutation of something living for something dead.

Audible itself offers mixed messaging. In 2023 I received a very nice holiday present from Audible Studios, the human-narrated arm of Audible, of an Apple Beats set of headphones and personalized journal, in a very creatively designed (by a local visual artist) box. The text of their note read: “We celebrate the power of inspiring storytelling. We want to thank you for being an integral part of making stories come to life.” My voice is not so “integral” after all: later that same year they introduced Virtual Voices, a move specifically designed to lessen the impact of human storytelling. Two tentacles of the same creature, working at cross-purposes.

Two years removed from my vocal fold paralysis, I can no longer narrate with quite the same stamina. A long day in front of the microphone now results in sore throat muscles, feeling like I have been kicked in the larynx by a very small gnome. My speech therapy exercises continue, now part of my daily ritual. Yet I am grateful: grateful for the stories I get to tell, for they remind me each time anew, that I am human, that I suffer, and I will not live forever.

I have been reading The Fellowship of the Ring aloud to my three children at night, a foundational book from my own childhood. They are sensitive children, and quail at the description of the Ringwraiths in Tolkien’s text. My youngest son is the most dismayed so far at Sam having to leave his faithful pony companion, Bill, behind at the gates of Moria. He peeks at me from between the slats of his upper bunk bed, eyes wide, as he is enveloped fully into the story. I am here, alive, present in this moment and no other, connected with my family through a story. That is enough.

After listening to synthetic voices and witnessing my fellow narrators pouring their whole soul into their art, I know we will never be fully replaced. I listen again to the voice of my father on my one precious cassette tape, now digitized into an MP3 for posterity.

ElevenLabs offers its voice cloning service for a very reasonable price. I could send them this MP3 file and experience my father talking to me from beyond the pale. But the idea of cloning my father’s voice is repulsive to me. I remember his laugh, his smile, he and my stepmother holding hands in the darkness across the center console of the car while softly singing “Let me tell you ‘bout the birds and bees and the flowers and the trees, the moon up above, and a thing called love,” while I lay across the back seat pretending to sleep.

These memories and his voice are filtered through anguish and time, and I barely recognize the voice on the tape. Here’s the thing: it’s not about the voice. Most newcomers to the audiobook industry tell me “Everyone has always said I have a great voice, so I should try narrating.” I tell them that it has nothing to do with their voice; it’s about the ability to tell a story. Necromancing my father’s voice back would amount to emotional masturbation. There would be no subtext other than my own; it would be empty. What I long to hear (and never will) in his voice are simply the words “I am proud of you.”

For my life, my career, my story, there is only one way to fight back. We should not imitate the Luddites and take up our hammers to smash the machines that replace us. (The new power looms live in the cloud; what is there to smash?) No, as with John Henry, the hammer is in our hands. Our hammer is the oldest way, and the best way. It is what defines us as a species. It is telling our stories. AI voices are unnatural children, worth only leaving mewling on a mountainside to die of exposure.

Mary Shelley had the right of it over 200 years ago. In every culture in all places and times we have known to shun the unnatural seeming-human, the European folk-lore changeling in the cradle, the uncanny eyes looking back at us, the Japanese Kitsune, the Polish Mamuna, the Welsh plentyn cael and the Igbo ogbanje, the golems whispering inhumanity. Mary Shelley had a term for them, and it’s just as applicable today. We call them monsters.

What will we lose if we outsource our humanity? Bit by bit, story by story, it impoverishes us. Wendell Berry said, “the machine economy has set afire the household of the human soul.” We have been so afraid of being turned into paperclips, plugged into pods as human batteries, or having our skulls crushed under titanium feet, that we’ve missed the more insidious danger of farming out our art and creativity to algorithms. If we follow this course, what will be left for our AI overlords to conquer? There would be no uprising, no plucky resistance; instead, tragically, we will have given it away, one story at a time.

Adam Verner Explores the Uncanny Valley of Automated Audiobooks

Adam Verner

Literary Hub