A USC study finds that (some people think) AI is as funny as the average person.
A new study out of USC compared comedy writing by humans to comedy writing generated by ChatGPT, and found that “ChatGPT can produce written humor at a quality that exceeds laypeople’s abilities and equals some professional comedy writers.” But their experiments didn’t fully convince me that your next favorite joke will be generated by a program.
The study involved two experiments. The first asked a group of adults to generate punchlines for three different prompts: writing funny phrases to fit a provided acronym; finishing Apples-to-Apples-style phrases like, “A remarkable achievement you probably wouldn’t list on your resume: ________”; and coming up with “roast” punchlines like, “To be honest, listening to you sing was like ________.” ChatGPT completed the same tasks, and then a separate group of people rated the human and computer results. The evaluating group thought the computer did a little better: “ChatGPT’s responses were rated funnier than the human responses, with 69.5% of participants preferring them (26.5% preferred the human responses, and 4.0% thought both were equally funny).”
I don’t doubt that some people found things a computer generated funny, but the experiment seems far too constrained to demonstrate much: playing Mad Libs with pre-existing formats is hardly a demonstration of creativity.
More importantly, I think there’s a more compelling way of reading these study results: the average person off the street isn’t funnier than predictive text. This conclusion tracks for me—I don’t think it’s a particularly hot take to say that most people aren’t funny. Writing comedy is very, very hard and not the kind of thing an average person is going to be able to do well or consistently. Even the most seasoned comedy writer will tell you that joke-writing is a volume game: most of what a funny person writes will not be funny.
The second experiment by the USC scholars fed ChatGPT 50 headlines from The Onion and had it generate 20 new ones in the same style, then asked a group to rate all 70:
Those who self-reported seeking out comedy more and reading more satirical news rated the headlines as funnier, independent of whether they were AI-generated or produced by professional writers. Based on mean ratings, 48.8% preferred The Onion’s headlines, 36.9% preferred the headlines generated by ChatGPT, and 14.3% showed no preference.
Again: sure, maybe some people couldn’t tell the difference, but that’s hardly a demonstration of true comedic creativity. Being able to mimic The Onion’s distinctive voice and style, which a lot of writers worked hard to hone and perfect, doesn’t mean the model is writing jokes. It’s just good at copying off the A-student sitting next to it. It also doesn’t mean that ChatGPT is “writing comedy”—a dog wearing sunglasses is funny, but that doesn’t mean the dog is doing a bit.
One thing this academic conception of AI’s comedy-writing ability discounts is the importance of context to successful joke-telling. These experiments start with funny set-ups and comedy written by professionals, so the framing is doing a lot of the heavy lifting. Context is essential for a joke to land at all, which is why comedy ages so poorly—jokes work best when they’re responding to a specific time, place, and culture. Take the oldest recorded bar joke in the world, from ancient Sumer: “A dog walks into a bar and says, ‘I cannot see a thing. I’ll open this one.’” Without any ancient Sumerians to explain the punchline to us, this joke is meaningless. A language model can’t take the pulse of the culture or read a room, so it won’t be able to generate comedy without a lot of help from a person to give its writing context.
What’s also missing here is the importance of taste. The study dismisses a lot of these concerns: “Our studies suggest that the subjective experience of humor may not be required for the production of good humor—merely knowing the patterns that make up comedy may suffice.” I don’t buy this. The computer can’t know which patterns are funny and which aren’t without someone with enough taste to guide it. When a funny person does an impression in front of an audience or sits down to write a Reductress headline, they’re not regurgitating a pattern; they’re telling you, “this is funny to me.” A joke is an expression of taste, and taste is something a language model can’t supply without the help of an editor with a sense of humor. Someone needs to put the sunglasses on the dog.
So no, I don’t think that language models are on the verge of replacing all comedians and late night hosts. Overall, I think the hype around generative tech’s ability to make anything of value is overblown. Even the market seems to be in agreement: a recent analysis by Goldman Sachs found that AI isn’t making much money and is being used by a mere 5% of companies. It’s a problem for an industry that is spending an immense amount of money to create and operate these programs and decimating the environment in the process.
Even the AI fanatics seem to have lost the thread. This Reddit poster frets, “And now I have generated well over 200,000 images and I have no clue what I’m supposed to do with them? There has to be a use for that many images except I wouldn’t know what is.” Without an opinion or taste, who’s to know what’s good and what’s bad?
It shouldn’t be surprising that the corporate hype around AI isn’t panning out: remember when we were all going to be 24/7 living, laughing, and loving in the metaverse? Tech hype and PR have a remarkable ability to capture our imagination—because we, as humans, are good at telling stories—but they’re rarely backed up by much more than hot air.
People promising that they can crunch the numbers and tell you what is funny aren’t to be trusted. It reminds me of a line from Connor O’Malley’s recent, very funny special “Stand-Up Solutions,” in which he plays an unhinged salesman for an AI standup comedian. O’Malley’s character claims that his AI comedian can deliver “100% accurate comedy,” an absurd and nonsensical promise.
There is one conclusion in this study that I agree with, though I think the authors get there the wrong way:
“That ChatGPT can produce written humor at a quality that exceeds laypeople’s abilities and equals some professional comedy writers has important implications for comedy fans and workers in the entertainment industry,” they said. “For professional comedy writers, our results suggest that LLMs [large language models like ChatGPT] can pose a serious employment threat.”
I agree that writers should be worried, but not because of quality. If audiences enjoy jokes that a language model churns out, that’s fine—people are allowed to have whatever taste they want, even bad taste.
What worries me are the executives and purse-string-holders who read studies like this one, see that audiences can’t distinguish between AI and human text, and conclude that they no longer need to pay writers. I’ll keep beating the drum: LLMs are a problem because they’re a labor issue. If generative text is seen as a viable alternative by boardroom denizens, we’re going to see a lot more writers out of work and underpaid. We’re going to see a lot more writers who aren’t paid to think and to tell us what they think is funny, but instead to pick the best jokes that a computer generated, or punch up a first draft that ChatGPT extruded, or copy-edit a novel that an LLM squeezed out. These are jobs, sure, but they will be less abundant and less well-paid.
Without true writing jobs, how will people develop a sense of humor? How can people afford to try things out, to experiment? How will we get anything new, when everything is just a sausage made from pre-existing material? We’ll be left with a smaller, less equal, more top-heavy entertainment industry. And do we really want C-suites to have more creative input on our art?