Two days after the 2016 election, the New York Times published a story titled “How Data Failed Us in Calling an Election.” In it, technology journalists Steve Lohr and Natasha Singer chastised election forecasters for getting the contest “wrong.” They decried the media’s growing reliance on data to handicap the horse race. “Data science is a technology advance with trade-offs,” they wrote. “It can see things as never before, but also can be a blunt instrument, missing context and nuance.” Lohr and Singer charged election forecasters and handicappers with taking their eyes off the ball, focusing on the data without considering other sources for prognostication.
Election forecasters, in turn, blamed the pollsters. But it was not only the data that failed us in 2016, if indeed you want to call the performance a “failure.” It was faulty interpretation of that data—by the media, by some forecasters, by academics and intellectuals, and yes, by the public. Still, polls did miss the outcome in 2016, and the pieces to that puzzle have only recently fallen into place, after pollsters did even worse in 2020.
I called Charles Franklin as the sun began to set on a hot summer day in 2020. He answered from a book-filled room in his home in the suburbs of Madison, Wisconsin—a state that had been key to Donald Trump’s 2016 victory. We went through the blow-by-blow of election night, which he spent at ABC headquarters on the “decision desk”—a group of nerds who analyze vote returns and call states as results become clearer—and discussed the various sources of error for his final poll of Wisconsin, which had Hillary Clinton beating Donald Trump by six percentage points, 46% to 40%. (In the end, Trump won by less than a point, 47.2% to Clinton’s 46.5%.)
Franklin believes that, more than anything, his error in Wisconsin was due to a late-breaking shift toward Donald Trump among key undecided voters. Franklin’s poll had stopped collecting data a week before the election, giving ample time for fence-sitters to make up their minds in favor of Trump. If Franklin’s last poll was right, then there was a seven-point swing to Trump in the last week of the campaign. He reckons that if he had conducted another poll a day or two before the election, he would not have missed this pivot to Trump.
Franklin says that 20% of Wisconsinites had unfavorable views of both Trump and Clinton, and 75% of those ended up voting for Trump. According to exit polls, voters who decided whom to vote for in the final week of the campaign picked Trump by a 29-point margin. The late-breaking shift was strongest in the suburbs. The Marquette poll underestimated Trump’s margin by 14 points in the Milwaukee suburbs and 7 points outside Green Bay. According to Franklin, “A lot of those suburban Republican voters had expressed reservations about Trump. But they also had very deep antipathy toward Hillary Clinton.” In the end, their feelings about Clinton were simply too strong to be overcome by their reservations about Trump.
To Franklin, the predictive errors in 2016 amount to “a failure of design, not of methodology.” After Trump’s victory, the Marquette poll resumed its typical respectable record. It predicted that the Democratic candidate for governor in 2018, Tony Evers, would win the election by a one-point margin. He won by 1.1. The same year, Franklin polled the Wisconsin Senate race at a 10-point margin for Democratic incumbent Tammy Baldwin—and she won by 11.
Franklin reckons he would have projected a closer race in Wisconsin if he had stayed in the field through Election Day, but he had reasons for releasing data before that. He says he believes surveys should be snapshots in time: tools for measuring public opinion, good for more than predicting the outcome of the horse race. He is interested in putting “information out into the world” while people can still use it. A poll of how many voters support farm subsidies is useless to politicians if it is released on Election Day; if they have it a week earlier, they might be motivated to support the popular position on the issue. For Franklin, the potential downside of missing a close contest that hinges on late deciders is less important in the grand scheme of things.
Charles Franklin is not the only pollster who faced a reckoning for his performance in 2016. In May 2017, the American Association for Public Opinion Research (AAPOR), the professional organization for pollsters, met in New Orleans to discuss the causes of, and fallout from, Trump’s surprise election seven months earlier. There were no fewer than six different panels or presentations on the accuracy of pre-election polls. Most of these presented similar theories for the 2016 misfire.
To the extent that there was a systemic failure of pre-election polling in 2016, it was that some polls didn’t survey enough white voters without college degrees, a group that tends to vote for Republicans. Educational attainment was a major predictor of presidential vote preference in the 2016 election, but many pollsters did not weight for it. An analysis by the New York Times found that nearly half of the respondents in a typical national poll had at least a bachelor’s degree. But the percentage of college graduates among the actual population is only 28%. This presents a problem for pollsters who don’t take the education of the electorate into account: because college-educated Americans are more likely to vote for Democrats, their unadjusted polls will overestimate support for Democrats.
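The effect of skipping an education weight can be sketched with a few lines of arithmetic. The sample and population shares below come from the figures above (about half of respondents with a bachelor’s degree, against 28% of the population); the support levels within each group are hypothetical numbers chosen only for illustration, and the function names are made up for this sketch rather than taken from any pollster’s code.

```python
# A minimal post-stratification sketch, assuming two education groups.
# Shares of college graduates are taken from the text; the Democratic
# support figures are HYPOTHETICAL and exist only to illustrate the math.
sample_share = {"college": 0.50, "non_college": 0.50}
population_share = {"college": 0.28, "non_college": 0.72}
dem_support = {"college": 0.55, "non_college": 0.42}  # hypothetical


def poststratification_weights(sample_share, population_share):
    """Weight each group by (population share) / (sample share)."""
    return {g: population_share[g] / sample_share[g] for g in sample_share}


def estimate(support, sample_share, weights=None):
    """Weighted average of group support; unweighted if no weights given."""
    if weights is None:
        weights = {g: 1.0 for g in sample_share}
    total = sum(sample_share[g] * weights[g] for g in sample_share)
    weighted_sum = sum(
        sample_share[g] * weights[g] * support[g] for g in sample_share
    )
    return weighted_sum / total


weights = poststratification_weights(sample_share, population_share)
unweighted = estimate(dem_support, sample_share)
weighted = estimate(dem_support, sample_share, weights)

print(f"unweighted Democratic share: {unweighted:.1%}")
print(f"education-weighted share:    {weighted:.1%}")
```

Under these made-up support levels, the unweighted sample puts the Democrat at 48.5%, while weighting to the population’s education mix gives about 45.6%. The size of the gap depends entirely on how far apart the two groups vote, which is why the omission mattered far more in 2016 than in earlier cycles.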
Some pollsters suffered badly from making this mistake in 2016. According to AAPOR’s post-election report, the final University of New Hampshire poll had Clinton leading in the Granite State by 11 points. She ultimately won by a razor-thin 0.4-point margin. Andrew Smith, the director of the UNH poll, reported that they had adjusted their data for age, gender, and region, but not education. The combination had worked just fine in previous election cycles. He wrote:
We have not weighted by level of education in our election polling in the past and we have consistently been the most accurate poll in NH (it hasn’t made any difference and I prefer to use as few weights as possible), but we think it was a major factor this year. When we include a weight for level of education, our predictions match the final number.
Other pollsters avoided making so egregious an error. Charles Franklin has weighted by education for his entire career. “It never dawned on me that polls would not be weighting by education,” he told me, “not only for vote consequences, but because one of the strongest and longest-lasting trends in polling is that more educated people are more likely to respond to polls. I just took it for granted that you would weight for education.”
Still, plenty of other pollsters missed the importance of education weighting. The New York Times reported shortly after the AAPOR conference that under a third of polls in battleground states included the variable in their adjustment protocols. In the Midwest, the percentages were even smaller; only 18% and 27% of surveys in Michigan and Wisconsin, respectively, had the right educational composition. Nate Cohn estimated that weighting by education would have improved the performance of polls by four percentage points. AAPOR found slightly larger effects in Michigan, where a final poll from Michigan State University would have decreased Clinton’s vote margin by seven points if it had been weighted to the share of educated voters in the population.
By 2020, most political analysts thought that the pollsters had fixed the problems from the last go-around. About half of pollsters who didn’t weight by education were weighting by it this time, and everyone knew (or should have) that large errors were possible.
But things didn’t go much better for the pollsters in 2020. The outlier ABC/Washington Post poll that missed Joe Biden’s margin by 17 percentage points in Wisconsin was called out as one of the biggest misses in the history of polling. It nearly gave the Literary Digest a run for its money. But the aggregates in Wisconsin were also off: Nate Silver’s model had Biden up by eight points, when he only won by 0.6.
So while cherry-picking by focusing on outliers like the ABC/Washington Post poll is not fair to the pollsters, they are in some serious trouble. Worse, the polls in 2020 were off in all the same places they were in 2016, often by larger amounts. Whatever pulse the polls were taking in Ohio and Iowa, where forecasts were off by eight percentage points on average, was not the public’s. Polls in Florida, often viewed as the ultimate swing state, were nearly five points off. Overall, these errors were 1.5-2 times as large as should be expected based on the historical margin of error of a state-level polling aggregate.
Writing for the Atlantic the day after the election, David Graham characterized the results as “a disaster for the polling industry and for media outlets and analysts that package and interpret the polls for public consumption, such as FiveThirtyEight, the New York Times’s Upshot, and The Economist’s election unit. They now face serious existential questions.”
Graham’s critique missed the mark in two important ways. First, it measured polling error the day after the election, when true results were not yet known; the polls began to look much better during the week after the election, as more Democratic-leaning ballots from big cities were counted. Second, by repeating the conventional wisdom about how polls “failed” to predict the 2016 and 2020 elections, he indulged in hyperbole. Pre-election polling was not perfect—polls rarely, if ever, are—but it was not catastrophically bad either.
Instead of being hopelessly broken, pre-election polls face severe and prolonged threats from partisan nonresponse. They are not reaching enough Republican voters. That’s why polls underestimated Donald Trump in Wisconsin, and why the GOP candidate in Maine, Susan Collins, beat her polls in 2020 too. The precise reason for their lapse in numbers is harder to determine.
One theory, a holdover from the 2016 election, is that the voters who tend to be less trusting of their neighbors and government institutions are both less likely to answer polls and more likely to support Republicans for office, in ways that elude the variables that both traditional and twenty-first-century pollsters can account for. A low-trust voter might be a 21-year-old college-educated woman from Iowa now living in Philadelphia, or a 65-year-old Republican man in the Deep South. Low-trust voters tend to hide within aggregate statistics, biasing polls toward higher-trust people among all demographic groups. A poll that is not weighted to so-called social trust will miss this source of potential error.
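A small numerical sketch shows why this kind of nonresponse slips past demographic weighting. It looks at a single demographic cell (say, one age-by-education group) that contains both high-trust and low-trust voters. Every number here is hypothetical, as are the function and field names; the point is only the mechanism, not the magnitudes.

```python
# A minimal sketch of partisan nonresponse inside ONE demographic cell.
# All shares, response rates, and support levels are HYPOTHETICAL.
cell = [
    {"name": "high_trust", "pop_share": 0.6, "response_rate": 0.10, "rep_support": 0.40},
    {"name": "low_trust", "pop_share": 0.4, "response_rate": 0.05, "rep_support": 0.60},
]


def true_support(groups):
    """Republican support among everyone in the cell."""
    return sum(g["pop_share"] * g["rep_support"] for g in groups)


def respondent_support(groups):
    """Republican support among the people who actually answer the poll."""
    responders = sum(g["pop_share"] * g["response_rate"] for g in groups)
    supporters = sum(
        g["pop_share"] * g["response_rate"] * g["rep_support"] for g in groups
    )
    return supporters / responders


truth = true_support(cell)
observed = respondent_support(cell)

print(f"true Republican support in the cell: {truth:.1%}")
print(f"support among respondents:           {observed:.1%}")
print(f"bias from nonresponse:               {observed - truth:+.1%}")
```

With these invented inputs the cell is truly 48% Republican, but respondents are only 45% Republican. A demographic weight scales the whole cell up or down; it cannot change the mix of high-trust and low-trust people inside it, so the three-point shortfall survives weighting unless the pollster also measures and weights on trust itself.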
Donald Trump’s supporters may also have been more likely than their demographics predicted to refuse pollsters’ calls because the president constantly railed against the pollsters, calling them corrupt and often alleging they were out to get him. This has never been a central theme in a presidential campaign before. After the election, Doug Rivers found that YouGov’s polls often had too many rural voters in them, even after weighting by geography, and that people who voted for Mr. Trump in 2016 but disapproved of him in 2020 were more likely than his approvers to answer surveys.
Pollsters also have the unenviable task of figuring out who is actually likely to turn up to vote on Election Day, and there is no agreed-upon method for predicting who is a “likely voter.” Naïvely, we might think that pollsters could simply ask a person if they are likely to cast a ballot. But people lie, systematically saying they’re more likely to vote than not, which throws the data out of whack.
Pollsters have to adjust for this, too, or their predictions will be off. They have toiled at the guesswork involved in anticipating Americans’ real voting behavior since nearly the birth of modern survey research. Every pollster has a different method for predicting who will end up voting, and none offer definitive accuracy. (There’s the Gallup method, the Perry-Gallup method, probabilistic vote-scoring, predictions from machine-learning models, the matching of vote-histories, and so on.) So ambiguous and uncertain are likely voter models that Burns “Bud” Roper, son of the public polling pioneer, said in 1984 that “one of the trickiest parts of an election poll is to determine who is likely to vote and who is not. I can assure you that this determination is largely art.”
The compound effects of weighting and filtering out nonvoters were so notable in 2016 that Nate Cohn arranged to have the Times’s proprietary data for a poll in Florida handed out to four different polling firms and groups of researchers to produce estimates using whichever method they preferred. Each pollster deployed defensible likely voter filters and performed standard weighting procedures to ensure samples were representative of the voting population by a mixture of age, sex, education, party registration, and race.
In the end, the four polls—all based on exactly the same interviews—gave four different assessments of the race. One had Clinton up by four points, another by three, a third by one, and the fourth had Trump winning by a point. But which made the “right” methodological choices? There is no way of knowing for sure. The divergence in their results is not necessarily proof that some methods are better than others, but rather a testament to the true potential for error in the polls. (The poll that had Trump winning Florida by a point was also not necessarily right, as it was taken in late September and the race could have changed between then and November.)
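How the same interviews can yield different margins is easy to see in miniature. The sketch below applies three likely voter treatments to one toy set of respondents: counting everyone, keeping only those above a cutoff, and weighting each person by a stated probability of voting. The respondents, the 0.7 cutoff, and the function names are all hypothetical, and none of this is meant to reproduce the methods the four teams actually used.

```python
# A minimal sketch: one set of interviews, three likely-voter treatments.
# Each respondent is (candidate preference, self-rated chance of voting).
# The data and the 0.7 cutoff are HYPOTHETICAL.
respondents = [
    ("D", 0.90), ("D", 0.60), ("D", 0.50), ("D", 0.95), ("D", 0.40), ("D", 0.80),
    ("R", 0.90), ("R", 0.85), ("R", 0.70), ("R", 0.95), ("R", 0.60),
]


def margin(respondents, weight_fn):
    """Democratic minus Republican share, with each person weighted by weight_fn."""
    totals = {"D": 0.0, "R": 0.0}
    for choice, p_vote in respondents:
        totals[choice] += weight_fn(p_vote)
    return (totals["D"] - totals["R"]) / (totals["D"] + totals["R"])


# Treatment 1: count every respondent equally.
margin_all = margin(respondents, lambda p: 1.0)
# Treatment 2: hard cutoff, keep only those at or above 0.7.
margin_cutoff = margin(respondents, lambda p: 1.0 if p >= 0.7 else 0.0)
# Treatment 3: probabilistic, weight each person by their chance of voting.
margin_prob = margin(respondents, lambda p: p)

print(f"all respondents:     {margin_all:+.1%}")
print(f"cutoff at 0.7:       {margin_cutoff:+.1%}")
print(f"probability weights: {margin_prob:+.1%}")
```

In this toy sample the Democrat leads among all respondents, trails once the cutoff is applied, and holds a narrow lead under probability weighting. Each choice is defensible on its own, which is the sense in which the spread across the four Florida estimates reflects genuine uncertainty rather than a single team’s mistake.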
In the end, the lack of an accurate “standard likely voter” model may have led polls to overestimate support for Clinton. According to Cohn’s polls in Florida, Pennsylvania, and North Carolina, Hillary Clinton’s supporters were likelier than Donald Trump’s to stay home after indicating their intention to vote. People who said they would vote favored the Democratic candidate by one point across the three states, but actual voters went for Trump by two.
However, according to Nate Cohn, his operation at the Times managed a rather good prediction of turnout for each party in 2020. Misidentifying “likely voters” explained only about one percentage point of his error in vote share. That wouldn’t explain their nearly five-point error in Wisconsin or two-point miss in Michigan.
Excerpted from Strength in Numbers: How Polls Work and Why We Need Them by G. Elliott Morris. Copyright © 2022. Available from W. W. Norton & Company, Inc.