Why the Polls Were Wrong

It’s not good news. From Vox:

What the hell happened with the polls this year?

Yes, the polls correctly predicted that Joe Biden would win the presidency. But they got all kinds of details, and a number of Senate races, badly wrong . . .

To try to make sense of the massive failure of polling this year, I reached out to the smartest polling guy I know: David Shor, an independent data analyst who’s a veteran of the Obama presidential campaigns who formerly operated a massive web-based survey at Civis Analytics before leaving earlier this year. . . . Shor’s been trying to sell me, and basically anyone else who’ll listen, on a particular theory of what went wrong in polling that year [2016], and what he thinks went wrong with polling in 2018 and 2020, too.

The theory is that the kind of people who answer polls are systematically different from the kind of people who refuse to answer polls — and that this has recently begun biasing the polls in a systematic way.

This challenges a core premise of polling, which is that you can use the responses of poll takers to infer the views of the population at large — and that if there are differences between poll takers and non-poll takers, they can be statistically “controlled” for by weighting according to race, education, gender, and so forth. . . . If these two groups do differ systematically, that means the results are biased.

The assumption that poll respondents and non-respondents are basically similar, once properly weighted, used to be roughly right — and then, starting in 2016, it became very, very wrong [note: of course, 2016 was when Txxxx began poisoning the political process]. People who don’t answer polls, Shor argues, tend to have low levels of trust in other people more generally. These low-trust folks used to vote similarly to everyone else. But as of 2016, they don’t: they tend to vote for Republicans.

Now, in 2020, Shor argues that the differences between poll respondents and non-respondents have gotten larger still. In part due to Covid-19 stir-craziness, Democrats, and particularly highly civically engaged Democrats who donate to and volunteer for campaigns, have become likelier to answer polls. It’s something to do when we’re all bored, and it feels civically useful. This biased the polls, Shor argues, in deep ways that even the best polls (including his own) struggled to account for.

Liberal Democrats answered more polls, so the polls overrepresented liberal Democrats and their views (even after weighting), and thus the polls gave Biden and Senate Democrats inflated odds of winning. . . .

Dylan Matthews

So, David: What the hell happened with the polls this year?

David Shor

So the basic story is that, particularly after Covid-19, Democrats got extremely excited and had very high rates of engagement. They were donating at higher rates, etc., and this translated to them also taking surveys, because they were locked at home and didn’t have anything else to do. There’s some pretty clear evidence that that’s nearly all of it: It was partisan non-response. Democrats just started taking a bunch of surveys [when they were called by pollsters, while Republicans did not]. . . .

Dylan Matthews

You mentioned social trust. Walk me through your basic theory about how people who agree to take surveys have higher levels of social trust, and how that has biased the polls in recent years.

David Shor

For three cycles in a row, there’s been this consistent pattern of pollsters overestimating Democratic support in some states and underestimating support in other states. This has been pretty consistent. It happened in 2016. It happened in 2018. It happened in 2020. And the reason that’s happening is because the way that [pollsters] are doing polling right now just doesn’t work. . . .

Fundamentally, every “high-quality public pollster” does random digit dialing. They call a bunch of random numbers, roughly 1 percent of people pick up the phone, and then they ask stuff like education, and age, and race, and gender, sometimes household size. And then they weight it up to the census, because the census says how many adults do all of those things. That works if people who answer surveys are the same as people who don’t, once you control for age and race and gender and all this other stuff.

But it turns out that people who answer surveys are really weird. They’re considerably more politically engaged than normal. . . . [They] have much higher agreeableness [a measure of how cooperative and warm people are], which makes sense, if you think about literally what’s happening.

They also have higher levels of social trust. . . . It’s a pretty massive gap. [Sociologist] Robert Putnam actually did some research on this, but people who don’t trust people and don’t trust institutions are way less likely to answer phone surveys. Unsurprising! This has always been true. It just used to not matter.

It used to be that once you control for age and race and gender and education, that people who trusted their neighbors basically voted the same as people who didn’t trust their neighbors. But then, starting in 2016, suddenly that shifted. . . . These low-trust people still vote, even if they’re not answering these phone surveys.

Dylan Matthews

So that’s 2016. Same story in 2018 and 2020?

David Shor

The same biases happened again in 2018, which people didn’t notice because Democrats won anyway. What’s different about this cycle is that in 2016 and 2018, the national polls were basically right. This time, we’ll see when all the ballots get counted, but the national polls were pretty wrong. If you look at why, I think the answer is related, which is that people who answer phone surveys are considerably more politically engaged than the overall population. . . .

Normally that doesn’t matter, because political engagement is actually not super correlated with partisanship. That is normally true, and if it wasn’t, polling would totally break. In 2020, they broke. There were very, very high levels of political engagement by liberals during Covid. You can see in the data it really happened around March. Democrats’ public Senate polling started surging in March. Liberals were cooped up, because of Covid, and so they started answering surveys more and being more engaged.

This gets to something that’s really scary about polling, which is that polling is fundamentally built on this assumption that people who answer surveys are the same as people who don’t, once you condition on enough things. . . . But these things that we’re trying to measure are constantly changing. And so you can have a method that worked in past cycles suddenly break. . . .

There used to be a world where polling involved calling people, applying classical statistical adjustments, and putting most of the emphasis on interpretation. Now you need voter files and proprietary first-party data and teams of machine learning engineers. It’s become a much harder problem.

Dylan Matthews

. . . Pollsters need to get way more sophisticated in their quantitative methods to overcome the biases that wrecked the polls this year. Am I understanding that right?

David Shor

. . . A lot of people think that the reason why polls were wrong was because of “shy Txxxx voters.” You talk to someone, they say they’re undecided, or they say they’re gonna vote for Biden, but it wasn’t real. Then, maybe if you had a focus group, they’d say, “I’m voting for Biden, but I don’t know.” And then your ethnographer could read the uncertainty and decide, “Okay, this isn’t really a firm Biden voter.” That kind of thing is very trendy as an explanation.

But it’s not why the polls were wrong. It just isn’t. People tell the truth when you ask them who they’re voting for. They really do, on average. The reason why the polls are wrong is because the people who were answering these surveys were the wrong people. If you do your ethnographic research, if you try to recruit these focus groups, you’re going to have the same biases. They recruit focus groups by calling people! Survey takers are weird. People in focus groups are even weirder. Qualitative research doesn’t solve the problem of one group of people being really, really excited to share their opinions, while another group isn’t. As long as that bias exists, it’ll percolate down to whatever you do.


I think what this means is that even if you correct for the low number of Republicans who answer polls, you’re still in trouble, because you’re still polling the kind of Republicans who answer polls (the relatively nice ones). Your sample of Republican voters doesn’t represent Republicans who vote.
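To make that concrete, here is a small sketch of the arithmetic. Every number in it is made up for illustration: the four groups, their sizes, how they vote, and how often they answer the phone are all my assumptions, not figures from Shor or from any poll. The function computes what a poll would show on average if the pollster weights respondents back to the population within whatever groups the pollster can see.

```python
from collections import defaultdict

# Hypothetical electorate; all values are invented for illustration.
# (party, trust, population share, Democratic vote share, response rate)
CELLS = [
    ("D", "high", 0.30, 0.95, 0.020),
    ("D", "low",  0.20, 0.90, 0.010),
    ("R", "high", 0.20, 0.15, 0.015),
    ("R", "low",  0.30, 0.03, 0.005),
]


def true_share(cells):
    """Actual Democratic vote share across the whole electorate."""
    return sum(pop * dem for _, _, pop, dem, _ in cells)


def poll_estimate(cells, stratum):
    """Expected poll result after weighting respondents to population totals.

    `stratum` maps (party, trust) to the group the pollster weights on.
    Within each group, respondents are assumed to stand in for everyone.
    """
    pop_by = defaultdict(float)   # population share of each weighting group
    resp_by = defaultdict(float)  # respondents in each weighting group
    dem_by = defaultdict(float)   # Democratic-voting respondents in each group
    for party, trust, pop, dem, rate in cells:
        group = stratum(party, trust)
        pop_by[group] += pop
        resp_by[group] += pop * rate
        dem_by[group] += pop * rate * dem
    weighted = sum(pop_by[g] * dem_by[g] / resp_by[g] for g in pop_by)
    return weighted / sum(pop_by.values())


truth = true_share(CELLS)
raw = poll_estimate(CELLS, lambda party, trust: "everyone")        # no weighting
by_party = poll_estimate(CELLS, lambda party, trust: party)        # fix the party mix
by_both = poll_estimate(CELLS, lambda party, trust: (party, trust))  # needs trust data

print(f"actual result:           {truth:.1%}")
print(f"unweighted poll:         {raw:.1%}")
print(f"weighted by party:       {by_party:.1%}")
print(f"weighted by party+trust: {by_both:.1%}")
```

With these invented numbers the Democrat actually gets 50.4 percent. The unweighted poll shows about 64 percent, and fixing the party mix only brings it down to about 52.4 percent, because the Republicans who answer are disproportionately the high-trust ones. The poll only matches the real result when the weighting also accounts for trust, which is exactly the thing pollsters don't normally get to observe.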

This does not bode well for the two Senate races in Georgia. Polls show the Democrats are a few points behind. When the run-off elections occur in January, the polls will probably be wrong again, unless the pollsters have quickly figured out how to solve this problem.