In my past few blog posts (excluding the last one), I've primarily discussed the political race in the US primaries. Recently, we also had a referendum in the UK on leaving the EU (the so-called Brexit), in which the UK chose to leave by a margin of roughly 52% to 48%. In this post, the specific results of Brexit and the US primaries are mostly beside the point; what matters are the more general lessons about prediction.
Misuse of Brexit Polls:
First off, there's been a lot of talk about how disastrous polling was in the UK around Brexit, and there's also been discussion about polling failures more generally. So I'll begin by talking about the poll results coming into Brexit. Some argue that polling was completely useless because there were polls showing Remain ahead of Leave by a large margin the day before the vote. Of course, when we use polls we've gotta remember that any given poll comes with a margin of error, can be poorly done by a polling firm, can be an outlier, and a whole host of other things. In other words, there's a real problem with taking one particular day or one particular poll too seriously.
In other words, properly using polling means that the most important thing to look at is the polling average across a wide array of polls with different methodologies. If the polls are roughly in line with one another, the average has a better chance of being accurate. If the polls are all over the place (as many Brexit polls were), then for the most part you can't put much weight on any of them.
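To make that concrete, here's a minimal sketch of what I mean by leaning on an average rather than any single poll. The pollster names and numbers are invented for illustration, not real Brexit figures.

```python
# Hypothetical polls: (pollster, Leave share, sample size). Invented numbers.
polls = [
    ("Pollster A", 0.51, 1000),
    ("Pollster B", 0.47, 1500),
    ("Pollster C", 0.53, 800),
    ("Pollster D", 0.49, 2000),
    ("Pollster E", 0.50, 1200),
]

# Sample-size-weighted average of the Leave share.
total_n = sum(n for _, _, n in polls)
avg_leave = sum(share * n for _, share, n in polls) / total_n

# Poll-to-poll spread: if this is large relative to each poll's own margin of
# error, the polls disagree and the average deserves a lot less trust.
mean = sum(share for _, share, _ in polls) / len(polls)
spread = (sum((share - mean) ** 2 for _, share, _ in polls) / (len(polls) - 1)) ** 0.5

print(f"Weighted Leave average: {avg_leave:.1%}")
print(f"Poll-to-poll spread (std dev): {spread:.1%}")
```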
In the specific case of Brexit, not only were many of the polls all over the place, but the Huffington Post polling average had Leave and Remain basically neck and neck for weeks, with a very small margin between them. In other words, a proper reading of the polls--assuming they were representative of the population--should've told us that the chance of Leave winning versus Remain was close to 50/50. Sure enough, that's what we got. We also knew about certain demographic splits: the young favored Remain by a large margin while older voters favored Leave, and more educated voters favored Remain while less educated voters favored Leave. So what does that tell us? Well, we know that older voters have higher turnout and younger voters have lower turnout, and that fact alone would tilt things toward Leave. However, the education factor cuts the other way, since more educated voters also turn out at higher rates, which favors Remain. In essence, we should've expected a very close vote.
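As a rough back-of-the-envelope illustration of how turnout can pull a split electorate toward an even result (all the support and turnout numbers below are invented, not actual referendum data):

```python
# Invented figures: (share of electorate, Leave support, turnout rate) per group.
groups = {
    "younger": (0.40, 0.35, 0.55),
    "older":   (0.60, 0.58, 0.75),
}

leave_votes = sum(pop * support * turnout for pop, support, turnout in groups.values())
total_votes = sum(pop * turnout for pop, _, turnout in groups.values())
print(f"Implied Leave share after turnout weighting: {leave_votes / total_votes:.1%}")
```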
So anyone saying that this Leave vote was ridiculous or absurd or totally unpredictable, and that the polls were completely wrong, just isn't thinking soundly. Quite frankly, that's about as terrible a use of polling as anyone could manage. So why did the betting markets have Remain at such high odds? I don't know, but if I had been given those kinds of odds before the vote, I'd have taken them (I bought a straddle on a UK equity ETF expecting much higher volatility around events like this, and I was correct on that).
US Polling Is More Reliable Than Polling Abroad:
Now, I'll discuss why American polling and American pollsters are simply more effective at accurate polling than their counterparts elsewhere. Much of this has to do with the American political structure, which is decentralized and federal. We have 50 states that hold Senate and House elections every two years and Presidential elections every four years. We also have state polling, local polling, federal polling, and polling for both the party primary elections and the general elections. Parliamentary systems, on the other hand, typically hold only one national election every few years, with referendums being even rarer.
In other words, the US political system allows pollsters to do much more tinkering to adjust polls and find ways to limit error. Take the UK as a simple example: they haven't had a referendum like this in a long time, and they held an election in 2015 with the previous elections in 2010, 2005, and 2001. In other words, they get one election every five years where pollsters have to figure out--as a country--how they're gonna account for these shifts. To correct their polls, going back four elections spans an entire generation AND gives you only 3-4 pieces of actual data against which to compare the polls. Compared with the errors introduced by a new generation of voters, adjustments based on data from 5, 10, 15, and 20 years ago are essentially useless because of shifts in demographics.
However, in the US, we've got 50 different states, each with its own polling and its own demographics, and they hold two or three sets of elections every two years. We also have a much larger and more diverse country, which means we can use demographic factors like age, race, gender, income, etc. much more effectively to weight our polls and reduce our errors. One major advantage in a general election for President of the United States is that we know where most states are gonna line up going into the November of a Presidential year. Since our elections use an electoral college format, we can go into almost every election knowing what most of the results are gonna be. If the exit polling, regional, and demographic data match up with the polls, we can call our elections almost immediately. Parliamentary democracies, which simply don't have that size or diversity and hold elections between centralized national parties, don't enjoy those advantages.
The US also has the advantage of having a "two-party system". So when we have a general election, there are usually only two candidates (and occasionally three) with a realistic shot of winning. We don't have to worry about four or five or six parties, each with its own projected share and its own margin of error, and how each party's error affects everyone else's vote. In that kind of situation in a parliamentary democracy, a 3% margin of error for each party can easily translate, within one standard deviation, into a 6-10% swing in one direction, which completely changes the composition of the parliament and of the future ruling coalition. Another issue with parliamentary systems is that the political parties are centralized, there are no primary contests, and the ruling coalition can often call elections whenever it chooses. That's yet another problem for pollsters in parliamentary democracies.
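To get a feel for how per-party errors compound, here's a rough Monte Carlo sketch with invented vote shares for five parties and an assumed, roughly independent 3-point error on each. It's an illustration of the intuition, not a model of any real election.

```python
import random

# Invented vote shares for five parties (sum to 1.0) and an assumed
# 3-point polling error per party, treated as roughly independent.
true_shares = {"A": 0.32, "B": 0.28, "C": 0.18, "D": 0.12, "E": 0.10}
error_sd = 0.03

def one_poll():
    # Perturb each party independently, then renormalize to 100%.
    noisy = {p: max(s + random.gauss(0, error_sd), 0.0) for p, s in true_shares.items()}
    total = sum(noisy.values())
    return {p: v / total for p, v in noisy.items()}

# How often does the gap between the top two parties move by 6+ points?
true_gap = true_shares["A"] - true_shares["B"]
trials = 10_000
big_swings = 0
for _ in range(trials):
    poll = one_poll()
    if abs((poll["A"] - poll["B"]) - true_gap) >= 0.06:
        big_swings += 1

print(f"Simulated polls where the A-B gap shifts by 6+ points: {big_swings / trials:.1%}")
```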
Good Polling and the Correct Use of Polling Are an Art, Not a Science:
In essence, conducting and using polls is not a science. Effectively using polling isn't about fancy number crunching or anything like that. Accurately and effectively using polling is about how the results are communicated and what the methods behind them imply. What do I mean? I mean that anyone with half a brain can look at two numbers and tell which one is larger. It takes much more than comparing numbers to interpret polling well.
Interpreting polling properly means asking how much knowledge we can extract from the information in a poll. For example, it's important to be able to spot outliers, because any one poll can contain a sample that isn't representative of the population as a whole. Spotting outliers is a necessity considering that every poll is only accurate within a stated confidence interval. If you don't spot outliers, your conclusions will be wrong. I don't know how they'll be wrong, but they will be wrong.
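One simple way to flag a potential outlier is to compare a new poll against the average of recent polls, measured in standard deviations. The numbers below are made up, and the 2-standard-deviation cutoff is just a rule of thumb.

```python
# Hypothetical recent polls for one side (vote shares), plus a new poll.
recent = [0.48, 0.50, 0.49, 0.51, 0.50, 0.49]
new_poll = 0.55

mean = sum(recent) / len(recent)
sd = (sum((x - mean) ** 2 for x in recent) / (len(recent) - 1)) ** 0.5

# Flag the new poll if it sits more than ~2 standard deviations from the average.
z = (new_poll - mean) / sd
print(f"New poll is {z:.1f} standard deviations from the recent average")
if abs(z) > 2:
    print("Possible outlier: wait for other polls to confirm before updating much.")
```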
Another aspect of polling is understanding and dealing with bias, which means being able to spot possible biases in various kinds of polling. For example, if the President's approval rating is around 52% nationally while a single poll has it at around 45%, we can probably say that the poll is skewed along partisan lines. Similarly, if a poll's sample is 40% registered with each party while actual registration is 40% for one party and 35% for the other, then the sample over-represents the second party and the poll is probably biased.
Other ways to check the bias of polls are along demographic lines. For example, political coalitions in a country as diverse as the US will always be driven along demographic lines. If a given demographic has reliably voted a certain way (like minority populations for Democrats) and it's underrepresented in a sample, then we can say the poll may be biased. Similarly, we can use basic demographic percentages and back-of-the-envelope calculations to check whether the polls line up.
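Here's a hedged sketch of that kind of composition check: reweight the sample to match a known benchmark (party registration here, but the same idea works for demographics) and see how much the topline moves. All numbers are invented.

```python
# Invented numbers: poll sample composition vs. a known registration benchmark,
# plus each group's support for a candidate within the poll.
sample_comp = {"Party X": 0.40, "Party Y": 0.40, "Independent": 0.20}
benchmark   = {"Party X": 0.40, "Party Y": 0.35, "Independent": 0.25}
support     = {"Party X": 0.90, "Party Y": 0.10, "Independent": 0.50}

raw        = sum(sample_comp[g] * support[g] for g in sample_comp)
reweighted = sum(benchmark[g] * support[g] for g in benchmark)

print(f"Raw topline:        {raw:.1%}")
print(f"Reweighted topline: {reweighted:.1%}")
# A big gap between the two suggests that sample composition, not actual
# opinion, is driving the headline number.
```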
So when we do look at polls, it's important to note all of these sub-categorizations. Another important factor when examining polls is how they deal with turnout. Some polls use registered voters, others use likely voters, and some report both. People who lean a certain way but don't show up to vote aren't, by definition, part of the voting population, so it may be important to adjust polling estimates for turnout.
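A minimal sketch of what a turnout adjustment looks like, assuming we have (invented) turnout propensities for each group of respondents:

```python
# Invented registered-voter poll: each tuple is
# (share of registered voters, candidate support, assumed turnout propensity).
groups = [
    (0.30, 0.65, 0.45),   # lower-turnout group
    (0.45, 0.48, 0.70),
    (0.25, 0.40, 0.80),   # higher-turnout group
]

# Registered-voter topline: ignores turnout entirely.
rv = sum(share * support for share, support, _ in groups)

# Likely-voter topline: weight each group by how likely it is to actually vote.
lv_num = sum(share * support * turnout for share, support, turnout in groups)
lv_den = sum(share * turnout for share, _, turnout in groups)

print(f"Registered-voter estimate: {rv:.1%}")
print(f"Likely-voter estimate:     {lv_num / lv_den:.1%}")
```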
In other words, all of the aspects listed above--voter alignment, demographics, turnout, and so on--are essential to using polls properly. As I've said, there's no single correct way of assessing whether polls are biased or of spotting outliers. Whatever approach works will do just fine, but we must be careful in the process to make sure we're thinking and analyzing rigorously.
So how do we think about and analyze the data in front of us rigorously? The first key is to remain skeptical at all times and always try to find out where you might be wrong. It's crucial to understand all of your assumptions and systematically make sure every logical step you take is valid and that the implicit assumptions behind each step hold. Another key is to understand how to deal with error effectively, which means tracking how error propagates across the entire system. The most important thing to understand is the error dynamics: how the error shifts as your assumptions shift. These are the most important guidelines for using polls effectively.
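As a final hedged sketch of what I mean by watching how the error moves as an assumption moves: vary one assumption (here, an assumed turnout rate for one group) and see how much the topline estimate responds. All figures are invented.

```python
# Two invented groups: vary the assumed turnout of the first group and watch
# how the topline estimate responds. This is a sensitivity check, not a forecast.
low_share, low_support = 0.45, 0.60                 # lower-turnout group
hi_share, hi_support, hi_turnout = 0.55, 0.45, 0.75  # higher-turnout group

for low_turnout in (0.40, 0.50, 0.60, 0.70):
    num = low_share * low_support * low_turnout + hi_share * hi_support * hi_turnout
    den = low_share * low_turnout + hi_share * hi_turnout
    print(f"Assumed turnout {low_turnout:.0%} -> estimate {num / den:.1%}")
```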