Wednesday, June 29, 2016

On the Effectiveness of Polls and the Artistry Behind Using Polls Well

In my past few blog posts (excluding the last one), I've primarily discussed the political race in the US primaries. Recently, we also had a referendum in the UK about leaving the EU (so-called Brexit) in which the UK chose to leave by a margin of ~52% to 48%. In this post, the results of Brexit or the US primaries are rather irrelevant, except in the more general aspects of prediction.

Misuse of Brexit Polls:
First off, there's been a lot of talk about how disastrous polling was in the UK for Brexit and there's also been discussion about polling failures more generally. So I'll begin by talking about the poll results coming into Brexit. Some are arguing that polling was completely useless because there were polls showing Remain over Leave by a large margin the day before the vote. Of course, when we use polls we've gotta remember that any given poll comes with a margin of error, can be poorly done by the polling firm, can be an outlier, and is subject to a whole host of other factors. In other words, there's a real problem with taking one particular day or one particular poll too seriously.

In other words, properly using polling means that the most important factor to take into account is the polling average across a wide array of polls with different methodologies. If the polls are relatively consistent and in line with one another, then that tells us the polls have a better chance of being accurate. If the polls are all over the place (like they were with many Brexit polls), then you can basically toss them out for the most part.
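To make the averaging idea concrete, here's a minimal sketch in Python. The poll numbers are completely made up for illustration (they are NOT actual Brexit polls), and the function names are my own. It computes the average lead, how spread out the polls are, and the usual sampling margin of error on one side's share for a single 1,000-person poll, assuming simple random sampling.

```python
from math import sqrt
from statistics import mean, stdev


def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% sampling margin of error for one side's share (a fraction).

    Assumes simple random sampling, which real polls only approximate.
    """
    return z * sqrt(share * (1.0 - share) / sample_size)


def summarize_polls(leads):
    """Return (average lead, sample standard deviation) for leads in points."""
    return mean(leads), stdev(leads)


# Hypothetical Remain-minus-Leave leads, in percentage points (invented data).
hypothetical_leads = [10, -4, 2, -6, 7, -3, 1, -1]

average_lead, spread = summarize_polls(hypothetical_leads)
single_poll_moe = 100 * margin_of_error(0.5, 1000)

print(f"average lead: {average_lead:+.2f} points")
print(f"spread across polls: {spread:.2f} points")
print(f"margin of error on one share, n=1000: +/-{single_poll_moe:.2f} points")
```

In this invented example, one poll shows a 10-point Remain lead, but the average is under a point and the polls disagree with each other by more than the sampling margin of any one of them. That's the situation where a single poll tells you very little.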

In the specific case of Brexit, not only were many of the polls all over the place, but the Huffington Post polling average basically had Leave and Remain neck and neck for weeks, separated by a very small margin. In other words, a proper reading of the polls--assuming they were representative of the population--should've told us that the chance of Leave vs Remain was basically close to 50/50. Sure enough, that's what we got. We also know that there were certain demographic components: the young favored Remain by a large margin while older voters favored Leave, and more educated voters favored Remain while less-educated voters favored Leave. So what does that tell us? Well, we know that older voters have higher turnout and younger voters have lower turnout. That fact alone would tell us that it may actually make sense to favor Leave. However, the educational factor cuts the other way and favors Remain, since more educated voters also tend to turn out at higher rates. In essence, we should've expected a very close vote.
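Here's a rough sketch of how a turnout gap alone can move a result. Every number below is hypothetical and picked only to illustrate the mechanism (two equal-sized age groups, invented turnout rates and invented Leave shares); none of it is actual referendum data.

```python
def unweighted_share(groups):
    """Leave share if every group turned out at the same rate."""
    return sum(g["population_share"] * g["leave_share"] for g in groups)


def turnout_weighted_share(groups):
    """Leave share among votes actually cast, weighting each group by turnout."""
    votes_cast = sum(g["population_share"] * g["turnout"] for g in groups)
    leave_votes = sum(
        g["population_share"] * g["turnout"] * g["leave_share"] for g in groups
    )
    return leave_votes / votes_cast


# Hypothetical groups (assumed values, not measured ones).
hypothetical_groups = [
    {"name": "younger", "population_share": 0.5, "turnout": 0.5, "leave_share": 0.40},
    {"name": "older", "population_share": 0.5, "turnout": 0.8, "leave_share": 0.60},
]

before = unweighted_share(hypothetical_groups)
after = turnout_weighted_share(hypothetical_groups)
print(f"Leave share ignoring turnout: {before:.1%}")
print(f"Leave share weighted by turnout: {after:.1%}")
```

With these made-up inputs, a population that's split evenly on paper tilts towards Leave once the higher-turnout group is counted more heavily. An education split with its own turnout gap would pull in the opposite direction, which is why the two effects can roughly cancel.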

So anyone saying that this Leave vote was ridiculous or absurd or totally unpredictable, or that the polls were completely wrong, is really just not thinking soundly. Quite frankly, that's just about as terrible a use of polling as anyone could make. So why did the betting markets have Remain at such high odds? I don't know, but if I had been given those kinds of odds before the referendum, I'd have taken them (I bought a straddle on a UK equity ETF expecting much higher volatility off things like this, on which I was correct).

US Polling More Reliable Than Foreign Countries:
Now, I'll discuss why American polling and American pollsters are simply more effective and better at accurate polling than those in other places. Much of this has to do with the American political structure, which is a decentralized federal structure. Across 50 states, we hold elections for the entire House and about a third of the Senate every two years, and Presidential elections every 4 years. We also have state polling, local polling, federal polling, and polling for both party primary elections and the general elections. Parliamentary systems, on the other hand, only have one election every few years, with referendums being even rarer.

In other words, the US political system allows for pollsters to do much more tinkering to adjust polls and find ways to limit error. If we take the UK as a simple example, they haven't had a referendum like this in a long time. They also had an election in 2015 with the previous elections in 2010, 2005, and 2001. In other words, they've got roughly one election every 5 years where pollsters have to figure out--as a country--how they're gonna account for these shifts. In order to correct their polls, if you were to go back 4 elections, that'd span an entire generation AND you'd only have 3-4 pieces of actual data against which to compare the polls. Given the errors created by a new generation of voters, adjustments based on data from 5, 10, 15, and 20 years ago are essentially useless because of shifts in demographics.

However, in the US, we've got 50 different states each with their own polling and their own demographics, and they hold 2-3 sets of elections every 2 years. We also have a much larger and more diverse country, which means we can use demographic factors like age, race, gender, income, etc. much more effectively to weight our polls in order to reduce our errors. One major advantage in terms of a general election for President of the United States is that we know where most states are gonna line up going into the November of a Presidential year. Since we have an electoral college format in our elections, we can go into almost all elections knowing what most of the results are gonna be. If the exit polling, regional, and demographic data match up with the polls, we can call our elections almost immediately. Parliamentary democracies that lack that level of size or diversity and hold elections between centralized parties simply do not have those advantages.

The US also has the advantage of having a "two-party system". So when we have a general election, there are usually only two candidates (and occasionally three) that have a realistic shot of winning. So we don't have to worry about 4 or 5 or 6 parties, each with a projected share that has its own margin of error, and how that margin of error affects everyone else's vote. When you have that kind of situation in a parliamentary democracy, a 3% error for each party could easily translate, within one standard deviation, into a 6-10% vote swing in one direction, which completely changes the composition of the parliament and the composition of the future ruling coalition. Another issue regarding parliamentary systems is that the political parties are centralized with no primary contests and the ruling coalition can often call elections at any time it so chooses. Therein lies another problem for polling in parliamentary democracies.
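One piece of this can be sketched with basic sampling math. When two party shares come from the same poll, they're negatively correlated (a respondent counted for one party can't be counted for the other), so the uncertainty on the gap between two parties is bigger than the uncertainty on either share. The snippet below assumes simple random sampling and uses invented shares and an invented sample size; it's an illustration, not a model of any real election.

```python
from math import sqrt


def share_standard_error(p, n):
    """Standard error of one party's share under simple random sampling."""
    return sqrt(p * (1.0 - p) / n)


def lead_standard_error(p1, p2, n):
    """Standard error of (p1 - p2) when both shares come from the same sample.

    Var(p1 - p2) = [p1(1 - p1) + p2(1 - p2) + 2 * p1 * p2] / n for a multinomial.
    """
    return sqrt((p1 * (1.0 - p1) + p2 * (1.0 - p2) + 2.0 * p1 * p2) / n)


# Hypothetical shares for two of several parties, and a hypothetical sample size.
party_a, party_b, sample_size = 0.30, 0.25, 1000

se_a = share_standard_error(party_a, sample_size)
se_b = share_standard_error(party_b, sample_size)
se_lead = lead_standard_error(party_a, party_b, sample_size)

print(f"standard error, party A share: {100 * se_a:.2f} points")
print(f"standard error, party B share: {100 * se_b:.2f} points")
print(f"standard error, A-minus-B lead: {100 * se_lead:.2f} points")
```

The more parties whose relative order matters for forming a coalition, the more of these pairwise gaps you have to get right at the same time.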

Good Polling and a Correct Use of Polling is an Art and not a Science:
In essence, conducting and using polls is not a science. Effectively using polling isn't about fancy number crunching or being crazy or anything like that. Accurately and effectively using polling is about the communication of the results and the implications of the methods of communication. What do I mean? I mean that anyone with half a brain can look at two numbers and see which one is larger. It takes much more than just being able to compare numbers to interpret polling well.

Interpreting polling properly is about how much knowledge we can get out of the information in a poll. For example, it's important to be able to spot outliers because any one poll can contain a sample not representative of the population as a whole. So obviously, being able to spot outliers is a necessity considering that all polls have a given confidence interval. If you don't spot outliers, your conclusions will be wrong. I don't know how they'll be wrong, but they will be wrong.
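As one possible way to spot outliers (there are plenty of others), here's a small sketch using the median and the median absolute deviation, which don't get dragged around by the outlier itself the way a mean and standard deviation do. The leads are invented numbers and the threshold of 3 is just a common rule of thumb that I'm assuming here.

```python
from statistics import median


def flag_outliers(values, threshold=3.0):
    """Return the values sitting more than `threshold` robust deviations from the median.

    The factor 1.4826 scales the median absolute deviation so that it is
    comparable to a standard deviation when the data are roughly normal.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # No spread to measure against, so nothing can be flagged.
    return [v for v in values if abs(v - center) / (1.4826 * mad) > threshold]


# Hypothetical leads in percentage points (invented data).
hypothetical_leads = [2, 1, 3, 2, 0, 1, 12, 2]
print(flag_outliers(hypothetical_leads))
```

A flagged poll isn't automatically wrong. It's just the one whose sample and methodology deserve a closer look before it gets any weight.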

Another aspect of polling is understanding and dealing with bias, which means being able to spot possible biases in various kinds of polling. For example, if the approval rating for the President is ~52% nationally while a poll has his approval rating at ~45%, we can probably say that the poll is biased along partisan lines. Similarly, if a poll has 40% of its sample registered with each party while actual registration is 40% for one party and 35% for the other, then we can probably say that the poll is biased.
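A back-of-the-envelope way to check how much a skewed party mix matters is to reweight the poll to the registration shares you believe are correct. In the sketch below, the 40/40 sample mix and the 40/35 target mix follow the example above; the 20% and 25% "other" shares and all of the approval numbers within each group are assumptions I've made up for illustration.

```python
def raw_estimate(sample):
    """Overall approval using the poll's own party mix.

    `sample` maps party -> (share of the sample, approval within that party).
    """
    total = sum(share for share, _ in sample.values())
    return sum(share / total * approval for share, approval in sample.values())


def reweighted_estimate(sample, target_shares):
    """Overall approval after reweighting each party to its target share."""
    total = sum(target_shares.values())
    return sum(
        target_shares[party] / total * approval
        for party, (_, approval) in sample.items()
    )


# Hypothetical poll: sample share and within-group approval (invented values).
hypothetical_sample = {
    "party_a": (0.40, 0.80),
    "party_b": (0.40, 0.10),
    "other": (0.20, 0.45),
}
# Assumed true registration mix.
assumed_registration = {"party_a": 0.40, "party_b": 0.35, "other": 0.25}

raw = raw_estimate(hypothetical_sample)
adjusted = reweighted_estimate(hypothetical_sample, assumed_registration)
print(f"approval as published: {raw:.1%}")
print(f"approval after reweighting: {adjusted:.1%}")
```

The adjustment is only as good as the registration shares you feed it, so it's a sanity check rather than a correction you should trust blindly.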

Other ways we can check the bias of polls are along demographic lines. For example, political alliances in a country as diverse as the US will always be driven along demographic lines. If a consistent percentage of some demographic has voted a certain way (like minority populations for Democrats) and they're underrepresented in a certain sample, then we can say the poll may be biased. Similarly, we can use basic demographic percentages and back-of-the-envelope calculations to check and see if the polls align.

So when we do look at polls, it's important to note all of the sub-categorizations. Another important factor when examining polls is how they deal with turnout factors. Some polls use registered voters while others use likely voters and some have both. People who lean a certain way but don't show up to vote aren't, by definition, part of the voting population. So it may be important to adjust polling estimates for turnout.

In other words, all of the listed aspects of every poll, which include voter alignment, demographics, turnout, etc., are essential in using polls properly. As I've said, there's no single correct or incorrect way of assessing whether polls are biased or of spotting outliers. As long as it'll work, it'll do just fine, but we must be careful in that process to make sure we're thinking and analyzing rigorously.

So how do we think about and analyze the data in front of us rigorously? The first key is to remain skeptical at all times and always try to find out when you're wrong. It becomes crucial to understand all of your assumptions and to systematically make sure that every logical step you take is valid and that the implicit assumptions behind each logical step hold. Another key is to understand how to deal with error effectively, which means error propagation across the entire system. The most important thing to understand is error dynamics and how the error shifts as your assumptions shift. These are the most important guidelines to using polls effectively.

Thursday, June 16, 2016

On Climate Change, Climate "Science", Forecasting Issues and Geopolitical Impacts

I suspect that in the 2016 Presidential Election, the topic of climate change will be an important issue between the two major Presidential candidates: Hillary Clinton and Donald Trump. Of course, Hillary Clinton views climate change as the most important issue we face today while Donald Trump has claimed that climate change is a hoax created by the Chinese government (I don't think he actually thinks it's a conspiracy, but I'm sure he does think it's a hoax).

So in this post, I'll discuss the issue of climate change, how it's changing things, how it could continue to change things, the possible errors in forecasting and modeling, and climate change vs global warming. Hence, this post will be split into 4 topics:
1. The Flaws of "Global Warming"
2. Forecasting Issues and Mathematical Modeling of the Impacts of Climate Change
3. Climate Change is More Sound than Global Warming
4. Security Risks and Geopolitical Impacts of Climate Change

1. The Flaws of "Global Warming":
First, I'll discuss the issue of "global warming" vs climate change and the specific differences, the scientific flaws in the arguments for "global warming", and the risks we face today.

The basic concept behind "global warming" is just to use the well-known greenhouse effect for CO2 emissions as a way to determine and predict future temperatures. My personal view of this theory is that it's bogus because it generally assumes away the complex factors that make up the environment and the second and third order effects involved here.

Another problem with the idea of "global warming" is that it's inherently not falsifiable and not repeatable in controlled environments, which means that such ideas are automatically not science by definition (science is a procedure involving the testing of hypotheses by experiments in fixed conditions). So the entire idea of "global warming" is kind of a joke.

A third problem is how we measure mean temperatures across the globe. Do we do it by area, or by latitude with every latitude counted the same? How do we weight the temperatures across the globe? Unfortunately, you rarely see the methodology of calculation discussed or critiqued in most discussions, which is a real problem.
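To show why the weighting question matters, here's a toy comparison. The temperatures are invented values for six latitude bands, not measurements, and I'm not claiming this is how any published temperature series is actually computed. The only real fact used is geometric: on a sphere, the area of an equal-width latitude band is roughly proportional to the cosine of its latitude, so bands near the poles cover far less area than bands near the equator.

```python
from math import cos, radians


def simple_mean(temps_by_latitude):
    """Average that counts every latitude band the same."""
    return sum(temps_by_latitude.values()) / len(temps_by_latitude)


def area_weighted_mean(temps_by_latitude):
    """Average that weights each band by cos(latitude), a proxy for its area."""
    weights = {lat: cos(radians(lat)) for lat in temps_by_latitude}
    total_weight = sum(weights.values())
    return (
        sum(weights[lat] * temp for lat, temp in temps_by_latitude.items())
        / total_weight
    )


# Hypothetical band-center latitudes (degrees) and temperatures (Celsius).
hypothetical_temps = {-75: -30.0, -45: 8.0, -15: 25.0, 15: 26.0, 45: 9.0, 75: -15.0}

print(f"every latitude counted the same: {simple_mean(hypothetical_temps):.2f} C")
print(f"weighted by area: {area_weighted_mean(hypothetical_temps):.2f} C")
```

Same made-up data, two noticeably different "global mean temperatures", which is exactly why the methodology deserves to be part of the discussion.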

2. Forecasting Issues and Mathematical Modeling of the Impacts of Climate Change:
Many of those who do support the idea of "global warming" often claim that they have forecasting models which can predict the future. Of course, even a half-decent look at these models would lead someone to realize that these "forecasting models" are really mathematical models built on certain assumptions that are trying to predict a very complex system we do not understand. These mathematical models are what're called "chaotic" in mathematics and have an entire field of mathematics devoted to their study--the field is called chaos theory. So what does "chaos" mean in mathematics? Chaos means that there's a deterministic model that's highly sensitive to small shifts. In other words, a minor shift in either the initial conditions or the inputs for the parameters could lead to a completely different outcome. This is better known as the "butterfly effect" where a butterfly's movement in Texas creates a tornado in Missouri. In other words, these models are highly sensitive to any kind of shift.

If these models are sensitive to very small shifts, then even something as small as measurement error in the estimation of parameters would make the models quite useless as precise "forecasting tools". So does that mean these models are useless? Of course not. These models may not be useful in predicting the numbers or the future precisely, but what they can do is give us a general view of how such a complex system can shift depending on the shifts in the parameters and initial conditions.
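The sensitivity described above is easy to see in a toy system. The sketch below uses the logistic map, a standard textbook example of chaos; it is NOT a climate model and I'm only using it as an assumption-free stand-in for "a deterministic model that's highly sensitive to small shifts". Two runs start one billionth apart and are tracked step by step.

```python
def logistic_map_path(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return the whole path, starting at x0."""
    path = [x0]
    for _ in range(steps):
        path.append(r * path[-1] * (1.0 - path[-1]))
    return path


# Two starting points that differ by one part in a billion.
baseline = logistic_map_path(0.2, 60)
perturbed = logistic_map_path(0.2 + 1e-9, 60)
gaps = [abs(a - b) for a, b in zip(baseline, perturbed)]

for step in (0, 5, 20, 40, 60):
    print(f"step {step:2d}: gap between runs = {gaps[step]:.3e}")
```

Early on the two runs are indistinguishable, and later on they have nothing to do with each other. Both runs still stay between 0 and 1 the whole time, though, which is the point about qualitative behavior: the exact numbers are unpredictable, but the overall range and character of the system are not.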

So the idea of "global warming", while there is some evidence backing it, is largely not reliable because of the dynamics of the models being used. However, the term climate change is a completely different story. Although the idea of "global warming" being scientific is nonsensical and impossible (as I stated before), climate change is a very real threat to our very existence. Why do I say this?

3. Climate Change Is More Sound Than Global Warming:
Well, the risk underlying climate change is that there are man-made actions and other things human beings have done in the past few centuries since the beginning of industrialization that have created drastic, large-scale shifts in the earth's climate. For example, carbon dioxide emissions have spiked in the past few hundred years, as have deforestation and a whole bunch of other human activities like extraction of fossil fuels and other minerals. Having these actions take place on a large scale would certainly have some impact on the environment even though such impacts would be unclear.

We clearly have data linking historical global temperatures to carbon dioxide in the atmosphere. We also have data linking certain other global weather shifts to some of the things humans have done going back several thousand years, especially local impacts of certain constructions or man-made activities creating future environmental shifts. So when we largely increase the scale of such activities while seeing an almost 30-40 fold increase in the total human population of the world with rapid industrialization concentrated in 200 years, we're bound to screw something up in the environment somewhere. And when we add in the nonlinear impacts of shifts in any dose or the sensitivity of the models we do have to the initial conditions or parameters, we're really setting ourselves up for a huge fall in the future.

In my view, we're clearly seeing global weather patterns shift significantly right now from man-made impacts of industrialization. Why do I say this? Because we're seeing shifting fruit cycles in certain years (like Indian mangoes this year) or shifting periods of monsoon winds and irregular weather across the entire globe. In other words, we're seeing sudden shifts in the volatility of climate patterns that we haven't seen on a global scale in at least about 10,000 years. In certain places, we're seeing consistently irregular weather that hasn't been common and isn't in line with the historical cycles embedded into their cultures or traditional calendars.

Of course, there's the counterargument by those who reject the idea of climate change who say that we can't know that it's man-made and that "the climate is always changing". Obviously, the climate and its weather patterns are always changing, but I'd also argue that we've seen human actions affect climate patterns in history before. For example, the climate patterns around Europe were definitely impacted by the actions taken by Rome including certain construction of infrastructure, deforestation, and other sorts of effects. Roman cities were known to have pollution and ash in the air, which is something that's rarely mentioned. All of those things over the course of hundreds of years will definitely have an impact on the climate. There was also a time when the Sahara desert was actually fertile and home to lots of human life. I'm of the view that it was certain human actions like exhausting certain water resources, various infrastructure that wasn't environmentally sound, and other factors of the sort which led to real problems.

We have clear scientific linkages between shifts in carbon dioxide levels and overall climate shifts in the earth's history. Does this imply causation? Of course it doesn't, but that's not the question at hand. The question at hand is what the potential costs are vs the potential benefits. So the tradeoff is: if we just keep doing what we're doing, we risk a serious catastrophe from extractive industries, emissions, future impacts on the environment via the costs of extraction or refinement of commodities and other natural resources, or other things of the kind. In other words, we have little to gain and a lot to lose.

More importantly, I've also discussed how economies that rely heavily on extractive industries can undergo serious problems. Obviously, environmental costs must clearly be a part of this discussion and there are plenty of good reasons which lead us to the conclusion that extractive industries are definitely related to various shifts in the earth's climate.

4. Security Risks and Geopolitical Impacts of Climate Change:
As I've stated above, the mathematical models involved in the observations of climate change aren't very accurate for forecasting, but they're very effective for determining various scenarios of the qualitative behaviors of the system. What we do know is that the models are highly sensitive to the initial conditions and the shifts in parameter values, but since the earth's climate is a complex system it probably has many of the same features. In other words, we're taking on a lot of risk for little gain.

My primary concerns about climate change have nothing to do with "scientific predictions" regarding sea level rise or a rise in mean temperatures. I'm much more concerned about shifts in the volatility of climate patterns. What do I mean? I mean shifting swaths of arable land that will create mass migrations. I'm much more concerned about shifting weather patterns like rainfall or wind patterns that could potentially devastate areas that haven't had too much rainfall with much more rainfall than expected while other areas that need rainfall don't meet their rainfall requirements. I'm concerned about how shifts in climate patterns affect trade routes and lines of supply across the globe.

Among the risks from climate change, the one that concerns me the most is how shifts in rainfall or wind patterns could affect bodies of water and the potential geopolitical impacts of such shifts. If that scenario plays out, we could be headed towards disaster.

The global security risks posed by climate change could result in large population corrections within the next 50 years. These population corrections could even be as large as 50-60% of the world's population. So we're talking about billions of lives at risk due to climate change. Other risks of climate change also come from rising sea levels where you have many coastal cities that could be at risk. Other factors that could change are viable ports or harbors and the locations of those ports or harbors.

There are many other possible risks and geopolitical impacts of climate change with the biggest factor being: WE DON'T ACTUALLY KNOW THE FULL IMPACT!!! All we know is that it's gonna screw something up somehow and the interconnectedness of our world makes the human species fragile because we've got little to gain and basically everything to lose.