Why Nate Silver's and 538's "model" isn't that good.
After a number of years of being wrong, Nate Silver is rolling out his so-called "model" again.
(Note: This will be Part 1 of a three-part series about three “models” in the political forecasting world. The next two will be about Allan Lichtman and Larry Sabato.)
In 2008, Nate Silver was all the rage, but only because nobody had ever brought any sort of statistical modeling of elections to the forefront for the average person to read and see. Yes, political scientists in the halls of the University of Michigan, Columbia University, and the University of Iowa had been building quite interesting forecasting models for years. However, the media-savvy Silver was able to draw attention to his model, making him, in effect, the "first" in the "forecasting" genre. He understood the power of blogging and rode that wave to fame. Unfortunately, tenured professors of great stature don't have a knack for media, nor do they need one.
Of course, since his 2008 model (which, honestly, predicted one of the most predictable elections since 1980), most of his models have been spotty at best, and flat-out wrong many times. For example, in 2018, FiveThirtyEight said that Democrat Andrew Gillum would win Florida's gubernatorial election, with some odds running as high as 3/4. However, Republican Ron DeSantis won the election. Admittedly, one model did get the results in Florida correct…mine. Yeah, I'll toot my own horn here.
But still, why is Nate Silver's model meh? And, possibly more importantly, why are people misinformed when it comes to his so-called "forecasting model"? Let's delve into it.
First of all, the biggest mislabeling is calling Silver's model a "forecasting" model. That's simply not true. In fact, it's a "nowcasting" model, something that I do as well. And since polling only gives us a snapshot of a certain point in time, Silver's model today only tells us what the results would be if the election were held today. Therefore, any of FiveThirtyEight's projections are essentially useless until Election Day and don't really predict anything, since the results change on a daily basis. To give credit, FiveThirtyEight does put this disclaimer on its website.
Of course, nowcasting models aren't bad. My highly accurate model of Florida is a nowcast. However, the data being used makes all the difference. In my model, I use voter turnout. With voter turnout, you can get an idea of certain trends, such as who has yet to vote. With polling data, you have no idea how the polls will move over the course of a month, a week, or even a day. Simply put, polls aren't a tool for predicting future behavior.
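To make the contrast concrete, here is a minimal sketch of what a turnout-based nowcast can look like. Every number in it, the ballot counts, the party loyalty rates, the split of no-party voters, is a hypothetical placeholder, not an input to my actual Florida model.

```python
# Minimal sketch of a turnout-based nowcast (hypothetical numbers throughout).
# Idea: combine returned-ballot counts by party registration with an assumed
# loyalty rate (how often registrants vote for their own party's candidate).

returned_ballots = {"DEM": 1_200_000, "REP": 1_150_000, "NPA": 600_000}  # hypothetical
loyalty = {"DEM": 0.90, "REP": 0.92}   # hypothetical loyalty rates
npa_dem_share = 0.48                   # hypothetical split of no-party voters

dem_votes = (returned_ballots["DEM"] * loyalty["DEM"]
             + returned_ballots["REP"] * (1 - loyalty["REP"])
             + returned_ballots["NPA"] * npa_dem_share)
rep_votes = sum(returned_ballots.values()) - dem_votes

total = dem_votes + rep_votes
print(f"Nowcast: DEM {dem_votes / total:.1%}, REP {rep_votes / total:.1%}")
```

The key point is that every input is an observed count of actual behavior (ballots already returned), not a survey response that may drift tomorrow.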
With that, let's look at what FiveThirtyEight uses: polls. Of course, there are a lot of problems with using polls as a predictive tool. First of all, how do you know that the polling firm is correct? Averaging the polls tries to mitigate this factor, but it's still a factor nonetheless. Simply averaging polls doesn't account for outliers that might skew the aggregate (such as the dreaded Rasmussen poll).
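To see why a plain average is fragile, here is a small sketch with made-up polling margins; a median or trimmed mean barely moves when one house-biased outlier shows up, while the simple mean gets dragged toward it.

```python
from statistics import mean, median

# Hypothetical polling margins (Democrat minus Republican, in points).
polls = [2.0, 1.5, 3.0, 2.5, -6.0]  # the -6.0 is a deliberate outlier

print(f"simple mean:  {mean(polls):+.1f}")     # dragged toward the outlier
print(f"median:       {median(polls):+.1f}")   # barely moves

# A trimmed mean drops the highest and lowest values before averaging.
trimmed = sorted(polls)[1:-1]
print(f"trimmed mean: {mean(trimmed):+.1f}")
```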
Another problem that FiveThirtyEight has (and possibly Silver's new model, but I don't know) is its pollster "rating," and how it grades polls from best to worst. According to their website, they account for accuracy, stating:
Accuracy, as measured by the average error and average bias of a pollster's polls. We quantify error by calculating how close a pollster's surveys land to actual election results, adjusting for how difficult each contest is to poll. Bias is error that accounts for whether a pollster systematically overestimates Republicans or Democrats.
Of course, the obvious problem here is that "accuracy" can only be tested on Election Day, not before. This means that a poll only needs to be "accurate" once, in a firm's last poll before Election Day. Any polls conducted before that can't be taken into account because there is no other data to compare them against (we don't have elections every day), which makes "accuracy" a moot point.
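For what it's worth, the two quantities in that description can be approximated with something like the sketch below: error as the absolute miss against the final result, and bias as the signed partisan miss. The polls and results are invented, and this is my reading of the quoted description, not FiveThirtyEight's actual code (which also adjusts for how hard each contest is to poll).

```python
# Rough sketch of "error" and "bias" for one pollster (hypothetical data).
# Each record: (poll margin, actual margin), both as Dem minus Rep in points.

races = [(+3.0, +1.0), (-2.0, +1.0), (+6.0, +2.5)]  # invented numbers

errors = [abs(poll - actual) for poll, actual in races]
biases = [poll - actual for poll, actual in races]   # positive = overestimated Dems

avg_error = sum(errors) / len(errors)
avg_bias = sum(biases) / len(biases)
print(f"average error: {avg_error:.1f} pts, average bias: {avg_bias:+.1f} pts "
      "(positive = leans toward Democrats)")
```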
Also, when do those polls need to be "accurate"? Let's say, in a hypothetical election, a Democratic candidate wins the general election by 2%. Two months before, in September, "Polling Firm A" conducted only one poll and nailed the eventual result with a 2% margin for the Democrat. However, at the same time (in September), all other polls had the Republican candidate winning by around 6%. Obviously, the political climate in September was vastly different from what it would be on Election Day. So is "Polling Firm A" considered accurate? We don't know, because the election was held two months later. Therefore, it's extremely hard to tell whether "Polling Firm A" is "accurate."
When it comes to forecasting models (which, as we already said, Silver's/538's isn't), Dr. Michael Lewis-Beck set out four criteria for what makes a good forecast: accuracy, lead, parsimony, and reproducibility. Silver/538 really doesn't meet any of these. So what are they?
Lead - How far in the future can an election outcome be predicted?
Accuracy - How accurate is the model?
Parsimony - How few variables are needed to make a prediction?
Reproducibility - Can the model be used from election to election?
Lead - We already mentioned how Silver’s model violates the “lead” rule, as it is a nowcast and not a forecast.
Accuracy - When it comes to accuracy, that can be debated. Since they use a win probability instead of picking an outright winner, FiveThirtyEight gives itself plenty of wiggle room to still claim victory, since it doesn't make absolute predictions except for the safest of seats. For example, in Florida, FiveThirtyEight said that Joe Biden had a 69-in-100 chance of winning the state. He didn't win the state. Still, because they aren't claiming an outright winner, they can fall back on the "we were close" excuse instead of the "we were wrong" reality.
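If you do want to hold a probabilistic call accountable, the standard tool is a proper scoring rule such as the Brier score. Here is a minimal sketch scoring that 69-in-100 Florida call against the actual outcome; to be fair, a single race can be scored this way, but a forecaster's skill only really shows up over many races.

```python
def brier(prob_dem_win: float, dem_won: bool) -> float:
    """Brier score for a single two-way race: lower is better, 0.25 = coin flip."""
    outcome = 1.0 if dem_won else 0.0
    return (prob_dem_win - outcome) ** 2

# FiveThirtyEight gave Biden a 69-in-100 chance in Florida; Trump won the state.
print(f"69% call that missed: {brier(0.69, False):.3f}")  # 0.476, worse than a coin flip
print(f"coin flip:            {brier(0.50, False):.3f}")  # 0.250
```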
Oh, by the way, my 2020 Florida nowcast nailed it.
Just a side note: the Silver model becomes even more chaotic when looking at other countries and projecting seats in a parliamentary system. Take Eric Grenier's 2015 model of the Canadian election. Grenier's poll tracker, based on Nate Silver's model, was extremely inaccurate. However, because he presumably used standard deviations to determine seat totals, he could claim a slight victory. For example, he said the Conservative Party of Canada would win 122 seats, with the low watermark being 99 seats and the high watermark being 139 seats. A 40-seat swing is 12% of the entire Canadian Parliament! Still, the Tories won 99 seats, so he was "technically" correct. As for the seat projections for the Liberals and the NDP, he wasn't even within the standard deviation margins.
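A quick way to see how forgiving those bands are is to check each one against the final seat counts. The Conservative band below comes straight from the numbers above; the Liberal and NDP bands are hypothetical placeholders, since I don't have Grenier's exact figures in front of me, while the actual seat totals are the real 2015 federal results.

```python
# Check whether actual 2015 seat totals fall inside projected ranges.
# CPC range is from the article; LPC and NDP ranges are hypothetical placeholders.
projections = {
    "CPC": (99, 122, 139),   # (low, central, high) per the numbers above
    "LPC": (100, 125, 150),  # hypothetical
    "NDP": (95, 115, 135),   # hypothetical
}
actual = {"CPC": 99, "LPC": 184, "NDP": 44}  # 2015 federal election results

for party, (low, mid, high) in projections.items():
    hit = low <= actual[party] <= high
    print(f"{party}: projected {mid} ({low} to {high}), actual {actual[party]}, "
          f"{'inside' if hit else 'outside'} the band")
```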
Parsimony - This is where 538 (sorry, I'm tired of typing out FiveThirtyEight all the time) really falls apart. The idea is basically that the fewer variables you have, the better the model is. Why? Because once you start adding more variables, you reach the point where you are just describing observations rather than predicting anything.
Let me give you an example. Let's say I'm thinking of a car and I give you hints. If I just tell you "I'm thinking of a car," that's too broad. However, if I say "the car was made in the 1970s," that narrows it down. If I further say "this car is no longer in production," that narrows it down more. If I then say "this car had major problems with its fuel tank," now you know it's probably a Ford Pinto, which it is. That last variable basically just confirmed the choice, and it also became a variable only applicable to this particular situation.
In the case of 538, they do the same thing. Their "economic conditions" variable contains a number of additional variables within it. Essentially, any time their model misses the mark, they just add more variables to make the model work correctly. Yes, a model can be tweaked, but it shouldn't go through complete overhauls.
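The statistical name for that habit is overfitting: give a model enough knobs and it will fit past elections almost perfectly while predicting the next one badly. A toy sketch with invented vote-share numbers:

```python
import numpy as np

# Invented "past election" data: x = election index, y = incumbent-party vote share.
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([52.0, 48.5, 51.0, 49.0, 53.0, 50.5])

# A simple model (a straight line) vs. a degree-5 polynomial (one knob per data point).
simple = np.polyfit(x, y, 1)
overfit = np.polyfit(x, y, 5)   # passes exactly through every past point

next_x = 6.0
print(f"line predicts:    {np.polyval(simple, next_x):.1f}%")   # stays near the data
print(f"overfit predicts: {np.polyval(overfit, next_x):.1f}%")  # wildly off once you extrapolate
```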
Also, when 538 adds more variables, why do they pick those specific ones? Many of their variables seem arbitrary, based on correlation rather than on causal factors. In forecasting, one assumes that a voter is or isn't voting a certain way because there are causal factors pushing voters that way. In my nowcast, my assumption is that if a voter is registered with a political party, they are likely to vote for that party in the general election. The 538 and Silver models have nothing regarding causality.
Reproducibility - I already touched on this when talking about parsimony. The more variables that get added election after election, the less reproducible the model becomes.
Speaking of the variables, and the 538/Silver models in general, the lack of causality makes them hard to swallow. Neither of these models says that X leads to vote choice. Again, they just pick variables that happen to go along with the result.
Why is this important? Well, let's say one of our variables is that Republicans win the general election every time there is a full moon on July 15th of the election year (which I just made up). That isn't causation, just correlation. The same can be said of the age of Miss America and the number of murders by steam, hot vapors, and hot objects (the well-known spurious-correlation chart). While 538 might not go to this extreme, they aren't far off. Their models really don't show anything regarding causation whatsoever.
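A toy illustration of the trap: two series that have nothing to do with each other will still correlate strongly if they both happen to trend over the same window, so a big correlation coefficient by itself tells you nothing about causation. The numbers below are invented.

```python
from statistics import correlation  # Python 3.10+

# Two invented series that merely trend upward over the same years.
miss_america_age = [22, 23, 23, 24, 25, 26, 27]      # hypothetical
steam_related_deaths = [40, 44, 43, 48, 52, 55, 60]  # hypothetical

r = correlation(miss_america_age, steam_related_deaths)
print(f"Pearson r = {r:.2f}")  # close to 1.0, yet neither causes the other
```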
Now, of course, you could say that if we are just looking at the odds of winning, causation shouldn't matter. Fair enough; I can see where that argument comes from if you look at it from a purely statistical standpoint. But since we are talking about human behavior, we have to take into account that external social, not scientific, factors influence vote choice, and thus influence the election's final result (which is what is being predicted). Human behavior isn't as easily measurable as, for example, the reaction of iron with certain other elements. In the latter case, you can use statistics alone to understand the results. You can't do that in behavioral science (which is why the theory sections of most stats-driven "political scientists" are usually pure garbage, but that's another story for another day).
Finally, and this is a problem with all models of elections in the United States: there are only two choices. All American models have to do is put their money on red or black. And with the exception of 2016, picking the winning side has been extremely easy since 1980. Coincidentally, that's when most predictive models came out. So I can't say this is a fault of Silver or 538, but when you have a 50/50 choice, and it's fairly certain that one of those choices has a higher likelihood of winning, then getting it right isn't really that impressive. That's why in my models, I try to predict major-party vote percentages, not just who wins or loses.
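To make that concrete, here is a small sketch contrasting the two grading standards: calling the winner is a pass/fail test that a coin flip passes half the time, while predicting the two-party vote share is graded on how far off you are. The forecast and result below are invented.

```python
# Hypothetical forecast vs. result for one state's two-party vote.
predicted_dem_share = 0.515   # invented forecast
actual_dem_share = 0.492      # invented result

# Standard 1: did we call the winner? (pass/fail, easy to get right half the time)
called_winner = (predicted_dem_share > 0.5) == (actual_dem_share > 0.5)

# Standard 2: how far off was the vote-share prediction?
abs_error = abs(predicted_dem_share - actual_dem_share)

print(f"winner call correct: {called_winner}")
print(f"vote-share error: {abs_error * 100:.1f} points")
```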
Next week, I’ll talk about why other so-called “predictive models” have their flaws.