Borrowing power from other states
Our previous working model was trained only on data from competitive states. Can we improve prediction accuracy by adding data from non-swing states?
There is no significant difference in RMSE whether or not we include non-swing states. Let’s see whether including them affects the prediction interval for 2024. We use the model whose training data start in 1992.
This tells us that we should stick with training on swing states if we want to predict the swing states!
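For readers who want to reproduce the RMSE comparison above, here is a minimal sketch of the metric. The vote shares are made-up numbers for illustration, not output from our models, and the names `pred_swing_only` and `pred_all_states` are hypothetical stand-ins for predictions from the two training sets.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up two-party vote shares (%) for four held-out swing-state results.
actual = [50.5, 48.0, 52.0, 49.5]

# Hypothetical predictions from a model trained on swing states only ...
pred_swing_only = [51.0, 47.5, 51.0, 50.0]
# ... and from a model trained on swing and non-swing states together.
pred_all_states = [50.0, 48.5, 52.5, 49.0]

rmse_swing_only = rmse(actual, pred_swing_only)
rmse_all_states = rmse(actual, pred_all_states)

print(f"swing only: {rmse_swing_only:.3f}")
print(f"all states: {rmse_all_states:.3f}")
```

A small gap between the two numbers on a handful of held-out elections is not strong evidence either way, which is why we also look at the prediction intervals.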
Quantifying uncertainty
Here we collect quantile predictions from the random forest and run a Monte Carlo simulation with 1,000 draws to show, jointly, how often each party wins in each state.
Harris wins in more than 50% of the draws in Virginia, Maine, Michigan, Minnesota, Nevada, New Hampshire and Wisconsin.
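One way to turn predicted quantiles into simulated outcomes is inverse-CDF sampling: draw a uniform random level and interpolate between the predicted quantiles. The sketch below assumes that approach, which may differ from the exact procedure behind our numbers. The states, quantile levels, and vote shares are placeholders, and the states are drawn independently here, which is a simplifying assumption.

```python
import random
from bisect import bisect_right

def interpolate_quantile(levels, values, u):
    """Linearly interpolate the predicted quantiles at probability level u."""
    # Index of the right end of the segment containing u, clamped to a valid segment.
    i = min(max(bisect_right(levels, u), 1), len(levels) - 1)
    weight = (u - levels[i - 1]) / (levels[i] - levels[i - 1])
    return values[i - 1] + weight * (values[i] - values[i - 1])

def simulate_win_share(levels, values, n_draws, rng):
    """Share of draws in which the simulated vote share exceeds 50%."""
    wins = 0
    for _ in range(n_draws):
        # Stay inside the predicted quantile range instead of extrapolating the tails.
        u = rng.uniform(levels[0], levels[-1])
        if interpolate_quantile(levels, values, u) > 50.0:
            wins += 1
    return wins / n_draws

# Placeholder quantile predictions of the Democratic two-party vote share (%).
levels = [0.05, 0.25, 0.50, 0.75, 0.95]
state_quantiles = {
    "State A": [47.0, 49.5, 51.0, 52.5, 55.0],
    "State B": [44.0, 46.5, 48.5, 50.5, 53.0],
}

rng = random.Random(0)  # fixed seed so the run is reproducible
win_share = {
    state: simulate_win_share(levels, values, 1000, rng)
    for state, values in state_quantiles.items()
}
print(win_share)
```

Drawing states independently ignores correlated polling errors across states; sharing the uniform level `u` across states within a draw would be the opposite extreme.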
How about the electoral college?
Harris wins 45% of the time. The reader might be curious how this can differ from the Harris win predicted last time. The reason is that the mean and the median are different objects: the previous presentation relied more on the mean, while this one relies more on the median. A candidate wins more than half of the draws only when the median simulated outcome is a win, whereas a prediction built on mean vote shares can favor her even when it is not.
I also want to note that this model gives Harris a more than 40% chance of winning Iowa, and that the win probabilities in the other non-swing states are not close to 90%. This differs from the prevailing judgement and the betting market, so we might have to tweak the model further to include state-level approval for Harris.
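A toy example may help show how a mean-based prediction and a win frequency below 50% can coexist. Every number below is invented for illustration: three hypothetical states, their electoral votes, a block of `safe_votes` treated as already decided, and five simulated draws of vote share.

```python
from statistics import mean, median

# Invented setup: three hypothetical contested states and a fixed base of safe votes.
electoral_votes = [10, 40, 6]
safe_votes = 249
votes_to_win = 270

# Five invented draws of the candidate's vote share (%) in the three states.
share_draws = [
    [49.0, 49.0, 51.0],
    [49.0, 48.0, 52.0],
    [51.0, 49.0, 49.0],
    [52.0, 53.0, 51.0],
    [55.0, 56.0, 54.0],
]

# Electoral vote total in each draw: safe votes plus every state won in that draw.
totals = [
    safe_votes + sum(ev for ev, share in zip(electoral_votes, draw) if share > 50.0)
    for draw in share_draws
]
win_share = sum(total >= votes_to_win for total in totals) / len(totals)

# Mean-based view: average each state's share first, then call each state once.
mean_shares = [mean(column) for column in zip(*share_draws)]
point_total = safe_votes + sum(
    ev for ev, share in zip(electoral_votes, mean_shares) if share > 50.0
)

print("totals per draw:", totals)
print("win share:", win_share)
print("mean total:", mean(totals), "median total:", median(totals))
print("total implied by mean shares:", point_total)
```

In this toy setup the mean vote share is above 50% in all three states, so a mean-based call awards the candidate every state, yet she reaches 270 in only two of the five draws because the median total falls short.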