Methodology

The following is a review of the methodology of my project.

While I probably use 95% of all polls released, I do institute some minimum standards to weed out polls that are likely to produce highly questionable results.  The minimum requirements are:

  1. A poll must be a phone-conducted poll
  2. A poll must have at least 400 respondents
  3. A poll must have been conducted in 2008

At this point, the final requirement doesn’t really matter: polls from before 2008 are now weighted so little that they are unlikely to affect the results. It did make my life easier when entering the polls, though.
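As a rough illustration, the three requirements above could be applied as a simple filter. This is only a sketch: the Poll record and its field names are hypothetical, the sample polls are invented, and it assumes "conducted in 2008" is judged by the last day of polling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Poll:
    # Hypothetical record layout, chosen for illustration only.
    state: str
    end_date: date      # last day of polling
    respondents: int
    by_phone: bool
    result: float       # candidate's percentage in this poll


def meets_minimum_standards(poll: Poll) -> bool:
    """Apply the three minimum requirements listed above."""
    return (
        poll.by_phone                      # 1. phone-conducted
        and poll.respondents >= 400        # 2. at least 400 respondents
        and poll.end_date.year == 2008     # 3. conducted in 2008 (assumed: by end date)
    )


# Invented sample polls, not real data.
polls = [
    Poll("OH", date(2008, 6, 18), 600, True, 48.0),
    Poll("OH", date(2008, 6, 10), 350, True, 51.0),    # too few respondents
    Poll("OH", date(2007, 12, 20), 800, True, 45.0),   # conducted before 2008
    Poll("OH", date(2008, 6, 1), 1000, False, 47.0),   # not a phone poll
]
usable = [p for p in polls if meets_minimum_standards(p)]
print(len(usable), "of", len(polls), "polls pass the filter")
```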

The second part of my methodology is the weighting of polls.  Polls are weighted based on how much older they are than the most recent poll conducted in that state.

This has two side effects.  The first is that if the most recent poll in a state was conducted on May 12th, then the “age” of every poll in that state is measured from May 12th, not from the current date. The second is that, since polls lose impact as they age, very old polls may count so little that they don’t actually affect the average, even though all polls are counted towards it.

The most recent poll in a state is given a weight of 1, meaning that it is counted at 100%. Each older poll is then added to the average based on its weight. Polls are roughly weighted as follows (based on THE math that I get into with more detail below):

  • The most recent poll is weighted at x1
  • A poll 3 days old is weighted at about x0.92
  • A poll 1 week old is weighted at about x0.67 (or 2/3 the weight of the most recent poll)
  • A poll 10 days old is weighted at about x0.5 (or 1/2 the weight)
  • A poll 2 weeks old is weighted at about x0.33 (or 1/3 the weight)
  • A poll 1 month old is weighted at about x0.1
  • A poll about 100 days old is weighted at about x0.01

I then divide the sum of the weighted poll totals by the sum of the weights and I get the poll average.

The date of a poll is determined by the last day that polling was done for the poll, not the day the poll is released. So if a poll is conducted from June 15th to June 18th, the date of that poll is set at June 18th. Obviously, if multiple polls finish polling on the same day, they have the same weight.
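The dating rule can be sketched in a few lines of Python. The function name and the sample dates below are made up for illustration; the only rules taken from the text are that a poll is dated by its last day of polling and that ages are counted back from the newest poll in the state.

```python
from datetime import date


def poll_ages(end_dates):
    """Return each poll's age in days, relative to the newest poll in the list."""
    newest = max(end_dates)
    return [(newest - d).days for d in end_dates]


# Invented end dates for one state. A poll run June 15-18 is dated June 18.
end_dates = [date(2008, 6, 18), date(2008, 6, 11), date(2008, 5, 12)]
ages = poll_ages(end_dates)
print(ages)  # [0, 7, 37]
```

Two polls with the same end date get the same age here, and therefore the same weight.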

If a poll has both a Registered Voter and a Likely Voter number, I use the Likely Voter number.

THE Math

While Karl Rove may think he has the math, I have the absolute real math.

The average of a candidate’s percentage in any given state is calculated thus:

Z = Σ(RpWp) / ΣWp

Or, in other words, I multiply the result of each Poll p (Rp) by the weight of that poll (Wp), add those products up across all the polls in the state, and divide by the sum of the weights of all the polls.
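For readers who prefer code to sigma notation, here is a minimal sketch of the same weighted average. The function name and the sample numbers are invented; the weights 1.0, 0.5 and 0.1 are the approximate weights the table above gives for polls 0, 10 and 30 days old.

```python
def weighted_average(results, weights):
    """Z = sum(Rp * Wp) / sum(Wp) for one candidate in one state."""
    if len(results) != len(weights):
        raise ValueError("results and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one poll must have a non-zero weight")
    return sum(r * w for r, w in zip(results, weights)) / total_weight


# Invented results (candidate's percentage in each poll), newest first.
results = [48.0, 45.0, 51.0]
weights = [1.0, 0.5, 0.1]
z = weighted_average(results, weights)
print(f"{z:.2f}")  # 47.25
```

Note how the month-old 51% barely moves the average: it is (48 + 22.5 + 5.1) / 1.6, which stays close to the two newer polls.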

How do I figure out the weight for any given poll?  Here’s how:

Wp = 1 / (((ΔD)² / 100) + 1)

Or, in other words, take the age of the poll, square it, divide by 100, and add 1. The weight is 1 divided by that number.
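The weight formula is easy to check against the rough weights listed earlier. The sketch below assumes the age is measured in days (which is what makes a 10-day-old poll come out at exactly half weight); the function name is made up.

```python
def poll_weight(age_days: float) -> float:
    """Wp = 1 / ((age^2 / 100) + 1), with the age given in days."""
    if age_days < 0:
        raise ValueError("a poll cannot be newer than the most recent poll")
    return 1.0 / ((age_days ** 2) / 100.0 + 1.0)


# The same ages as in the list of rough weights above.
for age in (0, 3, 7, 10, 14, 30, 100):
    print(f"{age:>3} days old: x{poll_weight(age):.2f}")
```

This prints x1.00, x0.92, x0.67, x0.50, x0.34, x0.10 and x0.01. The two-week figure rounds to 0.34 rather than 0.33, which is still "about 1/3".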

Don’t ask me why this equation works.  I mostly got it by trial and error, but it gives the effect of a gentler slope for the most recent polls, then a steeper slope as polls get older, then it levels out again when polls get really old.
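The shape described above can be checked numerically. The sketch below, with made-up names and age assumed to be in days, measures how much weight a poll loses from one day to the next and finds where that loss is largest.

```python
def poll_weight(age_days: float) -> float:
    """Wp = 1 / ((age^2 / 100) + 1), with the age given in days."""
    return 1.0 / ((age_days ** 2) / 100.0 + 1.0)


# Weight lost between day d and day d + 1, for the first 60 days.
drops = [poll_weight(d) - poll_weight(d + 1) for d in range(60)]
steepest_day = max(range(60), key=lambda d: drops[d])

print(f"day 0 to 1:   -{drops[0]:.4f}")
print(f"day {steepest_day} to {steepest_day + 1}:   -{drops[steepest_day]:.4f}  (steepest)")
print(f"day 59 to 60: -{drops[59]:.4f}")
```

The biggest one-day loss falls between days 5 and 6. That agrees with the calculus: the curve 1 / (x²/100 + 1) has its inflection point at x = 10/√3, a little under 6 days. Before that point the slope is still getting steeper, and after it the curve gradually flattens out toward zero.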

As for the ΔD, it’s simply the age of the poll in days, counted back from the most recent poll in that state.

Isn’t math fun?