Recent telephone polling in North Carolina has been so unbelievable that it’s hard to figure out how one can even get such results.
The past six telephone surveys - all of which had at least started by September 6th - have the following results:
- McCain +20
- McCain +4
- McCain +3
- McCain +17
- McCain +11
- McCain +1
When the first 3 polls came out and we had +4, +3, and +20, it was naturally assumed that the +20 poll was an outlier. Then we got the +17 poll, and we had two pairs of polls diametrically opposed to each other. Now, after today, we have two groups of three polls which are diametrically opposed to each other.
However, when one looks at it closely, one can actually see how we can get the results that we do (or at least make them more plausible). Here are those polls with the margin of error at 95% confidence applied to both candidates. Remember, the first number is Obama plus the MOE against McCain minus the MOE, and the second number is Obama minus the MOE against McCain plus the MOE, which is why a poll with a 4% MOE has a range 16 points wide:
- McCain +12.4% to McCain +27.6%
- Obama +3.8% to McCain +11.8%
- Obama +5% to McCain +11%
- McCain +9% to McCain +25%
- McCain +3% to McCain +19%
- Obama +6% to McCain +8%
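For anyone who wants to check the arithmetic, here is a minimal Python sketch of how those ranges come about. The per-candidate MOE values are not from the pollsters; they are back-computed from the ranges listed above (half the width of each range, divided by two), and the function names are just for illustration.

```python
# (McCain lead in points, per-candidate margin of error in points).
# The MOE values are assumptions, back-computed from the ranges in the list above.
polls = [(20, 3.8), (4, 3.9), (3, 4.0), (17, 4.0), (11, 4.0), (1, 3.5)]


def lead_range(lead, moe):
    """Return (low, high) for McCain's lead when the MOE is applied to both candidates.

    Moving one candidate up by the MOE and the other down by the MOE
    shifts the lead by twice the MOE in either direction.
    """
    return (round(lead - 2 * moe, 1), round(lead + 2 * moe, 1))


def label(value):
    """Format a signed McCain lead as 'McCain +x%' or 'Obama +x%'."""
    leader = "McCain" if value >= 0 else "Obama"
    return f"{leader} +{abs(value):g}%"


for lead, moe in polls:
    low, high = lead_range(lead, moe)
    print(f"- {label(low)} to {label(high)}")
```

Negative numbers here simply mean Obama is ahead, so a range of (-5, 11) reads as Obama +5% to McCain +11%.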
Taking these ranges, we get two pretty plausible ranges that the polls may fall into. Both ranges include 4 of the 6 polls.
The first range leaves the best poll for McCain and the worst poll for McCain out of the mix. If we do this, the four middle polls all overlap in a range between McCain +9% and McCain +11% within their 95% confidence levels. The two remaining polls fall outside this range by only 1% and 1.4%.
The second range takes the 4 most pro-Obama polls. In this case, the range would be between McCain +3% and McCain +8% within their 95% confidence levels. It's quite a bit larger as a range, and the two polls left out miss it by 1% and 4.4%.
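Both ranges are just the overlap of the four polls' intervals, and the "left out by" numbers are the gap between that overlap and the nearest edge of the excluded poll's interval. A rough sketch, using the ranges listed earlier (negative means Obama ahead; the helper names are my own):

```python
# 95% ranges for McCain's lead, keyed by each poll's headline McCain lead.
ranges = {
    20: (12.4, 27.6),
    4: (-3.8, 11.8),
    3: (-5.0, 11.0),
    17: (9.0, 25.0),
    11: (3.0, 19.0),
    1: (-6.0, 8.0),
}


def common_range(leads):
    """Overlap of the intervals for the given polls, or None if they do not all overlap."""
    low = max(ranges[lead][0] for lead in leads)
    high = min(ranges[lead][1] for lead in leads)
    return (low, high) if low <= high else None


def miss_distance(lead, window):
    """How far a poll's interval sits outside the window (0.0 if it touches or overlaps)."""
    low, high = ranges[lead]
    w_low, w_high = window
    if low > w_high:
        return round(low - w_high, 1)
    if high < w_low:
        return round(w_low - high, 1)
    return 0.0


middle_four = common_range([4, 3, 17, 11])     # drop the +20 and +1 polls
pro_obama_four = common_range([1, 3, 4, 11])   # drop the +17 and +20 polls

print(middle_four, miss_distance(20, middle_four), miss_distance(1, middle_four))
print(pro_obama_four, miss_distance(17, pro_obama_four), miss_distance(20, pro_obama_four))
```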
Obviously McCain would prefer the 11% number while Obama would prefer the 3% number, but I think the truth is somewhere in between, and the lead probably sits somewhere between 7% and 8% right now (my average has it at 7.9%). In any case, for those who are desperate to know where North Carolina stands, we can probably say it stands somewhere between McCain +3% and McCain +11%. That, of course, is the difference between being a Too Close To Call state and being a Strong McCain state, so needless to say the range is pretty useless.
Also, having at minimum 1/3 of the polls as outliers, no matter how you cut it, is still a bit much. And there is another point here: there are, in a way, two types of outliers. There are statistically significant outliers, and there are polls which aren't statistical outliers but are effectively outliers.
This is what I mean:
If you have 5 polls, all with a margin of error of 4%, all showing a race of between 2% and 4%, and then a poll comes out with someone 22% ahead with a margin of error of 4%, there is no way that the 22% poll and any of the 5 other polls can jibe while all of them stay within their 95% confidence levels. I would consider that a statistical outlier.
But let’s say a poll came out which showed a race of 15% with a margin of error of 4%. Given the exercise above, one could say “well, even the closest poll could possibly have a real margin of up to 10% while the 15% poll could be as low as 7%, so where’s the problem?” Of course, the issue here would be: what are the chances that you have 5 polls whose results are between 2% and 4% when the actual margin is between 7% and 10%? Sure, it’s possible, but it becomes quite a bit less likely. It would make more sense to say that the 15% poll is effectively an outlier, even if it isn’t statistically one.
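To put a rough number on "quite a bit less likely," here is a back-of-the-envelope sketch. It rests on assumptions of my own that are not part of the example above: each poll's error on the lead is normally distributed, the 95% MOE on the lead is double the per-candidate MOE (the same convention used for the ranges earlier), the five polls are independent, and the true lead is 7%, the most favorable value in the 7%-to-10% overlap for the close polls.

```python
from statistics import NormalDist

PER_CANDIDATE_MOE = 4.0               # points, from the hypothetical example
LEAD_MOE = 2 * PER_CANDIDATE_MOE      # assumption: MOE on the lead is double
LEAD_SD = LEAD_MOE / 1.96             # convert a 95% MOE to a standard deviation

TRUE_LEAD = 7.0                       # assumption: low end of the 7%-10% overlap
N_CLOSE_POLLS = 5

sampling = NormalDist(mu=TRUE_LEAD, sigma=LEAD_SD)

# Chance that a single poll reports a lead between 2% and 4%.
p_one = sampling.cdf(4.0) - sampling.cdf(2.0)

# Chance that all five independent polls land in that band.
p_all = p_one ** N_CLOSE_POLLS

print(f"one poll in the 2%-4% band: {p_one:.3f}")
print(f"all five in the band:       {p_all:.6f}")
```

Under these assumptions the chance of all five polls landing in the 2%-to-4% band is far below one in a thousand, and it only shrinks if the true lead is higher than 7%. That is the sense in which the 15% poll is effectively an outlier even though its interval technically overlaps the others.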
So where does North Carolina come into this? In my example above with the two ranges for the NC polls, we're kind of treading into the territory of the "effective outlier." The problem is that the 3 polls showing a race within 5%, the poll showing an 11% race, and the two polls showing a race at or near 20% are all effectively outliers of each other (not to mention that we're guaranteed at least 2 statistical outliers as well).
But unlike the example above with 5 close polls and a 15% poll, enough polls are in each camp that we just can’t know which group of polls consists of the outliers. Even though the 11% poll is by itself, it’s still taking up the middle ground between the two other groups.
My current score for North Carolina as of this writing - 7.9% - is at least 3.1% away from every one of the six polls. How often do you have an average of polls where the average is so far away from the results of any of the polls in the average? Yet that's what we have.

