We all learned a hard lesson on the night of November 8. Nearly every election poll had Hillary Clinton winning by a sizeable margin. The gold standard, Nate Silver's 538 website, which mashes hundreds of state-by-state polls into its algorithm, had her winning with 302 electoral votes, 32 more than needed. Its model projected only a 10.5% chance that Clinton would win the popular vote and lose the Electoral College. To make matters worse, the polls had already adjusted for likely voters, so turnout should hardly have affected their conclusions. Despite being so wrong this time (like everyone else), 538's model got every single state and DC right in 2008 and 2012. My question is very simple: if the most sophisticated and redundant survey systems available got a simple binary choice wrong, how can we trust the surveys we use in business, which are far more complicated, to decide where to invest media and marketing money and what the intended result will be? Maybe it's time for a little review of surveys.
We begin by understanding what is at stake when we depend on surveys to advise clients and take action on their behalf. We risk the money we invest, our company's reputation, our own credibility, client relationships and perhaps even our jobs. So we start with the measures of risk associated with surveys. Generally speaking, they are called confidence intervals or, conversely, margins of error. These measures assume the following about the samples they describe: the samples are perfectly random yet representative of the population, the questions are perfectly phrased and airtight, and the respondents are truthful. Nonetheless, some samples we use are overstuffed, overburdened, overblown, oversold, overused, under-sampled and imperfect.
Every survey we use has at least one of these flaws, yet we choose to ignore them. I understand why: it costs too much to solve many of these problems, and our surveys, in many cases, are the only game in town. This is not an indictment of research companies. It is, however, a hard look at how we use their product. That said, given what happened on November 8, I think it's time to take a closer look at our understanding of surveys.
Generally speaking, clients want surveys to reflect reality, agencies that understand the nuances of sampling are content with consistency, and the media just want big numbers. Maybe we should look more closely at margins of error. The factors that affect the reliability of a sample are the quality of the sample, the truthfulness of respondents, the market size, the demo size and the size of the category being considered.
Even if all these conditions were perfect, sampling error merely tells you that if you went out and re-sampled 100 times, you would get a fairly consistent result within the margin of error. With that in mind (error levels measure consistency, which only approximates reality), let's look at a few results. We will use a two-sigma level of assurance, which is another way of saying that if you sampled another 100 times, 95 times out of 100 you would get a result within the range given.
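For the statistically inclined, the two-sigma range used in the examples that follow is the textbook confidence interval for a sampled proportion. A minimal sketch, with the caveat that the sample size of 3,500 is my own illustrative assumption, not an actual Nielsen panel figure:

```python
import math

def two_sigma_interval(p, n):
    """Two-sigma (~95%) confidence interval for a sampled proportion.

    p: observed proportion (e.g. 0.15 for a 15 rating)
    n: sample size (assumed, for illustration only)
    """
    half_width = 2 * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = two_sigma_interval(0.15, 3500)
# with a hypothetical sample of 3,500, a 15 rating lands
# roughly between 13.8 and 16.2 -- the range quoted below
```

Note that the half-width shrinks only with the square root of the sample size, which is why small demos and small markets are punished so badly in the examples that follow.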
On broadcast network TV, a huge 15% rating against a big demo like adults 25-54 lands between 13.8% and 16.2%. That's a swing of ±1.2 points, or a variance of 8% on the original 15% rating (1.2 / 15.0 = 8%). Not bad.
However, a 3% rating against a smaller demo like women 18-24 varies from 2.3% to 3.7%. That's roughly a 23% variance on the 3% rating (±0.7 / 3.0 ≈ 23%).
On lower-rated cable stations the variance can be more than 75% of the original rating. In small TV markets, with low-rated shows, the variance can be as much as 99% of the original rating. Sure, these numbers improve with higher-rated programs, dayparts, stations and demos, but remember: the only thing reflected here is the statistical theory that the sample approximates reality in a perfect world.
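The relative-variance arithmetic behind these examples is simple division, and it shows why a small rating wears its error so heavily; a quick sketch using the figures from the paragraphs above:

```python
def relative_error(rating, half_width):
    """Half-width of the confidence interval as a share of the rating itself."""
    return half_width / rating

# The broadcast examples above: same-order error margins,
# wildly different relative variance.
big = relative_error(15.0, 1.2)   # 0.08 -> 8% on the big rating
small = relative_error(3.0, 0.7)  # ~0.233 -> roughly 23% on the small one
```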
Put another way, samples do not have the veracity of a complete census. You can do this error-range exercise with CPMs and CPPs. In a smaller market like Louisville, in late fringe, for example, a cost per point of $45 carries an error margin of $45.
Wow. Remember, we programmatically negotiate for pennies.
Okay, I know what critics will say: when you gross up an entire schedule, the confidence level increases and the ranges of error decrease. The same goes for reach and frequency. When I discussed this with the late Erwin Ephron (The Media Guru), he liked to say it was "akin to regressing to the mean." It's like loving a restaurant the first time you go, then revisiting only to find the experience not quite as good; it gets watered down. Large schedules do the same, reducing the extremes.
However, to make matters worse, start to think about the small numbers associated with micro-targeting and digital initiatives. Frustrating, isn’t it?
So Trump is President-Elect and we still use the same ratings systems. What should we do to avoid the trap of November 8 in the way we use samples?
There is no silver bullet on this subject. Research houses do a wonderful job of trying to reflect reality with the money made available to them by the people who use their service, and within the boundaries of sampling science. Now it is up to us not to be fooled by the whimsy of chance.
The opinions and points of view expressed in this article are exclusively the views of the author and/or subject(s) and do not necessarily represent the views of MediaVillage.com/MyersBizNet, Inc. management or associated bloggers.