There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact. — Mark Twain
Many a statistic is false on its face. It gets by only because the magic of numbers brings about a suspension of common sense. — Darrell Huff
Data is easily manipulated. That is to say, it can be made deceptive, misleading, or be a big fat lie. It happens everywhere someone wants to make a point fit their narrative or agenda. Politics and advertising are some of the biggest manipulators around.
The world of finance is no slouch either. It happens with earnings…well, adjusted earnings. When you absolutely, positively have to show a good quarter, just conveniently leave out a few expenses.
In How to Lie with Statistics, Darrell Huff takes a witty approach to all the ways people can deceptively turn raw data into biased “facts” and how to spot it:
The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify. Statistical methods and statistical terms are necessary in reporting the mass data of social and economic trends, business conditions, “opinion” polls, the census. But without writers who use the words with honesty and understanding and readers who know what they mean, the result can only be semantic nonsense.
Nothing is tossed around more in investing than average returns. Sometimes they're misused.
For example, I can say the stock market averaged a 10.4% return. I can claim it’s 14.4% too. Or I could say it’s 8.5%.
So which is true? They all are, to an extent.
Each one is an average, but only one is used correctly. That would be the last one, by the way.
The first — 10.4% — is the arithmetic average, which gets misused a lot. The second — 14.4% — was the median or middle of the set, which is pointless on its own. And the last — 8.5% — was the geometric average, which is the correct use because it accounts for compounding.
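The difference between the three averages is easy to see with a toy example. The returns below are made up for illustration (they are not the S&P 500 figures above):

```python
import math
import statistics

# Hypothetical annual returns, for illustration only
returns = [0.10, -0.05, 0.20]

# Arithmetic average: a simple mean, which ignores compounding
arithmetic = statistics.mean(returns)

# Median: the middle value, which says nothing about the series as a whole
median = statistics.median(returns)

# Geometric average: the constant annual rate that compounds to the same
# ending value — the right measure for investment returns
geometric = math.prod(1 + r for r in returns) ** (1 / len(returns)) - 1

print(f"arithmetic: {arithmetic:.2%}")  # 8.33%
print(f"median:     {median:.2%}")      # 10.00%
print(f"geometric:  {geometric:.2%}")   # 7.83%
```

The geometric average never exceeds the arithmetic one when returns vary, which is exactly why quoting the arithmetic figure flatters.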
Even then, my “average return” isn’t useful because I conveniently left out the time period and the source. For the record, it’s the total return over the last 10 years (2008 to 2017) for the S&P 500.
But if I really wanted to be deceitful, I could use the price return: 6.2%. I won't tell you it's the price return, though, because who needs dividends, right? Since I only want to be somewhat dishonest, that return is over the same 10 years for the S&P 500.
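The gap between those two figures compounds into real money. A quick sketch, using the 6.2% price return and 8.5% total return over the same 10 years (the $10,000 starting amount is just an assumption for illustration):

```python
# Grow a hypothetical $10,000 for 10 years at the price return (6.2%,
# no dividends) vs. the total return (8.5%, dividends reinvested).
principal = 10_000
years = 10

price_only = principal * 1.062 ** years       # ends around $18,250
with_dividends = principal * 1.085 ** years   # ends around $22,600

print(f"price return only: ${price_only:,.0f}")
print(f"with dividends:    ${with_dividends:,.0f}")
```

Leaving out dividends quietly shaves off more than $4,000 of the ending balance in this sketch.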
I could use a different index instead: 6.4% return. If I was being less tricky, I might tell you it’s the Dow over the same 10 years.
Or maybe an 8.6% return sounds better with the Russell 3000?
(If I really wanted to be underhanded, I’d probably use a factor index like the S&P 500 value or growth index — really, the equal-weight — and forget to tell you.)
I could make it more exact too: 8.4948498602287500% return. I mean, I want you to trust me, and nothing builds trust more than extra decimal places. Who's gonna believe a rough 8% return with that type of precision hanging around?
I might change the start or end dates and make it look worse: say a 7.2% return. Again, I won’t tell you that’s the S&P 500 over the last 20 years.
Or better: 9.9% return. (Might be the S&P 500 over 15 years, but won’t say for sure).
Or phenomenal: 19.5% return. But my scheming prevents me from saying it's the 1950s because why would I? It's 19%.
Or cumulative: 432.2% return. Because a cumulative figure exaggerates next to an annual one, and I want that sneaky extra boost without telling you the difference (to be sort of honest, that's the 1990s).
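To see how much bigger a cumulative number sounds, here's the conversion from that 432.2% cumulative figure back to a geometric annual rate, over an assumed 10-year span:

```python
# Convert a cumulative return to the equivalent annualized (geometric) rate.
cumulative = 4.322   # the 432.2% cumulative return, as a decimal
years = 10           # assumed holding period for the conversion

annualized = (1 + cumulative) ** (1 / years) - 1
print(f"annualized: {annualized:.1%}")  # roughly 18.2%
```

Annualized, the eye-popping 432.2% is the same performance as roughly 18% per year. Quoting one next to the other without labels is the whole trick.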
How about 10.2% for the S&P 500 from 1926 to 2017? That’s almost useful.
But don’t worry, I’ll leave out the fact that the S&P was only 90 stocks until 1956. The other 410 were added in ’57. I also refuse to mention how nothing else was uniformly consistent over that period either: not the tax code, trading costs, makeup of the index, breakdown across sectors, inflation, interest rates…it’s a long list. Don’t worry about it.
And I definitely promise that whatever return I use, it will imply consistency from year to year (there is no way I’m telling you how inconsistent returns are from one year to the next if I’m trying to pull one over on you). I mean, that’s all one number tells you anyway. And if you’re lucky, I might even extrapolate it into the future.
Okay, I won’t actually do that, but that’s how easy it is to manipulate average returns purposely or accidentally just by leaving out some important information. And average is just one of the ways data can be misleading.
The point Huff makes in his book is to avoid blind belief in a piece of data. Don’t write it off out of hand either. Instead, go in with skepticism to weed out the phony stuff from the useful statistics by asking five questions:
- Who Says So? — look for conscious and unconscious bias. What’s the agenda? Do they benefit from presenting data in that specific way? Does it tell the entire story or a partial story (only the good or bad part)? Does it misuse average? Does it deceptively hide behind an “authoritative” name to back the claim? Does any number of biases – recency, overconfidence, confirmation, self-serving, sunk cost, hindsight, groupthink, etc. — come into play?
- How Do They Know? — look for sampling bias. Where did the data come from? How did they get it? Is it a representative sample of the population? Can it be biased in any way? Is the sample big enough to make a reliable conclusion?
- What’s Missing? — look for missing information that would make it more usable and thus trustworthy. Does it define the “average” being used? Is the standard error included? Is it raw data or percentages or both? Out of how many? Is a comparison or causal factor needed but missing?
- Did Somebody Change The Subject? — look for a subject change between the raw data and the conclusion. Does the data show a rise in X, or is X just being covered more often? Does the title/article/paper relay the results, or is it a biased use of them? If it’s a census or survey, can it be backed up, disproved, or made more informed with secondary data or a little second-level thinking, and is it? Again with surveys, do those answering stand to benefit (or get hurt) by answering in a certain way? If it’s a comparison, is it apples to apples or apples to oranges? Is it a post hoc fallacy?
- Does It Make Sense? — look out for anything that fails the common sense test. Is it a complex, complicated thing reduced to a single number? Is the data presented in a range or as a precise figure? How many decimal places? Is it an extrapolation of a trend or a prediction of the future?