How to Lie with Statistics by Darrell Huff • Novel Investor

How to Lie with Statistics Buy the Book: Print | eBook

Darrell Huff’s book is about the long history of data deception. He explains the many ways data can be manipulated — to misrepresent facts, to tell a different story — in advertising, politics, and other areas and how to defend yourself from it.

The Notes

“Averages and relationships and trends and graphs are not always what they seem. There may be more in them than meets the eye, and there may be a good deal less. The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify. Statistical methods and statistical terms are necessary in reporting the mass data of social and economic trends, business conditions, “opinion” polls, the census. But without writers who use the words with honesty and understanding and readers who know what they mean, the result can only be semantic nonsense.”
Statistics can be used to deceive. Data can be consciously or unconsciously biased — by the statistician or the source.
A poor data sample is one of the biggest sources of error. A sample is a small subset meant to represent a bigger statistical population.
Huff uses an example of a barrel of red and green beans. Rather than count every bean to see how many red beans there are, you grab a handful thinking the red/green ratio in your hand is similar to what’s in the barrel. The sample needs to be large enough and selected carefully enough or it won’t represent the whole barrel closely enough.
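The barrel idea can be tried out with a quick simulation. This is a rough sketch, not anything from the book: the barrel size (10,000 beans), the 30% red share, the handful sizes, and the function name are all made up for illustration.

```python
import random

def handful_red_shares(barrel, handful_size, draws, rng):
    """Share of red beans seen in each of `draws` random handfuls."""
    shares = []
    for _ in range(draws):
        handful = rng.sample(barrel, handful_size)  # every bean equally likely
        shares.append(handful.count("red") / handful_size)
    return shares

# Hypothetical barrel: 10,000 beans, 30% of them red.
barrel = ["red"] * 3000 + ["green"] * 7000
rng = random.Random(42)  # fixed seed so the run is repeatable

small = handful_red_shares(barrel, 10, 200, rng)
large = handful_red_shares(barrel, 1000, 200, rng)

# How far apart the lowest and highest estimates are for each handful size.
small_spread = max(small) - min(small)
large_spread = max(large) - min(large)

print(f"10-bean handfuls:    {min(small):.2f} to {max(small):.2f}")
print(f"1,000-bean handfuls: {min(large):.2f} to {max(large):.2f}")
```

The small handfuls should land all over the place, while the large ones should cluster near the true 30%. Note that this only models a fair grab; a handful taken only from the top of an unmixed barrel would be biased no matter how big it is.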
A biased sample leads to conclusions no better than an educated guess, yet could still be seen as fact.
Sources lie: polls and surveys come with an additional risk of bias from the source. The person being asked the questions can exaggerate their answers to make themselves look better or give the more pleasing answer – the answer they think the surveyor wants to hear.
Being skeptical and giving some extra thought to how the sample might be biased is the best deterrent.
“The test of the random sample is this: Does every name or thing in the whole group have an equal chance to be in the sample?”
Assuming the sample passes all the tests, it’s best not to view the results as perfectly accurate.
“Average” has multiple meanings that can be used deceptively. There’s the mean, the median, and the mode. In investing, there is the geometric average, which is different from the arithmetic average. Taken from the same series of numbers, each “average” can be different, so whichever one best suits the purpose can be picked.
In a normal distribution like the height of a population, you get a bell curve, with the mean, median, and mode falling close together. A skewed distribution, like income for a population, leans to the left or the right, producing an obviously different mean, median, and mode.
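A small worked example shows how far apart the “averages” can land. The income figures and the two-year return series below are invented for illustration; only the definitions of the averages are standard.

```python
import statistics

# Hypothetical incomes (in $ thousands): most are modest, one is very high.
incomes = [30, 30, 35, 40, 45, 50, 60, 500]

mean_income = statistics.mean(incomes)      # pulled up by the one big earner
median_income = statistics.median(incomes)  # middle of the sorted list
mode_income = statistics.mode(incomes)      # most common value

print(mean_income, median_income, mode_income)

# Hypothetical investment returns: +50% one year, -50% the next.
returns = [0.50, -0.50]
arithmetic_avg = statistics.mean(returns)

# The geometric average works on growth factors (1 + return).
growth_factors = [1 + r for r in returns]
geometric_avg = statistics.geometric_mean(growth_factors) - 1

print(f"arithmetic: {arithmetic_avg:.1%}, geometric: {geometric_avg:.1%}")
```

With these numbers the mean income is 98.75, the median is 42.5, and the mode is 30, so “the average income” could honestly be quoted as any of the three. The arithmetic average return is 0%, while the geometric average is about -13.4% a year, which is the one that matches what actually happened to the money.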
Ask yourself: “Average of what? Who’s included?”
A small sample size is easily manipulated and often hidden in plain sight (a reason to read the small print).
Chance (luck, dumb luck) can have a big impact on the results when the sample is too small. Flipping a coin should produce heads 50% of the time. Now flip a real coin ten times: an 80/20 split can easily turn up instead of 50/50. Chance will have a bigger impact on 10 flips than on 1,000 flips.
A large enough number of “flips,” or trials, is needed for the probability to be useful.
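To put numbers on the coin example, the odds can be computed exactly for a fair coin. This is a minimal sketch; the function name and the 80% cutoff are just choices for illustration.

```python
from math import comb

def prob_at_least(heads, flips):
    """Exact chance of `heads` or more heads in `flips` tosses of a fair coin."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips

p_exactly_5_of_10 = comb(10, 5) / 2 ** 10     # the "expected" 50/50 split
p_8_plus_of_10 = prob_at_least(8, 10)         # 80% heads or more in 10 flips
p_800_plus_of_1000 = prob_at_least(800, 1000) # 80% heads or more in 1,000 flips

print(f"exactly 5 of 10:     {p_exactly_5_of_10:.3f}")
print(f"8 or more of 10:     {p_8_plus_of_10:.3f}")
print(f"800 or more of 1000: {p_800_plus_of_1000:.3e}")
```

Even a perfect 50/50 split shows up in only about a quarter of ten-flip runs, and 80% heads or more turns up about 5% of the time. Over 1,000 flips, the same 80% result is so unlikely that it would point to a rigged coin rather than luck.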
“If the source of your information gives you also the degree of significance, you’ll have a better idea of where you stand. This degree of significance is most simply expressed as a probability… For most purposes, nothing poorer than this five percent level of significance is good enough. For some, the demanded level is one percent, which means that there are ninety-nine chances out of a hundred that an apparent difference, or whatnot, is real. Anything this likely is sometimes described as “practically certain.””
Standard deviation, a measure of how widely the values spread around the average, is often purposely left out to oversimplify the average or give it the appearance of precision. In that case, the average is useless.
“Knowing nothing about a subject is frequently healthier than knowing what is not so, and a little learning may be a dangerous thing.”
Standard error measures how accurately a data sample represents a population. It’s presented as “± X”.
The point of standard errors is to start thinking of results as being in a range, even if the standard error isn’t stated. The big mistake is believing results are precise when they really represent a range of probabilities.
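One way to build the habit of thinking in ranges is to work out the “± X” yourself. The sketch below uses the textbook standard error for a proportion from a simple random sample, which is an assumption on our part and not a formula from the book; the 55% survey result and the sample sizes are invented.

```python
from math import sqrt

def proportion_range(share, sample_size):
    """Return (low, high, standard_error) for a surveyed proportion.

    Assumes a simple random sample; the range is one standard error
    on either side of the reported share.
    """
    standard_error = sqrt(share * (1 - share) / sample_size)
    return share - standard_error, share + standard_error, standard_error

# Hypothetical survey: 55% say "yes".
low_100, high_100, se_100 = proportion_range(0.55, 100)
low_10k, high_10k, se_10k = proportion_range(0.55, 10_000)

print(f"n=100:    55% ± {se_100:.1%}  ({low_100:.1%} to {high_100:.1%})")
print(f"n=10,000: 55% ± {se_10k:.1%}  ({low_10k:.1%} to {high_10k:.1%})")
```

With 100 respondents the result is roughly 55% ± 5%, so a “majority” could really be a tie. It takes 100 times as many respondents to shrink that range by a factor of ten.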
“There is terror in numbers. Humpty Dumpty’s confidence in telling Alice that he was master of the words he used would not be extended by many people to numbers. Perhaps we suffer from a trauma induced by grade-school arithmetic.”
Charts and graphs are a convenient way to use data to exaggerate, deceive, or lie. Also known as chart crimes.
Ignore all charts and graphs without numbers or measures on the X or Y axis.
Watch out for line graphs that change the proportion between the X and Y axes to steepen or flatten the slope of the line/curve.
Beware of pictographs and bar charts that change width, height, and length but represent a single factor. They can give an exaggerated visual impression of the comparison.
“If you can’t prove what you want to prove, demonstrate something else and pretend that they are the same thing. In the daze that follows the collision of statistics with the human mind, hardly anybody will notice the difference. The semiattached figure is a device guaranteed to stand you in good stead. It always has.”
A semi-attached figure is using one thing as a way to claim proof of something else, even though there’s no correlation between the two. For example, 32% of doctors think Mercedes vehicles are safe. It’s meaningless. “The only answer to a figure so irrelevant is “So what?””
“If you’d like to go on a hunt for semiattached figures, you might try running through corporation financial statements. Watch for profits that might look too big and so are concealed under another name… The truth is, what the company reports as profits is only a half or a third of the profits. The part that isn’t reported is hidden in depreciation, and special depreciation, and in reserves for contingencies… There are often many ways of expressing any figure. You can, for instance, express exactly the same fact by calling it a one per cent return on sales, a fifteen per cent return on investment, a ten-million-dollar profit, an increase in profits of forty per cent (compared with 1935-39 average), or a decrease of sixty per cent from last year. The method is to choose the one that sounds best for the purpose at hand and trust that few who read it will recognize how imperfectly it reflects the situation.”
Post hoc fallacy — “If B follows A, then A has caused B.” For example: if data shows that runners have a higher average income, it would be wrong to assume that running will increase your income. Yet, people come to that conclusion. In every case, there are many possible explanations why (like chance) and you can’t just pick the one that fits your opinion.
“Given a small sample, you are likely to find some substantial correlation between any pair of characteristics or events that you can think of.”
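Huff’s point about small samples can be checked with a simulation. This is a sketch under our own assumptions: two lists of completely unrelated random numbers, a cutoff of 0.5 for calling a correlation “substantial,” and sample sizes of 5 and 200. None of those choices come from the book.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def share_of_strong_correlations(sample_size, trials, rng, threshold=0.5):
    """Fraction of trials where unrelated data still correlates strongly."""
    strong = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]  # independent of xs
        if abs(pearson_r(xs, ys)) >= threshold:
            strong += 1
    return strong / trials

rng = random.Random(7)  # fixed seed so the run is repeatable
small_share = share_of_strong_correlations(5, 2000, rng)
large_share = share_of_strong_correlations(200, 2000, rng)

print(f"samples of 5:   {small_share:.0%} look strongly correlated")
print(f"samples of 200: {large_share:.0%} look strongly correlated")
```

The two lists have nothing to do with each other, yet with only five data points a large share of the trials should still clear the cutoff. With 200 points, practically none should.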
Other errors in conclusions: confusing cause and effect (correlation and causation), assuming the data extends beyond what limits allow (more of one thing equates to more of another).
Be ever watchful for the bag of tricks used to intentionally (or accidentally) distort statistical claims.
A decimal point implies precision where it rarely exists. Percentages are no different. And when the two are combined (as in finance, taking percentages to the nearest hundredth), it gets silly.
Poor math using percentages — adding, subtracting, multiplying, dividing — should be added to the watch list.
For example, combining sales discounts of 50% and 20% is not 70% off but 60% off. Another example: if your portfolio takes a 50% loss followed by a 50% gain, you’re not back to even. You’re only halfway back to even. A 100% gain is needed to break even after a 50% loss.
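The arithmetic behind both examples is easy to verify. The helper names below are ours, and the $100 starting portfolio is just a convenient round figure.

```python
def stack_discounts(*discounts):
    """Total discount when each discount applies to the already-reduced price."""
    remaining = 1.0
    for discount in discounts:
        remaining *= 1 - discount
    return 1 - remaining

def gain_to_break_even(loss):
    """Gain needed to get back to the starting value after a given loss."""
    return 1 / (1 - loss) - 1

combined = stack_discounts(0.50, 0.20)

# Hypothetical $100 portfolio: a 50% loss, then a 50% gain.
after_loss_and_gain = 100 * (1 - 0.50) * (1 + 0.50)
needed_gain = gain_to_break_even(0.50)

print(f"50% off then 20% off = {combined:.0%} off")
print(f"$100 after -50% then +50% = ${after_loss_and_gain:.0f}")
print(f"gain needed after a 50% loss = {needed_gain:.0%}")
```

Percentages multiply, they don’t add: the second discount applies to the already-reduced price, and the 50% gain applies to the smaller $50 balance, which is why it only gets you back to $75.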
“It is sometimes a substantial service simply to point out that a subject in controversy is not as open-and-shut as it has been made to seem.”
“The fact is that, despite its mathematical base, statistics is as much an art as it is a science. A great many manipulations and even distortions are possible within the bounds of propriety. Often the statistician must choose among methods, a subjective process, and find the one that he will use to represent the facts. In commercial practice, he is about as unlikely to select an unfavorable method as a copywriter is to call his sponsor’s product flimsy and cheap when he might as well say light and economical.”
The point is to avoid blind belief in a piece of data. Don’t write it off out of hand. Instead, go in with skepticism to weed out the phony data from the useful data by asking 5 questions:
1. Who Says So? — look for conscious and unconscious bias. What’s the agenda? Do they benefit from presenting data in that specific way? Does it tell the entire story or a partial story (only the good or bad part)? Does it misuse average? Does it deceptively hide behind an “authoritative” name to back the claim? Does any number of biases – recency, overconfidence, confirmation, self-serving, sunk cost, hindsight, groupthink, etc. — come into play?
2. How Do They Know? — look for sampling bias. Where did the data come from? How did they get it? Is it a representative sample of the population? Can it be biased in any way? Is the sample big enough to make a reliable conclusion?
3. What’s Missing? — look for missing information that would make it more usable, thus trustworthy. Does it define the “average” being used? Is the standard error included? Is it raw data or percentages or both? Out of how many? Is a comparison or causal factor needed but missing?
4. Did Somebody Change The Subject? — look for a subject change between the raw data and conclusion. Does the data show a rise in X or is it just being covered more often? Does the title/article/paper relay the results or make biased use of them? If it’s a census or survey, can it be backed up, disproved, or made more informed with secondary data or a little second-level thinking, and has that been done? Again with surveys, do those answering stand to benefit (or get hurt) by answering in a certain way? If it’s a comparison, is it apples to apples or apples to oranges? Is it a post hoc fallacy?
5. Does It Make Sense? — look out for anything that fails the common sense test. Is it a complex, complicated thing reduced to a single number? Is the data presented in a range or as a precise figure? How many decimal points? Is it an extrapolation of a trend or a prediction of the future?
“Many a statistic is false on its face. It gets by only because the magic of numbers brings about a suspension of common sense.”
“Extrapolations are useful, particularly in that form of soothsaying called forecasting trends. But in looking at the figures or the charts made from them, it is necessary to remember one thing constantly: The trend-to-now may be a fact, but the future trend represents no more than an educated guess. Implicit in it is “everything else being equal” and “present trends continuing.” And somehow everything else refuses to remain equal, else life would be dull indeed.”
Mark Twain on the nonsense of extrapolation: “In the space of one hundred and seventy-six years the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oolitic Silurian Period, just a million years ago next November, the Lower Mississippi River was upward of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-rod. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo and New Orleans will have joined their streets together, and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.” — from Life on the Mississippi
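Twain’s deliberately absurd arithmetic holds up, which is the joke: the math is fine and the assumption that the trend continues is not. A quick check, using only the figures in the passage (the variable names are ours):

```python
miles_lost = 242   # shortening of the Lower Mississippi, per Twain
years = 176        # over this many years

miles_per_year = miles_lost / years  # "a trifle over one mile and a third"

# Extrapolating backward one million years at the same rate.
extra_miles_long_ago = miles_per_year * 1_000_000

# Extrapolating forward 742 years at the same rate.
miles_lost_in_future = miles_per_year * 742

print(f"rate: {miles_per_year} miles per year")
print(f"extra length a million years ago: {extra_miles_long_ago:,.0f} miles")
print(f"shrinkage over the next 742 years: {miles_lost_in_future:,.2f} miles")
```

The rate works out to 1.375 miles a year, which gives the extra 1,375,000 miles of river a million years back and about 1,020 miles of shrinkage over the next 742 years.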
Disraeli: “There are three kinds of lies: lies, damn lies, and statistics.”
Artemus Ward: “It ain’t so much the things we don’t know that get us in trouble. It’s the things we know that ain’t so.”
Samuel Johnson: “Round numbers are always false.”
