Statistical Pet Peeves

I am fairly well known around my office for my skepticism (some would say cynicism) about “fun facts”. I usually need to see the figurative receipt before I believe one. When it comes to more complicated topics, I’m going to want to see the study.

Not all studies are created equal, however, and there are a million ways to use results to mislead the target audience (often journalists or the public).

There are three kinds of lies: lies, damned lies, and statistics.

Benjamin Disraeli

There are two things I do when I spot statistical abuse. The first is to close my browser tab, shake my head briefly for having been tricked into wasting a click, and forget it ever happened. The second, if the subject matter is interesting enough, is to look into the original study and see what the authors actually said. If the authors themselves are the ones making things purposefully unclear, I usually just assume the study is biased too, and don’t update my Bayesian priors at all.

Without further ado, here are two fast ways to make me think the author is a clown in the best case, or purposefully misleading in the worst case.

1.) Stating a change of a percentage as a percentage:

For example, reporting a change in the income tax rate from 10% to 11% like this:

Income taxes are slated to rise by 10% starting in 2017.

The average reader is going to have absolutely no clue what this means, and is likely to conclude that taxes (which they probably know are already close to 10%) are going to double to around 20%.

This sort of thing happens all the time and is most often seen in headlines, which makes it one of my unforgivable sins. A faster way onto my ‘ignore list’ does not exist. That is why my adblocker also blocks the entire news section of Yahoo Finance.

Now, there is a perfectly acceptable way to report a percent change in a percentage. Using the same example as above, it looks like:

Income taxes are slated to rise from 10% to 11% starting in 2017, a 10% increase.

It’s so easy to make things clear that I can’t help but have a harsh interpretation when it’s not done.
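To make the arithmetic behind the 10%-to-11% example concrete, here is a minimal Python sketch. The function name describe_rate_change is made up for illustration; the only numbers used are the ones from the example above.

```python
def describe_rate_change(old_rate, new_rate):
    """Return (change in percentage points, relative change in percent).

    Hypothetical helper for illustration; rates are given in percent,
    e.g. 10.0 for a 10% tax rate.
    """
    point_change = new_rate - old_rate
    relative_change = (new_rate - old_rate) / old_rate * 100
    return point_change, relative_change


# The example from the text: the tax rate goes from 10% to 11%.
points, relative = describe_rate_change(10.0, 11.0)
print(f"Rate rises by {points:.1f} percentage point(s), a {relative:.1f}% increase.")

# What a reader may wrongly infer from "taxes rise by 10%":
# that 10 percentage points are being added to the current rate.
misread_rate = 10.0 + 10.0
print(f"Misreading: new rate of {misread_rate:.0f}% instead of 11%.")
```

The same change is one percentage point or a 10% relative increase, which is why stating the before and after values removes the ambiguity.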

2.) Dual Y-axis Graphs

What is a dual Y-axis graph and why do I have beef?

Here’s the first sample I found on Google.

Note the two y-axes. This is the hallmark of a two y-axis graph.

So what is the problem? The problem is that you can make a two y-axis graph “say” virtually anything you want. The relative values of units are completely out the window (and don’t get me started if the units are different). The only thing you can’t abuse is the direction of relationship from start to finish, assuming the data is a time series (meaning the x-axis represents dates/times).
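As a rough illustration of that point, the sketch below computes how steep a series looks on the page for different choices of axis limits. The data and the helper names (plotted_height, apparent_rise) are invented for this example, and no plotting library is involved; the "height" is just the fraction of the plot area between the bottom and top of the axis.

```python
def plotted_height(value, axis_min, axis_max):
    """Fraction of the plot height at which a value is drawn (0 = bottom, 1 = top)."""
    return (value - axis_min) / (axis_max - axis_min)


def apparent_rise(series, axis_min, axis_max):
    """How far the line visually climbs from its first to its last point."""
    start = plotted_height(series[0], axis_min, axis_max)
    end = plotted_height(series[-1], axis_min, axis_max)
    return end - start


# Made-up time series, purely for illustration.
series_a = [100, 102, 105, 110]      # drawn against the left y-axis
series_b = [50.0, 50.5, 51.0, 52.0]  # drawn against the right y-axis

# Left axis runs from 0 to 200.
rise_a = apparent_rise(series_a, 0, 200)

# Same series_b, two different right-axis choices.
rise_b_from_zero = apparent_rise(series_b, 0, 100)  # axis starts at zero
rise_b_zoomed = apparent_rise(series_b, 49, 53)     # axis cropped around the data

print(f"A climbs {rise_a:.0%} of the plot height")
print(f"B climbs {rise_b_from_zero:.0%} with a zero-based axis")
print(f"B climbs {rise_b_zoomed:.0%} with a cropped axis")
```

With the zero-based axis, series B looks flatter than series A; with the cropped axis, the identical data looks far steeper. Only the sign of the change stays the same in both cases.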

If a graph has two y-axes and doesn’t have zero showing at the bottom of both axes, it’s a clear sign that whoever made the graph has an agenda.

Now, sometimes you want to make as strong a case as possible. If you have a time series (which, in my opinion, is the most proper time to use a two-axis graph, if one exists) rather than a bar chart comparing discrete variables or something, and the point of the graph is to reinforce the relationship within the series (upticks and downticks on the same days, for instance), then a two-axis graph is a powerful visual tool.

For me, this means the person publishing the graph needs to have already made the point they are trying to prove clear (rather than using the graph as the smoking gun), and needs to have an already unimpeachable standing as far as statistical integrity goes. Needless to say, there aren’t many people whose graphs pass this test.

Dual-axis graphs are a staple of finance presentations (investment bankers, sell-side firms, etc.) and of publications with agendas, and it takes too much time to unravel the actual relationships in the data for them to be worth much.
