Spurious correlations

[Image: linear correlation coefficient]

I don’t know if you’ve ever had to suffer through a statistics class. Perhaps you’ve taken a statistics class and didn’t consider it suffering at all. I have a friend who majored in statistics, on purpose (Hey, Claire!). Stranger things have happened.

In college I suffered through some statistics, and I don’t remember a lot, but I can at least recall the difference between correlation (the relationship between things that happen or change together) and causation (the act or process of causing something to happen or exist–thank you, Merriam-Webster).

When two things are correlated, it’s very tempting to conclude that one of them must be causing the other; hence the need for the old statistics refrain, “Correlation does not imply causation.”

You can be told this over and over–you can repeat it to yourself, and tape it to your mirror. But nothing brings the point home quite like a good batch of spurious correlations–things that track together closely, but that cannot possibly have a causal relationship. Such a batch I have for you today, courtesy of tylervigen.com.

Would you have predicted that these two things would track so closely? Following are a few of my favorites.

Don’t you want to try to make some sort of meaningful connections? It feels like there’s a beckoning hand at the opening of the path up ahead, down which lies the idea that eating a lot of cheese late at night makes you more likely to tangle with your bed linens.

a local wheel of cheese about the size of a snow tire

You can almost make a quasi-plausible argument (that’s got a rhythm to it, yes?) about youngsters whiling away endless hours in gaming arcades, which somehow manifests itself in amazing computer science talent (and the perseverance to write a dissertation). If that were the case, though, you’d expect to see a time lag of between 8 and 14 years, wouldn’t you? I mean, all those kids haunting arcades endlessly won’t be trying to arrange their PhD stoles properly until many years later.

How about picturing earnest spellers seeking out quiet corners in which to practice their spelling words, the same sorts of places that attract many species of potent arachnids? But wait–it’s not participants in spelling bees who are being bitten by venomous spiders; it’s that long spelling bee words† coincide with lots of deaths from venomous spider bites. Maybe there’s a code embedded in those long words that drives the spiders mad? Maybe the spiders are very literate. I’ve read Charlotte’s Web; I know how this could go.

Constructing these lines of reasoning can feel like a lot of work, but our brains seem to crave it–we’re pattern-seeking, pattern-generating creatures. When we “recognize” patterns that aren’t there, we can draw conclusions that don’t make sense, and embark on courses of action that may not be in our best interests.

There is one conclusion that I feel fairly confident in drawing from these graphs: I think it’s likely that Tyler Vigen had access to a lot of trend lines that he could lay next to or on top of one another until he found some with very similar shapes. I also think he tended to favor pairings that were most likely to seem preposterous. What’s your guess?
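If you’d like to see how easily that kind of search turns up a “match,” here is a small Python sketch of my guess. To be clear, this is not Vigen’s actual method, just an illustration under my own assumptions: it makes up 100 completely unrelated random trend lines (random walks, 11 points each, roughly a decade of yearly data), computes the linear correlation coefficient r for every pair, and reports the closest match. The function names (`pearson_r`, `random_trend`, `best_spurious_pair`) and all the numbers are hypothetical choices for the example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def random_trend(length, rng):
    """A random walk: each value is the previous value plus random noise."""
    value, series = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        series.append(value)
    return series


def best_spurious_pair(num_series=100, length=11, seed=0):
    """Generate unrelated random walks and return (r, i, j) for the most correlated pair."""
    rng = random.Random(seed)
    series = [random_trend(length, rng) for _ in range(num_series)]
    best = (0.0, None, None)
    # Compare every pair once, keeping the strongest correlation (positive or negative).
    for i in range(num_series):
        for j in range(i + 1, num_series):
            r = pearson_r(series[i], series[j])
            if abs(r) > abs(best[0]):
                best = (r, i, j)
    return best


r, i, j = best_spurious_pair()
print(f"series {i} and {j}: r = {r:.3f}")
```

With 100 lines there are 4,950 pairs to pick from, so the winning pair tends to track very closely, even though none of the series has anything to do with any other.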

My final thought here is this: I had no idea that there are data collections out there that can show me trends in death by venomous spider, or in length of spelling bee words. The world is an interesting place.

*Tyler Vigen says that he put these spurious correlations together to entertain, and to invite interest in statistics, not as a way of undermining trust in the usefulness of data analysis in general or correlations in particular. I’m sure he would be disturbed if anyone left his site thinking, “Well, now I know that statistics are as useful as a bunch of horsefeathers.”

†One of the two winning words for 2016 was gesellschaft (a rationally developed mechanistic type of social relationship characterized by impersonally contracted associations between persons). I don’t know what it predicts in terms of spider bites.

[Images: mathbits.com, yours truly, arstechnica.com, thestar.com]

2 thoughts on “Spurious correlations”

  1. Love it! Actually, the reason I studied statistics was that I wanted to be an informed consumer, not having to rely on reports from others about “this must be causing/explaining that.” Though of course I love patterns and datasets (not baseball and insurance like most people think when statistics comes up) :). And I will say that understanding the data for myself is empowering! Fun stuff! Congratulations on 250 posts!

  2. Thanks, Claire. I think some of my statistics-related trauma comes from the confusion that dogged me regarding the particular type of analysis I was doing for my thesis. In a less fraught setting I think that statistics and I might get along better. And I definitely feel empowered when I can understand the data!
