I don’t know if you’ve ever had to suffer through a statistics class. Perhaps you’ve taken a statistics class and didn’t consider it suffering at all. I have a friend who majored in statistics, on purpose (Hey, Claire!). Stranger things have happened.
In college I suffered through some statistics, and I don’t remember a lot, but I can at least recall the difference between correlation (the relationship between things that happen or change together) and causation (the act or process of causing something to happen or exist–thank you, Merriam-Webster).
When two things are correlated, it’s very tempting to conclude that one of them must be causing the other; hence the need for the old statistics refrain, “Correlation does not imply causation.”
You can be told this over and over–you can repeat it to yourself, and tape it to your mirror. But nothing brings the point home quite like a good batch of spurious correlations–things that track together closely, but that cannot possibly have a causal relationship. Such a batch I have for you today, courtesy of tylervigen.com.
Would you have predicted that these two things would track so closely? Following are a few of my favorites.
Don’t you want to try to make some sort of meaningful connection? It feels like there’s a beckoning hand at the opening of the path up ahead, down which lies the idea that eating a lot of cheese late at night makes you more likely to tangle with your bed linens.
You can almost make a quasi-plausible argument (that’s got a rhythm to it, yes?) about youngsters whiling away endless hours in gaming arcades, which somehow manifests itself in amazing computer science talent (and the perseverance to write a dissertation). If that were the case, though, you’d expect to see a time lag of between 8 and 14 years, wouldn’t you? I mean, all those kids haunting arcades endlessly won’t be trying to arrange their PhD stoles properly until many years later.
How about picturing earnest spellers seeking out quiet corners in which to practice their spelling words, the same sorts of places attractive to many species of potent arachnids? But wait–it’s not participants in spelling bees who are being bitten by venomous spiders–it’s that long spelling bee words† coincide with lots of venomous spider bites. Maybe there’s a code embedded in those long words that drives the spiders mad? Maybe the spiders are very literate–I’ve read Charlotte’s Web–I know how this could go.
Constructing these lines of reasoning can seem like a lot of work, but our brains seem to crave it–we’re pattern-seeking, pattern-generating creatures. When we “recognize” patterns that aren’t there, we can draw conclusions that don’t make sense, and embark on courses of action that may not be in our best interests.
There is one conclusion that I feel fairly confident in drawing from these graphs: I think it’s likely that Tyler Vigen had access to a lot of trend lines that he could lay next to or on top of one another until he found some with very similar shapes. I also think he tended to favor pairings that were most likely to seem preposterous. What’s your guess?
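If you'd like to test that guess yourself, here is a rough sketch in Python. To be clear, this is not Tyler Vigen's method or his data. It simply invents a pile of random, unrelated "trend lines" (the names `random_trend` and `best_pair`, along with the counts of 200 series and 10 years, are my own assumptions for illustration) and then goes hunting for the two that happen to track each other most closely.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def random_trend(rng, years):
    """A made-up yearly trend: a random walk with no connection to anything."""
    level = 0.0
    series = []
    for _ in range(years):
        level += rng.gauss(0, 1)  # drift up or down by a random amount
        series.append(level)
    return series


def best_pair(num_series=200, years=10, seed=0):
    """Generate unrelated trends, then return the most strongly correlated pair."""
    rng = random.Random(seed)
    trends = [random_trend(rng, years) for _ in range(num_series)]
    # Compare every trend against every other one and keep the closest match.
    i, j = max(
        combinations(range(num_series), 2),
        key=lambda pair: abs(pearson(trends[pair[0]], trends[pair[1]])),
    )
    return i, j, pearson(trends[i], trends[j])


if __name__ == "__main__":
    i, j, r = best_pair()
    print(f"Series {i} and series {j} have correlation r = {r:.3f}")
```

With 200 series there are 19,900 possible pairings, so it would be surprising if none of them lined up nicely by pure chance. Every one of these trends is random noise, which is rather the point.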
My final thought here is this: I had no idea that there are data collections out there that can show me trends in death by venomous spider, or in length of spelling bee words. The world is an interesting place.
*Tyler Vigen says that he put these spurious correlations together to entertain, and to invite interest in statistics, not as a way of undermining trust in the usefulness of data analysis in general or correlations in particular. I’m sure he would be disturbed if anyone left his site thinking, “Well, now I know that statistics are as useful as a bunch of horsefeathers.”
†The winning word for 2016 was gesellschaft (a rationally developed mechanistic type of social relationship characterized by impersonally contracted associations between persons). I don’t know what it predicts in terms of spider bites.
[Images: mathbits.com, yours truly, arstechnica.com, thestar.com]