Clearly correlated - Blog

It started a while ago. For some time, I couldn’t quite put my finger on it, I wasn’t sure, but something was definitely off. And then it finally hit me (or maybe the idea finally reached the surface of my consciousness, so to speak): correlation is not the same thing as causation. This is no groundbreaking discovery, of course; it is a very basic truth. However, my personal realization of it had profound implications for me. At the time I was trying to make sense of some data, exploring the possible connections between social networking and weight loss, and discussing it with some colleagues, and I realized that we had to be extremely careful about the claims we were going to make.

As a consequence, I think I have become something of a fierce detractor of correlations happily interpreted as causality. They are everywhere! We are so used to seeing scatter plots with a nice, statistically significant linear fit attached to them, and to jumping straight to the conclusion that X directly affects Y. Just like that. Why wouldn’t we? I admit that this way of looking at things is comforting, somehow.

For people who come from hard sciences, like Physics, it is maybe even an understandable mistake: our experiments take place in a well-controlled environment. Thus, the only thing that usually affects the macroscopic behavior of electrons is the intensity of an electromagnetic field, the only thing that usually affects the speed of sound is the density of the medium it propagates in, and so on.

But things always get messier when trying to study complex systems, and especially social systems. First, because there is much more variability in the data: a million factors can affect one particular behavior of (perfectly normal, physically and mentally healthy) individuals, and we only know a few of them. And second, because there are always confounding factors and hidden variables that shouldn’t be swept under the rug. Thus, establishing a causal relationship between two variables shouldn’t be done lightly.
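To make the hidden-variable problem concrete, here is a minimal toy sketch in Python. Everything in it is made up for illustration (the variable names, the noise levels, the sample size): a hidden factor z drives both x and y, and x never affects y at all. The raw correlation between x and y still comes out strong, and it only disappears once the linear effect of z is removed from both.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, zs):
    """What is left of ys after a least-squares linear fit on zs."""
    n = len(ys)
    my, mz = sum(ys) / n, sum(zs) / n
    slope = (sum((z - mz) * (y - my) for y, z in zip(ys, zs))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for y, z in zip(ys, zs)]

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5000

# Hypothetical hidden factor that we "forgot" to measure.
z = [rng.gauss(0, 1) for _ in range(n)]
# x and y each depend on z plus independent noise; x never enters y.
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

raw_corr = pearson(x, y)
# Partial correlation: correlate what remains once z is accounted for.
partial_corr = pearson(residuals(x, z), residuals(y, z))

print(f"raw correlation of x and y:    {raw_corr:.2f}")
print(f"correlation after removing z:  {partial_corr:.2f}")
```

Of course, in real data the catch is that we usually don't know what z is, or that it exists at all, so this kind of check only works for the confounders we have thought of and measured.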

This is why I admire the work by Sinan Aral so much. He and his team were really concerned about disentangling social influence from homophily. The first refers to the contagion of behavior from one individual to another by social contact; the second describes the tendency of similar people to gather together. A lot of papers had been written describing how certain behaviors (obesity, happiness, smoking, ...) seem to be contagious, when what is happening most of the time is that the individuals in the population are similar to begin with, so they are more likely to all behave a certain way. What these researchers did was design a very smart set of product adoption experiments on Facebook, carefully controlling for things like a priori adoption probabilities vs. real peer influence and possible cross contamination due to common friends. In other words, if two friends have similar demographics and similar adoption probabilities, then you can’t really be sure that there is influence when one starts using that new product after the other. Instead, you have to admit that it is probably just a case of homophily. In their study, they found that this social influence is usually overestimated by 300 to 700%, which has a huge impact, for example for marketing purposes.
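A toy simulation shows how homophily alone can look like contagion. To be clear, this is not the researchers' experimental design, just a sketch under invented assumptions: people belong to one of two hypothetical groups, friends usually come from the same group, and each person adopts a product with a probability that depends only on their own group. There is zero influence between friends by construction.

```python
import random

rng = random.Random(42)  # fixed seed so the run is reproducible

# All numbers below are made up for illustration.
ADOPT_PROB = {0: 0.1, 1: 0.6}  # adoption probability depends only on own group
SAME_GROUP_PROB = 0.9          # homophily: friends mostly share a group
N_PAIRS = 20000

pairs = []
for _ in range(N_PAIRS):
    group_a = rng.randint(0, 1)
    group_b = group_a if rng.random() < SAME_GROUP_PROB else 1 - group_a
    # Each friend decides independently: no influence whatsoever.
    adopts_a = rng.random() < ADOPT_PROB[group_a]
    adopts_b = rng.random() < ADOPT_PROB[group_b]
    pairs.append((group_a, adopts_a, adopts_b))

def adoption_gap(rows):
    """Adoption rate of A when friend B adopted, minus the rate when B did not."""
    with_adopter = [a for _, a, b in rows if b]
    without_adopter = [a for _, a, b in rows if not b]
    return (sum(with_adopter) / len(with_adopter)
            - sum(without_adopter) / len(without_adopter))

# Naive comparison over everyone: looks like "contagion".
naive_gap = adoption_gap(pairs)

# Same comparison among people who are similar to begin with.
gap_by_group = {g: adoption_gap([r for r in pairs if r[0] == g]) for g in (0, 1)}

print(f"naive gap: {naive_gap:.3f}")
for g, gap in gap_by_group.items():
    print(f"gap within group {g}: {gap:.3f}")
```

People with an adopting friend adopt much more often in the pooled data, yet the gap shrinks to roughly zero once you compare only people from the same group. In this toy world the whole "effect" is homophily.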

The bottom line here is that it is important to be careful about these issues, and to keep thinking of ways to tell influence and homophily apart. Thus, even when temporal information is available, one should always test one's influence-like hypotheses against all the other less exciting, but also possible, alternative explanations.

http://xkcd.com/552/

– Julia Poncela-Casasnovas.