Xiaohan Zeng, Andrea Lancichinetti
Edited by Nick Timkovich
For thousands of years, people have thought of the Sun and the Moon as opposites: the Sun governs the day, and the Moon dominates the night. Surprisingly, this view is common across many cultures. In Greek mythology, Apollo, the god of the Sun, and Artemis, the goddess of the Moon, are twin brother and sister. The Chinese word for the Sun literally means “the grand Yang”, and the Moon was once called “the grand Yin.” This perception seems easy to understand, since the Sun rises at dawn and sets at dusk, while the Moon rises in the evening and sets in the morning.
Or do they?
An astronomer could answer easily, but not everyone spends every day studying the sky. One day Andrea, a fellow lab member, asked why the Moon rises in the evening. On one hand, the “common sense” Sun-Moon duality is appealing, and many people may think that the Moon doesn’t appear during the day. On the other hand, the orbital period of the Moon around the Earth is independent of that of the Earth around the Sun, so logically there is no reason for moonrise to stay synchronized with sunset.
So we checked a model of the solar system. Of course, the model indicates that the Moon doesn’t always rise in the evening. We further plotted the distribution of the moonrise time, which turned out to be quite uniform. The result might be a bit surprising, in that (1) the Moon doesn’t govern the evening as the mythology has it, and (2) the bias that says otherwise is remarkably strong.
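A rough way to see why the distribution comes out flat is a toy calculation. This is not the solar-system model mentioned above, only a sketch under simplifying assumptions: each moonrise follows the previous one by one mean lunar day, derived from the mean synodic month of about 29.53 days, and latitude, seasons, and the Moon's tilted, eccentric orbit are ignored. The function names `moonrise_hours` and `hourly_histogram` and the starting hour are made up for illustration.

```python
from collections import Counter

# Mean time between two new moons, in days (mean synodic month).
SYNODIC_MONTH_DAYS = 29.530589

# Over one synodic month the Moon rises one time fewer than the Sun,
# so the mean gap between moonrises is slightly longer than 24 hours.
LUNAR_DAY_HOURS = 24.0 / (1.0 - 1.0 / SYNODIC_MONTH_DAYS)


def moonrise_hours(n_moonrises, first_rise_hour=6.0):
    """Clock hour (0-24) of successive moonrises in the toy model.

    first_rise_hour is an arbitrary assumption: at new moon the Moon
    rises roughly together with the Sun.
    """
    return [(first_rise_hour + k * LUNAR_DAY_HOURS) % 24.0
            for k in range(n_moonrises)]


def hourly_histogram(hours):
    """Count how many moonrises fall into each of the 24 clock hours."""
    counts = Counter(int(h) for h in hours)
    return [counts.get(h, 0) for h in range(24)]


if __name__ == "__main__":
    hist = hourly_histogram(moonrise_hours(3000))
    for hour, count in enumerate(hist):
        print(f"{hour:02d}:00  {count}")
```

Because the moonrise drifts later by roughly 50 minutes each day in this model, it sweeps through every hour of the clock about once a month, and no hour ends up favored.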
Why does the perception that the Moon rises in the evening persist? An obvious explanation is that during the day the Sun far outshines the Moon, so we are much more likely to notice the Moon in the evening than during the day. Over time, people come to expect to see the Moon in the evening. But forming the expectation is just the first step; what comes next can be explained by selective memory and confirmation bias.
Confirmation bias is the tendency to favor information that confirms one’s expectations. Since you generally expect to see the Moon in the evening, you remember the evenings when you see the Moon, but NOT the evenings when you don’t see it, or the days when you do. That is, you selectively remember the evenings where the observation confirms your expectation, reinforcing the bias and consequently your belief.
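The effect of selective memory can be illustrated with a small simulation. The sketch below assumes a hypothetical observer who sees the Moon equally often in the evening and during the day, but recalls 90% of the confirming (evening) sightings and only 10% of the disconfirming (daytime) ones. All of these probabilities, and the function name `remembered_share`, are invented for illustration and are not measured values.

```python
import random


def remembered_share(n_sightings, p_evening=0.5,
                     recall_confirming=0.9, recall_disconfirming=0.1,
                     seed=0):
    """Return (actual, remembered) share of evening Moon sightings.

    All probabilities are hypothetical: p_evening is the true chance that
    a sighting happens in the evening, and the recall_* values are the
    chances that a sighting is remembered afterwards.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    evening_seen = evening_kept = total_kept = 0
    for _ in range(n_sightings):
        is_evening = rng.random() < p_evening
        recall = recall_confirming if is_evening else recall_disconfirming
        evening_seen += is_evening
        if rng.random() < recall:
            total_kept += 1
            evening_kept += is_evening
    actual = evening_seen / n_sightings
    # Guard against the (unlikely) case that nothing was remembered.
    remembered = evening_kept / total_kept if total_kept else float("nan")
    return actual, remembered


if __name__ == "__main__":
    actual, remembered = remembered_share(10_000)
    print(f"actual evening share:     {actual:.2f}")
    print(f"remembered evening share: {remembered:.2f}")
```

Under these assumed recall rates, about half of the sightings really happen in the evening, yet roughly nine out of ten remembered sightings do, so memory alone is enough to manufacture the "Moon belongs to the evening" impression.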
Selective memory shows up often in daily life. Take astrology, for example. There is no scientific evidence for categorizing people according to astrological signs. Yet many people strongly believe in it, citing anecdotes where astrology works. While it does “work” occasionally, they forget the situations where astrology totally fails. Fortune tellers are good at predicting your behavior not because they know mystic rules that govern who you are, but because they have collected such a large sample of data that they can infer a lot of information from your features. (To be honest, I would say many fortune tellers could be very good data scientists because they know the power of statistics!)
Another example is the feeling that you see a person more often once you get to know them. You suddenly begin to run into them so frequently that you suspect they are following you around. However, you simply remember the one day you see them and forget the other ninety-nine when you don’t.
In fact, confirmation bias may lead to the “proof by example” method. However, this method contradicts the basic principles of statistics. Always seeing the Moon after nightfall does not prove that the Moon only appears in the evening; all it actually tells us is that the Moon doesn’t only appear during the day. Put in statistical terms, we can reject a null hypothesis, but we cannot prove the alternative hypothesis. Proof by example can be very dangerous. In research, a common mistake is to recognize and report experiments that confirm a theory while ignoring those that do not support it. This can easily lead to false discoveries.
As data scientists, we should exercise caution and consciously avoid confirmation bias (and worse, cherry-picking). We need to collect data in an unbiased way and perform statistical tests. When showing results, we should not only include examples in which the theory works, but also indicate to what extent the theory applies to all the data.