Scientific Utopias: Or How I Learned to Stop Worrying and Love the h-index

People love ranking things. And why shouldn’t they? My introduction to this topic was probably similar to that of most 8-year-old boys: who is the best (baseball / insert-your-favorite-sport-here) player? It seems like a trivial question. It’s obviously whoever hits the most home runs, right? Or is it whoever has the highest batting average? Whoever was selected as the Most Valuable Player? The answer, of course, depends entirely on your value judgments. Yet at the same time, we know that some metrics are clearly better than others: only the most foolish baseball scout would evaluate someone on their batting average rather than their slugging percentage (and only the most foolish gambler would trust the opinion of the scouts over the data, but I digress…).

Maybe you don’t care about baseball (I sure don’t anymore). But I do care about what I consider to be important problems, such as how the National Institutes of Health decides to spend its $30 billion budget. In the sciences, there has been constant backlash against the idea of “rating” or “ranking” scientists, institutions, academic journals, and so on. It’s understandable why most people resist being ranked… lots of them will look bad (you could even say that 50% of them will fall below the median!). But the key to understanding ranking is to recognize that science has a severe scarcity problem, in grant money and faculty openings (to name two of the 500-pound gorillas in the room). Someone will get a job, someone will win a grant, and others quite frankly won’t. So we have a system where we are absolutely, positively guaranteed to have to rank proposals and individuals (unless we’re happy with a random number generator). The question then becomes: how should we rank?

Jorge Hirsch came up with the h-index in 2005, and it has since become both ubiquitous and widely hated by scientists. To be fair, the h-index has lots of shortcomings that we’ll get to. But it also has quite a few advantages, such as its straightforward interpretation and ease of calculation. Yet every year, what seems like 50 different papers are published in information-science journals proposing some slight tweak to, or new way of looking at, citations as a ranking metric. Some of these even manage to make it into top-tier journals (quantified by their Impact Factor, of course), though the merits of these papers and their proposed indicators are essentially indistinguishable from every other one (the occasion for today’s writing: YAWN).
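To make “ease of calculation” concrete: a researcher has index h if h of their papers have each been cited at least h times. Here is a minimal Python sketch, assuming citation data arrive as a plain list of per-paper citation counts. The function name h_index is my own illustrative choice, not part of any citation database or library.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each.

    Assumes `citations` is a list of non-negative per-paper citation counts
    (a hypothetical input format chosen for illustration).
    """
    # Rank papers from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        # The paper at position `rank` must have at least `rank` citations.
        if count >= rank:
            h = rank
        else:
            break
    return h


print(h_index([10, 8, 5, 4, 3]))  # -> 4
print(h_index([1000]))            # -> 1
```

The second call shows one of the metric’s well-known quirks: a single paper with 1,000 citations still yields h = 1, because the index rewards sustained output over one big hit. Whatever you think of that trade-off, it is written down where everyone can see it.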

In the glory days before the h-index and other indicators (and still pervasive to this day, I might add), grants were decided not by some algorithm but by people who evaluated applicants holistically and on their merits (insert laugh track here). Of course, these “people” were in reality groups of friends at the same elite institutions who knew one another intimately. Reviewers and faculty hiring committees knew which institutions applicants came from, recognized and discriminated against female and non-white-sounding names, and allowed themselves to be especially skeptical of anyone they didn’t know (see this great recent article for a discussion of these very issues). Of course, no one would ever admit to the more inflammatory of those claims. They didn’t have to; the data bore it out in practice. Let’s just leave the subject with this: study after study has consistently shown that scientific institutions have been blatantly discriminatory towards under-represented groups (though, admirably, they are becoming less so).

So I’m a cynic (my favorite synonym for realist); I believe that committees of people in power tend to, nay, will absolutely under any and all circumstances, use that power towards their own biased personal ends. Of course, among the many complaints about the h-index is that it is biased as well. My retort is simply that any modestly competent person, after reading a few papers on the limitations of this metric (gasp), can see that the method wears its biases on its sleeve, loud and clear: older researchers get an enormous leg up, citations aren’t even close to being normally distributed, scientists inflate their h-indices with self-citations and review articles, and so on.

But the point is that we know these biases, and we can use the h-index as a tool while recognizing its limitations. The problem I have with you is that I don’t know your biases. I’m not sure what your agenda is, but the one thing I am absolutely sure of is that you are a human with an agenda, and it will, consciously or unconsciously, rear its ugly head and lead you to discriminate against the people who aren’t like you, the people you don’t know, and the people who won’t somehow help your own career or those of your friends (all that without even mentioning how you’ll treat your direct competitors…).

So we started with scarcity and the recognition that every grant reviewer and faculty hiring committee must, by necessity, go through some intellectual process of ranking individuals. Full stop. Take a minute to digest that. My argument is simply that we should at the very least strive to codify that process. Moonlighting techno-utopian though I may be, I recognize that we’re doomed never to achieve perfection in any metric, but by explicitly spelling out our system, we can iterate and improve. And if we can’t minimize the biases, we can at least put them front and center so that everyone can swallow our metric with the heaping spoonful of salt it deserves. Why settle for one metric? No sane person would. Any metric is simply a useful tool to aid in evaluating someone, in the same way that batting average and home runs are complementary measurements of baseball talent. Don’t fault the metric for reducing someone to digits on the page; fault the Cassandras who cry that “the number is flawed and shouldn’t be all that we care about”. It’s a red herring: no one believes that you could quantify a baseball player or a scientist with a single magical number. But I’d much rather be a practicing scientist in a world where my work is evaluated in a data-driven manner than in one where an “unbiased” committee of scientific elders doles out its decisions based “on the merits of my work”. Because really, isn’t the latter a far more unlikely utopian future?

The h-index isn’t boiling a person down into a single number; we are. Also, people in power are the worst, and they will act as such even if they’re scientists.