Expert evaluation vs. citation analysis

The recent Nature editorial “Experts still needed” (Nature, vol. 457,
pp. 7-8, 2009) made me smile with amusement. There is no question that there is still a significant lack of understanding
of what bibliometric measures actually measure, and that the heavy,
simple-minded use of such metrics for the evaluation of disciplines,
nations, organizations, or scientists is quite dangerous.

At the same time, I cannot fail to notice the absence of
research on these topics in the pages of Nature, Science, or
PNAS. Considering the practical and fundamental importance of the
matter, its intrinsic complexity, and the current state of our
understanding, I would expect scientometrics to be a high priority in
the minds of the editors of the top journals.

Concerning the opinions expressed in the editorial, I could not fail to wonder about the
practical difficulties involved in ensuring that expert assessment
yields reliable, accurate results. Among the issues with expert
assessment, I would just mention the different scales used by
reviewers and the possibility of bias due either to personal
connections or to the reviewers' own research interests. In fact, if one
understood citation dynamics, one would be able to view citations as a
proxy for a large-scale expert-review exercise.

The difficulty, of course, lies in the limits of our current understanding of citation
dynamics, a fact clearly illustrated by the example in the editorial.
I would note a few ameliorating facts that make the difference in
citations between the two papers less astonishing (the basis for these
observations can be found in Stringer et al., PLoS One 3, e1638, 2008).
First, the two papers were published too recently for their ultimate
numbers of citations to be evident. Second, the more cited paper is
1.5 times as old, a condition that makes a huge difference because both papers are still accelerating
their rate of accruing citations. Third,
citation growth for highly cited papers has a multiplicative nature, which
means that a more reliable measure is the logarithm of the number
of citations.
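As a minimal illustration of this last point, the sketch below compares two hypothetical final citation counts (the figures entertained in the next paragraph, not data about the actual papers) on a raw scale and on a logarithmic scale; the numbers and variable names are assumptions made purely for illustration.

```python
import math

# Hypothetical final citation counts, used purely for illustration
# (the figures entertained in the next paragraph, not measured data).
encode_citations = 10_000       # hypothetical count for the "Encode" paper
proton_pump_citations = 2_000   # hypothetical count for the "proton pump" paper

# On a raw scale the gap looks large: a 5-fold difference.
raw_ratio = encode_citations / proton_pump_citations

# Because citation growth is multiplicative, the logarithm is the more
# natural scale; there the same gap is far less dramatic.
log_gap = math.log10(encode_citations) - math.log10(proton_pump_citations)

print(f"raw ratio:        {raw_ratio:.1f}x")   # 5.0x
print(f"log10 difference: {log_gap:.2f}")      # ~0.70 (i.e., 4.0 vs. 3.3)
```

On the logarithmic scale the two papers sit much closer together than the raw counts suggest, which is the sense in which a comparison of raw citation counts overstates the discrepancy.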

Maybe the “Encode” paper will end up with 10 thousand citations and
the “proton pump” paper with “only” 2 thousand. Would
citations then have done such a bad job of determining the relative
importance of the two papers? And can one really be sure that the
“Encode” paper is not going to foster more significant breakthroughs?

Luis Amaral