Evolution of protein families: Is it possible to distinguish between domains of life?
Understanding evolutionary relationships between species can shed new light on the rooting of the tree of life and the origin of eukaryotes, which has motivated a long-standing interest in accurately assessing evolutionary parameters on time scales of the order of a billion years. Prior work suggests large variability in molecular substitution rates; however, we still do not know whether such variability reflects species-specific trends at a genomic scale or can be attributed to the fluctuations inherent in any stochastic process. Here, we study the statistical properties of gene and protein-family sizes in order to quantify long-time-scale evolutionary differences and similarities across species. We first determine the protein families of 209 species of bacteria and 20 species of archaea. We find that we cannot reject the null hypothesis that the protein-family sizes of these species are drawn from the same distribution. In addition, we find that the family-size distributions of species classified in the same phylogenetic branch or in the same lifestyle group are not significantly more similar than those of species in different branches. These two findings can be accounted for by a dynamical birth, death, and innovation model that assumes identical protein-family evolutionary rates for all species. Our theoretical and empirical results thus strongly suggest that the variability observed in protein-family size distributions is compatible with the stochastic fluctuations expected for an evolutionary process with identical genomic evolutionary rates. Our findings are particularly relevant to the plausibility of some theories of the origin of eukaryotes that require drastic changes in evolutionary rates for some period during the last 2 billion years. © 2007 Elsevier B.V. All rights reserved.
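The abstract's argument combines two ideas: a null hypothesis that all species draw family sizes from one distribution, and a birth, death, and innovation model with shared rates that would produce exactly that. A toy simulation can illustrate the link. The sketch below is hypothetical and is not the authors' pipeline. It uses the simplest linear version of such a model, simulated with the Gillespie algorithm. In this version, each gene duplicates at rate `birth` and is lost at rate `death`, and new single-gene families appear at rate `innovation`. These names, the linear rates, and the default values are all assumptions. The sketch evolves two "species" with identical rates and then compares their family-size samples. For the comparison it uses a two-sample Kolmogorov-Smirnov statistic with a permutation p-value. This test is a stand-in chosen for illustration; the paper's own statistical procedure may differ.

```python
import random


def simulate_bdim(t_max, birth=1.0, death=1.1, innovation=20.0, seed=None):
    """Gillespie simulation of a minimal *linear* birth, death, and innovation model.

    Hypothetical parameters: every gene duplicates at rate `birth` and is lost at
    rate `death`; brand-new single-gene families arise at rate `innovation`.
    death > birth keeps the genome bounded (mean size ~ innovation / (death - birth)).
    Returns the list of family sizes present at time t_max.
    """
    rng = random.Random(seed)
    families = []  # families[i] = number of genes in family i
    n_genes = 0
    t = 0.0
    while True:
        total = innovation + (birth + death) * n_genes
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if t > t_max:
            return families
        r = rng.random() * total
        if r < innovation:
            families.append(1)  # innovation: a new family with one gene
            n_genes += 1
            continue
        # Pick a gene uniformly at random, i.e. a family with probability proportional to its size.
        idx = rng.choices(range(len(families)), weights=families)[0]
        if r < innovation + birth * n_genes:
            families[idx] += 1  # duplication within that family
            n_genes += 1
        else:
            families[idx] -= 1  # gene loss
            n_genes -= 1
            if families[idx] == 0:
                families.pop(idx)  # the family goes extinct


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic max |F_a(x) - F_b(x)| (handles ties)."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    for v in sorted(set(a) | set(b)):
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def permutation_p_value(a, b, n_perm=500, seed=None):
    """P-value for H0 'a and b come from the same distribution' by label shuffling."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Two "species" evolved with identical rates: any difference is stochastic noise.
    species_a = simulate_bdim(100.0, seed=1)
    species_b = simulate_bdim(100.0, seed=2)
    d = ks_statistic(species_a, species_b)
    p = permutation_p_value(species_a, species_b, seed=3)
    print(f"{len(species_a)} vs {len(species_b)} families, D = {d:.3f}, p = {p:.3f}")
```

Both simulated species share the same rates, so in this toy setting small p-values should appear across seeds only at roughly their nominal rate. That behavior mirrors the abstract's point that identical rates plus stochastic fluctuations can account for the observed variability. Setting `death` closer to `birth` produces larger genomes but slows equilibration, so `t_max` would need to grow accordingly.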