Abstract
The molecular and cellular basis of novelty is an active area of research in evolutionary biology. Until very recently, the vast majority of cellular phenomena were so difficult to sample that cross-species studies of biochemistry were rare and comparative analysis at the level of biochemical systems was almost impossible. Recent advances in systems biology are changing what is possible, however, and comparative phylogenetic methods that can handle these new data are needed. Here, we introduce the term “phylogenetic latent variable models” (PLVMs, pronounced “plums”) for a class of models that has recently been used to infer the evolution of cellular states from systems-level molecular data, and we develop a new parameterization and fitting strategy that is useful for comparative inference of biochemical networks. We deploy this new framework to infer the ancestral states and evolutionary dynamics of protein-interaction networks by analyzing >16,000 predominantly metazoan co-fractionation and affinity-purification mass spectrometry experiments. Based on these data, we estimate ancestral interactions across unikonts, broadly recovering protein complexes involved in translation, transcription, proteostasis, transport, and membrane trafficking. Using these results, we predict an ancient core of the Commander complex made up of CCDC22, CCDC93, C16orf62, and DSCR3, with more recent additions of COMMD-containing proteins in tetrapods. We also use simulations to develop model fitting strategies and discuss future model developments.
Our ability to probe the inner workings of cells is constantly growing. This is true not only for workhorse model organisms like fruit flies and brewer’s yeast, but increasingly for organisms whose biology is less well trodden: corals, butterflies, exotic plants and fungi, and even precious clinical samples are all fair game. However, the mathematical models that we use to compare biology across species and infer evolutionary dynamics have not kept pace. Sophisticated models exist for DNA and protein sequences, but models that can handle functional cellular data are in their infancy. In this study, we introduce a new model that we use to infer the evolutionary history of protein interaction networks from cutting-edge high-throughput proteomics data. We use this model to reconstruct the cell biology of the ancestors we share with fungi and slime molds, and we propose a path by which a recently described protein complex involved in human development might have evolved.