1
Goldenberg M, Mualem L, Shahar A, Snir S, Akavia A. Privacy-preserving biological age prediction over federated human methylation data using fully homomorphic encryption. Genome Res 2024; 34:1324-1333. [PMID: 39237299 PMCID: PMC11529865 DOI: 10.1101/gr.279071.124] [Received: 02/15/2024] [Accepted: 08/07/2024]
Abstract
DNA methylation data play a crucial role in estimating chronological age in mammals, offering real-time insights into an individual's aging process. The epigenetic pacemaker (EPM) model allows inference of the biological age as deviations from the population trend. Given the sensitivity of these data, it is essential to safeguard both the inputs and the outputs of the EPM model. A privacy-preserving approach for EPM computation using fully homomorphic encryption was recently introduced. However, this method has limitations, including high communication complexity and impracticality for large data sets. The current work presents a new privacy-preserving protocol for EPM computation that analytically improves both privacy and complexity. Notably, we employ a single server for the secure computation phase while ensuring privacy even if the server is corrupted (compared with requiring two noncolluding servers in prior work). Using techniques from symbolic algebra and number theory, the new protocol eliminates the need for communication during secure computation, significantly improves asymptotic runtime, and offers better compatibility with parallel computing for further time complexity reduction. We implemented our protocol and demonstrated that it produces results similar to those of the standard (insecure) EPM model, with a substantial performance improvement compared with prior work. These findings hold promise for enhancing data security in medical applications where personal privacy is paramount. The generality of both the new approach and the EPM suggests that this protocol may be useful in other applications employing similar expectation-maximization techniques.
Affiliation(s)
- Meir Goldenberg
- Department of Computer Science, The University of Haifa, Haifa 3103301, Israel
- Loay Mualem
- Department of Computer Science, The University of Haifa, Haifa 3103301, Israel
- Amit Shahar
- Department of Computer Science, The University of Haifa, Haifa 3103301, Israel
- Sagi Snir
- Department of Evolutionary and Environmental Biology, The University of Haifa, Haifa 3103301, Israel
- Adi Akavia
- Department of Computer Science, The University of Haifa, Haifa 3103301, Israel
2
Duchêne DA, Duchêne S, Stiller J, Heller R, Ho SYW. ClockstaRX: Testing Molecular Clock Hypotheses With Genomic Data. Genome Biol Evol 2024; 16:evae064. [PMID: 38526019 PMCID: PMC10999959 DOI: 10.1093/gbe/evae064] [Received: 05/18/2023] [Revised: 01/11/2024] [Accepted: 03/21/2024]
Abstract
Phylogenomic data provide valuable opportunities for studying evolutionary rates and timescales. These analyses require theoretical and statistical tools based on molecular clocks. We present ClockstaRX, a flexible platform for exploring and testing evolutionary rate signals in phylogenomic data. Here, information about evolutionary rates in branches across gene trees is placed in Euclidean space, allowing data transformation, visualization, and hypothesis testing. ClockstaRX implements formal tests for identifying groups of loci and branches that make a large contribution to patterns of rate variation. This information can then be used to test for drivers of genomic evolutionary rates or to inform models for molecular dating. Drawing on the results of a simulation study, we recommend forms of data exploration and filtering that might be useful prior to molecular-clock analyses.
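Placing rate information in Euclidean space can be illustrated by treating each locus as a vector of relative branch lengths (each branch divided by that locus's tree length) and measuring how far each vector lies from the centroid of all loci. This minimal sketch assumes every gene tree has the same topology with branches listed in a common order; ranking loci by distance from the centroid is a hypothetical stand-in for ClockstaRX's formal tests, not its actual procedure, and the function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def relative_rates(branch_lengths: Sequence[float]) -> List[float]:
    """Scale a gene tree's branch lengths to sum to 1 (removes overall tree length)."""
    total = sum(branch_lengths)
    return [b / total for b in branch_lengths]


def rank_loci_by_deviation(loci: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (locus, Euclidean distance to the centroid), largest distance first."""
    vecs = {name: relative_rates(bl) for name, bl in loci.items()}
    dim = len(next(iter(vecs.values())))
    centroid = [sum(v[k] for v in vecs.values()) / len(vecs) for k in range(dim)]
    dists = [(name, math.dist(v, centroid)) for name, v in vecs.items()]
    return sorted(dists, key=lambda item: item[1], reverse=True)
```

Because of the normalization, loci that differ only in overall rate map to the same point, so only differences in the pattern of branch rates move a locus away from the others.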
Affiliation(s)
- David A Duchêne
- Center for Evolutionary Hologenomics, University of Copenhagen, Copenhagen 1352, Denmark
- Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen 1352, Denmark
- Sebastián Duchêne
- Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3010, Australia
- Josefin Stiller
- Villum Centre for Biodiversity Genomics, University of Copenhagen, 2100 Copenhagen, Denmark
- Rasmus Heller
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen 2100, Denmark
- Simon Y W Ho
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
3
Di Lena P, Sala C, Nardini C. Evaluation of different computational methods for DNA methylation-based biological age. Brief Bioinform 2022; 23:6632619. [PMID: 35794713 DOI: 10.1093/bib/bbac274] [Received: 03/21/2022] [Revised: 05/27/2022] [Accepted: 06/14/2022]
Abstract
In recent years there has been widespread interest in researching biomarkers of aging that could predict physiological vulnerability better than chronological age. Aging, in fact, is one of the most relevant risk factors for a wide range of maladies, and molecular surrogates of this phenotype could enable better patient stratification. Among the most promising of such biomarkers is DNA methylation-based biological age. Given the potential and variety of computational implementations (epigenetic clocks), we here present a systematic review of such clocks. Furthermore, we provide a large-scale performance comparison across different tissues and diseases in terms of age prediction accuracy and age acceleration, a measure of deviance from physiology. Our analysis offers both a state-of-the-art overview of the computational techniques developed so far and a heterogeneous picture of performance, which can help orient future research.
Affiliation(s)
- Pietro Di Lena
- Department of Computer Science and Engineering, University of Bologna, Mura Anteo Zamboni 7, 40126 Bologna, Italy
- Claudia Sala
- Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, Via Massarenti 9, 40138, Bologna, Italy
4
Hibernation slows epigenetic ageing in yellow-bellied marmots. Nat Ecol Evol 2022; 6:418-426. [PMID: 35256811 PMCID: PMC8986532 DOI: 10.1038/s41559-022-01679-1] [Received: 05/14/2021] [Accepted: 01/20/2022]
Abstract
Species that hibernate generally live longer than would be expected based solely on their body size. Hibernation is characterized by long periods of metabolic suppression (torpor) interspersed with short periods of increased metabolism (arousal). The torpor–arousal cycles occur multiple times during hibernation, and it has been suggested that processes controlling the transition between torpor and arousal states cause ageing suppression. Metabolic rate is also a known correlate of longevity; we thus proposed the ‘hibernation–ageing hypothesis’ whereby ageing is suspended during hibernation. We tested this hypothesis in a well-studied population of yellow-bellied marmots (Marmota flaviventer), which spend 7–8 months per year hibernating. We used two approaches to estimate epigenetic age: the epigenetic clock and the epigenetic pacemaker. Variation in epigenetic age of 149 samples collected throughout the life of 73 females was modelled using generalized additive mixed models (GAMM), where season (cyclic cubic spline) and chronological age (cubic spline) were fixed effects. As expected, the GAMM using epigenetic ages calculated from the epigenetic pacemaker was better able to detect nonlinear patterns in epigenetic ageing over time. We observed a logarithmic curve of epigenetic age with time, where the epigenetic age increased at a higher rate until females reached sexual maturity (two years old). With respect to circannual patterns, the epigenetic age increased during the active season and essentially stalled during the hibernation period. Taken together, our results are consistent with the hibernation–ageing hypothesis and may explain the enhanced longevity in hibernators. The authors show epigenetic ageing patterns from a natural population of hibernating yellow-bellied marmots consistent with the hypothesis that ageing is suspended during hibernation.
5
Abstract
Phylogenetic trees inferred from sequence data often have branch lengths measured in the expected number of substitutions and therefore do not provide estimated divergence times. These trees give an incomplete view of evolutionary histories, since many applications of phylogenies require time trees. Many methods have been developed to convert inferred branch lengths from substitution units to time units using calibration points, but none is universally accepted, as they face challenges in both scalability and accuracy under complex models. Here, we introduce a new method that formulates dating as a nonconvex optimization problem in which the variance of log-transformed rate multipliers is minimized across the tree. On simulated and real data, we show that our method, wLogDate, is often more accurate than alternatives and is more robust to various model assumptions.
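The quantity being minimized can be illustrated by evaluating, for one candidate set of branch durations, the variance of the log-transformed rate multipliers. This minimal sketch assumes the multiplier of branch i is its substitution length divided by its duration (a global rate only shifts every log by the same constant, so it drops out of the variance). The optional `weights` are a hypothetical stand-in for the method's weighting scheme, and the nonconvex optimization over durations under calibration constraints is not shown.

```python
import math
from typing import Optional, Sequence


def log_rate_variance(subst: Sequence[float], durations: Sequence[float],
                      weights: Optional[Sequence[float]] = None) -> float:
    """Weighted variance of log(b_i / t_i) over branches.

    subst: branch lengths in expected substitutions per site (b_i > 0).
    durations: candidate branch durations in time units (t_i > 0).
    weights: hypothetical per-branch weights; uniform when omitted.
    """
    if weights is None:
        weights = [1.0] * len(subst)
    logs = [math.log(b / t) for b, t in zip(subst, durations)]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, logs)) / total
    return sum(w * (x - mean) ** 2 for w, x in zip(weights, logs)) / total
```

A strict clock (every branch evolving at the same rate) gives a variance of zero, and rescaling all durations by a common factor leaves the value unchanged, so a dating method built on this objective is penalized only for rate heterogeneity, not for the absolute rate.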
Affiliation(s)
- Uyen Mai
- Department of Computer Science and Engineering, UC San Diego, CA
- Siavash Mirarab
- Department of Electrical and Computer Engineering, UC San Diego, CA
6
Farrell C, Snir S, Pellegrini M. The Epigenetic Pacemaker: modeling epigenetic states under an evolutionary framework. Bioinformatics 2021; 36:4662-4663. [PMID: 32573701 DOI: 10.1093/bioinformatics/btaa585] [Received: 03/29/2020] [Revised: 06/11/2020] [Accepted: 06/15/2020]
Abstract
SUMMARY Epigenetic rates of change, much like the evolutionary mutation rate along a lineage, vary over a lifetime. Accurate estimation of the epigenetic state has vast medical and biological implications. To account for these non-linear epigenetic changes with age, we recently developed a formalism inspired by the Pacemaker model of evolution, which accounts for mutation rates that vary with time. Here, we present a Python implementation of the Epigenetic Pacemaker (EPM), a conditional expectation maximization algorithm that estimates epigenetic landscapes and the states of individuals and may be used to study non-linear epigenetic aging. AVAILABILITY AND IMPLEMENTATION The EPM is available at https://pypi.org/project/EpigeneticPacemaker/ under the MIT license. The EPM is compatible with Python version 3.6 and above.
Affiliation(s)
- Colin Farrell
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
- Matteo Pellegrini
- Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, CA, USA
7
Tsyvina V, Zelikovsky A, Snir S, Skums P. Inference of mutability landscapes of tumors from single cell sequencing data. PLoS Comput Biol 2020; 16:e1008454. [PMID: 33253159 PMCID: PMC7728263 DOI: 10.1371/journal.pcbi.1008454] [Received: 04/28/2020] [Revised: 12/10/2020] [Accepted: 10/20/2020]
Abstract
One of the hallmarks of cancer is the extremely high mutability and genetic instability of tumor cells. The inherent heterogeneity of intra-tumor populations manifests itself in high variability of clone instability rates. Analogously to fitness landscapes, the instability rates of clonal populations form their mutability landscapes. Here, we present MULAN (MUtability LANdscape inference), a maximum-likelihood computational framework for inferring the mutation rates of individual cancer subclones from single-cell sequencing data. It uses the partial information about the order of mutation events provided by cancer mutation trees and extends it by inferring the full evolutionary history and mutability landscape of a tumor. Evaluating mutation rates at the level of subclones rather than individual genes makes it possible to capture the effects of genomic interactions and epistasis. We estimate the accuracy of our approach and demonstrate that it can be used to study the evolution of genetic instability and to infer tumor evolutionary history from experimental data. MULAN is available at https://github.com/compbel/MULAN.
Affiliation(s)
- Viachaslau Tsyvina
- Department of Computer Science, Georgia State University, Atlanta, Georgia, United States of America
- Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, Georgia, United States of America
- Sagi Snir
- Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
- Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, Georgia, United States of America
8
Epigenetic pacemaker: closed form algebraic solutions. BMC Genomics 2020; 21:257. [PMID: 32299339 PMCID: PMC7161103 DOI: 10.1186/s12864-020-6606-0]
Abstract
Background DNA methylation is widely used as a biomarker in crucial medical applications as well as for highly accurate human age prediction. This biomarker is based on the methylation status of several hundred CpG sites. In a recent line of publications, we adapted a versatile concept from evolutionary biology, the Universal Pacemaker (UPM), to the setting of epigenetic aging and denoted it the Epigenetic PaceMaker (EPM). The EPM, as opposed to other epigenetic clocks, is not confined to a specific pattern of aging, and the epigenetic age of the individual is inferred independently of other individuals. This allows explicit modeling of aging trends, in particular nonlinear relationships between chronological and epigenetic age. In one of these recent works, we presented an algorithmic improvement based on a two-step conditional expectation maximization (CEM) algorithm to arrive at a critical point on the likelihood surface. The algorithm alternates between a time step and a site step while advancing on the likelihood surface. Results Here we introduce nontrivial improvements to these steps that are essential for analyzing data sets of realistic magnitude in manageable time and space. These structural improvements are based on insights from linear algebra and symbolic algebra tools, providing a better understanding of the degeneracy of the complex problem space. This understanding, in turn, leads to the complete elimination of the bottleneck of cumbersome matrix multiplication and inversion, yielding a fast closed-form solution in both steps of the CEM. In the experimental results, we compare the CEM algorithm over several data sets and demonstrate the speedup obtained by the closed-form solutions. Our results support the theoretical analysis of this improvement.
Conclusions These improvements enable us to increase substantially the scale of inputs analyzed by the method, allowing us to apply the new approach to data sets that could not be analyzed before.
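The alternation between a site step and a time step can be sketched under an assumed linear EPM-style model, m[i][j] ≈ m0[j] + r[j] * s[i], where m[i][j] is the methylation level of site j in individual i, s[i] is that individual's epigenetic state, and m0[j], r[j] are per-site parameters. Under this assumption both steps reduce to closed-form least squares with no matrix inversion. This is a minimal sketch, not the authors' derivation: the function names (`site_step`, `time_step`, `fit_epm`, `rss`) are hypothetical and are not the API of the EpigeneticPacemaker package.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = individuals, columns = CpG sites


def site_step(m: Matrix, s: List[float]) -> Tuple[List[float], List[float]]:
    """Hold states s fixed; fit every site by closed-form simple regression."""
    n = len(s)
    s_mean = sum(s) / n
    s_ss = sum((x - s_mean) ** 2 for x in s)
    m0, r = [], []
    for j in range(len(m[0])):
        col = [m[i][j] for i in range(n)]
        c_mean = sum(col) / n
        cov = sum((s[i] - s_mean) * (col[i] - c_mean) for i in range(n))
        slope = cov / s_ss
        r.append(slope)
        m0.append(c_mean - slope * s_mean)
    return m0, r


def time_step(m: Matrix, m0: List[float], r: List[float]) -> List[float]:
    """Hold site parameters fixed; each state has a closed-form LS solution."""
    denom = sum(rj * rj for rj in r)
    return [sum(rj * (row[j] - m0[j]) for j, rj in enumerate(r)) / denom
            for row in m]


def rss(m: Matrix, m0: List[float], r: List[float], s: List[float]) -> float:
    """Residual sum of squares; neither step can increase it."""
    return sum((m[i][j] - m0[j] - r[j] * s[i]) ** 2
               for i in range(len(s)) for j in range(len(r)))


def fit_epm(m: Matrix, ages: List[float], iters: int = 50):
    """Alternate site and time steps, starting from chronological ages."""
    s = list(ages)
    for _ in range(iters):
        m0, r = site_step(m, s)
        s = time_step(m, m0, r)
    m0, r = site_step(m, s)  # refit sites so (m0, r) match the returned s
    return m0, r, s
```

Because any affine rescaling of s (with matching m0 and r) fits equally well, this sketch fixes the scale only through the starting ages; how the published method handles that degeneracy is not reproduced here.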
9
Sevillya G, Adato O, Snir S. Detecting horizontal gene transfer: a probabilistic approach. BMC Genomics 2020; 21:106. [PMID: 32138652 PMCID: PMC7057450 DOI: 10.1186/s12864-019-6395-5] [Received: 10/07/2019] [Accepted: 12/12/2019]
Abstract
BACKGROUND Horizontal gene transfer (HGT) is the transfer of a DNA sequence between species by means other than inheritance. HGT is a crucial factor in prokaryotic evolution and a significant source of genomic novelty, resulting in antibiotic resistance or the outbreak of virulent strains. Detecting HGT, and the mechanisms that enable it, is hence of prime importance. Existing algorithms rely on a strong phylogenetic signal distinguishing the transferred sequence from its recipient genome. Closely related species pose an even greater challenge, as most genes are very similar and the phylogenetic signal is therefore weak anyhow. Nevertheless, detecting HGT between such organisms is extremely important, given the role of HGT in the emergence of new highly virulent strains. RESULTS In a recent work we devised a novel technique that relies on loss of synteny around a gene as a witness for HGT. We used a novel heuristic for synteny measurement, SI (Synteny Index), and the technique was tested on both simulated and real data and was found to provide greater sensitivity than other HGT techniques. This synteny-based approach suffers from low specificity, in particular for more closely related species. Here we devise an adaptive approach to cope with this by varying the criteria according to species distance. The new approach is doubly adaptive, as it also considers the lengths of the genes being transferred. In particular, we use a Chernoff bound to call HGT both in simulations and in real bacterial genomes taken from the eggNOG database. CONCLUSIONS Here we show empirically that this approach is more conservative than the previous χ2-based approach and provides a lower false positive rate, especially for closely related species and under a wide range of genome parameters.
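A Chernoff-bound decision of this kind can be illustrated with a minimal sketch, assuming (hypothetically) that under the no-HGT null each of a gene's k neighbouring genes is conserved independently with probability p, so the number of shared neighbours is binomial. The bound used is the standard multiplicative lower-tail Chernoff bound; the null probability `p`, the threshold `alpha`, and both function names are illustrative assumptions, and the paper's adaptation of the criteria to species distance and gene length is not modelled.

```python
import math


def chernoff_lower_tail(k: int, p: float, x: int) -> float:
    """Upper bound on P[X <= x] for X ~ Binomial(k, p).

    Uses the multiplicative Chernoff bound
    P[X <= (1 - d) * mu] <= exp(-d**2 * mu / 2), with mu = k * p and 0 < d <= 1.
    """
    mu = k * p
    if x >= mu:
        return 1.0  # x is not below the mean, so the bound says nothing
    d = 1.0 - x / mu
    return math.exp(-d * d * mu / 2.0)


def call_hgt(k: int, p: float, shared: int, alpha: float = 0.01) -> bool:
    """Hypothetical rule: flag HGT when so few of the k neighbours are shared
    that the no-HGT null is implausible at level alpha."""
    return chernoff_lower_tail(k, p, shared) < alpha
```

Since the bound never underestimates the true tail probability, a rule built on it tends to flag fewer genes than an exact binomial test at the same alpha, which is in the spirit of the more conservative behaviour the abstract reports.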
Affiliation(s)
- Gur Sevillya
- Dept. of Evolutionary and Environmental Biology, University of Haifa, Haifa, 3498838, Israel
- Orit Adato
- Dept. of Evolutionary and Environmental Biology, University of Haifa, Haifa, 3498838, Israel
- Sagi Snir
- Dept. of Evolutionary and Environmental Biology, University of Haifa, Haifa, 3498838, Israel
10
Duchêne DA, Tong KJ, Foster CSP, Duchêne S, Lanfear R, Ho SYW. Linking Branch Lengths across Sets of Loci Provides the Highest Statistical Support for Phylogenetic Inference. Mol Biol Evol 2019; 37:1202-1210. [DOI: 10.1093/molbev/msz291]
Abstract
Evolution leaves heterogeneous patterns of nucleotide variation across the genome, with different loci subject to varying degrees of mutation, selection, and drift. In phylogenetics, the potential impacts of partitioning sequence data for the assignment of substitution models are well appreciated. In contrast, the treatment of branch lengths has received far less attention. In this study, we examined the effects of linking and unlinking branch-length parameters across loci or subsets of loci. By analyzing a range of empirical data sets, we find consistent support for a model in which branch lengths are proportionate between subsets of loci: gene trees share the same pattern of branch lengths, but form subsets that vary in their overall tree lengths. These models had substantially better statistical support than models that assume identical branch lengths across gene trees, or those in which genes form subsets with distinct branch-length patterns. We show using simulations and empirical data that the complexity of the branch-length model with the highest support depends on the length of the sequence alignment and on the numbers of taxa and loci in the data set. Our findings suggest that models in which branch lengths are proportionate between subsets have the highest statistical support under the conditions that are most commonly seen in practice. The results of our study have implications for model selection, computational efficiency, and experimental design in phylogenomics.
Affiliation(s)
- David A Duchêne
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
- K Jun Tong
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
- Charles S P Foster
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
- Sebastián Duchêne
- Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC, Australia
- Robert Lanfear
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- Simon Y W Ho
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
11
Mello B, Schrago CG. The Estimated Pacemaker for Great Apes Supports the Hominoid Slowdown Hypothesis. Evol Bioinform Online 2019; 15:1176934319855988. [PMID: 31223232 PMCID: PMC6566470 DOI: 10.1177/1176934319855988] [Received: 02/26/2019] [Accepted: 05/17/2019]
Abstract
The recent surge of genomic data has prompted the investigation of substitution rate variation across the genome, as well as among lineages. Evolutionary trees inferred from distinct genomic regions may display branch lengths that differ between loci by simple proportionality constants, indicating that rate variation follows a pacemaker model, which may be attributed to lineage effects. Analyses of genes from diverse biological clades have produced contrasting results, supporting either this model or alternative scenarios in which multiple pacemakers exist. Until now, the pacemaker hypothesis had not been evaluated for all great apes. In this work, we tested whether the evolutionary rates of hominids conform to pacemakers, which were inferred while accounting for gene tree/species tree discordance. For higher precision, substitution rates in branches were estimated with a calibration-free approach, the relative rate framework. A predominant evolutionary trend in great apes was evidenced by the recovery of a large pacemaker encompassing most hominid genomic regions. In addition, the majority of genes followed a pace of evolution closely related to the strict molecular clock. However, slight rate decreases were recovered in the internal branches leading to humans, corroborating the hominoid slowdown hypothesis. Our findings suggest that in great apes, life history traits were the major drivers of substitution rate variation across the genome.
Affiliation(s)
- Beatriz Mello
- Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Carlos G Schrago
- Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
12
Snir S, Farrell C, Pellegrini M. Human epigenetic ageing is logarithmic with time across the entire lifespan. Epigenetics 2019; 14:912-926. [PMID: 31138013 DOI: 10.1080/15592294.2019.1623634]
Abstract
Epigenetic changes during ageing have been characterized by multiple epigenetic clocks that allow the prediction of chronological age based on methylation status. Despite their accuracy and utility, epigenetic age biomarkers leave many questions about epigenetic ageing unanswered. Specifically, they do not permit the unbiased characterization of non-linear epigenetic ageing trends across entire lifespans, a critical question underlying this field of research. Here we provide an integrated framework to address this question. Our model, inspired by evolutionary models, is able to account for acceleration/deceleration in epigenetic changes by fitting an individual's model age, the epigenetic age, which is related to chronological age in a non-linear fashion. Application of this model to DNA methylation data measured across broad age ranges, from before birth to old age, and from two tissue types, suggests that a universal logarithmic trend characterizes epigenetic ageing across entire lifespans.
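Whether a set of epigenetic ages follows a logarithmic trend can be checked with an ordinary least-squares fit of epigenetic age on log(chronological age). This is a generic illustration, not the authors' model, which fits each individual's epigenetic age itself. Because the data span ages from before birth, the sketch takes a hypothetical `offset` that shifts all ages to positive values before the logarithm is taken; the function name is also illustrative.

```python
import math
from typing import Sequence, Tuple


def fit_log_trend(ages: Sequence[float], epi_ages: Sequence[float],
                  offset: float = 0.0) -> Tuple[float, float]:
    """Least-squares fit of epi_age ~= a + b * log(age + offset); returns (a, b)."""
    xs = [math.log(t + offset) for t in ages]  # requires age + offset > 0
    n = len(xs)
    mx = sum(xs) / n
    my = sum(epi_ages) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, epi_ages))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b
```

A good fit with b > 0 means epigenetic age rises quickly early in life and ever more slowly afterwards, which is the shape the abstract describes.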
Affiliation(s)
- Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
- Colin Farrell
- Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, CA, USA
- Matteo Pellegrini
- Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, CA, USA
13
Ali A, Melcher U. Modeling of Mutational Events in the Evolution of Viruses. Viruses 2019; 11:v11050418. [PMID: 31060293 PMCID: PMC6563203 DOI: 10.3390/v11050418] [Received: 03/04/2019] [Revised: 04/27/2019] [Accepted: 05/02/2019]
Abstract
Diverse studies of viral evolution have led to the recognition that the evolutionary rates observed for viral taxa depend on the time scale being investigated, with short-term studies giving fast substitution rates and deep calibrations giving rates that are orders of magnitude lower. Although each of these factors may contribute to this time-dependent rate phenomenon, a more fundamental cause should be considered. We sought to test computationally whether the basic phenomena of virus evolution (mutation, replication, and selection) can explain the relationships between evolutionary and phylogenetic distances. We tested, by computational inference, the hypothesis that the phylogenetic distances between pairs of sequences are functions of the evolutionary path lengths between them. A basic simulation revealed that the relationship between simulated genetic and mutational distances is nonlinear and can be consistent with different rates of nucleotide substitution at different depths of branches in phylogenetic trees.
Affiliation(s)
- Akhtar Ali
- Department of Biological Sciences, University of Tulsa, Tulsa, OK 74104, USA
- Ulrich Melcher
- Department of Biochemistry & Molecular Biology, Oklahoma State University, Stillwater, OK 74078-3035, USA
14
Snir S, Pellegrini M. An epigenetic pacemaker is detected via a fast conditional expectation maximization algorithm. Epigenomics 2019; 10:695-706. [PMID: 29979108 DOI: 10.2217/epi-2017-0130]
Abstract
AIM DNA methylation has proven to be a remarkably accurate biomarker for human age, allowing the prediction of chronological age to within a couple of years. Recently, we proposed that the Universal PaceMaker (UPM), a flexible paradigm for modeling evolution, could be applied to epigenetic aging. Nevertheless, its application to real data was restricted to small datasets owing to technical limitations. MATERIALS & METHODS We partition the set of variables into two subsets and optimize the likelihood function on each subset separately. This yields an extremely efficient Conditional Expectation Maximization algorithm, alternating between the two subsets while increasing the overall likelihood. RESULTS Using this technique, we could reanalyze datasets of larger magnitude and show a significant advantage of the UPM approach. CONCLUSION The UPM models epigenetic aging more faithfully than the time-linear approach, while methylated sites accelerate and decelerate jointly.
Affiliation(s)
- Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, 3498838, Israel
- Matteo Pellegrini
- Department of Molecular, Cell & Developmental Biology, University of California, Los Angeles, CA 90095, USA
15
Abstract
Several studies have pointed out that the tight correlation between genes' evolutionary rates is better explained by a model denoted the Universal PaceMaker (UPM) than by simple rate constancy, as posited by the classical molecular clock (MC) hypothesis. Under the UPM, each gene is associated with a single pacemaker (PM) and varies its evolutionary rate according to that PM's ticks. Hence, the relative rates of all genes associated with the same PM remain nearly constant, whereas the absolute rates can change arbitrarily according to the PM ticks. A natural follow-up question is how to find the gene-PM association from the gene sequence data alone. This, however, turns out to be a nontrivial task and is affected by the number of variables, their random noise, and the amount of available information. To this end, a clustering heuristic was devised that exploits the correlation between corresponding edge lengths across thousands of gene trees. Nevertheless, no theoretical study of the relationship between these affecting parameters had been done. We study this question here by providing theoretical bounds, expressed in terms of the system parameters, on the probabilities of positive and negative results. We corroborate these results by a simulation study that reveals the critical role of the variances.
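The edge-length clustering idea can be illustrated as follows, under the simplifying assumption that every gene tree shares the same topology and edge order. Genes driven by the same pacemaker have proportional edge lengths, so their log edge lengths differ by a constant and correlate strongly. The single-linkage rule, the `threshold` value, and the function names are hypothetical choices for illustration, not the heuristic used in the cited work.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def cluster_by_pacemaker(edge_lengths: Dict[str, Sequence[float]],
                         threshold: float = 0.9) -> List[List[str]]:
    """Join genes whose log edge lengths correlate >= threshold (single linkage)."""
    genes = list(edge_lengths)
    logs = {g: [math.log(x) for x in edge_lengths[g]] for g in genes}
    parent = {g: g for g in genes}  # union-find forest over genes

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if pearson(logs[g], logs[h]) >= threshold:
                parent[find(h)] = find(g)

    groups: Dict[str, List[str]] = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())
```

With noisy real edge lengths the correlations fall below 1 even within a pacemaker, which is why the choice of threshold, and the noise variances the abstract highlights, govern how reliably such a procedure recovers the true gene-PM association.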
Affiliation(s)
- Sagi Snir
- The Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
16
Petitjean C, Makarova KS, Wolf YI, Koonin EV. Extreme Deviations from Expected Evolutionary Rates in Archaeal Protein Families. Genome Biol Evol 2018; 9:2791-2811. [PMID: 28985292 PMCID: PMC5737733 DOI: 10.1093/gbe/evx189] [Accepted: 09/12/2017]
Abstract
Origin of new biological functions is a complex phenomenon ranging from single-nucleotide substitutions to the gain of new genes via horizontal gene transfer or duplication. Neofunctionalization and subfunctionalization of proteins are often attributed to the emergence of paralogs that are subject to relaxed purifying selection or positive selection and thus evolve at accelerated rates. Such phenomena could potentially be detected as anomalies in the phylogenies of the respective gene families. We developed a computational pipeline to search for such anomalies in 1,834 orthologous clusters of archaeal genes, focusing on lineage-specific subfamilies that significantly deviate from the expected rate of evolution. Multiple potential cases of neofunctionalization and subfunctionalization were identified, including some ancient housekeeping gene families, such as ribosomal protein S10, general transcription factor TFIIB, and chaperone Hsp20. As expected, many cases of apparent acceleration of evolution are associated with lineage-specific gene duplication. On other occasions, long branches in phylogenetic trees correspond to horizontal gene transfer across long evolutionary distances. Significant deceleration of evolution is less common than acceleration, and the underlying causes are not well understood; functional shifts accompanied by increased constraints could be involved. Many gene families appear to be “highly evolvable,” that is, they include both long and short branches. Even in the absence of precise functional predictions, this approach allows one to select targets for experimentation in search of new biology.
Affiliation(s)
- Celine Petitjean
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland
- Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland
17
Abstract
Background: Deciphering the history of life on Earth has long been regarded as one of the most central tasks in biology. In the past years, widespread discordance between the evolutionary histories of different groups of orthologous prokaryotic genes has been revealed, primarily due to horizontal gene transfers (HGTs). Nonetheless, evidence supporting a strong tree-like signal of evolution has been uncovered, despite the presence of HGT events. A challenging task, therefore, is to distill this tree-like signal from the noise induced by all sources of non-tree-like events. Results: In this work we tackle this question using real and simulated data. We first tighten a recent related theoretical result in this field. In a simulation study, we infer individual quartet topologies and then use the inferred quartets to reconstruct simulated species trees. We demonstrate that accurate tree reconstruction is feasible despite surprisingly high rates of HGT. In a real data study, we construct phylogenies of two sets of prokaryotes and show that our tree reconstruction scheme is comparable with (and complementary better than) other commonly used methods. Conclusions: Using a blend of theoretical and empirical investigations, our study proves the feasibility of accurate quartet-based phylogenetic reconstruction, the vast impact of HGT events notwithstanding.
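The following is a minimal sketch of the quartet step. It was written for this listing and is not taken from the paper. It assumes each gene supplies an additive pairwise distance matrix and infers that gene's quartet topology with the standard four-point condition. It then takes a plurality vote across genes, so that a minority of HGT-distorted genes is outvoted. The helper names `make_dist`, `quartet_topology` and `consensus_quartet`, and the distance format, are hypothetical. Python 3.9+ is assumed for the built-in generic type hints.

```python
from collections import Counter

# Symmetric pairwise distances keyed by unordered taxon pairs (hypothetical format).
Dist = dict[frozenset[str], float]
Split = frozenset[frozenset[str]]


def make_dist(pairs: dict[tuple[str, str], float]) -> Dist:
    """Build a symmetric distance lookup from (taxon, taxon) -> distance."""
    return {frozenset(pair): value for pair, value in pairs.items()}


def quartet_topology(dist: Dist, w: str, x: str, y: str, z: str) -> Split:
    """Four-point condition: choose the split whose paired distances sum lowest."""
    splits = [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]

    def pair_sum(split):
        (p, q), (r, s) = split
        return dist[frozenset((p, q))] + dist[frozenset((r, s))]

    best = min(splits, key=pair_sum)
    # Order-free representation, so equal topologies compare equal.
    return frozenset(frozenset(side) for side in best)


def consensus_quartet(gene_dists: list[Dist], taxa: tuple[str, str, str, str]) -> Split:
    """Plurality vote over the per-gene quartet topologies."""
    votes = Counter(quartet_topology(dist, *taxa) for dist in gene_dists)
    return votes.most_common(1)[0][0]
```

Consider two genes whose distances follow the species tree ((A,B),(C,D)) and one transferred gene whose distances support AC|BD. The vote returns the AB|CD split, which is the intuition behind recovering the tree-like signal despite HGT.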
Affiliation(s)
- Eliran Avni
- Department of Evolutionary Biology, University of Haifa, 199 Aba Khoushy Ave. Mount Carmel, Haifa, 3498838, Israel
- Sagi Snir
- Department of Evolutionary Biology, University of Haifa, 199 Aba Khoushy Ave. Mount Carmel, Haifa, 3498838, Israel
18
Tong KJ, Duchêne S, Lo N, Ho SYW. The impacts of drift and selection on genomic evolution in insects. PeerJ 2017; 5:e3241. [PMID: 28462044 PMCID: PMC5410144 DOI: 10.7717/peerj.3241] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 12/05/2016] [Accepted: 03/28/2017] [Indexed: 11/20/2022]
Abstract
Genomes evolve through a combination of mutation, drift, and selection, all of which act heterogeneously across genes and lineages. This leads to differences in branch-length patterns among gene trees. Genes that yield trees with the same branch-length patterns can be grouped together into clusters. Here, we propose a novel phylogenetic approach to explain the factors that influence the number and distribution of these gene-tree clusters. We apply our method to a genomic dataset from insects, an ancient and diverse group of organisms. We find some evidence that when drift is the dominant evolutionary process, each cluster tends to contain a large number of fast-evolving genes. In contrast, strong negative selection leads to many distinct clusters, each of which contains only a few slow-evolving genes. Our work, although preliminary in nature, illustrates the use of phylogenetic methods to shed light on the factors driving rate variation in genomic evolution.
Affiliation(s)
- K Jun Tong
- School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia
- Sebastián Duchêne
- School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia
- Centre for Systems Genomics, University of Melbourne, Melbourne, Victoria, Australia
- Nathan Lo
- School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia
- Simon Y W Ho
- School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia
19
Snir S. Ordered orthology as a tool in prokaryotic evolutionary inference. Mob Genet Elements 2017; 6:e1120576. [PMID: 28090377 DOI: 10.1080/2159256x.2015.1120576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/10/2015] [Revised: 10/27/2015] [Accepted: 11/10/2015] [Indexed: 10/22/2022]
Abstract
Molecular data are accumulating at an exponentially increasing pace. This deluge of information should have brought us closer to resolving one of the most fundamental issues in biology: deciphering the history of life on Earth. So far, however, this abundance of data only seems to blur our understanding of the problem. This is largely due to horizontal gene transfer (HGT), the transfer of genetic material between evolutionarily unrelated organisms, which transforms the prokaryotic tree into a network of relationships. Recently, we developed a method to infer evolutionary relationships among closely related species where the conventional evolutionary markers do not provide a strong enough signal. The method relies on the loss of synteny (gene order conservation among species), which provides a stronger signal, sufficient to classify even strains of a given species. Here we elaborate on this method and suggest further uses of it in the context of detecting HGT events and genome architecture.
Affiliation(s)
- Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
20
Snir S, vonHoldt BM, Pellegrini M. A Statistical Framework to Identify Deviation from Time Linearity in Epigenetic Aging. PLoS Comput Biol 2016; 12:e1005183. [PMID: 27835646 PMCID: PMC5106012 DOI: 10.1371/journal.pcbi.1005183] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 05/19/2016] [Accepted: 10/05/2016] [Indexed: 01/09/2023]
Abstract
In multiple studies DNA methylation has proven to be an accurate biomarker of age. To develop these biomarkers, the methylation of multiple CpG sites is typically linearly combined to predict chronological age. By contrast, in this study we apply the Universal PaceMaker (UPM) model to investigate changes in DNA methylation during aging. The UPM was initially developed to study rate acceleration/deceleration in sequence evolution. Rather than identifying which linear combinations of sites predict age, the UPM models the rates of change of multiple CpG sites, as well as their starting methylation levels, and estimates the age of each individual to optimize the model fit. We refer to the estimated age as the “epigenetic age”, in contrast to the known chronological age of each individual. We construct a statistical framework and devise an algorithm to determine whether a genomic pacemaker is in effect (i.e., whether rates of change vary with age). The decision is made by comparing two competing likelihood-based models, the molecular clock (MC) and the UPM. For the molecular clock model, we use the known chronological age of each individual to fit the methylation rates at multiple sites, express the problem as a linear least-squares problem, and solve it in polynomial time. For the UPM case, the search space is larger, as we are fitting both the epigenetic age of each individual and the rates for each site; yet we succeed in reducing the problem to the space of individuals while remaining polynomial in the more significant space, that of the methylated sites. We first tested our algorithm on simulated data to elucidate the factors affecting the identification of the pacemaker model. We find that, provided with enough data, our algorithm is capable of identifying a pacemaker even when only a weak signal is present in the data. Based on these results, we applied our method to DNA methylation data from the blood of individuals of various ages.
Although the improvement in variance across sites between the UPM and MC was small, the results suggest that the existence of a pacemaker is highly significant. The PaceMaker results also suggest a decay in the rate of change in DNA methylation with age.
DNA methylation is an important component of the epigenetic code that defines and maintains the state of cells. Recently, it has been found that certain sites in the genome undergo methylation changes at different rates during aging. The seminal work of Steve Horvath found that the methylation of a few hundred CpG sites could be linearly combined to accurately predict the age of an individual in a number of tissues. Such a pattern resembles the Molecular Clock (MC) concept prevailing in molecular evolution, which suggests that there are sites in the genome that change linearly with age. In this work, we adapt the Universal PaceMaker (UPM) model to the setting of DNA methylation changes during aging. UPM relaxes the rate constancy of MC and was found to provide a better statistical explanation for genome evolution across the entire tree of life. This adaptation requires the solution of a complex optimization problem. Nevertheless, in a series of observations we show that the problem can be solved efficiently under the MC model and slightly less efficiently under the UPM model. This allows us to solve problems of non-trivial size. As a proof of concept, we chose to analyze DNA methylation data collected from the blood of humans of different ages. Our results show that, as in genome evolution, the UPM provided an improvement of about 2% in the fit to the data. The statistical significance of this improvement is very high. Although tested on a small data set, this improvement demonstrates that the UPM captures age-related DNA methylation changes more accurately than the MC model.
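The MC versus UPM comparison described above can be sketched under a few assumptions. Methylation is modeled as `S[i, j] ≈ s0[i] + r[i] * t[j]` for site `i` and individual `j`. The MC fit holds `t` at the known chronological ages and solves one linear least-squares problem per site. The pacemaker-style fit alternates between refitting the site parameters and re-estimating each individual's epigenetic age in closed form. This is a simplified alternating least-squares illustration, not the authors' exact algorithm or likelihood test. The function names are hypothetical, and NumPy is assumed.

```python
import numpy as np


def fit_sites(S: np.ndarray, t: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-site least squares for S[i, j] ≈ s0[i] + r[i] * t[j], with t held fixed."""
    t_c = t - t.mean()
    r = (S - S.mean(axis=1, keepdims=True)) @ t_c / (t_c @ t_c)  # slope (rate) per site
    s0 = S.mean(axis=1) - r * t.mean()                             # starting level per site
    return s0, r


def fit_ages(S: np.ndarray, s0: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Closed-form least-squares age per individual, with site parameters held fixed."""
    return r @ (S - s0[:, None]) / (r @ r)


def rss(S: np.ndarray, s0: np.ndarray, r: np.ndarray, t: np.ndarray) -> float:
    """Residual sum of squares of the linear methylation model."""
    return float(((S - s0[:, None] - np.outer(r, t)) ** 2).sum())


def fit_mc(S: np.ndarray, ages: np.ndarray):
    """Molecular clock: chronological ages are taken as given."""
    s0, r = fit_sites(S, ages)
    return s0, r, rss(S, s0, r, ages)


def fit_upm(S: np.ndarray, ages: np.ndarray, n_iter: int = 50):
    """Pacemaker-style fit: alternate age and site updates, starting from the MC fit.

    No update can increase the RSS, so the result is at most the MC RSS.
    Epigenetic ages are identified only up to an affine rescaling.
    """
    t = ages.astype(float)
    s0, r = fit_sites(S, t)
    for _ in range(n_iter):
        t = fit_ages(S, s0, r)
        s0, r = fit_sites(S, t)
    return s0, r, t, rss(S, s0, r, t)
```

Each alternating step can only lower the residual sum of squares. The pacemaker residual is therefore never larger than the MC residual it starts from. Whether that reduction is significant is the question a likelihood-based test such as the paper's is meant to answer.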
Affiliation(s)
- Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California, United States of America
- Bridgett M. vonHoldt
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
- Matteo Pellegrini
- Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, Los Angeles, California, United States of America
21
Duchêne S, Foster CSP, Ho SYW. Estimating the number and assignment of clock models in analyses of multigene datasets. Bioinformatics 2016; 32:1281-5. [DOI: 10.1093/bioinformatics/btw005] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 09/16/2015] [Accepted: 01/04/2016] [Indexed: 11/14/2022]
22
Adato O, Ninyo N, Gophna U, Snir S. Detecting Horizontal Gene Transfer between Closely Related Taxa. PLoS Comput Biol 2015; 11:e1004408. [PMID: 26439115 PMCID: PMC4595140 DOI: 10.1371/journal.pcbi.1004408] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 06/26/2014] [Accepted: 06/20/2015] [Indexed: 01/12/2023]
Abstract
Horizontal gene transfer (HGT), the transfer of genetic material between organisms, is crucial for genetic innovation and the evolution of genome architecture. Existing HGT detection algorithms rely on a strong phylogenetic signal distinguishing the transferred sequence from ancestral (vertically derived) genes in its recipient genome. Detecting HGT between closely related species or strains is challenging, as the phylogenetic signal is usually weak and the nucleotide composition is normally nearly identical. Nevertheless, detecting HGT between congeneric species or strains is of great importance, especially in clinical microbiology, where understanding the emergence of new virulent and drug-resistant strains is crucial and often time-sensitive. We developed a novel, self-contained technique named Near HGT, based on the synteny index, to measure the divergence of a gene from its native genomic environment, and used it to identify candidate HGT events between closely related strains. The method confirms candidate transferred genes based on the constant relative mutability (CRM). Using CRM, the algorithm assigns a confidence score based on “unusual” sequence divergence. A gene exhibiting exceptional deviations according to both synteny and mutability criteria is considered a validated HGT product. We first applied the technique to a set of three E. coli strains and detected several highly probable horizontally acquired genes. We then compared the method to existing HGT detection tools using a larger strain data set. When combined with additional approaches, our new algorithm provides a richer picture and brings us closer to the goal of detecting all newly acquired genes in a particular strain.
The transfer of genetic material between organisms, usually denoted as horizontal (or lateral) gene transfer (HGT or LGT), is a prime mechanism in microbial evolution and is responsible for genetic innovation and the evolution of genome architecture. Detecting HGT between closely related species or strains is imperative, as drug-resistant pathogenic strains most often acquire their virulence from closely related bacteria. The proposed method combines two evolutionary signals that have not been employed for this task in the past. One is the synteny index (SI), which measures the loss of synteny in an organism; the other is a novel concept, constant relative mutability (CRM), which maintains that genes preserve their relative evolutionary rates along lineages (although the rate of each lineage may change). We show with both simulated and real biological data that the method is sound and, in the cases examined, provides higher sensitivity than existing methods. We therefore believe this novel approach represents a significant advance: for the first time, it enables the detection of previously ignored HGT events and brings us closer to the goal of detecting all newly acquired genes in a particular strain. Availability: The method is publicly available at http://research.haifa.ac.il/~ssagi/software/nearHGT.zip
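Here is a small sketch that makes the synteny index concrete. It assumes the definition in which SI_k(g) counts the genes shared by the neighborhoods of up to k genes on each side of g in two genomes. It treats each chromosome as a linear list of shared ortholog identifiers, although real prokaryotic chromosomes are usually circular. It also omits the CRM confirmation step entirely. The function names are hypothetical and are not part of the Near HGT distribution.

```python
def neighborhood(genome: list[str], gene: str, k: int) -> set[str]:
    """The up-to-k genes on each side of `gene`, treating the chromosome as linear."""
    i = genome.index(gene)
    return set(genome[max(0, i - k):i]) | set(genome[i + 1:i + 1 + k])


def synteny_index(genome_a: list[str], genome_b: list[str], gene: str, k: int) -> int:
    """SI_k(gene): number of genes shared by its k-neighborhoods in both genomes."""
    return len(neighborhood(genome_a, gene, k) & neighborhood(genome_b, gene, k))


def low_synteny_genes(genome_a: list[str], genome_b: list[str], k: int, threshold: int) -> list[str]:
    """Shared genes whose SI is at or below `threshold` (candidate relocations)."""
    in_b = set(genome_b)
    return [
        g for g in genome_a
        if g in in_b and synteny_index(genome_a, genome_b, g, k) <= threshold
    ]
```

With genome A ordered `abcdefghij` and genome B ordered `abcdfghiej`, gene `e` (moved in B) gets SI 0 for k = 2. Gene `j` scores only 1 because it sits at the end of a linear list. This shows why the linear-chromosome simplification matters in practice.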
Affiliation(s)
- Orit Adato
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
- Noga Ninyo
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
- Uri Gophna
- Department of Molecular Microbiology and Biotechnology, Tel Aviv University, Tel Aviv, Israel
- Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
23
Faure G, Koonin EV. Universal distribution of mutational effects on protein stability, uncoupling of protein robustness from sequence evolution and distinct evolutionary modes of prokaryotic and eukaryotic proteins. Phys Biol 2015; 12:035001. [PMID: 25927823 DOI: 10.1088/1478-3975/12/3/035001] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 01/04/2023]
Abstract
Robustness to the destabilizing effects of mutations is thought of as a key factor of protein evolution. The connections between two measures of robustness, the relative core size and the computationally estimated effect of mutations on protein stability (ΔΔG), protein abundance and the selection pressure on protein-coding genes (dN/dS) were analyzed for organisms with a large number of available protein structures, including four eukaryotes, two bacteria and one archaeon. The distribution of the effects of mutations in the core on protein stability is universal and indistinguishable in eukaryotes and bacteria, centered at slightly destabilizing amino acid replacements, and with a heavy tail of more strongly destabilizing replacements. The distribution of mutational effects in the hyperthermophilic archaeon Thermococcus gammatolerans is significantly shifted toward strongly destabilizing replacements, which is indicative of stronger constraints imposed on proteins in hyperthermophiles. The median effect of mutations is strongly, positively correlated with the relative core size, evidence of the congruence between the two measures of protein robustness. However, both measures show only limited correlations to the expression level and selection pressure on protein-coding genes. Thus, the degree of robustness reflected in the universal distribution of mutational effects appears to be a fundamental, ancient feature of globular protein folds, whereas the observed variations are largely neutral and uncoupled from short-term protein evolution. A weak anticorrelation between protein core size and selection pressure is observed only for surface residues in prokaryotes, but a stronger anticorrelation is observed for all residues in eukaryotic proteins. This substantial difference between proteins of prokaryotes and eukaryotes is likely to stem from the demonstrably higher compactness of prokaryotic proteins.
Affiliation(s)
- Guilhem Faure
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
24
Duchêne S, Ho SYW. Mammalian genome evolution is governed by multiple pacemakers. Bioinformatics 2015; 31:2061-5. [PMID: 25725495 DOI: 10.1093/bioinformatics/btv121] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 10/14/2014] [Accepted: 02/20/2015] [Indexed: 11/14/2022]
Abstract
Genomic evolution is shaped by a dynamic combination of mutation, selection and genetic drift. These processes lead to evolutionary rate variation across loci and among lineages. In turn, interactions between these two forms of rate variation can produce residual effects, whereby the pattern of among-lineage rate heterogeneity varies across loci. The nature of rate variation is encapsulated in the pacemaker models of genome evolution, which differ in the degree of importance assigned to residual effects: none (Universal Pacemaker), some (Multiple Pacemaker) or total (Degenerate Multiple Pacemaker). Here we use a phylogenetic method to partition the rate variation across loci, allowing comparison of these pacemaker models. Our analysis of 431 genes from 29 mammalian taxa reveals that rate variation across these genes can be explained by 13 pacemakers, consistent with the Multiple Pacemaker model. We find no evidence that these pacemakers correspond to gene function. Our results have important consequences for understanding the factors driving genomic evolution and for molecular-clock analyses. Availability and implementation: ClockstaR-G is freely available for download from GitHub (https://github.com/sebastianduchene/clockstarg).
Affiliation(s)
- Sebastián Duchêne
- School of Biological Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Simon Y W Ho
- School of Biological Sciences, University of Sydney, Sydney, NSW 2006, Australia
25
Snir S. On the number of genomic pacemakers: a geometric approach. Algorithms Mol Biol 2014; 9:26. [PMID: 25648755 PMCID: PMC4301663 DOI: 10.1186/s13015-014-0026-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 10/21/2014] [Accepted: 11/11/2014] [Indexed: 11/13/2022]
Abstract
The universal pacemaker (UPM) model extends the classical molecular clock (MC) model by allowing each gene, in addition to its individual intrinsic rate as in the MC, to accelerate or decelerate according to the universal pacemaker. Under UPM, the relative evolutionary rates of all genes remain nearly constant, whereas the absolute rates can change arbitrarily. It was shown on several taxa groups spanning the entire tree of life that the UPM model describes the evolutionary process better than the MC model. In this work we provide a natural generalization of the UPM model that we denote multiple pacemakers (MPM). Under the MPM model every gene is still affected by a single pacemaker; however, the number of pacemakers is not confined to one. Such a model induces a partition over the gene set in which all the genes in one part are affected by the same pacemaker, and the task is to identify the pacemaker partition, in other words, to find for each gene its associated pacemaker. We devise a novel heuristic procedure, relying on statistical and geometrical tools, to solve the problem and demonstrate by simulation that this approach can cope satisfactorily with considerable noise and realistic problem sizes. We applied this procedure to a set of over 2000 genes in 100 prokaryotes and found significant evidence for the existence of two pacemakers.
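The pacemaker models above can be illustrated by scoring a given gene partition. The sketch assumes a complete matrix of positive branch lengths `b[g, l]`, indexed by gene g and lineage l. Under a single pacemaker, `b[g, l] ≈ r[g] * p[l]`, which is additive in log space. For a complete matrix, the least-squares fit is therefore the row mean plus the column mean minus the grand mean. Under MPM, each part of the partition gets its own such fit. This only evaluates a candidate partition; it is not the heuristic search procedure the paper describes. The function names and toy values are hypothetical.

```python
import numpy as np


def fit_pacemaker(log_b: np.ndarray) -> tuple[np.ndarray, float]:
    """Least-squares fit of log b[g, l] ≈ log r[g] + log p[l] for a complete matrix.

    For a complete two-way table without interaction, the fitted value is
    row mean + column mean - grand mean.
    """
    row = log_b.mean(axis=1, keepdims=True)  # gene-specific log rate (up to a shift)
    col = log_b.mean(axis=0, keepdims=True)  # lineage pacemaker in log space
    fitted = row + col - log_b.mean()
    return fitted, float(((log_b - fitted) ** 2).sum())


def mpm_rss(log_b: np.ndarray, labels) -> float:
    """Residual sum of squares when each label group has its own pacemaker."""
    labels = np.asarray(labels)
    return float(sum(fit_pacemaker(log_b[labels == k])[1] for k in np.unique(labels)))


# Toy data: six genes, five lineages, two pacemakers (values invented for illustration).
r = np.array([1.0, 2.0, 0.5, 1.5, 3.0, 0.8])
p1 = np.array([1.0, 2.0, 1.0, 0.5, 3.0])
p2 = np.array([3.0, 0.5, 2.0, 1.0, 1.0])
log_b = np.log(np.vstack([np.outer(r[:3], p1), np.outer(r[3:], p2)]))

single = mpm_rss(log_b, [0] * 6)
two = mpm_rss(log_b, [0, 0, 0, 1, 1, 1])
print(f"one pacemaker RSS = {single:.3f}, two pacemakers RSS = {two:.3g}")
```

With the true two-group partition, the residual drops to floating-point noise. Forcing all genes onto one pacemaker leaves a clearly positive residual. Under this sketch, that gap is the kind of signal a partition search would try to maximize.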
26
Ho SYW, Duchêne S. Molecular-clock methods for estimating evolutionary rates and timescales. Mol Ecol 2014; 23:5947-65. [DOI: 10.1111/mec.12953] [Citation(s) in RCA: 225] [Impact Index Per Article: 22.5] [Received: 06/23/2014] [Revised: 09/29/2014] [Accepted: 09/30/2014] [Indexed: 11/29/2022]
Affiliation(s)
- Simon Y. W. Ho
- School of Biological Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Sebastián Duchêne
- School of Biological Sciences, University of Sydney, Sydney, NSW 2006, Australia
27
Ho SYW. The changing face of the molecular evolutionary clock. Trends Ecol Evol 2014; 29:496-503. [PMID: 25086668 DOI: 10.1016/j.tree.2014.07.004] [Citation(s) in RCA: 100] [Impact Index Per Article: 10.0] [Received: 04/04/2014] [Revised: 07/03/2014] [Accepted: 07/08/2014] [Indexed: 11/30/2022]
Abstract
The molecular clock has played an important role in biological research, both as a description of the evolutionary process and as a tool for inferring evolutionary timescales. Genomic data have provided valuable insights into the molecular clock, allowing the patterns and causes of evolutionary rate variation to be characterized in increasing detail. I explain how genome sequences offer exciting opportunities for estimating the timescale of the Tree of Life. I describe the different approaches that have been used to deal with the computational and statistical challenges encountered in molecular clock analyses of genomic data. Finally, I offer a perspective on the future of molecular clocks, highlighting some of the key limitations and the most promising research directions.
Affiliation(s)
- Simon Y W Ho
- School of Biological Sciences, University of Sydney, Sydney, NSW, Australia
28
Snir S, Wolf YI, Koonin EV. Universal pacemaker of genome evolution in animals and fungi and variation of evolutionary rates in diverse organisms. Genome Biol Evol 2014; 6:1268-78. [PMID: 24812293 PMCID: PMC4079209 DOI: 10.1093/gbe/evu091] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 01/26/2023]
Abstract
Gene evolution is traditionally considered within the framework of the molecular clock (MC) model, whereby each gene is characterized by an approximately constant rate of evolution. Recent comparative analysis of numerous phylogenies of prokaryotic genes has shown that a different model of evolution, denoted the Universal PaceMaker (UPM), which postulates conservation of relative rather than absolute evolutionary rates, yields a better fit to the phylogenetic data. Here, we show that the UPM model is a better fit than the MC for genome-wide sets of phylogenetic trees from six species of Drosophila and nine species of yeast, with extremely high statistical significance. Unlike the prokaryotic phylogenies that include distant organisms and multiple horizontal gene transfers, these are simple data sets that cover groups of closely related organisms and consist of gene trees with the same topology as the species tree. The results indicate that both lineage-specific and gene-specific rates are important in genome evolution but the lineage-specific contribution is greater. Similar to the MC, the gene evolution rates under the UPM are strongly overdispersed, approximately 2-fold compared with the expectation from sampling error alone. However, we show that neither Drosophila nor yeast genes form distinct clusters in the tree space. Thus, the gene-specific deviations from the UPM, although substantial, are uncorrelated and most likely depend on selective factors that are largely unique to individual genes. Overall, the UPM appears to be a key feature of genome evolution across the history of cellular life.
Affiliation(s)
- Sagi Snir
- Department of Evolutionary and Environmental Biology and The Institute of Evolution, University of Haifa, Israel
- Yuri I Wolf
- National Center for Biotechnology Information, NLM, National Institutes of Health, Bethesda, MD
- Eugene V Koonin
- National Center for Biotechnology Information, NLM, National Institutes of Health, Bethesda, MD
29
Wolf YI, Snir S, Koonin EV. Stability along with extreme variability in core genome evolution. Genome Biol Evol 2013; 5:1393-402. [PMID: 23821522 PMCID: PMC3730350 DOI: 10.1093/gbe/evt098] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 01/27/2023]
Abstract
The shape of the distribution of evolutionary distances between orthologous genes in pairs of closely related genomes is universal throughout the entire range of cellular life forms. The near invariance of this distribution across billions of years of evolution can be accounted for by the Universal Pace Maker (UPM) model of genome evolution that yields a significantly better fit to the phylogenetic data than the Molecular Clock (MC) model. Unlike the MC, the UPM model does not assume constant gene-specific evolutionary rates but rather postulates that, in each evolving lineage, the evolutionary rates of all genes change (approximately) in unison although the pacemakers of different lineages are not necessarily synchronized. Here, we dissect the nearly constant evolutionary rate distribution by comparing the genome-wide relative rates of evolution of individual genes in pairs or triplets of closely related genomes from diverse bacterial and archaeal taxa. We show that, although the gene-specific relative rate is an important feature of genome evolution that explains more than half of the variance of the evolutionary distances, the ranges of relative rate variability are extremely broad even for universal genes. Because of this high variance, the gene-specific rate is a poor predictor of the conservation rank for any gene in any particular lineage.
Affiliation(s)
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
30
Puigbò P, Wolf YI, Koonin EV. Seeing the Tree of Life behind the phylogenetic forest. BMC Biol 2013; 11:46. [PMID: 23587361 PMCID: PMC3626908 DOI: 10.1186/1741-7007-11-46] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Received: 01/07/2013] [Accepted: 04/12/2013] [Indexed: 02/08/2023]
Affiliation(s)
- Pere Puigbò
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
31