1
Santourlidis S. Phylo-Epigenetics in Phylogeny Analyses and Evolution. Genes (Basel) 2024; 15:1198. [PMID: 39336789 PMCID: PMC11430929 DOI: 10.3390/genes15091198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Revised: 09/09/2024] [Accepted: 09/10/2024] [Indexed: 09/30/2024] Open
Abstract
Long-standing, continuous blurring and controversies in the field of phylogenetic interspecies relations, associated with insufficient explanations for dynamics and variability of speeds of evolution in mammals, hint at a crucial missing link. It has been suggested that transgenerational epigenetic inheritance and the concealed mechanisms behind it play a distinct role in mammalian evolution. Here, a comprehensive sequence alignment approach in hominid species, i.e., Homo sapiens, Homo neanderthalensis, Denisovan human, Pan troglodytes, Pan paniscus, Gorilla gorilla, and Pongo pygmaeus, comprising conserved CpG islands of housekeeping genes, uncovers evidence for a distinct variability of CpG dinucleotides. Applying solely these evolutionarily consistent and inconsistent CpG sites in a classic phylogenetic analysis, calibrated by the divergence time point of the common chimpanzee (P. troglodytes) and the bonobo or pygmy chimpanzee (P. paniscus), a "phylo-epigenetic" tree has been generated, which precisely recapitulates branch points and branch lengths, i.e., divergence events and relations, as they have been broadly suggested in the current literature, based on comprehensive molecular phylogenomics and fossil records of many decades. It is suggested here that CpG dinucleotide changes at CpG islands are of superior importance for evolutionary developments. These changes are successfully inherited through the germ line, determining emerging methylation profiles, and they are a central component of transgenerational epigenetic inheritance. It is hidden in the DNA, what will happen on it later.
Affiliation(s)
- Simeon Santourlidis
- Epigenetics Laboratory for Human Health and Longevity, Institute of Transplantation Diagnostics and Cell Therapeutics, Medical Faculty, Heinrich-Heine University Duesseldorf, Moorenstr. 5, 40225 Duesseldorf, Germany
2
Paradis E, Claramunt S, Brown J, Schliep K. Confidence intervals in molecular dating by maximum likelihood. Mol Phylogenet Evol 2023; 178:107652. [PMID: 36306994 DOI: 10.1016/j.ympev.2022.107652] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/01/2022] [Revised: 10/11/2022] [Accepted: 10/19/2022] [Indexed: 11/06/2022]
Abstract
Molecular dating has been widely used to infer the times of past evolutionary events using molecular sequences. This paper describes three bootstrap methods to infer confidence intervals under a penalized likelihood framework. The basic idea is to use data pseudoreplicates to infer uncertainty in the branch lengths of a phylogeny reconstructed with molecular sequences. The three specific bootstrap methods are nonparametric (direct tree bootstrapping), semiparametric (rate smoothing), and parametric (Poisson simulation). Our extensive simulation study showed that the three methods perform generally well under a simple strict clock model of molecular evolution; however, the results were less positive with data simulated using an uncorrelated or a correlated relaxed clock model. Several factors impacted, possibly in interaction, the performance of the confidence intervals. Increasing the number of calibration points had a positive effect, as well as increasing the sequence length or the number of sequences although both latter effects depended on the model of evolution. A case study is presented with a molecular phylogeny of the Felidae (Mammalia: Carnivora). A comparison was made with a Bayesian analysis: the results were very close in terms of confidence intervals and there was no marked tendency for an approach to produce younger or older bounds compared to the other.
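The pseudoreplicate idea in this abstract can be illustrated with a minimal sketch. It is not the penalized likelihood procedure of the paper: it assumes only two aligned sequences, a strict clock, an uncorrected proportion of differing sites, and a known substitution rate. The names `clock_age`, `bootstrap_age_ci`, the rate, and the toy alignment are all hypothetical.

```python
import random

def clock_age(diff_flags, rate):
    """Strict-clock age (in Myr) from the raw proportion of differing sites.

    diff_flags: one 0/1 flag per alignment column (1 = the two sequences differ).
    rate: assumed substitutions per site per Myr along a single lineage.
    """
    p = sum(diff_flags) / len(diff_flags)
    return p / (2.0 * rate)  # both lineages accumulate changes since the split

def bootstrap_age_ci(diff_flags, rate, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval from column-resampled pseudoreplicates."""
    rng = random.Random(seed)
    n = len(diff_flags)
    point = clock_age(diff_flags, rate)
    # Each pseudoreplicate resamples alignment columns with replacement.
    reps = sorted(
        clock_age(rng.choices(diff_flags, k=n), rate) for _ in range(n_boot)
    )
    lo = reps[int(n_boot * alpha / 2)]
    hi = reps[int(n_boot * (1 - alpha / 2)) - 1]
    return point, lo, hi

# Toy alignment (made-up numbers): 12 differing columns out of 1000.
flags = [1] * 12 + [0] * 988
point, lo, hi = bootstrap_age_ci(flags, rate=0.001)
print(f"age = {point:.2f} Myr, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

A real analysis would resample a full alignment, re-estimate the tree and its branch lengths for each pseudoreplicate, and date every node; the sketch only shows how the spread of pseudoreplicate estimates becomes an interval.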
Affiliation(s)
- Santiago Claramunt
- Department of Natural History, Royal Ontario Museum, Toronto, ON 5S2C6, Canada
- Joseph Brown
- Department of Natural History, Royal Ontario Museum, Toronto, ON 5S2C6, Canada
- Klaus Schliep
- Institute of Computational Biotechnology, Technology University Graz, Austria
3
Lamarca AP, Mello B, Schrago CG. The performance of outgroup-free rooting under evolutionary radiations. Mol Phylogenet Evol 2022; 169:107434. [PMID: 35143961 DOI: 10.1016/j.ympev.2022.107434] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/22/2021] [Revised: 01/07/2022] [Accepted: 01/25/2022] [Indexed: 11/18/2022]
Abstract
Tree rooting implies a temporal dimension to phylogenies. Only after the position of the root node is defined can the ancestor-descendant relationships between branches be fully deduced. Rooting has usually been carried out by employing evolutionarily close outgroup lineages, which is a drawback when these lineages are unavailable or unknown. Alternatively, outgroup-free rooting methods were proposed, which rely on the constancy of evolutionary rates to varying degrees. In this work we analyzed the performance of two of these methods, the midpoint rooting (MPR) and the minimal ancestor deviation (MAD), in rooting topologies evolved under challenging scenarios of fast evolutionary radiations derived from empirical data, characterized by short internal branches near the crown node. Considering all branch length combinations investigated, both methods exhibited average success rates below 50%, although MAD slightly outperformed MPR. Moreover, tree balance significantly impacted the relative performance of the methods. We found that, in four-taxa unrooted trees, the outcome of whether both methodologies will correctly root the tree can be roughly predicted by two simple dimensionless metrics: the coefficient of variation of the external branch lengths, and the ratio of the internal branch length to the total sum of branch lengths, which were employed to devise a general linear model that allowed calculating the probability of correctly placing the root node for any four-taxa tree. We predicted that the performance of both outgroup-free rooting methods on loci representing the placental mammal radiation ranged between 50% and 75%.
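The two dimensionless metrics named in this abstract are simple to compute for a four-taxon unrooted tree. The sketch below is an illustration only: the function name `rooting_metrics` and the branch lengths are hypothetical, the population standard deviation is an assumed choice for the coefficient of variation, and the coefficients of the linear model are not given in the abstract, so no probability is computed.

```python
from statistics import mean, pstdev

def rooting_metrics(external, internal):
    """Return (cv, ratio) for a four-taxon unrooted tree.

    external: the four terminal branch lengths.
    internal: the single internal branch length.
    cv: coefficient of variation of the external branch lengths
        (population standard deviation / mean; an assumed convention).
    ratio: internal branch length divided by the total tree length.
    """
    if len(external) != 4:
        raise ValueError("a four-taxon unrooted tree has four external branches")
    cv = pstdev(external) / mean(external)
    ratio = internal / (sum(external) + internal)
    return cv, ratio

# Clock-like tree with a long internal branch (made-up lengths).
cv_even, ratio_even = rooting_metrics([0.10, 0.10, 0.10, 0.10], 0.10)

# Radiation-like tree: uneven tips and a very short internal branch.
cv_rad, ratio_rad = rooting_metrics([0.05, 0.20, 0.10, 0.25], 0.01)

print(cv_even, ratio_even)
print(cv_rad, ratio_rad)
```

The second tree has more rate heterogeneity among tips and a shorter internal branch, which is the kind of radiation scenario the abstract describes as challenging for outgroup-free rooting.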
Affiliation(s)
- Beatriz Mello
- Department of Genetics, Federal University of Rio de Janeiro, RJ, Brazil
- Carlos G Schrago
- Department of Genetics, Federal University of Rio de Janeiro, RJ, Brazil.
4
Spectrum of Protein Location in Proteomes Captures Evolutionary Relationship Between Species. J Mol Evol 2021; 89:544-553. [PMID: 34328525 PMCID: PMC8379119 DOI: 10.1007/s00239-021-10022-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/05/2020] [Accepted: 07/16/2021] [Indexed: 11/10/2022]
Abstract
The native subcellular location (also referred to as localization or cellular compartment) of a protein is the one in which it acts most frequently; it is one aspect of protein function. Do ten eukaryotic model organisms differ in their location spectrum, i.e., the fraction of their proteome in each of seven major cellular compartments? As experimental annotations of locations remain biased and incomplete, we need prediction methods to answer this question. After systematic bias corrections, the complete but faulty prediction methods appeared to be more appropriate to compare location spectra between species than the incomplete more accurate experimental data. This work compared the location spectra for ten eukaryotes: Homo sapiens (human), Gorilla gorilla (gorilla), Pan troglodytes (chimpanzee), Mus musculus (mouse), Rattus norvegicus (rat), Drosophila melanogaster (fruit/vinegar fly), Anopheles gambiae (African malaria mosquito), Caenorhabditis elegans (nematode), Saccharomyces cerevisiae (baker’s yeast), and Schizosaccharomyces pombe (fission yeast). The two largest classes were predicted to be the nucleus and the cytoplasm together accounting for 47–62% of all proteins, while 7–21% of the proteins were predicted in the plasma membrane and 4–15% to be secreted. Overall, the predicted location spectra were largely similar. However, in detail, the differences sufficed to plot trees (UPGMA) and 2D (PCA) maps relating the ten organisms using a simple Euclidean distance in seven states (location classes). The relations based on the simple predicted location spectra captured aspects of cross-species comparisons usually revealed only by much more detailed evolutionary comparisons. Most interestingly, known phylogenetic relations were reproduced better by paralog-only than by ortholog-only trees.
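The tree-building step described here (Euclidean distance over seven location classes, followed by UPGMA) can be sketched in a few lines. This is an illustration under stated assumptions: the four spectra below are invented numbers, not values from the paper, and `euclidean` and `upgma` are hypothetical helper names that return only the topology as nested tuples.

```python
import math
from itertools import combinations

def euclidean(p, q):
    """Euclidean distance between two location spectra."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def upgma(labels, vectors):
    """Average-linkage (UPGMA) clustering; returns the topology as nested tuples."""
    clusters = {i: (labels[i], 1) for i in range(len(labels))}  # id -> (tree, size)
    dist = {
        frozenset((i, j)): euclidean(vectors[i], vectors[j])
        for i, j in combinations(range(len(labels)), 2)
    }
    next_id = len(labels)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        i, j = min(combinations(sorted(clusters), 2),
                   key=lambda pair: dist[frozenset(pair)])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance to the merged cluster is the size-weighted average.
        for k in clusters:
            dist[frozenset((next_id, k))] = (
                n_i * dist[frozenset((i, k))] + n_j * dist[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Hypothetical spectra: fraction of the proteome in each of seven classes.
labels = ["human", "chimp", "mouse", "yeast"]
spectra = [
    [0.30, 0.25, 0.15, 0.10, 0.08, 0.07, 0.05],
    [0.30, 0.24, 0.16, 0.10, 0.08, 0.07, 0.05],
    [0.28, 0.25, 0.17, 0.11, 0.08, 0.06, 0.05],
    [0.35, 0.30, 0.08, 0.05, 0.10, 0.07, 0.05],
]
tree = upgma(labels, spectra)
print(tree)
```

With these made-up spectra the two most similar organisms are joined first and the most divergent one is attached last, which is the behaviour UPGMA is expected to show on any distance matrix.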
5
Kumar PS, Dabdoub SM, Ganesan SM. Probing periodontal microbial dark matter using metataxonomics and metagenomics. Periodontol 2000 2020; 85:12-27. [PMID: 33226714 DOI: 10.1111/prd.12349] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/12/2022]
Abstract
Our view of the periodontal microbial community has been shaped by a century or more of cultivation-based and microscopic investigations. While these studies firmly established the infection-mediated etiology of periodontal diseases, it was apparent from the very early days that periodontal microbiology suffered from what Staley and Konopka described as the "great plate count anomaly", in that these culturable bacteria were only a minor part of what was visible under the microscope. For nearly a century, much effort has been devoted to finding the right tools to investigate this uncultivated majority, also known as "microbial dark matter". The discovery that DNA was an effective tool to "see" microbial dark matter was a significant breakthrough in environmental microbiology, and oral microbiologists were among the earliest to capitalize on these advances. By identifying the order in which nucleotides are arranged in a stretch of DNA (DNA sequencing) and creating a repository of these sequences, sequence databases were created. Computational tools that used probability-driven analysis of these sequences enabled the discovery of new and unsuspected species and ascribed novel functions to these species. This review will trace the development of DNA sequencing as a quantitative, open-ended, comprehensive approach to characterize microbial communities in their native environments, and explore how this technology has shifted traditional dogmas on how the oral microbiome promotes health and its role in disease causation and perpetuation.
Affiliation(s)
- Purnima S Kumar
- Department of Periodontology, College of Dentistry, The Ohio State University, Columbus, Ohio, USA
- Shareef M Dabdoub
- Department of Periodontology, College of Dentistry, The Ohio State University, Columbus, Ohio, USA
- Sukirth M Ganesan
- Department of Periodontics, College of Dentistry and Dental Clinics, The University of Iowa, Iowa City, Iowa, USA
6
Epigenetic pacemaker: closed form algebraic solutions. BMC Genomics 2020; 21:257. [PMID: 32299339 PMCID: PMC7161103 DOI: 10.1186/s12864-020-6606-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022] Open
Abstract
Background DNA methylation is widely used as a biomarker in crucial medical applications as well as for human age prediction of very high accuracy. This biomarker is based on the methylation status of several hundred CpG sites. In a recent line of publications we have adapted a versatile concept from evolutionary biology - the Universal Pacemaker (UPM) - to the setting of epigenetic aging and denoted it the Epigenetic PaceMaker (EPM). The EPM, as opposed to other epigenetic clocks, is not confined to a specific pattern of aging, and the epigenetic age of the individual is inferred independently of other individuals. This allows an explicit modeling of aging trends, in particular a nonlinear relationship between chronological and epigenetic age. In one of these recent works, we have presented an algorithmic improvement based on a two-step conditional expectation maximization (CEM) algorithm to arrive at a critical point on the likelihood surface. The algorithm alternates between a time step and a site step while advancing on the likelihood surface. Results Here we introduce nontrivial improvements to these steps that are essential for analyzing data sets of realistic magnitude in a manageable time and space. These structural improvements are based on insights from linear algebra and symbolic algebra tools, providing us greater understanding of the degeneracy of the complex problem space. This understanding, in turn, leads to the complete elimination of the bottleneck of cumbersome matrix multiplication and inversion, yielding a fast closed form solution in both steps of the CEM. In the experimental results part, we compare the CEM algorithm over several data sets and demonstrate the speedup obtained by the closed form solutions. Our results support the theoretical analysis of this improvement.
Conclusions These improvements enable us to increase substantially the scale of inputs analyzed by the method, allowing us to apply the new approach to data sets that could not be analyzed before.
7
Snir S, Pellegrini M. An epigenetic pacemaker is detected via a fast conditional expectation maximization algorithm. Epigenomics 2019; 10:695-706. [PMID: 29979108 DOI: 10.2217/epi-2017-0130] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/21/2022] Open
Abstract
AIM DNA methylation has proven to be a remarkably accurate biomarker for human age, allowing the prediction of chronological age to within a couple of years. Recently, we proposed that the Universal PaceMaker (UPM), a flexible paradigm for modeling evolution, could be applied to epigenetic aging. Nevertheless, application to real data was restricted to small datasets because of technical limitations. MATERIALS & METHODS We partition the set of variables into two subsets and optimize the likelihood function on each set separately. This yields an extremely efficient Conditional Expectation Maximization algorithm, alternating between the two sets while increasing the overall likelihood. RESULTS Using the technique, we could reanalyze datasets of larger magnitude and show a significant advantage to the UPM approach. CONCLUSION The UPM more faithfully models epigenetic aging than the time linear approach while methylated sites accelerate and decelerate jointly.
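The alternating scheme described in this abstract can be sketched as follows. The sketch assumes a linear model in which the methylation of site i in individual j is a starting level plus a site rate times the individual's epigenetic age, with Gaussian noise, so that maximizing the likelihood amounts to minimizing the residual sum of squares. The function names (`site_step`, `time_step`, `fit_pacemaker`) and the toy data are hypothetical, and this is not the authors' implementation.

```python
import math

def site_step(meth, times):
    """Given ages, fit rate and starting level of each site by least squares."""
    n = len(times)
    t_mean = sum(times) / n
    t_var = sum((t - t_mean) ** 2 for t in times)
    rates, starts = [], []
    for row in meth:  # one row of methylation values per site
        m_mean = sum(row) / n
        cov = sum((t - t_mean) * (m - m_mean) for t, m in zip(times, row))
        r = cov / t_var
        rates.append(r)
        starts.append(m_mean - r * t_mean)
    return rates, starts

def time_step(meth, rates, starts):
    """Given site parameters, each age has a closed-form least-squares value."""
    denom = sum(r * r for r in rates)
    return [
        sum(r * (row[j] - s) for r, s, row in zip(rates, starts, meth)) / denom
        for j in range(len(meth[0]))
    ]

def rss(meth, rates, starts, times):
    """Residual sum of squares of the linear model."""
    return sum(
        (row[j] - s - r * t) ** 2
        for r, s, row in zip(rates, starts, meth)
        for j, t in enumerate(times)
    )

def fit_pacemaker(meth, chron_ages, n_iter=20):
    """Alternate site and time steps, starting from chronological ages."""
    times = list(chron_ages)
    history = []
    for _ in range(n_iter):
        rates, starts = site_step(meth, times)
        history.append(rss(meth, rates, starts, times))
        times = time_step(meth, rates, starts)
    return rates, starts, times, history

# Toy data: methylation changes with the square root of chronological age.
chron_ages = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
true_rates = [0.02, -0.01, 0.03]
true_starts = [0.10, 0.90, 0.20]
meth = [[s + r * math.sqrt(a) for a in chron_ages]
        for r, s in zip(true_rates, true_starts)]
rates, starts, epi_ages, history = fit_pacemaker(meth, chron_ages)
print(history[0], history[-1])
```

Each step minimizes the same objective over one block of variables, so the residual sum of squares cannot increase. The estimated epigenetic ages are identifiable only up to a linear rescaling, so they should be read as relative rather than absolute ages.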
Affiliation(s)
- Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, 3498838, Israel
- Matteo Pellegrini
- Department of Molecular, Cell & Developmental Biology, University of California, Los Angeles, CA 90095, USA
8
Abstract
Kimura's neutral theory argued that positive selection was not responsible for an appreciable fraction of molecular substitutions. Correspondingly, quantitative analysis reveals that the vast majority of substitutions in cancer genomes are not detectably under selection. Insights from the somatic evolution of cancer reveal that beneficial substitutions in cancer constitute a small but important fraction of the molecular variants. The community studying the molecular evolution of cancer will benefit by incorporating the neutral theory of molecular evolution into its understanding and analysis of cancer evolution, and by accepting the use of tractable, predictive models, even when there is some evidence that they are not perfect.
Affiliation(s)
- Jeffrey P Townsend
- Department of Biostatistics, Yale University, New Haven, CT
- Program in Computational Biology and Bioinformatics
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT
9
Caccone A, Powell JR. DNA DIVERGENCE AMONG HOMINOIDS. Evolution 2017; 43:925-942. [PMID: 28564151 DOI: 10.1111/j.1558-5646.1989.tb02540.x] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Received: 11/08/1988] [Accepted: 03/05/1989] [Indexed: 10/19/2022]
Abstract
We have determined the degree of single-copy DNA divergence among the extant members of the Hominoidea employing the technique of DNA-DNA hybridization. The species studied include humans, two species of chimpanzees, gorillas, two subspecies of orangutans, and two species of gibbons; as an outgroup we have used a member of the Old World monkeys (Cercopithecidae), the baboon. Our methods are different from those previously used and allow us to control for two factors other than base-pair mismatch that can affect the thermal stability of DNA duplexes: the base composition and duplex length. In addition, we have studied more than one individual for most species and thus are able to assess the effect of intraspecific variation on phylogenetic conclusions. The results indicate that the closest extant relatives of humans are the chimpanzees. Gorillas are the next closest, followed by orangutans and gibbons. This result is strongly supported statistically, as there is virtually no overlap in measurements between different taxa. Our conclusions are in agreement with a growing amount of molecular evidence supporting this pattern of relatedness. The data behave as a reasonably good molecular clock, and we do not see an indication of slowdown in molecular evolution in the clade containing humans and African apes, contrary to what has been documented for protein-coding regions. Because of the clocklike nature of the results, we have estimated that the divergence of humans and chimpanzees occurred about 6-8 million years ago. Results from orangutans indicate that the Borneo and Sumatra populations are genetically distinct, about as different as the named species of chimpanzees.
Affiliation(s)
- Adalgisa Caccone
- Department of Biology, Yale University, P.O. Box 6666, New Haven, CT, 06511
- Jeffrey R Powell
- Department of Biology, Yale University, P.O. Box 6666, New Haven, CT, 06511
10
Chao L, Carr DE. THE MOLECULAR CLOCK AND THE RELATIONSHIP BETWEEN POPULATION SIZE AND GENERATION TIME. Evolution 2017; 47:688-690. [PMID: 28568720 DOI: 10.1111/j.1558-5646.1993.tb02124.x] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 01/21/1992] [Accepted: 08/14/1992] [Indexed: 12/01/2022]
Affiliation(s)
- Lin Chao
- Department of Zoology, MD, 20742, USA
- David E Carr
- Department of Botany, University of Maryland, College Park, MD, 20742, USA
11
Snir S, vonHoldt BM, Pellegrini M. A Statistical Framework to Identify Deviation from Time Linearity in Epigenetic Aging. PLoS Comput Biol 2016; 12:e1005183. [PMID: 27835646 PMCID: PMC5106012 DOI: 10.1371/journal.pcbi.1005183] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 05/19/2016] [Accepted: 10/05/2016] [Indexed: 01/09/2023] Open
Abstract
In multiple studies DNA methylation has proven to be an accurate biomarker of age. To develop these biomarkers, the methylation of multiple CpG sites is typically linearly combined to predict chronological age. By contrast, in this study we apply the Universal PaceMaker (UPM) model to investigate changes in DNA methylation during aging. The UPM was initially developed to study rate acceleration/deceleration in sequence evolution. Rather than identifying which linear combinations of sites predict age, the UPM models the rates of change of multiple CpG sites, as well as their starting methylation levels, and estimates the age of each individual to optimize the model fit. We refer to the estimated age as the “epigenetic age”, which is in contrast to the known chronological age of each individual. We construct a statistical framework and devise an algorithm to determine whether a genomic pacemaker is in effect (i.e., rates of change vary with age). The decision is made by comparing two competing likelihood-based models, the molecular clock (MC) and UPM. For the molecular clock model, we use the known chronological age of each individual and fit the methylation rates at multiple sites, and express the problem as a linear least squares problem and solve it in polynomial time. For the UPM case, the search space is larger as we are fitting both the epigenetic age of each individual as well as the rates for each site, yet we succeed in reducing the problem to the space of individuals and polynomial in the more significant space, the methylated sites. We first tested our algorithm on simulated data to elucidate the factors affecting the identification of the pacemaker model. We find that, provided with enough data, our algorithm is capable of identifying a pacemaker even when a weak signal is present in the data. Based on these results, we applied our method to DNA methylation data from human blood from individuals of various ages.
Although the improvement in variance across sites between the UPM and MC was small, the results suggest that the existence of a pacemaker is highly significant. The PaceMaker results also suggest a decay in the rate of change in DNA methylation with age. DNA methylation is an important component of the epigenetic code that defines and maintains the state of cells. Recently, it has been found that certain sites in the genome undergo methylation changes at different rates during aging. The seminal work of Steve Horvath found that the methylation of a couple hundred CpG sites could be linearly combined to accurately predict the age of an individual in a number of tissues. Such a pattern resembles the Molecular Clock (MC) concept prevailing in molecular evolution, which suggests that there are sites in the genome that change linearly with age. In this work, we adapt the Universal PaceMaker (UPM) model to the setting of DNA methylation changes during aging. UPM relaxes the rate constancy of MC and was found to provide a better statistical explanation for genome evolution across the entire tree of life. This adaptation requires the solution of a complex optimization problem. Nevertheless, in a series of observations we show that the problem can be solved efficiently under the MC model and slightly less efficiently under the UPM model. This allows us to solve problems of non-trivial size. We chose as a proof of concept to analyze DNA methylation data collected from the blood of humans of different ages. Our results show that, similarly to genome evolution, the UPM provided an improvement of about 2% in the fit to the data. The statistical significance of this improvement is very high. Although tested on a small data set, this improvement demonstrates that the UPM more accurately captures age related DNA methylation changes than the MC model.
Affiliation(s)
- Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California, United States of America
- Bridgett M. vonHoldt
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
- Matteo Pellegrini
- Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, Los Angeles, California, United States of America
12
Tang HB, Ouyang K, Rao GB, Ma L, Zhong H, Bai A, Qin S, Chen F, Lin J, Cao Y, Liao YJ, Zhang J, Wu J. Characterization of Complete Genome Sequences of a Porcine Endogenous Retrovirus Isolated From China Bama Minipig Reveals an Evolutionary Time Earlier Than That of Isolates From European Minipigs. Transplant Proc 2016; 48:222-8. [PMID: 26915872 DOI: 10.1016/j.transproceed.2015.12.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/23/2015] [Accepted: 12/10/2015] [Indexed: 10/22/2022]
Abstract
BACKGROUND A porcine endogenous retrovirus (PERV) isolate, PERV-A-BM, was isolated from a Guangxi Bama minipig in China. METHODS To understand its genetic variation and evolution, the complete PERV-A-BM genome sequences were determined and compared with isolates from different Sus scrofa breeds and porcine cell lines. A total of 69 nucleotide substitutions were found in the full-length genome, including 26 non-synonymous mutations. RESULTS Phylogenetic trees based on the complete genome sequence as well as the gag, pol, and env gene sequences from 21 PERV isolates demonstrated that the PERV-A-BM was closely related to the EF133960 isolate from Chinese Wuzhishan miniature pigs inbred in Hainan, China, and distantly related to strains isolated from European-born pigs. CONCLUSIONS Estimating the age of the PERV-A-BM provirus integrated into the host genome reveals that PERV-A-BM is at least 8.3 × 10^6 years old, an evolutionary time earlier than that of isolates from European-born pigs.
Affiliation(s)
- H-B Tang
- Guangxi Veterinary Research Institute, Nanning, Guangxi, China
- K Ouyang
- Guangxi Veterinary Research Institute, Nanning, Guangxi, China
- G-B Rao
- Guangxi Veterinary Research Institute, Nanning, Guangxi, China
- L Ma
- Guangxi Veterinary Research Institute, Nanning, Guangxi, China
- H Zhong
- School of Marine Sciences and Biotechnology, Guangxi University for Nationalities, Nanning, China
- A Bai
- Guangxi Veterinary Research Institute, Nanning, Guangxi, China
- S Qin
- Guangxi Veterinary Research Institute, Nanning, Guangxi, China
- F Chen
- Guangxi Veterinary Research Institute, Nanning, Guangxi, China
- J Lin
- Guangxi Veterinary Research Institute, Nanning, Guangxi, China
- Y Cao
- Guangxi Veterinary Research Institute, Nanning, Guangxi, China
- Y-J Liao
- School of Marine Sciences and Biotechnology, Guangxi University for Nationalities, Nanning, China
- J Zhang
- Laboratory for Viral Safety of National Centre of Biomedical Analysis, Institute of Transfusion Medicine, The Academy of Military Medical Sciences, Beijing, China
- J Wu
- Guangxi Veterinary Research Institute, Nanning, Guangxi, China.
13
O'Malley MA. Histories of molecules: Reconciling the past. STUDIES IN HISTORY AND PHILOSOPHY OF SCIENCE 2016; 55:69-83. [PMID: 26774071 DOI: 10.1016/j.shpsa.2015.09.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/14/2015] [Revised: 09/07/2015] [Accepted: 09/08/2015] [Indexed: 06/05/2023]
Abstract
Molecular data and methods have become centrally important to evolutionary analysis, largely because they have enabled global phylogenetic reconstructions of the relationships between organisms in the tree of life. Often, however, molecular stories conflict dramatically with morphology-based histories of lineages. The evolutionary origin of animal groups provides one such case. In other instances, different molecular analyses have so far proved irreconcilable. The ancient and major divergence of eukaryotes from prokaryotic ancestors is an example of this sort of problem. Efforts to overcome these conflicts highlight the role models play in phylogenetic reconstruction. One crucial model is the molecular clock; another is that of 'simple-to-complex' modification. I will examine animal and eukaryote evolution against a backdrop of increasing methodological sophistication in molecular phylogeny, and conclude with some reflections on the nature of historical science in the molecular era of phylogeny.
14
15
Snir S. On the number of genomic pacemakers: a geometric approach. Algorithms Mol Biol 2014; 9:26. [PMID: 25648755 PMCID: PMC4301663 DOI: 10.1186/s13015-014-0026-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 10/21/2014] [Accepted: 11/11/2014] [Indexed: 11/13/2022] Open
Abstract
The universal pacemaker (UPM) model extends the classical molecular clock (MC) model by allowing each gene, in addition to its individual intrinsic rate as in the MC, to accelerate or decelerate according to the universal pacemaker. Under UPM, the relative evolutionary rates of all genes remain nearly constant whereas the absolute rates can change arbitrarily. It was shown on several taxa groups spanning the entire tree of life that the UPM model describes the evolutionary process better than the MC model. In this work we provide a natural generalization to the UPM model that we denote multiple pacemakers (MPM). Under the MPM model every gene is still affected by a single pacemaker; however, the number of pacemakers is not confined to one. Such a model induces a partition over the gene set where all the genes in one part are affected by the same pacemaker, and the task is to identify the pacemaker partition, or in other words, to find for each gene its associated pacemaker. We devise a novel heuristic procedure, relying on statistical and geometrical tools, to solve the problem and demonstrate by simulation that this approach can cope satisfactorily with considerable noise and realistic problem sizes. We applied this procedure to a set of over 2000 genes in 100 prokaryotes and demonstrated the significant existence of two pacemakers.
16
Arcila D, Alexander Pyron R, Tyler JC, Ortí G, Betancur-R R. An evaluation of fossil tip-dating versus node-age calibrations in tetraodontiform fishes (Teleostei: Percomorphaceae). Mol Phylogenet Evol 2014; 82 Pt A:131-45. [PMID: 25462998 DOI: 10.1016/j.ympev.2014.10.011] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Received: 08/25/2014] [Accepted: 10/14/2014] [Indexed: 10/24/2022]
Abstract
Time-calibrated phylogenies based on molecular data provide a framework for comparative studies. Calibration methods to combine fossil information with molecular phylogenies are, however, under active development, often generating disagreement about the best way to incorporate paleontological data into these analyses. This study provides an empirical comparison of the most widely used approach based on node-dating priors for relaxed clocks implemented in the programs BEAST and MrBayes, with two recently proposed improvements: one using a new fossilized birth-death process model for node dating (implemented in the program DPPDiv), and the other using a total-evidence or tip-dating method (implemented in MrBayes and BEAST). These methods are applied herein to tetraodontiform fishes, a diverse group of living and extinct taxa that features one of the most extensive fossil records among teleosts. Previous estimates of time-calibrated phylogenies of tetraodontiforms using node-dating methods reported disparate estimates for their age of origin, ranging from the late Jurassic to the early Paleocene (ca. 150-59Ma). We analyzed a comprehensive dataset with 16 loci and 210 morphological characters, including 131 taxa (95 extant and 36 fossil species) representing all families of fossil and extant tetraodontiforms, under different molecular clock calibration approaches. Results from node-dating methods produced consistently younger ages than the tip-dating approaches. The older ages inferred by tip dating imply an unlikely early-late Jurassic (ca. 185-119Ma) origin for this order and the existence of extended ghost lineages in their fossil record. Node-based methods, by contrast, produce time estimates that are more consistent with the stratigraphic record, suggesting a late Cretaceous (ca. 86-96Ma) origin. 
We show that the precision of clade age estimates using tip dating increases with the number of fossils analyzed and with the proximity of fossil taxa to the node under assessment. This study suggests that current implementations of tip dating may overestimate ages of divergence in calibrated phylogenies. It also provides a comprehensive phylogenetic framework for tetraodontiform systematics and future comparative studies.
Affiliation(s)
- Dahiana Arcila
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, DC 20052, United States; Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, P.O. Box 37012, MRC 159, Washington, DC 20013, United States.
- R Alexander Pyron
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, DC 20052, United States
- James C Tyler
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, P.O. Box 37012, MRC 159, Washington, DC 20013, United States
- Guillermo Ortí
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, DC 20052, United States
- Ricardo Betancur-R
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, P.O. Box 37012, MRC 159, Washington, DC 20013, United States; Department of Biology, University of Puerto Rico - Río Piedras, P.O. Box 23360, San Juan 00931, Puerto Rico
17
Molnár J, Póti Á, Pipek O, Krzystanek M, Kanu N, Swanton C, Tusnády GE, Szallasi Z, Csabai I, Szüts D. The genome of the chicken DT40 bursal lymphoma cell line. G3 (Bethesda) 2014; 4:2231-40. [PMID: 25227228 PMCID: PMC4232548 DOI: 10.1534/g3.114.013482] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 07/30/2014] [Accepted: 09/08/2014] [Indexed: 01/23/2023]
Abstract
The chicken DT40 cell line is a widely used model system in the study of multiple cellular processes due to the efficiency of homologous gene targeting. The cell line was derived from a bursal lymphoma induced by avian leukosis virus infection. In this study we characterized the genome of the cell line using whole genome shotgun sequencing and single nucleotide polymorphism array hybridization. The results indicate that wild-type DT40 has a relatively normal karyotype, except for whole chromosome copy number gains, and no karyotype variability within stocks. In a comparison to two domestic chicken genomes and the Gallus gallus reference genome, we found no unique mutational processes shaping the DT40 genome except for a mild increase in insertion and deletion events, particularly deletions at tandem repeats. We mapped coding sequence mutations that are unique to the DT40 genome; mutations inactivating the PIK3R1 and ATRX genes likely contributed to the oncogenic transformation. In addition to a known avian leukosis virus integration in the MYC gene, we detected further integration sites that are likely to de-regulate gene expression. The new findings support the hypothesis that DT40 is a typical transformed cell line with a relatively intact genome; therefore, it is well-suited to the role of a model system for DNA repair and related processes. The sequence data generated by this study, including a searchable de novo genome assembly and annotated lists of mutated genes, will support future research using this cell line.
Affiliation(s)
- János Molnár
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1117 Budapest, Hungary
- Ádám Póti
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1117 Budapest, Hungary
- Orsolya Pipek
- Department of Physics of Complex Systems, Eötvös Loránd University, H-1117 Budapest, Hungary
- Marcin Krzystanek
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Lyngby, Denmark
- Nnennaya Kanu
- Cancer Research UK London Research Institute, London, WCA2 3PX, United Kingdom
- Charles Swanton
- Cancer Research UK London Research Institute, London, WCA2 3PX, United Kingdom
- Gábor E Tusnády
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1117 Budapest, Hungary
- Zoltan Szallasi
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Lyngby, Denmark; Children's Hospital Informatics Program at the Harvard-Massachusetts Institutes of Technology Division of Health Sciences and Technology (CHIP@HST), Harvard Medical School, Boston, MA 02115
- István Csabai
- Department of Physics of Complex Systems, Eötvös Loránd University, H-1117 Budapest, Hungary
- Dávid Szüts
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1117 Budapest, Hungary
18
Snir S, Wolf YI, Koonin EV. Universal pacemaker of genome evolution in animals and fungi and variation of evolutionary rates in diverse organisms. Genome Biol Evol 2014; 6:1268-78. [PMID: 24812293 PMCID: PMC4079209 DOI: 10.1093/gbe/evu091] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 01/26/2023] Open
Abstract
Gene evolution is traditionally considered within the framework of the molecular clock (MC) model whereby each gene is characterized by an approximately constant rate of evolution. Recent comparative analysis of numerous phylogenies of prokaryotic genes has shown that a different model of evolution, denoted the Universal PaceMaker (UPM), which postulates conservation of relative, rather than absolute evolutionary rates, yields a better fit to the phylogenetic data. Here, we show that the UPM model is a better fit than the MC for genome wide sets of phylogenetic trees from six species of Drosophila and nine species of yeast, with extremely high statistical significance. Unlike the prokaryotic phylogenies that include distant organisms and multiple horizontal gene transfers, these are simple data sets that cover groups of closely related organisms and consist of gene trees with the same topology as the species tree. The results indicate that both lineage-specific and gene-specific rates are important in genome evolution but the lineage-specific contribution is greater. Similar to the MC, the gene evolution rates under the UPM are strongly overdispersed, approximately 2-fold compared with the expectation from sampling error alone. However, we show that neither Drosophila nor yeast genes form distinct clusters in the tree space. Thus, the gene-specific deviations from the UPM, although substantial, are uncorrelated and most likely depend on selective factors that are largely unique to individual genes. Thus, the UPM appears to be a key feature of genome evolution across the history of cellular life.
Affiliation(s)
- Sagi Snir
- Department of Evolutionary and Environmental Biology and The Institute of Evolution, University of Haifa, Israel
- Yuri I Wolf
- National Center for Biotechnology Information, NLM, National Institutes of Health, Bethesda, MD
- Eugene V Koonin
- National Center for Biotechnology Information, NLM, National Institutes of Health, Bethesda, MD
19
Ragan MA, Bernard G, Chan CX. Molecular phylogenetics before sequences: oligonucleotide catalogs as k-mer spectra. RNA Biol 2014; 11:176-85. [PMID: 24572375 PMCID: PMC4008546 DOI: 10.4161/rna.27505] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 12/18/2022] Open
Abstract
From 1971 to 1985, Carl Woese and colleagues generated oligonucleotide catalogs of 16S/18S rRNAs from more than 400 organisms. Using these incomplete and imperfect data, Carl and his colleagues developed unprecedented insights into the structure, function, and evolution of the large RNA components of the translational apparatus. They recognized a third domain of life, revealed the phylogenetic backbone of bacteria (and its limitations), delineated taxa, and explored the tempo and mode of microbial evolution. For these discoveries to have stood the test of time, oligonucleotide catalogs must carry significant phylogenetic signal; they thus bear re-examination in view of the current interest in alignment-free phylogenetics based on k-mers. Here we consider the aims, successes, and limitations of this early phase of molecular phylogenetics. We computationally generate oligonucleotide sets (e-catalogs) from 16S/18S rRNA sequences, calculate pairwise distances between them based on D2 statistics, compute distance trees, and compare their performance against alignment-based and k-mer trees. Although the catalogs themselves were superseded by full-length sequences, this stage in the development of computational molecular biology remains instructive for us today.
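The D2 statistic named in this abstract is commonly defined as the inner product of the k-mer count vectors of two sequences. The following is a minimal Python sketch under that definition only; the function names `kmer_counts` and `d2` are hypothetical, and the abstract does not state how the authors turned D2 scores into distances, so that step is not shown.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (word of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a: str, seq_b: str, k: int) -> int:
    """D2 statistic: sum, over all k-mers, of count in seq_a times count in seq_b."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # A Counter returns 0 for k-mers it has not seen, so unshared words add nothing.
    return sum(n * counts_b[word] for word, n in counts_a.items())
```

A larger D2 means more shared words; any normalization or distance transform applied on top of it would be an additional assumption.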
Affiliation(s)
- Mark A Ragan
- Institute for Molecular Bioscience, and ARC Centre of Excellence in Bioinformatics; The University of Queensland; Brisbane, QLD, Australia
- Guillaume Bernard
- Institute for Molecular Bioscience, and ARC Centre of Excellence in Bioinformatics; The University of Queensland; Brisbane, QLD, Australia
- Cheong Xin Chan
- Institute for Molecular Bioscience, and ARC Centre of Excellence in Bioinformatics; The University of Queensland; Brisbane, QLD, Australia
20
Wolf YI, Snir S, Koonin EV. Stability along with extreme variability in core genome evolution. Genome Biol Evol 2013; 5:1393-402. [PMID: 23821522 PMCID: PMC3730350 DOI: 10.1093/gbe/evt098] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 01/27/2023] Open
Abstract
The shape of the distribution of evolutionary distances between orthologous genes in pairs of closely related genomes is universal throughout the entire range of cellular life forms. The near invariance of this distribution across billions of years of evolution can be accounted for by the Universal Pace Maker (UPM) model of genome evolution that yields a significantly better fit to the phylogenetic data than the Molecular Clock (MC) model. Unlike the MC, the UPM model does not assume constant gene-specific evolutionary rates but rather postulates that, in each evolving lineage, the evolutionary rates of all genes change (approximately) in unison although the pacemakers of different lineages are not necessarily synchronized. Here, we dissect the nearly constant evolutionary rate distribution by comparing the genome-wide relative rates of evolution of individual genes in pairs or triplets of closely related genomes from diverse bacterial and archaeal taxa. We show that, although the gene-specific relative rate is an important feature of genome evolution that explains more than half of the variance of the evolutionary distances, the ranges of relative rate variability are extremely broad even for universal genes. Because of this high variance, the gene-specific rate is a poor predictor of the conservation rank for any gene in any particular lineage.
Affiliation(s)
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
21
Snir S, Wolf YI, Koonin EV. Universal pacemaker of genome evolution. PLoS Comput Biol 2012; 8:e1002785. [PMID: 23209393 PMCID: PMC3510094 DOI: 10.1371/journal.pcbi.1002785] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 06/12/2012] [Accepted: 10/02/2012] [Indexed: 11/18/2022] Open
Abstract
A fundamental observation of comparative genomics is that the distribution of evolution rates across the complete sets of orthologous genes in pairs of related genomes remains virtually unchanged throughout the evolution of life, from bacteria to mammals. The most straightforward explanation for the conservation of this distribution appears to be that the relative evolution rates of all genes remain nearly constant, or in other words, that evolutionary rates of different genes are strongly correlated within each evolving genome. This correlation could be explained by a model that we denoted Universal PaceMaker (UPM) of genome evolution. The UPM model posits that the rate of evolution changes synchronously across genome-wide sets of genes in all evolving lineages. Alternatively, however, the correlation between the evolutionary rates of genes could be a simple consequence of molecular clock (MC). We sought to differentiate between the MC and UPM models by fitting thousands of phylogenetic trees for bacterial and archaeal genes to supertrees that reflect the dominant trend of vertical descent in the evolution of archaea and bacteria and that were constrained according to the two models. The goodness of fit for the UPM model was better than the fit for the MC model, with overwhelming statistical significance, although similarly to the MC, the UPM is strongly overdispersed. Thus, the results of this analysis reveal a universal, genome-wide pacemaker of evolution that could have been in operation throughout the history of life.
Affiliation(s)
- Sagi Snir
- Department of Evolutionary and Environmental Biology and The Institute of Evolution, University of Haifa Mount Carmel, Haifa, Israel
22
Distinct co-evolution patterns of genes associated to DNA polymerase III DnaE and PolC. BMC Genomics 2012; 13:69. [PMID: 22333191 PMCID: PMC3814617 DOI: 10.1186/1471-2164-13-69] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 06/20/2011] [Accepted: 02/14/2012] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND: Bacterial genomes displaying a strong bias between the leading and the lagging strand of DNA replication encode two DNA polymerases III, DnaE and PolC, rather than a single one. Replication is a highly unsymmetrical process, and the presence of two polymerases is therefore not unexpected. Using comparative genomics, we explored whether other processes have evolved in parallel with each polymerase. RESULTS: Extending previous in silico heuristics for the analysis of gene co-evolution, we analyzed the function of genes clustering with dnaE and polC. Clusters were highly informative. DnaE co-evolves with the ribosome, the transcription machinery, the core of intermediary metabolism enzymes. It is also connected to the energy-saving enzyme necessary for RNA degradation, polynucleotide phosphorylase. Most of the proteins of this co-evolving set belong to the persistent set in bacterial proteomes, that is fairly ubiquitously distributed. In contrast, PolC co-evolves with RNA degradation enzymes that are present only in the A+T-rich Firmicutes clade, suggesting at least two origins for the degradosome. CONCLUSION: DNA replication involves two machineries, DnaE and PolC. DnaE co-evolves with the core functions of bacterial life. In contrast PolC co-evolves with a set of RNA degradation enzymes that does not derive from the degradosome identified in gamma-Proteobacteria. This suggests that at least two independent RNA degradation pathways existed in the progenote community at the end of the RNA genome world.
23
Hagen JB. Waiting for sequences: Morris Goodman, immunodiffusion experiments, and the origins of molecular anthropology. J Hist Biol 2010; 43:697-725. [PMID: 20665076 DOI: 10.1007/s10739-009-9219-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/29/2023]
Abstract
During the early 1960s, Morris Goodman used a variety of immunological tests to demonstrate the very close genetic relationships among humans, chimpanzees, and gorillas. Molecular anthropologists often point to this early research as a critical step in establishing their new specialty. Based on his molecular results, Goodman challenged the widely accepted taxonomic classification that separated humans from chimpanzees and gorillas in two separate families. His claim that chimpanzees and gorillas should join humans in family Hominidae sparked a well-known conflict with George Gaylord Simpson, Ernst Mayr, and other prominent evolutionary biologists. Less well known, but equally significant, were a series of disagreements between Goodman and other prominent molecular evolutionists concerning both methodological and theoretical issues. These included qualitative versus quantitative data, the role of natural selection, rates of evolution, and the reality of molecular clocks. These controversies continued throughout Goodman's career, even as he moved from immunological techniques to protein and DNA sequence analysis. This episode highlights the diversity of methods used by molecular evolutionists and the conflicting conclusions drawn from the data that these methods generated.
Affiliation(s)
- Joel B Hagen
- Department of Biology, Radford University, Radford, VA, 24142, USA.
24
Sommer M. History in the gene: negotiations between molecular and organismal anthropology. J Hist Biol 2008; 41:473-528. [PMID: 19244721 DOI: 10.1007/s10739-008-9150-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 05/27/2023]
Abstract
In the advertising discourse of human genetic database projects, of genetic ancestry tracing companies, and in popular books on anthropological genetics, what I refer to as the anthropological gene and genome appear as documents of human history, by far surpassing the written record and oral history in scope and accuracy as archives of our past. How did macromolecules become "documents of human evolutionary history"? Historically, molecular anthropology, a term introduced by Emile Zuckerkandl in 1962 to characterize the study of primate phylogeny and human evolution on the molecular level, asserted its claim to the privilege of interpretation regarding hominoid, hominid, and human phylogeny and evolution vis-à-vis other historical sciences such as evolutionary biology, physical anthropology, and paleoanthropology. This process will be discussed on the basis of three key conferences on primate classification and evolution that brought together exponents of the respective fields and that were held at approximately ten-year intervals between the early 1960s and the 1980s. I show how the anthropological gene and genome gained their status as the most fundamental, clean, and direct records of historical information, and how the prioritizing of these epistemic objects was part of a complex involving the objectivity of numbers, logic, and mathematics, the objectivity of machines and instruments, and the objectivity seen to reside in the epistemic objects themselves.
25
Affiliation(s)
- Naoyuki Takahata
- Graduate University for Advanced Studies (Sokendai), Hayama, Kanagawa 240-0193, Japan.
26
van Tuinen M, Ramakrishnan U, Hadly EA. Studying the effect of environmental change on biotic evolution: past genetic contributions, current work and future directions. Philos Trans A Math Phys Eng Sci 2004; 362:2795-2820. [PMID: 15539371 DOI: 10.1098/rsta.2004.1465] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 05/24/2023]
Abstract
Evolutionary geneticists currently face a major scientific opportunity when integrating across the rapidly increasing amount of genetic data and existing biological scenarios based on ecology, fossils or climate models. Although genetic data acquisition and analysis have improved tremendously, several limitations remain. Here, we discuss the feedback between history and genetic variation in the face of environmental change with increasing taxonomic and temporal scale, as well as the major challenges that lie ahead. In particular, we focus on recent developments in two promising genetic methods, those of 'phylochronology' and 'molecular clocks'. With the advent of ancient DNA techniques, we can now directly sample the recent past. We illustrate this amazing and largely untapped utility of ancient DNA extracted from accurately dated localities with documented environmental changes. Innovative statistical analyses of these genetic data expose the direct effect of recent environmental change on genetic endurance, or maintenance of genetic variation. The 'molecular clock' (assumption of a linear relationship between genetic distance and evolutionary time) has been used extensively in phylogenetic studies to infer time and correlation between lineage divergence time and concurrent environmental change. Several studies at both population and species scale support a persuasive relationship between particular perturbation events and time of biotic divergence. However, we are still a way from gleaning an overall pattern to this relationship, which is a prerequisite to ultimately understanding the mechanisms by which past environments have shaped the evolutionary trajectory. Current obstacles include as-yet undecided reasons behind the frequent discrepancy between molecular and fossil time estimates, and the frequent lack of consideration of extensive confidence intervals around time estimates. 
We suggest that use and interpretation of both ancient DNA and molecular clocks is most effective when results are synthesized with palaeontological (fossil) and ecological (life history) information.
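The abstract above defines the molecular clock as a linear relationship between genetic distance and evolutionary time. A minimal Python sketch of that assumption follows: a substitution rate is calibrated from one pair of lineages with a known divergence time and then applied to another pair. The function names and the numbers in the usage note are hypothetical illustrations, not values from the paper, and the factor of 2 assumes that distance accumulates along both lineages since their common ancestor.

```python
def calibrate_rate(distance: float, known_time: float) -> float:
    """Substitutions per site per year from a pair with a known divergence time."""
    if known_time <= 0:
        raise ValueError("known_time must be positive")
    # Distance accrues on both branches, hence the factor of 2.
    return distance / (2.0 * known_time)


def divergence_time(distance: float, rate: float) -> float:
    """Strict-clock divergence time: t = d / (2 * r)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)
```

For instance, a calibration pair with distance 0.02 and an assumed split 10 million years ago gives a rate of 1e-9 per site per year; a second pair with distance 0.01 would then be dated to about 5 million years. The sketch returns a point estimate only and ignores the confidence intervals the authors stress.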
Affiliation(s)
- Marcel van Tuinen
- Department of Biological Sciences, Gilbert Hall, Stanford University, Stanford, CA 94305-5020, USA.
27
Tönjes RR, Niebert M. Relative age of proviral porcine endogenous retrovirus sequences in Sus scrofa based on the molecular clock hypothesis. J Virol 2003; 77:12363-8. [PMID: 14581574 PMCID: PMC254287 DOI: 10.1128/jvi.77.22.12363-12368.2003] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.9] [Received: 06/17/2003] [Accepted: 08/10/2003] [Indexed: 11/20/2022] Open
Abstract
Porcine endogenous retroviruses (PERV) are discussed as putative infectious agents in xenotransplantation. PERV classes A, B, and C harbor different envelope proteins. Two different types of long terminal repeat (LTR) structures exist, of which both are present only in PERV-A. One type of LTR contains a distinct repeat structure in U3, while the other is repeatless, conferring a lower level of transcriptional activity. Since the different LTR structures are distributed unequally among the proviruses and, apparently, PERV is the only virus harboring two different LTR structures, we were interested in determining which LTR is the ancestor. Replication-competent viruses can still be found today, suggesting an evolutionary recent origin. Our studies revealed that the age of PERV is at most 7.6 × 10^6 years, whereas the repeatless LTR type evolved approximately 3.4 × 10^6 years ago, being the phylogenetically younger structure. The age determined for PERV correlates with the time of separation between pigs (Suidae, Sus scrofa) and their closest relatives, American-born peccaries (Tayassuidae, Pecari tajacu), 7.4 × 10^6 years ago.
28
Abstract
The further evolution of informational molecular sequences should depend on the number of viable alternatives possible for the sequences as set by selection, the unrepaired mutation rate, and time. Most biomolecular clocks are based on Kimura's nearly neutral mutation random-drift hypothesis. This clock assumes that informational sequences are in equilibrium, i.e., the nucleotides mutate at a uniform rate and the number of nucleotides unconstrained by selection remains constant. Correcting for deviations from these assumptions should produce a more accurate clock. Informational molecules probably formed from polynucleotides having some other function such as nitrogen or nucleotide storage, thus being initially functionally unselected. At any time the rate of development of functionality in a protein may be expected to be proportional to the number of viable alternatives of sequence in its potentially interacting regions. Assuming the rate of unrepaired mutations is constant, these clocks should exponentially slow as they evolve, each with a different rate toward individual equilibria. Also if the degree of selection changes, its clock rate should change. For a more precise clock two approaches are suggested to estimate these time dependent changes in evolutionary rate. An improved clock could improve estimation of phylogeny and put a time scale on that phylogeny.
Affiliation(s)
- Kenneth W Foster
- Physics Department, Syracuse University, Syracuse, NY 13244-1130, USA.
29
Pazos F, Valencia A. Similarity of phylogenetic trees as indicator of protein-protein interaction. Protein Eng 2001; 14:609-14. [PMID: 11707606 DOI: 10.1093/protein/14.9.609] [Citation(s) in RCA: 303] [Impact Index Per Article: 13.2] [Indexed: 02/06/2023]
Abstract
Deciphering the network of protein interactions that underlies cellular operations has become one of the main tasks of proteomics and computational biology. Recently, a set of bioinformatics approaches has emerged for the prediction of possible interactions by combining sequence and genomic information. Even though the initial results are very promising, the current methods are still far from perfect. We propose here a new way of discovering possible protein-protein interactions based on the comparison of the evolutionary distances between the sequences of the associated protein families, an idea based on previous observations of correspondence between the phylogenetic trees of associated proteins in systems such as ligands and receptors. Here, we extend the approach to different test sets, including the statistical evaluation of their capacity to predict protein interactions. To demonstrate the possibilities of the system to perform large-scale predictions of interactions, we present the application to a collection of more than 67 000 pairs of E. coli proteins, of which 2742 are predicted to correspond to interacting proteins.
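One simple way to compare the evolutionary distances of two protein families, as this abstract describes, is to correlate their pairwise distance matrices over the same set of species. The Python sketch below uses a Pearson correlation of the upper triangles; this choice of measure, the function names, and the assumption that both matrices list species in the same order are illustrative and are not taken from the abstract.

```python
from math import sqrt


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("distances must not all be identical")
    return cov / sqrt(var_x * var_y)


def tree_similarity(dist_a, dist_b):
    """Correlation between two families' distance matrices (same species order)."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))
```

A value near 1 would indicate that the two families have diverged in step across species; the threshold at which a pair is called interacting is a separate decision that this sketch does not make.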
Affiliation(s)
- F Pazos
- Protein Design Group, CNB-CSIC, Cantoblanco, E-28049 Madrid, Spain
30
Abstract
Bioinformatics is often described as being in its infancy, but computers emerged as important tools in molecular biology during the early 1960s. A decade before DNA sequencing became feasible, computational biologists focused on the rapidly accumulating data from protein biochemistry. Without the benefits of super computers or computer networks, these scientists laid important conceptual and technical foundations for bioinformatics today.
Affiliation(s)
- J B Hagen
- Department of Biology, Radford University, Radford, Virginia 24142, USA.
31
Kumar S, Mitnik C, Valente G, Floyd-Smith G. Expansion and molecular evolution of the interferon-induced 2'-5' oligoadenylate synthetase gene family. Mol Biol Evol 2000; 17:738-50. [PMID: 10779534 DOI: 10.1093/oxfordjournals.molbev.a026352] [Citation(s) in RCA: 46] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022] Open
Abstract
The mammalian 2'-5' oligoadenylate synthetases (2'-5'OASs) are enzymes that are crucial in the interferon-induced antiviral response. They catalyze the polymerization of ATP into 2'-5'-linked oligoadenylates which activate a constitutively expressed latent endonuclease, RNaseL, to block viral replication at the level of mRNA degradation. A molecular evolutionary analysis of available OAS sequences suggests that the vertebrate genes are members of a multigene family with its roots in the early history of tetrapods. The modern mammalian 2'-5'OAS genes underwent successive gene duplication events resulting in three size classes of enzymes, containing one, two, or three homologous domains. Expansion of the OAS gene family occurred by whole-gene duplications to increase gene content and by domain couplings to produce the multidomain genes. Evolutionary analyses show that the 2'-5'OAS genes in rodents underwent gene duplications as recently as 11 MYA and predict the existence of additional undiscovered OAS genes in mammals.
Affiliation(s)
- S Kumar
- Department of Biology and Molecular and Cellular Biology Program, Arizona State University, Tempe 85287-1501, USA.
32
Abstract
A timescale is necessary for estimating rates of molecular and morphological change in organisms and for interpreting patterns of macroevolution and biogeography. Traditionally, these times have been obtained from the fossil record, where the earliest representatives of two lineages establish a minimum time of divergence of these lineages. The clock-like accumulation of sequence differences in some genes provides an alternative method by which the mean divergence time can be estimated. Estimates from single genes may have large statistical errors, but multiple genes can be studied to obtain a more reliable estimate of divergence time. However, until recently, the number of genes available for estimation of divergence time has been limited. Here we present divergence-time estimates for mammalian orders and major lineages of vertebrates, from an analysis of 658 nuclear genes. The molecular times agree with most early (Palaeozoic) and late (Cenozoic) fossil-based times, but indicate major gaps in the Mesozoic fossil record. At least five lineages of placental mammals arose more than 100 million years ago, and most of the modern orders seem to have diversified before the Cretaceous/Tertiary extinction of the dinosaurs.
Affiliation(s)
- S Kumar
- Department of Biology and Institute of Molecular Evolutionary Genetics, Pennsylvania State University, University Park 16802, USA
33
Kobayashi M, Satoh N. Early evolution of the Metazoa: an inference from the elongation factor-1alpha. Prog Mol Subcell Biol 1998; 19:177-85. [PMID: 15898192 DOI: 10.1007/978-3-642-48745-3_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2023]
Affiliation(s)
- M Kobayashi
- Department of Zoology, Graduate School of Science, Kyoto University, Kyoto 606-01, Japan
34
Abstract
The nature of weak selection differs between coding and non-coding regions. Coding regions contain genetic information, whereas most non-coding regions do not have any information. Genetic information may be regarded as interaction systems, and the NK model of Kauffman was analysed. This model assumes that each amino acid makes a fitness contribution that depends on the amino acid and on K other amino acids among the N that make the protein. Through simulations, it was found that there are numerous nearly-neutral mutations under this model. Therefore, evolution is rapid in small populations, and slow in large populations. The variance of the evolutionary rate is not quite as large as data indicate under the model, and additional factors, such as environmental change or population-size fluctuation, need to be considered. Weak selection at non-coding regions may come from chromosome organization, and may be regional in character, which differs from that at coding regions. The problem of genetic load is thought to disappear in these circumstances.
Affiliation(s)
- T Ohta
- Department of Population Genetics, National Institute of Genetics, Mishima, Shizuoka-ken, Japan.
|
35
|
Abstract
Through comparative studies of DNA sequences it has become possible to test the neutral and the selection theories of molecular evolution. The separate estimation of the numbers of synonymous and non-synonymous substitutions is one of the most powerful tools for detecting selection. The patterns on the average and variance of these two types of substitutions of mammalian genes turned out to be in accord with the slightly deleterious or the nearly neutral mutation theory for non-synonymous changes. Interactive systems at the amino acid level were suggested to be responsible for such nearly neutral or very weak selection. An attractive model is the NK model of Kauffman, which assumes that each amino acid makes a fitness contribution that depends upon the amino acid and upon K other amino acids among the N that make the protein. It is known that the fitness landscape is very rugged for K ≥ 2. Population genetic analysis of this model suggests that protein evolution obeys the nearly neutral theory and that random genetic drift is important. In other words, evolution becomes rapid in small populations because the proportion of near-neutrality increases among new mutations, and proteins as interactive systems evolve by shifting through random genetic drift on the multipeaked fitness landscape.
Affiliation(s)
- T Ohta
- Department of Population Genetics, National Institute of Genetics, Mishima, Japan
|
36
|
Evans DL, Mansel RE. Molecular evolution and secondary structural conservation in the B-cell lymphoma leukemia 2 (bcl-2) family of proto-oncogene products. J Mol Evol 1995; 41:775-83. [PMID: 8587122 DOI: 10.1007/bf00173157] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 01/31/2023]
Abstract
The nature of the bcl-2 family of proto-oncogenes was analyzed by sequence alignment, secondary structure prediction, and phylogenetic techniques. Phylogenies were inferred from both the nucleic acid and amino acid sequences of the human, murine, rat, and chicken sequences for BCL-2 and BCL-X, human MCL1, murine A1, the nematode Caenorhabditis elegans and Caenorhabditis briggsiae ced-9 proteins, and the sequences BHRF1 from Epstein-Barr and LMW5-HL from African swine fever viruses. Both sequence alignment and secondary structure prediction techniques supported the conservation of both the overall secondary structure and the carboxy-terminal transmembrane domain in all members of the family. All the treeing methods employed (distance matrix, maximum likelihood, and parsimony) supported a tree in which the proapoptotic proteins BCL-2 and BCL-X represent the most recent additions to the group. All the trees also indicated that the viral proteins BHRF1 and LMW5-HL arose from a common ancestor, an ancestor they shared in common with the pro-apoptotic control protein BAX, indicating that this function of BAX evolved only recently. The most ancient branches are represented by the nematode ced-9 protein and by the control genes MCL1 and A1, which in the treeing methods employed represent separate lineages within the most ancient grouping. These results demonstrate the evolution of a highly conserved family of developmental control genes from nematode to man: genes that encode proteins essential for normal development but which are highly conserved in terms of predicted structure and possible cellular localization. The evolutionary analysis also indicates that the family may be even larger than originally predicted and that other members are waiting to be discovered.
Affiliation(s)
- D L Evans
- Department of Surgery, University of Wales College of Medicine, Heath Park, Cardiff, United Kingdom
|
37
|
Abstract
A physical interpretation of the Topal-Fresco [Nature 263, 285 (1976)] model for spontaneous base substitutions suggests that hydrogen-bonded DNA protons satisfy the criteria for a classical noninteracting isolated system. Accessible states for duplex G-C protons include the keto-amino state and the six complementary enol-imine isomers. Hydrogen-bonded enol and imine protons occupy symmetric double-minima created by the two sets of indistinguishable electron lone pairs and a single proton belonging to each enol-imine end group. These protons will consequently participate in coupled quantum mechanical flip-flop, tunneling back and forth between symmetric energy wells. This results in a quantum mixing of proton energy states where the lowest energy state will be a linear combination of available G-C isomers. The resulting conclusion is that metastable keto-amino G-C protons will populate accessible enol-imine stationary states at rates governed by quantum laws of statistical equilibrium, consistent with achieving the lowest energy condition for duplex G-C protons. Enol-imine G-C stationary states are bound more tightly, of the order of 3 to 12 kcal/mol, which requires a modified mode of Topal-Fresco replication that will inhibit reequilibration of enol and imine G and C template isomers and, thus, promote the formation of complementary mispairs. The model is demonstrated on time-dependent base substitutions expressed by T4 phage DNA systems where data are consistent with model explanations, including the prediction that time-dependent evolutionary transversion sites will exhibit both G-C-to-T-A and G-C-to-C-G transversions at replication, due to proton flip-flop alteration of G template genetic specificity. The observation that A-T sites are resistant to time-dependent evolutionary base substitutions, expressed exclusively at G-C sites, allows codons to be classified as either evolutionary sensitive (16 codons) or evolutionary resistant (8 codons). 
These criteria provide possible explanations for expansion properties of the CGG fragile X sequences. Enol-imine G-C stationary states appear to have been misdiagnosed as deamination of cytosine and oxidation of guanine to 8-hydroxy-guanine.
Affiliation(s)
- W G Cooper
- International Physics Health & Energy, Inc., Houston, Texas 77030, USA
|
38
|
Abstract
Chimpanzee, tamarin, and marmoset interleukin-3 (IL-3) genes were cloned, sequenced, and expressed. Western blot analysis demonstrated that functional genes were isolated. IL-3 sequences were compared with those of mouse, rat, rhesus monkey, gibbon, and man. Multiple alignment of the IL-3 coding regions showed that only a few regions had been conserved during mammalian evolution, which are likely associated with functional domains of the IL-3 protein. Substitution rates for the various lineages were calculated and the numbers of synonymous and nonsynonymous substitutions were estimated separately. Distance matrices of the IL-3 coding regions were used to construct phylogenetic trees which revealed large differences in IL-3 evolution rate as well as a more rapid substitution rate for rodents and a rate slowdown during hominoid evolution. Extremes were rhesus monkey IL-3, which accumulated few synonymous substitutions, and gibbon IL-3, which had almost exclusively synonymous substitutions. In rhesus monkey IL-3, nonsynonymous substitutions outnumbered synonymous substitutions, which could not be readily explained by a random process of substitutions. We assume that during evolution of IL-3, the majority of the amino acid replacements and the impaired interspecies functional cross-reactivity originate from selection mechanisms with the most likely selective force being the structure of the heterodimeric IL-3 cell-surface receptor. Insight into IL-3 architecture and structural analysis of the IL-3 receptor are needed to analyze the unusually fast evolution of IL-3 in more detail.
Affiliation(s)
- H Burger
- Department of Medical Oncology, Dr. Daniel den Hoed Cancer Center/Dijkzigt, University Hospital Rotterdam, The Netherlands
|
39
|
Creti R, Ceccarelli E, Bocchetta M, Sanangelantoni AM, Tiboni O, Palm P, Cammarano P. Evolution of translational elongation factor (EF) sequences: reliability of global phylogenies inferred from EF-1 alpha(Tu) and EF-2(G) proteins. Proc Natl Acad Sci U S A 1994; 91:3255-9. [PMID: 8159735 PMCID: PMC43555 DOI: 10.1073/pnas.91.8.3255] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.1] [Indexed: 01/29/2023] Open
Abstract
The EF-2 coding genes of the Archaea Pyrococcus woesei and Desulfurococcus mobilis were cloned and sequenced. Global phylogenies were inferred by alternative tree-making methods from available EF-2(G) sequence data and contrasted with phylogenies constructed from the more conserved but shorter EF-1 alpha(Tu) sequences. Both the monophyly (sensu Hennig) of Archaea and their subdivision into the kingdoms Crenarchaeota and Euryarchaeota are consistently inferred by analysis of EF-2(G) sequences, usually at a high bootstrap confidence level. In contrast, EF-1 alpha(Tu) phylogenies tend to be inconsistent with one another and show low bootstrap confidence levels. While evolutionary distance and DNA maximum parsimony analyses of EF-1 alpha(Tu) sequences do show archaeal monophyly, protein parsimony and DNA maximum-likelihood analyses of these data do not. In no case, however, do any of the tree topologies inferred from EF-1 alpha(Tu) sequence analyses receive significant bootstrap support.
Affiliation(s)
- R Creti
- Istituto Pasteur-Fondazione Cenci-Bolognetti, Dipt. Biopatologia Umana, Università di Roma I, Policlinico Umberto I., Italy
|
40
|
Klenk HP, Zillig W. DNA-dependent RNA polymerase subunit B as a tool for phylogenetic reconstructions: branching topology of the archaeal domain. J Mol Evol 1994; 38:420-32. [PMID: 8007009 DOI: 10.1007/bf00163158] [Citation(s) in RCA: 46] [Impact Index Per Article: 1.5] [Indexed: 01/28/2023]
Abstract
The branching topology of the archaeal (archaebacterial) domain was inferred from sequence comparisons of the largest subunit (B) of DNA-dependent RNA polymerases (RNAP). Both the nucleic acid sequences of the genes coding for RNAP subunit B and the amino acid sequences of the derived gene products were used for phylogenetic reconstructions. Individual analysis of the three nucleotide positions of codons revealed significant inequalities with respect to guanosine and cytosine (GC) content and evolutionary rates. Only the nucleotides at the second codon positions were found to be unbiased by varied GC contents and sufficiently conserved for reliable phylogenetic reconstructions. A decision matrix was used for the combination of the results of distance matrix, maximum parsimony, and maximum likelihood methods. For this purpose the original results (sums of squares, steps, and logarithms of likelihoods) were transformed into comparable effective values and analyzed with methods known from the theory of statistical decisions. Phylogenetic invariants and statistical analysis with resampling techniques (bootstrap and jackknife) confirmed the preferred branching topology, which is significantly different from the topology known from phylogenetic trees based on 16S rRNA sequences. The preferred topology reconstructed by this analysis shows a common stem for the Methanococcales and Methanobacteriales and a separation of the thermophilic sulfur archaea from the methanogens and halophiles. The latter coincides with a unique phylogenetic location of a characteristic splitting event replacing the largest RNAP subunit of thermophilic sulfur archaea by two fragments in methanogens and halophiles. This topology is in good agreement with physiological and structural differences between the various archaea and demonstrates RNAP to be a suitable phylogenetic marker molecule.
Affiliation(s)
- H P Klenk
- Max-Planck-Institut für Biochemie, Martinsried, Germany
|
41
|
Palm P, Schleper C, Arnold-Ammer I, Holz I, Meier T, Lottspeich F, Zillig W. The DNA-dependent RNA-polymerase of Thermotoga maritima; characterisation of the enzyme and the DNA-sequence of the genes for the large subunits. Nucleic Acids Res 1993; 21:4904-8. [PMID: 8177738 PMCID: PMC311404 DOI: 10.1093/nar/21.21.4904] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.6] [Indexed: 01/29/2023] Open
Abstract
An improved purification procedure for Thermotoga maritima RNA-polymerase holoenzyme was developed. The enzyme is highly active with poly dAT or T7 phage DNA as template. DNA gyrase was found to be a side product of this RNA-polymerase purification. The genes for the large subunits beta and beta' of RNA-polymerase were cloned and sequenced. The phylogenetic position of T. maritima within the bacterial domain was determined by various methods. It is the lowest bacterial offspring but slightly higher than the chloroplasts.
Affiliation(s)
- P Palm
- Max Planck Institut für Biochemie, Martinsried, Germany
|
42
|
Müller-Schmid A, Ganss B, Gorr T, Hoffmann W. Molecular analysis of ependymins from the cerebrospinal fluid of the orders Clupeiformes and Salmoniformes: no indication for the existence of an euteleost infradivision. J Mol Evol 1993; 36:578-85. [PMID: 8350351 DOI: 10.1007/bf00556362] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.6] [Indexed: 01/30/2023]
Abstract
Ependymins represent the predominant protein constituents in the cerebrospinal fluid of many teleost fish and they are synthesized in meningeal fibroblasts. Here, we present the ependymin sequences from the herring (Clupea harengus) and the pike (Esox lucius). A comparison of ependymin homologous sequences from three different orders of teleost fish (Salmoniformes, Cypriniformes, and Clupeiformes) revealed the highest similarity between Clupeiformes and Cypriniformes. This result is unexpected because it does not reflect current systematics, in which Clupeiformes belong to a separate infradivision (Clupeomorpha) from Salmoniformes and Cypriniformes (Euteleostei). Furthermore, in Salmoniformes the evolutionary rate of ependymins seems to be accelerated mainly on the protein level. However, considering these inconstant rates, neither neighbor-joining trees nor DNA parsimony methods gave any indication that a separate euteleost infradivision exists.
Affiliation(s)
- A Müller-Schmid
- Max-Planck-Institut für Psychiatrie, Abteilung Neurochemie, Martinsried, Federal Republic of Germany
|
43
|
Tiboni O, Cammarano P, Sanangelantoni AM. Cloning and sequencing of the gene encoding glutamine synthetase I from the archaeum Pyrococcus woesei: anomalous phylogenies inferred from analysis of archaeal and bacterial glutamine synthetase I sequences. J Bacteriol 1993; 175:2961-9. [PMID: 8098326 PMCID: PMC204614 DOI: 10.1128/jb.175.10.2961-2969.1993] [Citation(s) in RCA: 63] [Impact Index Per Article: 2.0] [Indexed: 01/28/2023] Open
Abstract
The gene glnA encoding glutamine synthetase I (GSI) from the archaeum Pyrococcus woesei was cloned and sequenced with the Sulfolobus solfataricus glnA gene as the probe. An open reading frame of 448 amino acids was identified within a DNA segment of 1,528 bp. The encoded protein was 49% identical with the GSI of Methanococcus voltae and exhibited conserved regions characteristic of the GSI family. The P. woesei GSI was aligned with available homologs from other archaea (S. solfataricus, M. voltae) and with representative sequences from cyanobacteria, proteobacteria, and gram-positive bacteria. Phylogenetic trees were constructed from both the amino acid and the nucleotide sequence alignments. In accordance with the sequence similarities, archaeal and bacterial sequences did not segregate on a phylogeny. On the basis of sequence signatures, the GSI trees could be subdivided into two ensembles. One encompassed the GSI of cyanobacteria and proteobacteria, but also that of the high-G+C gram-positive bacterium Streptomyces coelicolor (all of which are regulated by the reversible adenylylation of the enzyme subunits); the other embraced the GSI of the three archaea as well as that of the low-G+C gram-positive bacteria (Clostridium acetobutylicum, Bacillus subtilis) and Thermotoga maritima (none of which are regulated by subunit adenylylation). The GSIs of the Thermotoga and the Bacillus-Clostridium lineages shared a direct common ancestor with that of P. woesei and the methanogens and were unrelated to their homologs from cyanobacteria, proteobacteria, and S. coelicolor. The possibility is presented that the GSI gene arose among the archaea and was then laterally transferred from some early methanogen to a Thermotoga-like organism. However, the relationship of the cyanobacterial-proteobacterial GSIs to the Thermotoga GSI and the GSI of low-G+C gram-positive bacteria remains unexplained.
Affiliation(s)
- O Tiboni
- Dipartimento Genetica e Microbiologia A. Buzzati-Traverso, Università di Pavia, Italy
|
44
|
Lepiniec L, Keryer E, Philippe H, Gadal P, Crétin C. Sorghum phosphoenolpyruvate carboxylase gene family: structure, function and molecular evolution. Plant Molecular Biology 1993; 21:487-502. [PMID: 8443342 DOI: 10.1007/bf00028806] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.1] [Indexed: 05/11/2023]
Abstract
Although housekeeping functions have been shown for the phosphoenolpyruvate carboxylase (EC 4.1.1.31, PEPC) in plants and in prokaryotes, PEPC is mainly known for its specific role in the primary photosynthetic CO2 fixation in C4 and CAM plants. We have shown that in Sorghum, a monocotyledonous C4 plant, the enzyme is encoded in the nucleus by a small multigene family. Here we report the entire nucleotide sequence (7.5 kb) of the third member (CP21) that completes the structure of the Sorghum PEPC gene family. Nucleotide composition, CpG islands and GC content of the three Sorghum PEPC genes are analysed with respect to their possible implications in the regulation of expression. A study of structure/function and phylogenetic relationships based on the compilation of all PEPC sequences known so far is presented. Data demonstrated that: (1) the different forms of plant PEPC have very similar primary structures, functional and regulatory properties, (2) neither apparent amino acid sequences nor phylogenetic relationships are specific for the C4 and CAM PEPCs and (3) expression of the different genes coding for the Sorghum PEPC isoenzymes is differently regulated (i.e. by light, nitrogen source) in a spatial and temporal manner. These results suggest that the main distinguishing feature between plant PEPCs is to be found at the level of gene expression rather than in their primary structure.
Affiliation(s)
- L Lepiniec
- Laboratoire de Physiologie Végétale Moléculaire (URA-CNRS, 1128), Université de Paris-Sud, Orsay, France
|
45
|
van de Peer Y, Neefs JM, de Rijk P, de Wachter R. Evolution of eukaryotes as deduced from small ribosomal subunit RNA sequences. Biochem Syst Ecol 1993. [DOI: 10.1016/0305-1978(93)90008-f] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.5] [Indexed: 10/27/2022]
|
46
|
Cammarano P, Palm P, Creti R, Ceccarelli E, Sanangelantoni AM, Tiboni O. Early evolutionary relationships among known life forms inferred from elongation factor EF-2/EF-G sequences: phylogenetic coherence and structure of the archaeal domain. J Mol Evol 1992; 34:396-405. [PMID: 1602493 DOI: 10.1007/bf00162996] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.3] [Indexed: 12/27/2022]
Abstract
Phylogenies were inferred from both the gene and the protein sequences of the translational elongation factor termed EF-2 (for Archaea and Eukarya) and EF-G (for Bacteria). All treeing methods used (distance-matrix, maximum likelihood, and parsimony), including evolutionary parsimony, support the archaeal tree and disprove the "eocyte tree" (i.e., the polyphyly and paraphyly of the Archaea). Distance-matrix trees derived from both the amino acid and the DNA sequence alignments (first and second codon positions) showed the Archaea to be a monophyletic-holophyletic grouping whose deepest bifurcation divides a Sulfolobus branch from a branch comprising Methanococcus, Halobacterium, and Thermoplasma. Bootstrapped distance-matrix treeing confirmed the monophyly-holophyly of Archaea in 100% of the samples and supported the bifurcation of Archaea into a Sulfolobus branch and a methanogen-halophile branch in 97% of the samples. Similar phylogenies were inferred by maximum likelihood and by maximum (protein and DNA) parsimony. DNA parsimony trees essentially identical to those inferred from first and second codon positions were derived from alternative DNA data sets comprising either the first or the second position of each codon. Bootstrapped DNA parsimony supported the monophyly-holophyly of Archaea in 100% of the bootstrap samples and confirmed the division of Archaea into a Sulfolobus branch and a methanogen-halophile branch in 93% of the bootstrap samples. Distance-matrix and maximum likelihood treeing under the constraint that branch lengths must be consistent with a molecular clock placed the root of the universal tree between the Bacteria and the bifurcation of Archaea and Eukarya. The results support the division of Archaea into the kingdoms Crenarchaeota (corresponding to the Sulfolobus branch) and Euryarchaeota. This division was not confirmed by evolutionary parsimony, which identified Halobacterium rather than Sulfolobus as the deepest offspring within the Archaea.
Affiliation(s)
- P Cammarano
- Istituto Pasteur-Fondazione Cenci Bolognetti, Dipartimento di Biopatologia Umana, Università di Roma, La Sapienza, Roma, Italy
|
47
|
|
48
|
Abstract
The distribution of functions within genomes of higher organisms relative to processes that lead to the spread of mutations in populations is examined in its general outlines. A number of points are enumerated that collectively put in question the concept of junk DNA: the plausible compatibility of DNA function with rapid substitution rates; the likelihood of superimposed functions along much of eukaryotic DNA; the potential for a merely conditional functionality in sequence repeats; the apparent adoption of macromolecular waste as a strategy for maintaining a function without selective grooming of individual sequence repeats that carry out the function; the likely requirement that any DNA sequence must be "polite" vis-à-vis (compatible with) functional sequences in its genomic environment; the existence in germ-cell lineages of selective constraints that are not apparent in populations of individuals; and the fact that DNA techtonics (the appearance and disappearance of genomic DNA) are not incompatible with function. It is pointed out that the inverse correlation between functional constraints and rates of substitution cannot be claimed to be a pillar of the neutral theory, because it is also predicted from a selectionist viewpoint. The dispensability of functional structures is brought into relation with the concept of reproductive sufficiency, the survivability of genotypes in the absence of fitter alleles.
Affiliation(s)
- E Zuckerkandl
- Linus Pauling Institute of Science and Medicine, Palo Alto, CA 94306
|
49
|
Shapiro SG. Uniformity in the nonsynonymous substitution rates of embryonic beta-globin genes of several vertebrate species. J Mol Evol 1991; 32:122-7. [PMID: 1901091 DOI: 10.1007/bf02515384] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 12/29/2022]
Abstract
The nucleotide substitution rate in structural portions of the embryonic beta-globin genes of placental mammals is lower than that for the adult beta-globin genes. This difference occurs entirely within the class of substitutions that result in nonsynonymous (replacement) differences between these genes, and therefore represents a constraint on the structure of the mammalian embryonic beta-globin proteins relative to the adult proteins (Shapiro et al. 1983; Hardison 1984). A similar effect has also been observed in marsupial mammals (Koop and Goodman 1988). In an effort to determine whether the observed rates are evidence of a uniform degree of selective constraint on the embryonic beta-globin genes, analyses were performed that compared replacement substitution rates. The analyses reveal that embryonic beta-globin genes appear to have been fixing replacement substitutions at nearly the same average rate not only in placental and marsupial mammals but in avian and amphibian species as well. In contrast, the adult beta-globin genes from these organisms appear to have a more variable rate of replacement substitution with an especially low rate for birds. In the chicken (Gallus gallus), the adult beta-globin gene replacement substitution rate appears to be lower than the embryonic replacement substitution rate.
Affiliation(s)
- S G Shapiro
- Department of Zoology, University of Maryland, College Park 20742
|
50
|
Pesole G, Bozzetti MP, Lanave C, Preparata G, Saccone C. Glutamine synthetase gene evolution: a good molecular clock. Proc Natl Acad Sci U S A 1991; 88:522-6. [PMID: 1671172 PMCID: PMC50843 DOI: 10.1073/pnas.88.2.522] [Citation(s) in RCA: 85] [Impact Index Per Article: 2.6] [Indexed: 12/28/2022] Open
Abstract
Glutamine synthetase (EC 6.3.1.2) gene evolution in various animals, plants, and bacteria was evaluated by a general stationary Markov model. The evolutionary process proved to be unexpectedly regular even for a time span as long as that between the divergence of prokaryotes from eukaryotes. This enabled us to draw phylogenetic trees for species whose phylogeny cannot be easily reconstructed from the fossil record. Our calculation of the times of divergence of the various organelle-specific enzymes led us to hypothesize that the pea and bean chloroplast genes for these enzymes originated from the duplication of nuclear genes as a result of the different metabolic needs of the various species. Our data indicate that the duplication of plastid glutamine synthetase genes occurred long after the endosymbiotic events that produced the organelles themselves.
Affiliation(s)
- G Pesole
- Dipartimento di Biochimica e Biologia Molecolare, Università di Bari, Italy
|