1
Abstract
In the early 1980s, DNA sequencing became routine and increasing computing power opened the door to reconstructing molecular phylogenies using probabilistic approaches. DNA sequence alignments provided a large number of positions containing phylogenetic information, which could be extracted using explicit statistical models that described the mutation process with appropriate parameters. Consequently, an active quest began to build progressively more realistic statistical models of nucleotide substitution. The simplest model assumed that nucleotide frequencies were in equilibrium and that there was a single category of substitutions. Subsequent models allowed either unequal nucleotide frequencies or separate rates for transitions and transversions. The HKY85 model (Hasegawa et al. in J Mol Evol 22:160, 1985) elegantly combined both options into a single model, which became one of the most useful and has been the model of choice in many molecular phylogenetic studies ever since. Improved substitution models such as HKY85 allow more accurate and reliable phylogenies to be reconstructed, which in turn provide robust frameworks for understanding how biological diversity evolved and for performing a wealth of comparative studies in disciplines such as ecology, biogeography, developmental biology, biochemistry, genomics, epidemiology, and biomedicine.
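As a rough illustration of how HKY85 combines the two options described in the abstract, the sketch below builds an unnormalized instantaneous rate matrix in which the rate from base x to base y is the equilibrium frequency of y, multiplied by a transition/transversion ratio `kappa` when the change is a transition. The function names, the base ordering, and the example frequencies are assumptions made for this sketch and do not come from the abstract.

```python
# Row/column order of the matrix (an arbitrary choice for this sketch).
BASES = "ACGT"
PURINES = {"A", "G"}


def is_transition(x, y):
    """True when x and y are both purines (A, G) or both pyrimidines (C, T)."""
    return (x in PURINES) == (y in PURINES)


def hky85_rate_matrix(freqs, kappa):
    """Return the unnormalized HKY85 rate matrix as a list of rows.

    freqs: dict mapping each base to its equilibrium frequency (summing to 1).
    kappa: ratio of the transition rate to the transversion rate.
    """
    q = []
    for i, x in enumerate(BASES):
        row = []
        for y in BASES:
            if x == y:
                row.append(0.0)  # placeholder; the diagonal is set below
            else:
                # Off-diagonal rate: target frequency, scaled by kappa for transitions.
                row.append(freqs[y] * (kappa if is_transition(x, y) else 1.0))
        row[i] = -sum(row)  # each row of a rate matrix sums to zero
        q.append(row)
    return q


# Illustrative (made-up) frequencies and kappa.
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
Q = hky85_rate_matrix(freqs, kappa=2.0)
```

Under this parameterization, equal frequencies with `kappa=1.0` give a single substitution category, while relaxing only one of the two constraints corresponds to the intermediate models the abstract mentions.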
Affiliation(s)
- Rafael Zardoya
- Departamento de Biodiversidad y Biología Evolutiva, Museo Nacional de Ciencias Naturales (MNCN-CSIC), José Gutiérrez Abascal, 2, 28006, Madrid, Spain.
2
Abstract
Background SARS-CoV-2 is a novel coronavirus that causes COVID-19; its closest known relative is found in bats. Hundreds of genomes of this virus have been sequenced. These data provide insights into SARS-CoV-2 adaptations, determinants of pathogenicity and mutation patterns. A comparison between patterns of mutations that occurred before and after SARS-CoV-2 jumped to human hosts may reveal important evolutionary consequences of zoonotic transmission. Methods We used publicly available complete genomes of SARS-CoV-2 to calculate relative frequencies of single nucleotide variations. These frequencies were compared with relative substitution frequencies between SARS-CoV-2 and related animal coronaviruses. A similar analysis was performed for the human coronaviruses SARS-CoV and HKU1. Results We found a 9-fold excess of G–U transversions among SARS-CoV-2 mutations over relative substitution frequencies between SARS-CoV-2 and a closely related bat coronavirus (RaTG13). This suggests that the mutation patterns of SARS-CoV-2 changed after transmission to humans. The excess of G–U transversions was much smaller in a similar analysis for SARS-CoV and non-existent for HKU1. Remarkably, we did not find a similar excess of complementary C–A mutations in SARS-CoV-2. We discuss possible explanations for these observations.
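The comparison described in the Methods can be pictured with a minimal sketch: tally each directed substitution type between two aligned sequences, convert the tallies to relative frequencies, and take the ratio of two such spectra for one substitution type. This is not the authors' pipeline. The function names and toy sequences are hypothetical, and treating T as U and skipping gaps or ambiguous bases are assumptions of this sketch.

```python
from collections import Counter

VALID = set("ACGU")


def substitution_spectrum(ref, alt):
    """Relative frequency of each directed substitution (e.g. 'G>U')
    between two aligned, equal-length sequences."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to equal length")
    # Assumption: genomes stored with DNA letters are read as RNA (T -> U).
    ref = ref.upper().replace("T", "U")
    alt = alt.upper().replace("T", "U")
    counts = Counter(
        f"{r}>{a}"
        for r, a in zip(ref, alt)
        if r != a and r in VALID and a in VALID  # skip matches, gaps, ambiguity codes
    )
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}


def fold_excess(observed, baseline, kind):
    """Ratio of one substitution type's relative frequency in `observed`
    to its relative frequency in `baseline`."""
    if baseline.get(kind, 0.0) == 0.0:
        raise ValueError(f"{kind} is absent from the baseline spectrum")
    return observed.get(kind, 0.0) / baseline[kind]


# Toy aligned sequences (made up for illustration only).
spectrum = substitution_spectrum("GGAUCCGA", "UGAUCUGG")
```

In this framing, a fold excess well above 1 for `"G>U"` would correspond to the kind of enrichment the Results report, with within-human variation as `observed` and the divergence from an animal relative as `baseline`.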
Affiliation(s)
- Alexander Y Panchin
- Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Yuri V Panchin
- Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
3
Guo C, McDowell IC, Nodzenski M, Scholtens DM, Allen AS, Lowe WL, Reddy TE. Transversions have larger regulatory effects than transitions. BMC Genomics 2017; 18:394. [PMID: 28525990 PMCID: PMC5438547 DOI: 10.1186/s12864-017-3785-4] [Received: 08/26/2016] [Accepted: 05/10/2017] [Indexed: 12/30/2022]
Abstract
Background Transversions (Tv’s) are more likely to alter the amino acid sequence of proteins than transitions (Ts’s), and local deviations in the Ts:Tv ratio are indicative of evolutionary selection on genes. Whether the two different types of mutations have different effects in non-protein-coding sequences remains unknown. Genetic variants primarily impact gene expression by disrupting the binding of transcription factors (TFs) and other DNA-binding proteins. Because Tv’s cause larger changes in the shape of a DNA backbone, we hypothesized that Tv’s would have larger impacts on TF binding and gene expression. Results Here, we provide multiple lines of evidence demonstrating that Tv’s have larger impacts on regulatory DNA including analyses of TF binding motifs and allele-specific TF binding. In these analyses, we observed a depletion of Tv’s within TF binding motifs and TF binding sites. Using massively parallel population-scale reporter assays, we also provided empirical evidence that Tv’s have larger effects than Ts’s on the activity of human gene regulatory elements. Conclusions Tv’s are more likely to disrupt TF binding, resulting in larger changes in gene expression. Although the observed differences are small, these findings represent a novel, fundamental property of regulatory variation. Understanding the features of functional non-coding variation could be valuable for revealing the genetic underpinnings of complex traits and diseases in future studies.
Affiliation(s)
- Cong Guo
- Center for Genomic and Computational Biology, Duke University Medical School, Durham, NC, 27710, USA; University Program in Genetics and Genomics, Duke University, Durham, NC, 27710, USA
- Ian C McDowell
- Center for Genomic and Computational Biology, Duke University Medical School, Durham, NC, 27710, USA; Program in Computational Biology and Bioinformatics, Duke University, Durham, NC, 27710, USA
- Michael Nodzenski
- Department of Preventive Medicine, Division of Biostatistics, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Denise M Scholtens
- Department of Preventive Medicine, Division of Biostatistics, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Andrew S Allen
- Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina, 27710, USA; Department of Biostatistics and Bioinformatics, Duke University Medical School, Durham, NC, 27710, USA
- William L Lowe
- Division of Endocrinology, Metabolism and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Timothy E Reddy
- Center for Genomic and Computational Biology, Duke University Medical School, Durham, NC, 27710, USA; Department of Biostatistics and Bioinformatics, Duke University Medical School, Durham, NC, 27710, USA; Present Address: Biostatistics & Bioinformatics, 101 Science Dr., 2347 CIEMAS, Durham, NC, 27708, USA.