1
Zhou ZJ, Yang CH, Ye SB, Yu XW, Qiu Y, Ge XY. VirusRecom: an information-theory-based method for recombination detection of viral lineages and its application on SARS-CoV-2. Brief Bioinform 2023; 24:6886420. [PMID: 36567622 DOI: 10.1093/bib/bbac513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 10/08/2022] [Accepted: 10/27/2022] [Indexed: 12/27/2022]
Abstract
Genomic recombination is an important driving force for viral evolution, and recombination events have been reported for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) during the Coronavirus Disease 2019 pandemic, which significantly alter viral infectivity and transmissibility. However, it is difficult to identify viral recombination, especially for low-divergence viruses such as SARS-CoV-2, since it is hard to distinguish recombination from in situ mutation. Herein, we applied information theory to viral recombination analysis and developed VirusRecom, a program for efficiently screening recombination events in viral genomes. In principle, we considered a recombination event as a transmission process of "information" and introduced weighted information content (WIC) to quantify the contribution of recombination to a given region of the viral genome; then, we identified the recombination regions by comparing the WICs of different regions. In the benchmark using simulated data, VirusRecom showed a good balance between precision and recall compared to two competing tools, RDP5 and 3SEQ. In the detection of SARS-CoV-2 XE, XD and XF recombinants, VirusRecom provided more accurate positions of recombination regions than RDP5 and 3SEQ. In addition, we encapsulated the VirusRecom program into a command-line-interface software for convenient operation by users. In summary, we developed a novel approach based on information theory to identify viral recombination within highly similar sequences, providing a useful tool for monitoring viral evolution and epidemic control.
Affiliation(s)
- Zhi-Jian Zhou
- Hunan Provincial Key Laboratory of Medical Virology, Institute of Pathogen Biology and Immunology, College of Biology, Hunan University, 27 Tianma Rd., Changsha, Hunan, 410012, China
- Chen-Hui Yang
- Hunan Provincial Key Laboratory of Medical Virology, Institute of Pathogen Biology and Immunology, College of Biology, Hunan University, 27 Tianma Rd., Changsha, Hunan, 410012, China
- Sheng-Bao Ye
- Hunan Provincial Key Laboratory of Medical Virology, Institute of Pathogen Biology and Immunology, College of Biology, Hunan University, 27 Tianma Rd., Changsha, Hunan, 410012, China
- Xiao-Wei Yu
- Hunan Provincial Key Laboratory of Medical Virology, Institute of Pathogen Biology and Immunology, College of Biology, Hunan University, 27 Tianma Rd., Changsha, Hunan, 410012, China; Hunan Prevention and Treatment Institute for Occupational Diseases, 162 Xinjian W. Rd., Changsha, Hunan, 410000, China
- Ye Qiu
- Hunan Provincial Key Laboratory of Medical Virology, Institute of Pathogen Biology and Immunology, College of Biology, Hunan University, 27 Tianma Rd., Changsha, Hunan, 410012, China
- Xing-Yi Ge
- Hunan Provincial Key Laboratory of Medical Virology, Institute of Pathogen Biology and Immunology, College of Biology, Hunan University, 27 Tianma Rd., Changsha, Hunan, 410012, China
2
Immunoglobulin heavy constant gamma gene evolution is modulated by both the divergent and birth-and-death evolutionary models. Primates 2022; 63:611-625. [DOI: 10.1007/s10329-022-01019-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 08/31/2022] [Indexed: 11/27/2022]
3
Evolutionary genomic relationships and coupling in MK-STYX and STYX pseudophosphatases. Sci Rep 2022; 12:4139. [PMID: 35264672 PMCID: PMC8907265 DOI: 10.1038/s41598-022-07943-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/25/2021] [Accepted: 02/28/2022] [Indexed: 11/08/2022]
Abstract
The dual specificity phosphatase (DUSP) family has catalytically inactive members, called pseudophosphatases. They have mutations in their catalytic motifs that render them enzymatically inactive. This study analyzes the significance of two pseudophosphatases, MK-STYX [MAPK (mitogen-activated protein kinase) phosphoserine/threonine/tyrosine-binding protein] and STYX (serine/threonine/tyrosine-interacting protein), throughout their evolution and provides measurements and comparison of their evolutionary conservation. Phylogenetic trees were constructed to show any deviation from various species' evolutionary paths. Data were collected on a large set of proteins that have either one of the two domains of MK-STYX, the DUSP domain or the cdc-25 homology (CH2)/rhodanese-like domain. The distance between species pairs for MK-STYX or STYX and the Ka/Ks ratio were calculated. In addition, both pseudophosphatases were ranked among a large set of related proteins, including the active homologs of MK-STYX, MKP (MAPK phosphatase)-1 and MKP-3. MK-STYX had one of the highest species-species protein distances and was under weaker purifying selection pressure than most proteins with its domains. In contrast, the protein distances of STYX were lower than those of 82% of the DUSP-containing proteins, and STYX was under one of the strongest purifying selection pressures. However, there was similar selection pressure on the N-terminal sequences of MK-STYX, STYX, MKP-1, and MKP-3. We next perform statistical coupling analysis, a process that reveals interconnected regions within the proteins. We find that while MKP-1, MKP-3, and STYX all have two functional units (sectors), MK-STYX has only one, and that MK-STYX is similar to MKP-3 in the evolutionary coupling of the active site and KIM domain. Within those two domains, the mean coupling is also most similar for MK-STYX and MKP-3.
This study reveals striking distinctions between the evolutionary patterns of MK-STYX and STYX, suggesting a very specific role for each pseudophosphatase, further highlighting the relevance of these atypical members of DUSP as signaling regulators. Therefore, our study provides computational evidence and evolutionary reasons to further explore the properties of pseudophosphatases, in particular MK-STYX and STYX.
4
Duarte MA, Fernandes CR, Heckel G, da Luz Mathias M, Bastos-Silveira C. Variation and Selection in the Putative Sperm-Binding Region of ZP3 in Muroid Rodents: A Comparison between Cricetids and Murines. Genes (Basel) 2021; 12:genes12091450. [PMID: 34573431 PMCID: PMC8469249 DOI: 10.3390/genes12091450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2021] [Revised: 09/15/2021] [Accepted: 09/16/2021] [Indexed: 11/16/2022]
Abstract
In mammals, the zona pellucida glycoprotein 3 (ZP3) is considered a primary sperm receptor of the oocyte and is hypothesized to be involved in reproductive isolation. We investigated patterns of diversity and selection in the putative sperm-binding region (pSBR) of mouse ZP3 across Cricetidae and Murinae, two hyperdiverse taxonomic groups within muroid rodents. In murines, the pSBR is fairly conserved, in particular the serine-rich stretch containing the glycosylation sites proposed as essential for sperm binding. In contrast, cricetid amino acid sequences of the pSBR were much more variable and the serine-rich motif, typical of murines, was generally substantially modified. Overall, our results suggest a general lack of species specificity of the pSBR across the two muroid families. We document statistical evidence of positive selection acting on exons 6 and 7 of ZP3 and identified several amino acid sites that are likely targets of selection, with most positively selected sites falling within or adjacent to the pSBR.
Affiliation(s)
- Margarida Alexandra Duarte
- Champalimaud Centre for the Unknown, Champalimaud Research, Champalimaud Foundation, Avenida Brasília, 1400-038 Lisboa, Portugal
- Museu Nacional de História Natural e da Ciência, Departamento de Zoologia e Antropologia, Universidade de Lisboa, Rua da Escola Politécnica, 58, Lisboa, 1250-102 Lisboa, Portugal
- Departamento de Biologia Animal, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
- Centro de Estudos de Ambiente e Mar, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
- Correspondence:
- Carlos Rodríguez Fernandes
- cE3c-Centre for Ecology, Evolution and Environmental Changes, Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- Faculdade de Psicologia, Universidade de Lisboa, Alameda da Universidade, 1649-013 Lisboa, Portugal
- Gerald Heckel
- Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, CH-3012 Bern, Switzerland
- SIB Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Amphipole, CH-1015 Lausanne, Switzerland
- Maria da Luz Mathias
- Departamento de Biologia Animal, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
- Centro de Estudos de Ambiente e Mar, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
- Cristiane Bastos-Silveira
- cE3c-Centre for Ecology, Evolution and Environmental Changes, Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
5
Arenas M. ProteinEvolverABC: coestimation of recombination and substitution rates in protein sequences by approximate Bayesian computation. Bioinformatics 2021; 38:58-64. [PMID: 34450622 PMCID: PMC8696103 DOI: 10.1093/bioinformatics/btab617] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/15/2021] [Revised: 07/24/2021] [Accepted: 08/24/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION The evolutionary processes of mutation and recombination, upon which selection operates, are fundamental to understanding the observed molecular diversity. Unlike in nucleotide sequences, the estimation of the recombination rate in protein sequences has been little explored and has not been implemented in evolutionary frameworks, even though protein sequencing methods are widely used. RESULTS To address this need, here I present a computational framework, called ProteinEvolverABC, to jointly estimate recombination and substitution rates from alignments of protein sequences. The framework implements the approximate Bayesian computation approach, with and without regression adjustments, and includes a variety of substitution models of protein evolution, demographics and longitudinal sampling. It also implements several nuisance parameters such as heterogeneous amino acid frequencies and rates of change among sites, and the proportion of invariable sites. The framework produces accurate coestimation of recombination and substitution rates under diverse evolutionary scenarios. As illustrative examples of usage, I applied it to several viral protein families, including coronaviruses, showing heterogeneous substitution and recombination rates. AVAILABILITY AND IMPLEMENTATION ProteinEvolverABC is freely available from https://github.com/miguelarenas/proteinevolverabc and includes a graphical user interface to help specify the input settings, extensive documentation and ready-to-use examples. Conveniently, the simulations can run in parallel on multicore machines. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
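As a rough illustration of the approximate Bayesian computation idea mentioned in this abstract, the following is a minimal rejection-sampling sketch in Python. It is not the ProteinEvolverABC implementation: the toy simulator `simulate_substitutions`, the uniform prior, the single estimated rate, the summary statistic (a count of changed sites) and the tolerance are all assumptions chosen only to keep the example small.

```python
import random


def simulate_substitutions(rate, n_sites, rng):
    """Hypothetical toy simulator: each site changes independently with probability `rate`."""
    return sum(1 for _ in range(n_sites) if rng.random() < rate)


def abc_rejection(observed, n_sites, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary statistic is within `tolerance` of the observed one."""
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, 0.5)  # assumed uniform prior on the rate
        simulated = simulate_substitutions(rate, n_sites, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(rate)  # accepted draws approximate the posterior
    return accepted


rng = random.Random(1)  # fixed seed so the run is repeatable
posterior = abc_rejection(observed=20, n_sites=200, n_draws=5000, tolerance=2, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean rate ~ {posterior_mean:.3f}")
```

The accepted draws cluster around the rate that best reproduces the observed count; a regression adjustment, as named in the abstract, would further correct the accepted values and is not shown here.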
6
Ricaurte-Contreras LA, Lovera A, Moreno-Pérez DA, Bohórquez MD, Suárez CF, Gutiérrez-Vásquez E, Cuy-Chaparro L, Garzón-Ospina D, Patarroyo MA. Two 20-Residue-Long Peptides Derived from Plasmodium vivax Merozoite Surface Protein 10 EGF-Like Domains Are Involved in Binding to Human Reticulocytes. Int J Mol Sci 2021; 22:ijms22041609. [PMID: 33562650 PMCID: PMC7915351 DOI: 10.3390/ijms22041609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2020] [Revised: 01/21/2021] [Accepted: 02/02/2021] [Indexed: 11/30/2022]
Abstract
Plasmodium parasites’ invasion of their target cells is a complex, multi-step process involving many protein-protein interactions. Little is known about how complex the interaction with target cells is in Plasmodium vivax and few surface molecules related to reticulocytes’ adhesion have been described to date. Natural selection, functional and structural analysis were carried out on the previously described vaccine candidate P. vivax merozoite surface protein 10 (PvMSP10) for evaluating its role during initial contact with target cells. It has been shown here that the recombinant carboxyl terminal region (rPvMSP10-C) bound to adult human reticulocytes but not to normocytes, as validated by two different protein-cell interaction assays. Particularly interesting was the fact that two 20-residue-long regions (388DKEECRCRANYMPDDSVDYF407 and 415KDCSKENGNCDVNAECSIDK434) were able to inhibit rPvMSP10-C binding to reticulocytes and rosette formation using enriched target cells. These peptides were derived from PvMSP10 epidermal growth factor (EGF)-like domains (precisely, from a well-defined electrostatic zone) and consisted of regions having the potential of being B- or T-cell epitopes. These findings provide evidence, for the first time, about the fragments governing PvMSP10 binding to its target cells, thus highlighting the importance of studying them for inclusion in a P. vivax antimalarial vaccine.
Affiliation(s)
- Laura Alejandra Ricaurte-Contreras
- Molecular Biology and Immunology Department, Fundación Instituto de Inmunología de Colombia (FIDIC), Carrera 50#26-20, Bogotá 111321, Colombia
- MSc Programme in Microbiology, Universidad Nacional de Colombia, Carrera 45#26-85, Bogotá 111321, Colombia
- Andrea Lovera
- Molecular Biology and Immunology Department, Fundación Instituto de Inmunología de Colombia (FIDIC), Carrera 50#26-20, Bogotá 111321, Colombia
- Darwin Andrés Moreno-Pérez
- Molecular Biology and Immunology Department, Fundación Instituto de Inmunología de Colombia (FIDIC), Carrera 50#26-20, Bogotá 111321, Colombia
- Michel David Bohórquez
- Molecular Biology and Immunology Department, Fundación Instituto de Inmunología de Colombia (FIDIC), Carrera 50#26-20, Bogotá 111321, Colombia
- Carlos Fernando Suárez
- Biomathematics Department, Fundación Instituto de Inmunología de Colombia (FIDIC), Carrera 50#26-20, Bogotá 111321, Colombia
- Elizabeth Gutiérrez-Vásquez
- Molecular Biology and Immunology Department, Fundación Instituto de Inmunología de Colombia (FIDIC), Carrera 50#26-20, Bogotá 111321, Colombia
- Laura Cuy-Chaparro
- Molecular Biology and Immunology Department, Fundación Instituto de Inmunología de Colombia (FIDIC), Carrera 50#26-20, Bogotá 111321, Colombia
- Diego Garzón-Ospina
- Molecular Biology and Immunology Department, Fundación Instituto de Inmunología de Colombia (FIDIC), Carrera 50#26-20, Bogotá 111321, Colombia
- Manuel Alfonso Patarroyo
- Molecular Biology and Immunology Department, Fundación Instituto de Inmunología de Colombia (FIDIC), Carrera 50#26-20, Bogotá 111321, Colombia
- Health Sciences Division, Main Campus, Universidad Santo Tomás, Carrera 9#51-11, Bogotá 110231, Colombia
- Microbiology Department, Faculty of Medicine, Universidad Nacional de Colombia, Carrera 45#26-85, Bogotá 111321, Colombia
- Correspondence:
7
Del Amparo R, Branco C, Arenas J, Vicens A, Arenas M. Analysis of selection in protein-coding sequences accounting for common biases. Brief Bioinform 2021; 22:6105943. [PMID: 33479739 DOI: 10.1093/bib/bbaa431] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/16/2020] [Revised: 12/17/2020] [Accepted: 12/22/2020] [Indexed: 12/16/2022]
Abstract
The evolution of protein-coding genes is usually driven by selective processes, which favor some evolutionary trajectories over others, optimizing the subsequent protein stability and activity. The analysis of selection in this type of genetic data is broadly performed with the metric nonsynonymous/synonymous substitution rate ratio (dN/dS). However, most of the well-established methodologies to estimate this metric make crucial assumptions, such as lack of recombination or invariable codon frequencies along genes, which can bias the estimation. Here, we review the most relevant biases in the dN/dS estimation and provide a detailed guide to estimate this metric using state-of-the-art procedures that account for such biases, along with illustrative practical examples and recommendations. We also discuss the traditional interpretation of the estimated dN/dS emphasizing the importance of considering complementary biological information such as the role of the observed substitutions on the stability and function of proteins. This review is oriented to help evolutionary biologists that aim to accurately estimate selection in protein-coding sequences.
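To make the dN/dS metric concrete, here is a small Python sketch of the simple counting-style calculation with a Jukes-Cantor correction (in the spirit of the Nei-Gojobori approach). It is only an illustration and not one of the bias-aware procedures the review recommends: the function names and the input counts are hypothetical, and in practice the numbers of synonymous and nonsynonymous sites and differences would come from a codon alignment.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor correction for a proportion of differences p (valid only for p < 0.75)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ratio of corrected nonsynonymous to synonymous distances (hypothetical helper)."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0.0:
        raise ValueError("dS is zero; the ratio is undefined")
    return d_n / d_s


# Made-up counts for illustration: few nonsynonymous changes relative to synonymous ones.
omega = dn_ds(nonsyn_diffs=5, nonsyn_sites=300, syn_diffs=20, syn_sites=100)
print(f"dN/dS = {omega:.3f}")
```

Under the traditional interpretation, a ratio below 1 is read as purifying selection, a ratio near 1 as neutral evolution, and a ratio above 1 as positive selection; the review's point is that recombination and variable codon frequencies can bias such estimates.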
Affiliation(s)
- Roberto Del Amparo
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Catarina Branco
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Jesús Arenas
- Unit of Microbiology and Immunology, University of Zaragoza, 50013 Zaragoza, Spain
- Alberto Vicens
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Miguel Arenas
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
8
Silva Pereira S, de Almeida Castilho Neto KJG, Duffy CW, Richards P, Noyes H, Ogugo M, Rogério André M, Bengaly Z, Kemp S, Teixeira MMG, Machado RZ, Jackson AP. Variant antigen diversity in Trypanosoma vivax is not driven by recombination. Nat Commun 2020; 11:844. [PMID: 32051413 PMCID: PMC7015903 DOI: 10.1038/s41467-020-14575-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/30/2019] [Accepted: 01/18/2020] [Indexed: 11/09/2022]
Abstract
African trypanosomes (Trypanosoma) are vector-borne haemoparasites that survive in the vertebrate bloodstream through antigenic variation of their Variant Surface Glycoprotein (VSG). Recombination, or rather segmented gene conversion, is fundamental in Trypanosoma brucei for both VSG gene switching and for generating antigenic diversity during infections. Trypanosoma vivax is a related, livestock pathogen whose VSG lack structures that facilitate gene conversion in T. brucei and mechanisms underlying its antigenic diversity are poorly understood. Here we show that species-wide VSG repertoire is broadly conserved across diverse T. vivax clinical strains and has limited antigenic repertoire. We use variant antigen profiling, coalescent approaches and experimental infections to show that recombination plays little role in diversifying T. vivax VSG sequences. These results have immediate consequences for both the current mechanistic model of antigenic variation in African trypanosomes and species differences in virulence and transmission, requiring reconsideration of the wider epidemiology of animal African trypanosomiasis.
Affiliation(s)
- Sara Silva Pereira
- Department of Infection Biology, Institute of Infection and Global Health, University of Liverpool, 146 Brownlow Hill, Liverpool, L3 5RF, UK
- Kayo J G de Almeida Castilho Neto
- Department of Veterinary Pathology, Faculty of Agrarian and Veterinary Sciences, São Paulo State University (UNESP), Jaboticabal, SP, Brazil
- Craig W Duffy
- Department of Infection Biology, Institute of Infection and Global Health, University of Liverpool, 146 Brownlow Hill, Liverpool, L3 5RF, UK
- Peter Richards
- Department of Infection Biology, Institute of Infection and Global Health, University of Liverpool, 146 Brownlow Hill, Liverpool, L3 5RF, UK
- Harry Noyes
- Institute of Integrative Biology, University of Liverpool, Biosciences Building, Crown Street, Liverpool, L69 7ZB, UK
- Moses Ogugo
- Livestock Genetic Programme, International Livestock Research Institute, 30709 Naivasha Road, Nairobi, Kenya
- Marcos Rogério André
- Department of Veterinary Pathology, Faculty of Agrarian and Veterinary Sciences, São Paulo State University (UNESP), Jaboticabal, SP, Brazil
- Zakaria Bengaly
- International Research Centre for Livestock Development in the Sub-humid Zone (CIRDES), No. 559, rue 5-31 angle, Avenue du Gouverneur Louveau, Bobo-Dioulasso, Burkina Faso
- Steve Kemp
- Livestock Genetic Programme, International Livestock Research Institute, 30709 Naivasha Road, Nairobi, Kenya
- Marta M G Teixeira
- Department of Parasitology, Institute of Biomedical Sciences, University of Sao Paulo, Avenue Professor Lineu Prestes, 1374 Cidade Universitaria, Sao Paulo, SP, 05508-000, Brazil
- Rosangela Z Machado
- Department of Veterinary Pathology, Faculty of Agrarian and Veterinary Sciences, São Paulo State University (UNESP), Jaboticabal, SP, Brazil
- Andrew P Jackson
- Department of Infection Biology, Institute of Infection and Global Health, University of Liverpool, 146 Brownlow Hill, Liverpool, L3 5RF, UK.
9
Del Amparo R, Vicens A, Arenas M. The influence of heterogeneous codon frequencies along sequences on the estimation of molecular adaptation. Bioinformatics 2020; 36:430-436. [PMID: 31304972 DOI: 10.1093/bioinformatics/btz558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2019] [Revised: 07/08/2019] [Accepted: 07/11/2019] [Indexed: 11/12/2022]
Abstract
MOTIVATION The nonsynonymous/synonymous substitution rate ratio (dN/dS) is a commonly used parameter to quantify molecular adaptation in protein-coding data. It is known that the estimation of dN/dS can be biased if some evolutionary processes are ignored. In this regard, common ML methods to estimate dN/dS assume invariable codon frequencies among sites, even though this characteristic is rare in nature, and this assumption could bias the estimation of the parameter. RESULTS Here we studied the influence of variable codon frequencies among genetic regions on the estimation of dN/dS. We explored scenarios varying the number of genetic regions that differ in codon frequencies, the amount of variability of codon frequencies among regions and the nucleotide frequencies at each codon position among regions. We found that ignoring heterogeneous codon frequencies among regions overall leads to underestimation of dN/dS and that the bias increases with the level of heterogeneity of codon frequencies. Interestingly, we also found that varying nucleotide frequencies among regions at the first or second codon position leads to underestimation of dN/dS, while variation at the third codon position leads to overestimation of dN/dS. Next, we present a methodology to reduce this bias based on the analysis of partitions presenting similar codon frequencies, and we applied it to analyze four real datasets. We conclude that accounting for heterogeneous codon frequencies along sequences is required to obtain realistic estimates of molecular adaptation through this relevant evolutionary parameter. AVAILABILITY AND IMPLEMENTATION The applied frameworks for the computer simulations of protein-coding data and estimation of molecular adaptation are SGWE and PAML, respectively. Both are publicly available and referenced in the study. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Roberto Del Amparo
- Department of Biochemistry, Genetics and Immunology; Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
- Alberto Vicens
- Department of Biochemistry, Genetics and Immunology; Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
- Miguel Arenas
- Department of Biochemistry, Genetics and Immunology; Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
10
Yoshizaki S, Akahori H, Umemura T, Terada T, Takashima Y, Muto Y. Genome-wide analyses reveal genes subject to positive selection in Toxoplasma gondii. Gene 2019; 699:73-79. [PMID: 30858136 DOI: 10.1016/j.gene.2019.03.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/25/2018] [Revised: 03/05/2019] [Accepted: 03/06/2019] [Indexed: 10/27/2022]
Abstract
Toxoplasma gondii is an important protozoan pathogen that infects many wild and domestic animals and causes infections in immunocompromised humans. However, there has been little investigation of the molecular evolutionary trajectories of this pathogenic protozoa using comparative genomics data. Here, we employed a comparative evolutionary genomics approach to identify genes that are under site- and lineage-specific positive selection in nine strains of T. gondii, including two closely related species, Neospora caninum and Hammondia hammondi. Based on the analyses of five coccidian core genomes, 4.5% of the 5788 core genome genes showed strong signals for positive selection in the site model. In addition, the branch-site model analyses in the nine T. gondii core genomes indicated that 2 to 20 genes underwent significant positive selection along each lineage leading to T. gondii strains. Many of the protein products encoded by the positively selected genes are secretory or surface proteins that have previously been implicated in host pathogenesis. The adaptive changes in these positively selected genes might be related to dynamic interactions between the host immune systems and might play a crucial role in the infection and pathogenic processes of T. gondii.
Affiliation(s)
- Sumio Yoshizaki
- United Graduate School of Drug Discovery and Medical Information Sciences, Gifu University, 1-1, Yanagido, Gifu 501-1193, Japan; Department of Nursing, Heisei College of Health Sciences, 180 Kurono, Gifu 501-1131, Japan
- Hiromichi Akahori
- Department of Functional Bioscience, Gifu University School of Medicine, 1-1, Yanagido, Gifu 501-1193, Japan
- Toshiaki Umemura
- Graduate School of Medicine and Pharmaceutical Sciences, University of Toyama, 2630 Sugitani, Toyama 930-0194, Japan
- Tomoyoshi Terada
- United Graduate School of Drug Discovery and Medical Information Sciences, Gifu University, 1-1, Yanagido, Gifu 501-1193, Japan; Department of Functional Bioscience, Gifu University School of Medicine, 1-1, Yanagido, Gifu 501-1193, Japan
- Yasuhiro Takashima
- Department of Veterinary Parasitology, Faculty of Applied Biological Sciences, Gifu University, 1-1 Yanagido, Gifu 501-1193, Japan; Center for Highly Advanced Integration of Nano and Life Sciences, Gifu University (G-CHAIN), 1-1 Yanagido, Gifu 501-1193, Japan
- Yoshinori Muto
- United Graduate School of Drug Discovery and Medical Information Sciences, Gifu University, 1-1, Yanagido, Gifu 501-1193, Japan; Department of Functional Bioscience, Gifu University School of Medicine, 1-1, Yanagido, Gifu 501-1193, Japan.
11
The Influence of Protein Stability on Sequence Evolution: Applications to Phylogenetic Inference. Methods Mol Biol 2019; 1851:215-231. [PMID: 30298399 DOI: 10.1007/978-1-4939-8736-8_11] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/13/2023]
Abstract
Phylogenetic inference from protein data is traditionally based on empirical substitution models of evolution that assume that protein sites evolve independently of each other and under the same substitution process. However, it is well known that the structural properties of a protein site in the native state affect its evolution, in particular the sequence entropy and the substitution rate. Starting from the seminal proposal by Halpern and Bruno, where structural properties are incorporated in the evolutionary model through site-specific amino acid frequencies, several models have been developed to tackle the influence of protein structure on sequence evolution. Here we describe stability-constrained substitution (SCS) models that explicitly consider the stability of the native state against both unfolded and misfolded states. One of them, the mean-field model, provides an independent sites approximation that can be readily incorporated in maximum likelihood methods of phylogenetic inference, including ancestral sequence reconstruction. Next, we describe its validation with simulated and real proteins and its limitations and advantages with respect to empirical models that lack site specificity. We finally provide guidelines and recommendations to analyze protein data accounting for stability constraints, including computer simulations and inferences of protein evolution based on maximum likelihood. Some practical examples are included to illustrate these procedures.
12
Selecting among Alternative Scenarios of Human Evolution by Simulated Genetic Gradients. Genes (Basel) 2018; 9:genes9100506. [PMID: 30340387 PMCID: PMC6210830 DOI: 10.3390/genes9100506] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 09/13/2018] [Revised: 10/11/2018] [Accepted: 10/16/2018] [Indexed: 11/16/2022]
Abstract
Selecting among alternative scenarios of human evolution is nowadays a common methodology to investigate the history of our species. This strategy is usually based on computer simulations of genetic data under different evolutionary scenarios, followed by a fitting of the simulated data with the real data. A recent trend in the investigation of ancestral evolutionary processes of modern humans is the application of genetic gradients as a measure of fitting, since evolutionary processes such as range expansions, range contractions, and population admixture (among others) can lead to different genetic gradients. In addition, this strategy allows the analysis of the genetic causes of the observed genetic gradients. Here, we review recent findings on the selection among alternative scenarios of human evolution based on simulated genetic gradients, including pros and cons. First, we describe common methodologies to simulate genetic gradients and apply them to select among alternative scenarios of human evolution. Next, we review previous studies on the influence of range expansions, population admixture, last glacial period, and migration with long-distance dispersal on genetic gradients for some regions of the world. Finally, we discuss this analytical approach, including technical limitations, required improvements, and advice. Although here we focus on human evolution, this approach could be extended to study other species.
13
Diaz F, Allan CW, Matzkin LM. Positive selection at sites of chemosensory genes is associated with the recent divergence and local ecological adaptation in cactophilic Drosophila. BMC Evol Biol 2018; 18:144. [PMID: 30236055 PMCID: PMC6148956 DOI: 10.1186/s12862-018-1250-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 04/20/2018] [Accepted: 08/20/2018] [Indexed: 11/25/2022] Open
Abstract
Background: Adaptation to new hosts in phytophagous insects often involves mechanisms of host recognition by genes of sensory pathways. Most often the molecular evolution of sensory genes has been explained in the context of the birth-and-death model. The role of positive selection is less understood, especially in association with host adaptation and specialization. Here we aim to contribute evidence for this latter hypothesis by considering the case of Drosophila mojavensis, a species with an evolutionary history shaped by multiple host shifts in a relatively short time scale, and its generalist sister species, D. arizonae. Results: We used a phylogenetic and population genetic analysis framework to test for positive selection in a subset of four chemoreceptor genes, one gustatory receptor (Gr) and three odorant receptors (Or), whose expression has been previously associated with host shifts. We found strong evidence of positive selection at several amino acid sites in all genes investigated, most of which exhibited changes predicted to cause functional effects in these transmembrane proteins. A significant portion of the sites identified as evolving positively were found in the cytoplasmic region, although a few were also present in the extracellular domains. Conclusions: The pattern of substitution observed suggests that some of these changes likely had an effect on signal transduction as well as odorant recognition and protein-protein interactions. These findings support the role of positive selection in shaping the pattern of variation at chemosensory receptors, both during specialization onto one or a few related hosts and during the evolution and adaptation of generalist species to utilize several hosts.
Affiliation(s)
- Fernando Diaz
- Department of Entomology, University of Arizona, Tucson, AZ, 85721, USA
- Carson W Allan
- Department of Entomology, University of Arizona, Tucson, AZ, 85721, USA
- Luciano M Matzkin
- Department of Entomology, University of Arizona, Tucson, AZ, 85721, USA; Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, USA; BIO5 Institute, University of Arizona, Tucson, AZ, 85721, USA
14
Camargo-Ayala PA, Garzón-Ospina D, Moreno-Pérez DA, Ricaurte-Contreras LA, Noya O, Patarroyo MA. On the Evolution and Function of Plasmodium vivax Reticulocyte Binding Surface Antigen (pvrbsa). Front Genet 2018; 9:372. [PMID: 30250483 PMCID: PMC6139305 DOI: 10.3389/fgene.2018.00372] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/25/2018] [Accepted: 08/23/2018] [Indexed: 12/28/2022] Open
Abstract
The RBSA protein is encoded by a gene described in Plasmodium species having tropism for reticulocytes. Since this protein is antigenic in natural infections and can bind to target cells, it has been proposed as a potential candidate for an anti-Plasmodium vivax vaccine. However, genetic diversity (a challenge which must be overcome for ensuring fully effective vaccine design) has not been described at this locus. Likewise, the minimum regions mediating specific parasite-host interaction have not been determined. Here we therefore describe the evolutionary history of the rbsa gene, as well as the genetic diversity of P. vivax rbsa (pvrbsa) and the specific regions mediating parasite adhesion to reticulocytes. Unlike what has previously been reported, rbsa was also present in several parasite species belonging to the monkey-malaria clade; paralogs were also found in Plasmodium parasites invading reticulocytes. The pvrbsa locus had less diversity than other merozoite surface proteins where natural selection and recombination were the main evolutionary forces involved in causing the observed polymorphism. The N-terminal end (PvRBSA-A) was conserved and under functional constraint; consequently, it was expressed as recombinant protein for binding assays. This protein fragment bound to reticulocytes whilst the C-terminus, included in recombinant PvRBSA-B (which was not under functional constraint), did not. Interestingly, two PvRBSA-A-derived peptides were able to inhibit protein binding to reticulocytes. Specific conserved and functionally important peptides within PvRBSA-A could thus be considered when designing a fully effective vaccine against P. vivax.
Affiliation(s)
- Paola Andrea Camargo-Ayala
- Department of Molecular Biology and Immunology, Fundación Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia; Microbiology Postgraduate Programme, Universidad Nacional de Colombia, Bogotá, Colombia
- Diego Garzón-Ospina
- Department of Molecular Biology and Immunology, Fundación Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia; PhD Programme in Biomedical and Biological Sciences, Universidad del Rosario, Bogotá, Colombia
- Darwin Andrés Moreno-Pérez
- Department of Molecular Biology and Immunology, Fundación Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia; Livestock Sciences Faculty, Universidad de Ciencias Aplicadas y Ambientales, Bogotá, Colombia
- Oscar Noya
- Instituto de Medicina Tropical, Facultad de Medicina, Universidad Central de Venezuela, Caracas, Venezuela
- Manuel A Patarroyo
- Department of Molecular Biology and Immunology, Fundación Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia; School of Medicine and Health Sciences, Universidad del Rosario, Bogotá, Colombia
15
Pérez-Losada M, Arenas M, Castro-Nallar E. Microbial sequence typing in the genomic era. Infect Genet Evol 2018; 63:346-359. [PMID: 28943406 PMCID: PMC5908768 DOI: 10.1016/j.meegid.2017.09.022] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 05/22/2017] [Revised: 09/18/2017] [Accepted: 09/19/2017] [Indexed: 12/18/2022]
Abstract
Next-generation sequencing (NGS), also known as high-throughput sequencing, is changing the field of microbial genomics research. NGS allows for a more comprehensive analysis of the diversity, structure and composition of microbial genes and genomes compared to the traditional automated Sanger capillary sequencing at a lower cost. NGS strategies have expanded the versatility of standard and widely used typing approaches based on nucleotide variation in several hundred DNA sequences and a few gene fragments (MLST, MLVA, rMLST and cgMLST). NGS can now accommodate variation in thousands or millions of sequences from selected amplicons to full genomes (WGS, NGMLST and HiMLST). To extract signals from high-dimensional NGS data and make valid statistical inferences, novel analytic and statistical techniques are needed. In this review, we describe standard and new approaches for microbial sequence typing at gene and genome levels and guidelines for subsequent analysis, including methods and computational frameworks. We also present several applications of these approaches to some disciplines, namely genotyping, phylogenetics and molecular epidemiology.
Affiliation(s)
- Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Ashburn, VA 20147, USA; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão 4485-661, Portugal; Children's National Medical Center, Washington, DC 20010, USA
- Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
- Eduardo Castro-Nallar
- Universidad Andrés Bello, Center for Bioinformatics and Integrative Biology, Facultad de Ciencias Biológicas, Santiago 8370146, Chile
16
Brown T, Didelot X, Wilson DJ, De Maio N. SimBac: simulation of whole bacterial genomes with homologous recombination. Microb Genom 2018; 2. [PMID: 27713837 PMCID: PMC5049688 DOI: 10.1099/mgen.0.000044] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Indexed: 01/03/2023] Open
Abstract
Bacteria can exchange genetic material, or acquire genes found in the environment. This process, generally known as bacterial recombination, can have a strong impact on the evolution and phenotype of bacteria, for example causing the spread of antibiotic resistance across clades and species, but can also disrupt phylogenetic and transmission inferences. With the increasing affordability of whole genome sequencing, the need has emerged for an efficient simulator of bacterial evolution to test and compare methods for phylogenetic and population genetic inference, and for simulation-based estimation. We present SimBac, a whole-genome bacterial evolution simulator that is roughly two orders of magnitude faster than previous software and includes a more general model of bacterial evolution, allowing both within- and between-species homologous recombination. Since methods modelling bacterial recombination generally focus on only one of these two modes of recombination, the possibility to simulate both allows for a general and fair benchmarking. SimBac is available from https://github.com/tbrown91/SimBac and is distributed as open source under the terms of the GNU General Public Licence.
Affiliation(s)
- Thomas Brown
- Doctoral Training Centre, University of Oxford, Oxford, UK
- Xavier Didelot
- Department of Infectious Disease Epidemiology, Imperial College, London, UK
- Daniel J Wilson
- Institute for Emerging Infections, Oxford Martin School, Oxford, UK; Nuffield Department of Medicine, University of Oxford, Oxford, UK; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Nicola De Maio
- Nuffield Department of Medicine, University of Oxford, Oxford, UK; Institute for Emerging Infections, Oxford Martin School, Oxford, UK
17
Sharbrough J, Luse M, Boore JL, Logsdon JM, Neiman M. Radical amino acid mutations persist longer in the absence of sex. Evolution 2018. [PMID: 29520921 DOI: 10.1111/evo.13465] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 12/22/2022]
Abstract
Harmful mutations are ubiquitous and inevitable, and the rate at which these mutations are removed from populations is a critical determinant of evolutionary fate. Closely related sexual and asexual taxa provide a particularly powerful setting to study deleterious mutation elimination because sexual reproduction should facilitate mutational clearance by reducing selective interference between sites and by allowing the production of offspring with different mutational complements than their parents. Here, we compared the rate of removal of conservative (i.e., similar biochemical properties) and radical (i.e., distinct biochemical properties) nonsynonymous mutations from mitochondrial genomes of sexual versus asexual Potamopyrgus antipodarum, a New Zealand freshwater snail characterized by coexisting and ecologically similar sexual and asexual lineages. Our analyses revealed that radical nonsynonymous mutations are cleared at higher rates than conservative changes and that sexual lineages eliminate radical changes more rapidly than asexual counterparts. These results are consistent with reduced efficacy of purifying selection in asexual lineages allowing harmful mutations to remain polymorphic longer than in sexual lineages. Together, these data illuminate some of the population-level processes contributing to mitochondrial mutation accumulation and suggest that mutation accumulation could influence the outcome of competition between sexual and asexual lineages.
Affiliation(s)
- Joel Sharbrough
- Department of Biology, University of Iowa, Iowa City, Iowa 52242; Department of Biology, Colorado State University, Fort Collins, Colorado 80523
- Meagan Luse
- Department of Biology, University of Iowa, Iowa City, Iowa 52242
- Jeffrey L Boore
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California 94720; Providence St. Joseph Health and Institute for Systems Biology, Seattle, Washington 98109
- John M Logsdon
- Department of Biology, University of Iowa, Iowa City, Iowa 52242
- Maurine Neiman
- Department of Biology, University of Iowa, Iowa City, Iowa 52242
18
Zhao ZM, Campbell MC, Li N, Lee DSW, Zhang Z, Townsend JP. Detection of Regional Variation in Selection Intensity within Protein-Coding Genes Using DNA Sequence Polymorphism and Divergence. Mol Biol Evol 2018; 34:3006-3022. [PMID: 28962009 DOI: 10.1093/molbev/msx213] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 02/07/2023] Open
Abstract
Numerous approaches have been developed to infer natural selection based on the comparison of polymorphism within species and divergence between species. These methods are especially powerful for the detection of uniform selection operating across a gene. However, empirical analyses have demonstrated that regions of protein-coding genes exhibiting clusters of amino acid substitutions are subject to different levels of selection relative to other regions of the same gene. To quantify this heterogeneity of selection within coding sequences, we developed Model Averaged Site Selection via Poisson Random Field (MASS-PRF). MASS-PRF identifies an ensemble of intragenic clustering models for polymorphic and divergent sites. This ensemble of models is used within the Poisson Random Field framework to estimate selection intensity on a site-by-site basis. Using simulations, we demonstrate that MASS-PRF has high power to detect clusters of amino acid variants in small genic regions, can reliably estimate the probability of a variant occurring at each nucleotide site in sequence data and is robust to historical demographic trends and recombination. We applied MASS-PRF to human gene polymorphism derived from the 1,000 Genomes Project and divergence data from the common chimpanzee. On the basis of this analysis, we discovered striking regional variation in selection intensity, indicative of positive or negative selection, in well-defined domains of genes that have previously been associated with neurological processing, immunity, and reproduction. We suggest that amino acid-altering substitutions within these regions likely are or have been selectively advantageous in the human lineage, playing important roles in protein function.
Affiliation(s)
- Zi-Ming Zhao
- Department of Biostatistics, Yale University, New Haven, CT
- Michael C Campbell
- Department of Biostatistics, Yale University, New Haven, CT; Department of Biology, Howard University, Washington, DC
- Ning Li
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT
- Daniel S W Lee
- Department of Biostatistics, Yale University, New Haven, CT
- Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Jeffrey P Townsend
- Department of Biostatistics, Yale University, New Haven, CT; Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT
19
Zhang W, Westerman E, Nitzany E, Palmer S, Kronforst MR. Tracing the origin and evolution of supergene mimicry in butterflies. Nat Commun 2017; 8:1269. [PMID: 29116078 PMCID: PMC5677128 DOI: 10.1038/s41467-017-01370-1] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 04/12/2017] [Accepted: 09/12/2017] [Indexed: 12/30/2022] Open
Abstract
Supergene mimicry is a striking phenomenon but we know little about the evolution of this trait in any species. Here, by studying genomes of butterflies from a recent radiation in which supergene mimicry has been isolated to the gene doublesex, we show that sexually dimorphic mimicry and female-limited polymorphism are evolutionarily related as a result of ancient balancing selection combined with independent origins of similar morphs in different lineages and secondary loss of polymorphism in other lineages. Evolutionary loss of polymorphism appears to have resulted from an interaction between natural selection and genetic drift. Furthermore, molecular evolution of the supergene is dominated not by adaptive protein evolution or balancing selection, but by extensive hitchhiking of linked variants on the mimetic dsx haplotype that occurred at the origin of mimicry. Our results suggest that chance events have played important and possibly opposing roles throughout the history of this classic example of adaptation.
Affiliation(s)
- Wei Zhang
- Department of Ecology & Evolution, University of Chicago, Chicago, IL, 60637, USA
- Erica Westerman
- Department of Ecology & Evolution, University of Chicago, Chicago, IL, 60637, USA
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, 72701, USA
- Eyal Nitzany
- Department of Organismal Biology and Anatomy, University of Chicago, Chicago, IL, 60637, USA
- Stephanie Palmer
- Department of Organismal Biology and Anatomy, University of Chicago, Chicago, IL, 60637, USA
- Marcus R Kronforst
- Department of Ecology & Evolution, University of Chicago, Chicago, IL, 60637, USA
20
Arenas M, Araujo NM, Branco C, Castelhano N, Castro-Nallar E, Pérez-Losada M. Mutation and recombination in pathogen evolution: Relevance, methods and controversies. Infect Genet Evol 2017; 63:295-306. [PMID: 28951202 DOI: 10.1016/j.meegid.2017.09.029] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 07/26/2017] [Revised: 09/20/2017] [Accepted: 09/21/2017] [Indexed: 02/06/2023]
Abstract
Mutation and recombination drive the evolution of most pathogens by generating the genetic variants upon which selection operates. Those variants can, for example, confer resistance to host immune systems and drug therapies or lead to epidemic outbreaks. Given their importance, diverse evolutionary studies have investigated the abundance and consequences of mutation and recombination in pathogen populations. However, some controversies persist regarding the contribution of each evolutionary force to the development of particular phenotypic observations (e.g., drug resistance). In this study, we review the importance of mutation and recombination in the evolution of pathogens at both intra-host and inter-host levels. We also describe state-of-the-art analytical methodologies to detect and quantify these two evolutionary forces, including biases that are often ignored in evolutionary studies. Finally, we present some of our previous studies involving pathogenic taxa where mutation and recombination played crucial roles in the recovery of pathogenic fitness, the generation of interspecific genetic diversity, or the design of centralized vaccines. This review also illustrates several common controversies and pitfalls in the analysis, evaluation, and interpretation of mutation and recombination outcomes.
Affiliation(s)
- Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain; Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal; Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
- Natalia M Araujo
- Laboratory of Molecular Virology, Oswaldo Cruz Institute, FIOCRUZ, Rio de Janeiro, Brazil
- Catarina Branco
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal; Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
- Nadine Castelhano
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal; Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
- Eduardo Castro-Nallar
- Universidad Andrés Bello, Center for Bioinformatics and Integrative Biology, Facultad de Ciencias Biológicas, Santiago, Chile
- Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Ashburn, VA 20147, Washington, DC, United States; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão 4485-661, Portugal
21
Abstract
Bacteria can exchange and acquire new genetic material from other organisms directly and via the environment. This process, known as bacterial recombination, has a strong impact on the evolution of bacteria, for example, leading to the spread of antibiotic resistance across clades and species, and to the avoidance of clonal interference. Recombination hinders phylogenetic and transmission inference because it creates patterns of substitutions (homoplasies) inconsistent with the hypothesis of a single evolutionary tree. Bacterial recombination is typically modeled as statistically akin to gene conversion in eukaryotes, i.e., using the coalescent with gene conversion (CGC). However, this model can be very computationally demanding as it needs to account for the correlations of evolutionary histories of even distant loci. So, with the increasing popularity of whole genome sequencing, the need has emerged for a faster approach to model and simulate bacterial genome evolution. We present a new model that approximates the coalescent with gene conversion: the bacterial sequential Markov coalescent (BSMC). Our approach is based on a similar idea to the sequential Markov coalescent (SMC), an approximation of the coalescent with crossover recombination. However, bacterial recombination poses hurdles to a sequential Markov approximation, as it leads to strong correlations and linkage disequilibrium across very distant sites in the genome. Our BSMC overcomes these difficulties, and shows a considerable reduction in computational demand compared to the exact CGC, and very similar patterns in simulated data. We implemented our BSMC model within new simulation software FastSimBac. In addition to the decreased computational demand compared to previous bacterial genome evolution simulators, FastSimBac provides more general options for evolutionary scenarios, allowing population structure with migration, speciation, population size changes, and recombination hotspots. FastSimBac is available from https://bitbucket.org/nicofmay/fastsimbac, and is distributed as open source under the terms of the GNU General Public License. Lastly, we use the BSMC within an Approximate Bayesian Computation (ABC) inference scheme, and suggest that parameters simulated under the exact CGC can correctly be recovered, further showcasing the accuracy of the BSMC. With this ABC we infer recombination rate, mutation rate, and recombination tract length of Bacillus cereus from a whole genome alignment.
Affiliation(s)
- Nicola De Maio
- Institute for Emerging Infections, Oxford Martin School, University of Oxford, Oxford, OX1 3PA, United Kingdom
- Nuffield Department of Medicine, University of Oxford, Oxford, OX1 3PA, United Kingdom
- Daniel J Wilson
- Institute for Emerging Infections, Oxford Martin School, University of Oxford, Oxford, OX1 3PA, United Kingdom
- Nuffield Department of Medicine, University of Oxford, Oxford, OX1 3PA, United Kingdom
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX1 3PA, United Kingdom
22
Goodwin ZA, de Guzman Strong C. Recent Positive Selection in Genes of the Mammalian Epidermal Differentiation Complex Locus. Front Genet 2017; 7:227. [PMID: 28119736 PMCID: PMC5222828 DOI: 10.3389/fgene.2016.00227] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 11/04/2016] [Accepted: 12/27/2016] [Indexed: 12/27/2022] Open
Abstract
The epidermal differentiation complex (EDC) is the most rapidly evolving locus in the human genome compared to that of the chimpanzee. Yet the EDC genes that are undergoing positive selection across mammals and in humans are not known. We sought to identify the positively selected genetic variants and determine the evolutionary events of the EDC using mammalian-wide and clade-specific branch- and branch-site likelihood ratio tests and a genetic algorithm (GA) branch test. Significant non-synonymous substitutions were found in filaggrin, SPRR4, LELP1, and S100A2 genes across 14 mammals. By contrast, we identified recent positive selection in SPRR4 in primates. Additionally, the GA branch test discovered lineage-specific evolution for distinct EDC genes occurring in each of the nodes in the 14-mammal phylogenetic tree. Multiple instances of positive selection for FLG, TCHHL1, SPRR4, LELP1, and S100A2 were noted among the primate branch nodes. Branch-site likelihood ratio tests further revealed positive selection in specific sites in SPRR4, LELP1, filaggrin, and repetin across 14 mammals. However, in addition to continuous evolution of SPRR4, site-specific positive selection was also found in S100A11, KPRP, SPRR1A, S100A7L2, and S100A3 in primates and filaggrin, filaggrin2, and S100A8 in great apes. Very recent human positive selection was identified in the filaggrin2 L41 site that was present in Neanderthal. Together, our results identifying recent positive selection in distinct EDC genes reveal an underappreciated evolution of epidermal skin barrier function in primates and humans.
Affiliation(s)
- Zane A Goodwin
- Division of Dermatology, Department of Internal Medicine, Center for Pharmacogenomics and Center for the Study of Itch, Washington University School of Medicine, St. Louis MO, USA
- Cristina de Guzman Strong
- Division of Dermatology, Department of Internal Medicine, Center for Pharmacogenomics and Center for the Study of Itch, Washington University School of Medicine, St. Louis MO, USA
23
Chi PB, Chattopadhyay S, Lemey P, Sokurenko EV, Minin VN. Synonymous and nonsynonymous distances help untangle convergent evolution and recombination. Stat Appl Genet Mol Biol 2016; 14:375-89. [PMID: 26061623 DOI: 10.1515/sagmb-2014-0078] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/29/2023]
Abstract
When estimating a phylogeny from a multiple sequence alignment, researchers often assume the absence of recombination. However, if recombination is present, then tree estimation and all downstream analyses will be impacted, because different segments of the sequence alignment support different phylogenies. Similarly, convergent selective pressures at the molecular level can also lead to phylogenetic tree incongruence across the sequence alignment. Current methods for detection of phylogenetic incongruence are not equipped to distinguish between these two different mechanisms and assume that the incongruence is a result of recombination or other horizontal transfer of genetic information. We propose a new recombination detection method that can make this distinction, based on synonymous codon substitution distances. Although some power is lost by discarding the information contained in the nonsynonymous substitutions, our new method has lower false positive probabilities than the comparable recombination detection method when the phylogenetic incongruence signal is due to convergent evolution. We apply our method to three empirical examples, where we analyze: (1) sequences from a transmission network of the human immunodeficiency virus, (2) tlpB gene sequences from a geographically diverse set of 38 Helicobacter pylori strains, and (3) hepatitis C virus sequences sampled longitudinally from one patient.
24
Kelleher J, Etheridge AM, McVean G. Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes. PLoS Comput Biol 2016; 12:e1004842. [PMID: 27145223 PMCID: PMC4856371 DOI: 10.1371/journal.pcbi.1004842] [Citation(s) in RCA: 328] [Impact Index Per Article: 41.0] [Received: 12/10/2015] [Accepted: 03/02/2016] [Indexed: 01/23/2023] Open
Abstract
A central challenge in the analysis of genetic variation is to provide realistic genome simulation across millions of samples. Present day coalescent simulations do not scale well, or use approximations that fail to capture important long-range linkage properties. Analysing the results of simulations also presents a substantial challenge, as current methods to store genealogies consume a great deal of space, are slow to parse and do not take advantage of shared structure in correlated trees. We solve these problems by introducing sparse trees and coalescence records as the key units of genealogical analysis. Using these tools, exact simulation of the coalescent with recombination for chromosome-sized regions over hundreds of thousands of samples is possible, and substantially faster than present-day approximate methods. We can also analyse the results orders of magnitude more quickly than with existing methods. Our understanding of the distribution of genetic variation in natural populations has been driven by mathematical models of the underlying biological and demographic processes. A key strength of such coalescent models is that they enable efficient simulation of data we might see under a variety of evolutionary scenarios. However, current methods are not well suited to simulating genome-scale data sets on hundreds of thousands of samples, which is essential if we are to understand the data generated by population-scale sequencing projects. Similarly, processing the results of large simulations also presents researchers with a major challenge, as it can take many days just to read the data files. In this paper we solve these problems by introducing a new way to represent information about the ancestral process. This new representation leads to huge gains in simulation speed and storage efficiency so that large simulations complete in minutes and the output files can be processed in seconds.
Affiliation(s)
- Jerome Kelleher
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Gilean McVean
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
25
Coalescent Inference Using Serially Sampled, High-Throughput Sequencing Data from Intrahost HIV Infection. Genetics 2016; 202:1449-72. [PMID: 26857628 DOI: 10.1534/genetics.115.177931] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 06/05/2015] [Accepted: 01/31/2016] [Indexed: 01/11/2023] Open
Abstract
Human immunodeficiency virus (HIV) is a rapidly evolving pathogen that causes chronic infections, so genetic diversity within a single infection can be very high. High-throughput "deep" sequencing can now measure this diversity in unprecedented detail, particularly since it can be performed at different time points during an infection, and this offers a potentially powerful way to infer the evolutionary dynamics of the intrahost viral population. However, population genomic inference from HIV sequence data is challenging because of high rates of mutation and recombination, rapid demographic changes, and ongoing selective pressures. In this article we develop a new method for inference using HIV deep sequencing data, using an approach based on importance sampling of ancestral recombination graphs under a multilocus coalescent model. The approach further extends recent progress in the approximation of so-called conditional sampling distributions, a quantity of key interest when approximating coalescent likelihoods. The chief novelties of our method are that it is able to infer rates of recombination and mutation, as well as the effective population size, while handling sampling over different time points and missing data without extra computational difficulty. We apply our method to a data set of HIV-1, in which several hundred sequences were obtained from an infected individual at seven time points over 2 years. We find mutation rate and effective population size estimates to be comparable to those produced by the software BEAST. Additionally, our method is able to produce local recombination rate estimates. The software underlying our method, Coalescenator, is freely available.
26
Arenas M. Trends in substitution models of molecular evolution. Front Genet 2015; 6:319. [PMID: 26579193 PMCID: PMC4620419 DOI: 10.3389/fgene.2015.00319] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Received: 07/27/2015] [Accepted: 10/09/2015] [Indexed: 11/13/2022] Open
Abstract
Substitution models of evolution describe the process of genetic variation through fixed mutations and constitute the basis of evolutionary analysis at the molecular level. Almost 40 years after the development of the first substitution models, highly sophisticated and data-specific substitution models continue to emerge with the aim of better mimicking real evolutionary processes. Here I describe current trends in substitution models of DNA, codon and amino acid sequence evolution, including advantages and pitfalls of the most popular models. The perspective concludes that despite the large number of currently available substitution models, further research is required for more realistic modeling, especially for DNA coding and amino acid data. Additionally, the development of more accurate complex models should be coupled with new implementations and improvements of methods and frameworks for substitution model selection and downstream evolutionary analysis.
Affiliation(s)
- Miguel Arenas
- Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
27
Arenas M, Lorenzo-Redondo R, Lopez-Galindez C. Influence of mutation and recombination on HIV-1 in vitro fitness recovery. Mol Phylogenet Evol 2015; 94:264-70. [PMID: 26358613 DOI: 10.1016/j.ympev.2015.09.001] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 03/31/2015] [Revised: 08/31/2015] [Accepted: 09/01/2015] [Indexed: 10/23/2022]
Abstract
Understanding the evolutionary processes underlying HIV-1 fitness recovery is fundamental for HIV-1 pathogenesis, antiretroviral treatment and vaccine design. It is known that HIV-1 can present very high mutation and recombination rates; however, the specific contributions of these evolutionary forces to in vitro viral fitness recovery have not been simultaneously quantified. To this end, we analyzed substitution, recombination and molecular adaptation rates in a variety of HIV-1 biological clones derived from a viral isolate after severe population bottlenecks and a number of large population cell culture passages. These clones presented an overall but uneven fitness gain (mean of 3-fold) with respect to the initial passage values. We found a significant relationship between the fitness increase and the appearance and fixation of mutations. In addition, these fixed mutations presented molecular signatures of positive selection through the accumulation of non-synonymous substitutions. Interestingly, viral recombination correlated with fitness recovery in most of the studied viral quasispecies. The genetic diversity generated by these evolutionary processes was positively correlated with the viral fitness. We conclude that HIV-1 fitness recovery can be derived from the genetic heterogeneity generated through both mutation and recombination, and under diversifying molecular adaptation. The findings also suggest nonrandom evolutionary pathways for in vitro fitness recovery.
Affiliation(s)
- Miguel Arenas
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal; Centre for Molecular Biology "Severo Ochoa", Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain.
- Ramon Lorenzo-Redondo
- Centro Nacional de Microbiología (CNM), Instituto de Salud Carlos III, Majadahonda, Madrid, Spain.
- Cecilio Lopez-Galindez
- Centro Nacional de Microbiología (CNM), Instituto de Salud Carlos III, Majadahonda, Madrid, Spain.
28
Ahn I, Jang JH, Kim HY, Lee JH, Son HS. A Visualization Tool for Calculating the Genetic Substitution Patterns Between Two Different Groups. Evol Bioinform Online 2015; 11:179-83. [PMID: 26279617 PMCID: PMC4517834 DOI: 10.4137/ebo.s28844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2015] [Revised: 06/14/2015] [Accepted: 06/22/2015] [Indexed: 12/03/2022] Open
Abstract
We developed the simulation tool for influenza virus variation (SimFluVar), analytics software for calculating genomic variation among members of the influenza virus group. This study is related to computational evolutionary biology and evolutionary bioinformatics. SimFluVar is an analytical tool that can be used to calculate codon substitution patterns of viral genes. Designed to compare a large number of nucleotide sequences, SimFluVar provides precise patterns of codon variations between two viral groups, especially for the influenza virus. SimFluVar also provides useful functions, such as editing and visualization of the result matrix. This new tool can be used to analyze codon variation patterns over time as well as to analyze the genomic differences between viruses obtained from different geographical locations. SimFluVar is developed in C++, and Java RCP is used as a distribution package. SimFluVar, including the associated documentation, manuals, and examples, is publicly available at http://lcbb.snu.ac.kr/simfluvar.
Affiliation(s)
- Insung Ahn
- Biomedical Prediction Technology Laboratory, Korea Institute of Science and Technology Information, Yuseong-gu, Daejeon, Republic of Korea
- Jin-Hwa Jang
- Biomedical Prediction Technology Laboratory, Korea Institute of Science and Technology Information, Yuseong-gu, Daejeon, Republic of Korea; Laboratory of Computational Biology and Bioinformatics, Institute of Health and Environment, Graduate School of Public Health, Seoul National University, Gwanak-gu, Seoul, Republic of Korea
- Ha-Yeon Kim
- Laboratory of Computational Biology and Bioinformatics, Institute of Health and Environment, Graduate School of Public Health, Seoul National University, Gwanak-gu, Seoul, Republic of Korea
- Ji-Hae Lee
- Laboratory of Computational Biology and Bioinformatics, Institute of Health and Environment, Graduate School of Public Health, Seoul National University, Gwanak-gu, Seoul, Republic of Korea; Graduate Program in Bioinformatics, College of Natural Science, Seoul National University, Gwanak-gu, Seoul, Republic of Korea
- Hyeon Seok Son
- Laboratory of Computational Biology and Bioinformatics, Institute of Health and Environment, Graduate School of Public Health, Seoul National University, Gwanak-gu, Seoul, Republic of Korea; Graduate Program in Bioinformatics, College of Natural Science, Seoul National University, Gwanak-gu, Seoul, Republic of Korea
29
Jouet A, McMullan M, van Oosterhout C. The effects of recombination, mutation and selection on the evolution of the Rp1 resistance genes in grasses. Mol Ecol 2015; 24:3077-92. [PMID: 25907026 DOI: 10.1111/mec.13213] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/09/2014] [Revised: 03/25/2015] [Accepted: 04/09/2015] [Indexed: 01/30/2023]
Abstract
Plant immune genes, or resistance genes, are involved in a co-evolutionary arms race with a diverse range of pathogens. In agronomically important grasses, such R genes have been extensively studied because of their role in pathogen resistance and in the breeding of resistant cultivars. In this study, we evaluate the importance of recombination, mutation and selection on the evolution of the R gene complex Rp1 of Sorghum, Triticum, Brachypodium, Oryza and Zea. Analyses show that recombination is widespread, and we detected 73 independent instances of sequence exchange, involving on average 1567 of 4692 nucleotides analysed (33.4%). We were able to date 24 interspecific recombination events and found that four occurred postspeciation, which suggests that genetic introgression took place between different grass species. Other interspecific events seemed to have been maintained over long evolutionary time, suggesting the presence of balancing selection. Significant positive selection (i.e. a relative excess of nonsynonymous substitutions, dN/dS > 1) was detected in 17-95 codons (0.42-2.02%). Recombination was significantly associated with areas with high levels of polymorphism but not with an elevated dN/dS ratio. Finally, phylogenetic analyses show that recombination results in a general overestimation of the divergence time (mean = 14.3%) and an alteration of the gene tree topology if the tree is not calibrated. Given that the statistical power to detect recombination is determined by the level of polymorphism of the amplicon as well as the number of sequences analysed, it is likely that many studies have underestimated the importance of recombination relative to the mutation rate.
Affiliation(s)
- Agathe Jouet
- School of Environmental Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK; The Sainsbury Laboratory, Norwich Research Park, Norwich, NR4 7UH, UK
- Mark McMullan
- The Genome Analysis Center, Norwich Research Park, Norwich, NR4 7TJ, UK
- Cock van Oosterhout
- School of Environmental Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
30
Pérez-Losada M, Arenas M, Galán JC, Palero F, González-Candelas F. Recombination in viruses: mechanisms, methods of study, and evolutionary consequences. Infect Genet Evol 2015; 30:296-307. [PMID: 25541518 PMCID: PMC7106159 DOI: 10.1016/j.meegid.2014.12.022] [Citation(s) in RCA: 198] [Impact Index Per Article: 22.0] [Received: 11/23/2014] [Revised: 12/15/2014] [Accepted: 12/17/2014] [Indexed: 02/08/2023]
Abstract
Recombination is a pervasive process generating diversity in most viruses. It joins variants that arise independently within the same molecule, creating new opportunities for viruses to overcome selective pressures and to adapt to new environments and hosts. Consequently, the analysis of viral recombination attracts the interest of clinicians, epidemiologists, molecular biologists and evolutionary biologists. In this review we present an overview of three major areas related to viral recombination: (i) the molecular mechanisms that underlie recombination in model viruses, including DNA-viruses (Herpesvirus) and RNA-viruses (Human Influenza Virus and Human Immunodeficiency Virus), (ii) the analytical procedures to detect recombination in viral sequences and to determine the recombination breakpoints, along with the conceptual and methodological tools currently used and a brief overview of the impact of new sequencing technologies on the detection of recombination, and (iii) the major areas in the evolutionary analysis of viral populations on which recombination has an impact. These include the evaluation of selective pressures acting on viral populations, the application of evolutionary reconstructions in the characterization of centralized genes for vaccine design, and the evaluation of linkage disequilibrium and population structure.
Affiliation(s)
- Marcos Pérez-Losada
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Portugal; Computational Biology Institute, George Washington University, Ashburn, VA 20147, USA
- Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa", Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
- Juan Carlos Galán
- Servicio de Microbiología, Hospital Ramón y Cajal and Instituto Ramón y Cajal de Investigación Sanitaria (IRYCIS), Madrid, Spain; CIBER en Epidemiología y Salud Pública, Spain
- Ferran Palero
- CIBER en Epidemiología y Salud Pública, Spain; Unidad Mixta Infección y Salud Pública, FISABIO-Universitat de València, Valencia, Spain
- Fernando González-Candelas
- CIBER en Epidemiología y Salud Pública, Spain; Unidad Mixta Infección y Salud Pública, FISABIO-Universitat de València, Valencia, Spain
31
Arenas M, Lopes JS, Beaumont MA, Posada D. CodABC: a computational framework to coestimate recombination, substitution, and molecular adaptation rates by approximate Bayesian computation. Mol Biol Evol 2015; 32:1109-12. [PMID: 25577191 PMCID: PMC4379410 DOI: 10.1093/molbev/msu411] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 01/05/2023] Open
Abstract
The estimation of substitution and recombination rates can provide important insights into the molecular evolution of protein-coding sequences. Here, we present a new computational framework, called "CodABC," to jointly estimate recombination, substitution and synonymous and nonsynonymous rates from coding data. CodABC uses approximate Bayesian computation with and without regression adjustment and implements a variety of codon models, intracodon recombination, and longitudinal sampling. CodABC can provide accurate joint parameter estimates from recombining coding sequences, often outperforming maximum-likelihood methods based on more approximate models. In addition, CodABC allows for the inclusion of several nuisance parameters such as those representing codon frequencies, transition matrices, heterogeneity across sites or invariable sites. CodABC is freely available from http://code.google.com/p/codabc/, includes a GUI, extensive documentation and ready-to-use examples, and can run in parallel on multicore machines.
Affiliation(s)
- Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa," Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain; Departamento de Bioquímica, Genética e Inmunología, Universidad de Vigo, Vigo, Spain
- Joao S Lopes
- Instituto Gulbenkian de Ciencia, Oeiras, Portugal
- Mark A Beaumont
- School of Mathematical Sciences and School of Biological Sciences, University of Bristol, University Walk, Bristol, United Kingdom
- David Posada
- Departamento de Bioquímica, Genética e Inmunología, Universidad de Vigo, Vigo, Spain
32
Inouye M, Dashnow H, Raven LA, Schultz MB, Pope BJ, Tomita T, Zobel J, Holt KE. SRST2: Rapid genomic surveillance for public health and hospital microbiology labs. Genome Med 2014. [PMID: 25422674 DOI: 10.1186/s13073-014-0090-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
Abstract
Rapid molecular typing of bacterial pathogens is critical for public health epidemiology, surveillance and infection control, yet routine use of whole genome sequencing (WGS) for these purposes poses significant challenges. Here we present SRST2, a read mapping-based tool for fast and accurate detection of genes, alleles and multi-locus sequence types (MLST) from WGS data. Using >900 genomes from common pathogens, we show SRST2 is highly accurate and outperforms assembly-based methods in terms of both gene detection and allele assignment. We include validation of SRST2 within a public health laboratory, and demonstrate its use for microbial genome surveillance in the hospital setting. In the face of rising threats of antimicrobial resistance and emerging virulence among bacterial pathogens, SRST2 represents a powerful tool for rapidly extracting clinically useful information from raw WGS data. Source code is available from http://katholt.github.io/srst2/.
Affiliation(s)
- Michael Inouye
- Medical Systems Biology, Department of Pathology, The University of Melbourne, Parkville, Victoria Australia; Department of Microbiology and Immunology, The University of Melbourne, Parkville, Victoria Australia
- Harriet Dashnow
- Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria 3010 Australia; Victorian Life Sciences Computation Initiative, The University of Melbourne, 187 Grattan Street Carlton, Melbourne, Victoria Australia
- Lesley-Ann Raven
- Medical Systems Biology, Department of Pathology, The University of Melbourne, Parkville, Victoria Australia
- Mark B Schultz
- Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria 3010 Australia
- Bernard J Pope
- Victorian Life Sciences Computation Initiative, The University of Melbourne, 187 Grattan Street Carlton, Melbourne, Victoria Australia; Department of Computing and Information Systems, The University of Melbourne, Parkville, Victoria Australia
- Takehiro Tomita
- Department of Microbiology and Immunology, The University of Melbourne, Parkville, Victoria Australia; Microbiological Diagnostic Unit, The University of Melbourne, Parkville, Victoria Australia
- Justin Zobel
- Department of Computing and Information Systems, The University of Melbourne, Parkville, Victoria Australia
- Kathryn E Holt
- Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria 3010 Australia
33
Inouye M, Dashnow H, Raven LA, Schultz MB, Pope BJ, Tomita T, Zobel J, Holt KE. SRST2: Rapid genomic surveillance for public health and hospital microbiology labs. Genome Med 2014; 6:90. [PMID: 25422674 PMCID: PMC4237778 DOI: 10.1186/s13073-014-0090-6] [Citation(s) in RCA: 707] [Impact Index Per Article: 70.7] [Received: 07/17/2014] [Accepted: 10/16/2014] [Indexed: 01/06/2023] Open
Abstract
Rapid molecular typing of bacterial pathogens is critical for public health epidemiology, surveillance and infection control, yet routine use of whole genome sequencing (WGS) for these purposes poses significant challenges. Here we present SRST2, a read mapping-based tool for fast and accurate detection of genes, alleles and multi-locus sequence types (MLST) from WGS data. Using >900 genomes from common pathogens, we show SRST2 is highly accurate and outperforms assembly-based methods in terms of both gene detection and allele assignment. We include validation of SRST2 within a public health laboratory, and demonstrate its use for microbial genome surveillance in the hospital setting. In the face of rising threats of antimicrobial resistance and emerging virulence among bacterial pathogens, SRST2 represents a powerful tool for rapidly extracting clinically useful information from raw WGS data. Source code is available from http://katholt.github.io/srst2/.
Affiliation(s)
- Michael Inouye
- Medical Systems Biology, Department of Pathology, The University of Melbourne, Parkville, Victoria Australia; Department of Microbiology and Immunology, The University of Melbourne, Parkville, Victoria Australia
- Harriet Dashnow
- Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria 3010 Australia; Victorian Life Sciences Computation Initiative, The University of Melbourne, 187 Grattan Street Carlton, Melbourne, Victoria Australia
- Lesley-Ann Raven
- Medical Systems Biology, Department of Pathology, The University of Melbourne, Parkville, Victoria Australia
- Mark B Schultz
- Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria 3010 Australia
- Bernard J Pope
- Victorian Life Sciences Computation Initiative, The University of Melbourne, 187 Grattan Street Carlton, Melbourne, Victoria Australia; Department of Computing and Information Systems, The University of Melbourne, Parkville, Victoria Australia
- Takehiro Tomita
- Department of Microbiology and Immunology, The University of Melbourne, Parkville, Victoria Australia; Microbiological Diagnostic Unit, The University of Melbourne, Parkville, Victoria Australia
- Justin Zobel
- Department of Computing and Information Systems, The University of Melbourne, Parkville, Victoria Australia
- Kathryn E Holt
- Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria 3010 Australia
34
Inferring phylogenies of evolving sequences without multiple sequence alignment. Sci Rep 2014; 4:6504. [PMID: 25266120 PMCID: PMC4179140 DOI: 10.1038/srep06504] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 03/06/2014] [Accepted: 09/10/2014] [Indexed: 12/25/2022] Open
Abstract
Alignment-free methods, in which shared properties of sub-sequences (e.g. identity or match length) are extracted and used to compute a distance matrix, have recently been explored for phylogenetic inference. However, the scalability and robustness of these methods to key evolutionary processes remain to be investigated. Here, using simulated sequence sets of various sizes in both nucleotides and amino acids, we systematically assess the accuracy of phylogenetic inference using an alignment-free approach, based on D2 statistics, under different evolutionary scenarios. We find that compared to a multiple sequence alignment approach, D2 methods are more robust against among-site rate heterogeneity, compositional biases, genetic rearrangements and insertions/deletions, but are more sensitive to recent sequence divergence and sequence truncation. Across diverse empirical datasets, the alignment-free methods perform well for sequences sharing low divergence, at greater computation speed. Our findings provide strong evidence for the scalability and the potential use of alignment-free methods in large-scale phylogenomics.
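As a rough illustration of the kind of alignment-free score the abstract refers to, the sketch below computes the basic D2 statistic, i.e. the sum over all k-mers of the product of their counts in two sequences. The function names and the choice of k are assumptions made for this example; the abstract does not specify which D2 variant or normalisation the study used, nor how scores were converted to distances.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a: str, seq_b: str, k: int) -> int:
    """Basic D2 score: sum over k-mers of count_a(w) * count_b(w).

    Illustrative only; published D2 variants add centring or
    normalisation steps that are not reproduced here.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for k-mers absent from seq_b.
    return sum(n * counts_b[w] for w, n in counts_a.items())
```

Higher scores indicate more shared k-mer content; a pairwise matrix of such scores would still need to be transformed into distances before tree building.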
35
Benguigui M, Arenas M. Spatial and temporal simulation of human evolution. Methods, frameworks and applications. Curr Genomics 2014; 15:245-55. [PMID: 25132795 PMCID: PMC4133948 DOI: 10.2174/1389202915666140506223639] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 03/14/2014] [Revised: 04/05/2014] [Accepted: 05/04/2014] [Indexed: 01/29/2023] Open
Abstract
Analyses of human evolution are fundamental to understanding the current gradients of human diversity. In this regard, genetic samples collected from current populations, together with archaeological data, are the most important resources for studying human evolution. However, they are often insufficient to properly evaluate a variety of evolutionary scenarios, leading to continuous debates and discussions. A commonly applied strategy is to use computer simulations based on evolutionary models that are as realistic as possible to evaluate alternative evolutionary scenarios through statistical correlations with the real data. Computer simulations can also be applied to estimate evolutionary parameters or to study the role of each parameter in the evolutionary process. Here we review the most commonly used methods and evolutionary frameworks for performing realistic spatially explicit computer simulations of human evolution. Although we focus on human evolution, most of the methods and software we describe can also be used to study other species. We also describe the importance of considering spatially explicit models to better mimic human evolutionary scenarios based on a variety of phenomena such as range expansions, range shifts, range contractions, sex-biased dispersal, long-distance dispersal or admixtures of populations. We finally discuss future implementations to improve current spatially explicit simulations and their derived applications in human evolution.
Affiliation(s)
- Macarena Benguigui
- Centre for Molecular Biology "Severo Ochoa", Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
- Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa", Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
36
Abstract
The tRNA adaptation index (tAI) is a widely used measure of the efficiency by which a coding sequence is recognized by the intra-cellular tRNA pool. This index includes, among others, weights that represent wobble interactions between codons and tRNA molecules. Currently, these weights are based only on gene expression in Saccharomyces cerevisiae. However, the efficiencies of the different codon–tRNA interactions are expected to vary among different organisms. In this study, we suggest a new approach for adjusting the tAI weights to any target model organism without the need for gene expression measurements. Our method is based on optimizing the correlation between the tAI and a measure of codon usage bias. Here, we show that in non-fungal organisms the new tAI weights predict protein abundance significantly better than the traditional tAI weights. The unique tRNA–codon adaptation weights computed for 100 different organisms exhibit a significant correlation with evolutionary distance. The reported results demonstrate the usefulness of the new measure in future genomic studies.
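For orientation, the tAI of a gene is conventionally computed as the geometric mean of per-codon adaptiveness weights. The sketch below shows only that final averaging step; the function name and the weight values in the usage note are hypothetical, and deriving the weights themselves (from tRNA gene copy numbers and wobble penalties, which is what the study re-optimises) is not attempted here.

```python
import math


def tai(coding_seq: str, weights: dict[str, float]) -> float:
    """Geometric mean of codon weights over a coding sequence.

    Assumes every weight is strictly positive; codons without a
    weight (for example stop codons) are skipped. Trailing bases
    that do not fill a whole codon are ignored.
    """
    usable = len(coding_seq) - len(coding_seq) % 3
    codons = [coding_seq[i:i + 3] for i in range(0, usable, 3)]
    scored = [weights[c] for c in codons if c in weights]
    if not scored:
        raise ValueError("no codon in the sequence has a weight")
    # Average in log space to avoid underflow on long genes.
    return math.exp(sum(math.log(w) for w in scored) / len(scored))
```

With the made-up weights `{"GCT": 1.0, "GCC": 0.25}`, the sequence `"GCTGCC"` scores the square root of 0.25, i.e. 0.5.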
Affiliation(s)
- Renana Sabi
- Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel
- Tamir Tuller
- Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel; The Sagol School of Neuroscience, Tel-Aviv University, Tel-Aviv, Israel
37
Bielejec F, Lemey P, Carvalho LM, Baele G, Rambaut A, Suchard MA. πBUSS: a parallel BEAST/BEAGLE utility for sequence simulation under complex evolutionary scenarios. BMC Bioinformatics 2014; 15:133. [PMID: 24885610 PMCID: PMC4020384 DOI: 10.1186/1471-2105-15-133] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 10/10/2013] [Accepted: 04/24/2014] [Indexed: 01/12/2023] Open
Abstract
Background: Simulated nucleotide or amino acid sequences are frequently used to assess the performance of phylogenetic reconstruction methods. BEAST, a Bayesian statistical framework that focuses on reconstructing time-calibrated molecular evolutionary processes, supports a wide array of evolutionary models, but lacked matching machinery for simulation of character evolution along phylogenies.
Results: We present a flexible Monte Carlo simulation tool, called πBUSS, that employs the BEAGLE high performance library for phylogenetic computations to rapidly generate large sequence alignments under complex evolutionary models. πBUSS sports a user-friendly graphical user interface (GUI) that allows combining a rich array of models across an arbitrary number of partitions. A command-line interface mirrors the options available through the GUI and facilitates scripting in large-scale simulation studies. πBUSS may serve as an easy-to-use, standard sequence simulation tool, but the available models and data types are particularly useful to assess the performance of complex BEAST inferences. The connection with BEAST is further strengthened through the use of a common extensible markup language (XML), allowing to specify also more advanced evolutionary models. To support simulation under the latter, as well as to support simulation and analysis in a single run, we also add the πBUSS core simulation routine to the list of BEAST XML parsers.
Conclusions: πBUSS offers a unique combination of flexibility and ease-of-use for sequence simulation under realistic evolutionary scenarios. Through different interfaces, πBUSS supports simulation studies ranging from modest endeavors for illustrative purposes to complex and large-scale assessments of evolutionary inference procedures. Applications are not restricted to the BEAST framework, or even time-measured evolutionary histories, and πBUSS can be connected to various other programs using standard input and output format.
Affiliation(s)
- Filip Bielejec
- Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium.
38
Arenas M, Posada D. Simulation of genome-wide evolution under heterogeneous substitution models and complex multispecies coalescent histories. Mol Biol Evol 2014; 31:1295-301. [PMID: 24557445 PMCID: PMC3995339 DOI: 10.1093/molbev/msu078] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 01/28/2023] Open
Abstract
Genomic evolution can be highly heterogeneous. Here, we introduce a new framework to simulate genome-wide sequence evolution under a variety of substitution models that may change along the genome and the phylogeny, following complex multispecies coalescent histories that can include recombination, demographics, longitudinal sampling, population subdivision/species history, and migration. A key aspect of our simulation strategy is that the heterogeneity of the whole evolutionary process can be parameterized according to statistical prior distributions specified by the user. We used this framework to carry out a study of the impact of variable codon frequencies across genomic regions on the estimation of the genome-wide nonsynonymous/synonymous ratio. We found that both variable codon frequencies across genes and rate variation among sites and regions can lead to severe underestimation of the global dN/dS values. The program SGWE—Simulation of Genome-Wide Evolution—is freely available from http://code.google.com/p/sgwe-project/, including extensive documentation and detailed examples.
Affiliation(s)
- Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa," Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
39
Lacerda M, Moore PL, Ngandu NK, Seaman M, Gray ES, Murrell B, Krishnamoorthy M, Nonyane M, Madiga M, Wibmer CK, Sheward D, Bailer RT, Gao H, Greene KM, Karim SSA, Mascola JR, Korber BTM, Montefiori DC, Morris L, Williamson C, Seoighe C. Identification of broadly neutralizing antibody epitopes in the HIV-1 envelope glycoprotein using evolutionary models. Virol J 2013; 10:347. [PMID: 24295501 PMCID: PMC4220805 DOI: 10.1186/1743-422x-10-347] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 07/03/2013] [Accepted: 11/21/2013] [Indexed: 11/19/2022] Open
Abstract
Background Identification of the epitopes targeted by antibodies that can neutralize diverse HIV-1 strains can provide important clues for the design of a preventative vaccine. Methods We have developed a computational approach that can identify key amino acids within the HIV-1 envelope glycoprotein that influence sensitivity to broadly cross-neutralizing antibodies. Given a sequence alignment and neutralization titers for a panel of viruses, the method works by fitting a phylogenetic model that allows the amino acid frequencies at each site to depend on neutralization sensitivities. Sites at which viral evolution influences neutralization sensitivity were identified using Bayes factors (BFs) to compare the fit of this model to that of a null model in which sequences evolved independently of antibody sensitivity. Conformational epitopes were identified with a Metropolis algorithm that searched for a cluster of sites with large Bayes factors on the tertiary structure of the viral envelope. Results We applied our method to ID50 neutralization data generated from seven HIV-1 subtype C serum samples with neutralization breadth that had been tested against a multi-clade panel of 225 pseudoviruses for which envelope sequences were also available. For each sample, between two and four sites were identified that were strongly associated with neutralization sensitivity (2ln(BF) > 6), a subset of which were experimentally confirmed using site-directed mutagenesis. Conclusions Our results provide strong support for the use of evolutionary models applied to cross-sectional viral neutralization data to identify the epitopes of serum antibodies that confer neutralization breadth.
Affiliation(s)
- Cathal Seoighe
- School of Mathematics, Statistics and Applied Mathematics, National University of Ireland Galway, Galway, Ireland.
40
Coestimation of recombination, substitution and molecular adaptation rates by approximate Bayesian computation. Heredity (Edinb) 2013; 112:255-64. [PMID: 24149652 DOI: 10.1038/hdy.2013.101] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 04/18/2013] [Revised: 08/22/2013] [Accepted: 09/17/2013] [Indexed: 11/08/2022] Open
Abstract
The estimation of parameters in molecular evolution may be biased when some processes are not considered. For example, the estimation of selection at the molecular level using codon-substitution models can have an upward bias when recombination is ignored. Here we address the joint estimation of recombination, molecular adaptation and substitution rates from coding sequences using approximate Bayesian computation (ABC). We describe the implementation of a regression-based strategy for choosing subsets of summary statistics for coding data, and show that this approach can accurately infer recombination allowing for intracodon recombination breakpoints, molecular adaptation and codon substitution rates. We demonstrate that our ABC approach can outperform other analytical methods under a variety of evolutionary scenarios. We also show that although the choice of the codon-substitution model is important, our inferences are robust to a moderate degree of model misspecification. In addition, we demonstrate that our approach can accurately choose the evolutionary model that best fits the data, providing an alternative for when the use of full-likelihood methods is impracticable. Finally, we applied our ABC method to co-estimate recombination, substitution and molecular adaptation rates from 24 published human immunodeficiency virus 1 coding data sets.
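As a rough illustration of the ABC idea summarized in this abstract, the following Python sketch implements plain rejection ABC. It is not the regression-based summary-statistic selection described in the abstract: the toy simulator (Poisson counts), the uniform prior, the single summary statistic (the sample mean), and every function and parameter name are assumptions chosen only to keep the example small.

```python
import math
import random


def simulate(rate, n, rng):
    """Toy simulator (an assumption): draw n Poisson(rate) counts (Knuth's method)."""
    limit = math.exp(-rate)
    counts = []
    for _ in range(n):
        k, p = 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                break
            k += 1
        counts.append(k)
    return counts


def summary(data):
    """Summary statistic (an assumption): the sample mean."""
    return sum(data) / len(data)


def abc_rejection(observed, prior_low, prior_high, n_draws, tolerance, seed=0):
    """Keep prior draws whose simulated summary lies within `tolerance` of the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(prior_low, prior_high)  # draw a parameter from the prior
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= tolerance:  # accept if simulation resembles the data
            accepted.append(theta)
    return accepted


# Hypothetical observed counts; the accepted draws approximate the posterior of the rate.
observed = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]
posterior = abc_rejection(observed, 0.1, 10.0, n_draws=2000, tolerance=0.3)
```

The accepted values form an approximate posterior sample. A smaller tolerance tightens the approximation but accepts fewer draws.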
41
Arenas M. The importance and application of the ancestral recombination graph. Front Genet 2013; 4:206. [PMID: 24133504 PMCID: PMC3796270 DOI: 10.3389/fgene.2013.00206] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 08/28/2013] [Accepted: 09/24/2013] [Indexed: 11/13/2022] Open
Affiliation(s)
- Miguel Arenas
- Centre for Molecular Biology “Severo Ochoa,” Consejo Superior de Investigaciones Científicas, Universidad Autónoma de Madrid, Madrid, Spain
42
Arenas M, Dos Santos HG, Posada D, Bastolla U. Protein evolution along phylogenetic histories under structurally constrained substitution models. Bioinformatics 2013; 29:3020-8. [PMID: 24037213 DOI: 10.1093/bioinformatics/btt530] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Indexed: 11/12/2022]
Abstract
MOTIVATION Models of molecular evolution aim at describing the evolutionary processes at the molecular level. However, current models rarely incorporate information from protein structure. Conversely, structure-based models of protein evolution have not been commonly applied to simulate sequence evolution in a phylogenetic framework, and they often ignore relevant evolutionary processes such as recombination. A simulation evolutionary framework that integrates substitution models that account for protein structure stability should be able to generate more realistic in silico evolved proteins for a variety of purposes. RESULTS We developed a method to simulate protein evolution that combines models of protein folding stability, such that the fitness depends on the stability of the native state both with respect to unfolding and misfolding, with phylogenetic histories that can be either specified by the user or simulated with the coalescent under complex evolutionary scenarios, including recombination, demographics and migration. We have implemented this framework in a computer program called ProteinEvolver. Remarkably, comparing these models with empirical amino acid replacement models, we found that the former produce amino acid distributions closer to distributions observed in real protein families, and proteins that are predicted to be more stable. Therefore, we conclude that evolutionary models that consider protein stability and realistic evolutionary histories constitute a better approximation of the real evolutionary process.
Affiliation(s)
- Miguel Arenas
- Centre for Molecular Biology 'Severo Ochoa', Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain and Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
43
Phylogeny, spatio-temporal phylodynamics and evolutionary scenario of Torque teno sus virus 1 (TTSuV1) and 2 (TTSuV2) in wild boars: Fast dispersal and high genetic diversity. Vet Microbiol 2013; 166:200-13. [DOI: 10.1016/j.vetmic.2013.06.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 01/30/2013] [Revised: 05/29/2013] [Accepted: 06/10/2013] [Indexed: 01/09/2023]
44
Abstract
Empirical proof of human mitochondrial DNA (mtDNA) recombination in somatic tissues was obtained in 2004; however, a lack of irrefutable evidence exists for recombination in human mtDNA at the population level. Our inability to demonstrate convincingly a signal of recombination in population data sets of human mtDNA sequence may be due, in part, to the ineffectiveness of current indirect tests. Previously, we tested some well-established indirect tests of recombination (linkage disequilibrium vs. distance using D' and r², Homoplasy Test, Pairwise Homoplasy Index, Neighborhood Similarity Score, and Max χ²) on sequence data derived from the only empirically confirmed case of human mtDNA recombination thus far and demonstrated that some methods were unable to detect recombination. Here, we assess the performance of these six well-established tests and explore what characteristics specific to human mtDNA sequence may affect their efficacy by simulating sequence under various parameters with levels of recombination (ρ) that vary around an empirically derived estimate for human mtDNA (population parameter ρ = 5.492). No test performed infallibly under any of our scenarios, and error rates varied across tests, whereas detection rates increased substantially with ρ values > 5.492. Under a model of evolution that incorporates parameters specific to human mtDNA, including rate heterogeneity, population expansion, and ρ = 5.492, successful detection rates are limited to a range of 7-70% across tests with an acceptable level of false-positive results: the neighborhood similarity score incompatibility test performed best overall under these parameters. Population growth seems to have the greatest impact on recombination detection probabilities across all models tested, likely due to its impact on sequence diversity. The implications of our findings on our current understanding of mtDNA recombination in humans are discussed.
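For readers unfamiliar with the two linkage disequilibrium statistics named in this abstract, the sketch below computes |D'| and r² for a single pair of biallelic sites using their standard definitions. It is only an illustration: the function name, the 0/1 allele coding, and the toy haplotypes are assumptions, and it does not reproduce the study's tests, which relate such pairwise values to the distance between sites.

```python
def ld_stats(haplotypes):
    """Return (|D'|, r^2) for two biallelic sites.

    haplotypes: list of (a, b) tuples, alleles coded 0/1 at site A and site B
    (the coding is an assumption of this sketch).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n            # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n            # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                               # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    r2 = d * d / denom
    # D is scaled by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max, r2


# Toy data: complete association versus all four haplotypes equally frequent.
linked = ld_stats([(0, 0), (0, 0), (1, 1), (1, 1)])
unlinked = ld_stats([(0, 0), (0, 1), (1, 0), (1, 1)])
```

In a recombining population, values like these are expected to decline as the distance between the two sites grows, which is the pattern the LD-versus-distance tests look for.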
45
Behura SK, Severson DW. Nucleotide substitutions in dengue virus serotypes from Asian and American countries: insights into intracodon recombination and purifying selection. BMC Microbiol 2013; 13:37. [PMID: 23410119 PMCID: PMC3598932 DOI: 10.1186/1471-2180-13-37] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 09/26/2012] [Accepted: 01/21/2013] [Indexed: 01/26/2023] Open
Abstract
Background Dengue virus (DENV) infection represents a significant public health problem in many subtropical and tropical countries. Although genetically closely related, the four serotypes of DENV differ in antigenicity for which cross protection among serotypes is limited. It is also believed that both multi-serotype infection as well as the evolution of viral antigenicity may have confounding effects in increased dengue epidemics. Numerous studies have been performed that investigated genetic diversity of DENV, but the precise mechanism(s) of dengue virus evolution are not well understood. Results We investigated genome-wide genetic diversity and nucleotide substitution patterns in the four serotypes among samples collected from different countries in Asia and Central and South America and sequenced as part of the Genome Sequencing Center for Infectious Diseases at the Broad Institute. We applied bioinformatics, statistical and coalescent simulation methods to investigate diversity of codon sequences of DENV samples representing the four serotypes. We show that fixation of nucleotide substitutions is more prominent among the inter-continental isolates (Asian and American) of serotypes 1, 2 and 3 compared to serotype 4 isolates (South and Central America) and are distributed in a non-random manner among the genes encoded by the virus. Nearly one third of the negatively selected sites are associated with fixed mutation sites within serotypes. Our results further show that of all the sites showing evidence of recombination, the majority (~84%) correspond to sites under purifying selection in the four serotypes. The analysis further shows that genetic recombination occurs within specific codons, albeit with low frequency (< 5% of all recombination sites) throughout the DENV genome of the four serotypes and reveals significant enrichment (p < 0.05) among sites under purifying selection in the virus. 
Conclusion The study provides the first evidence for intracodon recombination in DENV and suggests that within codons, genetic recombination has a significant role in maintaining extensive purifying selection of DENV in natural populations. Our study also suggests that fixation of beneficial mutations may lead to virus evolution via translational selection of specific sites in the DENV genome.
Affiliation(s)
- Susanta K Behura
- Eck Institute for Global Health, Department of Biological Sciences, University of Notre Dame, 46556, Notre Dame, IN, USA
46
Arenas M. Computer programs and methodologies for the simulation of DNA sequence data with recombination. Front Genet 2013; 4:9. [PMID: 23378848 PMCID: PMC3561691 DOI: 10.3389/fgene.2013.00009] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 11/20/2012] [Accepted: 01/17/2013] [Indexed: 11/13/2022] Open
Abstract
Computer simulations are useful in evolutionary biology for hypothesis testing, to verify analytical methods, to analyze interactions among evolutionary processes, and to estimate evolutionary parameters. In particular, the simulation of DNA sequences with recombination may help in understanding the role of recombination in diverse evolutionary questions, such as the genome structure. Consequently, plenty of computer simulators have been developed to simulate DNA sequence data with recombination. However, the choice of an appropriate tool, among all currently available simulators, is critical if recombination simulations are to be biologically meaningful. This review provides a practical survival guide to commonly used computer programs and methodologies for the simulation of coding and non-coding DNA sequences with recombination. It may help in the correct design of computer simulation experiments of recombination. In addition, the study includes a review of simulation studies investigating the impact of ignoring recombination when performing various evolutionary analyses, such as phylogenetic tree and ancestral sequence reconstructions. Alternative analytical methodologies accounting for recombination are also reviewed.
Affiliation(s)
- Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa," Consejo Superior de Investigaciones Científicas, Madrid, Spain
47
Cadar D, Dán Á, Tombácz K, Lőrincz M, Kiss T, Becskei Z, Spînu M, Tuboly T, Cságola A. Phylogeny and evolutionary genetics of porcine parvovirus in wild boars. Infect Genet Evol 2012; 12:1163-71. [DOI: 10.1016/j.meegid.2012.04.020] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 01/24/2012] [Revised: 04/19/2012] [Accepted: 04/21/2012] [Indexed: 10/28/2022]
48
Affiliation(s)
- Miguel Arenas
- Computational and Molecular Population Genetics Lab-CMPG, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland.
49
Abstract
Throughout the living world, genetic recombination and nucleotide substitution are the primary processes that create the genetic variation upon which natural selection acts. Just as analyses of substitution patterns can reveal a great deal about evolution, so too can analyses of recombination. Evidence of genetic recombination within the genomes of apparently asexual species can equate with evidence of cryptic sexuality. In sexually reproducing species, nonrandom patterns of sequence exchange can provide direct evidence of population subdivisions that prevent certain individuals from mating. Although an interesting topic in its own right, an important reason for analysing recombination is to account for its potentially disruptive influences on various phylogenetic-based molecular evolution analyses. Specifically, the evolutionary histories of recombinant sequences cannot be accurately described by standard bifurcating phylogenetic trees. Taking recombination into account can therefore be pivotal to the success of selection, molecular clock and various other analyses that require adequate modelling of shared ancestry and draw increased power from accurately inferred phylogenetic trees. Here, we review various computational approaches to studying recombination and provide guidelines both on how to gain insights into this important evolutionary process and on how it can be properly accounted for during molecular evolution studies.
Affiliation(s)
- Darren P Martin
- Computational Biology Group, Institute of Infectious Diseases and Molecular Medicine, University of Cape Town, Cape Town, South Africa
50
Parida L, Palamara PF, Javed A. A minimal descriptor of an ancestral recombinations graph. BMC Bioinformatics 2011; 12 Suppl 1:S6. [PMID: 21342589 PMCID: PMC3044314 DOI: 10.1186/1471-2105-12-s1-s6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND An Ancestral Recombinations Graph (ARG) is a phylogenetic structure that encodes both duplication events, such as mutations, as well as genetic exchange events, such as recombinations: this captures the (genetic) dynamics of a population evolving over generations. RESULTS In this paper, we identify the structure-preserving and samples-preserving core of an ARG G and call it the minimal descriptor ARG of G. Its structure-preserving characteristic ensures that all the branch lengths of the marginal trees of the minimal descriptor ARG are identical to those of G, and the samples-preserving property asserts that the patterns of genetic variation in the samples of the minimal descriptor ARG are exactly the same as those of G. We also prove that even an unbounded G has a finite minimal descriptor that continues to preserve certain (graph-theoretic) properties of G, and for an appropriate class of ARGs, our estimate (Eqn 8) as well as empirical observation is that the expected reduction in the number of vertices is exponential. CONCLUSIONS Based on the definition of this lossless and bounded structure, we derive local properties of the vertices of a minimal descriptor ARG, which lend themselves very naturally to the design of efficient sampling algorithms. We further show that a class of minimal descriptors, that of binary ARGs, models the standard coalescent exactly (Thm 6).
Affiliation(s)
- Laxmi Parida
- Computational Genomics, IBM T J Watson Research, Yorktown, New York, USA.