1
|
Ferreiro D, Branco C, Arenas M. Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation. Bioinformatics 2024; 40:btae096. [PMID: 38374231 PMCID: PMC10914458 DOI: 10.1093/bioinformatics/btae096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2023] [Revised: 01/15/2024] [Accepted: 02/16/2024] [Indexed: 02/21/2024] Open
Abstract
MOTIVATION The selection among substitution models of molecular evolution is fundamental for obtaining accurate phylogenetic inferences. At the protein level, evolutionary analyses are traditionally based on empirical substitution models but these models make unrealistic assumptions and are being surpassed by structurally constrained substitution (SCS) models. The SCS models often consider site-dependent evolution, a process that provides realism but complicates their implementation into likelihood functions that are commonly used for substitution model selection. RESULTS We present a method to perform selection among site-dependent SCS models, also among empirical and site-dependent SCS models, based on the approximate Bayesian computation (ABC) approach and its implementation into the computational framework ProteinModelerABC. The framework implements ABC with and without regression adjustments and includes diverse empirical and site-dependent SCS models of protein evolution. Using extensive simulated data, we found that it provides selection among SCS and empirical models with acceptable accuracy. As illustrative examples, we applied the framework to analyze a variety of protein families observing that SCS models fit them better than the corresponding best-fitting empirical substitution models. AVAILABILITY AND IMPLEMENTATION ProteinModelerABC is freely available from https://github.com/DavidFerreiro/ProteinModelerABC, can run in parallel and includes a graphical user interface. The framework is distributed with detailed documentation and ready-to-use examples.
Collapse
Affiliation(s)
- David Ferreiro
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
| | - Catarina Branco
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
| | - Miguel Arenas
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
| |
Collapse
|
2
|
González-Vázquez LD, Arenas M. Molecular Evolution of SARS-CoV-2 during the COVID-19 Pandemic. Genes (Basel) 2023; 14:407. [PMID: 36833334 PMCID: PMC9956206 DOI: 10.3390/genes14020407] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2022] [Revised: 01/27/2023] [Accepted: 02/02/2023] [Indexed: 02/09/2023] Open
Abstract
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) produced diverse molecular variants during its recent expansion in humans that caused different transmissibility and severity of the associated disease as well as resistance to monoclonal antibodies and polyclonal sera, among other treatments. In order to understand the causes and consequences of the observed SARS-CoV-2 molecular diversity, a variety of recent studies investigated the molecular evolution of this virus during its expansion in humans. In general, this virus evolves with a moderate rate of evolution, in the order of 10-3-10-4 substitutions per site and per year, which presents continuous fluctuations over time. Despite its origin being frequently associated with recombination events between related coronaviruses, little evidence of recombination was detected, and it was mostly located in the spike coding region. Molecular adaptation is heterogeneous among SARS-CoV-2 genes. Although most of the genes evolved under purifying selection, several genes showed genetic signatures of diversifying selection, including a number of positively selected sites that affect proteins relevant for the virus replication. Here, we review current knowledge about the molecular evolution of SARS-CoV-2 in humans, including the emergence and establishment of variants of concern. We also clarify relationships between the nomenclatures of SARS-CoV-2 lineages. We conclude that the molecular evolution of this virus should be monitored over time for predicting relevant phenotypic consequences and designing future efficient treatments.
Collapse
Affiliation(s)
- Luis Daniel González-Vázquez
- Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| | - Miguel Arenas
- Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), 36310 Vigo, Spain
| |
Collapse
|
3
|
Del Amparo R, González-Vázquez LD, Rodríguez-Moure L, Bastolla U, Arenas M. Consequences of Genetic Recombination on Protein Folding Stability. J Mol Evol 2023; 91:33-45. [PMID: 36463317 PMCID: PMC9849154 DOI: 10.1007/s00239-022-10080-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2022] [Accepted: 11/25/2022] [Indexed: 12/05/2022]
Abstract
Genetic recombination is a common evolutionary mechanism that produces molecular diversity. However, its consequences on protein folding stability have not attracted the same attention as in the case of point mutations. Here, we studied the effects of homologous recombination on the computationally predicted protein folding stability for several protein families, finding less detrimental effects than we previously expected. Although recombination can affect multiple protein sites, we found that the fraction of recombined proteins that are eliminated by negative selection because of insufficient stability is not significantly larger than the corresponding fraction of proteins produced by mutation events. Indeed, although recombination disrupts epistatic interactions, the mean stability of recombinant proteins is not lower than that of their parents. On the other hand, the difference of stability between recombined proteins is amplified with respect to the parents, promoting phenotypic diversity. As a result, at least one third of recombined proteins present stability between those of their parents, and a substantial fraction have higher or lower stability than those of both parents. As expected, we found that parents with similar sequences tend to produce recombined proteins with stability close to that of the parents. Finally, the simulation of protein evolution along the ancestral recombination graph with empirical substitution models commonly used in phylogenetics, which ignore constraints on protein folding stability, showed that recombination favors the decrease of folding stability, supporting the convenience of adopting structurally constrained models when possible for inferences of protein evolutionary histories with recombination.
Collapse
Affiliation(s)
- Roberto Del Amparo
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain ,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
| | - Luis Daniel González-Vázquez
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain ,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
| | - Laura Rodríguez-Moure
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain ,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
| | - Ugo Bastolla
- Centre for Molecular Biology Severo Ochoa (CSIC-UAM), 28049 Madrid, Spain
| | - Miguel Arenas
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain ,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain ,Galicia Sur Health Research Institute (IIS Galicia Sur), 36310 Vigo, Spain
| |
Collapse
|
4
|
Abstract
The reconstruction of genetic material of ancestral organisms constitutes a powerful application of evolutionary biology. A fundamental step in this inference is the ancestral sequence reconstruction (ASR), which can be performed with diverse methodologies implemented in computer frameworks. However, most of these methodologies ignore evolutionary properties frequently observed in microbes, such as genetic recombination and complex selection processes, that can bias the traditional ASR. From a practical perspective, here I review methodologies for the reconstruction of ancestral DNA and protein sequences, with particular focus on microbes, and including biases, recommendations, and software implementations. I conclude that microbial ASR is a complex analysis that should be carefully performed and that there is a need for methods to infer more realistic ancestral microbial sequences.
Collapse
Affiliation(s)
- Miguel Arenas
- Biomedical Research Center (CINBIO), University of Vigo, Vigo, Spain.
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain.
- Galicia Sur Health Research Institute (IIS Galicia Sur), Vigo, Spain.
| |
Collapse
|
5
|
Arenas M. ProteinEvolverABC: coestimation of recombination and substitution rates in protein sequences by approximate Bayesian computation. Bioinformatics 2021; 38:58-64. [PMID: 34450622 PMCID: PMC8696103 DOI: 10.1093/bioinformatics/btab617] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2021] [Revised: 07/24/2021] [Accepted: 08/24/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION The evolutionary processes of mutation and recombination, upon which selection operates, are fundamental to understand the observed molecular diversity. Unlike nucleotide sequences, the estimation of the recombination rate in protein sequences has been little explored, neither implemented in evolutionary frameworks, despite protein sequencing methods are largely used. RESULTS In order to accommodate this need, here I present a computational framework, called ProteinEvolverABC, to jointly estimate recombination and substitution rates from alignments of protein sequences. The framework implements the approximate Bayesian computation approach, with and without regression adjustments and includes a variety of substitution models of protein evolution, demographics and longitudinal sampling. It also implements several nuisance parameters such as heterogeneous amino acid frequencies and rate of change among sites and, proportion of invariable sites. The framework produces accurate coestimation of recombination and substitution rates under diverse evolutionary scenarios. As illustrative examples of usage, I applied it to several viral protein families, including coronaviruses, showing heterogeneous substitution and recombination rates. AVAILABILITY AND IMPLEMENTATION ProteinEvolverABC is freely available from https://github.com/miguelarenas/proteinevolverabc, includes a graphical user interface for helping the specification of the input settings, extensive documentation and ready-to-use examples. Conveniently, the simulations can run in parallel on multicore machines. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
|
6
|
Abstract
Viral recombination is a major evolutionary mechanism driving adaptation processes, such as the ability of host-switching. Understanding global patterns of recombination could help to identify underlying mechanisms and to evaluate the potential risks of rapid adaptation. Conventional approaches (e.g., those based on linkage disequilibrium) are computationally demanding or even intractable when sequence alignments include hundreds of sequences, common in viral data sets. We present a comprehensive analysis of recombination across 30 genomic alignments from viruses infecting humans. In order to scale the analysis and avoid the computational limitations of conventional approaches, we apply newly developed topological data analysis methods able to infer recombination rates for large data sets. We show that viruses, such as ZEBOV and MARV, consistently displayed low levels of recombination, whereas high levels of recombination were observed in Sarbecoviruses, HBV, HEV, Rhinovirus A, and HIV. We observe that recombination is more common in positive single-stranded RNA viruses than in negatively single-stranded RNA ones. Interestingly, the comparison across multiple viruses suggests an inverse correlation between genome length and recombination rate. Positional analyses of recombination breakpoints along viral genomes, combined with our approach, detected at least 39 nonuniform patterns of recombination (i.e., cold or hotspots) in 18 viral groups. Among these, noteworthy hotspots are found in MERS-CoV and Sarbecoviruses (at spike, Nucleocapsid and ORF8). In summary, we have developed a fast pipeline to measure recombination that, combined with other approaches, has allowed us to find both common and lineage-specific patterns of recombination among viruses with potential relevance in viral adaptation.
Collapse
Affiliation(s)
- Juan Ángel Patiño-Galindo
- Program for Mathematical Genomics, Departments of Systems Biology and Biomedical Informatics, Columbia University, New York, NY, USA
| | - Ioan Filip
- Program for Mathematical Genomics, Departments of Systems Biology and Biomedical Informatics, Columbia University, New York, NY, USA
| | - Raul Rabadan
- Program for Mathematical Genomics, Departments of Systems Biology and Biomedical Informatics, Columbia University, New York, NY, USA
| |
Collapse
|
7
|
Monteiro B, Arenas M, Prata MJ, Amorim A. Evolutionary dynamics of the human pseudoautosomal regions. PLoS Genet 2021; 17:e1009532. [PMID: 33872316 PMCID: PMC8084340 DOI: 10.1371/journal.pgen.1009532] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2020] [Revised: 04/29/2021] [Accepted: 04/06/2021] [Indexed: 01/19/2023] Open
Abstract
Recombination between the X and Y human sex chromosomes is limited to the two pseudoautosomal regions (PARs) that present quite distinct evolutionary origins. Despite the crucial importance for male meiosis, genetic diversity patterns and evolutionary dynamics of these regions are poorly understood. In the present study, we analyzed and compared the genetic diversity of the PAR regions using publicly available genomic sequences encompassing both PAR1 and PAR2. Comparisons were performed through allele diversities, linkage disequilibrium status and recombination frequencies within and between X and Y chromosomes. In agreement with previous studies, we confirmed the role of PAR1 as a male-specific recombination hotspot, but also observed similar characteristic patterns of diversity in both regions although male recombination occurs at PAR2 to a much lower extent (at least one recombination event at PAR1 and in ≈1% in normal male meioses at PAR2). Furthermore, we demonstrate that both PARs harbor significantly different allele frequencies between X and Y chromosomes, which could support that recombination is not sufficient to homogenize the pseudoautosomal gene pool or is counterbalanced by other evolutionary forces. Nevertheless, the observed patterns of diversity are not entirely explainable by sexually antagonistic selection. A better understanding of such processes requires new data from intergenerational transmission studies of PARs, which would be decisive on the elucidation of PARs evolution and their role in male-driven heterosomal aneuploidies.
Collapse
Affiliation(s)
- Bruno Monteiro
- Institute of Investigation and Innovation in Health (i3S). University of Porto, Porto, Portugal
- Institute of Molecular Pathology and Immunology (IPATIMUP), University of Porto, Porto, Portugal
| | - Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
- CINBIO (Biomedical Research Centre), University of Vigo, Vigo, Spain
| | - Maria João Prata
- Institute of Investigation and Innovation in Health (i3S). University of Porto, Porto, Portugal
- Institute of Molecular Pathology and Immunology (IPATIMUP), University of Porto, Porto, Portugal
- Faculty of Sciences, University of Porto, Porto, Portugal
- * E-mail:
| | - António Amorim
- Institute of Investigation and Innovation in Health (i3S). University of Porto, Porto, Portugal
- Institute of Molecular Pathology and Immunology (IPATIMUP), University of Porto, Porto, Portugal
- Faculty of Sciences, University of Porto, Porto, Portugal
| |
Collapse
|
8
|
Del Amparo R, Branco C, Arenas J, Vicens A, Arenas M. Analysis of selection in protein-coding sequences accounting for common biases. Brief Bioinform 2021; 22:6105943. [PMID: 33479739 DOI: 10.1093/bib/bbaa431] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2020] [Revised: 12/17/2020] [Accepted: 12/22/2020] [Indexed: 12/16/2022] Open
Abstract
The evolution of protein-coding genes is usually driven by selective processes, which favor some evolutionary trajectories over others, optimizing the subsequent protein stability and activity. The analysis of selection in this type of genetic data is broadly performed with the metric nonsynonymous/synonymous substitution rate ratio (dN/dS). However, most of the well-established methodologies to estimate this metric make crucial assumptions, such as lack of recombination or invariable codon frequencies along genes, which can bias the estimation. Here, we review the most relevant biases in the dN/dS estimation and provide a detailed guide to estimate this metric using state-of-the-art procedures that account for such biases, along with illustrative practical examples and recommendations. We also discuss the traditional interpretation of the estimated dN/dS emphasizing the importance of considering complementary biological information such as the role of the observed substitutions on the stability and function of proteins. This review is oriented to help evolutionary biologists that aim to accurately estimate selection in protein-coding sequences.
Collapse
Affiliation(s)
- Roberto Del Amparo
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain.,Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| | - Catarina Branco
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain.,Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| | - Jesús Arenas
- Unit of Microbiology and Immunology, University of Zaragoza, 50013 Zaragoza, Spain
| | - Alberto Vicens
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain.,Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| | - Miguel Arenas
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain.,Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| |
Collapse
|
9
|
Currat M, Arenas M, Quilodràn CS, Excoffier L, Ray N. SPLATCHE3: simulation of serial genetic data under spatially explicit evolutionary scenarios including long-distance dispersal. Bioinformatics 2020; 35:4480-4483. [PMID: 31077292 PMCID: PMC6821363 DOI: 10.1093/bioinformatics/btz311] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2019] [Revised: 04/18/2019] [Accepted: 04/29/2019] [Indexed: 01/25/2023] Open
Abstract
SUMMARY SPLATCHE3 simulates genetic data under a variety of spatially explicit evolutionary scenarios, extending previous versions of the framework. The new capabilities include long-distance migration, spatially and temporally heterogeneous short-scale migrations, alternative hybridization models, simulation of serial samples of genetic data and a large variety of DNA mutation models. These implementations have been applied independently to various studies, but grouped together in the current version. AVAILABILITY AND IMPLEMENTATION SPLATCHE3 is written in C++ and is freely available for non-commercial use from the website http://www.splatche.com/splatche3. It includes console versions for Linux, MacOs and Windows and a user-friendly GUI for Windows, as well as detailed documentation and ready-to-use examples.
Collapse
Affiliation(s)
- Mathias Currat
- Laboratory of Anthropology, Genetics and Peopling History, Department of Genetics and Evolution - Anthropology Unit, University of Geneva, Geneva 1205, Switzerland.,Institute of Genetics and Genomics in Geneva (IGE3), University of Geneva, Geneva 1211, Switzerland
| | - Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, Vigo 36310, Spain.,Biomedical Research Center (CINBIO), University of Vigo, Vigo 36310, Spain
| | - Claudio S Quilodràn
- Laboratory of Anthropology, Genetics and Peopling History, Department of Genetics and Evolution - Anthropology Unit, University of Geneva, Geneva 1205, Switzerland
| | - Laurent Excoffier
- Computational and Molecular Population Genetics Laboratory, Institute of Ecology and Evolution, University of Bern, Bern 3012, Switzerland.,Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
| | - Nicolas Ray
- Institute of Global Health, GeoHealth Group, University of Geneva, Geneva 1205, Switzerland.,Institute for Environmental Sciences, University of Geneva, Geneva 1205, Switzerland
| |
Collapse
|
10
|
Pérez-Losada M, Arenas M, Galán JC, Bracho MA, Hillung J, García-González N, González-Candelas F. High-throughput sequencing (HTS) for the analysis of viral populations. INFECTION GENETICS AND EVOLUTION 2020; 80:104208. [PMID: 32001386 DOI: 10.1016/j.meegid.2020.104208] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/30/2019] [Revised: 01/21/2020] [Accepted: 01/24/2020] [Indexed: 12/12/2022]
Abstract
The development of High-Throughput Sequencing (HTS) technologies is having a major impact on the genomic analysis of viral populations. Current HTS platforms can capture nucleic acid variation across millions of genes for both selected amplicons and full viral genomes. HTS has already facilitated the discovery of new viruses, hinted new taxonomic classifications and provided a deeper and broader understanding of their diversity, population and genetic structure. Hence, HTS has already replaced standard Sanger sequencing in basic and applied research fields, but the next step is its implementation as a routine technology for the analysis of viruses in clinical settings. The most likely application of this implementation will be the analysis of viral genomics, because the huge population sizes, high mutation rates and very fast replacement of viral populations have demonstrated the limited information obtained with Sanger technology. In this review, we describe new technologies and provide guidelines for the high-throughput sequencing and genetic and evolutionary analyses of viral populations and metaviromes, including software applications. With the development of new HTS technologies, new and refurbished molecular and bioinformatic tools are also constantly being developed to process and integrate HTS data. These allow assembling viral genomes and inferring viral population diversity and dynamics. Finally, we also present several applications of these approaches to the analysis of viral clinical samples including transmission clusters and outbreak characterization.
Collapse
Affiliation(s)
- Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão 4485-661, Portugal
| | - Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain; Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain.
| | - Juan Carlos Galán
- Microbiology Service, Hospital Ramón y Cajal, Madrid, Spain; CIBER in Epidemiology and Public Health, Spain.
| | - Mª Alma Bracho
- CIBER in Epidemiology and Public Health, Spain; Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain.
| | - Julia Hillung
- Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
| | - Neris García-González
- Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
| | - Fernando González-Candelas
- CIBER in Epidemiology and Public Health, Spain; Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
| |
Collapse
|
11
|
Del Amparo R, Vicens A, Arenas M. The influence of heterogeneous codon frequencies along sequences on the estimation of molecular adaptation. Bioinformatics 2020; 36:430-436. [PMID: 31304972 DOI: 10.1093/bioinformatics/btz558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2019] [Revised: 07/08/2019] [Accepted: 07/11/2019] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION The nonsynonymous/synonymous substitution rate ratio (dN/dS) is a commonly used parameter to quantify molecular adaptation in protein-coding data. It is known that the estimation of dN/dS can be biased if some evolutionary processes are ignored. In this concern, common ML methods to estimate dN/dS assume invariable codon frequencies among sites, despite this characteristic is rare in nature, and it could bias the estimation of this parameter. RESULTS Here we studied the influence of variable codon frequencies among genetic regions on the estimation of dN/dS. We explored scenarios varying the number of genetic regions that differ in codon frequencies, the amount of variability of codon frequencies among regions and the nucleotide frequencies at each codon position among regions. We found that ignoring heterogeneous codon frequencies among regions overall leads to underestimation of dN/dS and the bias increases with the level of heterogeneity of codon frequencies. Interestingly, we also found that varying nucleotide frequencies among regions at the first or second codon position leads to underestimation of dN/dS while variation at the third codon position leads to overestimation of dN/dS. Next, we present a methodology to reduce this bias based on the analysis of partitions presenting similar codon frequencies and we applied it to analyze four real datasets. We conclude that accounting for heterogeneous codon frequencies along sequences is required to obtain realistic estimates of molecular adaptation through this relevant evolutionary parameter. AVAILABILITY AND IMPLEMENTATION The applied frameworks for the computer simulations of protein-coding data and estimation of molecular adaptation are SGWE and PAML, respectively. Both are publicly available and referenced in the study. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Roberto Del Amparo
- Department of Biochemistry, Genetics and Immunology.,Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
| | - Alberto Vicens
- Department of Biochemistry, Genetics and Immunology.,Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
| | - Miguel Arenas
- Department of Biochemistry, Genetics and Immunology.,Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
| |
Collapse
|
12
|
Pérez-Losada M, Arenas M, Castro-Nallar E. Microbial sequence typing in the genomic era. INFECTION, GENETICS AND EVOLUTION : JOURNAL OF MOLECULAR EPIDEMIOLOGY AND EVOLUTIONARY GENETICS IN INFECTIOUS DISEASES 2018; 63:346-359. [PMID: 28943406 PMCID: PMC5908768 DOI: 10.1016/j.meegid.2017.09.022] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/22/2017] [Revised: 09/18/2017] [Accepted: 09/19/2017] [Indexed: 12/18/2022]
Abstract
Next-generation sequencing (NGS), also known as high-throughput sequencing, is changing the field of microbial genomics research. NGS allows for a more comprehensive analysis of the diversity, structure and composition of microbial genes and genomes compared to the traditional automated Sanger capillary sequencing at a lower cost. NGS strategies have expanded the versatility of standard and widely used typing approaches based on nucleotide variation in several hundred DNA sequences and a few gene fragments (MLST, MLVA, rMLST and cgMLST). NGS can now accommodate variation in thousands or millions of sequences from selected amplicons to full genomes (WGS, NGMLST and HiMLST). To extract signals from high-dimensional NGS data and make valid statistical inferences, novel analytic and statistical techniques are needed. In this review, we describe standard and new approaches for microbial sequence typing at gene and genome levels and guidelines for subsequent analysis, including methods and computational frameworks. We also present several applications of these approaches to some disciplines, namely genotyping, phylogenetics and molecular epidemiology.
Collapse
Affiliation(s)
- Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Ashburn, VA 20147, USA; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão 4485-661, Portugal; Children's National Medical Center, Washington, DC 20010, USA.
| | - Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
| | - Eduardo Castro-Nallar
- Universidad Andrés Bello, Center for Bioinformatics and Integrative Biology, Facultad de Ciencias Biológicas, Santiago 8370146, Chile
| |
Collapse
|
13
|
Arenas M, Araujo NM, Branco C, Castelhano N, Castro-Nallar E, Pérez-Losada M. Mutation and recombination in pathogen evolution: Relevance, methods and controversies. INFECTION GENETICS AND EVOLUTION 2017; 63:295-306. [PMID: 28951202 DOI: 10.1016/j.meegid.2017.09.029] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/26/2017] [Revised: 09/20/2017] [Accepted: 09/21/2017] [Indexed: 02/06/2023]
Abstract
Mutation and recombination drive the evolution of most pathogens by generating the genetic variants upon which selection operates. Those variants can, for example, confer resistance to host immune systems and drug therapies or lead to epidemic outbreaks. Given their importance, diverse evolutionary studies have investigated the abundance and consequences of mutation and recombination in pathogen populations. However, some controversies persist regarding the contribution of each evolutionary force to the development of particular phenotypic observations (e.g., drug resistance). In this study, we revise the importance of mutation and recombination in the evolution of pathogens at both intra-host and inter-host levels. We also describe state-of-the-art analytical methodologies to detect and quantify these two evolutionary forces, including biases that are often ignored in evolutionary studies. Finally, we present some of our former studies involving pathogenic taxa where mutation and recombination played crucial roles in the recovery of pathogenic fitness, the generation of interspecific genetic diversity, or the design of centralized vaccines. This review also illustrates several common controversies and pitfalls in the analysis and in the evaluation and interpretation of mutation and recombination outcomes.
Collapse
Affiliation(s)
- Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain; Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal; Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal.
| | - Natalia M Araujo
- Laboratory of Molecular Virology, Oswaldo Cruz Institute, FIOCRUZ, Rio de Janeiro, Brazil.
| | - Catarina Branco
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal; Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal.
| | - Nadine Castelhano
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal; Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal.
| | - Eduardo Castro-Nallar
- Universidad Andrés Bello, Center for Bioinformatics and Integrative Biology, Facultad de Ciencias Biológicas, Santiago, Chile.
| | - Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Ashburn, VA 20147, Washington, DC, United States; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão 4485-661, Portugal.
| |
Collapse
|
14
|
Castelhano N, Araujo NM, Arenas M. Heterogeneous recombination among Hepatitis B virus genotypes. INFECTION GENETICS AND EVOLUTION 2017; 54:486-490. [PMID: 28827173 DOI: 10.1016/j.meegid.2017.08.015] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/05/2017] [Revised: 08/13/2017] [Accepted: 08/17/2017] [Indexed: 02/06/2023]
Abstract
The rapid evolution of Hepatitis B virus (HBV) through both evolutionary forces, mutation and recombination, allows this virus to generate a large variety of adapted variants at both intra and inter-host levels. It can, for instance, generate drug resistance or the diverse viral genotypes that currently exist in the HBV epidemics. Concerning the latter, it is known that recombination played a major role in the emergence and genetic diversification of novel genotypes. In this regard, the quantification of viral recombination in each genotype can provide relevant information to devise expectations about the evolutionary trends of the epidemic. Here we measured the amount of this evolutionary force by estimating global and local recombination rates in >4700 HBV complete genome sequences corresponding to nine (A to I) HBV genotypes. Counterintuitively, we found that genotype E presents extremely high levels of recombination, followed by genotypes B and C. On the other hand, genotype G presents the lowest level, where recombination is almost negligible. We discuss these findings in the light of known characteristics of these genotypes. Additionally, we present a phylogenetic network to depict the evolutionary history of the studied HBV genotypes. This network clearly classified all genotypes into specific groups and indicated that diverse pairs of genotypes are derived from a common ancestor (i.e., C-I, D-E and, F-H) although still the origin of this virus presented large uncertainty. Altogether we conclude that the amount of observed recombination is heterogeneous among HBV genotypes and that this heterogeneity can influence on the future expansion of the epidemic.
Collapse
Affiliation(s)
- Nadine Castelhano
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal; Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal.
| | - Natalia M Araujo
- Laboratory of Molecular Virology, Oswaldo Cruz Institute, FIOCRUZ, Rio de Janeiro, Brazil.
| | - Miguel Arenas
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal; Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal; Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain.
| |
Collapse
|
15
|
Gärtner K, Futschik A. Improved Versions of Common Estimators of the Recombination Rate. J Comput Biol 2016; 23:756-68. [PMID: 27409412 DOI: 10.1089/cmb.2016.0039] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open
Abstract
The scaled recombination parameter [Formula: see text] is one of the key parameters, turning up frequently in population genetic models. Accurate estimates of [Formula: see text] are difficult to obtain, as recombination events do not always leave traces in the data. One of the most widely used approaches is composite likelihood. Here, we show that popular implementations of composite likelihood estimators can often be uniformly improved by optimizing the trade-off between bias and variance. The amount of possible improvement depends on parameters such as the sequence length, the sample size, and the mutation rate, and it can be considerable in some cases. It turns out that approximate Bayesian computation, with composite likelihood as a summary statistic, also leads to improved estimates, but now in terms of the posterior risk. Finally, we demonstrate a practical application on real data from Drosophila.
Collapse
Affiliation(s)
- Kerstin Gärtner
- 1 Vienna Graduate School of Population Genetics , Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
| | - Andreas Futschik
- 2 Department of Applied Statistics, Johannes Kepler University , Linz, Austria
| |
Collapse
|
16
|
Arenas M. Trends in substitution models of molecular evolution. Front Genet 2015; 6:319. [PMID: 26579193 PMCID: PMC4620419 DOI: 10.3389/fgene.2015.00319] [Citation(s) in RCA: 79] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2015] [Accepted: 10/09/2015] [Indexed: 11/13/2022] Open
Abstract
Substitution models of evolution describe the process of genetic variation through fixed mutations and constitute the basis of the evolutionary analysis at the molecular level. Almost 40 years after the development of first substitution models, highly sophisticated, and data-specific substitution models continue emerging with the aim of better mimicking real evolutionary processes. Here I describe current trends in substitution models of DNA, codon and amino acid sequence evolution, including advantages and pitfalls of the most popular models. The perspective concludes that despite the large number of currently available substitution models, further research is required for more realistic modeling, especially for DNA coding and amino acid data. Additionally, the development of more accurate complex models should be coupled with new implementations and improvements of methods and frameworks for substitution model selection and downstream evolutionary analysis.
Collapse
Affiliation(s)
- Miguel Arenas
- Institute of Molecular Pathology and Immunology of the University of Porto Porto, Portugal
| |
Collapse
|
17
|
Arenas M, Lorenzo-Redondo R, Lopez-Galindez C. Influence of mutation and recombination on HIV-1 in vitro fitness recovery. Mol Phylogenet Evol 2015; 94:264-70. [PMID: 26358613 DOI: 10.1016/j.ympev.2015.09.001] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2015] [Revised: 08/31/2015] [Accepted: 09/01/2015] [Indexed: 10/23/2022]
Abstract
The understanding of the evolutionary processes underlying HIV-1 fitness recovery is fundamental for HIV-1 pathogenesis, antiretroviral treatment and vaccine design. It is known that HIV-1 can present very high mutation and recombination rates, however the specific contribution of these evolutionary forces in the "in vitro" viral fitness recovery has not been simultaneously quantified. To this aim, we analyzed substitution, recombination and molecular adaptation rates in a variety of HIV-1 biological clones derived from a viral isolate after severe population bottlenecks and a number of large population cell culture passages. These clones presented an overall but uneven fitness gain, mean of 3-fold, respect to the initial passage values. We found a significant relationship between the fitness increase and the appearance and fixation of mutations. In addition, these fixed mutations presented molecular signatures of positive selection through the accumulation of non-synonymous substitutions. Interestingly, viral recombination correlated with fitness recovery in most of studied viral quasispecies. The genetic diversity generated by these evolutionary processes was positively correlated with the viral fitness. We conclude that HIV-1 fitness recovery can be derived from the genetic heterogeneity generated through both mutation and recombination, and under diversifying molecular adaptation. The findings also suggest nonrandom evolutionary pathways for in vitro fitness recovery.
Collapse
Affiliation(s)
- Miguel Arenas
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal; Centre for Molecular Biology "Severo Ochoa", Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain.
| | - Ramon Lorenzo-Redondo
- Centro Nacional de Microbiología (CNM), Instituto de Salud Carlos III, Majadahonda, Madrid, Spain.
| | - Cecilio Lopez-Galindez
- Centro Nacional de Microbiología (CNM), Instituto de Salud Carlos III, Majadahonda, Madrid, Spain.
| |
Collapse
|
18
|
Araujo NM. Hepatitis B virus intergenotypic recombinants worldwide: An overview. INFECTION GENETICS AND EVOLUTION 2015; 36:500-510. [PMID: 26299884 DOI: 10.1016/j.meegid.2015.08.024] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/29/2015] [Revised: 08/11/2015] [Accepted: 08/19/2015] [Indexed: 02/07/2023]
Abstract
Novel variants generated by recombination events between different hepatitis B virus (HBV) genotypes have been increasingly documented worldwide, and the role of recombination in the evolutionary history of HBV is of significant research interest. In the present study, large-scale data retrieval and analysis on HBV intergenotypic recombinant genomes were performed. The geographical distribution of HBV recombinants as well as the molecular processes involved in recombination were examined. After review of published data, a total of 436 complete HBV sequences, previously identified as recombinants, were included in the recombination detection analysis. About 60% of HBV recombinants were B/C (n=179) and C/D (n=83) hybrids. A/B/C, A/C, A/C/G, A/D, A/E, A/G, B/C/U (U=unknown genotype), C/F, C/G, C/J, D/E, D/F, and F/G hybrids were additionally identified. HBV intergenotypic sequences were reported in almost all geographical regions with similar circulation patterns as their original genotypes, indicating the potential for spreading in a wide range of human populations and developing their own epidemiology. Recombination breakpoints were non-randomly distributed in the genome, and specific favored sites detected, such as within nt 1700-2000 and 2100-2300 regions, which displayed a statistically significant difference in comparison with the remaining genome. Elucidation of the effects of recombination events on the evolutionary history of HBV is critical to understand current and future evolution trends.
Collapse
Affiliation(s)
- Natalia M Araujo
- Laboratório de Virologia Molecular, Instituto Oswaldo Cruz, FIOCRUZ, Av. Brasil 4365, 21040-360 Rio de Janeiro, RJ, Brazil.
| |
Collapse
|
19
|
Arenas M. Advances in computer simulation of genome evolution: toward more realistic evolutionary genomics analysis by approximate bayesian computation. J Mol Evol 2015; 80:189-92. [PMID: 25808249 DOI: 10.1007/s00239-015-9673-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2015] [Accepted: 03/19/2015] [Indexed: 11/29/2022]
Abstract
NGS technologies present a fast and cheap generation of genomic data. Nevertheless, ancestral genome inference is not so straightforward due to complex evolutionary processes acting on this material such as inversions, translocations, and other genome rearrangements that, in addition to their implicit complexity, can co-occur and confound ancestral inferences. Recently, models of genome evolution that accommodate such complex genomic events are emerging. This letter explores these novel evolutionary models and proposes their incorporation into robust statistical approaches based on computer simulations, such as approximate Bayesian computation, that may produce a more realistic evolutionary analysis of genomic data. Advantages and pitfalls in using these analytical methods are discussed. Potential applications of these ancestral genomic inferences are also pointed out.
Collapse
Affiliation(s)
- Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa", Consejo Superior de Investigaciones Científicas (CSIC), Universidad Autónoma de Madrid (CSIC-UAM), C/Nicolás Cabrera, 1, Cantoblanco, 28049, Madrid, Spain,
| |
Collapse
|
20
|
Pérez-Losada M, Arenas M, Galán JC, Palero F, González-Candelas F. Recombination in viruses: mechanisms, methods of study, and evolutionary consequences. INFECTION, GENETICS AND EVOLUTION : JOURNAL OF MOLECULAR EPIDEMIOLOGY AND EVOLUTIONARY GENETICS IN INFECTIOUS DISEASES 2015; 30:296-307. [PMID: 25541518 PMCID: PMC7106159 DOI: 10.1016/j.meegid.2014.12.022] [Citation(s) in RCA: 198] [Impact Index Per Article: 22.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/23/2014] [Revised: 12/15/2014] [Accepted: 12/17/2014] [Indexed: 02/08/2023]
Abstract
Recombination is a pervasive process generating diversity in most viruses. It joins variants that arise independently within the same molecule, creating new opportunities for viruses to overcome selective pressures and to adapt to new environments and hosts. Consequently, the analysis of viral recombination attracts the interest of clinicians, epidemiologists, molecular biologists and evolutionary biologists. In this review we present an overview of three major areas related to viral recombination: (i) the molecular mechanisms that underlie recombination in model viruses, including DNA-viruses (Herpesvirus) and RNA-viruses (Human Influenza Virus and Human Immunodeficiency Virus), (ii) the analytical procedures to detect recombination in viral sequences and to determine the recombination breakpoints, along with the conceptual and methodological tools currently used and a brief overview of the impact of new sequencing technologies on the detection of recombination, and (iii) the major areas in the evolutionary analysis of viral populations on which recombination has an impact. These include the evaluation of selective pressures acting on viral populations, the application of evolutionary reconstructions in the characterization of centralized genes for vaccine design, and the evaluation of linkage disequilibrium and population structure.
Collapse
Affiliation(s)
- Marcos Pérez-Losada
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Portugal; Computational Biology Institute, George Washington University, Ashburn, VA 20147, USA
| | - Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa", Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
| | - Juan Carlos Galán
- Servicio de Microbiología, Hospital Ramón y Cajal and Instituto Ramón y Cajal de Investigación Sanitaria (IRYCIS), Madrid, Spain; CIBER en Epidemiología y Salud Pública, Spain
| | - Ferran Palero
- CIBER en Epidemiología y Salud Pública, Spain; Unidad Mixta Infección y Salud Pública, FISABIO-Universitat de València, Valencia, Spain
| | - Fernando González-Candelas
- CIBER en Epidemiología y Salud Pública, Spain; Unidad Mixta Infección y Salud Pública, FISABIO-Universitat de València, Valencia, Spain.
| |
Collapse
|
21
|
Arenas M, Lopes JS, Beaumont MA, Posada D. CodABC: a computational framework to coestimate recombination, substitution, and molecular adaptation rates by approximate Bayesian computation. Mol Biol Evol 2015; 32:1109-12. [PMID: 25577191 PMCID: PMC4379410 DOI: 10.1093/molbev/msu411] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023] Open
Abstract
The estimation of substitution and recombination rates can provide important insights into the molecular evolution of protein-coding sequences. Here, we present a new computational framework, called "CodABC," to jointly estimate recombination, substitution and synonymous and nonsynonymous rates from coding data. CodABC uses approximate Bayesian computation with and without regression adjustment and implements a variety of codon models, intracodon recombination, and longitudinal sampling. CodABC can provide accurate joint parameter estimates from recombining coding sequences, often outperforming maximum-likelihood methods based on more approximate models. In addition, CodABC allows for the inclusion of several nuisance parameters such as those representing codon frequencies, transition matrices, heterogeneity across sites or invariable sites. CodABC is freely available from http://code.google.com/p/codabc/, includes a GUI, extensive documentation and ready-to-use examples, and can run in parallel on multicore machines.
Collapse
Affiliation(s)
- Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa," Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain Departamento de Bioquímica, Genética e Inmunología, Universidad de Vigo, Vigo, Spain
| | - Joao S Lopes
- Instituto Gulbenkian de Ciencia, Oeiras, Portugal
| | - Mark A Beaumont
- School of Mathematical Sciences and School of Biological Sciences, University of Bristol, University Walk, Bristol, United Kingdom
| | - David Posada
- Departamento de Bioquímica, Genética e Inmunología, Universidad de Vigo, Vigo, Spain
| |
Collapse
|
22
|
Hart MW. Models of selection, isolation, and gene flow in speciation. THE BIOLOGICAL BULLETIN 2014; 227:133-145. [PMID: 25411372 DOI: 10.1086/bblv227n2p133] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Many marine ecologists aspire to use genetic data to understand how selection and demographic history shape the evolution of diverging populations as they become reproductively isolated species. I propose combining two types of genetic analysis focused on this key early stage of the speciation process to identify the selective agents directly responsible for population divergence. Isolation-with-migration (IM) models can be used to characterize reproductive isolation between populations (low gene flow), while codon models can be used to characterize selection for population differences at the molecular level (especially positive selection for high rates of amino acid substitution). Accessible transcriptome sequencing methods can generate the large quantities of data needed for both types of analysis. I highlight recent examples (including our work on fertilization genes in sea stars) in which this confluence of interest, models, and data has led to taxonomically broad advances in understanding marine speciation at the molecular level. I also highlight new models that incorporate both demography and selection: simulations based on these theoretical advances suggest that polymorphisms shared among individuals (a key source of information in IM models) may lead to false-positive evidence of selection (in codon models), especially during the early stages of population divergence and speciation that are most in need of study. The false-positive problem may be resolved through a combination of model improvements plus experiments that document the phenotypic and fitness effects of specific polymorphisms for which codon models and IM models indicate selection and reproductive isolation (such as genes that mediate sperm-egg compatibility at fertilization).
Collapse
Affiliation(s)
- Michael W Hart
- Department of Biological Sciences, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada
| |
Collapse
|
23
|
Benguigui M, Arenas M. Spatial and temporal simulation of human evolution. Methods, frameworks and applications. Curr Genomics 2014; 15:245-55. [PMID: 25132795 PMCID: PMC4133948 DOI: 10.2174/1389202915666140506223639] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2014] [Revised: 04/05/2014] [Accepted: 05/04/2014] [Indexed: 01/29/2023] Open
Abstract
Analyses of human evolution are fundamental to understand the current gradients of human diversity. In this concern, genetic samples collected from current populations together with archaeological data are the most important resources to study human evolution. However, they are often insufficient to properly evaluate a variety of evolutionary scenarios, leading to continuous debates and discussions. A commonly applied strategy consists of the use of computer simulations based on, as realistic as possible, evolutionary models, to evaluate alternative evolutionary scenarios through statistical correlations with the real data. Computer simulations can also be applied to estimate evolutionary parameters or to study the role of each parameter on the evolutionary process. Here we review the mainly used methods and evolutionary frameworks to perform realistic spatially explicit computer simulations of human evolution. Although we focus on human evolution, most of the methods and software we describe can also be used to study other species. We also describe the importance of considering spatially explicit models to better mimic human evolutionary scenarios based on a variety of phenomena such as range expansions, range shifts, range contractions, sex-biased dispersal, long-distance dispersal or admixtures of populations. We finally discuss future implementations to improve current spatially explicit simulations and their derived applications in human evolution.
Collapse
Affiliation(s)
- Macarena Benguigui
- Centre for Molecular Biology "Severo Ochoa", Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
| | - Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa", Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
| |
Collapse
|
24
|
Arenas M, Posada D. Simulation of genome-wide evolution under heterogeneous substitution models and complex multispecies coalescent histories. Mol Biol Evol 2014; 31:1295-301. [PMID: 24557445 PMCID: PMC3995339 DOI: 10.1093/molbev/msu078] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023] Open
Abstract
Genomic evolution can be highly heterogeneous. Here, we introduce a new framework to simulate genome-wide sequence evolution under a variety of substitution models that may change along the genome and the phylogeny, following complex multispecies coalescent histories that can include recombination, demographics, longitudinal sampling, population subdivision/species history, and migration. A key aspect of our simulation strategy is that the heterogeneity of the whole evolutionary process can be parameterized according to statistical prior distributions specified by the user. We used this framework to carry out a study of the impact of variable codon frequencies across genomic regions on the estimation of the genome-wide nonsynonymous/synonymous ratio. We found that both variable codon frequencies across genes and rate variation among sites and regions can lead to severe underestimation of the global dN/dS values. The program SGWE—Simulation of Genome-Wide Evolution—is freely available from http://code.google.com/p/sgwe-project/, including extensive documentation and detailed examples.
Collapse
Affiliation(s)
- Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa," Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
| | | |
Collapse
|
25
|
Arenas M. The importance and application of the ancestral recombination graph. Front Genet 2013; 4:206. [PMID: 24133504 PMCID: PMC3796270 DOI: 10.3389/fgene.2013.00206] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2013] [Accepted: 09/24/2013] [Indexed: 11/13/2022] Open
Affiliation(s)
- Miguel Arenas
- Centre for Molecular Biology “Severo Ochoa,” Consejo Superior de Investigaciones Científicas, Universidad Autónoma de MadridMadrid, Spain
| |
Collapse
|