1
Wu X, Rai SN, Weber GF. Beyond mutations: Accounting for quantitative changes in the analysis of protein evolution. Comput Struct Biotechnol J 2024; 23:2637-2647. [PMID: 39021584 PMCID: PMC11253266 DOI: 10.1016/j.csbj.2024.06.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 06/17/2024] [Accepted: 06/18/2024] [Indexed: 07/20/2024] Open
Abstract
Molecular phylogenetic research has relied on the analysis of the coding sequences of genes or of the amino acid sequences of the encoded proteins. Enumerating mismatches, as indicators of mutation, has been central to the pertinent algorithms. Specific amino acids possess quantifiable characteristics that enable the conversion from "words" (strings of letters denoting amino acids or bases) to "waves" (strings of quantitative values representing the physico-chemical properties) or to matrices (coordinates representing the positions in a comprehensive property space). The application of such numerical representations to evolutionary analysis takes into account not only the occurrence of mutations but also their properties as influences that drive speciation, because selective pressures favor certain mutations over others, and this predilection is represented in the characteristics of the incorporated amino acids (it is not borne out solely by the mismatches). Besides being more discriminating sources for tree-generating algorithms than match/mismatch, the number strings can be examined for overall similarity with average mutual information, autocorrelation, and fractal dimension. Bivariate wavelet analysis aids in distinguishing hypermutable from conserved domains of the protein. The matrix depiction is readily subjected to comparisons of distances, and it allows the generation of heat maps or graphs. This analysis preserves the accepted taxa order where tree construction with standard approaches yields conflicting results (for the protein S100A6). It also aids hypothesis generation about the origin of mitochondrial proteins. These analytical algorithms have been automated in R and are applicable to various processes that are describable in matrix format.
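The "words to waves" conversion described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the function names are hypothetical, the use of the Kyte-Doolittle hydropathy scale as the single property is an assumption made for illustration, and the toy peptides are invented.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
# Using this particular scale is an assumption; the abstract does not
# say which physico-chemical properties the authors use.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_wave(seq):
    """Convert a 'word' (amino acid letters) into a 'wave' (property values)."""
    return [KD[aa] for aa in seq.upper()]


def mismatches(a, b):
    """Classical match/mismatch count between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def wave_distance(a, b):
    """Euclidean distance between the property waves of two aligned sequences."""
    wa, wb = to_wave(a), to_wave(b)
    if len(wa) != len(wb):
        raise ValueError("sequences must be aligned to equal length")
    return sum((x - y) ** 2 for x, y in zip(wa, wb)) ** 0.5


# Hypothetical toy peptides: one conservative and one radical substitution.
reference = "MKVLA"
conservative = "MKILA"  # V -> I, both hydrophobic
radical = "MKDLA"       # V -> D, hydrophobic to charged

if __name__ == "__main__":
    for name, seq in [("conservative", conservative), ("radical", radical)]:
        print(name, mismatches(reference, seq), round(wave_distance(reference, seq), 2))
```

Both variants differ from the reference by exactly one mismatch, yet their property distances differ widely, which is the extra information a numerical representation carries over a plain mismatch count.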
Affiliation(s)
- Xiaoyong Wu
  - Biostatistics and Informatics Shared Resources, University of Cincinnati Cancer Center, College of Medicine, Cincinnati, OH, USA
  - Cancer Data Science Center, University of Cincinnati College of Medicine Department of Biostatistics, Health Informatics and Data Sciences, Cincinnati, OH, USA
- Shesh N. Rai
  - Biostatistics and Informatics Shared Resources, University of Cincinnati Cancer Center, College of Medicine, Cincinnati, OH, USA
  - Cancer Data Science Center, University of Cincinnati College of Medicine Department of Biostatistics, Health Informatics and Data Sciences, Cincinnati, OH, USA
- Georg F. Weber
  - University of Cincinnati Cancer Center, College of Pharmacy, Cincinnati, OH, USA
2
Myburgh AM, Barnes A, Henriques R, Daniels SR. Congruent patterns of cryptic cladogenesis revealed using RADseq and Sanger sequencing in a velvet worm species complex (Onychophora: Peripatopsidae: Peripatopsis sedgwicki). Mol Phylogenet Evol 2024; 198:108132. [PMID: 38909874 DOI: 10.1016/j.ympev.2024.108132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Revised: 05/24/2024] [Accepted: 06/15/2024] [Indexed: 06/25/2024]
Abstract
In the present study, first generation DNA sequencing (mitochondrial cytochrome c oxidase subunit one, COI) and reduced-representation genomic RADseq data were used to understand the patterns and processes of diversification of the velvet worm Peripatopsis sedgwicki species complex across its distribution range in South Africa. For the RADseq data, three datasets (two primary and one supplementary) were generated, corresponding to 1,259-11,468 SNPs, in order to assess the diversity and phylogeography of the species complex. Tree topologies for the two primary datasets were inferred using maximum likelihood and Bayesian inference methods. Phylogenetic analyses using the COI datasets retrieved four distinct, well-supported clades within the species complex. Five species delimitation methods applied to the COI data (ASAP, bPTP, bGMYC, STACEY and iBPP) all showed support for the distinction of the Fort Fordyce Nature Reserve specimens. In the main P. sedgwicki species complex, the species delimitation methods revealed a variable number of operational taxonomic units and overestimated the number of putative taxa. Divergence time estimates coupled with the geographic exclusivity of species and phylogeographic results suggest recent cladogenesis during the Plio/Pleistocene. The RADseq data were subjected to a principal components analysis and a discriminant analysis of principal components, under a maximum-likelihood framework. The latter results corroborate the four main clades observed using the COI data; however, applying additional filtering revealed additional diversity. The high overall congruence observed between the RADseq data and COI data suggests that first generation sequence data remain a cheap and effective method for evolutionary studies, although RADseq does provide a far greater resolution of contemporary temporo-spatial patterns.
Affiliation(s)
- Angus Macgregor Myburgh
  - Department of Botany and Zoology, Private Bag X1, Stellenbosch University, 7602, South Africa
- Aaron Barnes
  - Department of Botany and Zoology, Private Bag X1, Stellenbosch University, 7602, South Africa
- Romina Henriques
  - Department of Biochemistry, Genetics and Microbiology, University of Pretoria, South Africa
- Savel R Daniels
  - Department of Botany and Zoology, Private Bag X1, Stellenbosch University, 7602, South Africa
3
Ferreiro D, Branco C, Arenas M. Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation. Bioinformatics 2024; 40:btae096. [PMID: 38374231 PMCID: PMC10914458 DOI: 10.1093/bioinformatics/btae096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 01/15/2024] [Accepted: 02/16/2024] [Indexed: 02/21/2024] Open
Abstract
MOTIVATION The selection among substitution models of molecular evolution is fundamental for obtaining accurate phylogenetic inferences. At the protein level, evolutionary analyses are traditionally based on empirical substitution models but these models make unrealistic assumptions and are being surpassed by structurally constrained substitution (SCS) models. The SCS models often consider site-dependent evolution, a process that provides realism but complicates their implementation into likelihood functions that are commonly used for substitution model selection. RESULTS We present a method to perform selection among site-dependent SCS models, also among empirical and site-dependent SCS models, based on the approximate Bayesian computation (ABC) approach and its implementation into the computational framework ProteinModelerABC. The framework implements ABC with and without regression adjustments and includes diverse empirical and site-dependent SCS models of protein evolution. Using extensive simulated data, we found that it provides selection among SCS and empirical models with acceptable accuracy. As illustrative examples, we applied the framework to analyze a variety of protein families observing that SCS models fit them better than the corresponding best-fitting empirical substitution models. AVAILABILITY AND IMPLEMENTATION ProteinModelerABC is freely available from https://github.com/DavidFerreiro/ProteinModelerABC, can run in parallel and includes a graphical user interface. The framework is distributed with detailed documentation and ready-to-use examples.
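The general idea of model selection by approximate Bayesian computation can be sketched with a generic rejection sampler in Python. This is a toy under stated assumptions and not ProteinModelerABC: the two Gaussian "models", their uniform priors, the summary statistic (a sample mean), the tolerance, and every function name are hypothetical stand-ins for the substitution models and summary statistics used in the paper.

```python
import random

# Two hypothetical candidate models, each defined by a uniform prior on the
# mean of a unit-variance Gaussian. Real use would replace these with
# simulators of protein evolution under different substitution models.
PRIORS = {"A": (-1.0, 1.0), "B": (2.0, 4.0)}


def simulate_summary(model, rng, n=20):
    """Draw a parameter from the model's prior, simulate data, and return
    the summary statistic (here simply the sample mean)."""
    lo, hi = PRIORS[model]
    mu = rng.uniform(lo, hi)
    data = [rng.gauss(mu, 1.0) for _ in range(n)]
    return sum(data) / n


def abc_model_choice(observed, n_sims=2000, tol=0.3, seed=0):
    """Rejection ABC: keep simulations whose summary lies within `tol` of
    the observed summary; the share of kept simulations per model
    approximates the posterior model probabilities (equal model priors)."""
    rng = random.Random(seed)
    accepted = {m: 0 for m in PRIORS}
    for _ in range(n_sims):
        model = rng.choice(sorted(PRIORS))
        if abs(simulate_summary(model, rng) - observed) <= tol:
            accepted[model] += 1
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no simulation accepted; increase tol or n_sims")
    return {m: count / total for m, count in accepted.items()}


if __name__ == "__main__":
    # Hypothetical observed summary statistic.
    print(abc_model_choice(observed=0.2))
```

Because only simulation is required, the approach sidesteps the likelihood function, which is the difficulty the abstract attributes to site-dependent models. The abstract also mentions regression adjustments, which this sketch omits.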
Affiliation(s)
- David Ferreiro
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Catarina Branco
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Miguel Arenas
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
4
Ferreiro D, Khalil R, Sousa SF, Arenas M. Substitution Models of Protein Evolution with Selection on Enzymatic Activity. Mol Biol Evol 2024; 41:msae026. [PMID: 38314876 PMCID: PMC10873502 DOI: 10.1093/molbev/msae026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 01/25/2024] [Accepted: 01/31/2024] [Indexed: 02/07/2024] Open
Abstract
Substitution models of evolution are necessary for diverse evolutionary analyses including phylogenetic tree and ancestral sequence reconstructions. At the protein level, empirical substitution models are traditionally used due to their simplicity, but they ignore the variability of substitution patterns among protein sites. Next, in order to improve the realism of the modeling of protein evolution, a series of structurally constrained substitution models were presented, but still they usually ignore constraints on the protein activity. Here, we present a substitution model of protein evolution with selection on both protein structure and enzymatic activity, and that can be applied to phylogenetics. In particular, the model considers the binding affinity of the enzyme-substrate complex as well as structural constraints that include the flexibility of structural flaps, hydrogen bonds, amino acids backbone radius of gyration, and solvent-accessible surface area that are quantified through molecular dynamics simulations. We applied the model to the HIV-1 protease and evaluated it by phylogenetic likelihood in comparison with the best-fitting empirical substitution model and a structurally constrained substitution model that ignores the enzymatic activity. We found that accounting for selection on the protein activity improves the fitting of the modeled functional regions with the real observations, especially in data with high molecular identity, which recommends considering constraints on the protein activity in the development of substitution models of evolution.
Affiliation(s)
- David Ferreiro
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Ruqaiya Khalil
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Sergio F Sousa
  - UCIBIO/REQUIMTE, BioSIM, Departamento de Biomedicina, Faculdade de Medicina da Universidade do Porto, 4200-319 Porto, Portugal
- Miguel Arenas
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
5
Steenwyk JL, Li Y, Zhou X, Shen XX, Rokas A. Incongruence in the phylogenomics era. Nat Rev Genet 2023; 24:834-850. [PMID: 37369847 DOI: 10.1038/s41576-023-00620-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Accepted: 05/19/2023] [Indexed: 06/29/2023]
Abstract
Genome-scale data and the development of novel statistical phylogenetic approaches have greatly aided the reconstruction of a broad sketch of the tree of life and resolved many of its branches. However, incongruence - the inference of conflicting evolutionary histories - remains pervasive in phylogenomic data, hampering our ability to reconstruct and interpret the tree of life. Biological factors, such as incomplete lineage sorting, horizontal gene transfer, hybridization, introgression, recombination and convergent molecular evolution, can lead to gene phylogenies that differ from the species tree. In addition, analytical factors, including stochastic, systematic and treatment errors, can drive incongruence. Here, we review these factors, discuss methodological advances to identify and handle incongruence, and highlight avenues for future research.
Affiliation(s)
- Jacob L Steenwyk
  - Howard Hughes Medical Institute and the Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
  - Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN, USA
- Yuanning Li
  - Institute of Marine Science and Technology, Shandong University, Qingdao, China
- Xiaofan Zhou
  - Guangdong Laboratory for Lingnan Modern Agriculture, Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou, China
- Xing-Xing Shen
  - Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Antonis Rokas
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
  - Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN, USA
  - Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
6
Malyutina A, Tang J, Amiryousefi A. Resolving network clusters disparity based on dissimilarity measurements with nonmetric analysis of variance. iScience 2023; 26:108354. [PMID: 38026214 PMCID: PMC10663764 DOI: 10.1016/j.isci.2023.108354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 06/22/2023] [Accepted: 10/24/2023] [Indexed: 12/01/2023] Open
Abstract
Classic ANOVA (cA) tests the explanatory power of a partitioning on a set of objects. More fit for clusters proximity analysis, nonparametric ANOVA (npA) extends to a case where instead of the object values themselves, their mutual distances are available. However, extending the cA applicability, the metric conditions in npA are limiting. Based on the central limit theorem (CLT), here we introduce nonmetric ANOVA (nmA) that by relaxing the metric properties between objects, allows an ANOVA-like statistical testing of a network clusters disparity. We present a parametric test statistic which under the null hypothesis of no differences between the competing clusters means, follows an exact F-distribution. We apply our method on three diverse biological examples, discuss its parallel performance, and note the specific use of each method tailored by the inherent data properties. The R code is provided at github.com/AmiryousefiLab/nmANOVA.
Affiliation(s)
- Alina Malyutina
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
- Jing Tang
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
- Ali Amiryousefi
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
  - Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
7
Štambuk N, Konjevoda P, Brčić-Kostić K, Baković J, Štambuk A. New algorithm for the analysis of nucleotide and amino acid evolutionary relationships based on Klein four-group. Biosystems 2023; 233:105030. [PMID: 37717902 DOI: 10.1016/j.biosystems.2023.105030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 09/10/2023] [Accepted: 09/10/2023] [Indexed: 09/19/2023]
Abstract
Phylogenetics is the study of ancestral relationships among biological species. Such sequence analyses are often represented as phylogenetic trees. The branching pattern of each tree and its topology reflect the evolutionary relatedness between analyzed sequences. We present a Klein four-group algorithm (K4A) for the evolutionary analysis of nucleotide and amino acid sequences. Klein four-group set of operators consists of: identity e (U), and three elements-a = transition (C), b = transversion (G) and c = transition-transversion or complementarity (A). We generated Klein four-group based distance matrices of: 1. Cayley table (CK4), 2. Table rows (K4R), 3. Table columns (K4C), and 4. Euclidean 2D distance (K4E). The performance of the matrices was tested on a dataset of RecA proteins in bacteria, eukaryotes (Rad51 homolog) and archaea (RadA homolog). RecA and its functional homologs are found in all species, and are essential for the repair and maintenance of DNA. Consequently, they represent a good model for the study of evolutionary relationship of protein and nucleotide sequences. The ancestral relationship between the sequences was correctly classified by all K4A matrices concerning general topology. All distance matrices exhibited small variations among species, and overall results of tree classification were in agreement with the general patterns obtained by standard BLOSUM and PAM substitution matrices. During the evolution of a code there is a phase of optimization of system rules, the ambiguity of a code is eliminated, and the system starts producing specific components. Klein four-group algorithm is consistent with the concept of ambiguity reduction. It also enables the use of different genetic code table variants optimized for particular transitions in evolution based on biological specificity.
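The group structure named in this abstract can be checked with a short Python sketch. It only encodes the four operators as permutations of the RNA bases and derives their Cayley table; it does not reproduce the authors' K4A distance matrices (CK4, K4R, K4C, K4E), whose construction the abstract does not specify. The dictionary layout and function names are hypothetical.

```python
BASES = "UCAG"


def compose(f, g):
    """Return the permutation 'apply g first, then f'."""
    return {b: f[g[b]] for b in BASES}


# Operators as permutations of the bases, labeled as in the abstract:
# e = identity, a = transition, b = transversion, c = complementarity.
OPS = {
    "e": {b: b for b in BASES},
    "a": {"U": "C", "C": "U", "A": "G", "G": "A"},  # pyrimidine<->pyrimidine, purine<->purine
    "b": {"U": "G", "G": "U", "C": "A", "A": "C"},  # purine<->pyrimidine, non-complementary
}
# Complementarity is the transition-transversion composite.
OPS["c"] = compose(OPS["a"], OPS["b"])


def name_of(perm):
    """Look up which named operator a permutation corresponds to."""
    for name, op in OPS.items():
        if op == perm:
            return name
    raise ValueError("permutation is not in the group")


def cayley_table():
    """Multiplication table of the four operators under composition."""
    return {(x, y): name_of(compose(OPS[x], OPS[y])) for x in OPS for y in OPS}


def relation(x, y):
    """Name of the unique operator that maps base x to base y."""
    for name, op in OPS.items():
        if op[x] == y:
            return name
    raise ValueError("unknown base")


if __name__ == "__main__":
    table = cayley_table()
    for x in "eabc":
        print(x, [table[(x, y)] for y in "eabc"])
    print([relation("U", y) for y in BASES])
```

Applied to U, the four operators yield U, C, G and A respectively, which matches the labeling e (U), a (C), b (G), c (A) given in the abstract. Every operator is its own inverse, and composing any two distinct non-identity operators gives the third.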
Affiliation(s)
- Nikola Štambuk
  - Centre for Nuclear Magnetic Resonance, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia
- Paško Konjevoda
  - Laboratory for Epigenomics, Division of Molecular Medicine, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia
- Krunoslav Brčić-Kostić
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia
- Josip Baković
  - University Hospital Dubrava, Department of Surgery, Avenija Gojka Šuška 6, HR-10000, Zagreb, Croatia
- Albert Štambuk
  - Faculty of Kinesiology, University of Zagreb, Horvaćanski zavoj 15, HR-10000 Zagreb, Croatia
8
Goes WM, Brasil CRF, Reis-Cunha JL, Coqueiro-Dos-Santos A, Grazielle-Silva V, de Souza Reis J, Souto TC, Laranjeira-Silva MF, Bartholomeu DC, Fernandes AP, Teixeira SMR. Complete assembly, annotation of virulence genes and CRISPR editing of the genome of Leishmania amazonensis PH8 strain. Genomics 2023; 115:110661. [PMID: 37263313 DOI: 10.1016/j.ygeno.2023.110661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Revised: 05/04/2023] [Accepted: 05/27/2023] [Indexed: 06/03/2023]
Abstract
We report the sequencing and assembly of the PH8 strain of Leishmania amazonensis, one of the etiological agents of leishmaniasis. After combining data from long PacBio reads, short Illumina reads and synteny with the Leishmania mexicana genome, the sequence of 34 chromosomes with 8317 annotated genes was generated. Multigene families encoding three virulence factors, A2, amastins and the GP63 metalloproteases, were identified and compared to their annotation in other Leishmania species. Because proteins involved in parasite iron and heme metabolism have recently been recognized as virulence factors essential for disease establishment and progression of the infection, we also identified 14 genes encoding such proteins and compared them with genes from other trypanosomatids. To follow these studies with a genetic approach to address the role of virulence factors, we tested two CRISPR-Cas9 protocols to generate L. amazonensis knockout cell lines, using the miltefosine transporter gene as a proof of concept.
Affiliation(s)
- Wanessa Moreira Goes
  - Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais, Avenida Antônio Carlos 6627, Belo Horizonte, MG CEP 31.270-901, Brazil
- Carlos Rodolpho Ferreira Brasil
  - Departamento de Análises Clínicas e Toxicológicas, Universidade Federal de Minas Gerais, Avenida Antônio Carlos 6627, Belo Horizonte, MG CEP 31.270-901, Brazil
- João Luis Reis-Cunha
  - Departamento de Veterinária Preventiva, Universidade Federal de Minas Gerais, Avenida Antônio Carlos 6627, Belo Horizonte, MG CEP 31.270-901, Brazil
  - Departamento de Parasitologia, Universidade Federal de Minas Gerais, Avenida Antônio Carlos 6627, Belo Horizonte, MG CEP 31.270-901, Brazil
- Anderson Coqueiro-Dos-Santos
  - Departamento de Parasitologia, Universidade Federal de Minas Gerais, Avenida Antônio Carlos 6627, Belo Horizonte, MG CEP 31.270-901, Brazil
- Viviane Grazielle-Silva
  - Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais, Avenida Antônio Carlos 6627, Belo Horizonte, MG CEP 31.270-901, Brazil
- Júlia de Souza Reis
  - Departamento de Análises Clínicas e Toxicológicas, Universidade Federal de Minas Gerais, Avenida Antônio Carlos 6627, Belo Horizonte, MG CEP 31.270-901, Brazil
- Tatiane Cristina Souto
  - Departamento de Análises Clínicas e Toxicológicas, Universidade Federal de Minas Gerais, Avenida Antônio Carlos 6627, Belo Horizonte, MG CEP 31.270-901, Brazil
- Maria Fernanda Laranjeira-Silva
  - Departamento de Fisiologia, Universidade de São Paulo, Rua do Matão 101, Cidade Universitária, São Paulo, SP CEP 05508-900, Brazil
- Daniella Castanheira Bartholomeu
  - Departamento de Parasitologia, Universidade Federal de Minas Gerais, Avenida Antônio Carlos 6627, Belo Horizonte, MG CEP 31.270-901, Brazil
- Ana Paula Fernandes
  - Departamento de Análises Clínicas e Toxicológicas, Universidade Federal de Minas Gerais, Avenida Antônio Carlos 6627, Belo Horizonte, MG CEP 31.270-901, Brazil
  - Centro de Tecnologia de Vacinas, Universidade Federal de Minas Gerais, Rua Professor José Vieira de Mendonça 770, Belo Horizonte, MG, CEP 31.210-360, Brazil
- Santuza Maria Ribeiro Teixeira
  - Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais, Avenida Antônio Carlos 6627, Belo Horizonte, MG CEP 31.270-901, Brazil
  - Centro de Tecnologia de Vacinas, Universidade Federal de Minas Gerais, Rua Professor José Vieira de Mendonça 770, Belo Horizonte, MG, CEP 31.210-360, Brazil
9
Tan HZ, Jansen JJFJ, Allport GA, Garg KM, Chattopadhyay B, Irestedt M, Pang SEH, Chilton G, Gwee CY, Rheindt FE. Megafaunal extinctions, not climate change, may explain Holocene genetic diversity declines in Numenius shorebirds. eLife 2023; 12:e85422. [PMID: 37549057 PMCID: PMC10406428 DOI: 10.7554/elife.85422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Accepted: 06/27/2023] [Indexed: 08/09/2023] Open
Abstract
Understanding the relative contributions of historical and anthropogenic factors to declines in genetic diversity is important for informing conservation action. Using genome-wide DNA of fresh and historic specimens, including that of two species widely thought to be extinct, we investigated fluctuations in genetic diversity and present the first complete phylogenomic tree for all nine species of the threatened shorebird genus Numenius, known as whimbrels and curlews. Most species faced sharp declines in effective population size, a proxy for genetic diversity, soon after the Last Glacial Maximum (around 20,000 years ago). These declines occurred prior to the Anthropocene and in spite of an increase in the breeding area predicted by environmental niche modeling, suggesting that they were not caused by climatic or recent anthropogenic factors. Crucially, these genetic diversity declines coincide with mass extinctions of mammalian megafauna in the Northern Hemisphere. Among other factors, the demise of ecosystem-engineering megafauna which maintained open habitats may have been detrimental for grassland and tundra-breeding Numenius shorebirds. Our work suggests that the impact of historical factors such as megafaunal extinction may have had wider repercussions on present-day population dynamics of open habitat biota than previously appreciated.
Affiliation(s)
- Hui Zhen Tan
  - Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Kritika M Garg
  - Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Balaji Chattopadhyay
  - Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Martin Irestedt
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Sean EH Pang
  - Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Glen Chilton
  - Department of Biology, St. Mary's University, Calgary, Canada
- Chyi Yin Gwee
  - Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Frank E Rheindt
  - Department of Biological Sciences, National University of Singapore, Singapore, Singapore
10
Keating JN, Garwood RJ, Sansom RS. Phylogenetic congruence, conflict and consilience between molecular and morphological data. BMC Ecol Evol 2023; 23:30. [PMID: 37403037 DOI: 10.1186/s12862-023-02131-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 06/08/2023] [Indexed: 07/06/2023] Open
Abstract
Morphology and molecules are important data sources for estimating evolutionary relationships. Modern studies often utilise morphological and molecular partitions alongside each other in combined analyses. However, the effect of combining phenomic and genomic partitions is unclear. This is exacerbated by their size imbalance, and conflict over the efficacy of different inference methods when using morphological characters. To systematically address the effect of topological incongruence, size imbalance, and tree inference methods, we conduct a meta-analysis of 32 combined (molecular + morphology) datasets across metazoa. Our results reveal that morphological-molecular topological incongruence is pervasive: these data partitions yield very different trees, irrespective of which method is used for morphology inference. Analysis of the combined data often yields unique trees that are not sampled by either partition individually, even with the inclusion of relatively small quantities of morphological characters. Differences between morphology inference methods in terms of resolution and congruence largely relate to consensus methods. Furthermore, stepping stone Bayes factor analyses reveal that morphological and molecular partitions are not consistently combinable, i.e. data partitions are not always best explained under a single evolutionary process. In light of these results, we advise that the congruence between morphological and molecular data partitions needs to be considered in combined analyses. Nonetheless, our results reveal that, for most datasets, morphology and molecules can, and should, be combined in order to best estimate evolutionary history and reveal hidden support for novel relationships. Studies that analyse only phenomic or genomic data in isolation are unlikely to provide the full evolutionary picture.
Affiliation(s)
- Joseph N Keating
  - Department of Earth and Environmental Sciences, University of Manchester, Manchester, M13 9PL, UK
  - School of Earth Sciences, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol, BS8 1TQ, UK
- Russell J Garwood
  - Department of Earth and Environmental Sciences, University of Manchester, Manchester, M13 9PL, UK
  - Natural History Museum, London, SW7 5BD, UK
- Robert S Sansom
  - Department of Earth and Environmental Sciences, University of Manchester, Manchester, M13 9PL, UK
11
Vihinen M. Nonsynonymous Synonymous Variants Demand for a Paradigm Shift in Genetics. Curr Genomics 2023; 24:18-23. [PMID: 37920730 PMCID: PMC10334700 DOI: 10.2174/1389202924666230417101020] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/09/2022] [Revised: 02/20/2023] [Accepted: 03/01/2023] [Indexed: 11/04/2023] Open
Abstract
Synonymous (also known as silent) variations are by definition not considered to change the coded protein. Still many variations in this category affect either protein abundance or properties. As this situation is confusing, we have recently introduced systematics for synonymous variations and those that may on the surface look like synonymous, but these may affect the coded protein in various ways. A new category, unsense variation, was introduced to describe variants that do not introduce a stop codon into the variation site, but which lead to different types of changes in the coded protein. Many of these variations lead to mRNA degradation and missing protein. Here, consequences of the systematics are discussed from the perspectives of variation annotation and interpretation, evolutionary calculations, nonsynonymous-to-synonymous substitution rates, phylogenetics and other evolutionary inferences that are based on the principle of (nearly) neutral synonymous variations. It may be necessary to reassess published results. Further, databases for synonymous variations and prediction methods for such variations should consider unsense variations. Thus, there is a need to evaluate and reflect principles of numerous aspects in genetics, ranging from variation naming and classification to evolutionary calculations.
Affiliation(s)
- Mauno Vihinen
  - Department of Experimental Medical Science, Lund University, Lund, BMC B13, Sweden
12
Del Amparo R, Arenas M. Influence of substitution model selection on protein phylogenetic tree reconstruction. Gene 2023; 865:147336. [PMID: 36871672 DOI: 10.1016/j.gene.2023.147336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Revised: 02/22/2023] [Accepted: 02/28/2023] [Indexed: 03/06/2023]
Abstract
Probabilistic phylogenetic tree reconstruction is traditionally performed under a best-fitting substitution model of molecular evolution previously selected according to diverse statistical criteria. Interestingly, some recent studies proposed that this procedure is unnecessary for phylogenetic tree reconstruction leading to a debate in the field. In contrast to DNA sequences, phylogenetic tree reconstruction from protein sequences is traditionally based on empirical exchangeability matrices that can differ among taxonomic groups and protein families. Considering this aspect, here we investigated the influence of selecting a substitution model of protein evolution on phylogenetic tree reconstruction by the analyses of real and simulated data. We found that phylogenetic tree reconstructions based on a selected best-fitting substitution model of protein evolution are the most accurate, in terms of topology and branch lengths, compared with those derived from substitution models with amino acid replacement matrices far from the selected best-fitting model, especially when the data has large genetic diversity. Indeed, we found that substitution models with similar amino acid replacement matrices produce similar reconstructed phylogenetic trees, suggesting the use of substitution models as similar as possible to a selected best-fitting model when the latter cannot be used. Therefore, we recommend the use of the traditional protocol of selection among substitution models of evolution for protein phylogenetic tree reconstruction.
Affiliation(s)
- Roberto Del Amparo
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain.
- Miguel Arenas
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain; Galicia Sur Health Research Institute (IIS Galicia Sur), 36310 Vigo, Spain.
|
13
|
Silva SR, Miranda VFO, Michael TP, Płachno BJ, Matos RG, Adamec L, Pond SLK, Lucaci AG, Pinheiro DG, Varani AM. The phylogenomics and evolutionary dynamics of the organellar genomes in carnivorous Utricularia and Genlisea species (Lentibulariaceae). Mol Phylogenet Evol 2023; 181:107711. [PMID: 36693533 DOI: 10.1016/j.ympev.2023.107711] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/03/2022] [Revised: 01/13/2023] [Accepted: 01/18/2023] [Indexed: 01/22/2023]
Abstract
Utricularia and Genlisea are highly specialized carnivorous plants whose phylogenetic history has been poorly explored using phylogenomic methods. Additional sampling and genomic data are needed to advance our phylogenetic and taxonomic knowledge of this group of plants. Within a comparative framework, we present a characterization of plastome (PT) and mitochondrial (MT) genes of 26 Utricularia and six Genlisea species, with representatives of all subgenera and growth habits. All PT genomes maintain similar gene content, showing minor variation across the genes located between the PT junctions. One exception is a major variation related to different patterns in the presence and absence of ndh genes in the small single copy region, which appears to follow the phylogenetic history of the species rather than their lifestyle. All MT genomes exhibit similar gene content, with most differences related to lineage-specific pseudogenes. We find evidence for episodic positive diversifying selection in PT and for most of the Utricularia MT genes that may be related to the current hypothesis that bladderworts' nuclear DNA is under constant ROS oxidative DNA damage and unusual DNA repair mechanisms, or even low-fidelity polymerases that bypass lesions, which could also be affecting the organellar genomes. Finally, both PT and MT phylogenetic trees were well resolved and highly supported, providing a congruent phylogenomic hypothesis for the Utricularia and Genlisea clade given the study sampling.
Affiliation(s)
- Saura R Silva
- UNESP - São Paulo State University, School of Agricultural and Veterinarian Sciences, Department of Agricultural and Environmental Biotechnology, Campus Jaboticabal, CEP 14884-900 SP, Brazil.
- Vitor F O Miranda
- UNESP - São Paulo State University, School of Agricultural and Veterinarian Sciences, Department of Biology, Laboratory of Plant Systematics, Campus Jaboticabal, CEP 14884-900 SP, Brazil.
- Todd P Michael
- Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA.
- Bartosz J Płachno
- Department of Plant Cytology and Embryology, Institute of Botany, Faculty of Biology, Jagiellonian University in Kraków, Gronostajowa 9 St., 30-387 Cracow, Poland.
- Ramon G Matos
- UNESP - São Paulo State University, School of Agricultural and Veterinarian Sciences, Department of Biology, Laboratory of Plant Systematics, Campus Jaboticabal, CEP 14884-900 SP, Brazil.
- Lubomir Adamec
- Department of Experimental and Functional Morphology, Institute of Botany CAS, Dukelská 135, CZ-379 01 Třeboň, Czech Republic.
- Sergei L K Pond
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA.
- Alexander G Lucaci
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA.
- Daniel G Pinheiro
- UNESP - São Paulo State University, School of Agricultural and Veterinarian Sciences, Department of Agricultural and Environmental Biotechnology, Campus Jaboticabal, CEP 14884-900 SP, Brazil.
- Alessandro M Varani
- UNESP - São Paulo State University, School of Agricultural and Veterinarian Sciences, Department of Agricultural and Environmental Biotechnology, Campus Jaboticabal, CEP 14884-900 SP, Brazil.
|
14
|
The Structure of Evolutionary Model Space for Proteins across the Tree of Life. BIOLOGY 2023; 12:biology12020282. [PMID: 36829559 PMCID: PMC9952988 DOI: 10.3390/biology12020282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 02/04/2023] [Accepted: 02/08/2023] [Indexed: 02/12/2023]
Abstract
The factors that determine the relative rates of amino acid substitution during protein evolution are complex and known to vary among taxa. We estimated relative exchangeabilities for pairs of amino acids from clades spread across the tree of life and assessed the historical signal in the distances among these clade-specific models. We separately trained these models on collections of arbitrarily selected protein alignments and on ribosomal protein alignments. In both cases, we found a clear separation between the models trained using multiple sequence alignments from bacterial clades and the models trained on archaeal and eukaryotic data. We assessed the predictive power of our novel clade-specific models of sequence evolution by asking whether fit to the models could be used to identify the source of multiple sequence alignments. Model fit was generally able to correctly classify protein alignments at the level of domain (bacterial versus archaeal), but the accuracy of classification at finer scales was much lower. The only exceptions to this were the relatively high classification accuracy for two archaeal lineages: Halobacteriaceae and Thermoprotei. Genomic GC content had a modest impact on relative exchangeabilities despite having a large impact on amino acid frequencies. Relative exchangeabilities involving aromatic residues exhibited the largest differences among models. There were a small number of exchangeabilities that exhibited large differences in comparisons among major clades and between generalized models and ribosomal protein models. Taken as a whole, these results reveal that a small number of relative exchangeabilities are responsible for much of the structure of the "model space" for protein sequence evolution. The clade-specific models we generated may be useful tools for protein phylogenetics, and the structure of evolutionary model space that they revealed has implications for phylogenomic inference across the tree of life.
|
15
|
Gupta MK, Vadde R. Next-generation development and application of codon model in evolution. Front Genet 2023; 14:1091575. [PMID: 36777719 PMCID: PMC9911445 DOI: 10.3389/fgene.2023.1091575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 01/17/2023] [Indexed: 01/28/2023] Open
Abstract
To date, numerous nucleotide, amino acid, and codon substitution models have been developed to estimate the evolutionary history of any sequence/organism in a more comprehensive way. Out of these three, the codon substitution model is the most powerful. These models have been utilized extensively for detecting selective pressure on a protein and codon usage bias, and for ancestral and phylogenetic reconstruction. However, because they are more computationally demanding than nucleotide and amino acid substitution models, only a few studies have employed codon substitution models to understand the heterogeneity of the evolutionary process in genome-scale analyses. Hence, there is always a question of how to develop more robust but less computationally demanding codon substitution models to get more accurate results. In this review article, the authors attempted to understand the basis of the development of different types of codon substitution models and how this information can be utilized to develop more robust but less computationally demanding codon substitution models. The codon substitution model makes it possible to detect the selection regime under which any gene or gene region is evolving, codon usage bias in any organism or tissue-specific region, and the phylogenetic relationship between different lineages more accurately than nucleotide and amino acid substitution models. Thus, in the near future, these codon models can be utilized in the fields of conservation, breeding and medicine.
|
16
|
Del Amparo R, González-Vázquez LD, Rodríguez-Moure L, Bastolla U, Arenas M. Consequences of Genetic Recombination on Protein Folding Stability. J Mol Evol 2023; 91:33-45. [PMID: 36463317 PMCID: PMC9849154 DOI: 10.1007/s00239-022-10080-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 11/25/2022] [Indexed: 12/05/2022]
Abstract
Genetic recombination is a common evolutionary mechanism that produces molecular diversity. However, its consequences on protein folding stability have not attracted the same attention as in the case of point mutations. Here, we studied the effects of homologous recombination on the computationally predicted protein folding stability for several protein families, finding less detrimental effects than we previously expected. Although recombination can affect multiple protein sites, we found that the fraction of recombined proteins that are eliminated by negative selection because of insufficient stability is not significantly larger than the corresponding fraction of proteins produced by mutation events. Indeed, although recombination disrupts epistatic interactions, the mean stability of recombinant proteins is not lower than that of their parents. On the other hand, the difference of stability between recombined proteins is amplified with respect to the parents, promoting phenotypic diversity. As a result, at least one third of recombined proteins present stability between those of their parents, and a substantial fraction have higher or lower stability than those of both parents. As expected, we found that parents with similar sequences tend to produce recombined proteins with stability close to that of the parents. Finally, the simulation of protein evolution along the ancestral recombination graph with empirical substitution models commonly used in phylogenetics, which ignore constraints on protein folding stability, showed that recombination favors the decrease of folding stability, supporting the convenience of adopting structurally constrained models when possible for inferences of protein evolutionary histories with recombination.
Affiliation(s)
- Roberto Del Amparo
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain; Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
- Luis Daniel González-Vázquez
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain; Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
- Laura Rodríguez-Moure
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain; Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
- Ugo Bastolla
- Centre for Molecular Biology Severo Ochoa (CSIC-UAM), 28049 Madrid, Spain
- Miguel Arenas
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain; Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain; Galicia Sur Health Research Institute (IIS Galicia Sur), 36310 Vigo, Spain
|
17
|
Zuckerman NS, Shulman LM. Next-Generation Sequencing in the Study of Infectious Diseases. Infect Dis (Lond) 2023. [DOI: 10.1007/978-1-0716-2463-0_1090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023] Open
|
18
|
Liu C, Song J, Ogata H, Akutsu T. MSNet-4mC: learning effective multi-scale representations for identifying DNA N4-methylcytosine sites. Bioinformatics 2022; 38:5160-5167. [PMID: 36205602 DOI: 10.1093/bioinformatics/btac671] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/20/2022] [Revised: 09/09/2022] [Accepted: 10/05/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: N4-methylcytosine (4mC) is an essential kind of epigenetic modification that regulates a wide range of biological processes. However, experimental methods for detecting 4mC sites are time-consuming and labor-intensive. As an alternative, computational methods that are capable of automatically identifying 4mC with data analysis techniques become a reasonable option. A major challenge is how to develop effective methods to fully exploit the complex interactions within the DNA sequences to improve the predictive capability.
RESULTS: In this work, we propose MSNet-4mC, a lightweight neural network building upon convolutional operations with multi-scale receptive fields to perceive cross-element relationships over both short and long ranges of given DNA sequences. With strong imbalances in the number of candidates in different species in mind, we compute and apply class weights in the cross-entropy loss to balance the training process. Extensive benchmarking experiments show that our method achieves a significant performance improvement and outperforms other state-of-the-art methods.
AVAILABILITY AND IMPLEMENTATION: The source code and models are freely available for download at https://github.com/LIU-CT/MSNet-4mC, implemented in Python and supported on Linux and Windows.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
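The class-weighting idea mentioned in this abstract can be illustrated with a small, library-free sketch. The abstract does not say how the weights were computed, so the inverse-frequency ("balanced") heuristic used here is an assumption, and the function names and toy data are hypothetical rather than taken from the MSNet-4mC code.

```python
import math
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumed heuristic for illustration; rare classes receive larger weights.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by the sum of weights."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Toy data: three negatives (class 0) and one positive (class 1).
labels = [0, 0, 0, 1]
weights = class_weights(labels)

# Predicted class probabilities for each sample (hypothetical values).
probs = [{0: 0.9, 1: 0.1}, {0: 0.8, 1: 0.2}, {0: 0.7, 1: 0.3}, {0: 0.4, 1: 0.6}]
loss = weighted_cross_entropy(probs, labels, weights)
print(weights, loss)
```

With these weights, the single positive example contributes as much to the loss as the three negatives combined, which is the balancing effect the abstract refers to.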
Affiliation(s)
- Chunting Liu
- Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Kyoto 606-8501, Japan; Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Hiroyuki Ogata
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Tatsuya Akutsu
- Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Kyoto 606-8501, Japan; Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
|
19
|
Lucaci AG, Zehr JD, Shank SD, Bouvier D, Ostrovsky A, Mei H, Nekrutenko A, Martin DP, Kosakovsky Pond SL. RASCL: Rapid Assessment of Selection in CLades through molecular sequence analysis. PLoS One 2022; 17:e0275623. [PMID: 36322581 PMCID: PMC9629619 DOI: 10.1371/journal.pone.0275623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Accepted: 09/20/2022] [Indexed: 11/06/2022] Open
Abstract
An important unmet need revealed by the COVID-19 pandemic is the near-real-time identification of potentially fitness-altering mutations within rapidly growing SARS-CoV-2 lineages. Although powerful molecular sequence analysis methods are available to detect and characterize patterns of natural selection within modestly sized gene-sequence datasets, the computational complexity of these methods and their sensitivity to sequencing errors render them effectively inapplicable in large-scale genomic surveillance contexts. Motivated by the need to analyze new lineage evolution in near-real time using large numbers of genomes, we developed the Rapid Assessment of Selection within CLades (RASCL) pipeline. RASCL applies state of the art phylogenetic comparative methods to evaluate selective processes acting at individual codon sites and across whole genes. RASCL is scalable and produces automatically updated regular lineage-specific selection analysis reports: even for lineages that include tens or hundreds of thousands of sampled genome sequences. Key to this performance is (i) generation of automatically subsampled high quality datasets of gene/ORF sequences drawn from a selected "query" viral lineage; (ii) contextualization of these query sequences in codon alignments that include high-quality "background" sequences representative of global SARS-CoV-2 diversity; and (iii) the extensive parallelization of a suite of computationally intensive selection analysis tests. Within hours of being deployed to analyze a novel rapidly growing lineage of interest, RASCL will begin yielding JavaScript Object Notation (JSON)-formatted reports that can be either imported into third-party analysis software or explored in standard web-browsers using the premade RASCL interactive data visualization dashboard. 
By enabling the rapid detection of genome sites evolving under different selective regimes, RASCL is well-suited for near-real-time monitoring of the population-level selective processes that will likely underlie the emergence of future variants of concern in measurably evolving pathogens with extensive genomic surveillance.
Affiliation(s)
- Alexander G. Lucaci
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
- Jordan D. Zehr
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
- Stephen D. Shank
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
- Dave Bouvier
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, United States of America
- Alexander Ostrovsky
- Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD, United States of America
- Han Mei
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, United States of America
- Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, United States of America
- Darren P. Martin
- Division of Computational Biology, Department of Integrative Biomedical Sciences, Institute of Infectious Diseases and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Sergei L. Kosakovsky Pond
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
|
20
|
Genomic Determinants Potentially Associated with Clinical Manifestations of Human-Pathogenic Tick-Borne Flaviviruses. Int J Mol Sci 2022; 23:ijms232113404. [PMID: 36362200 PMCID: PMC9658301 DOI: 10.3390/ijms232113404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 10/26/2022] [Accepted: 10/29/2022] [Indexed: 11/06/2022] Open
Abstract
The tick-borne flavivirus group contains at least five species that are pathogenic to humans, three of which induce encephalitis (tick-borne encephalitis virus, louping-ill virus, Powassan virus) and another two of which induce hemorrhagic fever (Omsk hemorrhagic fever virus, Kyasanur Forest disease virus). To date, the molecular mechanisms responsible for these strikingly different clinical forms are not completely understood. Using a bioinformatic approach, we analyzed each amino acid (aa) position in the alignment of 323 polyprotein sequences to calculate the fixation index (Fst) per site and find the regions (determinants) where sequences belonging to two designated groups were most different. Our algorithm revealed 36 potential determinants (Fst ranges from 0.91 to 1.0) located in all viral proteins except the capsid protein. In the envelope (E) protein, most of the determinants were located on the virion surface regions (domains II and III) and one (absolutely specific site 457) was located in the transmembrane region. Another 100% specific determinant site (E63D) with Fst = 1.0 was located in the central hydrophilic domain of NS2b, which mediates NS3 protease activity. The NS5 protein contains the largest number of determinants (14); two of them are absolutely specific (T226S, E290D) and are located near the RNA binding site 219 (methyltransferase domain) and the extension structure. We assume that highly, even if not absolutely, specific sites, together with absolutely specific ones (Fst = 1.0), can play a supporting role in determining cell and tissue tropism.
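The per-site fixation index described in this abstract can be illustrated with a short sketch. The abstract does not state which Fst estimator was used, so the heterozygosity-based form Fst = (H_T - H_S) / H_T below is an assumption, and the function names and toy alignments are hypothetical. Here H_T is the expected heterozygosity of the pooled sequences at a site and H_S is the size-weighted mean of the two within-group values.

```python
from collections import Counter


def heterozygosity(residues):
    """Expected heterozygosity at one site: 1 - sum of squared residue frequencies."""
    n = len(residues)
    return 1.0 - sum((count / n) ** 2 for count in Counter(residues).values())


def site_fst(group_a, group_b):
    """Fst at one alignment column for two groups of residues (assumed estimator)."""
    h_total = heterozygosity(group_a + group_b)
    if h_total == 0.0:
        # Invariant site: no diversity to partition, report 0 by convention.
        return 0.0
    n_a, n_b = len(group_a), len(group_b)
    h_within = (n_a * heterozygosity(group_a) + n_b * heterozygosity(group_b)) / (n_a + n_b)
    return (h_total - h_within) / h_total


def fst_profile(aln_a, aln_b):
    """Fst for every column of two aligned groups of equal-length sequences."""
    length = len(aln_a[0])
    return [
        site_fst([seq[i] for seq in aln_a], [seq[i] for seq in aln_b])
        for i in range(length)
    ]


# Toy alignments: site 1 is invariant, site 2 is fixed for E in one group and D in the other.
encephalitis_like = ["ME", "ME", "ME"]
hemorrhagic_like = ["MD", "MD"]
print(fst_profile(encephalitis_like, hemorrhagic_like))  # -> [0.0, 1.0]
```

Under this estimator, a site that is fixed for different residues in the two groups (such as an E/D split) gives Fst = 1.0, which matches the abstract's description of "absolutely specific" sites.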
|
21
|
Ayuso-Fernández I, Molpeceres G, Camarero S, Ruiz-Dueñas FJ, Martínez AT. Ancestral sequence reconstruction as a tool to study the evolution of wood decaying fungi. FRONTIERS IN FUNGAL BIOLOGY 2022; 3:1003489. [PMID: 37746217 PMCID: PMC10512382 DOI: 10.3389/ffunb.2022.1003489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Accepted: 09/22/2022] [Indexed: 09/26/2023]
Abstract
The study of evolution is limited by the techniques available to do so. Aside from the use of the fossil record, molecular phylogenetics can provide a detailed characterization of evolutionary histories using genes, genomes and proteins. However, these tools provide scarce biochemical information of the organisms and systems of interest and are therefore very limited when they come to explain protein evolution. In the past decade, this limitation has been overcome by the development of ancestral sequence reconstruction (ASR) methods. ASR allows the subsequent resurrection in the laboratory of inferred proteins from now extinct organisms, becoming an outstanding tool to study enzyme evolution. Here we review the recent advances in ASR methods and their application to study fungal evolution, with special focus on wood-decay fungi as essential organisms in the global carbon cycling.
Affiliation(s)
- Iván Ayuso-Fernández
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), Ås, Norway
- Gonzalo Molpeceres
- Centro de Investigaciones Biológicas “Margarita Salas” (CIB), CSIC, Madrid, Spain
- Susana Camarero
- Centro de Investigaciones Biológicas “Margarita Salas” (CIB), CSIC, Madrid, Spain
- Angel T. Martínez
- Centro de Investigaciones Biológicas “Margarita Salas” (CIB), CSIC, Madrid, Spain
|
22
|
Engineering functional thermostable proteins using ancestral sequence reconstruction. J Biol Chem 2022; 298:102435. [PMID: 36041629 PMCID: PMC9525910 DOI: 10.1016/j.jbc.2022.102435] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 01/31/2022] [Revised: 08/23/2022] [Accepted: 08/24/2022] [Indexed: 11/20/2022] Open
Abstract
Natural proteins are often only slightly more stable in the native state than the denatured state, and an increase in environmental temperature can easily shift the balance towards unfolding. Therefore, the engineering of proteins to improve protein stability is an area of intensive research. Thermostable proteins are required to withstand industrial process conditions, for increased shelf-life of protein therapeutics, for developing robust 'biobricks' for synthetic biology applications, and for research purposes (e.g. structure determination). In addition, thermostability buffers the often destabilizing effects of mutations introduced to improve other properties. Rational design approaches to engineering thermostability require structural information, but even with advanced computational methods, it is challenging to predict or parameterize all the relevant structural factors with sufficient precision to anticipate the results of a given mutation. Directed evolution is an alternative when structures are unavailable but requires extensive screening of mutant libraries. Recently however, bioinspired approaches based on phylogenetic analyses have shown great promise. Leveraging the rapid expansion in sequence data and bioinformatic tools, ancestral sequence reconstruction (ASR) can generate highly stable folds for novel applications in industrial chemistry, medicine, and synthetic biology. This review provides an overview of the factors important for successful inference of thermostable proteins by ASR and what it can reveal about the determinants of stability in proteins.
|
23
|
Del Amparo R, Arenas M. Consequences of Substitution Model Selection on Protein Ancestral Sequence Reconstruction. Mol Biol Evol 2022; 39:6628884. [PMID: 35789388 PMCID: PMC9254009 DOI: 10.1093/molbev/msac144] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/04/2023] Open
Abstract
The selection of the best-fitting substitution model of molecular evolution is a traditional step for phylogenetic inferences, including ancestral sequence reconstruction (ASR). However, a few recent studies suggested that applying this procedure does not affect the accuracy of phylogenetic tree reconstruction. Here, we revisited this debate topic by analyzing the influence of selection among substitution models of protein evolution, with focus on exchangeability matrices, on the accuracy of ASR using simulated and real data. We found that the selected best-fitting substitution model produces the most accurate ancestral sequences, especially if the data present large genetic diversity. Indeed, ancestral sequences reconstructed under substitution models with similar exchangeability matrices were similar, suggesting that if the selected best-fitting model cannot be used for the reconstruction, applying a model similar to the selected one is preferred. We conclude that selecting among substitution models of protein evolution is recommended for reconstructing accurate ancestral sequences.
Affiliation(s)
- Roberto Del Amparo
- CINBIO, Universidade de Vigo, Vigo, Spain; Departamento de Bioquímica, Xenética e Immunoloxía, Universidade de Vigo, Vigo, Spain
- Miguel Arenas
- CINBIO, Universidade de Vigo, Vigo, Spain; Departamento de Bioquímica, Xenética e Immunoloxía, Universidade de Vigo, Vigo, Spain; Galicia Sur Health Research Institute (IIS Galicia Sur), Vigo, Spain
|
24
|
Benndorf R, Velazquez R, Zehr JD, Pond SLK, Martin JL, Lucaci AG. Human HspB1, HspB3, HspB5 and HspB8: Shaping these disease factors during vertebrate evolution. Cell Stress Chaperones 2022; 27:309-323. [PMID: 35678958 PMCID: PMC9346038 DOI: 10.1007/s12192-022-01268-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/09/2022] [Revised: 03/21/2022] [Accepted: 03/22/2022] [Indexed: 12/05/2022] Open
Abstract
Small heat shock proteins (sHSPs) emerged early in evolution and occur in all domains of life and nearly in all species, including humans. Mutations in four sHSPs (HspB1, HspB3, HspB5, HspB8) are associated with neuromuscular disorders. The aim of this study is to investigate the evolutionary forces shaping these sHSPs during vertebrate evolution. We performed comparative evolutionary analyses on a set of orthologous sHSP sequences, based on the ratio of non-synonymous: synonymous substitution rates for each codon. We found that these sHSPs had been historically exposed to different degrees of purifying selection, decreasing in this order: HspB8 > HspB1, HspB5 > HspB3. Within each sHSP, regions with different degrees of purifying selection can be discerned, resulting in characteristic selective pressure profiles. The conserved α-crystallin domains were exposed to the most stringent purifying selection compared to the flanking regions, supporting a 'dimorphic pattern' of evolution. Thus, during vertebrate evolution the different sequence partitions were exposed to different and measurable degrees of selective pressures. Among the disease-associated mutations, most are missense mutations primarily in HspB1 and to a lesser extent in the other sHSPs. Our data provide an explanation for this disparate incidence. Contrary to the expectation, most missense mutations cause dominant disease phenotypes. Theoretical considerations support a connection between the historic exposure of these sHSP genes to a high degree of purifying selection and the unusual prevalence of genetic dominance of the associated disease phenotypes. Our study puts the genetics of inheritable sHSP-borne diseases into the context of vertebrate evolution.
Affiliation(s)
- Ryan Velazquez
- Institute for Genomics and Evolutionary Medicine, Department of Biology, Temple University, Philadelphia, PA 19122 USA
- Jordan D. Zehr
- Institute for Genomics and Evolutionary Medicine, Department of Biology, Temple University, Philadelphia, PA 19122 USA
- Sergei L. Kosakovsky Pond
- Institute for Genomics and Evolutionary Medicine, Department of Biology, Temple University, Philadelphia, PA 19122 USA
- Jody L. Martin
- Cell and Molecular Core, Cardiovascular Research Institute, University of California at Davis, Davis, CA USA
- Alexander G. Lucaci
- Institute for Genomics and Evolutionary Medicine, Department of Biology, Temple University, Philadelphia, PA 19122 USA
|
25
|
Czech L, Stamatakis A, Dunthorn M, Barbera P. Metagenomic Analysis Using Phylogenetic Placement-A Review of the First Decade. FRONTIERS IN BIOINFORMATICS 2022; 2:871393. [PMID: 36304302 PMCID: PMC9580882 DOI: 10.3389/fbinf.2022.871393] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/08/2022] [Accepted: 04/11/2022] [Indexed: 12/20/2022] Open
Abstract
Phylogenetic placement refers to a family of tools and methods to analyze, visualize, and interpret the tsunami of metagenomic sequencing data generated by high-throughput sequencing. Compared to alternative (e.g., similarity-based) methods, it puts metabarcoding sequences into a phylogenetic context using a set of known reference sequences and taking evolutionary history into account. Thereby, one can increase the accuracy of metagenomic surveys and eliminate the requirement for having exact or close matches with existing sequence databases. Phylogenetic placement constitutes a valuable analysis tool per se, but also entails a plethora of downstream tools to interpret its results. A common use case is to analyze species communities obtained from metagenomic sequencing, for example via taxonomic assignment, diversity quantification, sample comparison, and identification of correlations with environmental variables. In this review, we provide an overview of the methods developed during the first 10 years. In particular, the goals of this review are 1) to motivate the usage of phylogenetic placement and illustrate some of its use cases, 2) to outline the full workflow, from raw sequences to publishable figures, including best practices, 3) to introduce the most common tools and methods and their capabilities, 4) to point out common placement pitfalls and misconceptions, and 5) to showcase typical placement-based analyses and how they can help to analyze, visualize, and interpret phylogenetic placement data.
Affiliation(s)
- Lucas Czech
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, United States
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Micah Dunthorn
  - Natural History Museum, University of Oslo, Oslo, Norway
|
26
|
Santander-Jimenez S, Vega-Rodriguez MA, Sousa L. Inter-Algorithm Multiobjective Cooperation for Phylogenetic Reconstruction on Amino Acid Data. IEEE Transactions on Cybernetics 2022; 52:3577-3591. [PMID: 32915754 DOI: 10.1109/tcyb.2020.2995464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Inter-algorithm cooperative approaches are increasingly gaining interest as a way to boost the search capabilities of evolutionary algorithms (EAs). However, the growing complexity of real-world optimization problems demands new cooperative designs that implement performance-driven strategies to improve the solution quality. This article explores multiobjective cooperation to address an important problem in bioinformatics: the reconstruction of phylogenetic histories from amino acid data. The proposed method is built using representative algorithms from the three main multiobjective design trends: 1) nondominated sorting genetic algorithm II; 2) indicator-based evolutionary algorithm; and 3) multiobjective evolutionary algorithm based on decomposition. The cooperation is supervised by an Elite island component that, along with managing migrations, retrieves multitrend performance feedback from each approach to run additional instantiations of the most satisfying algorithm in each stage of the execution. Experimentation on five real-world problem instances shows the benefits of the proposal to handle complex optimization tasks, in comparison to stand-alone algorithms, standard island models, and other state-of-the-art methods.
|
27
|
Cornuault J, Sanmartín I. A road map for phylogenetic models of species trees. Mol Phylogenet Evol 2022; 173:107483. [DOI: 10.1016/j.ympev.2022.107483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2021] [Revised: 03/09/2022] [Accepted: 04/05/2022] [Indexed: 10/18/2022]
|
28
|
Distinguishing Evolutionary Conservation from Derivedness. Life (Basel) 2022; 12:life12030440. [PMID: 35330191 PMCID: PMC8954198 DOI: 10.3390/life12030440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2022] [Revised: 03/14/2022] [Accepted: 03/14/2022] [Indexed: 11/17/2022] [Open Access]
Abstract
While the concept of “evolutionary conservation” has enabled biologists to explain many ancestral features and traits, it has also frequently been misused to evaluate the degree of changes from a common ancestor, or “derivedness”. We propose that the distinction of these two concepts allows us to properly understand phenotypic and organismal evolution. From a methodological aspect, “conservation” mainly considers genes or traits which species have in common, while “derivedness” additionally covers those that are not commonly shared, such as novel or lost traits and genes to evaluate changes from the time of divergence from a common ancestor. Due to these differences, while conservation-oriented methods are effective in identifying ancestral features, they may be prone to underestimating the overall changes accumulated during the evolution of certain lineages. Herein, we describe our recently developed method, “transcriptomic derivedness index”, for estimating the phenotypic derivedness of embryos with a molecular approach using the whole-embryonic transcriptome as a phenotype. Although echinoderms are often considered as highly derived species, our analyses with this method showed that their embryos, at least at the transcriptomic level, may not be much more derived than those of chordates. We anticipate that the future development of derivedness-oriented methods could provide quantitative indicators for finding highly/lowly evolvable traits.
|
29
|
Context-Dependent Substitution Dynamics in Plastid DNA Across a Wide Range of Taxonomic Groups. J Mol Evol 2022; 90:44-55. [PMID: 35037071 DOI: 10.1007/s00239-021-10040-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2021] [Accepted: 12/01/2021] [Indexed: 10/19/2022]
Abstract
The influence of neighboring base composition, or context, on substitution bias at fourfold degenerate coding sites and in intergenic regions in plastid DNA is compared across the angiosperms, gymnosperms, ferns, liverworts, chlorophytes, stramenopiles and rhodophytes. An influence of flanking base G + C content on the relative rates of transitions and transversions is observed in all lineages and extends up to four nucleotides from the site of substitution in some. Despite finding context effects in all lineages, significant differences were observed between lineages. Overall, the data suggest that context is a general factor affecting mutation bias in plastid DNA but that the dynamics of the influence have evolved over time. It is also shown that, although there are similar effects of context on substitution bias at fourfold degenerate coding sites and at sites within intergenic regions, there are also small but significant differences, suggesting that there could be some selection on some of these sites and that there could be some difference in the mutation and/or repair process between coding and noncoding DNA.
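To make the notion of "context" in the abstract above concrete, the following is a minimal sketch, not the authors' method: it classifies each difference between two aligned sequences as a transition or a transversion and tallies the counts by the number of G or C bases flanking the site. The function names, the `width` parameter, and the toy sequences are hypothetical and chosen only for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref, alt):
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine)
    or 'transversion' (purine<->pyrimidine)."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def flanking_gc(seq, pos, width=1):
    """Count G/C bases within `width` positions on each side of `pos`."""
    left = seq[max(0, pos - width):pos]
    right = seq[pos + 1:pos + 1 + width]
    return sum(base in "GC" for base in left + right)


def tally_by_context(ancestral, derived, width=1):
    """Tally substitution types keyed by flanking G+C count.

    Assumes two gap-free, equal-length, aligned nucleotide strings;
    the context is read from the ancestral sequence.
    """
    if len(ancestral) != len(derived):
        raise ValueError("sequences must have equal length")
    counts = {}
    for pos, (a, d) in enumerate(zip(ancestral, derived)):
        if a == d:
            continue
        gc = flanking_gc(ancestral, pos, width)
        bucket = counts.setdefault(gc, {"transition": 0, "transversion": 0})
        bucket[substitution_type(a, d)] += 1
    return counts
```

Comparing the transition-to-transversion ratio across the G+C buckets is one simple way to look for a context effect; a real analysis would also need to infer substitution direction on a phylogeny and restrict the sites considered (for example, to fourfold degenerate sites).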
|
30
|
Abstract
The reconstruction of genetic material of ancestral organisms constitutes a powerful application of evolutionary biology. A fundamental step in this inference is the ancestral sequence reconstruction (ASR), which can be performed with diverse methodologies implemented in computer frameworks. However, most of these methodologies ignore evolutionary properties frequently observed in microbes, such as genetic recombination and complex selection processes, that can bias the traditional ASR. From a practical perspective, here I review methodologies for the reconstruction of ancestral DNA and protein sequences, with particular focus on microbes, and including biases, recommendations, and software implementations. I conclude that microbial ASR is a complex analysis that should be carefully performed and that there is a need for methods to infer more realistic ancestral microbial sequences.
Affiliation(s)
- Miguel Arenas
  - Biomedical Research Center (CINBIO), University of Vigo, Vigo, Spain.
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain.
  - Galicia Sur Health Research Institute (IIS Galicia Sur), Vigo, Spain.
|
31
|
Del Amparo R, Arenas M. HIV Protease and Integrase Empirical Substitution Models of Evolution: Protein-Specific Models Outperform Generalist Models. Genes (Basel) 2021; 13:61. [PMID: 35052404 PMCID: PMC8774313 DOI: 10.3390/genes13010061] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/09/2021] [Revised: 12/22/2021] [Accepted: 12/22/2021] [Indexed: 12/24/2022] [Open Access]
Abstract
Diverse phylogenetic methods require a substitution model of evolution that should mimic, as accurately as possible, the real substitution process. At the protein level, empirical substitution models have traditionally been based on a large number of different proteins from particular taxonomic levels. However, these models assume that all of the proteins of a taxonomic level evolve under the same substitution patterns. We believe that this assumption is highly unrealistic and should be relaxed by considering protein-specific substitution models that account for protein-specific selection processes. In order to test this hypothesis, we inferred and evaluated four new empirical substitution models for the protease and integrase of HIV and other viruses. We found that these models more accurately fit, compared with any of the currently available empirical substitution models, the evolutionary process of these proteins. We conclude that evolutionary inferences from protein sequences are more accurate if they are based on protein-specific substitution models rather than taxonomic-specific (generalist) substitution models. We also present four new empirical substitution models of protein evolution that could be useful for phylogenetic inferences of viral protease and integrase.
Affiliation(s)
- Roberto Del Amparo
  - Centro de Investigacións Biomédicas (CINBIO), University of Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Miguel Arenas
  - Centro de Investigacións Biomédicas (CINBIO), University of Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), 36310 Vigo, Spain
|
32
|
Spielman SJ, Miraglia ML. Relative model selection of evolutionary substitution models can be sensitive to multiple sequence alignment uncertainty. BMC Ecol Evol 2021; 21:214. [PMID: 34844571 PMCID: PMC8628390 DOI: 10.1186/s12862-021-01931-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/19/2021] [Accepted: 10/06/2021] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: Multiple sequence alignments (MSAs) represent the fundamental unit of data input to most comparative sequence analyses. In phylogenetic analyses in particular, errors in MSA construction have the potential to induce further errors in downstream analyses such as phylogenetic reconstruction itself, ancestral state reconstruction, and divergence time estimation. In addition to providing phylogenetic methods with an MSA to analyze, researchers must also specify a suitable evolutionary model for the given analysis. Most commonly, researchers apply relative model selection to select a model from a candidate set and then provide both the MSA and the selected model as input to subsequent analyses. While the influence of MSA errors has been explored for most stages of phylogenetics pipelines, the potential effects of MSA uncertainty on the relative model selection procedure itself have not been explored. RESULTS: We assessed the consistency of relative model selection when presented with multiple perturbed versions of a given MSA. We find that while relative model selection is mostly robust to MSA uncertainty, in a substantial proportion of circumstances, relative model selection identifies distinct best-fitting models from different MSAs created from the same set of sequences. We find that this issue is more pervasive for nucleotide data compared to amino-acid data. However, we also find that it is challenging to predict whether relative model selection will be robust or sensitive to uncertainty in a given MSA. CONCLUSIONS: We find that MSA uncertainty can affect virtually all steps of phylogenetic analysis pipelines to a greater extent than has previously been recognized, including relative model selection.
Affiliation(s)
- Molly L Miraglia
  - Department of Molecular and Cellular Biosciences, Rowan University, Glassboro, NJ, 08028, USA; Fox Chase Cancer Center, Philadelphia, PA, 19111, USA
|
33
|
Sahin E. Putative Group I Introns in the Nuclear Internal Transcribed Spacer of the Basidiomycete Fungus Gautieria Vittad. Cytol Genet+ 2021. [DOI: 10.3103/s009545272105011x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
34
|
Podder A, Panja S, Chaudhuri A, Roy A, Biswas M, Homechaudhuri S. Patterns of morphological traits shaping the feeding guilds in the intertidal mudflat fishes of the Indian Sundarbans. Journal of Fish Biology 2021; 99:1010-1031. [PMID: 34021587 DOI: 10.1111/jfb.14800] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/04/2020] [Revised: 05/19/2021] [Accepted: 05/19/2021] [Indexed: 06/12/2023]
Abstract
Broad-scale patterns of resource utilization and the corresponding morphological evolution are a result of an integral relationship between form and function. In addition, the latter also plays an inherent role in determining species co-interaction and assemblage patterns, which form an integral aspect of ecological research. The present study aimed to evaluate the ecomorphological relationship among 37 fish species inhabiting the intertidal mudflats of the Indian Sundarbans through the following objectives: (i) identifying and characterizing feeding guilds/groups; (ii) understanding the inter-relationship between morphometry and (a) the established feeding guild classifications and (b) the observed prey taxa (that characterize these feeding groups), to determine the role of morphometry in prey acquisition; and (iii) evaluating potential phylogenetic convergence among the species. For the first objective, two approaches to feeding guild classification (3-Guild and 8-Guild) were used to assess the prediction accuracy of morphological characters in identifying the different guilds. While the former was based on troph values, the latter relied on similarities in diet composition among the different fish species. To address the second objective, we employed two different models, namely linear discriminant analysis (LDA) and redundancy analysis (RDA). While the LDA model tested the prediction accuracy of morphological traits in classifying the different feeding guilds, RDA was applied to model the correlation between the morphological traits and the prey categories. In the LDA model, morphological characters showed higher accuracy in classifying three feeding groups (78.4%) than eight feeding groups (73%). Following this, the RDA model (explaining 79.78% of constrained variance) showed gill raker intensity, protrusion length, head depth, caudal peduncle, eye diameter and inter-orbital distance to be highly associated with the selection of specific prey types by species, thereby characterizing a particular feeding guild. However, generalized linear models testing for correlation between troph value and feeding groups showed substantial variation (90.35%) in the dietary index being explained by the 8-Guild classification. Hence, our study maintains the assumption that broad morphological differentiation acts as one of the underlying processes resulting in dietary variations that result in the varying modes of resource utilization by the coexisting species, thereby determining the structure of a trophic guild. Furthermore, it also suggests that, in terms of prey abundance or selectivity, the 8-Guild model is much more conducive to representing the feeding habits of the species, while the morphological traits reflected a relatively broader scheme of classification (i.e., the 3-Guild model), with certain traits being phylogenetically conserved within these groups.
Affiliation(s)
- Anupam Podder
  - Aquatic Bioresource Research Laboratory, Department of Zoology, University of Calcutta, Kolkata, India
- Soumyadip Panja
  - Aquatic Bioresource Research Laboratory, Department of Zoology, University of Calcutta, Kolkata, India
- Atreyee Chaudhuri
  - Aquatic Bioresource Research Laboratory, Department of Zoology, University of Calcutta, Kolkata, India
- Anwesha Roy
  - Aquatic Bioresource Research Laboratory, Department of Zoology, University of Calcutta, Kolkata, India
- Missidona Biswas
  - Aquatic Bioresource Research Laboratory, Department of Zoology, University of Calcutta, Kolkata, India
- Sumit Homechaudhuri
  - Aquatic Bioresource Research Laboratory, Department of Zoology, University of Calcutta, Kolkata, India
|
35
|
Arenas M. ProteinEvolverABC: coestimation of recombination and substitution rates in protein sequences by approximate Bayesian computation. Bioinformatics 2021; 38:58-64. [PMID: 34450622 PMCID: PMC8696103 DOI: 10.1093/bioinformatics/btab617] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/15/2021] [Revised: 07/24/2021] [Accepted: 08/24/2021] [Indexed: 02/03/2023] [Open Access]
Abstract
MOTIVATION: The evolutionary processes of mutation and recombination, upon which selection operates, are fundamental to understanding the observed molecular diversity. In contrast to nucleotide sequences, the estimation of the recombination rate in protein sequences has been little explored and has not been implemented in evolutionary frameworks, even though protein sequencing methods are widely used. RESULTS: To address this need, here I present a computational framework, called ProteinEvolverABC, to jointly estimate recombination and substitution rates from alignments of protein sequences. The framework implements the approximate Bayesian computation approach, with and without regression adjustments, and includes a variety of substitution models of protein evolution, demographics and longitudinal sampling. It also implements several nuisance parameters, such as heterogeneous amino acid frequencies, heterogeneous rates of change among sites, and the proportion of invariable sites. The framework produces accurate coestimation of recombination and substitution rates under diverse evolutionary scenarios. As illustrative examples of usage, I applied it to several viral protein families, including coronaviruses, showing heterogeneous substitution and recombination rates. AVAILABILITY AND IMPLEMENTATION: ProteinEvolverABC is freely available from https://github.com/miguelarenas/proteinevolverabc and includes a graphical user interface to help specify the input settings, extensive documentation and ready-to-use examples. Conveniently, the simulations can run in parallel on multicore machines. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
|
36
|
da S Vieira D, Polveiro RC, Butler TJ, Hackett TA, Braga CP, Puniya BL, Teixeira WFP, de M Padilha P, Adamec J, Feitosa FLF. An in silico, structural, and biological analysis of lactoferrin of different mammals. Int J Biol Macromol 2021; 187:119-126. [PMID: 34302867 DOI: 10.1016/j.ijbiomac.2021.07.102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/08/2021] [Revised: 07/14/2021] [Accepted: 07/15/2021] [Indexed: 11/28/2022]
Abstract
Lactoferrin (LF) belongs to the family of transferrins, which have multifunctional roles associated with the immune system of animals. For this study, 20 LF sequences from mammalian species were selected to evaluate their chemical, biological, and structural properties. The bioinformatics approaches used programs such as MAFFT for sequence alignment; PartitionFinder and MrBayes for phylogenetic analysis; and I-TASSER, PROCHECK, Molecular Operating Environment (MOE), SWISS Model server, Peptide DB and Expasy ProtParam to estimate the physicochemical properties, model the protein, and predict secondary structures. A phylogenetic analysis shows species with genetic similarities clustered by complexity and a unique grouping of Capra hircus, Macaca mulatta, and Myotis lucifugus, since they presented more amino acids but no overall changes in the iron-binding sites or biological aspects. Structural deviations in these clusters obtained in LF from those species were found in residues 46 (position 406-450), which is part of the alpha-helix, and 37 (position 295-331), which is part of the beta-sheets. Our predicted model can be used to investigate the structural aspects of LF further and can be applied in medicinal research.
Affiliation(s)
- Dielson da S Vieira
  - São Paulo State University "Júlio de Mesquita Filho" (UNESP), School of Veterinary Medicine, Araçatuba, Sao Paulo, Brazil
- Richard C Polveiro
  - Federal University of Viçosa (UFV), Veterinary Department, Viçosa, Minas Gerais, Brazil
- Thomas J Butler
  - Environmental Sustainability and Health Institute (ESHI), School of Biological and Health Sciences, Technological University Dublin, Dublin 7, Ireland
- Timothy A Hackett
  - Department of Biochemistry, University of Nebraska - Lincoln, Lincoln, NE, USA
- Camila P Braga
  - Department of Biochemistry, University of Nebraska - Lincoln, Lincoln, NE, USA
- Bhanwar Lal Puniya
  - Department of Biochemistry, University of Nebraska - Lincoln, Lincoln, NE, USA
- Weslen F P Teixeira
  - Federal University of Goiás (UFG), Department of Veterinary Medicine, Goiânia, Goiás, Brazil
- Pedro de M Padilha
  - São Paulo State University "Júlio de Mesquita Filho" (UNESP), Biosciences Institute, Botucatu, São Paulo, Brazil
- Jiri Adamec
  - Department of Biochemistry, University of Nebraska - Lincoln, Lincoln, NE, USA
- Francisco L F Feitosa
  - São Paulo State University "Júlio de Mesquita Filho" (UNESP), School of Veterinary Medicine, Araçatuba, Sao Paulo, Brazil
|
37
|
Kosakovsky Pond SL, Wisotsky SR, Escalante A, Magalis BR, Weaver S. Contrast-FEL - A Test for Differences in Selective Pressures at Individual Sites among Clades and Sets of Branches. Mol Biol Evol 2021; 38:1184-1198. [PMID: 33064823 PMCID: PMC7947784 DOI: 10.1093/molbev/msaa263] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 12/27/2022] [Open Access]
Abstract
A number of evolutionary hypotheses can be tested by comparing selective pressures among sets of branches in a phylogenetic tree. When the question of interest is to identify specific sites within genes that may be evolving differently, a common approach is to perform separate analyses on subsets of sequences and compare parameter estimates in a post hoc fashion. This approach is statistically suboptimal and not always applicable. Here, we develop a simple extension of a popular fixed effects likelihood method in the context of codon-based evolutionary phylogenetic maximum likelihood testing, Contrast-FEL. It is suitable for identifying individual alignment sites where any among the K≥2 sets of branches in a phylogenetic tree have detectably different ω ratios, indicative of different selective regimes. Using extensive simulations, we show that Contrast-FEL delivers good power, exceeding 90% for sufficiently large differences, while maintaining tight control over false positive rates, when the model is correctly specified. We conclude by applying Contrast-FEL to data from five previously published studies spanning a diverse range of organisms and focusing on different evolutionary questions.
Affiliation(s)
- Sadie R Wisotsky
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Ananias Escalante
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Brittany Rife Magalis
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA; Emerging Pathogens Institute, University of Florida, Gainesville, FL
- Steven Weaver
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
|
38
|
De Maio N, Alekseyenko AV, Coleman-Smith WJ, Pardi F, Suchard MA, Tamuri AU, Truszkowski J, Goldman N. A phylogenetic approach for weighting genetic sequences. BMC Bioinformatics 2021; 22:285. [PMID: 34049487 PMCID: PMC8164272 DOI: 10.1186/s12859-021-04183-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/04/2020] [Accepted: 05/04/2021] [Indexed: 11/22/2022] [Open Access]
Abstract
BACKGROUND: Many important applications in bioinformatics, including sequence alignment and protein family profiling, employ sequence weighting schemes to mitigate the effects of non-independence of homologous sequences and under- or over-representation of certain taxa in a dataset. These schemes aim to assign high weights to sequences that are 'novel' compared to the others in the same dataset, and low weights to sequences that are over-represented. RESULTS: We formalise this principle by rigorously defining the evolutionary 'novelty' of a sequence within an alignment. This results in new sequence weights that we call 'phylogenetic novelty scores'. These scores have various desirable properties, and we showcase their use by considering, as an example application, the inference of character frequencies at an alignment column (important, for example, in protein family profiling). We give computationally efficient algorithms for calculating our scores and, using simulations, show that they are versatile and can improve the accuracy of character frequency estimation compared to existing sequence weighting schemes. CONCLUSIONS: Our phylogenetic novelty scores can be useful when an evolutionarily meaningful system for adjusting for uneven taxon sampling is desired. They have numerous possible applications, including estimation of evolutionary conservation scores and sequence logos, identification of targets in conservation biology, and improving and measuring sequence alignment accuracy.
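The example application named in the abstract above, estimating character frequencies at an alignment column from weighted sequences, can be sketched generically as follows. This is a minimal illustration that assumes the per-sequence weights are already given; it does not compute the phylogenetic novelty scores themselves, whose definition is not stated in the abstract. The function name and the gap convention (`-`) are assumptions.

```python
from collections import defaultdict


def weighted_column_frequencies(column, weights):
    """Estimate character frequencies at one alignment column.

    column  : one character per sequence (e.g. "AAAG"); '-' marks a gap
    weights : one non-negative weight per sequence, in the same order
    Gaps are skipped, and the remaining weights are renormalised to sum to 1.
    """
    if len(column) != len(weights):
        raise ValueError("column and weights must have the same length")
    totals = defaultdict(float)
    for char, weight in zip(column, weights):
        if char != "-":
            totals[char] += weight
    total = sum(totals.values())
    if total == 0:
        return {}
    return {char: weight / total for char, weight in totals.items()}
```

With equal weights, three near-identical sequences carrying `A` would dominate one divergent sequence carrying `G`; up-weighting the divergent sequence (for example, weights of 1, 1, 1 and 3) gives the two characters equal estimated frequency, which is the kind of correction for over-represented taxa that weighting schemes aim for.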
Affiliation(s)
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Alexander V. Alekseyenko
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
  - Present Address: Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC USA
- William J. Coleman-Smith
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Fabio Pardi
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
  - Present Address: LIRMM, University of Montpellier, CNRS, Montpellier, France
- Marc A. Suchard
  - Departments of Biostatistics, Biomathematics and Human Genetics, University of California, Los Angeles, CA USA
- Asif U. Tamuri
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
  - Present Address: Research IT Services, University College London, London, UK
- Jakub Truszkowski
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
  - Present Address: RBC Borealis AI, Waterloo, ON Canada
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
|
39
|
Spielman SJ. Relative Model Fit Does Not Predict Topological Accuracy in Single-Gene Protein Phylogenetics. Mol Biol Evol 2021; 37:2110-2123. [PMID: 32191313 PMCID: PMC7306691 DOI: 10.1093/molbev/msaa075] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 12/12/2022] [Open Access]
Abstract
It is regarded as best practice in phylogenetic reconstruction to perform relative model selection to determine an appropriate evolutionary model for the data. This procedure ranks a set of candidate models according to their goodness of fit to the data, commonly using an information theoretic criterion. Users then specify the best-ranking model for inference. Although it is often assumed that better-fitting models translate to increased accuracy, recent studies have shown that the specific model employed may not substantially affect inferences. We examine whether there is a systematic relationship between relative model fit and topological inference accuracy in protein phylogenetics, using simulations and real sequences. Simulations employed site-heterogeneous mechanistic codon models that are distinct from protein-level phylogenetic inference models, allowing us to investigate how protein models perform when they are misspecified to the data, as will be the case for any real sequence analysis. We broadly find that phylogenies inferred across models with vastly different fits to the data produce highly consistent topologies. We additionally find that all models infer similar proportions of false-positive splits, raising the possibility that all available models of protein evolution are similarly misspecified. Moreover, we find that the parameter-rich GTR (general time reversible) model, whose amino acid exchangeabilities are free parameters, performs similarly to models with fixed exchangeabilities, although the inference precision associated with GTR models was not examined. We conclude that, although relative model selection may not hinder phylogenetic analysis on protein data, it may not offer specific predictable improvements and is not a reliable proxy for accuracy.
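The ranking step described in the abstract above can be illustrated with one common information theoretic criterion, the Akaike information criterion (AIC = 2k - 2 ln L). The sketch below is a generic illustration rather than the study's pipeline; the choice of AIC, the function names, and the model labels are assumptions, and any log-likelihoods passed in would come from a separate model-fitting step.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def rank_models(fits):
    """Rank candidate models by AIC.

    fits : dict mapping a model label to (log_likelihood, n_free_params)
    Returns a list of (label, AIC, delta_AIC) sorted from best to worst,
    where delta_AIC is the difference from the best-ranking model.
    """
    scored = sorted(
        ((name, aic(lnl, k)) for name, (lnl, k) in fits.items()),
        key=lambda item: item[1],
    )
    best = scored[0][1]
    return [(name, score, score - best) for name, score in scored]
```

For two hypothetical candidates, `rank_models({"model_A": (-100.0, 2), "model_B": (-98.0, 5)})` ranks `model_A` first (AIC 204.0 versus 206.0): the higher likelihood of `model_B` does not offset its three extra parameters. As the abstract stresses, a better rank under such a criterion is a statement about relative fit, not about the accuracy of the inferred topology.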
|
40
|
Tao Q, Barba-Montoya J, Huuki LA, Durnan MK, Kumar S. Relative Efficiencies of Simple and Complex Substitution Models in Estimating Divergence Times in Phylogenomics. Mol Biol Evol 2021; 37:1819-1831. [PMID: 32119075 PMCID: PMC7253201 DOI: 10.1093/molbev/msaa049] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022] [Open Access]
Abstract
The conventional wisdom in molecular evolution is to apply parameter-rich models of nucleotide and amino acid substitutions for estimating divergence times. However, the actual extent of the difference between time estimates produced by highly complex models compared with those from simple models is yet to be quantified for contemporary data sets that frequently contain sequences from many species and genes. In a reanalysis of many large multispecies alignments from diverse groups of taxa, we found that the use of the simplest models can produce divergence time estimates and credibility intervals similar to those obtained from the complex models applied in the original studies. This result is surprising because the use of simple models underestimates sequence divergence for all the data sets analyzed. We found three fundamental reasons for the observed robustness of time estimates to model complexity in many practical data sets. First, the estimates of branch lengths and node-to-tip distances under the simplest model show an approximately linear relationship with those produced by using the most complex models applied on data sets with many sequences. Second, relaxed clock methods automatically adjust rates on branches that experience considerable underestimation of sequence divergences, resulting in time estimates that are similar to those from complex models. And, third, the inclusion of even a few good calibrations in an analysis can reduce the difference in time estimates from simple and complex models. The robustness of time estimates to model complexity in these empirical data analyses is encouraging, because all phylogenomics studies use statistical models that are oversimplified descriptions of actual evolutionary substitution processes.
Affiliation(s)
- Qiqing Tao
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA; Department of Biology, Temple University, Philadelphia, PA
- Jose Barba-Montoya
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA; Department of Biology, Temple University, Philadelphia, PA
- Louise A Huuki
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Mary Kathleen Durnan
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA; Department of Biology, Temple University, Philadelphia, PA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA; Department of Biology, Temple University, Philadelphia, PA; Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
41
Barba-Montoya J, Tao Q, Kumar S. Using a GTR+Γ substitution model for dating sequence divergence when stationarity and time-reversibility assumptions are violated. Bioinformatics 2021; 36:i884-i894. [PMID: 33381826 DOI: 10.1093/bioinformatics/btaa820] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 09/07/2020] [Indexed: 11/15/2022]
Abstract
MOTIVATION: As the number and diversity of species and genes grow in contemporary datasets, two common assumptions made in all molecular dating methods, namely the time-reversibility and stationarity of the substitution process, become untenable. No software tools for molecular dating allow researchers to relax these two assumptions in their data analyses. Frequently the same General Time Reversible (GTR) model across lineages along with a gamma (+Γ) distributed rates across sites is used in relaxed clock analyses, which assumes time-reversibility and stationarity of the substitution process. Many reports have quantified the impact of violations of these underlying assumptions on molecular phylogeny, but none have systematically analyzed their impact on divergence time estimates.
RESULTS: We quantified the bias on time estimates that resulted from using the GTR + Γ model for the analysis of computer-simulated nucleotide sequence alignments that were evolved with non-stationary (NS) and non-reversible (NR) substitution models. We tested Bayesian and RelTime approaches that do not require a molecular clock for estimating divergence times. Divergence times obtained using a GTR + Γ model differed only slightly (∼3% on average) from the expected times for NR datasets, but the difference was larger for NS datasets (∼10% on average). The use of only a few calibrations reduced these biases considerably (∼5%). Confidence and credibility intervals from GTR + Γ analysis usually contained correct times. Therefore, the bias introduced by the use of the GTR + Γ model to analyze datasets, in which the time-reversibility and stationarity assumptions are violated, is likely not large and can be reduced by applying multiple calibrations.
AVAILABILITY AND IMPLEMENTATION: All datasets are deposited in Figshare: https://doi.org/10.6084/m9.figshare.12594638.
Affiliation(s)
- Jose Barba-Montoya
  - Institute for Genomics and Evolutionary Medicine; Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Qiqing Tao
  - Institute for Genomics and Evolutionary Medicine; Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine; Department of Biology, Temple University, Philadelphia, PA 19122, USA; Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
42
Lim D, Blanchette M. EvoLSTM: context-dependent models of sequence evolution using a sequence-to-sequence LSTM. Bioinformatics 2021; 36:i353-i361. [PMID: 32657367 PMCID: PMC7355264 DOI: 10.1093/bioinformatics/btaa447] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
Motivation: Accurate probabilistic models of sequence evolution are essential for a wide variety of bioinformatics tasks, including sequence alignment and phylogenetic inference. The ability to realistically simulate sequence evolution is also at the core of many benchmarking strategies. Yet, mutational processes have complex context dependencies that remain poorly modeled and understood.
Results: We introduce EvoLSTM, a recurrent neural network-based evolution simulator that captures mutational context dependencies. EvoLSTM uses a sequence-to-sequence long short-term memory model trained to predict mutation probabilities at each position of a given sequence, taking into consideration the 14 flanking nucleotides. EvoLSTM can realistically simulate mammalian and plant DNA sequence evolution and reveals unexpectedly strong long-range context dependencies in mutation probabilities. EvoLSTM brings modern machine-learning approaches to bear on sequence evolution. It will serve as a useful tool to study and simulate complex mutational processes.
Availability and implementation: Code and dataset are available at https://github.com/DongjoonLim/EvoLSTM.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dongjoon Lim
  - School of Computer Science, McGill University, Montreal, Quebec H3A 0G4, Canada
- Mathieu Blanchette
  - School of Computer Science, McGill University, Montreal, Quebec H3A 0G4, Canada
43
Cabrera VM. Human molecular evolutionary rate, time dependency and transient polymorphism effects viewed through ancient and modern mitochondrial DNA genomes. Sci Rep 2021; 11:5036. [PMID: 33658608 PMCID: PMC7930196 DOI: 10.1038/s41598-021-84583-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/17/2020] [Accepted: 02/15/2021] [Indexed: 01/31/2023]
Abstract
Human evolutionary genetics provides a chronological framework for interpreting human history. It is based on the molecular clock hypothesis, which supposes a straightforward relationship between the mutation rate and the substitution rate, independent of other factors such as demographic dynamics. Analyzing ancient and modern complete human mitochondrial genomes, we show here that, over time, the substitution rate can be significantly slower or faster than the average germline mutation rate, confirming a time-dependence effect mainly attributable to changes in the effective population size of human populations, with exponential growth in recent times. We also detect that transient polymorphisms slow down the evolutionary rate deduced from haplogroup intraspecific trees. Finally, we propose the use of the most divergent lineages within haplogroups as a practical approach to correct these molecular clock mismatches.
Affiliation(s)
- Vicente M Cabrera
  - Retired member of Departamento de Genética, Facultad de Biología, Universidad de La Laguna, Canary Islands, Spain
44
Garcia-Aroca T, Price PP, Tomaso-Peterson M, Allen TW, Wilkerson TH, Spurlock TN, Faske TR, Bluhm B, Conner K, Sikora E, Guyer R, Kelly H, Squiers BM, Doyle VP. Xylaria necrophora, sp. nov., is an emerging root-associated pathogen responsible for taproot decline of soybean in the southern United States. Mycologia 2021; 113:326-347. [PMID: 33555993 DOI: 10.1080/00275514.2020.1846965] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/13/2020] [Accepted: 11/02/2020] [Indexed: 10/22/2022]
Abstract
Taproot decline (TRD) is a disease of soybean that has been reported recently from the southern United States (U.S.). Symptoms of TRD include foliar interveinal chlorosis followed by necrosis. Darkened, charcoal-colored areas of thin stromatic tissue are evident on the taproot and lateral roots along with areas of necrosis within the root and white mycelia within the pith. Upright stromata typical of Xylaria can be observed on crop debris and emerging from infested roots in fields where taproot decline is present, but these have not been determined to contain fertile perithecia. Symptomatic plant material was collected across the known range of the disease in the southern U.S., and the causal agent was isolated from roots. Four loci, α-actin (ACT), β-tubulin (TUB2), the nuclear rDNA internal transcribed spacers (nrITS), and the RNA polymerase subunit II (RPB2), were sequenced from representative isolates. Both maximum likelihood and Bayesian phylogenetic analyses showed consistent clustering of representative TRD isolates in a highly supported clade within the Xylaria arbuscula species complex in the "HY" clade of the family Xylariaceae, distinct from any previously described taxa. In order to understand the origin of this pathogen, we sequenced herbarium specimens previously determined to be "Xylaria arbuscula" based on morphology and xylariaceous endophytes collected in the southern U.S. Some historical specimens from U.S. herbaria collected in the southern region as saprophytes as well as a single specimen from Martinique clustered within the "TRD" clade in phylogenetic analyses, suggesting a possible shift in lifestyle. The remaining specimens that clustered within the family Xylariaceae, but outside of the "TRD" clade, are reported. Both morphological evidence and molecular evidence indicate that the TRD pathogen is a novel species, which is described as Xylaria necrophora.
Affiliation(s)
- Teddy Garcia-Aroca
  - Department of Plant Pathology and Crop Physiology, Louisiana State University, Baton Rouge, Louisiana 70803
- Paul P Price
  - LSU AgCenter, Macon Ridge Research Station, Winnsboro, Louisiana 71295
- Maria Tomaso-Peterson
  - Department of Biochemistry, Molecular Biology, Entomology, and Plant Pathology, Mississippi State University, Starkville, Mississippi 39762
- Tom W Allen
  - Delta Research and Extension Center, Mississippi State University, Stoneville, Mississippi 38776
- Tessie H Wilkerson
  - Delta Research and Extension Center, Mississippi State University, Stoneville, Mississippi 38776
- Terry N Spurlock
  - Department of Entomology and Plant Pathology, University of Arkansas System Division of Agriculture Cooperative Extension Service, Lonoke, Arkansas 72086
- Travis R Faske
  - Department of Entomology and Plant Pathology, University of Arkansas System Division of Agriculture Cooperative Extension Service, Lonoke, Arkansas 72086
- Burt Bluhm
  - Department of Entomology and Plant Pathology, University of Arkansas, Fayetteville, Arkansas 72701
- Kassie Conner
  - Alabama Cooperative Extension System, Auburn University, Auburn, Alabama 36849
- Edward Sikora
  - Alabama Cooperative Extension System, Auburn University, Auburn, Alabama 36849
- Rachel Guyer
  - Department of Entomology and Plant Pathology, West Tennessee Research and Education Center, University of Tennessee, Jackson, Tennessee 38301
- Heather Kelly
  - Department of Entomology and Plant Pathology, West Tennessee Research and Education Center, University of Tennessee, Jackson, Tennessee 38301
- Brooklyn M Squiers
  - Department of Plant Pathology and Crop Physiology, Louisiana State University, Baton Rouge, Louisiana 70803
- Vinson P Doyle
  - Department of Plant Pathology and Crop Physiology, Louisiana State University, Baton Rouge, Louisiana 70803
45
Fort A, McHale M, Cascella K, Potin P, Usadel B, Guiry MD, Sulpice R. Foliose Ulva Species Show Considerable Inter-Specific Genetic Diversity, Low Intra-Specific Genetic Variation, and the Rare Occurrence of Inter-Specific Hybrids in the Wild. J Phycol 2021; 57:219-233. [PMID: 32996142 PMCID: PMC7894351 DOI: 10.1111/jpy.13079] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/14/2020] [Revised: 08/24/2020] [Accepted: 09/19/2020] [Indexed: 05/22/2023]
Abstract
Foliose Ulva spp. have become increasingly important worldwide for their environmental and financial impacts. A large number of such Ulva species have rapid reproduction and proliferation habits, which explains why they are responsible for Ulva blooms, known as "green tides", having dramatic negative effects on coastal ecosystems, but also making them attractive for aquaculture applications. Despite the increasing interest in the genus Ulva, particularly on the larger foliose species for aquaculture, their inter- and intra-specific genetic diversity is still poorly described. We compared the cytoplasmic genome (chloroplast and mitochondrion) of 110 strains of large distromatic foliose Ulva from Ireland, Brittany (France), the Netherlands and Portugal. We found six different species, with high levels of inter-specific genetic diversity, despite highly similar or overlapping morphologies. Genetic variation was as high as 82 SNPs/kb between Ulva pseudorotundata and U. laetevirens, indicating considerable genetic diversity. On the other hand, intra-specific genetic diversity was relatively low, with only 36 variant sites (0.03 SNPs/kb) in the mitochondrial genome of the 29 Ulva rigida individuals found in this study, despite different geographical origins. The use of next-generation sequencing allowed for the detection of a single inter-species hybrid between two genetically closely related species, U. laetevirens, and U. rigida, among the 110 strains analyzed in this study. Altogether, this study represents an important advance in our understanding of Ulva biology and provides genetic information for genomic selection of large foliose strains in aquaculture.
Affiliation(s)
- Antoine Fort
  - Plant Systems Biology Lab, Ryan Institute & MaREI Centre for Marine, Climate and Energy, School of Natural Sciences, National University of Ireland - Galway, Galway, H91 TK33, Ireland
- Marcus McHale
  - Plant Systems Biology Lab, Ryan Institute & MaREI Centre for Marine, Climate and Energy, School of Natural Sciences, National University of Ireland - Galway, Galway, H91 TK33, Ireland
- Kevin Cascella
  - UMR 8227, Integrative Biology of Marine Models, CNRS, Sorbonne Université Sciences, Station Biologique de Roscoff, CS 90074, F-29688 Roscoff, France
- Philippe Potin
  - UMR 8227, Integrative Biology of Marine Models, CNRS, Sorbonne Université Sciences, Station Biologique de Roscoff, CS 90074, F-29688 Roscoff, France
- Björn Usadel
  - Institute for Biology I, RWTH Aachen University, Worringer Weg 3, Aachen 52074, Germany
- Michael D. Guiry
  - AlgaeBase, Ryan Institute, National University of Ireland, Galway, H91 TK33, Ireland
- Ronan Sulpice
  - Plant Systems Biology Lab, Ryan Institute & MaREI Centre for Marine, Climate and Energy, School of Natural Sciences, National University of Ireland - Galway, Galway, H91 TK33, Ireland
46
Del Amparo R, Branco C, Arenas J, Vicens A, Arenas M. Analysis of selection in protein-coding sequences accounting for common biases. Brief Bioinform 2021; 22:6105943. [PMID: 33479739 DOI: 10.1093/bib/bbaa431] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/16/2020] [Revised: 12/17/2020] [Accepted: 12/22/2020] [Indexed: 12/16/2022]
Abstract
The evolution of protein-coding genes is usually driven by selective processes, which favor some evolutionary trajectories over others, optimizing the subsequent protein stability and activity. The analysis of selection in this type of genetic data is broadly performed with the metric nonsynonymous/synonymous substitution rate ratio (dN/dS). However, most of the well-established methodologies to estimate this metric make crucial assumptions, such as lack of recombination or invariable codon frequencies along genes, which can bias the estimation. Here, we review the most relevant biases in the dN/dS estimation and provide a detailed guide to estimate this metric using state-of-the-art procedures that account for such biases, along with illustrative practical examples and recommendations. We also discuss the traditional interpretation of the estimated dN/dS emphasizing the importance of considering complementary biological information such as the role of the observed substitutions on the stability and function of proteins. This review is oriented to help evolutionary biologists that aim to accurately estimate selection in protein-coding sequences.
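For orientation, the dN/dS ratio can be sketched with a simple counting estimator in the style of Nei and Gojobori: the proportions of nonsynonymous and synonymous differences are each given a Jukes-Cantor correction and then divided. This is an assumed, deliberately basic estimator, not one of the bias-aware procedures the review recommends; the function names and the input counts are hypothetical.

```python
import math


def jc_correct(p: float) -> float:
    """Jukes-Cantor correction d = -3/4 * ln(1 - 4p/3), defined for 0 <= p < 0.75."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion of differences must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs: float, syn_diffs: float,
          nonsyn_sites: float, syn_sites: float) -> float:
    """Counting-based dN/dS from difference counts and site counts."""
    d_n = jc_correct(nonsyn_diffs / nonsyn_sites)  # nonsynonymous distance
    d_s = jc_correct(syn_diffs / syn_sites)        # synonymous distance
    if d_s == 0.0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return d_n / d_s


# Hypothetical counts for one pairwise comparison, for illustration only.
omega = dn_ds(nonsyn_diffs=5, syn_diffs=20, nonsyn_sites=300, syn_sites=100)
print(f"dN/dS = {omega:.3f}")
```

Under the traditional interpretation, a ratio below 1 is read as purifying selection, a ratio near 1 as neutral evolution, and a ratio above 1 as positive selection; this sketch ignores recombination and variable codon frequencies, the biases the abstract says can distort the estimate.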
Affiliation(s)
- Roberto Del Amparo
  - CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Catarina Branco
  - CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Jesús Arenas
  - Unit of Microbiology and Immunology, University of Zaragoza, 50013 Zaragoza, Spain
- Alberto Vicens
  - CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Miguel Arenas
  - CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
47
Namyatova AA, Schwartz MD, Cassis G. Determining the position of Diomocoris, Micromimetus and Taylorilygus in the Lygus-complex based on molecular data and first records of Diomocoris and Micromimetus from Australia, including four new species (Insecta : Hemiptera : Miridae : Mirinae). Invertebr Syst 2021. [DOI: 10.1071/is20015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
Abstract
The Lygus-complex is one of the most taxonomically challenging groups of Miridae (Heteroptera), and its Australian fauna is poorly studied. Here we examine the Australian taxa of the Lygus-complex using morphological and molecular methods. After a detailed morphological study of the material collected throughout Australia, Taylorilygus nebulosus is transferred to Diomocoris, with the genus recorded for the first time in this country. Taylorilygus apicalis, also widely distributed in Australia, is redescribed on the basis of Australian material. The genus Micromimetus is recorded for the first time in Australia, with M. celiae, sp. nov., M. hannahae, sp. nov., M. nikolai, sp. nov. and M. shofneri, sp. nov. described as new to science. Micromimetus pictipes is redescribed and its distributional range is increased. The monophyly of the Lygus-complex and relationships within this group were tested using cytochrome c oxidase subunit I (COI), 16S rRNA, 18S rRNA and 28S rRNA markers. The Lygus-complex has been found to be non-monophyletic. Phylogeny confirmed the monophyly of Micromimetus, and it has shown that Taylorilygus apicalis is closer to Micromimetus species than to Diomocoris nebulosus. This study is the initial step in understanding the Lygus-complex phylogeny; analyses with more taxa, more genes and morphology are needed to reveal the interrelationships within this group, and sister-group relationships of Australian taxa.
http://zoobank.org/urn:lsid:zoobank.org:pub:7393D96B-2BBA-438D-A134-D372EFE7FB9E
48
Johnson MM, Wilke CO. Site-Specific Amino Acid Distributions Follow a Universal Shape. J Mol Evol 2020; 88:731-741. [PMID: 33230664 PMCID: PMC7717668 DOI: 10.1007/s00239-020-09976-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/05/2020] [Accepted: 11/17/2020] [Indexed: 11/25/2022]
Abstract
In many applications of evolutionary inference, a model of protein evolution needs to be fitted to the amino acid variation at individual sites in a multiple sequence alignment. Most existing models fall into one of two extremes: Either they provide a coarse-grained description that lacks biophysical realism (e.g., dN/dS models), or they require a large number of parameters to be fitted (e.g., mutation-selection models). Here, we ask whether a middle ground is possible: Can we obtain a realistic description of site-specific amino acid frequencies while severely restricting the number of free parameters in the model? We show that a distribution with a single free parameter can accurately capture the variation in amino acid frequency at most sites in an alignment, as long as we are willing to restrict our analysis to predicting amino acid frequencies by rank rather than by amino acid identity. This result holds equally well both in alignments of empirical protein sequences and of sequences evolved under a biophysically realistic all-atom force field. Our analysis reveals a near universal shape of the frequency distributions of amino acids. This insight has the potential to lead to new models of evolution that have both increased realism and a limited number of free parameters.
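The idea of describing a site by rank rather than by amino acid identity can be sketched as follows: count the residues in one alignment column, convert the counts to frequencies, and sort them from most to least common. The function name and the toy column are hypothetical, and the sketch stops at the ranking step because the abstract does not state the functional form of the single-parameter distribution.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def ranked_frequencies(column: str) -> List[float]:
    """Frequencies of the residues in one alignment column, sorted in descending order.

    Gaps and non-standard symbols are ignored; unobserved residues get frequency 0,
    so the result always has 20 entries and no longer records residue identity.
    """
    counts = Counter(res for res in column.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("column contains no standard amino acids")
    freqs = sorted((c / total for c in counts.values()), reverse=True)
    return freqs + [0.0] * (len(AMINO_ACIDS) - len(freqs))


# Hypothetical alignment column (one residue per sequence), for illustration only.
print(ranked_frequencies("AAAAGGS-")[:4])
```

Because identity is discarded, two sites dominated by different residues but with the same degree of conservation yield the same ranked vector, which is what makes a shared distribution shape across sites possible.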
Affiliation(s)
- Mackenzie M Johnson
  - Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, 78712, USA
- Claus O Wilke
  - Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
49
Kubatko L. Book Review: A Mathematical Primer of Molecular Phylogenetics, by Xuhua Xia. Syst Biol 2020. [DOI: 10.1093/sysbio/syaa082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Affiliation(s)
- Laura Kubatko
  - Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
  - Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH 43210, USA
50
Ahrens JB, Teufel AI, Siltberg-Liberles J. A Phylogenetic Rate Parameter Indicates Different Sequence Divergence Patterns in Orthologs and Paralogs. J Mol Evol 2020; 88:720-730. [PMID: 33118098 DOI: 10.1007/s00239-020-09969-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2020] [Accepted: 10/15/2020] [Indexed: 10/23/2022]
Abstract
Heterotachy, the change in sequence evolutionary rate over time, is a common feature of protein molecular evolution. Decades of studies have shed light on the conditions under which heterotachy occurs, and there is evidence that site-specific evolutionary rate shifts are correlated with changes in protein function. Here, we present a large-scale, computational analysis using thousands of protein sequence alignments from animal and plant proteomes, representing genes related either by orthology (speciation events) or paralogy (gene duplication), to compare sequence divergence patterns in orthologous vs. paralogous sequence alignments. We use sequence-based phylogenetic analyses to infer overall sequence divergence (tree length/number of sequences) and to fit site-specific rates to a discrete gamma distribution with a shape parameter α. This inference method is applied to real protein sequence alignments, as well as alignments simulated under various models of protein sequence evolution. Our simulations indicate that sequence divergence and the α parameter are positively correlated when sequences evolve with heterotachy, meaning that inferred site rate distributions appear more uniform as sequences diverge. Divergence and α are also positively correlated in both orthologous and paralogous genes, but the average increase in α (as a function of divergence) is significantly higher in paralogous protein alignments than in orthologous alignments. This result is consistent with the widely held view that recently duplicated proteins initially evolve under relaxed selective pressure, promoting functional divergence by accumulation of amino acid replacements, and hence experience more evolutionary rate fluctuations than orthologous proteins. We discuss these findings in the context of the ortholog conjecture, a long-standing assumption in molecular evolution, which posits that protein sequences related by orthology tend to be more functionally conserved than paralogous proteins.
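The two quantities named in the abstract can be sketched numerically. The divergence measure is tree length divided by the number of sequences, as stated. For the shape parameter, the sketch assumes a continuous gamma distribution scaled to mean 1 (the abstract uses a discrete approximation), for which the coefficient of variation of site rates is 1/sqrt(α); this is why a larger α corresponds to more uniform rates. Function names and example values are hypothetical.

```python
import math
import random
from typing import List


def divergence_per_sequence(tree_length: float, n_sequences: int) -> float:
    """Overall divergence as defined in the abstract: tree length / number of sequences."""
    if n_sequences <= 0:
        raise ValueError("n_sequences must be positive")
    return tree_length / n_sequences


def rate_cv(alpha: float) -> float:
    """Coefficient of variation of mean-1 gamma-distributed site rates: 1 / sqrt(alpha)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 / math.sqrt(alpha)


def sample_site_rates(alpha: float, n_sites: int, seed: int = 0) -> List[float]:
    """Draw site rates from a gamma distribution with shape alpha and scale 1/alpha (mean 1)."""
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


# Hypothetical values, for illustration only.
print(divergence_per_sequence(tree_length=12.0, n_sequences=48))
for alpha in (0.5, 2.0, 8.0):
    rates = sample_site_rates(alpha, n_sites=1000)
    print(alpha, round(rate_cv(alpha), 3), round(sum(rates) / len(rates), 2))
```

With a small α most sites evolve slowly while a few evolve very fast; as α grows, the rates concentrate around the mean, which matches the abstract's statement that inferred rate distributions appear more uniform as sequences diverge.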
Affiliation(s)
- Joseph B Ahrens
  - Department of Biological Sciences, Biomolecular Sciences Institute, Florida International University, Miami, FL, USA; Department of Biochemistry and Molecular Genetics, Computational Bioscience Program, University of Colorado Denver, Aurora, CO, USA
- Ashley I Teufel
  - Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA; Santa Fe Institute, Santa Fe, NM, USA
- Jessica Siltberg-Liberles
  - Department of Biological Sciences, Biomolecular Sciences Institute, Florida International University, Miami, FL, USA