1
Pérez-Calle V, Bellot S, Kuhnhäuser BG, Pillon Y, Forest F, Leitch IJ, Baker WJ. Phylogeny, biogeography and ecological diversification of New Caledonian palms (Arecaceae). Ann Bot 2024; 134:85-100. [PMID: 38527418 PMCID: PMC11161567 DOI: 10.1093/aob/mcae043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 03/24/2024] [Indexed: 03/27/2024]
Abstract
BACKGROUND AND AIMS The geographical origin and evolutionary mechanisms underpinning the rich and distinctive New Caledonian flora remain poorly understood. This is attributable to the complex geological past of the island and to the scarcity of well-resolved species-level phylogenies. Here, we infer phylogenetic relationships and divergence times of New Caledonian palms, which comprise 40 species. We use this framework to elucidate the biogeography of New Caledonian palm lineages and to explore how extant species might have formed. METHODS A phylogenetic tree including 37 New Caledonian palm species and 77 relatives from tribe Areceae was inferred from 151 nuclear genes obtained by targeted sequencing. Fossil-calibrated divergence times were estimated and ancestral ranges inferred. Ancestral and extant ecological preferences in terms of elevation, precipitation and substrate were compared between New Caledonian sister species to explore their possible roles as drivers of speciation. KEY RESULTS New Caledonian palms form four well-supported clades, inside which relationships are well resolved. Our results support the current classification but suggest that Veillonia and Campecarpus should be resurrected and fail to clarify whether Rhopalostylidinae is sister to or nested in Basseliniinae. New Caledonian palm lineages are derived from New Guinean and Australian ancestors, which reached the island through at least three independent dispersal events between the Eocene and Miocene. Palms then dispersed out of New Caledonia at least five times, mainly towards Pacific islands. Geographical and ecological transitions associated with speciation events differed across time and genera. Substrate transitions were more frequently associated with older events than with younger ones. CONCLUSIONS Neighbouring areas and a mosaic of local habitats shaped the palm flora of New Caledonia, and the island played a significant role in generating palm diversity across the Pacific region. 
This new spatio-temporal framework will enable population-level ecological and genetic studies to unpick the mechanisms underpinning New Caledonian palm endemism.
Affiliation(s)
- Victor Pérez-Calle
- Department of Biology, Memorial University of Newfoundland, St John’s, Newfoundland A1B 3X9, Canada
- Yohan Pillon
- DIADE, Univ Montpellier, CIRAD, IRD, Montpellier, France
- Félix Forest
- Royal Botanic Gardens, Kew, Richmond TW9 3AE, UK
2
Dai C, Cao HX, Tian JX, Gao YC, Liu HT, Xu SY, Wang YJ, Zheng YG. Structural-guided design to improve the catalytic performance of aldo-keto reductase KdAKR. Biotechnol Bioeng 2023; 120:3543-3556. [PMID: 37641876 DOI: 10.1002/bit.28535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Revised: 08/07/2023] [Accepted: 08/13/2023] [Indexed: 08/31/2023]
Abstract
Aldo-keto reductases (AKRs) are important biocatalysts for the synthesis of chiral pharmaceutical alcohols. In this study, the catalytic activity and stereoselectivity of an NADPH-dependent AKR from Kluyveromyces dobzhanskii (KdAKR) toward t-butyl 6-chloro-(5S)-hydroxy-3-oxohexanoate ((5S)-CHOH) were improved by mutating residues in the loop regions around the substrate-binding pocket, and the thermostability of KdAKR was improved by a consensus sequence method targeting its flexible regions. The best mutant, M6 (Y28A/L58I/I63L/G223P/Y296W/W297H), exhibited a 67-fold higher catalytic efficiency than wild-type (WT) KdAKR and improved R-selectivity toward (5S)-CHOH (diastereomeric excess of the product, de_p, increased from 47.6% to >99.5%). Moreover, M6 exhibited a 6.3-fold increase in half-life (t1/2) at 40°C compared to WT. Under the optimal conditions, M6 completely converted 200 g/L (5S)-CHOH to diastereomerically pure t-butyl 6-chloro-(3R,5S)-dihydroxyhexanoate ((3R,5S)-CDHH) within 8.0 h, with a space-time yield of 300.7 g/L/day. These results deepen the understanding of the structure-function relationship of AKRs and provide guidance for engineering other AKRs.
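The stereoselectivity figures above are diastereomeric excess values. As a minimal sketch, assuming the usual definition de = (major - minor) / (major + minor) × 100 and purely hypothetical peak areas (the abstract reports no raw chromatographic data), the calculation looks like this:

```python
def diastereomeric_excess(major: float, minor: float) -> float:
    """Return diastereomeric excess (%) from amounts of the major and minor diastereomer."""
    total = major + minor
    if total <= 0:
        raise ValueError("total amount must be positive")
    return 100.0 * (major - minor) / total


# Hypothetical peak areas, chosen only to reproduce values of the reported magnitude.
print(f"{diastereomeric_excess(73.8, 26.2):.1f}")  # 47.6  (wild-type level)
print(f"{diastereomeric_excess(99.8, 0.2):.1f}")   # 99.6  (above the >99.5% threshold)
```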
Affiliation(s)
- Chen Dai
- Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, People's Republic of China
- Engineering Research Center of Bioconversion and Biopurification of the Ministry of Education, Zhejiang University of Technology, Hangzhou, Zhejiang, People's Republic of China
- The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, Hangzhou, People's Republic of China
- Hai-Xing Cao
- Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, People's Republic of China
- Engineering Research Center of Bioconversion and Biopurification of the Ministry of Education, Zhejiang University of Technology, Hangzhou, Zhejiang, People's Republic of China
- The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, Hangzhou, People's Republic of China
- Jia-Xin Tian
- Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, People's Republic of China
- Engineering Research Center of Bioconversion and Biopurification of the Ministry of Education, Zhejiang University of Technology, Hangzhou, Zhejiang, People's Republic of China
- The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, Hangzhou, People's Republic of China
- Yan-Chi Gao
- Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, People's Republic of China
- Engineering Research Center of Bioconversion and Biopurification of the Ministry of Education, Zhejiang University of Technology, Hangzhou, Zhejiang, People's Republic of China
- The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, Hangzhou, People's Republic of China
- Hua-Tao Liu
- Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, People's Republic of China
- Engineering Research Center of Bioconversion and Biopurification of the Ministry of Education, Zhejiang University of Technology, Hangzhou, Zhejiang, People's Republic of China
- The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, Hangzhou, People's Republic of China
- Shen-Yuan Xu
- Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, People's Republic of China
- Engineering Research Center of Bioconversion and Biopurification of the Ministry of Education, Zhejiang University of Technology, Hangzhou, Zhejiang, People's Republic of China
- The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, Hangzhou, People's Republic of China
- Ya-Jun Wang
- Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, People's Republic of China
- Engineering Research Center of Bioconversion and Biopurification of the Ministry of Education, Zhejiang University of Technology, Hangzhou, Zhejiang, People's Republic of China
- The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, Hangzhou, People's Republic of China
- Yu-Guo Zheng
- Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, People's Republic of China
- Engineering Research Center of Bioconversion and Biopurification of the Ministry of Education, Zhejiang University of Technology, Hangzhou, Zhejiang, People's Republic of China
- The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, Hangzhou, People's Republic of China
3
McWhite CD, Armour-Garb I, Singh M. Leveraging protein language models for accurate multiple sequence alignments. Genome Res 2023; 33:1145-1153. [PMID: 37414576 PMCID: PMC10538487 DOI: 10.1101/gr.277675.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 06/29/2023] [Indexed: 07/08/2023]
Abstract
Multiple sequence alignment (MSA) is a critical step in the study of protein sequence and function. Typically, MSA algorithms progressively align pairs of sequences and combine these alignments with the aid of a guide tree. These alignment algorithms use scoring systems based on substitution matrices to measure amino acid similarities. Although successful, standard methods struggle on sets of proteins with low sequence identity: the so-called twilight zone of protein alignment. For these difficult cases, another source of information is needed. Protein language models are a powerful new approach that leverages massive sequence data sets to produce high-dimensional contextual embeddings for each amino acid in a sequence. These embeddings have been shown to reflect physicochemical and higher-order structural and functional attributes of amino acids within proteins. Here, we present a novel approach to MSA, based on clustering and ordering amino acid contextual embeddings. Our method for aligning semantically consistent groups of proteins circumvents the need for many standard components of MSA algorithms, avoiding initial guide tree construction, intermediate pairwise alignments, gap penalties, and substitution matrices. The added information from contextual embeddings leads to higher-accuracy alignments for structurally similar proteins with low amino acid similarity. We anticipate that protein language models will become a fundamental component of the next generation of algorithms for generating MSAs.
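For contrast with the embedding-based approach, the standard pairwise component that the method avoids can be sketched as a Needleman-Wunsch global alignment with a substitution score and a linear gap penalty. This is a generic textbook sketch, not the authors' method; `toy_score` is a hypothetical match/mismatch function standing in for a real substitution matrix such as BLOSUM62.

```python
from typing import Callable


def needleman_wunsch(a: str, b: str,
                     score: Callable[[str, str], int],
                     gap: int = -1) -> int:
    """Return the optimal global alignment score of a and b (linear gap penalty)."""
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning prefix a[:i] with prefix b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # match or mismatch
                dp[i - 1][j] + gap,                            # gap in b
                dp[i][j - 1] + gap,                            # gap in a
            )
    return dp[n][m]


def toy_score(x: str, y: str) -> int:
    # Hypothetical substitution score: +1 for identity, -1 otherwise.
    return 1 if x == y else -1


print(needleman_wunsch("GATTACA", "GCATGCU", toy_score))  # 0
```

Because `score` is a plain function, an embedding-derived similarity (for example, a scaled cosine similarity between per-residue vectors) could be plugged in instead; that substitution is an illustrative assumption, not how the cited method works, since it avoids pairwise alignment altogether.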
Affiliation(s)
- Claire D McWhite
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- Isabel Armour-Garb
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- Department of Computer Science, Princeton University, Princeton, New Jersey 08544, USA
- Mona Singh
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- Department of Computer Science, Princeton University, Princeton, New Jersey 08544, USA
4
Pardo-De la Hoz CJ, Magain N, Piatkowski B, Cornet L, Dal Forno M, Carbone I, Miadlikowska J, Lutzoni F. Ancient Rapid Radiation Explains Most Conflicts Among Gene Trees and Well-Supported Phylogenomic Trees of Nostocalean Cyanobacteria. Syst Biol 2023; 72:694-712. [PMID: 36827095 DOI: 10.1093/sysbio/syad008] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/18/2022] [Revised: 02/12/2023] [Accepted: 02/22/2023] [Indexed: 02/25/2023]
Abstract
Prokaryotic genomes are often considered to be mosaics of genes that do not necessarily share the same evolutionary history due to widespread horizontal gene transfers (HGTs). Consequently, representing evolutionary relationships of prokaryotes as bifurcating trees has long been controversial. However, studies reporting conflicts among gene trees derived from phylogenomic data sets have shown that these conflicts can be the result of artifacts or evolutionary processes other than HGT, such as incomplete lineage sorting, low phylogenetic signal, and systematic errors due to substitution model misspecification. Here, we present the results of an extensive exploration of phylogenetic conflicts in the cyanobacterial order Nostocales, for which previous studies have inferred strongly supported conflicting relationships when using different concatenated phylogenomic data sets. We found that most of these conflicts are concentrated in deep clusters of short internodes of the Nostocales phylogeny, where the great majority of individual genes have low resolving power. We then inferred phylogenetic networks to detect HGT events while also accounting for incomplete lineage sorting. Our results indicate that most conflicts among gene trees are likely due to incomplete lineage sorting linked to an ancient rapid radiation, rather than to HGTs. Moreover, the short internodes of this radiation fit the expectations of the anomaly zone, i.e., a region of the tree parameter space where a species tree is discordant with its most likely gene tree. We demonstrated that concatenation of different sets of loci can recover up to 17 distinct and well-supported relationships within the putative anomaly zone of Nostocales, corresponding to the observed conflicts among well-supported trees based on concatenated data sets from previous studies. 
Our findings highlight the important role of rapid radiations as a potential cause of strongly conflicting phylogenetic relationships when using phylogenomic data sets of bacteria. We propose that polytomies may be the most appropriate phylogenetic representation of these rapid radiations that are part of anomaly zones, especially when all possible genomic markers have been considered to infer these phylogenies.
Keywords: Anomaly zone; bacteria; horizontal gene transfer; incomplete lineage sorting; Nostocales; phylogenomic conflict; rapid radiation; Rhizonema.
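The link between short internodes and gene-tree conflict can be made concrete with a standard multispecies-coalescent result for three taxa (a textbook result, not taken from this paper): for a rooted species tree ((A,B),C) whose internal branch has length t in coalescent units, a gene tree matches the species tree with probability 1 - (2/3)e^(-t), and each of the two discordant topologies has probability (1/3)e^(-t). A minimal sketch:

```python
import math


def triplet_gene_tree_probs(t: float) -> tuple[float, float, float]:
    """Probabilities of the matching and the two discordant rooted gene trees
    for species tree ((A,B),C) with internal branch length t (coalescent units)."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0
    return 1.0 - 2.0 * discordant, discordant, discordant


for t in (0.01, 0.1, 1.0, 3.0):
    match, d1, _ = triplet_gene_tree_probs(t)
    print(f"t={t:<5} P(match)={match:.3f} P(each discordant)={d1:.3f}")
```

As t approaches zero, all three topologies approach probability 1/3, which is why very short internodes leave individual genes with little resolving power. Anomalous gene trees require at least four taxa; for three taxa the matching topology is always the most probable one.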
Affiliation(s)
- Nicolas Magain
- Evolution and Conservation Biology, InBioS Research Center, Université de Liège, Liège 4000, Belgium
- Bryan Piatkowski
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
- Luc Cornet
- Evolution and Conservation Biology, InBioS Research Center, Université de Liège, Liège 4000, Belgium
- BCCM/IHEM, Mycology and Aerobiology, Sciensano, Brussels, Belgium
- Ignazio Carbone
- Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC 27606, USA
5
Baltzis A, Mansouri L, Jin S, Langer BE, Erb I, Notredame C. Highly significant improvement of protein sequence alignments with AlphaFold2. Bioinformatics 2022; 38:5007-5011. [PMID: 36130276 PMCID: PMC9665868 DOI: 10.1093/bioinformatics/btac625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Revised: 08/29/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Protein sequence alignments are essential to structural, evolutionary and functional analysis, but their accuracy is often limited by sequence similarity unless molecular structures are available. Protein structures predicted with experimental-grade accuracy, as achieved by AlphaFold2, could therefore have a major impact on sequence analysis. RESULTS Here, we find that multiple sequence alignments estimated on AlphaFold2 predictions are almost as accurate as alignments estimated on experimental structures and significantly closer to the structural reference than sequence-based alignments. We also show that AlphaFold2 structural models of relatively low quality can be used to obtain highly accurate alignments. These results suggest that, besides structure modeling, AlphaFold2 encodes higher-order dependencies that can be exploited for sequence analysis. AVAILABILITY AND IMPLEMENTATION All data, analyses and results are available on Zenodo (https://doi.org/10.5281/zenodo.7031286). The code and scripts have been deposited in GitHub (https://github.com/cbcrg/msa-af2-nf), and the various containers at https://cloud.sylabs.io/library/athbaltzis/af2/alphafold and https://hub.docker.com/r/athbaltzis/pred. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
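How "close to the structural reference" an alignment is can be summarized with a sum-of-pairs (SP) style score: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The sketch below is a generic illustration, not the scoring code in the repository cited above; it assumes both alignments are lists of equal-length gapped strings for the same sequences in the same order, with "-" as the gap character, and it does not check that the ungapped sequences agree.

```python
def aligned_pairs(alignment: list[str], gap: str = "-") -> set[tuple[int, int, int, int]]:
    """Return (seq_i, pos_i, seq_j, pos_j) for every pair of residues sharing a column."""
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all rows must have the same length")
    positions = [0] * len(alignment)  # index of the next ungapped residue in each sequence
    pairs = set()
    for col in range(n_cols):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != gap:
                present.append((s, positions[s]))
                positions[s] += 1
        for a in range(len(present)):
            for b in range(a + 1, len(present)):
                pairs.add(present[a] + present[b])
    return pairs


def sp_score(test: list[str], reference: list[str]) -> float:
    """Fraction of reference residue pairs recovered by the test alignment."""
    ref = aligned_pairs(reference)
    if not ref:
        return 1.0
    return len(ref & aligned_pairs(test)) / len(ref)


reference = ["AC-GT", "ACTGT"]  # hypothetical structure-based reference
test = ["ACG-T", "ACTGT"]       # hypothetical sequence-based alignment
print(sp_score(test, reference))  # 0.75
```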
Affiliation(s)
- Suzanne Jin
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
- Björn E Langer
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
- Ionas Erb
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
6
Hubley R, Wheeler TJ, Smit AFA. Accuracy of multiple sequence alignment methods in the reconstruction of transposable element families. NAR Genom Bioinform 2022; 4:lqac040. [PMID: 35591887 PMCID: PMC9112768 DOI: 10.1093/nargab/lqac040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/18/2021] [Revised: 03/29/2022] [Accepted: 04/29/2022] [Indexed: 02/06/2023]
Abstract
The construction of a high-quality multiple sequence alignment (MSA) from copies of a transposable element (TE) is a critical step in the characterization of a new TE family. Most studies of MSA accuracy have been conducted on protein or RNA sequence families, where structural features and strong signals of selection may assist with alignment. Less attention has been given to the quality of sequence alignments involving neutrally evolving DNA sequences such as those resulting from TE replication. TE sequences are challenging to align because of their wide range of divergence, their fragmentation, and their predominantly neutral mutation patterns. To gain insight into the effects of these properties on MSA accuracy, we developed a simulator of TE sequence evolution and used it to generate a benchmark, with which we evaluated the MSA predictions produced by several popular aligners, along with Refiner, a method we developed in the context of our RepeatModeler software. We find that MAFFT and Refiner generally outperform other aligners on simulated sequences of low to medium divergence, while Refiner is uniquely effective when aligning highly divergent and fragmented instances of a family.
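To make the two properties named above (divergence and fragmentation) concrete, here is a toy generator of TE copies from a consensus. It is a hypothetical illustration, not the authors' simulator: it applies only uniform substitutions at a fixed per-site probability (no indels, no phylogeny, no mutation bias) and then keeps one random contiguous fragment per copy; `divergence` and `min_fraction` are assumed parameter names.

```python
import random

BASES = "ACGT"


def mutate_copy(consensus: str, divergence: float, min_fraction: float,
                rng: random.Random) -> str:
    """Return one toy TE copy: substitute each site with probability `divergence`,
    then keep a random contiguous fragment of at least `min_fraction` of its length."""
    if not consensus:
        raise ValueError("consensus must be non-empty")
    seq = [
        rng.choice([b for b in BASES if b != base]) if rng.random() < divergence else base
        for base in consensus
    ]
    min_len = max(1, int(len(seq) * min_fraction))
    frag_len = rng.randint(min_len, len(seq))
    start = rng.randint(0, len(seq) - frag_len)
    return "".join(seq[start:start + frag_len])


def simulate_family(consensus: str, n_copies: int, divergence: float,
                    min_fraction: float = 0.5, seed: int = 0) -> list[str]:
    """Generate n_copies toy copies with a reproducible random seed."""
    rng = random.Random(seed)
    return [mutate_copy(consensus, divergence, min_fraction, rng) for _ in range(n_copies)]


rng = random.Random(42)
consensus = "".join(rng.choice(BASES) for _ in range(60))  # hypothetical consensus
for copy in simulate_family(consensus, n_copies=3, divergence=0.15):
    print(copy)
```

Aligning such copies back together with different aligners and comparing the result with the known positions is the general benchmarking idea; a realistic simulator would also need indels and a model of how copies accumulate over time.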
Affiliation(s)
- Robert Hubley
- Institute for Systems Biology, Seattle, WA 98109, USA
- Travis J Wheeler
- Department of Computer Science, University of Montana, Missoula, MT 59801, USA
7
Sapoval N, Aghazadeh A, Nute MG, Antunes DA, Balaji A, Baraniuk R, Barberan CJ, Dannenfelser R, Dun C, Edrisi M, Elworth RAL, Kille B, Kyrillidis A, Nakhleh L, Wolfe CR, Yan Z, Yao V, Treangen TJ. Current progress and open challenges for applying deep learning across the biosciences. Nat Commun 2022; 13:1728. [PMID: 35365602 PMCID: PMC8976012 DOI: 10.1038/s41467-022-29268-7] [Citation(s) in RCA: 61] [Impact Index Per Article: 30.5] [Received: 08/25/2021] [Accepted: 03/09/2022] [Indexed: 11/19/2022]
Abstract
Deep learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL in five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.
Affiliation(s)
- Nicolae Sapoval
- Department of Computer Science, Rice University, Houston, TX, USA
- Amirali Aghazadeh
- Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA
- Michael G Nute
- Department of Computer Science, Rice University, Houston, TX, USA
- Dinler A Antunes
- Department of Biology and Biochemistry, University of Houston, Houston, TX, USA
- Advait Balaji
- Department of Computer Science, Rice University, Houston, TX, USA
- Richard Baraniuk
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- C J Barberan
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Chen Dun
- Department of Computer Science, Rice University, Houston, TX, USA
- R A Leo Elworth
- Department of Computer Science, Rice University, Houston, TX, USA
- Bryce Kille
- Department of Computer Science, Rice University, Houston, TX, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, TX, USA
- Cameron R Wolfe
- Department of Computer Science, Rice University, Houston, TX, USA
- Zhi Yan
- Department of Computer Science, Rice University, Houston, TX, USA
- Vicky Yao
- Department of Computer Science, Rice University, Houston, TX, USA
- Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA
- Department of Bioengineering, Rice University, Houston, TX, USA
8
Bansal MS. Deciphering Microbial Gene Family Evolution Using Duplication-Transfer-Loss Reconciliation and RANGER-DTL. Methods Mol Biol 2022; 2569:233-252. [PMID: 36083451 DOI: 10.1007/978-1-0716-2691-7_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Phylogenetic reconciliation has emerged as a principled, highly effective technique for investigating the origin, spread, and evolutionary history of microbial gene families. Proper application of phylogenetic reconciliation requires a clear understanding of potential pitfalls and sources of error, as well as knowledge of the most effective reconciliation-based tools and protocols for maximizing accuracy. In this book chapter, we provide a brief overview of Duplication-Transfer-Loss (DTL) reconciliation, the standard reconciliation model used to study microbial gene families, and a step-by-step computational protocol to maximize the accuracy of DTL reconciliation and minimize false-positive evolutionary inferences.
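DTL reconciliation extends the classic duplication-loss (DL) model by adding horizontal transfers. As a much simpler illustration of the core idea, the sketch below maps each gene-tree node to the least common ancestor (LCA) of its species in the species tree and flags a node as a duplication when it maps to the same species node as one of its children. It ignores transfers, does not count losses, assumes a rooted and fully resolved gene tree, and is not RANGER-DTL's algorithm. Trees are written as nested tuples whose leaves are species names, a representation chosen here for brevity; every gene-tree leaf must name a species present in the species tree.

```python
def species_clusters(tree):
    """Leaf sets of every node in a nested-tuple species tree."""
    out = []

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def lca_map(species, clusters):
    """Smallest species-tree cluster containing all the given species (the LCA)."""
    return min((c for c in clusters if species <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes classified as duplications under LCA mapping."""
    clusters = species_clusters(species_tree)
    duplications = 0

    def walk(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca_map(frozenset([node]), clusters)
        child_maps = [walk(child) for child in node]
        mapped = lca_map(frozenset().union(*child_maps), clusters)
        # Same species node as one of its children: a duplication, not a speciation.
        if any(mapped == m for m in child_maps):
            duplications += 1
        return mapped

    walk(gene_tree)
    return duplications


species_tree = (("A", "B"), "C")
print(count_duplications((("A", "A"), "B"), species_tree))  # 1: two A copies
print(count_duplications((("A", "B"), "C"), species_tree))  # 0: congruent
```

Under this DL-only model an incongruent gene tree such as (A,(B,C)) is explained by one duplication plus losses; a DTL model could instead explain the same conflict with a single transfer, which is exactly the kind of alternative that DTL reconciliation weighs.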
Affiliation(s)
- Mukul S Bansal
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT, USA
9
Loewenthal G, Rapoport D, Avram O, Moshe A, Wygoda E, Itzkovitch A, Israeli O, Azouri D, Cartwright RA, Mayrose I, Pupko T. A probabilistic model for indel evolution: differentiating insertions from deletions. Mol Biol Evol 2021; 38:5769-5781. [PMID: 34469521 PMCID: PMC8662616 DOI: 10.1093/molbev/msab266] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/27/2022]
Abstract
Insertions and deletions (indels) are common molecular evolutionary events. However, probabilistic models for indel evolution are under-developed due to their computational complexity. Here, we introduce several improvements to indel modeling: 1) While previous models for indel evolution assumed that the rates and length distributions of insertions and deletions are equal, here we propose a richer model that explicitly distinguishes between the two; 2) we introduce numerous summary statistics that allow approximate Bayesian computation-based parameter estimation; 3) we develop a method to correct for biases introduced by alignment programs, when inferring indel parameters from empirical data sets; and 4) using a model-selection scheme, we test whether the richer model better fits biological data compared with the simpler model. Our analyses suggest that both our inference scheme and the model-selection procedure achieve high accuracy on simulated data. We further demonstrate that our proposed richer model better fits a large number of empirical data sets and that, for the majority of these data sets, the deletion rate is higher than the insertion rate.
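To illustrate what it means for a model to distinguish insertions from deletions, the sketch below runs a Gillespie-style simulation along a single branch in which insertions and deletions have their own per-site rates and their own length distributions. Geometric indel lengths, insertions at length + 1 positions, and deletions that simply shorten the sequence (floored at zero) are simplifying assumptions made here; this is not the authors' model, and it omits their approximate Bayesian computation inference.

```python
import math
import random
from typing import Optional


def geometric_length(rng: random.Random, p: float) -> int:
    """Draw a length k >= 1 with P(k) = (1 - p)^(k - 1) * p, for 0 < p < 1 (inverse CDF)."""
    u = rng.random()
    return max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)))


def simulate_branch(length: int, t: float, ins_rate: float, del_rate: float,
                    p_ins: float = 0.5, p_del: float = 0.5,
                    seed: Optional[int] = None) -> tuple[int, int, int]:
    """Simulate indels along a branch of length t.
    Returns (final_length, n_insertions, n_deletions)."""
    rng = random.Random(seed)
    elapsed, n_ins, n_del = 0.0, 0, 0
    while True:
        ins_total = ins_rate * (length + 1)  # insertions possible at length + 1 positions
        del_total = del_rate * length        # deletions start at an existing site
        total = ins_total + del_total
        if total <= 0:
            break
        elapsed += rng.expovariate(total)    # exponential waiting time to the next event
        if elapsed > t:
            break
        if rng.random() < ins_total / total:
            length += geometric_length(rng, p_ins)
            n_ins += 1
        else:
            length = max(0, length - geometric_length(rng, p_del))
            n_del += 1
    return length, n_ins, n_del


final_len, n_ins, n_del = simulate_branch(200, t=0.1, ins_rate=0.02, del_rate=0.05, seed=7)
print(final_len, n_ins, n_del)
```

Setting `ins_rate == del_rate` and `p_ins == p_del` recovers a symmetric model of the kind the abstract says earlier work assumed; letting them differ is the extra freedom that a richer model can fit to data.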
Affiliation(s)
- Gil Loewenthal
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Dana Rapoport
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Oren Avram
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Asher Moshe
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Elya Wygoda
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Alon Itzkovitch
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Omer Israeli
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Dana Azouri
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Reed A Cartwright
- The Biodesign Institute, Arizona State University, Tempe, Arizona, USA
- School of Life Sciences, Arizona State University, Tempe, Arizona, USA
- Itay Mayrose
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
10
Gupta M, Zaharias P, Warnow T. Accurate Large-scale Phylogeny-Aware Alignment using BAli-Phy. Bioinformatics 2021; 37:4677-4683. [PMID: 34320635 DOI: 10.1093/bioinformatics/btab555] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/21/2021] [Revised: 06/25/2021] [Accepted: 07/27/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION BAli-Phy, a popular Bayesian method that co-estimates multiple sequence alignments and phylogenetic trees, is a rigorous statistical method, but because of its computational requirements it has generally been limited to relatively small datasets (at most about 100 sequences). Here we repurpose BAli-Phy as a "phylogeny-aware" alignment method: we estimate the phylogeny from the input unaligned sequences and then use it as a fixed tree within BAli-Phy. RESULTS We show that this approach achieves high accuracy, greatly superior to Prank, currently the most popular phylogeny-aware alignment method, and is even more accurate than MAFFT, one of the top-performing alignment methods in common use. Furthermore, this approach can be used to align very large datasets (up to 1000 sequences in this study). AVAILABILITY See https://doi.org/10.13012/B2IDB-7863273_V1 for the datasets used in this study. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Maya Gupta
- University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Paul Zaharias
- University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Tandy Warnow
- University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
11
Aadland K, Kolaczkowski B. Alignment-Integrated Reconstruction of Ancestral Sequences Improves Accuracy. Genome Biol Evol 2021; 12:1549-1565. [PMID: 32785673 PMCID: PMC7523730 DOI: 10.1093/gbe/evaa164] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Accepted: 08/03/2020] [Indexed: 12/31/2022]
Abstract
Ancestral sequence reconstruction (ASR) uses an alignment of extant protein sequences, a phylogeny describing the history of the protein family and a model of the molecular-evolutionary process to infer the sequences of ancient proteins, allowing researchers to directly investigate the impact of sequence evolution on protein structure and function. Like all statistical inferences, ASR can be sensitive to violations of its underlying assumptions. Previous studies have shown that, whereas phylogenetic uncertainty has only a very weak impact on ASR accuracy, uncertainty in the protein sequence alignment can more strongly affect inferred ancestral sequences. Here, we show that errors in sequence alignment can produce errors in ASR across a range of realistic and simplified evolutionary scenarios. Importantly, sequence reconstruction errors can lead to errors in estimates of structural and functional properties of ancestral proteins, potentially undermining the reliability of analyses relying on ASR. We introduce an alignment-integrated ASR approach that combines information from many different sequence alignments. We show that integrating alignment uncertainty improves ASR accuracy and the accuracy of downstream structural and functional inferences, often performing as well as highly accurate structure-guided alignment. Given the growing evidence that sequence alignment errors can impact the reliability of ASR studies, we recommend that future studies incorporate approaches to mitigate the impact of alignment uncertainty. Probabilistic modeling of insertion and deletion events has the potential to radically improve ASR accuracy when the model reflects the true underlying evolutionary history, but further studies are required to thoroughly evaluate the reliability of these approaches under realistic conditions.
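One simple way to combine ancestral-state information across alignments is to average, for a given ancestral site, the posterior distribution over amino acids obtained under each alignment and then take the most probable state. The sketch below assumes that the site has already been matched across alignments and that alignments are weighted equally unless weights are supplied; both are assumptions made for illustration, and this is not necessarily how the authors integrate alignments. The posterior values are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def integrate_site_posteriors(posteriors: list[dict[str, float]],
                              weights: Optional[list[float]] = None) -> dict[str, float]:
    """Weighted average of per-alignment posterior distributions at one ancestral site."""
    if not posteriors:
        raise ValueError("need at least one posterior")
    if weights is None:
        weights = [1.0] * len(posteriors)
    if len(weights) != len(posteriors):
        raise ValueError("one weight per alignment required")
    total_w = sum(weights)
    combined: dict[str, float] = defaultdict(float)
    for w, post in zip(weights, posteriors):
        for state, p in post.items():
            combined[state] += w * p / total_w
    return dict(combined)


def map_state(distribution: dict[str, float]) -> str:
    """Most probable (MAP) ancestral state."""
    return max(distribution, key=distribution.get)


# Hypothetical posteriors for one ancestral site under three alternative alignments.
site = [{"L": 0.6, "I": 0.4}, {"I": 0.7, "L": 0.3}, {"I": 0.55, "V": 0.45}]
combined = integrate_site_posteriors(site)
print(map_state(site[0]), map_state(combined))  # L I
```

Here a reconstruction based only on the first alignment would infer L, whereas the integrated distribution favours I. This is the kind of alignment-dependent change in an ancestral state that the abstract warns can propagate into structural and functional inferences.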
Affiliation(s)
- Kelsey Aadland
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida
- Bryan Kolaczkowski
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida
12
Abstract
Multiple sequence alignment is a core first step in many bioinformatics analyses, and errors in these alignments can have negative consequences for scientific studies. In this article, we review some of the recent literature evaluating multiple sequence alignment methods and identify specific challenges that arise when performing these evaluations. In particular, we discuss the different trends observed in simulation studies and when using biological benchmarks. Overall, we find that multiple sequence alignment, far from being a "solved problem," would benefit from new attention.
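Evaluations of the kind reviewed above compare an estimated alignment with a reference alignment, either a simulated true alignment or a curated biological benchmark. One widely used criterion is the sum-of-pairs (SP) error. Each alignment is treated as the set of residue pairs it places in the same column:

- **SP-FN** is the fraction of reference pairs that the estimate misses.
- **SP-FP** is the fraction of estimated pairs that are absent from the reference.

The sketch below is minimal and illustrative. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations

def homologous_pairs(alignment: dict[str, str]) -> set:
    """Return the set of residue pairs placed in the same column.

    Each residue is identified by (sequence name, index in the ungapped
    sequence), so two alignments of the same sequences can be compared.
    """
    names = sorted(alignment)
    length = len(alignment[names[0]])
    assert all(len(alignment[n]) == length for n in names), "rows must have equal length"
    position = {n: 0 for n in names}  # next ungapped index for each sequence
    pairs = set()
    for col in range(length):
        residues = []
        for n in names:
            if alignment[n][col] != "-":
                residues.append((n, position[n]))
                position[n] += 1
        pairs.update(combinations(residues, 2))  # ordered by sorted name
    return pairs

def sp_error_rates(reference: dict[str, str], estimated: dict[str, str]) -> tuple[float, float]:
    """Return (SP-FN, SP-FP) of an estimated alignment relative to a reference."""
    ref = homologous_pairs(reference)
    est = homologous_pairs(estimated)
    shared = len(ref & est)
    sp_fn = 1 - shared / len(ref) if ref else 0.0
    sp_fp = 1 - shared / len(est) if est else 0.0
    return sp_fn, sp_fp

# Toy example: the estimate misplaces one residue of s2.
reference = {"s1": "ACGT", "s2": "A-GT", "s3": "ACG-"}
estimated = {"s1": "ACGT", "s2": "AG-T", "s3": "ACG-"}
print(sp_error_rates(reference, estimated))
```

In this toy case, each alignment implies 8 residue pairs and they share 6, so both error rates are 0.25. Simulation studies can score against the known true alignment. Biological benchmarks must instead rely on a curated reference, which may itself contain errors. This difference is one source of the divergent trends the review discusses.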
Affiliation(s)
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
13
Warnow T, Mirarab S. Multiple Sequence Alignment for Large Heterogeneous Datasets Using SATé, PASTA, and UPP. Methods Mol Biol 2021; 2231:99-119. [PMID: 33289889 DOI: 10.1007/978-1-0716-1036-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
The estimation of very large multiple sequence alignments is a challenging problem that requires special techniques to achieve high accuracy. Here we describe two software packages, PASTA and UPP, for constructing alignments of large and ultra-large datasets. Both methods have produced highly accurate alignments of 1,000,000 sequences, and trees computed on these alignments are also highly accurate. PASTA provides the best tree accuracy when all input sequences are full-length, but UPP is more accurate than PASTA and other methods when the input contains a large number of fragmentary sequences. Both methods are available as open source on GitHub.
Affiliation(s)
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
- Siavash Mirarab
- Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA, USA
14
Smith SA, Walker-Hale N, Walker JF. Intragenic Conflict in Phylogenomic Data Sets. Mol Biol Evol 2020; 37:3380-3388. [DOI: 10.1093/molbev/msaa170] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 02/07/2023]
Abstract
Most phylogenetic analyses assume that a single evolutionary history underlies each gene. However, both biological processes and errors can cause intragenic conflict. The extent to which this conflict is present in empirical data sets is not well documented, but if it is common, it could have far-reaching implications for phylogenetic analyses. We examined several large phylogenomic data sets from diverse taxa using a fast and simple method to identify well-supported intragenic conflict. We found conflict to be highly variable between data sets, ranging from 1% to >92% of the genes investigated. We analyzed four exemplar genes in detail and analyzed simulated data under several scenarios. Our results suggest that alignment error may be one major source of conflict, but other conflicts remain unexplained and may represent biological signal or other errors. Whether as part of data analysis pipelines or to explore biological processes, analyses of within-gene phylogenetic signal should become common.
Affiliation(s)
- Stephen A Smith
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
- Joseph F Walker
- The Sainsbury Laboratory (SLCU), University of Cambridge, Cambridge, United Kingdom
15
Li S, Yang Q, Tang B. Improving the thermostability and acid resistance of Rhizopus oryzae α-amylase by using multiple sequence alignment based site-directed mutagenesis. Biotechnol Appl Biochem 2020; 67:677-684. [PMID: 32133700 DOI: 10.1002/bab.1907] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/23/2019] [Accepted: 02/25/2020] [Indexed: 12/14/2022]
Abstract
Higher thermostability or acid resistance in fungal α-amylase would help improve the sugar-making process and reduce production costs. Here, the thermostability or acid resistance of Rhizopus oryzae α-amylase (ROAmy) was significantly enhanced by site-directed mutagenesis guided by multiple sequence alignment (MSA). For instance, compared with the wild-type ROAmy, the optimum temperature of mutants G136D and A144Y increased from 50 to 55 °C, whereas for mutants V174R and I276P it increased from 50 to 60 °C. The optimum pH of mutants G136D and A144Y shifted from 5.5 to 5.0, whereas for mutants V174R and T253E it shifted from 5.5 to 4.5. Mutant V174R showed a 2.52-fold increase in half-life at 55 °C, a 2.55-fold increase in half-life at pH 4.5, and a 1.61-fold increase in catalytic efficiency (kcat/Km) on soluble starch. Three-dimensional model simulation suggested that changes in hydrophilicity, hydrogen bonding, salt bridges, or rigidity in the mutants may largely account for the improved thermostability and acid resistance. The mutants with improved catalytic properties obtained in this work may offer an accessible and practical approach to the directed evolution of fungal α-amylase toward desired functions.
Affiliation(s)
- Song Li
- School of Biological and Chemical Engineering, Anhui Polytechnic University, Central Beijing Road, Wuhu, China
- Qian Yang
- School of Biological and Chemical Engineering, Anhui Polytechnic University, Central Beijing Road, Wuhu, China
- Bin Tang
- School of Biological and Chemical Engineering, Anhui Polytechnic University, Central Beijing Road, Wuhu, China
16
Obbard DJ, Shi M, Roberts KE, Longdon B, Dennis AB. A new lineage of segmented RNA viruses infecting animals. Virus Evol 2020; 6:vez061. [PMID: 31976084 PMCID: PMC6966834 DOI: 10.1093/ve/vez061] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 01/01/2023]
Abstract
Metagenomic sequencing has revolutionised our knowledge of virus diversity, with new virus sequences being reported faster than ever before. However, virus discovery from metagenomic sequencing usually depends on detectable homology: without a sufficiently close relative, so-called 'dark' virus sequences remain unrecognisable. An alternative approach is to use virus-identification methods that do not depend on detecting homology, such as virus recognition by host antiviral immunity. For example, virus-derived small RNAs have previously been used to propose 'dark' virus sequences associated with the Drosophilidae (Diptera). Here, we combine published Drosophila data with a comprehensive search of transcriptomic sequences and selected meta-transcriptomic datasets to identify a completely new lineage of segmented positive-sense single-stranded RNA viruses that we provisionally refer to as the Quenyaviruses. Each of the five segments contains a single open reading frame, with most encoding proteins showing no detectable similarity to characterised viruses, and one sharing a small number of residues with the RNA-dependent RNA polymerases of single- and double-stranded RNA viruses. Using these sequences, we identify close relatives in approximately 20 arthropods, including insects, crustaceans, spiders, and a myriapod. Using a more conserved sequence from the putative polymerase, we further identify relatives in meta-transcriptomic datasets from gut, gill, and lung tissues of vertebrates, reflecting infections of vertebrates or of their associated parasites. Our data illustrate the utility of small RNAs to detect viruses with limited sequence conservation, and provide robust evidence for a new deeply divergent and phylogenetically distinct RNA virus lineage.
Affiliation(s)
- Darren J Obbard
- Institute of Evolutionary Biology, University of Edinburgh, Charlotte Auerbach Road, Edinburgh EH9 3FL, UK
- Mang Shi
- Charles Perkins Center, The University of Sydney, NSW 2006, Australia
- Katherine E Roberts
- Biosciences, College of Life & Environmental Sciences, University of Exeter, Penryn Campus, Penryn, Cornwall TR10 9FE, UK
- Ben Longdon
- Biosciences, College of Life & Environmental Sciences, University of Exeter, Penryn Campus, Penryn, Cornwall TR10 9FE, UK
- Alice B Dennis
- Department of Evolutionary Biology & Systematic Zoology, Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany
17
Cabra-García J, Hormiga G. Exploring the impact of morphology, multiple sequence alignment and choice of optimality criteria in phylogenetic inference: a case study with the Neotropical orb-weaving spider genus Wagneriana (Araneae: Araneidae). Zool J Linn Soc 2019. [DOI: 10.1093/zoolinnean/zlz088] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/26/2022]
Abstract
We present a total evidence phylogenetic analysis of the Neotropical orb-weaving spider genus Wagneriana and discuss the phylogenetic impacts of methodological choices. We analysed 167 phenotypic characters and nine loci scored for 115 Wagneriana and outgroups, including 46 newly sequenced species. We compared total evidence analyses and molecular-only analyses to evaluate the impact of phenotypic evidence, and we performed analyses using the programs POY, TNT, RAxML, GARLI, IQ-TREE and MrBayes to evaluate the effects of multiple sequence alignment and optimality criteria. In all analyses, Wagneriana carimagua and Wagneriana uropygialis were nested in the genera Parawixia and Alpaida, respectively, and the remaining species of Wagneriana fell into three main clades, none of which formed a pair of sister taxa. However, sister-group relationships among the main clades and their internal relationships were strongly influenced by methodological choices. Alignment methods had comparable topological effects to those of optimality criteria in terms of ‘subtree pruning and regrafting’ moves. The inclusion of phenotypic evidence, 2.80–3.05% of the total evidence matrices, increased support irrespective of the optimality criterion used. The monophyly of some groups was recovered only after the addition of morphological characters. A new araneid genus, Popperaneus gen. nov., is erected, and Paraverrucosa is resurrected. Four new synonymies and seven new combinations are proposed.
Affiliation(s)
- Jimmy Cabra-García
- Departamento de Biología, Universidad del Valle, Cali, AA, Colombia
- Departamento de Zoologia, Instituto de Biociências, Universidade de São Paulo, São Paulo, SP, Brazil
- Gustavo Hormiga
- The George Washington University, Department of Biological Sciences, Washington, DC, USA
18
Wang Y, Zhao Q, Wan QX, Wang KX, Zha XF. P-element Somatic Inhibitor Protein Binding a Target Sequence in dsx Pre-mRNA Conserved in Bombyx mori and Spodoptera litura. Int J Mol Sci 2019; 20:ijms20092361. [PMID: 31086020 PMCID: PMC6539025 DOI: 10.3390/ijms20092361] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/15/2019] [Revised: 05/10/2019] [Accepted: 05/10/2019] [Indexed: 02/06/2023]
Abstract
Bombyx mori doublesex (Bmdsx) functions as a double-switch gene in the final step of the sex-determination cascade in the silkworm Bombyx mori. The P-element somatic inhibitor (PSI) protein in B. mori binds Bmdsx pre-mRNA at CE1, an exonic splicing silencer, to promote male-specific splicing of Bmdsx. However, the nature of the interaction between BmPSI and Bmdsx pre-mRNA remains unclear. Electrophoretic mobility shift assay (EMSA) results showed that all four KH_1 motifs in BmPSI are essential for binding, especially the first two. Three active sites (I116, L127, and IGGI) in the KH_1 motif were found to be necessary for binding using EMSA, circular dichroism (CD) spectroscopy, and isothermal titration calorimetry (ITC). The PSI homologous protein in S. litura (SlPSI) was purified, and binding of SlPSI to CE1 was verified. Compared with BmPSI, SlPSI proteins carrying mutations at I116 and IGGI lost their ability to bind CE1. In conclusion, the binding of PSI to dsx pre-mRNA is generally conserved in B. mori and S. litura. These findings provide clues to sex determination in Lepidoptera.
Affiliation(s)
- Yao Wang
- State Key Laboratory of Silkworm Genome Biology, Biological Science Research Center, Southwest University, Beibei, Chongqing 400715, China.
- Qin Zhao
- State Key Laboratory of Silkworm Genome Biology, Biological Science Research Center, Southwest University, Beibei, Chongqing 400715, China.
- Qiu-Xing Wan
- State Key Laboratory of Silkworm Genome Biology, Biological Science Research Center, Southwest University, Beibei, Chongqing 400715, China.
- Kai-Xuan Wang
- State Key Laboratory of Silkworm Genome Biology, Biological Science Research Center, Southwest University, Beibei, Chongqing 400715, China.
- Xing-Fu Zha
- State Key Laboratory of Silkworm Genome Biology, Biological Science Research Center, Southwest University, Beibei, Chongqing 400715, China.
- Chongqing Key Laboratory of Sericultural Science, Southwest University, Chongqing 400715, China.
- Chongqing Engineering and Technology Research Center for Novel Silk Materials, Southwest University, Chongqing 400715, China.