1. Liu C, Zhou X, Li Y, Hittinger CT, Pan R, Huang J, Chen XX, Rokas A, Chen Y, Shen XX. The Influence of the Number of Tree Searches on Maximum Likelihood Inference in Phylogenomics. Syst Biol 2024; 73:807-822. [PMID: 38940001 DOI: 10.1093/sysbio/syae031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Revised: 06/20/2024] [Accepted: 06/26/2024] [Indexed: 06/29/2024] [Open Access]
Abstract
Maximum likelihood (ML) phylogenetic inference is widely used in phylogenomics. As heuristic searches most likely find suboptimal trees, it is recommended to conduct multiple (e.g., 10) tree searches in phylogenetic analyses. However, beyond its positive role, how and to what extent multiple tree searches aid ML phylogenetic inference remains poorly explored. Here, we found that a random starting tree was not as effective as the BioNJ and parsimony starting trees in inferring the ML gene tree and that RAxML-NG and PhyML were less sensitive to different starting trees than IQ-TREE. We then examined the effect of the number of tree searches on ML tree inference with IQ-TREE and RAxML-NG, by running 100 tree searches on 19,414 gene alignments from 15 animal, plant, and fungal phylogenomic datasets. We found that the number of tree searches substantially impacted the recovery of the best-of-100 ML gene tree topology among 100 searches for a given ML program. In addition, all of the concatenation-based trees were topologically identical if the number of tree searches was ≥10. Quartet-based ASTRAL trees inferred from 1 to 80 tree searches differed topologically from those inferred from 100 tree searches for 6/15 phylogenomic datasets. Finally, our simulations showed that gene alignments with lower difficulty scores had a higher chance of finding the best-of-100 gene tree topology and were more likely to yield the correct trees.
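The core measurement here, whether fewer searches still recover the best-of-100 topology, can be illustrated with a minimal sketch. It assumes each search has already been summarized as a hypothetical (log-likelihood, topology label) pair, with equal labels meaning identical unrooted topologies. It does not call IQ-TREE, RAxML-NG, or PhyML and is not the authors' pipeline.

```python
import random

# Each search is a hypothetical (log_likelihood, topology_label) pair.
# Assumption: equal labels mean identical unrooted topologies.

def best_topology(searches):
    """Label of the highest-likelihood search (ties keep the earliest search)."""
    return max(searches, key=lambda s: s[0])[1]

def recovers_best(searches, k):
    """True if the best of the first k searches has the best-of-all topology."""
    return best_topology(searches[:k]) == best_topology(searches)

def recovery_rate(searches, k, trials=1000, seed=0):
    """Fraction of random k-search subsets whose best tree matches the best-of-all topology."""
    rng = random.Random(seed)
    target = best_topology(searches)
    hits = sum(best_topology(rng.sample(searches, k)) == target for _ in range(trials))
    return hits / trials

# Toy log-likelihoods for five searches (illustrative values, not from the study).
searches = [(-1052.3, "A"), (-1050.7, "B"), (-1049.9, "C"), (-1050.7, "B"), (-1052.3, "A")]
print([recovers_best(searches, k) for k in range(1, 6)])  # [False, False, True, True, True]
print(recovery_rate(searches, 1))
```

In this toy case only one of the five searches reaches topology C, so a single random search recovers it roughly one time in five, while using all five searches always does.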
Affiliation(s)
- Chao Liu
- Department of Plant Protection, Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Zhejiang University, Hangzhou 310058, China
- Centre for Evolutionary & Organismal Biology, Zhejiang University, Hangzhou 310058, China
- Xiaofan Zhou
- Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou 510642, China
- Yuanning Li
- Institute of Marine Science and Technology, Shandong University, Qingdao 266237, China
- Department of Biological Sciences and Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
- Chris Todd Hittinger
- Laboratory of Genetics, Wisconsin Energy Institute, Center for Genomic Science Innovation, DOE Great Lakes Bioenergy Research Center, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53706, USA
- Ronghui Pan
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou 310027, China
- Jinyan Huang
- Zhejiang Provincial Key Laboratory of Pancreatic Disease, Zhejiang University School of Medicine First Affiliated Hospital, Hangzhou 310003, China
- Xue-Xin Chen
- Department of Plant Protection, Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Zhejiang University, Hangzhou 310058, China
- Antonis Rokas
- Department of Biological Sciences and Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
- Yun Chen
- Department of Plant Protection, Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Zhejiang University, Hangzhou 310058, China
- Xing-Xing Shen
- Department of Plant Protection, Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Zhejiang University, Hangzhou 310058, China
- Centre for Evolutionary & Organismal Biology, Zhejiang University, Hangzhou 310058, China

2. Bjornson S, Verbruggen H, Upham NS, Steenwyk JL. Reticulate evolution: Detection and utility in the phylogenomics era. Mol Phylogenet Evol 2024; 201:108197. [PMID: 39270765 DOI: 10.1016/j.ympev.2024.108197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2024] [Revised: 08/13/2024] [Accepted: 09/08/2024] [Indexed: 09/15/2024]
Abstract
Phylogenomics has enriched our understanding that the Tree of Life can have network-like or reticulate structures among some taxa and genes. Two non-vertical modes of evolution - hybridization/introgression and horizontal gene transfer - deviate from a strictly bifurcating tree model, causing non-treelike patterns. However, these reticulate processes can produce similar patterns to incomplete lineage sorting or recombination, potentially leading to ambiguity. Here, we present a brief overview of a phylogenomic workflow for inferring organismal histories and compare methods for distinguishing modes of reticulate evolution. We discuss how the timing of coalescent events can help disentangle introgression from incomplete lineage sorting and how horizontal gene transfer events can help determine the relative timing of speciation events. In doing so, we identify pitfalls of certain methods and discuss how to extend their utility across the Tree of Life. Workflows, methods, and future directions discussed herein underscore the need to embrace reticulate evolutionary patterns for understanding the timing and rates of evolutionary events, providing a clearer view of life's history.
Affiliation(s)
- Saelin Bjornson
- School of BioSciences, University of Melbourne, Victoria, Australia
- Heroen Verbruggen
- School of BioSciences, University of Melbourne, Victoria, Australia; CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, 4485-661 Vairão, Portugal
- Nathan S Upham
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Jacob L Steenwyk
- Howard Hughes Medical Institute and the Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA

3. Vieira C, Brooks CM, Akita S, Kim MS, Saunders GW. Of sea, rivers and symbiosis: Diversity, systematics, biogeography and evolution of the deeply diverging florideophycean order Hildenbrandiales (Rhodophyta). Mol Phylogenet Evol 2024; 197:108106. [PMID: 38750675 DOI: 10.1016/j.ympev.2024.108106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Revised: 05/03/2024] [Accepted: 05/12/2024] [Indexed: 05/27/2024]
Abstract
The Hildenbrandiales, a typically saxicolous red algal order, is an early diverging florideophycean group with global significance in marine and freshwater ecosystems across diverse temperature zones. To comprehensively elucidate the diversity, phylogeny, biogeography, and evolution of this order, we conducted a thorough re-examination employing molecular data derived from nearly 700 specimens. Using a species delimitation method, we identified Evolutionary Species Units (ESUs) within the Hildenbrandiales to better characterize species diversity, and we generated the first time-calibrated tree and ancestral area reconstruction for this order. Mitochondrial cox1 and chloroplast rbcL markers were used to infer species boundaries, and subsequent phylogenetic reconstructions involved concatenated sequences of cox1, rbcL, and 18S rDNA. Time calibration of the resulting phylogenetic tree used a fossil record of a purportedly freshwater Triassic Hildenbrandia species and three secondary time points from the literature. Our species delimitation analysis revealed an astounding 97 distinct ESUs, quintupling the known diversity within this order. Our time-calibration analysis placed the origin of Hildenbrandiales (crown age) in the Ediacaran period, with freshwater species emerging as a monophyletic group during the later Permian to early Triassic. Phylogenetic reconstructions identified seven major clades, which underwent early diversification during the Silurian to Carboniferous periods. Two major evolutionary events, colonization of freshwater habitats and obligate systemic symbiosis with a marine fungus, marked this order, leading to significant morphological alterations without a commensurate increase in species diversification. Despite the remarkable newly discovered diversity, the extant taxon diversity appears relatively constrained when viewed against an evolutionary timeline spanning over 800 million years. This limitation may stem from restricted geographic sampling or the prevalence of asexual reproduction. However, species richness estimation and rarefaction analyses suggest that a substantially larger diversity, potentially four times greater, remains to be uncovered. These findings drastically reshape our understanding of species diversity in the deeply diverging florideophycean order Hildenbrandiales and contribute valuable insights into this order's evolutionary history and ecological adaptations. Supported by phylogenetic, ecological and morphological evidence, we established the genus Riverina gen. nov. to accommodate freshwater species of Hildenbrandiales, which form a monophyletic clade in our analyses. This marks the first step toward refining the taxonomy of the Hildenbrandiales, an order in need of thorough revision, notably the creation of several genera to address the polyphyletic status of Hildenbrandia. However, the scarcity of diagnostic features poses a challenge, necessitating a fresh approach to defining genera. A potential solution lies in embracing a molecular systematic perspective, which can offer precise delineations of taxonomic boundaries.
Affiliation(s)
- Christophe Vieira
- Research Institute for Basic Sciences, Jeju National University, Jeju 63243, Korea
- Cody M Brooks
- Bedford Institute of Oceanography, Department of Fisheries and Oceans, Dartmouth, NS, Canada
- Shingo Akita
- Faculty of Fisheries Sciences, Hokkaido University, Minato-cho 3-1-1, Hakodate, Hokkaido 041-8611, Japan
- Myung Sook Kim
- Research Institute for Basic Sciences, Jeju National University, Jeju 63243, Korea
- Gary W Saunders
- Biology Department, Centre for Environmental and Molecular Algal Research, University of New Brunswick, Fredericton, NB, Canada

4. Kokkonen AL, Searle PC, Shiozawa DK, Evans RP. Using de novo transcriptomes to decipher the relationships in cutthroat trout subspecies (Oncorhynchus clarkii). Evol Appl 2024; 17:e13735. [PMID: 39006004 PMCID: PMC11239772 DOI: 10.1111/eva.13735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 05/18/2024] [Accepted: 05/27/2024] [Indexed: 07/16/2024] [Open Access]
Abstract
For almost 200 years, the taxonomy of cutthroat trout (Oncorhynchus clarkii), a salmonid native to western North America, has been in flux as ichthyologists and fisheries biologists have tried to describe the diversity within these fishes. Starting in the 1950s, Robert Behnke reexamined the cutthroat trout and identified 14 subspecies based on morphological traits, Pleistocene events, and modern geographic ranges. His designations became instrumental in recognizing and preserving the remaining diversity of cutthroat trout. Over time, molecular techniques (e.g., karyotypes, allozymes, mitochondrial DNA, SNPs, and microsatellite arrays) have largely reinforced Behnke's phylogenies, but have also revealed that some relationships are consistently weakly supported. To further resolve these relationships, we generated de novo transcriptomes for nine cutthroat subspecies, as well as a Bear River Bonneville form and two Colorado River lineages (blue and green). We present phylogenies of these subspecies generated from multiple sets of orthologous genes extracted from our transcriptomes. We confirm many of the relationships identified in previous morphological and molecular studies, and we discuss, from a geological perspective, the significant differences between our phylogenies and those of these studies. Specific findings include three distinct clades: (1) Bear River Bonneville form and Yellowstone cutthroat trout; (2) Bonneville cutthroat trout (n = 2); and (3) Greenback and Rio Grande cutthroat trout. We also identify potential gene transfer between Bonneville cutthroat trout and a population of Colorado River green lineage cutthroat trout. These findings suggest that additional groups warrant species-level consideration if other recent species elevations are retained.
Affiliation(s)
- Andrea L. Kokkonen
- Department of Microbiology and Molecular Biology, Brigham Young University, Provo, Utah, USA
- Peter C. Searle
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, New York, USA
- R. Paul Evans
- Department of Microbiology and Molecular Biology, Brigham Young University, Provo, Utah, USA

5. Koper K, Han SW, Kothadia R, Salamon H, Yoshikuni Y, Maeda HA. Multisubstrate specificity shaped the complex evolution of the aminotransferase family across the tree of life. Proc Natl Acad Sci U S A 2024; 121:e2405524121. [PMID: 38885378 PMCID: PMC11214133 DOI: 10.1073/pnas.2405524121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Accepted: 05/14/2024] [Indexed: 06/20/2024] [Open Access]
Abstract
Aminotransferases (ATs) are an ancient enzyme family that plays central roles in core nitrogen metabolism, which is essential to all organisms. However, many AT enzyme functions remain poorly defined, limiting our fundamental understanding of the nitrogen metabolic networks that exist in different organisms. Here, we traced the deep evolutionary history of the AT family by analyzing AT enzymes from 90 species spanning the tree of life (ToL). We found that each organism has maintained a relatively small and constant number of ATs. Mapping the distribution of ATs across the ToL revealed that many essential AT reactions are carried out by taxon-specific AT enzymes due to widespread nonorthologous gene displacements. This complex evolutionary history explains the difficulty of homology-based AT functional prediction. Biochemical characterization of diverse aromatic ATs further revealed their broad substrate specificity, unlike other core metabolic enzymes that evolved to catalyze specific reactions. Interestingly, however, we found that these AT enzymes, which diverged over a billion years ago, share common signatures of multisubstrate specificity achieved through different nonconserved active site residues. These findings illustrate that AT family enzymes have leveraged their inherent substrate promiscuity to maintain a small yet distinct set of multifunctional AT enzymes in different taxa. This evolutionary history of versatile ATs likely contributed to the establishment of the robust and diverse nitrogen metabolic networks that exist throughout the ToL. The study provides a critical foundation for systematically determining diverse AT functions and the underlying nitrogen metabolic networks across the ToL.
Affiliation(s)
- Kaan Koper
- Department of Botany, University of Wisconsin-Madison, Madison, WI 53706
- Sang-Woo Han
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Department of Biotechnology, Konkuk University, Chungju 27478, South Korea
- Ramani Kothadia
- The US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Hugh Salamon
- The US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Yasuo Yoshikuni
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- The US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Center for Advanced Bioenergy and Bioproducts Innovation, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Global Center for Food, Land, and Water Resources, Research Faculty of Agriculture, Hokkaido University, Hokkaido 060-8589, Japan
- Institute of Global Innovation Research, Tokyo University of Agriculture and Technology, Tokyo 183-8538, Japan
- Hiroshi A. Maeda
- Department of Botany, University of Wisconsin-Madison, Madison, WI 53706

6. Spirin S, Sigorskikh A, Efremov A, Penzar D, Karyagina A. PhyloBench: A Benchmark for Evaluating Phylogenetic Programs. Mol Biol Evol 2024; 41:msae084. [PMID: 38860506 PMCID: PMC11231946 DOI: 10.1093/molbev/msae084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 04/05/2024] [Accepted: 04/22/2024] [Indexed: 06/12/2024] [Open Access]
Abstract
Phylogenetic inference based on protein sequence alignment is a widely used procedure. Numerous phylogenetic algorithms have been developed, most of which have many parameters and options, so choosing a program, options, and parameters can be a nontrivial task. No benchmark for comparing phylogenetic programs on real protein sequences was publicly available. We have developed PhyloBench, a benchmark for evaluating the quality of phylogenetic inference, and used it to test a number of popular phylogenetic programs. PhyloBench is based on natural, not simulated, protein sequences of orthologous evolutionary domains. The accuracy of an inferred tree is measured as its distance to the corresponding species tree. Several tree-to-tree distance measures were tested, and the most reliable results were obtained using the Robinson-Foulds distance. Our results confirmed recent findings that distance methods are more accurate than maximum likelihood (ML) and maximum parsimony. We tested the Bayesian program MrBayes on natural protein sequences and found that, on our datasets, it performs better than ML but worse than distance methods. Of the methods we tested, the Balanced Minimum Evolution method implemented in FastME yielded the best results on our material. Alignments and reference species trees are available at https://mouse.belozersky.msu.ru/tools/phylobench/ together with a web interface that allows semi-automatic comparison of a user's method with a number of popular programs.
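The Robinson-Foulds distance counts the bipartitions (splits) found in one tree but not the other. A minimal sketch, assuming unrooted trees written as hypothetical nested Python tuples of leaf names and a simple normalization by the total number of splits (for binary trees this equals the usual maximum of 2(n-3)); it is not PhyloBench's code.

```python
from itertools import chain

# Trees are hypothetical nested tuples of leaf-name strings, e.g. (("A", "B"), "C", ("D", "E")).
# Rooting is ignored: each internal edge is read as an unrooted split.

def leaf_set(tree):
    """All leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaf_set(child) for child in tree))

def splits(tree):
    """Non-trivial bipartitions, each stored as the side lacking a fixed reference taxon."""
    taxa = leaf_set(tree)
    ref = min(taxa)
    found = set()

    def walk(node):
        below = leaf_set(node)
        # Keep only splits with at least two taxa on each side.
        if 1 < len(below) < len(taxa) - 1:
            found.add(below if ref not in below else taxa - below)
        if not isinstance(node, str):
            for child in node:
                walk(child)

    walk(tree)
    return found

def robinson_foulds(t1, t2):
    """Raw RF distance and a version normalized by the total number of splits."""
    if leaf_set(t1) != leaf_set(t2):
        raise ValueError("trees must share the same taxa")
    s1, s2 = splits(t1), splits(t2)
    raw = len(s1 ^ s2)
    total = len(s1) + len(s2)
    return raw, (raw / total if total else 0.0)

t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(t1, t2))  # (2, 0.5)
```

Because splits are unrooted, writing `t1` with a trifurcating root, `(("A", "B"), "C", ("D", "E"))`, gives an RF distance of 0 to `t1`.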
Affiliation(s)
- Sergey Spirin
- Belozersky Institute, Lomonosov Moscow State University, Moscow, Russia
- Higher School of Economics, Moscow, Russia
- Andrey Sigorskikh
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Aleksei Efremov
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Dmitry Penzar
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Artificial Intelligence Research Institute, Moscow, Russia
- Anna Karyagina
- Belozersky Institute, Lomonosov Moscow State University, Moscow, Russia
- Gamaleya Center of Epidemiology and Microbiology, Moscow, Russia
- Institute of Agricultural Biotechnology, Moscow, Russia

7. Zhang L, Wang F, Wu J, Ye S, Xu Y, Liu Y. Fine-Scale Genetic Structure of Curculio chinensis (Coleoptera: Curculionidae) Based on Mitochondrial COI: The Role of Host Specificity and Spatial Distance. Insects 2024; 15:116. [PMID: 38392535 PMCID: PMC10888635 DOI: 10.3390/insects15020116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 01/30/2024] [Accepted: 02/04/2024] [Indexed: 02/24/2024]
Abstract
The Camellia weevil, Curculio chinensis (Chevrolat, 1878), is a dominant oligophagous pest that bores into the fruit of oil-tea Camellia. Genetic differentiation among populations on different hosts can easily occur, which hinders research on pest management. In this study, the genetic structure, genetic diversity, and phylogenetic structure of local C. chinensis populations were examined in 147 individuals from 6 localities in Jiangxi, based on 2 mitochondrial COI markers. Results indicated that the C. chinensis population in Jiangxi exhibits high haplotype diversity, especially in populations from Cam. meiocarpa plantations. Structural differentiation was observed between Haplogroup 1 (73 individuals from Ganzhou, Jian, and Pingxiang) in monoculture plantations of Cam. meiocarpa and Haplogroup 2 (75 individuals from Pingxiang and Jiujiang) in Cam. oleifera. Both haplogroups have recently undergone demographic expansion, and Haplogroup 1 has shown a higher number of effective migrants than Haplogroup 2. This suggests that C. chinensis has been spreading from Cam. meiocarpa plantations to other oil-tea Camellia species, such as Cam. oleifera. The increased cultivation of oil-tea Camellia in Jiangxi has contributed to a unique genetic structure within the C. chinensis population, which has in turn expanded the distribution of C. chinensis and increased migration between populations.
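The haplotype diversity reported above is a simple frequency statistic. The estimator is not named here, so this minimal sketch assumes Nei's unbiased form, Hd = n/(n-1) * (1 - sum of p_i^2), where n is the number of sequences and p_i is the frequency of haplotype i. The haplotype labels are hypothetical, not the study's data.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)

# Hypothetical haplotype labels for six COI sequences.
sample = ["H1", "H1", "H1", "H2", "H2", "H3"]
print(round(haplotype_diversity(sample), 3))  # 0.733
```

Values near 1 mean most sampled individuals carry distinct haplotypes, and a sample of identical sequences gives 0.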
Affiliation(s)
- Li Zhang
- Institute of Jiangxi Oil-Tea Camellia, Jiujiang University, Jiujiang 332005, China
- Fuping Wang
- Institute of Jiangxi Oil-Tea Camellia, Jiujiang University, Jiujiang 332005, China
- Jiaxi Wu
- Institute of Jiangxi Oil-Tea Camellia, Jiujiang University, Jiujiang 332005, China
- Sicheng Ye
- Institute of Jiangxi Oil-Tea Camellia, Jiujiang University, Jiujiang 332005, China
- Ye Xu
- School of Agricultural Science, Jiangxi Agricultural University, Nanchang 330045, China
- Yanan Liu
- Institute of Jiangxi Oil-Tea Camellia, Jiujiang University, Jiujiang 332005, China

8. Fraser ASC, Low KE, Tingley JP, Reintjes G, Thomas D, Brumer H, Abbott DW. SACCHARIS v2: Streamlining Prediction of Carbohydrate-Active Enzyme Specificities Within Large Datasets. Methods Mol Biol 2024; 2836:299-330. [PMID: 38995547 DOI: 10.1007/978-1-0716-4007-4_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]
Abstract
Carbohydrates are chemically and structurally diverse, composed of a wide array of monosaccharides, stereochemical linkages, substituent groups, and intermolecular associations with other biological molecules. A large repertoire of carbohydrate-active enzymes (CAZymes) and enzymatic activities are required to form, dismantle, and metabolize these complex molecules. The software SACCHARIS (Sequence Analysis and Clustering of CarboHydrate Active enzymes for Rapid Informed prediction of Specificity) provides a rapid, easy-to-use pipeline for the prediction of potential CAZyme function in new datasets. We have updated SACCHARIS to (i) simplify its installation by rewriting it in Python and packaging it for Conda; (ii) enhance its usability through a new (optional) interactive GUI; and (iii) enable semi-automated annotation of phylogenetic tree output via a new R package or the commonly used webserver iTOL. Significantly, SACCHARIS v2 has been developed with high-throughput omics in mind, with pipeline automation geared toward complex (meta)genome and (meta)transcriptome datasets to reveal the total CAZyme content ("CAZome") of an organism or community. Here, we outline the development and use of SACCHARIS v2 to discover and annotate CAZymes and provide insight into complex carbohydrate metabolisms in individual organisms and communities.
Affiliation(s)
- Alexander S C Fraser
- Michael Smith Laboratories and Department of Chemistry, University of British Columbia, Vancouver, BC, Canada
- Kristin E Low
- Agriculture and Agri-Food Canada, Lethbridge Research and Development Centre, Lethbridge, AB, Canada
- Jeffrey P Tingley
- Agriculture and Agri-Food Canada, Lethbridge Research and Development Centre, Lethbridge, AB, Canada
- Department of Chemistry and Biochemistry, University of Lethbridge, Lethbridge, AB, Canada
- Greta Reintjes
- Microbial-Carbohydrate Interactions Group, University of Bremen, Bremen, Germany
- Dallas Thomas
- Agriculture and Agri-Food Canada, Lethbridge Research and Development Centre, Lethbridge, AB, Canada
- Harry Brumer
- Michael Smith Laboratories and Department of Chemistry, University of British Columbia, Vancouver, BC, Canada
- D Wade Abbott
- Agriculture and Agri-Food Canada, Lethbridge Research and Development Centre, Lethbridge, AB, Canada
- Department of Chemistry and Biochemistry, University of Lethbridge, Lethbridge, AB, Canada

9. Field CJ, Bowerman KL, Hugenholtz P. Multiple independent losses of sporulation and peptidoglycan in the Mycoplasmatales and related orders of the class Bacilli. Microb Genom 2024; 10:001176. [PMID: 38189216 PMCID: PMC10868615 DOI: 10.1099/mgen.0.001176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 12/19/2023] [Indexed: 01/09/2024] [Open Access]
Abstract
Many peptidoglycan-deficient bacteria such as the Mycoplasmatales are known host-associated lineages, lacking the environmental resistance mechanisms and metabolic capabilities necessary for a free-living lifestyle. Several peptidoglycan-deficient and non-sporulating orders of interest are thought to be descended from Gram-positive sporulating Bacilli through reductive evolution. Here we annotate 2650 genomes belonging to the class Bacilli, according to the Genome Taxonomy Database, to predict the peptidoglycan and sporulation phenotypes of three novel orders, RFN20, RF39 and ML615J-28, known only through environmental sequence surveys. These lineages are interspersed between peptidoglycan-deficient non-sporulating orders including the Mycoplasmatales and Acholeplasmatales, and more typical Gram-positive orders such as the Erysipelotrichales and Staphylococcales. We use the extant genotypes to perform ancestral state reconstructions. The novel orders are predicted to have small genomes with minimal metabolic capabilities and to comprise a mix of peptidoglycan-deficient and/or non-sporulating species. In contrast to expectations based on cultured representatives, the order Erysipelotrichales lacks many of the genes involved in peptidoglycan and endospore formation. The reconstructed evolutionary history of these traits suggests multiple independent whole-genome reductions and loss of phenotype via intermediate transition states that continue into the present. We suggest that the evolutionary history of the reduced-genome lineages within the class Bacilli is one driven by multiple independent transitions to host-associated lifestyles, with the degree of reduction in environmental resistance and metabolic capabilities correlated with degree of host association.
Affiliation(s)
- Christian J. Field
- School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, St Lucia, QLD 4072, Australia
- Kate L. Bowerman
- School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, St Lucia, QLD 4072, Australia
- Philip Hugenholtz
- School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, St Lucia, QLD 4072, Australia

10. Steenwyk JL, Li Y, Zhou X, Shen XX, Rokas A. Incongruence in the phylogenomics era. Nat Rev Genet 2023; 24:834-850. [PMID: 37369847 PMCID: PMC11499941 DOI: 10.1038/s41576-023-00620-x] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Accepted: 05/19/2023] [Indexed: 06/29/2023]
Abstract
Genome-scale data and the development of novel statistical phylogenetic approaches have greatly aided the reconstruction of a broad sketch of the tree of life and resolved many of its branches. However, incongruence - the inference of conflicting evolutionary histories - remains pervasive in phylogenomic data, hampering our ability to reconstruct and interpret the tree of life. Biological factors, such as incomplete lineage sorting, horizontal gene transfer, hybridization, introgression, recombination and convergent molecular evolution, can lead to gene phylogenies that differ from the species tree. In addition, analytical factors, including stochastic, systematic and treatment errors, can drive incongruence. Here, we review these factors, discuss methodological advances to identify and handle incongruence, and highlight avenues for future research.
Affiliation(s)
- Jacob L Steenwyk
- Howard Hughes Medical Institute and the Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN, USA
- Yuanning Li
- Institute of Marine Science and Technology, Shandong University, Qingdao, China
- Xiaofan Zhou
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou, China
- Xing-Xing Shen
- Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN, USA
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany

11. Portik DM, Streicher JW, Wiens JJ. Frog phylogeny: A time-calibrated, species-level tree based on hundreds of loci and 5,242 species. Mol Phylogenet Evol 2023; 188:107907. [PMID: 37633542 DOI: 10.1016/j.ympev.2023.107907] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 01/15/2023] [Revised: 08/15/2023] [Accepted: 08/15/2023] [Indexed: 08/28/2023]
Abstract
Large-scale, time-calibrated phylogenies from supermatrix studies have become crucial for evolutionary and ecological studies in many groups of organisms. However, in frogs (anuran amphibians), there is a serious problem with existing supermatrix estimates. Specifically, these trees are based on a limited number of loci (15 or fewer), and the higher-level relationships estimated are discordant with recent phylogenomic estimates based on much larger numbers of loci. Here, we attempted to rectify this problem by generating an expanded supermatrix and combining this with data from phylogenomic studies. To assist in aligning ribosomal sequences for this supermatrix, we developed a new program (TaxonomyAlign) to help perform taxonomy-guided alignments. The new combined matrix contained 5,242 anuran species with data from 307 markers, but with 95% missing data overall. This dataset represented a 71% increase in species sampled relative to the previous largest supermatrix analysis of anurans (adding 2,175 species). Maximum-likelihood analyses generated a tree in which higher-level relationships (and estimated clade ages) were generally concordant with those from phylogenomic analyses but were more discordant with the previous largest supermatrix analysis. We found few obvious problems arising from the extensive missing data in most species. We also generated a set of 100 time-calibrated trees for use in comparative analyses. Overall, we provide an improved estimate of anuran phylogeny based on the largest number of combined taxa and markers to date. More broadly, we demonstrate the potential to combine phylogenomic and supermatrix analyses in other groups of organisms.
Affiliation(s)
- Daniel M Portik
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721 USA; California Academy of Sciences, San Francisco, CA 94118, USA
- John J Wiens
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721 USA.
12
Zhang DF, He W, Shao Z, Ahmed I, Zhang Y, Li WJ, Zhao Z. EasyCGTree: a pipeline for prokaryotic phylogenomic analysis based on core gene sets. BMC Bioinformatics 2023; 24:390. [PMID: 37838689 PMCID: PMC10576351 DOI: 10.1186/s12859-023-05527-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/06/2023] [Accepted: 10/10/2023] [Indexed: 10/16/2023]
Abstract
BACKGROUND Genome-scale phylogenetic analysis based on core gene sets is routinely used in microbiological research. However, the techniques are still not approachable for individuals with little bioinformatics experience. Here, we present EasyCGTree, a user-friendly and cross-platform pipeline to reconstruct genome-scale maximum-likelihood (ML) phylogenetic trees using supermatrix (SM) and supertree (ST) approaches. RESULTS EasyCGTree was implemented in the Perl programming language and was built using a collection of published, reputable programs. All the programs were precompiled as standalone executable files and included in the EasyCGTree package, which runs once a Perl environment is installed. Several profile hidden Markov models (HMMs) of core gene sets were prepared in advance to construct a profile HMM database (PHD) that is enclosed in the package and available for homolog searching. Customized gene sets can also be used to build profile HMMs and added to the PHD via EasyCGTree. Taking 43 genomes of the genus Paracoccus as the test data set, consensus (a variant of the typical SM), SM, and ST trees were successfully inferred via EasyCGTree, and the SM trees were compared with those inferred via the pipelines UBCG and bcgTree, using cophenetic correlation coefficients (CCC) and Robinson-Foulds distance (topological distance) as metrics. The results suggested that EasyCGTree can infer SM trees with nearly identical topology (distance < 0.1) and accuracy (CCC > 0.99) to those inferred with the two pipelines. CONCLUSIONS EasyCGTree is an all-in-one automatic pipeline from input data to phylogenomic tree with guaranteed accuracy, and is much easier to install and use than the reference pipelines. In addition, ST is conveniently implemented in EasyCGTree and can be used to explore prokaryotic evolutionary signals from a different perspective.
The EasyCGTree version 4 is freely available for Linux and Windows users at Github ( https://github.com/zdf1987/EasyCGTree4 ).
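The abstract compares trees by Robinson-Foulds (topological) distance. The sketch below shows how that metric works; it is not EasyCGTree code. Each tree is reduced to its set of nontrivial bipartitions (splits), and the distance is the number of splits found in only one of the two trees. Two points are assumptions for illustration only: trees are encoded as nested Python tuples of taxon names (a hypothetical representation, not the pipeline's input format), and the normalized variant divides by 2(n - 3), the maximum for two binary unrooted trees on n taxa. Whether EasyCGTree normalizes this way is not stated in the abstract.

```python
from itertools import chain


def leaves(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaves(child) for child in tree))


def splits(tree):
    """Collect the nontrivial bipartitions (splits) of a tree treated as unrooted.

    Each split is stored as the side that does NOT contain a fixed reference
    taxon, so rerootings of the same unrooted tree give the same split set.
    """
    taxa = leaves(tree)
    ref = min(taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = taxa - clade if ref in clade else clade
        # A split is nontrivial when both sides hold at least two taxa.
        if 2 <= len(side) <= len(taxa) - 2:
            found.add(side)
        return clade

    walk(tree)
    return found


def rf_distance(t1, t2, normalized=False):
    """Robinson-Foulds distance: the number of splits present in only one tree."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must share the same taxon set")
    d = len(splits(t1) ^ splits(t2))
    if normalized:
        n = len(leaves(t1))
        if n < 4:
            return 0.0  # fewer than 4 taxa: no nontrivial splits exist
        return d / (2 * (n - 3))  # assumed normalization: 2(n - 3)
    return d
```

For example, `((("A","B"),"C"),("D","E"))` and `((("A","C"),"B"),("D","E"))` differ by one split in each tree, so their distance is 2 (normalized 0.5). A rerooted copy of the first tree is at distance 0 from it.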
Affiliation(s)
- Dao-Feng Zhang
- Jiangsu Province Engineering Research Center for Marine Bio-resources Sustainable Utilization and College of Oceanography, Hohai University, Nanjing, 210098, China.
- Wei He
- Jiangsu Province Engineering Research Center for Marine Bio-resources Sustainable Utilization and College of Oceanography, Hohai University, Nanjing, 210098, China
- Zongze Shao
- Jiangsu Province Engineering Research Center for Marine Bio-resources Sustainable Utilization and College of Oceanography, Hohai University, Nanjing, 210098, China.
- Key Laboratory of Marine Biogenetic Resources, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, 361005, China.
- Iftikhar Ahmed
- National Agricultural Research Centre (NARC), Land Resources Research Institute (LRRI), National Culture Collection of Pakistan (NCCP), Islamabad, 45500, Pakistan
- Yuqin Zhang
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China
- Wen-Jun Li
- Jiangsu Province Engineering Research Center for Marine Bio-resources Sustainable Utilization and College of Oceanography, Hohai University, Nanjing, 210098, China
- State Key Laboratory of Biocontrol, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) and Guangdong Provincial Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275, China
- Zhe Zhao
- Jiangsu Province Engineering Research Center for Marine Bio-resources Sustainable Utilization and College of Oceanography, Hohai University, Nanjing, 210098, China
13
Togkousidis A, Kozlov OM, Haag J, Höhler D, Stamatakis A. Adaptive RAxML-NG: Accelerating Phylogenetic Inference under Maximum Likelihood using Dataset Difficulty. Mol Biol Evol 2023; 40:msad227. [PMID: 37804116 PMCID: PMC10584362 DOI: 10.1093/molbev/msad227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Revised: 09/06/2023] [Accepted: 09/26/2023] [Indexed: 10/08/2023]
Abstract
Phylogenetic inferences under the maximum likelihood criterion deploy heuristic tree search strategies to explore the vast search space. Depending on the input dataset, searches from different starting trees might all converge to a single tree topology. Often, though, distinct searches infer multiple topologies with large log-likelihood score differences or yield topologically highly distinct, yet almost equally likely, trees. Recently, Haag et al. introduced an approach to quantify, and implemented machine learning methods to predict, the dataset difficulty with respect to phylogenetic inference. Easy multiple sequence alignments (MSAs) exhibit a single likelihood peak on their likelihood surface, associated with a single tree topology to which most, if not all, independent searches rapidly converge. As difficulty increases, multiple locally optimal likelihood peaks emerge, yet from highly distinct topologies. To make use of this information, we introduce and implement an adaptive tree search heuristic in RAxML-NG, which modifies the thoroughness of the tree search strategy as a function of the predicted difficulty. Our adaptive strategy is based upon three observations. First, on easy datasets, searches converge rapidly and can hence be terminated at an earlier stage. Second, overanalyzing difficult datasets is hopeless, and thus it suffices to quickly infer only one of the numerous almost equally likely topologies to reduce overall execution time. Third, more extensive searches are justified and required on datasets with intermediate difficulty. While the likelihood surface exhibits multiple locally optimal peaks in this case, a small proportion of them is significantly better. Our experimental results for the adaptive heuristic on 9,515 empirical and 5,000 simulated datasets with varying difficulty exhibit substantial speedups, especially on easy and difficult datasets (53% of total MSAs), where we observe average speedups of more than 10×. 
Further, approximately 94% of the inferred trees using the adaptive strategy are statistically indistinguishable from the trees inferred under the standard strategy (RAxML-NG).
Affiliation(s)
- Anastasis Togkousidis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Oleksiy M Kozlov
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Julia Haag
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Dimitri Höhler
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany
- Biodiversity Computing Group, Institute of Computer Science, Foundation for Research and Technology - Hellas, GR - 711 10 Heraklion, Crete, Greece
14
Aizenbud Y, Jaffe A, Wang M, Hu A, Amsel N, Nadler B, Chang JT, Kluger Y. Spectral top-down recovery of latent tree models. Information and Inference: A Journal of the IMA 2023; 12:iaad032. [PMID: 37593361 PMCID: PMC10431953 DOI: 10.1093/imaiai/iaad032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Revised: 03/24/2023] [Accepted: 06/24/2023] [Indexed: 08/19/2023]
Abstract
Modeling the distribution of high-dimensional data by a latent tree graphical model is a prevalent approach in multiple scientific domains. A common task is to infer the underlying tree structure, given only observations of its terminal nodes. Many algorithms for tree recovery are computationally intensive, which limits their applicability to trees of moderate size. For large trees, a common approach, termed divide-and-conquer, is to recover the tree structure in two steps. First, separately recover the structure of multiple, possibly random subsets of the terminal nodes. Second, merge the resulting subtrees to form a full tree. Here, we develop spectral top-down recovery (STDR), a deterministic divide-and-conquer approach to infer large latent tree models. Unlike previous methods, STDR partitions the terminal nodes in a non-random way, based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes. We prove that under certain conditions, this partitioning is consistent with the tree structure. This, in turn, leads to a significantly simpler merging procedure of the small subtrees. We prove that STDR is statistically consistent and bound the number of samples required to accurately recover the tree with high probability. Using simulated data from several common tree models in phylogenetics, we demonstrate that STDR has a significant advantage in terms of runtime, with improved or similar accuracy.
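The partitioning step the abstract describes (splitting terminal nodes by the Fiedler vector of a Laplacian) can be illustrated with generic spectral bisection. The sketch below is not STDR itself. The abstract does not specify how STDR builds its "suitable Laplacian", so this assumes the plain unnormalized Laplacian L = D - W of a symmetric, nonnegative similarity matrix W between terminal nodes, and it omits STDR's recursion and merging. The Fiedler vector (eigenvector of the second-smallest eigenvalue of L) is approximated by power iteration on c*I - L with the constant vector projected out, which assumes a connected similarity graph, a gap between the second and third eigenvalues, and a start vector not orthogonal to the Fiedler vector.

```python
import math


def laplacian(weights):
    """Unnormalized graph Laplacian L = D - W of a symmetric similarity matrix W."""
    n = len(weights)
    lap = [[-weights[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        lap[i][i] = sum(weights[i][j] for j in range(n) if j != i)
    return lap


def fiedler_vector(weights, iters=500):
    """Approximate the eigenvector of L for its second-smallest eigenvalue.

    Power iteration on (c*I - L), where c = 2 * max degree bounds the largest
    eigenvalue of L (Gershgorin). Projecting out the constant vector, L's
    eigenvector for eigenvalue 0, leaves c - lambda_2 as the dominant eigenvalue.
    """
    lap = laplacian(weights)
    n = len(lap)
    shift = 2.0 * max(lap[i][i] for i in range(n))
    v = [float(i + 1) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the component along (1, ..., 1)
        v = [shift * v[i] - sum(lap[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def spectral_split(weights):
    """Partition terminal-node indices by the sign of the Fiedler vector."""
    v = fiedler_vector(weights)
    return ([i for i, x in enumerate(v) if x < 0],
            [i for i, x in enumerate(v) if x >= 0])
```

With six nodes whose within-group similarity is 1.0 and between-group similarity is 0.1, the split recovers the two groups {0, 1, 2} and {3, 4, 5}. Power iteration keeps the example dependency-free; a real implementation would more likely call a symmetric eigensolver.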
Affiliation(s)
- Yariv Aizenbud
- Program in Applied Mathematics, Yale University, New Haven, CT 06511, USA
- Ariel Jaffe
- Program in Applied Mathematics, Yale University, New Haven, CT 06511, USA
- Meng Wang
- Department of Pathology, Yale University, New Haven, CT 06511, USA
- Amber Hu
- Program in Applied Mathematics, Yale University, New Haven, CT 06511, USA
- Noah Amsel
- Program in Applied Mathematics, Yale University, New Haven, CT 06511, USA
- Boaz Nadler
- Department of Computer Science, Weizmann Institute of Science, Rehovot 76100, Israel
- Joseph T Chang
- Department of Statistics, Yale University, New Haven, CT 06520, USA
- Yuval Kluger
- Program in Applied Mathematics, Yale University, New Haven, CT 06511, USA
- Department of Pathology, Yale University, New Haven, CT 06511, USA
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
15
Holtz A, Baele G, Bourhy H, Zhukova A. Integrating full and partial genome sequences to decipher the global spread of canine rabies virus. Nat Commun 2023; 14:4247. [PMID: 37460566 DOI: 10.1038/s41467-023-39847-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Accepted: 06/30/2023] [Indexed: 07/20/2023]
Abstract
Despite the rapid growth in viral genome sequencing, statistical methods face challenges in handling historical viral endemic diseases with large amounts of underutilized partial sequence data. We propose a phylogenetic pipeline that harnesses both full and partial viral genome sequences to investigate historical pathogen spread between countries. Its application to rabies virus (RABV) yields precise dating and confident estimates of its geographic dispersal. By using full genomes and partial sequences, we reduce both geographic and genetic biases that often hinder studies that focus on specific genes. Our pipeline reveals an emergence of the present canine-mediated RABV between the years 1301 and 1403 and reveals regional introductions over a 700-year period. This geographic reconstruction enables us to locate episodes of human-mediated introductions of RABV and examine the role that European colonization played in its spread. Our approach enables phylogeographic analysis of large and genetically diverse data sets for many viral pathogens.
Affiliation(s)
- Andrew Holtz
- Institut Pasteur, Université Paris Cité, Lyssavirus Epidemiology and Neuropathology Unit, F-75015, Paris, France.
- Guy Baele
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Hervé Bourhy
- Institut Pasteur, Université Paris Cité, Lyssavirus Epidemiology and Neuropathology Unit, F-75015, Paris, France
- World Health Organization Collaborating Center for Reference and Research on Rabies, Institut Pasteur, Paris, France
- Anna Zhukova
- Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015, Paris, France.
16
Hu RS, Zhang X, Wei Y. ZooPathWeb: a comprehensive web resource for zoonotic pathogens. Bioinformatics Advances 2023; 3:vbad094. [PMID: 37465397 PMCID: PMC10351968 DOI: 10.1093/bioadv/vbad094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 06/15/2023] [Accepted: 07/08/2023] [Indexed: 07/20/2023]
Abstract
Motivation Zoonotic pathogens, such as viruses, bacteria, fungi and parasites, can be transmitted from animals to humans, causing a wide range of diseases that can vary from mild to life-threatening. These pathogens typically exhibit a broad host range, infecting domestic and/or wild animals, which serve as reservoirs of infection. Human infection can occur through direct contact with infected animals or their body fluids, consumption of contaminated food or water, or via bites from infected arthropod vectors. Understanding the epidemiological characteristics and population structure of zoonotic pathogens is of paramount importance for preventing and controlling the spread of zoonotic diseases. Results Here, we present ZooPathWeb, a comprehensive online resource for zoonotic pathogens. ZooPathWeb provides essential information on pathogens that are particularly relevant to public health and includes a literature collection organized by pathogen classification, such as lineage, host, country or region and publication year. Moreover, we have developed four web-based utility tools for this release: SeqNHandle, PaPhy-ML, TreeView and BLAST. These tools are specifically designed to facilitate the identification of population structure and adaptive evolution in relation to zoonotic pathogens. Availability and implementation The ZooPathWeb website is accessed via http://lab.malab.cn/~hrs/zoopathweb/. The source code for AKINND, which is used for collecting pathogen-related literature, can be found at https://github.com/RuiSiHu/AKINND. Additionally, the source code for PaPhy-ML, utilized for phylogenetic analysis, can be found at https://github.com/RuiSiHu/PaPhy-ML. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Rui-Si Hu
- To whom correspondence should be addressed.
- Xin Zhang
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China
- Yanming Wei
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China
- School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710071, China
17
Kumar S, Tao Q, Lamarca AP, Tamura K. Computational Reproducibility of Molecular Phylogenies. Mol Biol Evol 2023; 40:msad165. [PMID: 37467477 PMCID: PMC10370456 DOI: 10.1093/molbev/msad165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 07/11/2023] [Accepted: 07/12/2023] [Indexed: 07/21/2023]
Abstract
Repeated runs of the same program can generate different molecular phylogenies from identical data sets under the same analytical conditions. This lack of reproducibility of inferred phylogenies casts a long shadow on downstream research employing these phylogenies in areas such as comparative genomics, systematics, and functional biology. We have assessed the relative accuracies and log-likelihoods of alternative phylogenies generated for computer-simulated and empirical data sets. Our findings indicate that these alternative phylogenies reconstruct evolutionary relationships with comparable accuracy. They also have similar log-likelihoods that are not inferior to the log-likelihoods of the true tree. We determined that the direct relationship between irreproducibility and inaccuracy is due to their common dependence on the amount of phylogenetic information in the data. While computational reproducibility can be enhanced through more extensive heuristic searches for the maximum likelihood tree, this does not lead to higher accuracy. We conclude that computational irreproducibility plays a minor role in molecular phylogenetics.
Affiliation(s)
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Department of Biology, Temple University, Philadelphia, PA, USA
- Qiqing Tao
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Department of Biology, Temple University, Philadelphia, PA, USA
- Alessandra P Lamarca
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Department of Biology, Temple University, Philadelphia, PA, USA
- Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Koichiro Tamura
- Research Center for Genomics and Bioinformatics, Tokyo Metropolitan University, Hachioji, Tokyo, Japan
- Department of Biological Sciences, Tokyo Metropolitan University, Hachioji, Tokyo, Japan
18
Jacques F, Bolivar P, Pietras K, Hammarlund EU. Roadmap to the study of gene and protein phylogeny and evolution-A practical guide. PLoS One 2023; 18:e0279597. [PMID: 36827278 PMCID: PMC9955684 DOI: 10.1371/journal.pone.0279597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2022] [Accepted: 12/12/2022] [Indexed: 02/25/2023]
Abstract
Developments in sequencing technologies and the sequencing of an ever-increasing number of genomes have revolutionised studies of biodiversity and organismal evolution. This accumulation of data has been paralleled by the creation of numerous public biological databases through which the scientific community can mine the sequences and annotations of genomes, transcriptomes, and proteomes of multiple species. However, finding the appropriate databases and bioinformatic tools for a given question can be challenging. Here, we present a compilation of DNA and protein databases, as well as bioinformatic tools for phylogenetic reconstruction and a wide range of studies on molecular evolution. We provide a protocol for information extraction from biological databases and simple phylogenetic reconstruction using probabilistic and distance methods, facilitating the study of biodiversity and evolution at the molecular level for the broad scientific community.
Affiliation(s)
- Florian Jacques
- Lund University Cancer Centre, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Lund Stem Cell Center, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Paulina Bolivar
- Lund University Cancer Centre, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Kristian Pietras
- Lund University Cancer Centre, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Emma U. Hammarlund
- Lund University Cancer Centre, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Lund Stem Cell Center, Department of Laboratory Medicine, Lund University, Lund, Sweden
19
Ortiz J, Bobkov YV, DeBiasse MB, Mitchell DG, Edgar A, Martindale MQ, Moss AG, Babonis LS, Ryan JF. Independent Innexin Radiation Shaped Signaling in Ctenophores. Mol Biol Evol 2023; 40:msad025. [PMID: 36740225 PMCID: PMC9949713 DOI: 10.1093/molbev/msad025] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/10/2022] [Revised: 12/30/2022] [Accepted: 01/25/2023] [Indexed: 02/07/2023]
Abstract
Innexins facilitate cell-cell communication by forming gap junctions or nonjunctional hemichannels, which play important roles in metabolic, chemical, ionic, and electrical coupling. The lack of knowledge regarding the evolution and role of these channels in ctenophores (comb jellies), the likely sister group to the rest of animals, represents a substantial gap in our understanding of the evolution of intercellular communication in animals. Here, we identify and phylogenetically characterize the complete set of innexins of four ctenophores: Mnemiopsis leidyi, Hormiphora californensis, Pleurobrachia bachei, and Beroe ovata. Our phylogenetic analyses suggest that ctenophore innexins diversified independently from those of other animals and were established early in the emergence of ctenophores. We identified a four-innexin genomic cluster, which was present in the last common ancestor of these four species and has been largely maintained in these lineages. Evidence from correlated spatial and temporal gene expression of the M. leidyi innexin cluster suggests that this cluster has been maintained due to constraints related to gene regulation. We describe the basic electrophysiological properties of putative ctenophore hemichannels from muscle cells using intracellular recording techniques, showing substantial overlap with the properties of bilaterian innexin channels. Together, our results suggest that the last common ancestor of animals had gap junctional channels also capable of forming functional innexin hemichannels, and that innexin genes have independently evolved in major lineages throughout Metazoa.
Affiliation(s)
- Melissa B DeBiasse
- Whitney Laboratory for Marine Bioscience, University of Florida, St Augustine, FL, USA
- School of Natural Sciences, University of California Merced, Merced, CA, USA
- Dorothy G Mitchell
- Whitney Laboratory for Marine Bioscience, University of Florida, St Augustine, FL, USA
- Department of Biology, University of Florida, Gainesville, FL, USA
- Allison Edgar
- Whitney Laboratory for Marine Bioscience, University of Florida, St Augustine, FL, USA
- Mark Q Martindale
- Whitney Laboratory for Marine Bioscience, University of Florida, St Augustine, FL, USA
- Department of Biology, University of Florida, Gainesville, FL, USA
- Anthony G Moss
- Biological Sciences Department, Auburn University, Auburn, AL, USA
- Leslie S Babonis
- Whitney Laboratory for Marine Bioscience, University of Florida, St Augustine, FL, USA
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, USA
20
Li J, Fu C, Ai Q, Xie S, Huang C, Zhao M, Fu J, Wu H. Whole-genome resequencing reveals complex effects of geographical-palaeoclimatic interactions on diversification of moustache toads in East Asia. Mol Ecol 2023; 32:644-659. [PMID: 36380736 DOI: 10.1111/mec.16781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2022] [Revised: 11/10/2022] [Accepted: 11/11/2022] [Indexed: 11/17/2022]
Abstract
Geographical features and palaeoclimatic fluctuations are two classical evolutionary forces that shape genetic diversification within species. Fine-grained analysis of the mechanisms involved through population demographic processes, however, remains limited. Taking advantage of two recently published reference genomes, we resequenced the genomes and examined the evolutionary history of the moustache toads, a group endemic to East Asia, where complex topography and fluctuating palaeoclimate are known to have had profound impacts on organisms. Moustache toads probably originated in southeast Yunnan, China, and diversified towards northwestern Yunnan, as well as central and eastern China. Further exploration based on three widespread species (Leptobrachium ailaonicum, L. boringii and L. liui) using demographic modelling and species distribution models revealed that mountains and river valleys in East Asia not only functioned as geographical barriers, but also provided dispersal corridors and facilitated continuous migration or post-glacial secondary contact among moustache toad populations. Furthermore, periodic oscillation of effective population sizes accompanying fluctuations of historical temperature, and population contraction at the Last Glacial Maximum, support the widespread impact of climatic changes of the Pleistocene on species diversification in East Asia. This impact was moderate for populations of L. ailaonicum and L. boringii in the southwestern mountains but severe for populations of L. liui in the eastern lowland regions of continental East Asia, as supported by different degrees of change in their effective population sizes.
Our findings reveal mechanisms underlying genetic diversification among moustache toads, and highlight the power of genomic data and demographic modelling for examining complex historical population-level processes and for understanding how geographical and palaeoclimatic factors interactively shape current intraspecific diversity.
Affiliation(s)
- Jun Li
- Institute of Evolution and Ecology, International Research Centre of Ecology and Environment, School of Life Sciences, Central China Normal University, Wuhan, Hubei, China
- Chao Fu
- Institute of Evolution and Ecology, International Research Centre of Ecology and Environment, School of Life Sciences, Central China Normal University, Wuhan, Hubei, China
- Qingbo Ai
- Institute of Evolution and Ecology, International Research Centre of Ecology and Environment, School of Life Sciences, Central China Normal University, Wuhan, Hubei, China
- Siyu Xie
- Institute of Evolution and Ecology, International Research Centre of Ecology and Environment, School of Life Sciences, Central China Normal University, Wuhan, Hubei, China
- Chunhua Huang
- Institute of Evolution and Ecology, International Research Centre of Ecology and Environment, School of Life Sciences, Central China Normal University, Wuhan, Hubei, China
- Mian Zhao
- Institute of Evolution and Ecology, International Research Centre of Ecology and Environment, School of Life Sciences, Central China Normal University, Wuhan, Hubei, China
- Jinzhong Fu
- Department of Integrative Biology, University of Guelph, Guelph, Ontario, Canada
- Hua Wu
- Institute of Evolution and Ecology, International Research Centre of Ecology and Environment, School of Life Sciences, Central China Normal University, Wuhan, Hubei, China
21
Xiang C, Gao F, Jakovlić I, Lei H, Hu Y, Zhang H, Zou H, Wang G, Zhang D. Using PhyloSuite for molecular phylogeny and tree-based analyses. iMeta 2023; 2:e87. [PMID: 38868339 PMCID: PMC10989932 DOI: 10.1002/imt2.87] [Citation(s) in RCA: 77] [Impact Index Per Article: 77.0] [Received: 11/18/2022] [Revised: 01/04/2023] [Accepted: 01/15/2023] [Indexed: 06/14/2024]
Abstract
Phylogenetic analysis has entered the genomics (multilocus) era. For less experienced researchers, mastering the large number of software programs required for a multilocus-based phylogenetic reconstruction can be daunting and time-consuming. PhyloSuite, a software package with a user-friendly GUI, was designed to make this process more accessible by integrating multiple software programs needed for multilocus and single-gene phylogenies and by streamlining the whole process. In this protocol, we explain how to conduct each step of the phylogenetic pipeline and tree-based analyses in PhyloSuite. We also present a new version of PhyloSuite (v1.2.3), in which we fixed some bugs, made some optimizations, and introduced new functions, including a number of tree-based analyses such as signal-to-noise calculation, saturation analysis, and spurious species identification. The step-by-step protocol includes background information (i.e., what the step does), reasons (i.e., why the step is needed), and operations (i.e., how to do it). This protocol will help researchers get started with multilocus phylogenetic analysis, especially those interested in conducting organelle-based analyses.
Affiliation(s)
- Chuan‐Yu Xiang
- State Key Laboratory of Grassland Agro‐Ecosystems, and College of Ecology, Lanzhou University, Lanzhou, China
- Fangluan Gao
- Institute of Plant Virology, Fujian Agriculture and Forestry University, Fuzhou, China
- Ivan Jakovlić
- State Key Laboratory of Grassland Agro‐Ecosystems, and College of Ecology, Lanzhou University, Lanzhou, China
- Hong‐Peng Lei
- State Key Laboratory of Grassland Agro‐Ecosystems, and College of Ecology, Lanzhou University, Lanzhou, China
- Ye Hu
- State Key Laboratory of Grassland Agro‐Ecosystems, and College of Ecology, Lanzhou University, Lanzhou, China
- Hong Zhang
- State Key Laboratory of Grassland Agro‐Ecosystems, and College of Ecology, Lanzhou University, Lanzhou, China
- Hong Zou
- Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture, and State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Gui‐Tang Wang
- Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture, and State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Dong Zhang
- State Key Laboratory of Grassland Agro‐Ecosystems, and College of Ecology, Lanzhou University, Lanzhou, China
22
Palmieri L, Giribet G, Sharma PP. Too early for the ferry: The biogeographic history of the Assamiidae of southeast Asia (Chelicerata: Opiliones, Laniatores). Mol Phylogenet Evol 2023; 178:107647. [PMID: 36273758 DOI: 10.1016/j.ympev.2022.107647] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/02/2022] [Revised: 08/30/2022] [Accepted: 10/17/2022] [Indexed: 11/21/2022]
Abstract
Opiliones (harvestmen) have come to be regarded as an abundant source of model groups for the study of historical biogeography, due to their ancient age, poor dispersal capability, and high fidelity to biogeographic terranes. One of the least understood harvestman groups is the Paleotropical Assamiidae, one of the more diverse families of Opiliones. Due to a labyrinthine taxonomy, poorly established generic and subfamilial boundaries, and the lack of taxonomic keys for the group, few efforts have been undertaken to decipher relationships within this arachnid lineage. Neither the monophyly of the family nor its exact placement in the harvestman phylogeny has been established. Here, we assessed the internal phylogeny of Assamiidae using a ten-locus Sanger dataset, sampling key lineages putatively ascribed to this family for five of the ten markers. Our analyses recovered Assamiidae as a monophyletic group, in a clade with the primarily Afrotropical Pyramidopidae and the southeast Asian Beloniscidae. Internal relationships of assamiids disfavored the systematic validity of subfamilies, with biogeography reflecting phylogenetic structure much better than the existing higher-level taxonomy does. To assess whether the Asian assamiids came to occupy Indo-Pacific terranes via rafting on the Indian subcontinent, we performed divergence dating to infer the age of the family. Our results show that Indo-Pacific clades are ancient, originating well before the Cretaceous, and therefore predate a vicariant mechanism commonly encountered for Paleotropical taxa.
Affiliation(s)
- Luciano Palmieri
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI 53711, USA.
- Gonzalo Giribet
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Prashant P Sharma
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI 53711, USA.
23
Li Y, Gao H, Zhang H, Yu R, Feng F, Tang J, Li B. Characterization and expression profiling of G protein-coupled receptors (GPCRs) in Spodoptera litura (Lepidoptera: Noctuidae). Comp Biochem Physiol Part D Genomics Proteomics 2022; 44:101018. [PMID: 35994891 DOI: 10.1016/j.cbd.2022.101018] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/13/2022] [Revised: 08/05/2022] [Accepted: 08/05/2022] [Indexed: 01/27/2023]
Abstract
Spodoptera litura is a highly destructive omnivorous pest that causes serious damage to various crops. G protein-coupled receptors (GPCRs) mediate dozens of physiological processes, including reproduction, development, life span, and behavior, but information on these receptors in S. litura has been lacking. Here, we systematically identified 122 GPCRs in S. litura and assayed their expression patterns in different tissues. Comparison of the identified GPCRs with homologous genes of other insects shows that subfamily A2 (biogenic amine receptors) and subfamily A3 (neuropeptide and protein hormone receptors) of S. litura have expanded to a certain extent, which may be related to the omnivorous nature and drought resistance of S. litura. Furthermore, the large Methuselah (Mth)/Methuselah-like (Mthl) subfamily of S. litura may be involved in many physiological functions, such as longevity and stress response. Apart from duplicated receptors, the loss of the parathyroid hormone receptor (PTHR) and the bride of sevenless (Boss) receptor in lepidopteran insects may imply a new pattern of wing formation and energy metabolism in this group. In addition, the high expression levels of GPCRs in different tissues reflect the functional diversity of GPCR-mediated regulation. This systematic identification and initial characterization of GPCRs in S. litura provide a basis for further studies to reveal the functions of these receptors in regulating physiology and behavior.
Affiliation(s)
- Yanxiao Li
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
- Han Gao
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
- Hui Zhang
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
- Runnan Yu
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
- Fan Feng
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
- Jing Tang
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
- Bin Li
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210023, China.
24
Koutsovoulos GD, Granjeon Noriot S, Bailly-Bechet M, Danchin EGJ, Rancurel C. AvP: A software package for automatic phylogenetic detection of candidate horizontal gene transfers. PLoS Comput Biol 2022; 18:e1010686. [PMID: 36350852 PMCID: PMC9678320 DOI: 10.1371/journal.pcbi.1010686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Revised: 11/21/2022] [Accepted: 10/26/2022] [Indexed: 11/10/2022]
Abstract
Horizontal gene transfer (HGT) is the transfer of genes between species outside of vertical transmission from parent to offspring. Owing to their impact on the genomes and biology of various species, HGTs have gained broader attention, but high-throughput methods to identify them robustly are lacking. One rapid method to identify HGT candidates is to calculate the difference in similarity between the most similar gene in closely related species and the most similar gene in distantly related species. Although similarity metrics combined with taxonomic information can rapidly detect putative HGTs, these methods are hampered by false positives that are difficult to track. Furthermore, they provide no information on the evolutionary trajectory or on events such as duplications. Hence, phylogenetic analysis is necessary to confirm HGT candidates and provide a more comprehensive view of their origin and evolutionary history. However, phylogenetic reconstruction requires several time-consuming manual steps: retrieving the homologous sequences, producing a multiple alignment, constructing the phylogeny, and analyzing the topology to assess whether it supports the HGT hypothesis. Here, we present AvP, which automatically performs all these steps and detects candidate HGTs within a phylogenetic framework.
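The similarity-difference screen mentioned above is often expressed as an "alien index". The sketch below assumes the widely used E-value formulation AI = ln(E_in + 1e-200) - ln(E_out + 1e-200), where E_in and E_out are the best BLAST E-values among closely related (ingroup) and distantly related (outgroup) species. It illustrates the general metric only and is not AvP's code; the function name, the (group, E-value) input format, and the treatment of a group with no hits are hypothetical choices.

```python
import math
from typing import Iterable, Tuple

# Added to E-values before taking logs so that hits reported as E = 0 stay finite.
PSEUDOCOUNT = 1e-200


def alien_index(hits: Iterable[Tuple[str, float]]) -> float:
    """Alien-index-style score from (group, e_value) hit pairs (hypothetical helper).

    group is "ingroup" (closely related species) or "outgroup" (distantly
    related species). Positive values mean the best outgroup hit is more
    similar (lower E-value) than the best ingroup hit, i.e. an HGT candidate.
    """
    # Assumption: a group with no hits is treated as having a best E-value of 1.0.
    best = {"ingroup": 1.0, "outgroup": 1.0}
    for group, e_value in hits:
        if group in best:  # hits from unrecognized groups are ignored
            best[group] = min(best[group], e_value)
    return (math.log(best["ingroup"] + PSEUDOCOUNT)
            - math.log(best["outgroup"] + PSEUDOCOUNT))


if __name__ == "__main__":
    hits = [("ingroup", 1e-5), ("ingroup", 1e-3),
            ("outgroup", 1e-50), ("outgroup", 1e-40)]
    print(round(alien_index(hits), 1))
```

As the abstract notes, a high score from such a screen is only a candidate signal; phylogenetic analysis is still needed to confirm the transfer.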
Affiliation(s)
- Georgios D. Koutsovoulos
- Institut Sophia Agrobiotech, Université Côte d’Azur, INRAE, CNRS, Sophia Antipolis, France
- Solène Granjeon Noriot
- Institut Sophia Agrobiotech, Université Côte d’Azur, INRAE, CNRS, Sophia Antipolis, France
- Marc Bailly-Bechet
- Institut Sophia Agrobiotech, Université Côte d’Azur, INRAE, CNRS, Sophia Antipolis, France
- Etienne G. J. Danchin
- Institut Sophia Agrobiotech, Université Côte d’Azur, INRAE, CNRS, Sophia Antipolis, France
- Corinne Rancurel
- Institut Sophia Agrobiotech, Université Côte d’Azur, INRAE, CNRS, Sophia Antipolis, France
25
Steenwyk JL, Goltz DC, Buida TJ, Li Y, Shen XX, Rokas A. OrthoSNAP: A tree splitting and pruning algorithm for retrieving single-copy orthologs from gene family trees. PLoS Biol 2022; 20:e3001827. [PMID: 36228036 PMCID: PMC9595520 DOI: 10.1371/journal.pbio.3001827] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/04/2021] [Revised: 10/25/2022] [Accepted: 09/13/2022] [Indexed: 11/19/2022]
Abstract
Molecular evolution studies, such as phylogenomic studies and genome-wide surveys of selection, often rely on gene families of single-copy orthologs (SC-OGs). Large gene families with multiple homologs in 1 or more species (a phenomenon observed among several important families of genes, such as transporters and transcription factors) are often ignored because identifying and retrieving the SC-OGs nested within them is challenging. To address this issue and increase the number of markers used in molecular evolution studies, we developed OrthoSNAP, a software tool that uses a phylogenetic framework to simultaneously split gene families into SC-OGs and prune species-specific inparalogs. We refer to SC-OGs identified by OrthoSNAP as SNAP-OGs because they are identified using a splitting and pruning procedure analogous to snapping branches on a tree. From 415,129 orthologous groups of genes inferred across 7 eukaryotic phylogenomic datasets, we identified 9,821 SC-OGs; using OrthoSNAP on the remaining 405,308 orthologous groups of genes, we identified an additional 10,704 SNAP-OGs. Comparison of SNAP-OGs and SC-OGs revealed that their phylogenetic information content was similar, even in complex datasets that contain a whole-genome duplication, complex patterns of duplication and loss, transcriptome data where each gene typically has multiple transcripts, and contentious branches in the tree of life. OrthoSNAP is useful for increasing the number of markers used in molecular evolution data matrices, a critical step for robustly inferring and exploring the tree of life.
Affiliation(s)
- Jacob L. Steenwyk
- Vanderbilt University, Department of Biological Sciences, Nashville, Tennessee, United States of America
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, Tennessee, United States of America
- Dayna C. Goltz
- Independent Researcher, Nashville, Tennessee, United States of America
- Thomas J. Buida
- Independent Researcher, Nashville, Tennessee, United States of America
- Yuanning Li
- Vanderbilt University, Department of Biological Sciences, Nashville, Tennessee, United States of America
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, Tennessee, United States of America
- Institute of Marine Science and Technology, Shandong University, Qingdao, China
- Xing-Xing Shen
- Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Antonis Rokas
- Vanderbilt University, Department of Biological Sciences, Nashville, Tennessee, United States of America
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, Tennessee, United States of America
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
26
Nitta JH, Schuettpelz E, Ramírez-Barahona S, Iwasaki W. An open and continuously updated fern tree of life. Front Plant Sci 2022; 13:909768. [PMID: 36092417 PMCID: PMC9449725 DOI: 10.3389/fpls.2022.909768] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/31/2022] [Accepted: 07/12/2022] [Indexed: 05/31/2023]
Abstract
Ferns, with about 12,000 species, are the second most diverse lineage of vascular plants after angiosperms. They have been the subject of numerous molecular phylogenetic studies, resulting in the publication of trees for every major clade and DNA sequences from nearly half of all species. Global fern phylogenies have been published periodically, but as molecular systematics research continues at a rapid pace, these quickly become outdated. Here, we develop a mostly automated, reproducible, open pipeline to generate a continuously updated fern tree of life (FTOL) from DNA sequence data available in GenBank. Our tailored sampling strategy combines whole plastomes (few taxa, many loci) with commonly sequenced plastid regions (many taxa, few loci) to obtain a global, species-level fern phylogeny with high resolution along the backbone and maximal sampling across the tips. We use a curated reference taxonomy to resolve synonyms in general compliance with the community-driven Pteridophyte Phylogeny Group I classification. The current FTOL includes 5,582 species, an increase of ca. 40% relative to the most recently published global fern phylogeny. Using an updated and expanded list of 51 fern fossil constraints, we find estimated ages for most families and deeper clades to be considerably older than those of earlier studies. FTOL and its accompanying datasets, including the fossil list and taxonomic database, will be updated on a regular basis and are available via a web portal (https://fernphy.github.io) and R packages, enabling immediate access to the most up-to-date, comprehensively sampled fern phylogeny. FTOL will be useful for anyone studying this important group of plants over a wide range of taxonomic scales, from smaller clades to the entire tree. We anticipate FTOL will be particularly relevant for macroecological studies at regional to global scales and will inform future taxonomic systems with the most recent hypothesis of fern phylogeny.
Affiliation(s)
- Joel H. Nitta
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
- Department of Botany, National Museum of Natural History, Smithsonian Institution, Washington, DC, United States
- Eric Schuettpelz
- Department of Botany, National Museum of Natural History, Smithsonian Institution, Washington, DC, United States
- Santiago Ramírez-Barahona
- Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Wataru Iwasaki
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Atmosphere and Ocean Research Institute, The University of Tokyo, Chiba, Japan
- Institute for Quantitative Biosciences, The University of Tokyo, Tokyo, Japan
- Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo, Japan
27
García-Cunchillos I, Carlos Zamora J, Ryberg M, Lado C. Phylogeny and evolution of morphological structures in a highly diverse lineage of fruiting-body-forming amoebae, order Trichiales (Myxomycetes, Amoebozoa). Mol Phylogenet Evol 2022; 177:107609. [PMID: 35963588 DOI: 10.1016/j.ympev.2022.107609] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/29/2022] [Revised: 06/14/2022] [Accepted: 08/05/2022] [Indexed: 11/24/2022]
Abstract
Early phylogenetic studies refuted most previous assumptions concerning the evolution of morphological traits in the fruiting bodies of the order Trichiales and did not detect discernible evolutionary patterns, yet they were based on a limited number of species. We infer a new Trichiales phylogeny based on three independently inherited genetic regions (nuclear and mitochondrial), with taxonomic sampling that encompasses the order's broad diversity. In addition, we study the evolutionary history of some key morphological characters. According to the new phylogeny, most fruiting-body traits used in Trichiales systematics do not represent exclusive synapomorphies or autapomorphies for most monophyletic groups. Instead, the features derived from the peridium, stalk, capillitium, and spores showed intricate evolutionary patterns, and character state transitions occurred within rather than between clades. Thus, other evolutionary scenarios should be considered instead of assuming the homology of some characters. Based on these results, we propose a new classification of Trichiales, including the creation of a new genus, Gulielmina, the resurrection of the family Dictydiaethaliaceae and the genus Ophiotheca, and the proposal of 13 new combinations for species of the genera Arcyria (1), Hemitrichia (2), Ophiotheca (2), Oligonema (4), Gulielmina (3), and Perichaena (1).
Affiliation(s)
- Juan Carlos Zamora
- Conservatoire et Jardin Botaniques de la Ville de Genève, Chemin de l'Impératrice 1, 1292, Chambésy, Switzerland; Museum of Evolution, Uppsala University, Norbyvägen 16, Uppsala 752 36, Sweden
- Martin Ryberg
- Department of Organismal Biology, Systematic Biology, Uppsala University, Norbyvägen 18D, Uppsala 752 36, Sweden
- Carlos Lado
- Real Jardín Botánico, CSIC, Plaza de Murillo 2, 28014 Madrid, Spain
28
Gut virome profiling identifies a widespread bacteriophage family associated with metabolic syndrome. Nat Commun 2022; 13:3594. [PMID: 35739117 PMCID: PMC9226167 DOI: 10.1038/s41467-022-31390-5] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 08/10/2021] [Accepted: 06/14/2022] [Indexed: 11/09/2022]
Abstract
There is significant interest in altering the course of cardiometabolic disease development via the gut microbiome. Nevertheless, the highly abundant phage members of this complex gut ecosystem, which affect gut bacteria, remain understudied. Here, we show gut virome changes associated with metabolic syndrome (MetS), a highly prevalent clinical condition preceding cardiometabolic disease, in 196 participants by combined sequencing of bulk whole-genome and virus-like particle communities. MetS gut viromes exhibit decreased richness and diversity. They are enriched in phages infecting Streptococcaceae and Bacteroidaceae and depleted in those infecting Bifidobacteriaceae. Differential abundance analysis identifies eighteen viral clusters (VCs) as significantly associated with either MetS or healthy viromes. Among these is a MetS-associated Roseburia VC that is related to healthy control-associated Faecalibacterium and Oscillibacter VCs. Further analysis of these VCs revealed Candidatus Heliusviridae, a highly widespread gut phage lineage found in 90% or more of participants. The identification of the temperate Ca. Heliusviridae provides a starting point for studies of phage effects on gut bacteria and the role that this plays in MetS.
29
Czech L, Stamatakis A, Dunthorn M, Barbera P. Metagenomic Analysis Using Phylogenetic Placement-A Review of the First Decade. Front Bioinform 2022; 2:871393. [PMID: 36304302 PMCID: PMC9580882 DOI: 10.3389/fbinf.2022.871393] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/08/2022] [Accepted: 04/11/2022] [Indexed: 12/20/2022]
Abstract
Phylogenetic placement refers to a family of tools and methods to analyze, visualize, and interpret the tsunami of metagenomic sequencing data generated by high-throughput sequencing. Compared to alternative (e.g., similarity-based) methods, it puts metabarcoding sequences into a phylogenetic context using a set of known reference sequences and taking evolutionary history into account. This can increase the accuracy of metagenomic surveys and eliminates the requirement for exact or close matches with existing sequence databases. Phylogenetic placement is a valuable analysis tool in itself, but it is also accompanied by a plethora of downstream tools to interpret its results. A common use case is to analyze species communities obtained from metagenomic sequencing, for example via taxonomic assignment, diversity quantification, sample comparison, and identification of correlations with environmental variables. In this review, we provide an overview of the methods developed during the first 10 years. In particular, the goals of this review are 1) to motivate the use of phylogenetic placement and illustrate some of its use cases, 2) to outline the full workflow, from raw sequences to publishable figures, including best practices, 3) to introduce the most common tools and methods and their capabilities, 4) to point out common placement pitfalls and misconceptions, and 5) to showcase typical placement-based analyses and how they can help to analyze, visualize, and interpret phylogenetic placement data.
Affiliation(s)
- Lucas Czech
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, United States
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Micah Dunthorn
- Natural History Museum, University of Oslo, Oslo, Norway
30
The phytogeography and genetic diversity of the weedy hydrophyte, Pistia stratiotes L. Biol Invasions 2022. [DOI: 10.1007/s10530-022-02798-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
31
Young C, Meng S, Moshiri N. An Evaluation of Phylogenetic Workflows in Viral Molecular Epidemiology. Viruses 2022; 14:v14040774. [PMID: 35458504 PMCID: PMC9032411 DOI: 10.3390/v14040774] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/03/2022] [Revised: 04/04/2022] [Accepted: 04/06/2022] [Indexed: 01/25/2023]
Abstract
The use of viral sequence data to inform public health intervention has become increasingly common in epidemiology. Such methods typically use multiple sequence alignments and phylogenies estimated from the sequence data. Like all estimation techniques, they are error-prone, yet the impacts of such imperfections on downstream epidemiological inferences are poorly understood. To address this, we executed multiple commonly used viral phylogenetic analysis workflows on simulated viral sequence data, modeling Human Immunodeficiency Virus (HIV), Hepatitis C Virus (HCV), and Ebolavirus, and we computed multiple measures of accuracy, motivated by transmission-clustering techniques. For multiple sequence alignment, MAFFT consistently outperformed MUSCLE and Clustal Omega in both accuracy and runtime. For phylogenetic inference, FastTree 2, IQ-TREE, RAxML-NG, and PhyML had similar topological accuracies, but branch lengths and pairwise distances were consistently most accurate in phylogenies inferred by RAxML-NG. However, FastTree 2 was the fastest, by orders of magnitude, and when the other tools were used to optimize branch lengths along a fixed FastTree 2 topology, the resulting phylogenies had accuracies indistinguishable from their original counterparts at a fraction of the runtime.
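The fixed-topology strategy described in this abstract can be sketched as a short list of command lines: infer a topology with FastTree 2, then have RAxML-NG or IQ-TREE re-optimize branch lengths on that topology without searching tree space. This sketch assumes the executables are named `FastTree`, `raxml-ng` and `iqtree2`, uses a GTR+G nucleotide model throughout, and only builds and prints the argument lists; the file names and helper functions are hypothetical, it is not the authors' pipeline, and the flags should be checked against the documentation of the installed versions.

```python
import shlex
from typing import List


def fasttree_cmd(msa: str) -> List[str]:
    """FastTree 2 on a nucleotide alignment under GTR; the tree is written to stdout."""
    return ["FastTree", "-nt", "-gtr", msa]


def raxml_ng_fixed_topology_cmd(msa: str, tree: str, prefix: str) -> List[str]:
    """RAxML-NG --evaluate: optimize branch lengths and model parameters on a given tree."""
    return ["raxml-ng", "--evaluate", "--msa", msa, "--tree", tree,
            "--model", "GTR+G", "--prefix", prefix]


def iqtree_fixed_topology_cmd(msa: str, tree: str, prefix: str) -> List[str]:
    """IQ-TREE 2 -te: keep the user tree's topology fixed, optimize branch lengths."""
    return ["iqtree2", "-s", msa, "-te", tree, "-m", "GTR+G",
            "--prefix", prefix]


if __name__ == "__main__":
    msa, tree = "alignment.fasta", "fasttree.nwk"  # hypothetical file names
    steps = [
        fasttree_cmd(msa),
        raxml_ng_fixed_topology_cmd(msa, tree, "raxml_fixed"),
        iqtree_fixed_topology_cmd(msa, tree, "iqtree_fixed"),
    ]
    for argv in steps:
        # Printed rather than executed; FastTree's stdout must be redirected to `tree`.
        print(shlex.join(argv))
```

Keeping each step as an argument list makes it straightforward to pass to `subprocess.run` once the tools are installed, while the printed form is convenient for job scripts.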
32
Zhang LJ, Wang DG, Zhang P, Wu C, Li YZ. Promiscuity Characteristics of Versatile Plant Glycosyltransferases for Natural Product Glycodiversification. ACS Synth Biol 2022; 11:812-819. [PMID: 35076210 DOI: 10.1021/acssynbio.1c00489] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
Abstract
Glycodiversification can optimize the properties of pharmaceutical compounds, and versatile glycosyltransferases (GTs) are key enzymatic tools for achieving this goal. Plant GTs in the GT1 family (GT1-pGTs) have attracted much attention owing to their promising substrate promiscuity, but previous investigations of GT1-pGTs were mainly conducted sporadically and without systematic phylogenetic comparisons. In this study, we demonstrated the phylogeny-guided characterization of highly promiscuous GT1-pGTs, drawing on the current surge of genomic information. All available GT1-pGT sequences in the database were analyzed to explore the relationship between substrate promiscuity and the phylogeny of GT1-pGTs. This systematic phylogenetic analysis directed us to choose 29 uncharacterized GT sequences from different evolutionary branches and probe their substrate promiscuity toward 10 aromatic compounds with different chemical scaffolds. We found that promiscuous plant GTs (PPGTs) active toward ≥3 substrates were widely distributed in different clades but particularly enriched in the clade containing the known promiscuous enzyme GuGT10. Ten highly promiscuous plant GTs were found to tolerate a wide spectrum (≥8) of substrates and to catalyze the formation of O-, N-, and S-glycosidic bonds. The promiscuity of these 10 PPGTs was further tested using 15 sugar donors. Finally, we characterized FiGT2, which exhibited pronounced promiscuity toward both sugar acceptors and sugar donors. Overall, this study paves the way for unearthing many more PPGTs and thus strengthening the enzymatic toolkit for the sustainable production of valuable glycosides through a synthetic biology approach.
Affiliation(s)
- Li-Juan Zhang
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, 266237 Qingdao, P. R. China
- De-Gao Wang
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, 266237 Qingdao, P. R. China
- Peng Zhang
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, 266237 Qingdao, P. R. China
- Changsheng Wu
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, 266237 Qingdao, P. R. China
- Yue-Zhong Li
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, 266237 Qingdao, P. R. China
33
Grealey J, Lannelongue L, Saw WY, Marten J, Méric G, Ruiz-Carmona S, Inouye M. The Carbon Footprint of Bioinformatics. Mol Biol Evol 2022; 39:6526403. [PMID: 35143670 PMCID: PMC8892942 DOI: 10.1093/molbev/msac034] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Indexed: 11/14/2022]
Abstract
Bioinformatic research relies on large-scale computational infrastructures, which have a nonzero carbon footprint, but so far no study has quantified the environmental costs of bioinformatic tools and commonly run analyses. In this work, we estimate the carbon footprint of bioinformatics (in kilograms of CO2 equivalent units, kgCO2e) using the freely available Green Algorithms calculator (www.green-algorithms.org, last accessed 2022). We assessed 1) bioinformatic approaches in genome-wide association studies (GWAS), RNA sequencing, genome assembly, metagenomics, phylogenetics, and molecular simulations, as well as 2) computation strategies, such as parallelization, CPU (central processing unit) versus GPU (graphics processing unit), cloud versus local computing infrastructure, and geography. In particular, we found that biobank-scale GWAS emitted substantial kgCO2e and that simple software upgrades could make it greener; for example, upgrading from BOLT-LMM v1 to v2.3 reduced the carbon footprint by 73%. Moreover, switching from the average data center to a more efficient one can reduce the carbon footprint by approximately 34%. Memory over-allocation can also be a substantial contributor to an algorithm’s greenhouse gas emissions. The use of faster processors or greater parallelization reduces running time but can lead to a greater carbon footprint. Finally, we provide guidance on how researchers can reduce power consumption and minimize kgCO2e. Overall, this work elucidates the carbon footprint of common analyses in bioinformatics and provides solutions that empower a move toward greener research.
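The footprints above were computed with the Green Algorithms calculator. A minimal sketch of that style of estimate follows, assuming the calculator's general model: energy (kWh) = runtime × (cores × per-core power × core usage + allocated memory × per-GB memory power) × PUE / 1000, and carbon = energy × grid carbon intensity. The 0.3725 W/GB memory figure and every value in the usage example are assumptions to be checked against the calculator's documentation, not numbers taken from the paper; the `Job` type and helper names are hypothetical.

```python
from dataclasses import dataclass, replace

# Assumed power draw of memory, in watts per GB allocated.
MEMORY_POWER_W_PER_GB = 0.3725


@dataclass(frozen=True)
class Job:
    runtime_h: float                   # wall-clock runtime in hours
    n_cores: int                       # number of CPU cores used
    core_power_w: float                # power draw per core in watts
    core_usage: float                  # average core usage factor, 0..1
    memory_gb: float                   # memory *allocated*, not just used
    pue: float                         # data-centre power usage effectiveness
    carbon_intensity_g_per_kwh: float  # grid carbon intensity, gCO2e per kWh


def energy_kwh(job: Job) -> float:
    """Energy drawn by the job, including data-centre overhead via PUE."""
    power_w = (job.n_cores * job.core_power_w * job.core_usage
               + job.memory_gb * MEMORY_POWER_W_PER_GB)
    return job.runtime_h * power_w * job.pue / 1000.0  # Wh -> kWh


def carbon_kg(job: Job) -> float:
    """Carbon footprint in kgCO2e."""
    return energy_kwh(job) * job.carbon_intensity_g_per_kwh / 1000.0  # g -> kg


if __name__ == "__main__":
    # Illustrative values only; none of these come from the paper.
    job = Job(runtime_h=10, n_cores=8, core_power_w=12, core_usage=1.0,
              memory_gb=64, pue=1.67, carbon_intensity_g_per_kwh=475)
    efficient = replace(job, pue=1.1)  # same job in a more efficient data centre
    print(f"{carbon_kg(job):.3f} kgCO2e vs {carbon_kg(efficient):.3f} kgCO2e")
```

Because PUE multiplies the whole power term, the estimate scales linearly with it, which is why the choice of data center matters. Memory enters through the amount allocated, so over-allocating memory raises the estimate even if that memory is never used.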
Affiliation(s)
- Jason Grealey
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; Department of Mathematics and Statistics, La Trobe University, Melbourne, Australia
- Loïc Lannelongue
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Woei-Yuh Saw
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Jonathan Marten
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Guillaume Méric
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Australia
- Sergio Ruiz-Carmona
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK; British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK; The Alan Turing Institute, London, UK
34
Li Y, Gao H, Yu R, Zhang Y, Feng F, Tang J, Li B. Identification and characterization of G protein-coupled receptors in Spodoptera frugiperda (Insecta: Lepidoptera). Gen Comp Endocrinol 2022; 317:113976. [PMID: 35016911 DOI: 10.1016/j.ygcen.2022.113976] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/18/2021] [Revised: 12/23/2021] [Accepted: 01/05/2022] [Indexed: 12/13/2022]
Abstract
Spodoptera frugiperda (Insecta: Lepidoptera) is a destructive invasive pest that feeds on various plants and causes serious damage to several economically important crops. G protein-coupled receptors (GPCRs) are cellular receptors that coordinate diverse signaling processes and are associated with many physiological processes and disease states. However, little information about GPCRs in S. frugiperda had been reported, limiting understanding of its signaling systems and in-depth studies of this pest. Here, a total of 167 GPCRs were identified in S. frugiperda. Compared with other insects, the GPCRs of S. frugiperda were significantly expanded. A large number of tandem and segmental duplication events were observed, which may be the key factor in increasing the size of the GPCR family. Specifically, these expansion events are concentrated mainly in biogenic amine receptors and in neuropeptide and protein hormone receptors, which may be involved in feeding, reproduction, life span, and tolerance in S. frugiperda. Additionally, 17 Mth/Mthl members were identified in S. frugiperda, which may follow an evolutionary pattern similar to that of the 16 Mth/Mthl members in Drosophila. The expression patterns of all GPCR genes across different developmental stages were also analyzed. Most GPCR genes are poorly expressed in S. frugiperda, and some highly expressed GPCR genes, such as Rh6 and AkhR, help S. frugiperda adapt better to its environment. This study identified all GPCRs in S. frugiperda for the first time, providing a basis for further revealing the roles of these receptors in the physiological and behavioral regulation of this pest.
Affiliation(s)
- Yanxiao Li
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
- Han Gao
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
- Runnan Yu
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
- Yonglei Zhang
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
- Fan Feng
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
- Jing Tang
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
- Bin Li
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
35
Kozlov A, Alves JM, Stamatakis A, Posada D. CellPhy: accurate and fast probabilistic inference of single-cell phylogenies from scDNA-seq data. Genome Biol 2022; 23:37. [PMID: 35081992 PMCID: PMC8790911 DOI: 10.1186/s13059-021-02583-w] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 05/06/2021] [Accepted: 12/20/2021] [Indexed: 01/15/2023]
Abstract
We introduce CellPhy, a maximum likelihood framework for inferring phylogenetic trees from somatic single-cell single-nucleotide variants. CellPhy leverages a finite-site Markov genotype model with 16 diploid states and accounts for amplification error and allelic dropout. We implement CellPhy in RAxML-NG, a widely used phylogenetic inference package that provides statistical confidence measurements and scales well to large datasets with hundreds or thousands of cells. Comprehensive simulations suggest that CellPhy is more robust to single-cell genomics errors and outperforms state-of-the-art methods under realistic scenarios, in both accuracy and speed. CellPhy is freely available at https://github.com/amkozlov/cellphy.
Affiliation(s)
- Alexey Kozlov
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany
- Joao M. Alves
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- David Posada
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
36
Azeem F, Zameer R, Rehman Rashid MA, Rasul I, Ul-Allah S, Siddique MH, Fiaz S, Raza A, Younas A, Rasool A, Ali MA, Anwar S, Siddiqui MH. Genome-wide analysis of potassium transport genes in Gossypium raimondii suggest a role of GrHAK/KUP/KT8, GrAKT2.1 and GrAKT1.1 in response to abiotic stress. Plant Physiol Biochem 2022; 170:110-122. [PMID: 34864561 DOI: 10.1016/j.plaphy.2021.11.038] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/15/2021] [Revised: 11/22/2021] [Accepted: 11/23/2021] [Indexed: 06/13/2023]
Abstract
Potassium (K+) is an important macronutrient for plants, accounting for almost 10% of a plant's dry mass. It plays a crucial role in plant growth as well as in other important processes related to metabolism and stress tolerance. Plants have a complex and well-organized potassium distribution system (channels and transporters). Cotton, the primary source of natural fiber, is an economically important crop. Soil K+ deficiency can negatively affect the yield and fiber quality of cotton. However, the potassium transport system of cotton is poorly studied. The current study identified 43 Potassium Transport System (PTS) genes in the Gossypium raimondii genome. Based on conserved domains, transmembrane domains, and motif structures, these genes were classified as K+ transporters (2 HKTs, 7 KEAs, and 16 KUP/HAK/KTs) and K+ channels (11 Shakers and 7 TPKs/KCO). Phylogenetic comparison of GrPTS genes with those of Arabidopsis thaliana, Glycine max, Oryza sativa, Medicago truncatula and Cicer arietinum revealed variations in PTS gene conservation. Evolutionary analysis predicted that most GrPTS genes were segmentally duplicated. Gene structure analysis showed that the intron/exon organization of these genes was conserved within each family. Chromosomal localization demonstrated a random distribution of PTS genes across all thirteen chromosomes except chromosome six. Many stress-responsive cis-regulatory elements were predicted in the promoter regions of GrPTS genes. RNA-seq data analysis followed by qRT-PCR validation demonstrated that PTS genes potentially work in groups against environmental factors. Moreover, a transporter gene (GrHAK/KUP/KT8) and two channel genes (GrAKT2.1 and GrAKT1.1) are important candidate genes for the plant stress response. These results provide useful information for further functional characterization of PTS genes with the aim of breeding stress-resistant cultivars.
Affiliation(s)
- Farrukh Azeem
- Department of Bioinformatics and Biotechnology, Govt. College University, Faisalabad, Pakistan
- Roshan Zameer
- Department of Bioinformatics and Biotechnology, Govt. College University, Faisalabad, Pakistan
- Ijaz Rasul
- Department of Bioinformatics and Biotechnology, Govt. College University, Faisalabad, Pakistan
- Sami Ul-Allah
- College of Agriculture, Bahauddin Zakariya University, Bahadur Sub-Campus, Layyah, Pakistan
- Sajid Fiaz
- Department of Plant Breeding and Genetics, The University of Haripur, 22620, Haripur, Pakistan
- Ali Raza
- Fujian Provincial Key Laboratory of Crop Molecular and Cell Biology, Oil Crops Research Institute, Center of Legume Crop Genetics and Systems Biology/College of Agriculture, Fujian Agriculture and Forestry University (FAFU), Fuzhou, Fujian, 350002, China
- Afifa Younas
- Department of Botany, Lahore College for Women University, Lahore, Pakistan
- Asima Rasool
- Department of Bioinformatics and Biotechnology, Govt. College University, Faisalabad, Pakistan
- Muhammad Amjad Ali
- Department of Plant Pathology, University of Agriculture, Faisalabad, Pakistan
- Sultana Anwar
- Department of Agronomy, University of Florida, Gainesville, USA
- Manzer H Siddiqui
- Department of Botany and Microbiology, College of Science, King Saud University, Riyadh, Saudi Arabia
37
Ecker N, Azouri D, Bettisworth B, Stamatakis A, Mansour Y, Mayrose I, Pupko T. OUP accepted manuscript. Bioinformatics 2022; 38:i118-i124. [PMID: 35758778 PMCID: PMC9236582 DOI: 10.1093/bioinformatics/btac252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
Abstract
Motivation: In recent years, full-genome sequences have become increasingly available, and as a result many modern phylogenetic analyses are based on very long sequences, often with over 100 000 sites. Phylogenetic reconstructions of large-scale alignments are challenging for likelihood-based phylogenetic inference programs and usually require a powerful computer cluster. Current tools for alignment trimming prior to phylogenetic analysis do not promise a significant reduction in alignment size and are claimed to have a negative effect on the accuracy of the obtained tree. Results: Here, we propose an artificial-intelligence-based approach that provides a means to select the optimal subset of sites and a formula by which one can compute the log-likelihood of the entire data based on this subset. Our approach is based on training a regularized Lasso-regression model that optimizes the log-likelihood prediction accuracy while putting a constraint on the number of sites used for the approximation. We show that computing the likelihood based on 5% of the sites already provides an accurate approximation of the tree likelihood based on the entire data. Furthermore, we show that using this Lasso-based approximation during a tree search decreased running time substantially while retaining the same tree-search performance. Availability and implementation: The code was implemented in Python version 3.8 and is available through GitHub (https://github.com/noaeker/lasso_positions_sampling). The datasets used in this paper were retrieved from Zhou et al. (2018) as described in section 3. Supplementary information: Supplementary data are available at Bioinformatics online.
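As a rough illustration of the Lasso idea summarized above, the sketch below fits a hand-rolled coordinate-descent Lasso in NumPy (not the authors' code from the linked repository) that predicts a tree's total log-likelihood from its per-site log-likelihoods and keeps only the sites that receive nonzero weights. The toy data generator, the choice of alpha as 1% of the all-zero threshold, and every name (`lasso_cd`, `site_ll`, `quality`, and so on) are hypothetical assumptions; the paper's actual features, training trees, and regularization settings may differ.

```python
import numpy as np


def soft_threshold(x, t):
    """Shrink x toward zero by t (the proximal operator of t * |x|)."""
    return np.sign(x) * max(abs(x) - t, 0.0)


def lasso_cd(X, y, alpha, n_passes=200):
    """Minimise (1/2n)||y - b0 - X b||^2 + alpha * ||b||_1 by cyclic
    coordinate descent and return the intercept b0 and the weights b."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean        # centring removes the intercept
    z = (Xc ** 2).sum(axis=0) / n          # per-column curvature
    b = np.zeros(p)
    r = yc.copy()                          # residual yc - Xc @ b
    for _ in range(n_passes):
        for j in range(p):
            if z[j] == 0.0:
                continue
            rho = Xc[:, j] @ r / n + z[j] * b[j]
            b_new = soft_threshold(rho, alpha) / z[j]
            r -= Xc[:, j] * (b_new - b[j])  # keep the residual in sync
            b[j] = b_new
    return y_mean - x_mean @ b, b


# Hypothetical toy data: rows are candidate trees, columns are per-site
# log-likelihoods that all respond to a shared "tree quality" signal.
rng = np.random.default_rng(0)
n_trees, n_sites = 150, 300
quality = rng.normal(size=n_trees)
site_mean = rng.uniform(-12.0, -6.0, size=n_sites)
site_load = rng.uniform(0.5, 1.5, size=n_sites)
site_ll = (site_mean + np.outer(quality, site_load)
           + rng.normal(scale=0.02, size=(n_trees, n_sites)))
total_ll = site_ll.sum(axis=1)             # whole-alignment log-likelihood

# Fit on 100 trees; alpha is 1% of the smallest alpha that zeroes all weights.
Xtr, ytr = site_ll[:100], total_ll[:100]
alpha_max = np.max(np.abs((Xtr - Xtr.mean(axis=0)).T @ (ytr - ytr.mean()))) / len(ytr)
b0, b = lasso_cd(Xtr, ytr, alpha=0.01 * alpha_max)

# Predict held-out totals from the retained sites only.
selected = np.flatnonzero(b)
pred = b0 + site_ll[100:, selected] @ b[selected]
corr = np.corrcoef(pred, total_ll[100:])[0, 1]
print(f"{selected.size}/{n_sites} sites kept, held-out r = {corr:.4f}")
```

In this toy setup every site tracks the same signal, so the L1 penalty concentrates the weight on a few high-loading sites; with real alignments the retained fraction would depend on how redundant the sites are. A library solver such as scikit-learn's `Lasso` would be the practical choice; the hand-rolled loop only keeps the sketch dependency-light.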
Affiliation(s)
- Noa Ecker
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Dana Azouri
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Ben Bettisworth
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany
- Yishay Mansour
- The Blavatnik School of Computer Science, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Itay Mayrose
- To whom correspondence should be addressed.
- Tal Pupko
- To whom correspondence should be addressed.
38
Robinson SL, Piel J, Sunagawa S. A roadmap for metagenomic enzyme discovery. Nat Prod Rep 2021; 38:1994-2023. [PMID: 34821235 PMCID: PMC8597712 DOI: 10.1039/d1np00006c] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Received: 01/31/2021] [Indexed: 12/13/2022]
Abstract
Covering: up to 2021. Metagenomics has yielded massive amounts of sequencing data, offering a glimpse into the biosynthetic potential of the uncultivated microbial majority. While genome-resolved information about microbial communities from nearly every environment on Earth is now available, accurately predicting biocatalytic functions directly from sequencing data remains challenging. Compared with primary metabolic pathways, enzymes involved in secondary metabolism often catalyze specialized reactions with diverse substrates, making these pathways rich resources for the discovery of new enzymology. To date, functional insights gained from studies on environmental DNA (eDNA) have largely relied on PCR- or activity-based screening of eDNA fragments cloned in fosmid or cosmid libraries. As an alternative, shotgun metagenomics holds underexplored potential for the discovery of new enzymes directly from eDNA by avoiding common biases introduced through PCR- or activity-guided functional metagenomics workflows. However, inferring new enzyme functions directly from eDNA is like searching for a 'needle in a haystack' without direct links between genotype and phenotype. The goal of this review is to provide a roadmap to navigate shotgun metagenomic sequencing data and identify new candidate biosynthetic enzymes. We cover both computational and experimental strategies to mine metagenomes and explore protein sequence space, with a spotlight on natural product biosynthesis. Specifically, we compare in silico methods for enzyme discovery, including phylogenetics, sequence similarity networks, genomic context, 3D structure-based approaches, and machine learning techniques. We also discuss various experimental strategies to test computational predictions, including heterologous expression and screening. Finally, we provide an outlook for future directions in the field, with an emphasis on meta-omics, single-cell genomics, cell-free expression systems, and sequence-independent methods.
Affiliation(s)
- Jörn Piel
- Eidgenössische Technische Hochschule (ETH), Zürich, Switzerland
39
Xue WQ, Wang TM, Huang JW, Zhang JB, He YQ, Wu ZY, Liao Y, Yuan LL, Mu J, Jia WH. A comprehensive analysis of genetic diversity of EBV reveals potential high-risk subtypes associated with nasopharyngeal carcinoma in China. Virus Evol 2021; 7:veab010. [PMID: 34567789 PMCID: PMC8458747 DOI: 10.1093/ve/veab010] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/11/2022]
Abstract
Epstein-Barr virus (EBV), a widespread oncovirus, is associated with multiple cancers, including nasopharyngeal carcinoma (NPC), gastric cancer, and diverse lymphoid malignancies. Recent studies reveal that specific EBV strains or subtypes are associated with NPC development in endemic regions. However, these NPC-specific subtypes were identified in only a portion of infected individuals, possibly owing to the limited sample sizes studied or the complicated population structure of the virus. To identify additional high-risk EBV subtypes, we conducted a comprehensive genetic analysis of 22 critical viral proteins using the largest dataset of 628 EBV genomes and 792 sequences of single target genes/proteins from GenBank. Phylogenetic, principal component, and genetic structure analyses of these viral proteins were performed across worldwide populations. In addition to the general Asia-Western/Africa geographic segregation, population structure analysis showed that a 'Chinese-unique' cluster (96.57% of isolates from China) was highly enriched in NPC patients compared with healthy individuals (89.6% vs. 44.5%, P < 0.001). The newly identified EBV subtypes, which contain four Chinese-specific NPC-associated amino acid substitutions (BALF2 V317M, BNRF1 G696R, V1222I and RPMS1 D51E), showed a robust positive association with the risk of NPC in China (odds ratio = 4.80, 20.00, 18.24 and 32.00 for 1, 2, 3 and 4 substitutions, respectively; P for trend < 0.001). Interestingly, the coincidence of positively selected sites with NPC-associated substitutions suggests that adaptive nonsynonymous mutations in critical proteins, such as BNRF1, may interact with the host immune system and contribute to the carcinogenesis of NPC. Our findings provide a comprehensive overview of EBV genetic structure in worldwide populations and offer novel clues to EBV carcinogenesis from an evolutionary perspective.
Affiliation(s)
- Wen-Qiong Xue
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-Sen University Cancer Center, Guangzhou, Guangdong 510060, China
- Tong-Min Wang
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-Sen University Cancer Center, Guangzhou, Guangdong 510060, China
- Jing-Wen Huang
- School of Public Health, Sun Yat-Sen University, Guangzhou, Guangdong 510080, China
- Jiang-Bo Zhang
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-Sen University Cancer Center, Guangzhou, Guangdong 510060, China
- Yong-Qiao He
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-Sen University Cancer Center, Guangzhou, Guangdong 510060, China
- Zi-Yi Wu
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-Sen University Cancer Center, Guangzhou, Guangdong 510060, China
- Ying Liao
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-Sen University Cancer Center, Guangzhou, Guangdong 510060, China
- Lei-Lei Yuan
- School of Public Health, Sun Yat-Sen University, Guangzhou, Guangdong 510080, China
- Jianbing Mu
- Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, NIH, Rockville 20852, MD, USA
- Wei-Hua Jia
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-Sen University Cancer Center, Guangzhou, Guangdong 510060, China
- School of Public Health, Sun Yat-Sen University, Guangzhou, Guangdong 510080, China
- Corresponding author
40
Abstract
Identifying our most distant animal relatives has emerged as one of the most challenging problems in phylogenetics. This debate has major implications for our understanding of the origin of multicellular animals and of the earliest events in animal evolution, including the origin of the nervous system. Some analyses identify sponges as our most distant animal relatives (Porifera-sister hypothesis), and others identify comb jellies (Ctenophora-sister hypothesis). These analyses vary in many respects, making it difficult to interpret previous tests of these hypotheses. To gain insight into why different studies yield different results, an important next step in the ongoing debate, we systematically test these hypotheses by synthesizing 15 previous phylogenomic studies and performing new standardized analyses under consistent conditions with additional models. We find that Ctenophora-sister is recovered across the full range of examined conditions, and Porifera-sister is recovered in some analyses under narrow conditions when most outgroups are excluded and site-heterogeneous CAT models are used. We additionally find that the number of categories in site-heterogeneous models is sufficient to explain the Porifera-sister results. Furthermore, our cross-validation analyses show that CAT models that recover Porifera-sister have hundreds of additional categories and fail to fit significantly better than site-heterogeneous models with far fewer categories. Systematic and standardized testing of diverse phylogenetic models suggests that we should be skeptical of Porifera-sister results both because they are recovered under such narrow conditions and because the models in these conditions fit the data no better than other models that recover Ctenophora-sister.
Affiliation(s)
- Yuanning Li
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Xing-Xing Shen
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- State Key Laboratory of Rice Biology and Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Benjamin Evans
- Yale Center for Research Computing, Yale University, New Haven, CT, USA
- Casey W Dunn
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
41
A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol 2021; 6:946-959. [PMID: 34155373 DOI: 10.1038/s41564-021-00918-8] [Citation(s) in RCA: 177] [Impact Index Per Article: 59.0] [Received: 03/10/2020] [Accepted: 05/10/2021] [Indexed: 02/05/2023]
Abstract
The accrual of genomic data from both cultured and uncultured microorganisms provides new opportunities to develop systematic taxonomies based on evolutionary relationships. Previously, we established a bacterial taxonomy through the Genome Taxonomy Database. Here, we propose a standardized archaeal taxonomy that is derived from a 122-concatenated-protein phylogeny that resolves polyphyletic groups and normalizes ranks based on relative evolutionary divergence. The resulting archaeal taxonomy, which forms part of the Genome Taxonomy Database, is stable across a range of phylogenetic variables, including marker gene selection, inference methods, corrections for rate heterogeneity and compositional bias, tree rooting scenarios and expansion of the genome database. Rank normalization is shown, using simulated datasets, to robustly correct for substitution rates varying up to 30-fold. Taxonomic curation follows the rules of the International Code of Nomenclature of Prokaryotes while taking into account proposals to formally recognize the rank of phylum and to use genome sequences as type material. This taxonomy is based on 2,392 archaeal genomes, 93.3% of which required one or more changes to their existing taxonomy, mainly owing to incomplete classification. We identify 16 archaeal phyla, reclassify 3 major monophyletic units from the former Euryarchaeota, and unite the Thaumarchaeota-Aigarchaeota-Crenarchaeota-Korarchaeota (TACK) superphylum into a single phylum.
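The rank normalization above rests on relative evolutionary divergence (RED). The sketch below computes RED on a rooted tree, assuming the definition used in the earlier GTDB bacterial work as we understand it: the root has RED 0, extant tips have RED 1, and an internal node n gets p + (d/u)(1 - p), where p is the parent's RED, d is the branch length from the parent to n, and u is the mean path length from the parent to the tips below n. Treat that formula as an assumption to check against the GTDB papers; the `Node` class, `assign_red` function, and toy tree are hypothetical, and the sketch stops before the actual per-rank normalization.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    length: float = 0.0                       # branch length to the parent
    children: list = field(default_factory=list)


def tip_stats(node):
    """Return (sum of path lengths from node to each descendant tip, tip count)."""
    if not node.children:
        return 0.0, 1
    total, count = 0.0, 0
    for child in node.children:
        s, c = tip_stats(child)
        total += s + child.length * c         # extend each child path by its branch
        count += c
    return total, count


def assign_red(node, parent_red=None, out=None):
    """Assign a RED value to every node by a pre-order traversal (assumed definition)."""
    if out is None:
        out = {}
    if parent_red is None:
        red = 0.0                             # the root is defined as RED 0
    elif not node.children:
        red = 1.0                             # extant tips are defined as RED 1
    else:
        s, c = tip_stats(node)
        u = node.length + s / c               # mean distance from parent to n's tips
        red = parent_red + (node.length / u) * (1.0 - parent_red) if u > 0 else parent_red
    out[node.name] = red
    for child in node.children:
        assign_red(child, red, out)
    return out


# Hypothetical toy tree: ((a1:1, a2:1)A:1, B:2)root
example = Node("root", children=[
    Node("A", 1.0, [Node("a1", 1.0), Node("a2", 1.0)]),
    Node("B", 2.0),
])
red = assign_red(example)
print(red)  # → {'root': 0.0, 'A': 0.5, 'a1': 1.0, 'a2': 1.0, 'B': 1.0}
```

Node A sits halfway, in path length, between the root and its tips, so it receives RED 0.5. Recomputing `tip_stats` at every node is quadratic in the worst case; a post-order pass that caches the sums would scale better on trees with thousands of genomes.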
42
Moreira-Filho JT, Silva AC, Dantas RF, Gomes BF, Souza Neto LR, Brandao-Neto J, Owens RJ, Furnham N, Neves BJ, Silva-Junior FP, Andrade CH. Schistosomiasis Drug Discovery in the Era of Automation and Artificial Intelligence. Front Immunol 2021; 12:642383. [PMID: 34135888 PMCID: PMC8203334 DOI: 10.3389/fimmu.2021.642383] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/15/2020] [Accepted: 04/30/2021] [Indexed: 12/20/2022]
Abstract
Schistosomiasis is a parasitic disease caused by trematode worms of the genus Schistosoma and affects over 200 million people worldwide. The control and treatment of this neglected tropical disease are based on a single drug, praziquantel, which raises concerns about the development of drug resistance. This concern, together with the lack of efficacy of praziquantel against juvenile worms, highlights the urgent need for new antischistosomal therapies. In this review, we focus on innovative approaches to the identification of antischistosomal drug candidates, including the use of automated assays, fragment-based screening, and computer-aided and artificial intelligence-based computational methods. We highlight current developments that may help optimize research outputs and lead to more effective drugs for this highly prevalent disease through a more cost-effective drug discovery process.
Affiliation(s)
- José T. Moreira-Filho
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
- Arthur C. Silva
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
- Rafael F. Dantas
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Barbara F. Gomes
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Lauro R. Souza Neto
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Jose Brandao-Neto
- Diamond Light Source Ltd., Didcot, United Kingdom
- Research Complex at Harwell, Didcot, United Kingdom
- Raymond J. Owens
- The Rosalind Franklin Institute, Harwell, United Kingdom
- Division of Structural Biology, The Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Nicholas Furnham
- Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Bruno J. Neves
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
- Floriano P. Silva-Junior
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Carolina H. Andrade
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
43
Abstract
The estimation of phylogenetic trees for individual genes or multi-locus datasets is a basic part of much biological research. To enable large trees to be computed, Disjoint Tree Mergers (DTMs) have been developed; these methods operate by dividing the input sequence dataset into disjoint sets, constructing trees on each subset, and then combining the subset trees (using auxiliary information) into a tree on the full dataset. DTMs have been used to advantage in multi-locus species tree estimation, enabling highly accurate species trees to be estimated with less computational effort than leading species tree estimation methods require. Here, we evaluate the feasibility of using DTMs to improve the scalability of maximum likelihood (ML) gene tree estimation to large numbers of input sequences. Our study shows distinct differences among the three selected ML codes (RAxML-NG, IQ-TREE 2, and FastTree 2) and shows that good DTM pipeline design can provide advantages over these ML codes on large datasets.
44
Hleap JS, Littlefair JE, Steinke D, Hebert PDN, Cristescu ME. Assessment of current taxonomic assignment strategies for metabarcoding eukaryotes. Mol Ecol Resour 2021; 21:2190-2203. [PMID: 33905615 DOI: 10.1111/1755-0998.13407] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 08/14/2020] [Revised: 03/08/2021] [Accepted: 04/19/2021] [Indexed: 01/04/2023]
Abstract
The effective use of metabarcoding in biodiversity science has brought important analytical challenges due to the need to generate accurate taxonomic assignments. The assignment of sequences to genus or species level is critical for biodiversity surveys and biomonitoring, but it is particularly challenging because researchers must select the approach that best recovers information on species composition. This study evaluates the performance and accuracy of seven methods in recovering the species composition of mock communities using COI barcode fragments. The mock communities varied in species number and specimen abundance, while upstream molecular and bioinformatic variables, including the set of COI fragments used, were held constant. We evaluated the impact of parameter optimization on the quality of the predictions. Our results indicate that BLAST top hit competes well with more complex approaches if optimized for the mock community under study. For example, the two machine learning methods that were benchmarked proved more sensitive to reference database heterogeneity and completeness than methods based on sequence similarity. The accuracy of assignments was affected by both species and specimen counts (query compositional heterogeneity), which ultimately influence the selection of appropriate software. We urge researchers to: (i) use realistic mock communities to allow optimization of parameters, regardless of the taxonomic assignment method employed; (ii) carefully choose and curate reference databases, including for completeness; and (iii) use QIIME, BLAST or LCA methods in conjunction with parameter tuning to better assign taxonomy to diverse communities, especially when information on species diversity is lacking for the area under study.
Affiliation(s)
- Jose S Hleap
- Department of Biology, McGill University, Montreal, QC, Canada
- SHARCNET, University of Guelph, Guelph, ON, Canada
- Fundacion SQUALUS, Cali, Colombia
- Joanne E Littlefair
- Department of Biology, McGill University, Montreal, QC, Canada
- Queen Mary University of London, London, UK
- Dirk Steinke
- Centre for Biodiversity Genomics & Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
- Paul D N Hebert
- Centre for Biodiversity Genomics & Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
45
Li Y, Steenwyk JL, Chang Y, Wang Y, James TY, Stajich JE, Spatafora JW, Groenewald M, Dunn CW, Hittinger CT, Shen XX, Rokas A. A genome-scale phylogeny of the kingdom Fungi. Curr Biol 2021; 31:1653-1665.e5. [PMID: 33607033 PMCID: PMC8347878 DOI: 10.1016/j.cub.2021.01.074] [Citation(s) in RCA: 126] [Impact Index Per Article: 42.0] [Received: 10/11/2020] [Revised: 12/10/2020] [Accepted: 01/21/2021] [Indexed: 12/22/2022]
Abstract
Phylogenomic studies using genome-scale amounts of data have greatly improved understanding of the tree of life. Despite the diversity, ecological significance, and biomedical and industrial importance of fungi, evolutionary relationships among several major lineages remain poorly resolved, especially those near the base of the fungal phylogeny. To examine poorly resolved relationships and assess progress toward a genome-scale phylogeny of the fungal kingdom, we compiled a phylogenomic data matrix of 290 genes from the genomes of 1,644 species that includes representatives from most major fungal lineages. We also compiled 11 data matrices by subsampling genes or taxa from the full data matrix based on filtering criteria previously shown to improve phylogenomic inference. Analyses of these 12 data matrices using concatenation- and coalescent-based approaches yielded a robust phylogeny of the fungal kingdom, in which ∼85% of internal branches were congruent across data matrices and approaches used. We found support for several historically poorly resolved relationships as well as evidence for polytomies likely stemming from episodes of ancient diversification. By examining the relative evolutionary divergence of taxonomic groups of equivalent rank, we found that fungal taxonomy is broadly aligned with both genome sequence divergence and divergence time but also identified lineages where current taxonomic circumscription does not reflect their levels of evolutionary divergence. Our results provide a robust phylogenomic framework to explore the tempo and mode of fungal evolution and offer directions for future fungal phylogenetic and taxonomic studies.
Affiliation(s)
- Yuanning Li
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Jacob L Steenwyk
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Ying Chang
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Yan Wang
- Department of Microbiology and Plant Pathology, Institute for Integrative Genome Biology, University of California, Riverside, CA 92521, USA
- Department of Biological Sciences, University of Toronto Scarborough and Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada
- Timothy Y James
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Jason E Stajich
- Department of Microbiology and Plant Pathology, Institute for Integrative Genome Biology, University of California, Riverside, CA 92521, USA
- Joseph W Spatafora
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Marizeth Groenewald
- Westerdijk Fungal Biodiversity Institute, 3584 CT, Utrecht 85167, the Netherlands
- Casey W Dunn
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06520, USA
- Chris Todd Hittinger
- Laboratory of Genetics, Center for Genomic Science Innovation, J.F. Crow Institute for the Study of Evolution, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI 53706, USA
- Xing-Xing Shen
- State Key Laboratory of Rice Biology and Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou 310058, China
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
46
Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol Biol Evol 2020; 37:1530-1534. [PMID: 32011700 PMCID: PMC7182206 DOI: 10.1093/molbev/msaa015] [Citation(s) in RCA: 5226] [Impact Index Per Article: 1742.0] [Indexed: 11/14/2022]
Abstract
IQ-TREE (http://www.iqtree.org, last accessed February 6, 2020) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches to phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight its key advantages over other software.
Affiliation(s)
- Bui Quang Minh
- Research School of Computer Science, Australian National University, Canberra, ACT, Australia; Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT, Australia
- Heiko A Schmidt
- Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna, Austria
- Olga Chernomor
- Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna, Austria
- Dominik Schrempf
- Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna, Austria; Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
- Michael D Woodhams
- Discipline of Mathematics, University of Tasmania, Hobart, TAS, Australia
- Arndt von Haeseler
- Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna, Austria; Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria
- Robert Lanfear
- Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT, Australia
- Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT, Australia
47
Wang BF, Swenson KM. A Faster Algorithm for Computing the Kernel of Maximum Agreement Subtrees. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:416-430. [PMID: 31217125 DOI: 10.1109/tcbb.2019.2922955] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
The maximum agreement subtree method determines the consensus of a collection of phylogenetic trees by identifying maximum-cardinality subsets of leaves on which all input trees agree. The trees induced by these maximum-cardinality subsets are maximum agreement subtrees (MASTs). A single MAST may be misleading, since two MASTs can share almost no leaves; nevertheless, it may be impossible to inspect all MASTs, since the number of MASTs can be exponential in the number of leaves. To overcome this drawback, Swenson et al. suggested further summarizing the information common to all MASTs by their intersection, which is called the kernel agreement subtree (KAST). The construction of the KAST is the focus of this paper. Swenson et al. had an O(kn^3 + n^4 + n^(d+1))-time algorithm for computing the KAST of k trees on n leaves, in which at least one tree has maximum degree d. In this paper, an O(kn^3 + n^d)-time algorithm is presented. We demonstrate the efficiency of our algorithm on simulated trees as well as on ribosomal RNA alignments, where trees with 13,000 taxa took only hours to process, whereas the previous algorithm did not terminate after a week of computation.
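To make the MAST and KAST definitions concrete, here is a deliberately naive Python sketch. It enumerates leaf subsets from largest to smallest and keeps every subset on which all input rooted trees induce the same topology; these are the MASTs. It then restricts the tree to the leaves shared by all MASTs, which is our reading of the KAST as "their intersection". The sketch is exponential in the number of leaves and is not the O(kn^3 + n^d) algorithm of the paper. The nested-tuple tree format and all function names are hypothetical. Two rooted trees are compared through their clusters (the leaf sets below each node), which identify a rooted leaf-labeled tree up to isomorphism, polytomies included.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple, Union

# A rooted tree is a leaf name or a tuple of subtrees (polytomies allowed).
Tree = Union[str, Tuple["Tree", ...]]
Clusters = Set[FrozenSet[str]]


def clusters(tree: Tree) -> Clusters:
    """Leaf sets below every node; they determine a rooted tree uniquely."""
    found: Clusters = set()

    def walk(t: Tree) -> FrozenSet[str]:
        if isinstance(t, str):
            below = frozenset([t])
        else:
            below = frozenset().union(*(walk(child) for child in t))
        found.add(below)
        return below

    walk(tree)
    return found


def restrict(cl: Clusters, keep: FrozenSet[str]) -> Clusters:
    """Clusters of the subtree induced on `keep` (unary nodes collapse away)."""
    return {c & keep for c in cl if c & keep}


def mast_and_kast(trees: List[Tree]):
    """Brute force: all MAST leaf sets, their common leaves, and the KAST clusters."""
    all_cl = [clusters(t) for t in trees]
    leaves = sorted(max(all_cl[0], key=len))  # the root cluster holds every leaf
    for size in range(len(leaves), 0, -1):
        masts = []
        for subset in combinations(leaves, size):
            keep = frozenset(subset)
            reference = restrict(all_cl[0], keep)
            if all(restrict(cl, keep) == reference for cl in all_cl[1:]):
                masts.append(keep)
        if masts:  # the largest size at which all trees agree
            kernel = frozenset.intersection(*masts)
            return masts, kernel, restrict(all_cl[0], kernel)
    return [], frozenset(), set()


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "b"), "d"), ("c", "e"))
masts, kernel, kast = mast_and_kast([t1, t2])
print(sorted(sorted(m) for m in masts))  # [['a', 'b', 'c'], ['a', 'b', 'd'], ['a', 'b', 'e']]
print(sorted(kernel))                    # ['a', 'b']
```

With the input trees (("a","b"),("c","d")) and (("a","c"),("b","d")), the same function returns six two-leaf MASTs and an empty kernel. This is the situation the abstract warns about, in which individual MASTs share almost no leaves.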
48
Shen XX, Steenwyk JL, Rokas A. Dissecting incongruence between concatenation- and quartet-based approaches in phylogenomic data. Syst Biol 2021; 70:997-1014. [PMID: 33616672 DOI: 10.1093/sysbio/syab011] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/08/2020] [Revised: 02/10/2021] [Accepted: 02/17/2021] [Indexed: 12/12/2022]
Abstract
Topological conflict or incongruence is widespread in phylogenomic data. Concatenation- and coalescent-based approaches often result in incongruent topologies, but the causes of this conflict can be difficult to characterize. We examined incongruence stemming from conflict between likelihood-based signal (quantified by the difference in gene-wise log likelihood score, or ΔGLS) and quartet-based topological signal (quantified by the difference in gene-wise quartet score, or ΔGQS) for every gene in three phylogenomic studies in animals, fungi, and plants, which were chosen because their concatenation-based IQ-TREE (T1) and quartet-based ASTRAL (T2) phylogenies are known to produce eight conflicting internal branches (bipartitions). By comparing the types of phylogenetic signal for all genes in these three data matrices, we found that 30%-36% of genes in each data matrix are inconsistent; that is, each of these genes has a higher log likelihood score for T1 than for T2 (i.e., ΔGLS > 0) whereas its T1 topology has a lower quartet score than its T2 topology (i.e., ΔGQS < 0), or vice versa. Comparison of inconsistent and consistent genes using a variety of metrics (e.g., evolutionary rate, gene tree topology, distribution of branch lengths, hidden paralogy, and gene tree discordance) showed that inconsistent genes are more likely to recover neither T1 nor T2 and have higher levels of gene tree discordance than consistent genes. Simulation analyses demonstrated that removal of inconsistent genes from datasets with low levels of incomplete lineage sorting (ILS) and low and medium levels of gene tree estimation error (GTEE) reduced incongruence and increased accuracy. In contrast, removal of inconsistent genes from datasets with medium and high ILS levels and high GTEE levels eliminated or extensively reduced incongruence, but the resulting congruent species phylogenies were not always topologically identical to the true species trees.
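The consistent/inconsistent split described above comes down to comparing the signs of two per-gene differences. Below is a minimal Python sketch. It assumes ΔGLS = lnL(T1) - lnL(T2) and ΔGQS = QS(T1) - QS(T2), where QS counts the gene tree's quartets that the given species tree shares. In practice the per-gene scores would come from the ML program and a quartet-scoring tool. The class, the function names, and all numbers are invented for illustration. Treating a zero difference as "undetermined" is our assumption, not a rule stated in the study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GeneScores:
    name: str
    lnl_t1: float  # gene-wise log likelihood under the concatenation tree T1
    lnl_t2: float  # gene-wise log likelihood under the quartet-based tree T2
    qs_t1: int     # quartets in the gene tree shared with T1
    qs_t2: int     # quartets in the gene tree shared with T2


def classify(g: GeneScores) -> str:
    dgls = g.lnl_t1 - g.lnl_t2  # ΔGLS > 0: likelihood favors T1
    dgqs = g.qs_t1 - g.qs_t2    # ΔGQS > 0: quartets favor T1
    if dgls == 0 or dgqs == 0:
        return "undetermined"   # assumption: ties are set aside
    return "consistent" if (dgls > 0) == (dgqs > 0) else "inconsistent"


def summarize(genes: List[GeneScores]) -> Dict[str, float]:
    """Fraction of all genes falling in each class."""
    counts: Dict[str, int] = {}
    for g in genes:
        label = classify(g)
        counts[label] = counts.get(label, 0) + 1
    return {label: n / len(genes) for label, n in counts.items()}


genes = [
    GeneScores("gene1", -100.0, -102.5, 50, 40),  # both signals favor T1
    GeneScores("gene2", -200.0, -199.0, 30, 35),  # both signals favor T2
    GeneScores("gene3", -150.0, -151.0, 20, 25),  # likelihood T1, quartets T2
    GeneScores("gene4", -80.0, -80.0, 10, 12),    # likelihood tie
]
print(summarize(genes))
# {'consistent': 0.5, 'inconsistent': 0.25, 'undetermined': 0.25}
```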
Affiliation(s)
- Xing-Xing Shen
- State Key Laboratory of Rice Biology and Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, China; Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Jacob L Steenwyk
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
49
Minh BQ, Dang CC, Vinh LS, Lanfear R. QMaker: Fast and accurate method to estimate empirical models of protein evolution. Syst Biol 2021; 70:1046-1060. [PMID: 33616668 PMCID: PMC8357343 DOI: 10.1093/sysbio/syab010] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 08/24/2020] [Revised: 12/25/2020] [Accepted: 02/10/2021] [Indexed: 11/29/2022]
Abstract
Amino acid substitution models play a crucial role in phylogenetic analyses. Maximum likelihood (ML) methods have been proposed to estimate amino acid substitution models; however, they are typically complicated and slow. In this article, we propose QMaker, a new ML method to estimate a general time-reversible Q matrix from a large protein data set consisting of multiple sequence alignments. QMaker combines an efficient ML tree search algorithm, a model selection for handling the model heterogeneity among alignments, and the consideration of rate mixture models among sites. We provide QMaker as a user-friendly function in the IQ-TREE software package (http://www.iqtree.org) supporting the use of multiple CPU cores so that biologists can easily estimate amino acid substitution models from their own protein alignments. We used QMaker to estimate new empirical general amino acid substitution models from the current Pfam database as well as five clade-specific models for mammals, birds, insects, yeasts, and plants. Our results show that the new models considerably improve the fit between model and data and in some cases influence the inference of phylogenetic tree topologies. [Amino acid replacement matrices; amino acid substitution models; maximum likelihood estimation; phylogenetic inferences.]
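QMaker's output is a general time-reversible Q matrix. The NumPy sketch below illustrates what that object is, not how QMaker estimates it. It builds a normalized reversible rate matrix from a symmetric exchangeability matrix R and stationary frequencies pi, using the standard parameterization Q_ij = R_ij * pi_j for i != j. It then checks detailed balance (pi_i Q_ij = pi_j Q_ji) and computes substitution probabilities P(t) = exp(Qt) through the symmetric matrix D^(1/2) Q D^(-1/2), with D = diag(pi). The random 20-state R and pi are placeholders, not an estimated model, and the function names are hypothetical.

```python
import numpy as np


def gtr_q(exchange: np.ndarray, freqs: np.ndarray) -> np.ndarray:
    """Normalized reversible rate matrix with Q_ij = R_ij * pi_j for i != j."""
    r = np.asarray(exchange, dtype=float)
    pi = np.asarray(freqs, dtype=float)
    pi = pi / pi.sum()
    if not np.allclose(r, r.T):
        raise ValueError("exchangeabilities must be symmetric")
    q = r * pi[np.newaxis, :]             # off-diagonal rates
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))   # each row sums to zero
    mean_rate = -np.dot(pi, np.diag(q))   # expected substitutions per unit time
    return q / mean_rate                  # rescale to one substitution per unit time


def transition_probs(q: np.ndarray, freqs: np.ndarray, t: float) -> np.ndarray:
    """P(t) = expm(Q t), computed via the symmetric matrix D^1/2 Q D^-1/2."""
    pi = np.asarray(freqs, dtype=float)
    pi = pi / pi.sum()
    s = np.sqrt(pi)
    sym = s[:, None] * q / s[None, :]     # entry (i, j) = s_i Q_ij / s_j
    sym = (sym + sym.T) / 2.0             # remove round-off asymmetry
    evals, evecs = np.linalg.eigh(sym)
    core = (evecs * np.exp(evals * t)) @ evecs.T
    return core * (s[None, :] / s[:, None])  # undo the similarity transform


# Placeholder 20-state (amino-acid-sized) model, not an estimated one
rng = np.random.default_rng(0)
n = 20
upper = np.triu(rng.uniform(0.1, 2.0, size=(n, n)), k=1)
R = upper + upper.T
pi = rng.dirichlet(np.ones(n))
Q = gtr_q(R, pi)
P = transition_probs(Q, pi, 0.5)
print(np.allclose(P.sum(axis=1), 1.0))                    # True: rows are distributions
print(np.allclose(pi[:, None] * Q, (pi[:, None] * Q).T))  # True: detailed balance
```

The eigendecomposition route relies on reversibility. For a reversible Q, the transformed matrix is symmetric, so `eigh` yields real eigenvalues and orthonormal eigenvectors. This stays numerically stable even for 20 x 20 amino acid matrices.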
Affiliation(s)
- Bui Quang Minh
- School of Computing, Australian National University, 145 Science Road, Acton, ACT 2601, Canberra, Australia
- Department of Ecology and Evolution, Research School of Biology, Australian National University, 145 Science Road, Acton, ACT 2601, Canberra, Australia
- Cuong Cao Dang
- Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi, 144 Xuan Thuy, Cau Giay, 10000 Hanoi, Vietnam (Bui Quang Minh and Cuong Cao Dang contributed equally to this article)
- Le Sy Vinh
- Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi, 144 Xuan Thuy, Cau Giay, 10000 Hanoi, Vietnam
- Correspondence to: University of Engineering and Technology, Vietnam National University, Hanoi, 144 Xuan Thuy, Cau Giay, 10000 Hanoi, Vietnam; and Department of Ecology and Evolution, Research School of Biology, Australian National University, 145 Science Road, Acton, ACT 2601, Canberra, Australia
- Robert Lanfear
- Department of Ecology and Evolution, Research School of Biology, Australian National University, 145 Science Road, Acton, ACT 2601, Canberra, Australia
50
Young JPW, Moeskjær S, Afonin A, Rahi P, Maluk M, James EK, Cavassim MIA, Rashid MHO, Aserse AA, Perry BJ, Wang ET, Velázquez E, Andronov EE, Tampakaki A, Flores Félix JD, Rivas González R, Youseif SH, Lepetit M, Boivin S, Jorrin B, Kenicer GJ, Peix Á, Hynes MF, Ramírez-Bahena MH, Gulati A, Tian CF. Defining the Rhizobium leguminosarum Species Complex. Genes (Basel) 2021; 12:111. [PMID: 33477547 PMCID: PMC7831135 DOI: 10.3390/genes12010111] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 12/11/2020] [Revised: 01/08/2021] [Accepted: 01/13/2021] [Indexed: 01/21/2023]
Abstract
Bacteria currently included in Rhizobium leguminosarum are too diverse to be considered a single species, so we can refer to this as a species complex (the Rlc). We have found 429 publicly available genome sequences that fall within the Rlc and these show that the Rlc is a distinct entity, well separated from other species in the genus. Its sister taxon is R. anhuiense. We constructed a phylogeny based on concatenated sequences of 120 universal (core) genes, and calculated pairwise average nucleotide identity (ANI) between all genomes. From these analyses, we concluded that the Rlc includes 18 distinct genospecies, plus 7 unique strains that are not placed in these genospecies. Each genospecies is separated by a distinct gap in ANI values, usually at approximately 96% ANI, implying that it is a 'natural' unit. Five of the genospecies include the type strains of named species: R. laguerreae, R. sophorae, R. ruizarguesonis, "R. indicum" and R. leguminosarum itself. The 16S ribosomal RNA sequence is remarkably diverse within the Rlc, but does not distinguish the genospecies. Partial sequences of housekeeping genes, which have frequently been used to characterize isolate collections, can mostly be assigned unambiguously to a genospecies, but alleles within a genospecies do not always form a clade, so single genes are not a reliable guide to the true phylogeny of the strains. We conclude that access to a large number of genome sequences is a powerful tool for characterizing the diversity of bacteria, and that taxonomic conclusions should be based on all available genome sequences, not just those of type strains.
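The genospecies delimitation described above rests on a gap in pairwise ANI near 96%. One simple way to turn such a threshold into groups is single-linkage clustering, which is assumed here rather than taken from the paper. Any two genomes with ANI at or above the cutoff are connected, and the connected components are reported as groups. Unconnected genomes are left as singletons, loosely analogous to the unique strains. The genome names and ANI values are invented. The sketch assumes one ANI value per pair (for example, the mean of both query directions). Single linkage can chain distinct groups together when the gap is not clean, so the clusters should be checked against the ANI distribution and the core-gene phylogeny.

```python
from typing import Dict, Iterable, List, Tuple


def ani_groups(genomes: Iterable[str],
               ani: Dict[Tuple[str, str], float],
               cutoff: float = 96.0) -> List[List[str]]:
    """Single-linkage groups: genomes joined whenever pairwise ANI >= cutoff."""
    parent = {g: g for g in genomes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), value in ani.items():
        if value >= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for g in parent:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Invented ANI percentages between five hypothetical genomes
genomes = ["g1", "g2", "g3", "g4", "g5"]
ani = {
    ("g1", "g2"): 98.5, ("g1", "g3"): 93.0, ("g2", "g3"): 92.8,
    ("g3", "g4"): 97.2, ("g3", "g5"): 93.5, ("g4", "g5"): 94.0,
}
print(ani_groups(genomes, ani))  # [['g1', 'g2'], ['g3', 'g4'], ['g5']]
```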
Affiliation(s)
- Sara Moeskjær
- Department of Molecular Biology and Genetics, Aarhus University, 8000 Aarhus, Denmark
- Alexey Afonin
- Laboratory for Genetics of Plant-Microbe Interactions, ARRIAM, Pushkin, 196608 Saint-Petersburg, Russia
- Praveen Rahi
- National Centre for Microbial Resource, National Centre for Cell Science, Pune 411007, India
- Marta Maluk
- Ecological Sciences, The James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Euan K. James
- Ecological Sciences, The James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Maria Izabel A. Cavassim
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, USA
- M. Harun-or Rashid
- Biotechnology Division, Bangladesh Institute of Nuclear Agriculture (BINA), Mymensingh 2202, Bangladesh
- Aregu Amsalu Aserse
- Ecosystems and Environment Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, FI-00014 Helsinki, Finland
- Benjamin J. Perry
- Department of Microbiology and Immunology, University of Otago, Dunedin 9016, New Zealand
- En Tao Wang
- Departamento de Microbiología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Ciudad De México 11340, Mexico
- Encarna Velázquez
- Departamento de Microbiología y Genética, Universidad de Salamanca, Instituto Hispanoluso de Investigaciones Agrarias (CIALE), Unidad Asociada Grupo de Interacción planta-microorganismo (Universidad de Salamanca-IRNASA-CSIC), 37007 Salamanca, Spain
- Evgeny E. Andronov
- Department of Microbial Monitoring, ARRIAM, Pushkin, 196608 Saint-Petersburg, Russia
- Anastasia Tampakaki
- Department of Crop Science, Agricultural University of Athens, Iera Odos 75, Votanikos, 11855 Athens, Greece
- José David Flores Félix
- CICS-UBI, Health Sciences Research Centre, University of Beira Interior, 6201-506 Covilhã, Portugal
- Raúl Rivas González
- Departamento de Microbiología y Genética, Universidad de Salamanca, Instituto Hispanoluso de Investigaciones Agrarias (CIALE), Unidad Asociada Grupo de Interacción planta-microorganismo (Universidad de Salamanca-IRNASA-CSIC), 37007 Salamanca, Spain
- Sameh H. Youseif
- Department of Microbial Genetic Resources, National Gene Bank (NGB), Agricultural Research Center (ARC), Giza 12619, Egypt
- Marc Lepetit
- Institut Sophia Agrobiotech, UMR INRAE 1355, Université Côte d’Azur, CNRS, 06903 Sophia Antipolis, France
- Stéphane Boivin
- Laboratoire des Symbioses Tropicales et Méditerranéennes, UMR INRAE-IRD-CIRAD-UM2-SupAgro, Campus International de Baillarguet, TA-A82/J, CEDEX 05, 34398 Montpellier, France
- Beatriz Jorrin
- Department of Plant Sciences, University of Oxford, Oxford OX1 3RB, UK
- Gregory J. Kenicer
- Royal Botanic Garden Edinburgh, 20A Inverleith Row, Edinburgh EH3 5LR, UK
- Álvaro Peix
- Instituto de Recursos Naturales y Agrobiología de Salamanca (IRNASA-CSIC), Unidad Asociada Grupo de Interacción Planta-Microorganismo (Universidad de Salamanca-IRNASA-CSIC), 37008 Salamanca, Spain
- Michael F. Hynes
- Department of Biological Sciences, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada
- Martha Helena Ramírez-Bahena
- Departamento de Didáctica de las Matemáticas y de las Ciencias Experimentales, Universidad de Salamanca, 37008 Salamanca, Spain
- Arvind Gulati
- Microbial Prospection, CSIR-Institute of Himalayan Bioresource Technology, Palampur (H.P.) 176 061, India
- Chang-Fu Tian
- State Key Laboratory of Agrobiotechnology, Rhizobium Research Center, and College of Biological Sciences, China Agricultural University, Beijing 100193, China