1
de Vos JM, Streiff SJR, Bachelier JB, Epitawalage N, Maurin O, Forest F, Baker WJ. Phylogenomics of the pantropical Connaraceae: revised infrafamilial classification and the evolution of heterostyly. PLANT SYSTEMATICS AND EVOLUTION = ENTWICKLUNGSGESCHICHTE UND SYSTEMATIK DER PFLANZEN 2024; 310:29. [PMID: 39105137 PMCID: PMC11297820 DOI: 10.1007/s00606-024-01909-y] [Received: 02/02/2024] [Accepted: 05/28/2024] [Indexed: 08/07/2024]
Abstract
Connaraceae is a pantropical family of about 200 species containing lianas and small trees with remarkably diverse floral polymorphisms, including distyly, tristyly, homostyly, and dioecy. To date, relationships within the family have not been investigated using a targeted molecular phylogenetic treatment, severely limiting systematic understanding and reconstruction of trait evolution. Accordingly, their last infrafamilial classification was based only on morphological data. Here, we used phylogenomic data obtained using the Angiosperms353 nuclear target sequence capture probes, sampling all tribes and almost all genera, entirely from herbarium specimens, to revise infrafamilial classification and investigate the evolution of heterostyly. The backbone of the resulting molecular phylogenetic tree is almost entirely resolved. Connaraceae consists of two clades, one containing only the African genus Manotes (4 or 5 species), which we newly recognize at the subfamily level. Vegetative and reproductive synapomorphies are proposed for Manotoideae. Within Connaroideae, Connareae is expanded to include the former Jollydoreae. The backbone of Cnestideae, which contains more than half of the Connaraceae species, remains incompletely resolved. Reconstructions of reproductive system evolution are presented that tentatively support tristyly as the ancestral state for the family, with multiple parallel losses, in agreement with previous hypotheses, plus possible re-gains. However, the great diversity of stylar polymorphisms and their phylogenetic lability preclude a definitive answer. Overall, this study reinforces the usefulness of herbarium phylogenomics, and unlocks the reproductive diversity of Connaraceae as a model system for the evolution of complex biological phenomena.
Supplementary Information: The online version contains supplementary material available at 10.1007/s00606-024-01909-y.
Affiliation(s)
- Jurriaan M. de Vos
  - Department of Environmental Sciences - Botany, University of Basel, Schönbeinstrasse 6, 4056 Basel, Switzerland
- Serafin J. R. Streiff
  - Department of Environmental Sciences - Botany, University of Basel, Schönbeinstrasse 6, 4056 Basel, Switzerland
  - UMR DIADE, Université de Montpellier, IRD, CIRAD, 911 Avenue Agropolis, 34090 Montpellier, France
- Julien B. Bachelier
  - Institut für Biologie/Dahlem Centre of Plant Sciences, Freie Universität Berlin, Altensteinstrasse 6, 14195 Berlin, Germany
- Niroshini Epitawalage
  - Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE UK
  - The New York Botanical Garden, 2900 Southern Blvd, Bronx, NY 10458 USA
- Olivier Maurin
  - Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE UK
- Félix Forest
  - Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE UK
- William J. Baker
  - Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE UK
  - Department of Biology, Aarhus University, Ny Munkegade 116, 8000 Aarhus, Denmark

2
Kharma N, Bédard-Couture R. Robustness and evolvability: Revisited, redefined and applied. Biosystems 2024:105281. [PMID: 39098381 DOI: 10.1016/j.biosystems.2024.105281] [Received: 06/12/2024] [Revised: 07/27/2024] [Accepted: 07/31/2024] [Indexed: 08/06/2024]
Abstract
Building on and extending existing definitions of robustness and evolvability, we propose and utilize new formal definitions, with matching measures, of robustness and evolvability of systems with genotypes and corresponding phenotypes. We explain and show how these measures are more general and more representative of the concepts they stand for, than the commonly used/referenced measures originally proposed by Wagner. Further, a versatile digital modeling approach (BNK) is proposed that is inspired by NK systems. However, unlike NK systems, BNK incorporates a genotype and a phenotype, in addition to fitness. We develop and apply an Evolutionary Algorithm to a BNK-modeled system to find different types of perfect oscillators. We then map the resulting oscillating systems to possible genetic circuit realizations. Continuing with the synthetic biology theme, we also investigate the effect of noise in DNA synthesis on the predicted functionality of a DNA-based biosensor (i.e., its robustness), and we carry out a theoretical assessment of the evolvability of different types of ribozymes, undergoing directed evolution.
Affiliation(s)
- Nawwaf Kharma
  - Electrical and Computer Engineering Department, Concordia University, 1455 Blvd. De Maisonneuve Ouest, Montreal, H3G 1M8, Quebec, Canada
- Rémi Bédard-Couture
  - Département de génie logiciel et des technologies de l'information, École de Technologie Supérieure, 1100 Notre-Dame St W, Montreal, H3C 1K3, Quebec, Canada

3
Lemos-Costa P, Miller ZR, Allesina S. Phylogeny structures species' interactions in experimental ecological communities. Ecol Lett 2024; 27:e14490. [PMID: 39152685 DOI: 10.1111/ele.14490] [Received: 12/04/2023] [Revised: 06/24/2024] [Accepted: 07/11/2024] [Indexed: 08/19/2024]
Abstract
Species' traits and interactions are products of evolutionary history. Despite the long-standing hypothesis that closely related species possess similar traits, and thus experience stronger competition, measuring the effect of evolutionary history on the ecology of natural communities remains challenging. We propose a novel framework to test whether phylogeny influences patterns of coexistence and abundance of species assemblages. In our approach, phylogenetic trees are used to parameterize species' interactions, which in turn determine the abundance of species in a given assemblage. We use likelihoods to score models parameterized with a given phylogeny, and contrast them with models built using random trees, allowing us to test whether phylogenetic information helps to predict species' abundances. Our statistical framework reveals that interactions are indeed structured by phylogeny in a large set of experimental plant communities. Our results confirm that evolutionary history can help predict, and potentially manage or conserve, the structure and function of complex ecological communities.
Affiliation(s)
- Paula Lemos-Costa
  - Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA
- Zachary R Miller
  - Department of Earth and Planetary Sciences, Yale University, New Haven, Connecticut, USA
- Stefano Allesina
  - Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, Illinois, USA

4
Xiang Z, Liu Z, Dinh KN. Inference of chromosome selection parameters and missegregation rate in cancer from DNA-sequencing data. Sci Rep 2024; 14:17699. [PMID: 39085295 PMCID: PMC11291923 DOI: 10.1038/s41598-024-67842-9] [Received: 04/18/2024] [Accepted: 07/16/2024] [Indexed: 08/02/2024]
Abstract
Aneuploidy is frequently observed in cancers and has been linked to poor patient outcome. Analysis of aneuploidy in DNA-sequencing (DNA-seq) data necessitates untangling the effects of the Copy Number Aberration (CNA) occurrence rates and the selection coefficients that act upon the resulting karyotypes. We introduce a parameter inference algorithm that takes advantage of both bulk and single-cell DNA-seq cohorts. The method is based on Approximate Bayesian Computation (ABC) and utilizes CINner, our recently introduced simulation algorithm of chromosomal instability in cancer. We examine three groups of statistics to summarize the data in the ABC routine: (A) Copy Number-based measures, (B) phylogeny tip statistics, and (C) phylogeny balance indices. Using these statistics, our method can recover both the CNA probabilities and selection parameters from ground truth data, and performs well even for data cohorts of relatively small sizes. We find that only statistics in groups A and C are well-suited for identifying CNA probabilities, and only group A carries the signals for estimating selection parameters. Moreover, the low number of CNA events at large scale compared to cell counts in single-cell samples means that statistics in group B cannot be estimated accurately using phylogeny reconstruction algorithms at the chromosome level. As data from both bulk and single-cell DNA-sequencing techniques becomes increasingly available, our inference framework promises to facilitate the analysis of distinct cancer types, differentiation between selection and neutral drift, and prediction of cancer clonal dynamics.
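As a rough illustration of the ABC idea mentioned in this abstract, the following minimal sketch runs plain rejection ABC on a toy problem. It does not use CINner or the authors' summary statistics: the simulator, the uniform prior, the single summary statistic (a count of missegregation events) and all function names are hypothetical stand-ins chosen only to show the accept/reject loop.

```python
import random


def simulate_event_count(rate, n_divisions, rng):
    """Toy simulator (an assumption, not CINner): count divisions with a missegregation."""
    return sum(1 for _ in range(n_divisions) if rng.random() < rate)


def abc_rejection(observed, n_divisions, n_draws, tolerance, seed=0):
    """Keep prior draws whose simulated summary statistic lands near the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, 0.2)  # assumed uniform prior on the per-division rate
        simulated = simulate_event_count(rate, n_divisions, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(rate)
    return accepted


# Hypothetical data: 10 events observed over 200 divisions.
posterior = abc_rejection(observed=10, n_divisions=200, n_draws=5000, tolerance=1)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean rate ~ {posterior_mean:.3f}")
```

The accepted draws approximate the posterior of the rate given the observed count; a real analysis would replace the toy simulator and the single statistic with the simulator and statistic groups the abstract describes.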
Affiliation(s)
- Zijin Xiang
  - Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, NY, USA
- Zhihan Liu
  - Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, NY, USA
- Khanh N Dinh
  - Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, NY, USA

5
Smith MR, Long EJ, Dhungana A, Dobson KJ, Yang J, Zhang X. Organ systems of a Cambrian euarthropod larva. Nature 2024:10.1038/s41586-024-07756-8. [PMID: 39085610 DOI: 10.1038/s41586-024-07756-8] [Received: 04/24/2023] [Accepted: 06/26/2024] [Indexed: 08/02/2024]
Abstract
The Cambrian radiation of euarthropods can be attributed to an adaptable body plan. Sophisticated brains and specialized feeding appendages, which are elaborations of serially repeated organ systems and jointed appendages, underpin the dominance of Euarthropoda in a broad suite of ecological settings. The origin of the euarthropod body plan from a grade of vermiform taxa with hydrostatic lobopodous appendages ('lobopodian worms') [1,2] is founded on data from Burgess Shale-type fossils. However, the compaction associated with such preservation obscures internal anatomy [3-6]. Phosphatized microfossils provide a complementary three-dimensional perspective on early crown group euarthropods [7], but few lobopodians [8,9]. Here we describe the internal and external anatomy of a three-dimensionally preserved euarthropod larva with lobopods, midgut glands and a sophisticated head. The architecture of the nervous system informs the early configuration of the euarthropod brain and its associated appendages and sensory organs, clarifying homologies across Panarthropoda. The deep evolutionary position of Youti yuanshi gen. et sp. nov. informs the sequence of character acquisition during arthropod evolution, demonstrating a deep origin of sophisticated haemolymph circulatory systems, and illuminating the internal anatomical changes that propelled the rise and diversification of this enduringly successful group.
Affiliation(s)
- Martin R Smith
  - Department of Earth Sciences, Durham University, Durham, UK
- Emma J Long
  - Department of Earth Sciences, Durham University, Durham, UK
  - Science Group, Natural History Museum, London, UK
  - Centre for Ecology and Conservation, University of Exeter, Cornwall, UK
- Katherine J Dobson
  - Department of Earth Sciences, Durham University, Durham, UK
  - Department of Civil and Environmental Engineering, University of Strathclyde, Glasgow, UK
  - Department of Chemical and Process Engineering, University of Strathclyde, Glasgow, UK
- Jie Yang
  - Institute of Palaeontology, Yunnan University, Chenggong, Kunming, China
- Xiguang Zhang
  - Institute of Palaeontology, Yunnan University, Chenggong, Kunming, China

6
Rose JP, Kriebel R, Sytsma KJ, Drew BT. Phylogenomic perspectives on speciation and reproductive isolation in a North American biodiversity hotspot: an example using California sages (Salvia subgenus Audibertia: Lamiaceae). ANNALS OF BOTANY 2024; 134:295-310. [PMID: 38733329 PMCID: PMC11232522 DOI: 10.1093/aob/mcae073] [Received: 02/16/2024] [Accepted: 05/07/2024] [Indexed: 05/13/2024]
Abstract
BACKGROUND AND AIMS: The California Floristic Province (CA-FP) is the most species-rich region of North America north of Mexico. One of several proposed hypotheses explaining the exceptional diversity of the region is that the CA-FP harbours myriad recently diverged lineages with nascent reproductive barriers. Salvia subgenus Audibertia is a conspicuous element of the CA-FP, with multiple sympatric and compatible species.
METHODS: Using 305 nuclear loci and both organellar genomes, we reconstruct species trees, examine genomic discordance, conduct divergence-time estimation, and analyse contemporaneous patterns of gene flow and mechanical reproductive isolation.
KEY RESULTS: Despite strong genomic discordance, an underlying bifurcating tree is supported. Organellar genomes capture additional introgression events not detected in the nuclear genome. Most interfertility is found within clades, indicating that reproductive barriers arise with increasing genetic divergence. Species are generally not mechanically isolated, suggesting that it is unlikely to be the primary factor leading to reproductive isolation.
CONCLUSIONS: Rapid, recent speciation with some interspecific gene flow in conjunction with the onset of a Mediterranean-like climate is the underlying cause of extant diversity in Salvia subgenus Audibertia. Speciation has largely not been facilitated by gene flow. Its signal in the nuclear genome seems to mostly be erased by backcrossing, but organellar genomes each capture different instances of historical gene flow, probably characteristic of many CA-FP lineages. Mechanical reproductive isolation appears to be only part of a mosaic of factors limiting gene flow.
Affiliation(s)
- Jeffrey P Rose
  - Department of Biology, University of Nebraska at Kearney, Kearney, NE 68849, USA
  - Department of Botany, University of Wisconsin-Madison, 430 Lincoln Drive, Madison, WI 53706, USA
- Ricardo Kriebel
  - Department of Botany, University of Wisconsin-Madison, 430 Lincoln Drive, Madison, WI 53706, USA
  - California Academy of Sciences, San Francisco, CA 94118, USA
- Kenneth J Sytsma
  - Department of Botany, University of Wisconsin-Madison, 430 Lincoln Drive, Madison, WI 53706, USA
- Bryan T Drew
  - Department of Biology, University of Nebraska at Kearney, Kearney, NE 68849, USA

7
Naranjo JG, Sither CB, Conant GC. Shared single copy genes are generally reliable for inferring phylogenetic relationships among polyploid taxa. Mol Phylogenet Evol 2024; 196:108087. [PMID: 38677353 DOI: 10.1016/j.ympev.2024.108087] [Received: 12/12/2023] [Revised: 03/22/2024] [Accepted: 04/24/2024] [Indexed: 04/29/2024]
Abstract
Polyploidy, or whole-genome duplication, is expected to confound the inference of species trees with phylogenetic methods for two reasons. First, the presence of retained duplicated genes requires the reconciliation of the inferred gene trees to a proposed species tree. Second, even if the analyses are restricted to shared single copy genes, the occurrence of reciprocal gene loss, where the surviving genes in different species are paralogs from the polyploidy rather than orthologs, will mean that such genes will not have evolved under the corresponding species tree and may not produce gene trees that allow inference of that species tree. Here we analyze three different ancient polyploidy events, using synteny-based inferences of orthology and paralogy to infer gene trees from nearly 17,000 sets of homologous genes. We find that the simple use of single copy genes from polyploid organisms provides reasonably robust phylogenetic signals, despite the presence of reciprocal gene losses. Such gene trees are also most often in accord with the species relationships inferred from maximum likelihood models of gene loss after polyploidy: a completely distinct phylogenetic signal present in these genomes. As seen in other studies, however, we find that methods for inferring phylogenetic confidence yield high support values even in cases where the underlying data suggest meaningful conflict in the phylogenetic signals.
Affiliation(s)
- Jaells G Naranjo
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Charles B Sither
  - Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC, USA
- Gavin C Conant
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
  - Genetics and Genomics Academy, North Carolina State University, Raleigh, NC, USA
  - Department of Biological Sciences, North Carolina State University, Raleigh, NC, USA

8
Ezcurra MD. Exploring the effects of weighting against homoplasy in genealogies of palaeontological phylogenetic matrices. Cladistics 2024; 40:242-281. [PMID: 38728134 DOI: 10.1111/cla.12581] [Received: 01/11/2024] [Revised: 04/15/2024] [Accepted: 04/16/2024] [Indexed: 05/12/2024]
Abstract
Although simulations have shown that implied weighting (IW) outperforms equal weighting (EW) in phylogenetic parsimony analyses, weighting against homoplasy lacks extensive usage in palaeontology. Iterative modifications of several phylogenetic matrices in the last decades resulted in extensive genealogies of datasets that allow the evaluation of differences in the stability of results for alternative character weighting methods directly on empirical data. Each generation was compared against the most recent generation in each genealogy because it is assumed that it is the most comprehensive (higher sampling), revised (fewer misscorings) and complete (lower amount of missing data) matrix of the genealogy. The analyses were conducted on six different genealogies under EW and IW and extended implied weighting (EIW) with a range of concavity constant values (k) between 3 and 30. Pairwise comparisons between trees were conducted using Robinson-Foulds distances normalized by the total number of groups, distortion coefficient, subtree pruning and regrafting moves, and the proportional sum of group dissimilarities. The results consistently show that IW and EIW produce results more similar to those of the last dataset than EW in the vast majority of genealogies and for all comparative measures. This is significant because almost all of these matrices were originally analysed only under EW. Implied weighting and EIW do not outperform each other unambiguously. Euclidean distances based on a principal components analysis of the comparative measures show that different ranges of k-values retrieve the most similar results to the last generation in different genealogies. There is a significant positive linear correlation between the optimal k-values and the number of terminals of the last generations. 
This could be employed to inform about the range of k-values to be used in phylogenetic analyses based on matrix size but with the caveat that this emergent relationship still relies on a low sample size of genealogies.
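To make the role of the concavity constant k concrete, here is a small sketch of the per-character cost commonly given for implied weighting, h / (h + k), where h is the number of extra (homoplastic) steps of a character on a tree. The two step profiles are invented for illustration and do not come from the paper; extended implied weighting, which also adjusts for missing entries, is not shown.

```python
def implied_weight_cost(extra_steps, k):
    """Cost of one character with `extra_steps` homoplastic steps under concavity k."""
    return extra_steps / (extra_steps + k)


def total_cost(extra_steps_per_character, k):
    """Sum of implied-weighting costs over all characters for one candidate tree."""
    return sum(implied_weight_cost(h, k) for h in extra_steps_per_character)


# Hypothetical extra-step profiles of two candidate trees over four characters.
tree_a = [0, 0, 0, 6]  # 6 extra steps, all concentrated in one character
tree_b = [1, 1, 1, 2]  # 5 extra steps, spread over every character

for k in (3, 30):
    print(k, round(total_cost(tree_a, k), 4), round(total_cost(tree_b, k), 4))
```

Under equal weighting tree_b is shorter (5 versus 6 extra steps). With k = 3 the strongly concave cost favours tree_a, because further steps in an already homoplastic character are penalized less; with k = 30 the ranking returns to that of equal weighting. This is the sense in which the choice of k can change which tree is preferred.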
Affiliation(s)
- Martín D Ezcurra
  - Sección Paleontología de Vertebrados, CONICET-Museo Argentino de Ciencias Naturales, Ángel Gallardo 470, C1405DJR, Ciudad Autónoma de Buenos Aires, Argentina
  - School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, B15 2TT, Birmingham, UK

9
Goloboff PA, De Laet J. Farewell to the requirement for character independence: phylogenetic methods to incorporate different types of dependence between characters. Cladistics 2024; 40:209-241. [PMID: 38014464 DOI: 10.1111/cla.12564] [Received: 05/19/2023] [Revised: 10/15/2023] [Accepted: 10/18/2023] [Indexed: 11/29/2023]
Abstract
This paper discusses methods to take into account interactions between characters, in the context of parsimony analysis. These interactions can be in the form of some characters becoming inapplicable given certain states of other, primary characters; in the form of only certain states being allowed in some characters when a given state or set of states occurs for other characters; or in the form of transformation costs in some character being higher or lower when other characters have certain states or transformations between states. Character-state reconstructions and evaluation of trees under the assumption of independence may easily lead to ancestral assignments that violate elementary rules of biomechanics, well-established theories relating form and function or ideas about character co-variation. An obvious example is reconstructing an ancestral bird as wingless and flying at the same time; another is reconstructing a protein-coding gene as having a stop codon in some ancestors. If the characters are optimized independently, such chimeric ancestral reconstructions can occur even when no terminal displays the impossible combination of states. A set of conventions (implemented via new TNT commands and options) allows the definition of complex rules of interaction. By recoding groups of characters with proper step-matrix costs (and excluding impossible combinations from the set of permissible states), it is possible to find the ancestral reconstructions that maximize homology (and thus the degree to which similarities can be explained by common ancestry), within the constraints imposed by the rules specified by the user. We expect that considerations of biomechanics, functional morphology and natural history will be a source of many theories on possible character dependences, and that the present implementation will encourage users to take the possibility of character dependences into account in their phylogenetic analyses.
Affiliation(s)
- Pablo A Goloboff
  - Unidad Ejecutora Lillo, UEL (CONICET-Fundación Miguel Lillo), Miguel Lillo 251, 4000, S.M. de Tucumán, Argentina
- Jan De Laet
  - Meise Botanic Garden, Nieuwelaan 38, Meise, Belgium

10
Rick JA, Brock CD, Lewanski AL, Golcher-Benavides J, Wagner CE. Reference Genome Choice and Filtering Thresholds Jointly Influence Phylogenomic Analyses. Syst Biol 2024; 73:76-101. [PMID: 37881861 DOI: 10.1093/sysbio/syad065] [Received: 03/10/2022] [Revised: 09/20/2023] [Accepted: 10/20/2023] [Indexed: 10/27/2023]
Abstract
Molecular phylogenies are a cornerstone of modern comparative biology and are commonly employed to investigate a range of biological phenomena, such as diversification rates, patterns in trait evolution, biogeography, and community assembly. Recent work has demonstrated that significant biases may be introduced into downstream phylogenetic analyses from processing genomic data; however, it remains unclear whether there are interactions among bioinformatic parameters or biases introduced through the choice of reference genome for sequence alignment and variant calling. We address these knowledge gaps by employing a combination of simulated and empirical data sets to investigate the extent to which the choice of reference genome in upstream bioinformatic processing of genomic data influences phylogenetic inference, as well as the way that reference genome choice interacts with bioinformatic filtering choices and phylogenetic inference method. We demonstrate that more stringent minor allele filters bias inferred trees away from the true species tree topology, and that these biased trees tend to be more imbalanced and have a higher center of gravity than the true trees. We find the greatest topological accuracy when filtering sites for minor allele count (MAC) >3-4 in our 51-taxa data sets, while tree center of gravity was closest to the true value when filtering for sites with MAC >1-2. In contrast, filtering for missing data increased accuracy in the inferred topologies; however, this effect was small in comparison to the effect of minor allele filters and may be undesirable due to a subsequent mutation spectrum distortion. The bias introduced by these filters differs based on the reference genome used in short read alignment, providing further support that choosing a reference genome for alignment is an important bioinformatic decision with implications for downstream analyses. 
These results demonstrate that attributes of the study system and dataset (and their interaction) add important nuance for how best to assemble and filter short-read genomic data for phylogenetic inference.
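The two filters discussed in this abstract, minor allele count (MAC) and missing data, can be sketched on a toy genotype table. This is only an illustration under stated assumptions: genotypes are coded as alternate-allele counts (0, 1, 2) per diploid individual with `None` for a missing call, the thresholds are inclusive, and the function names and data are hypothetical rather than taken from the authors' pipeline.

```python
def minor_allele_count(genotypes):
    """Count copies of the rarer allele among called diploid genotypes at one site."""
    called = [g for g in genotypes if g is not None]
    alt_copies = sum(called)
    total_copies = 2 * len(called)
    return min(alt_copies, total_copies - alt_copies)


def filter_sites(sites, min_mac, max_missing):
    """Keep sites with MAC >= min_mac and a missing-call fraction <= max_missing."""
    kept = []
    for genotypes in sites:
        missing_fraction = genotypes.count(None) / len(genotypes)
        if minor_allele_count(genotypes) >= min_mac and missing_fraction <= max_missing:
            kept.append(genotypes)
    return kept


# Hypothetical sites, one row per site and one column per individual.
sites = [
    [0, 0, 0, 1, 0, 0],           # singleton, MAC = 1
    [0, 1, 2, 1, 0, 0],           # MAC = 4
    [2, 2, 2, 2, 1, None],        # MAC = 1, one missing call
    [1, 1, None, None, None, 2],  # MAC = 2, half the calls missing
]
kept = filter_sites(sites, min_mac=2, max_missing=0.2)
print(len(kept), "of", len(sites), "sites kept")
```

Raising `min_mac` removes rare variants first, which is the kind of filtering the abstract reports as biasing inferred topologies when applied too stringently.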
Affiliation(s)
- Jessica A Rick
  - School of Natural Resources & the Environment, University of Arizona, Tucson, AZ 85719, USA
- Chad D Brock
  - Department of Biological Sciences, Tarleton State University, Stephenville, TX 76401, USA
- Alexander L Lewanski
  - Department of Integrative Biology and W.K. Kellogg Biological Station, Michigan State University, East Lansing, MI 48824, USA
- Jimena Golcher-Benavides
  - Department of Natural Resource Ecology and Management, Iowa State University, Ames, IA 50011, USA
- Catherine E Wagner
  - Program in Ecology and Evolution, University of Wyoming, Laramie, WY 82071, USA
  - Department of Botany, University of Wyoming, Laramie, WY 82071, USA

11
Jensen CG, Sumner JA, Kleinstein SH, Hoehn KB. Inferring B Cell Phylogenies from Paired H and L Chain BCR Sequences with Dowser. JOURNAL OF IMMUNOLOGY (BALTIMORE, MD. : 1950) 2024; 212:1579-1588. [PMID: 38557795 PMCID: PMC11073909 DOI: 10.4049/jimmunol.2300851] [Received: 12/11/2023] [Accepted: 03/07/2024] [Indexed: 04/04/2024]
Abstract
Abs are vital to human immune responses and are composed of genetically variable H and L chains. These structures are initially expressed as BCRs. BCR diversity is shaped through somatic hypermutation and selection during immune responses. This evolutionary process produces B cell clones, cells that descend from a common ancestor but differ by mutations. Phylogenetic trees inferred from BCR sequences can reconstruct the history of mutations within a clone. Until recently, BCR sequencing technologies separated H and L chains, but advancements in single-cell sequencing now pair H and L chains from individual cells. However, it is unclear how these separate genes should be combined to infer B cell phylogenies. In this study, we investigated strategies for using paired H and L chain sequences to build phylogenetic trees. We found that incorporating L chains significantly improved tree accuracy and reproducibility across all methods tested. This improvement was greater than the difference between tree-building methods and persisted even when mixing bulk and single-cell sequencing data. However, we also found that many phylogenetic methods estimated significantly biased branch lengths when some L chains were missing, such as when mixing single-cell and bulk BCR data. This bias was eliminated using maximum likelihood methods with separate branch lengths for H and L chain gene partitions. Thus, we recommend using maximum likelihood methods with separate H and L chain partitions, especially when mixing data types. We implemented these methods in the R package Dowser: https://dowser.readthedocs.io.
Affiliation(s)
- Cole G. Jensen
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
- Jacob A. Sumner
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
  - Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, 06520, USA
- Steven H. Kleinstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
  - Department of Pathology, Yale School of Medicine, New Haven, CT 06520, USA
  - Department of Immunobiology, Yale School of Medicine, New Haven, CT 06520, USA
- Kenneth B. Hoehn
  - Department of Pathology, Yale School of Medicine, New Haven, CT 06520, USA
  - Current address: Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA

12
Tremble K, Henkel T, Bradshaw A, Domnauer C, Brown LM, Thám LX, Furci G, Aime MC, Moncalvo JM, Dentinger B. A revised phylogeny of Boletaceae using whole genome sequences. Mycologia 2024; 116:392-408. [PMID: 38551379 DOI: 10.1080/00275514.2024.2314963] [Received: 11/29/2023] [Accepted: 01/30/2024] [Indexed: 05/01/2024]
Abstract
The porcini mushroom family Boletaceae is a diverse, widespread group of ectomycorrhizal (ECM) mushroom-forming fungi that so far has eluded intrafamilial phylogenetic resolution based on morphology and multilocus data sets. In this study, we present a genome-wide molecular data set of 1764 single-copy gene families from a global sampling of 418 Boletaceae specimens. The resulting phylogenetic analysis has strong statistical support for most branches of the tree, including the first statistically robust backbone. The enigmatic Phylloboletellus chloephorus from non-ECM Argentinian subtropical forests was recovered as a new subfamily sister to the core Boletaceae. Time-calibrated branch lengths estimate that the family first arose in the early to mid-Cretaceous and underwent a rapid radiation in the Eocene, possibly when the ECM nutritional mode arose with the emergence and diversification of ECM angiosperms. Biogeographic reconstructions reveal a complex history of vicariance and episodic long-distance dispersal correlated with historical geologic events, including Gondwanan origins and inferred vicariance associated with its disarticulation. Together, this study represents the most comprehensively sampled, data-rich molecular phylogeny of the Boletaceae to date, establishing a foundation for future robust inferences of biogeography in the group.
Affiliation(s)
- Keaton Tremble
- Natural History Museum of Utah and School of Biological Sciences, University of Utah, Salt Lake City, Utah 84108, USA
| | - Terry Henkel
- Department of Biological Sciences, California State Polytechnic University, Humboldt, Arcata 95521, California
| | - Alexander Bradshaw
- Natural History Museum of Utah and School of Biological Sciences, University of Utah, Salt Lake City, Utah 84108, USA
| | - Colin Domnauer
- Natural History Museum of Utah and School of Biological Sciences, University of Utah, Salt Lake City, Utah 84108, USA
| | - Lyda M Brown
- Natural History Museum of Utah and School of Biological Sciences, University of Utah, Salt Lake City, Utah 84108, USA
| | - Lê Xuân Thám
- Laboratory for Computation and Applications in Life Sciences, Institute for Computation Science and Artificial Intelligence, Van Lang University, Ho Chi Minh City 700000, Viet Nam
- Faculty of Applied Technology, School of Technology, Van Lang University, Ho Chi Minh City 700000, Viet Nam
| | | | - M Catherine Aime
- Department of Botany and Plant Pathology, Purdue University, West Lafayette, Indiana 47906, USA
| | - Jean-Marc Moncalvo
- Department of Natural History, Royal Ontario Museum and Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario M5S 2C6, Canada
| | - Bryn Dentinger
- Natural History Museum of Utah and School of Biological Sciences, University of Utah, Salt Lake City, Utah 84108, USA
| |
|
13
|
Wagle S, Markin A, Górecki P, Anderson TK, Eulenstein O. Asymmetric Cluster-Based Measures for Comparative Phylogenetics. J Comput Biol 2024; 31:312-327. [PMID: 38634854 PMCID: PMC11057527 DOI: 10.1089/cmb.2023.0338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024] Open
Abstract
Phylogenetic inference and reconstruction methods generate hypotheses on evolutionary history. Competing inference methods are frequently used, and the evaluation of the generated hypotheses is achieved using tree comparison costs. The Robinson-Foulds (RF) distance is a widely used cost to compare the topology of two trees, but this cost is sensitive to tree error and can overestimate tree differences. To overcome this limitation, a refined version of the RF distance called the Cluster Affinity (CA) distance was introduced. However, CA distances are symmetric and cannot compare different types of trees. These asymmetric comparisons occur when gene trees are compared with species trees, when disparate datasets are integrated into a supertree, or when tree comparison measures are used to infer a phylogenetic network. In this study, we introduce a relaxation of the original CA distance, called the asymmetric CA cost, to compare heterogeneous trees. We also develop a biologically interpretable cost, the Cluster Support cost, which normalizes by cluster size across gene trees. The characteristics of these costs are similar to those of the symmetric CA cost. We describe efficient algorithms, derive the exact diameters, and use them to standardize the costs so that they are applicable in practice. These costs provide objective, fine-scale, and biologically interpretable values that can assess differences and similarities between phylogenetic trees.
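As a point of reference for the abstract above, the following is a minimal sketch of the classic rooted Robinson-Foulds distance, computed as the size of the symmetric difference between the cluster sets of two trees. It is not the Cluster Affinity, asymmetric CA, or Cluster Support cost from the paper. The nested-tuple tree encoding and the function names `clusters` and `rf_distance` are assumptions made for illustration only.

```python
def clusters(tree):
    """Return (leaf set, non-trivial clusters) of a rooted tree.

    Assumed encoding: a leaf is a string, an internal node is a tuple
    of child subtrees, e.g. (("A", "B"), ("C", "D")).
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        # The cluster of an internal node is the union of its children's leaves.
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    # The root cluster is shared by every tree on the same taxa, so drop it.
    found.discard(all_leaves)
    return all_leaves, found


def rf_distance(tree_a, tree_b):
    """Count clusters present in exactly one of the two trees."""
    leaves_a, clusters_a = clusters(tree_a)
    leaves_b, clusters_b = clusters(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("both trees must have the same leaf set")
    return len(clusters_a ^ clusters_b)


balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")
print(rf_distance(balanced, caterpillar))
```

Because every differing cluster counts equally, one misplaced taxon can change many clusters at once, which is the sensitivity to tree error that the abstract describes.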
Affiliation(s)
- Sanket Wagle
- Department of Computer Science, Iowa State University, Ames, Iowa, USA
| | - Alexey Markin
- National Animal Disease Center, USDA-ARS, Ames, Iowa, USA
| | - Paweł Górecki
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
| | | | - Oliver Eulenstein
- Department of Computer Science, Iowa State University, Ames, Iowa, USA
| |
|
14
|
Xie O, Morris JM, Hayes AJ, Towers RJ, Jespersen MG, Lees JA, Ben Zakour NL, Berking O, Baines SL, Carter GP, Tonkin-Hill G, Schrieber L, McIntyre L, Lacey JA, James TB, Sriprakash KS, Beatson SA, Hasegawa T, Giffard P, Steer AC, Batzloff MR, Beall BW, Pinho MD, Ramirez M, Bessen DE, Dougan G, Bentley SD, Walker MJ, Currie BJ, Tong SYC, McMillan DJ, Davies MR. Inter-species gene flow drives ongoing evolution of Streptococcus pyogenes and Streptococcus dysgalactiae subsp. equisimilis. Nat Commun 2024; 15:2286. [PMID: 38480728 PMCID: PMC10937727 DOI: 10.1038/s41467-024-46530-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Accepted: 02/28/2024] [Indexed: 03/17/2024] Open
Abstract
Streptococcus dysgalactiae subsp. equisimilis (SDSE) is an emerging cause of human infection with invasive disease incidence and clinical manifestations comparable to the closely related species, Streptococcus pyogenes. Through systematic genomic analyses of 501 disseminated SDSE strains, we demonstrate extensive overlap between the genomes of SDSE and S. pyogenes. More than 75% of core genes are shared between the two species with one third demonstrating evidence of cross-species recombination. Twenty-five percent of mobile genetic element (MGE) clusters and 16 of 55 SDSE MGE insertion regions were shared across species. Assessing potential cross-protection from leading S. pyogenes vaccine candidates on SDSE, 12/34 preclinical vaccine antigen genes were shown to be present in >99% of isolates of both species. Relevant to possible vaccine evasion, six vaccine candidate genes demonstrated evidence of inter-species recombination. These findings demonstrate previously unappreciated levels of genomic overlap between these closely related pathogens with implications for streptococcal pathobiology, disease surveillance and prevention.
Affiliation(s)
- Ouli Xie
- Department of Infectious Diseases, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Monash Infectious Diseases, Monash Health, Melbourne, Australia
| | - Jacqueline M Morris
- Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
| | - Andrew J Hayes
- Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
| | - Rebecca J Towers
- Menzies School of Health Research, Charles Darwin University, Darwin, Australia
| | - Magnus G Jespersen
- Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
| | - John A Lees
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Hinxton, Cambridgeshire, UK
| | - Nouri L Ben Zakour
- Australian Infectious Diseases Research Centre and School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
| | - Olga Berking
- Australian Infectious Diseases Research Centre and School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
| | - Sarah L Baines
- Doherty Applied Microbial Genomics, Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
| | - Glen P Carter
- Doherty Applied Microbial Genomics, Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
| | | | - Layla Schrieber
- Faculty of Veterinary Science, The University of Sydney, Sydney, Australia
| | - Liam McIntyre
- Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
| | - Jake A Lacey
- Department of Infectious Diseases, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
| | - Taylah B James
- Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
| | - Kadaba S Sriprakash
- Infection and Inflammation Program, QIMR Berghofer Medical Research Institute, Brisbane, Australia
- School of Science & Technology, University of New England, Armidale, Australia
| | - Scott A Beatson
- Australian Infectious Diseases Research Centre and School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
| | - Tadao Hasegawa
- Department of Bacteriology, Nagoya City University Graduate School of Medical Sciences, Nagoya, Japan
| | - Phil Giffard
- Menzies School of Health Research, Charles Darwin University, Darwin, Australia
| | - Andrew C Steer
- Tropical Diseases, Murdoch Children's Research Institute, Parkville, Australia
| | - Michael R Batzloff
- Infection and Inflammation Program, QIMR Berghofer Medical Research Institute, Brisbane, Australia
- Institute for Glycomics, Griffith University, Southport, Australia
| | - Bernard W Beall
- Respiratory Disease Branch, National Center for Immunizations and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | - Marcos D Pinho
- Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal
| | - Mario Ramirez
- Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal
| | - Debra E Bessen
- Department of Pathology, Microbiology and Immunology, New York Medical College, Valhalla, NY, USA
| | - Gordon Dougan
- Parasites and Microbes, Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
| | - Stephen D Bentley
- Parasites and Microbes, Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
| | - Mark J Walker
- Australian Infectious Diseases Research Centre and School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
| | - Bart J Currie
- Menzies School of Health Research, Charles Darwin University, Darwin, Australia
| | - Steven Y C Tong
- Department of Infectious Diseases, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Victorian Infectious Disease Service, The Royal Melbourne Hospital at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
| | - David J McMillan
- School of Science, Technology and Engineering, and Centre for Bioinnovation, University of the Sunshine Coast, Sippy Downs, Australia
| | - Mark R Davies
- Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia.
| |
|
15
|
Kiledal EA, Reitz LA, Kuiper EQ, Evans J, Siddiqui R, Denef VJ, Dick GJ. Comparative genomic analysis of Microcystis strain diversity using conserved marker genes. HARMFUL ALGAE 2024; 132:102580. [PMID: 38331539 DOI: 10.1016/j.hal.2024.102580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 01/08/2024] [Accepted: 01/09/2024] [Indexed: 02/10/2024]
Abstract
Microcystis-dominated cyanobacterial harmful algal blooms (cyanoHABs) have a global impact on freshwater environments, affecting both wildlife and human health. Microcystis diversity and function in field samples and laboratory cultures can be determined by sequencing whole genomes of cultured isolates or natural populations, but these methods remain computationally and financially expensive. Amplicon sequencing of marker genes is a lower cost and higher throughput alternative to characterize strain composition and diversity in mixed samples. However, the selection of appropriate marker gene region(s) and primers requires prior understanding of the relationship between single gene genotype, whole genome content, and phenotype. To identify phylogenetic markers of Microcystis strain diversity, we compared phylogenetic trees built from each of 2,351 individual core genes to an established phylogeny and assessed the ability of these core genes to predict whole genome content and bioactive compound genotypes. We identified single-copy core genes better able to resolve Microcystis phylogenies than previously identified marker genes. We developed primers suitable for current Illumina-based amplicon sequencing with near-complete coverage of available Microcystis genomes and demonstrate that they outperform existing options for assessing Microcystis strain composition. Results showed that genetic markers can be used to infer Microcystis gene content and phenotypes such as potential production of bioactive compounds, although marker performance varies by bioactive compound gene and sequence similarity. Finally, we demonstrate that these markers can be used to characterize the Microcystis strain composition of laboratory or field samples like those collected for surveillance and modeling of Microcystis-dominated cyanobacterial harmful algal blooms.
Affiliation(s)
- E Anders Kiledal
- Department of Earth and Environmental Sciences, University of Michigan, 2534 North University Building, 1100 North University Avenue, Rm. 2004, Ann Arbor, MI 48109-1005, USA.
| | - Laura A Reitz
- Department of Earth and Environmental Sciences, University of Michigan, 2534 North University Building, 1100 North University Avenue, Rm. 2004, Ann Arbor, MI 48109-1005, USA
| | - Esmée Q Kuiper
- Department of Earth and Environmental Sciences, University of Michigan, 2534 North University Building, 1100 North University Avenue, Rm. 2004, Ann Arbor, MI 48109-1005, USA
| | - Jacob Evans
- Department of Ecology and Evolutionary Biology, University of Michigan, 2220 Biological Sciences Building, 1105 North University Avenue, Ann Arbor, MI 48109-1005, USA
| | - Ruqaiya Siddiqui
- Microbiome Core, University of Michigan, 1500 MSRB 1, 1150 W Medical Center Drive, Ann Arbor, MI 48109-5666, USA
| | - Vincent J Denef
- Department of Ecology and Evolutionary Biology, University of Michigan, 2220 Biological Sciences Building, 1105 North University Avenue, Ann Arbor, MI 48109-1005, USA
| | - Gregory J Dick
- Department of Earth and Environmental Sciences, University of Michigan, 2534 North University Building, 1100 North University Avenue, Rm. 2004, Ann Arbor, MI 48109-1005, USA; Cooperative Institute for Great Lakes Research, University of Michigan, 4040 Dana Building, 440 Church Street, Ann Arbor, MI 48109-1041, USA
| |
|
16
|
Ma B, Gong H, Xu Q, Gao Y, Guan A, Wang H, Hua K, Luo R, Jin H. Bases-dependent Rapid Phylogenetic Clustering (Bd-RPC) enables precise and efficient phylogenetic estimation in viruses. Virus Evol 2024; 10:veae005. [PMID: 38361823 PMCID: PMC10868571 DOI: 10.1093/ve/veae005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/06/2024] [Accepted: 01/22/2024] [Indexed: 02/17/2024] Open
Abstract
Understanding phylogenetic relationships among species is essential for many biological studies, which call for an accurate phylogenetic tree to understand major evolutionary transitions. Phylogenetic analysis presents major challenges in estimation accuracy and computational efficiency, especially in the face of the recent wave of severe emerging infectious disease outbreaks. Here, we introduce a novel, efficient framework called Bases-dependent Rapid Phylogenetic Clustering (Bd-RPC) for placing new viral samples. A new recoding method called Frequency Vector Recoding was implemented to approximate the phylogenetic distance, and the Phylogenetic Simulated Annealing Search algorithm was developed to match the recoded distance matrix with the phylogenetic tree. In addition, indels (insertions/deletions) were heuristically introduced into foreign sequence recognition for the first time. We compared Bd-RPC with recent placement software (PAGAN2, EPA-ng, TreeBeST) and evaluated it on Alphacoronavirus, Alphaherpesvirinae, and Betacoronavirus using Split and Robinson-Foulds distances. The comparisons showed that Bd-RPC maintained the highest precision with great efficiency, demonstrating good performance in new sample placement in all three virus groups. Finally, a user-friendly website (http://www.bd-rpc.xyz) is available for users to classify new samples instantly and to facilitate phylogenetic research in viruses, and Bd-RPC is available on GitHub (http://github.com/Bin-Ma/bd-rpc).
Affiliation(s)
- Bin Ma
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
| | - Huimin Gong
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
| | - Qianshuai Xu
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
| | - Yuan Gao
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
| | - Aohan Guan
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
| | - Haoyu Wang
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
| | - Kexin Hua
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
| | - Rui Luo
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
| | - Hui Jin
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
| |
|
17
|
Wang L, Dong W, Yin Z, Sheng J, Ezeana CF, Yang L, Yu X, Wong SSY, Wan Z, Danforth RL, Han K, Gao D, Wong STC. Charting Single Cell Lineage Dynamics and Mutation Networks via Homing CRISPR. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.05.574236. [PMID: 38260351 PMCID: PMC10802354 DOI: 10.1101/2024.01.05.574236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Single cell lineage tracing, essential for unraveling cellular dynamics in disease evolution, is critical for developing targeted therapies. CRISPR-Cas9, known for inducing permanent and cumulative mutations, is a cornerstone in lineage tracing. The novel homing guide RNA (hgRNA) technology enhances this by enabling dynamic retargeting and facilitating ongoing genetic modifications. Charting these mutations, especially through successive hgRNA edits, poses a significant challenge. Our solution, LINEMAP, is a computational framework designed to trace and map these mutations with precision. LINEMAP meticulously discerns mutation alleles at single-cell resolution and maps their complex interrelationships through a mutation evolution network. By utilizing a Markov Process model, we can predict mutation transition probabilities, revealing potential mutational routes and pathways. Our reconstruction algorithm, anchored in the Markov model's attributes, reconstructs cellular lineage pathways, shedding light on the cell's evolutionary journey to the minutiae of single-cell division. Our findings reveal an intricate network of mutation evolution paired with a predictive Markov model, advancing our capability to reconstruct single-cell lineage via hgRNA. This has substantial implications for advancing our understanding of biological mechanisms and propelling medical research forward.
Affiliation(s)
- Lin Wang
- Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
| | - Wenjuan Dong
- Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
| | - Zheng Yin
- Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
- Biostatistics and Bioinformatics Shared Resource, Houston Methodist Neal Cancer Center, Houston, Texas 77030
| | - Jianting Sheng
- Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
| | - Chika F. Ezeana
- Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
| | - Li Yang
- T.T. and W. F. Chao Center for BRAIN, Houston Methodist Research Institute, Houston, Texas 77030
| | - Xiaohui Yu
- Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
| | | | - Zhihao Wan
- Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
| | - Rebecca L. Danforth
- Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
| | - Kun Han
- Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
| | - Dingcheng Gao
- Department of Cell & Development Biology, Weill Cornell Medical College, New York, NY 10065
| | - Stephen T. C. Wong
- Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
- Departments of Radiology, Pathology and Genomic Medicine, Houston Methodist Hospital, Weill Cornell Medical College, Houston, TX 77030
| |
|
18
|
Pan X, Li H, Putta P, Zhang X. LinRace: cell division history reconstruction of single cells using paired lineage barcode and gene expression data. Nat Commun 2023; 14:8388. [PMID: 38104156 PMCID: PMC10725445 DOI: 10.1038/s41467-023-44173-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Accepted: 12/03/2023] [Indexed: 12/19/2023] Open
Abstract
Lineage tracing technology using CRISPR/Cas9 genome editing has enabled simultaneous readouts of gene expression and lineage barcodes in single cells, which allows for inference of cell lineage and cell types at the whole organism level. While most state-of-the-art methods for lineage reconstruction utilize only the lineage barcode data, methods that incorporate gene expression are emerging. Effectively incorporating the gene expression data requires a reasonable model of how gene expression changes across generations of cell division. Here, we present LinRace (Lineage Reconstruction with asymmetric cell division model), which integrates lineage barcode and gene expression data using an asymmetric cell division model and infers cell lineages and ancestral cell states using Neighbor-Joining and maximum-likelihood heuristics. On both simulated and real data, LinRace outputs more accurate cell division trees than existing methods. With inferred ancestral states, LinRace can also show how a progenitor cell generates a large population of cells with various functionalities.
Affiliation(s)
- Xinhai Pan
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
| | - Hechen Li
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
| | - Pranav Putta
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
| | - Xiuwei Zhang
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA.
| |
|
19
|
Morreale DP, St Geme III JW, Planet PJ. Phylogenomic analysis of the understudied Neisseriaceae species reveals a poly- and paraphyletic Kingella genus. Microbiol Spectr 2023; 11:e0312323. [PMID: 37882538 PMCID: PMC10715097 DOI: 10.1128/spectrum.03123-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 09/15/2023] [Indexed: 10/27/2023] Open
Abstract
IMPORTANCE Understanding the evolutionary relationships between the species in the Neisseriaceae family has been a persistent challenge in bacterial systematics due to high recombination rates in these species. Previous studies of this family have focused on Neisseria meningitidis and N. gonorrhoeae. However, previously understudied Neisseriaceae species are gaining new attention, with Kingella kingae now recognized as a common human pathogen and with Alysiella and Simonsiella being unique in the bacterial world as multicellular organisms. A better understanding of the genomic evolution of the Neisseriaceae can lead to the identification of specific genes and traits that underlie the remarkable diversity of this family.
Affiliation(s)
- Daniel P. Morreale
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Division of Infectious Diseases, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
| | - Joseph W. St Geme III
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Division of Infectious Diseases, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
| | - Paul J. Planet
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Division of Infectious Diseases, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Comparative Genomics, American Museum of Natural History, New York, New York, USA
| |
|
20
|
Li X, Trovão NS, Wertheim JO, Baele G, de Bernardi Schneider A. Optimizing ancestral trait reconstruction of large HIV Subtype C datasets through multiple-trait subsampling. Virus Evol 2023; 9:vead069. [PMID: 38046219 PMCID: PMC10691791 DOI: 10.1093/ve/vead069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 10/29/2023] [Accepted: 11/20/2023] [Indexed: 12/05/2023] Open
Abstract
Large datasets along with sampling bias represent a challenge for phylodynamic reconstructions, particularly when the study data are obtained from various heterogeneous sources and/or through convenience sampling. In this study, we evaluate the presence of unbalanced sampling distributions by collection date, location, and risk group of human immunodeficiency virus Type 1 Subtype C using a comprehensive subsampling strategy and assess their impact on the reconstruction of the viral spatial and risk group dynamics using phylogenetic comparative methods. Our study shows that the most suitable dataset for ancestral trait reconstruction can be obtained through subsampling by all available traits, particularly using multigene datasets. We also demonstrate that sampling bias is inflated when considerable information for a given trait is unavailable or of poor quality, as we observed for the trait risk group. In conclusion, we suggest that, even if traits are not well recorded, including them deliberately optimizes the representativeness of the original dataset rather than completely excluding them. Therefore, we advise the inclusion of as many traits as possible with the aid of subsampling approaches in order to optimize the dataset for phylodynamic analysis while reducing the computational burden. This will benefit research communities investigating the evolutionary and spatio-temporal patterns of infectious diseases.
Affiliation(s)
| | - Nídia S Trovão
- Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, 31 Center Dr, Bethesda, MD 20892, USA
| | - Joel O Wertheim
- Department of Medicine, University of California, La Jolla, San Diego, CA 92093, USA
| | - Guy Baele
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven BE-3000, Belgium
| | - Adriano de Bernardi Schneider
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Ningbo No.2 Hospital, Ningbo 315010, China
- Ningbo Institute of Life and Health Industry, University of Chinese Academy of Sciences, Ningbo 315000, China
| |
|
21
|
Hendriks KP, Kiefer C, Al-Shehbaz IA, Bailey CD, Hooft van Huysduynen A, Nikolov LA, Nauheimer L, Zuntini AR, German DA, Franzke A, Koch MA, Lysak MA, Toro-Núñez Ó, Özüdoğru B, Invernón VR, Walden N, Maurin O, Hay NM, Shushkov P, Mandáková T, Schranz ME, Thulin M, Windham MD, Rešetnik I, Španiel S, Ly E, Pires JC, Harkess A, Neuffer B, Vogt R, Bräuchler C, Rainer H, Janssens SB, Schmull M, Forrest A, Guggisberg A, Zmarzty S, Lepschi BJ, Scarlett N, Stauffer FW, Schönberger I, Heenan P, Baker WJ, Forest F, Mummenhoff K, Lens F. Global Brassicaceae phylogeny based on filtering of 1,000-gene dataset. Curr Biol 2023; 33:4052-4068.e6. [PMID: 37659415 DOI: 10.1016/j.cub.2023.08.026] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 02/03/2023] [Revised: 06/22/2023] [Accepted: 08/08/2023] [Indexed: 09/04/2023]
Abstract
The mustard family (Brassicaceae) is a scientifically and economically important family, containing the model plant Arabidopsis thaliana and numerous crop species that feed billions worldwide. Despite its relevance, most phylogenetic trees of the family are incompletely sampled and often contain poorly supported branches. Here, we present the most complete Brassicaceae genus-level family phylogenies to date (Brassicaceae Tree of Life or BrassiToL) based on nuclear (1,081 genes, 319 of the 349 genera; 57 of the 58 tribes) and plastome (60 genes, 265 genera; all tribes) data. We found cytonuclear discordance between the two, which is likely a result of rampant hybridization among closely and more distantly related lineages. To evaluate the impact of such hybridization on the nuclear phylogeny reconstruction, we performed five different gene sampling routines, which increasingly removed putatively paralog genes. Our cleaned subset of 297 genes revealed high support for the tribes, whereas support for the main lineages (supertribes) was moderate. Calibration based on the 20 most clock-like nuclear genes suggests a late Eocene to late Oligocene origin of the family. Finally, our results strongly support a recently published new family classification, dividing the family into two subfamilies (one with five supertribes), together representing 58 tribes. This includes five recently described or re-established tribes, including Arabidopsideae, a monogeneric tribe accommodating Arabidopsis without any close relatives. With a worldwide community of thousands of researchers working on Brassicaceae and its diverse members, our new genus-level family phylogeny will be an indispensable tool for studies on biodiversity and plant biology.
Affiliation(s)
- Kasper P Hendriks
  - Department of Biology, Botany, University of Osnabrück, Barbarastraße 11, 49076 Osnabrück, Germany; Functional Traits Group, Naturalis Biodiversity Center, Darwinweg 2, 2333 CR Leiden, the Netherlands
- Christiane Kiefer
  - Centre for Organismal Studies (COS), Heidelberg University, Im Neuenheimer Feld 345, 69120 Heidelberg, Germany
- C Donovan Bailey
  - Department of Biology, New Mexico State University, PO Box 30001, MSC 3AF, Las Cruces, NM 88003, USA
- Alex Hooft van Huysduynen
  - Functional Traits Group, Naturalis Biodiversity Center, Darwinweg 2, 2333 CR Leiden, the Netherlands; Department of Biology, University of Antwerp, Groenenborgerlaan 171, 2020 Antwerp, Belgium
- Lachezar A Nikolov
  - Department of Molecular, Cell and Developmental Biology, University of California, 610 Charles E. Young Dr. S., Los Angeles, CA 90095, USA
- Lars Nauheimer
  - Australian Tropical Herbarium, James Cook University, PO Box 6811, Cairns, QLD 4870, Australia
- Dmitry A German
  - South-Siberian Botanical Garden, Altai State University, Barnaul, Lesosechnaya Ulitsa, 25, Barnaul, Altai Krai, Russia
- Andreas Franzke
  - Heidelberg Botanic Garden, Heidelberg University, Im Neuenheimer Feld 361, 69120 Heidelberg, Germany
- Marcus A Koch
  - Centre for Organismal Studies (COS), Heidelberg University, Im Neuenheimer Feld 345, 69120 Heidelberg, Germany
- Martin A Lysak
  - CEITEC-Central European Institute of Technology, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
- Óscar Toro-Núñez
  - Departamento de Botánica, Universidad de Concepción, Barrio Universitario, Concepción, Chile
- Barış Özüdoğru
  - Department of Biology, Hacettepe University, Beytepe, Ankara 06800, Türkiye
- Vanessa R Invernón
  - Sorbonne Université, Muséum National d'Histoire Naturelle, Institut de Systématique, Évolution, Biodiversité (ISYEB), CP 39, 57 rue Cuvier, 75231 Paris Cedex 05, France
- Nora Walden
  - Centre for Organismal Studies (COS), Heidelberg University, Im Neuenheimer Feld 345, 69120 Heidelberg, Germany
- Olivier Maurin
  - Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK
- Nikolai M Hay
  - Department of Biology, Duke University, Durham, NC 27708, USA
- Philip Shushkov
  - Department of Chemistry, Indiana University, 800 E. Kirkwood Ave., Bloomington, IN 47405, USA
- Terezie Mandáková
  - CEITEC-Central European Institute of Technology, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
- M Eric Schranz
  - Biosystematics Group, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands
- Mats Thulin
  - Department of Organismal Biology, Uppsala University, Norbyvägen 18, 752 36 Uppsala, Sweden
- Ivana Rešetnik
  - Department of Biology, University of Zagreb, Marulićev trg 20/II, 10000 Zagreb, Croatia
- Stanislav Španiel
  - Institute of Botany, Slovak Academy of Sciences, Plant Science and Biodiversity Centre, Dúbravská cesta 9, 845 23 Bratislava, Slovakia
- Elfy Ly
  - Functional Traits Group, Naturalis Biodiversity Center, Darwinweg 2, 2333 CR Leiden, the Netherlands; Wetsus, European Centre of Excellence for Sustainable Water Technology, Oostergoweg 9, 8911 MA Leeuwarden, the Netherlands; Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, the Netherlands
- J Chris Pires
  - Soil and Crop Sciences, Colorado State University, 307 University Ave., Fort Collins, CO 80523-1170, USA
- Alex Harkess
  - HudsonAlpha Institute for Biotechnology, 601 Genome Way Northwest, Huntsville, AL 35806, USA
- Barbara Neuffer
  - Department of Biology, Botany, University of Osnabrück, Barbarastraße 11, 49076 Osnabrück, Germany
- Robert Vogt
  - Botanischer Garten und Botanisches Museum, Freie Universität Berlin, Königin-Luise-Straße 6-8, 14195 Berlin, Germany
- Christian Bräuchler
  - Department of Botany, Natural History Museum Vienna, Burgring 7, 1010 Vienna, Austria
- Heimo Rainer
  - Department of Botany, Natural History Museum Vienna, Burgring 7, 1010 Vienna, Austria
- Steven B Janssens
  - Department of Biology, KU Leuven, Kasteelpark Arenberg 31 - box 2435, 3001 Leuven, Belgium; Meise Botanic Garden, Nieuwelaan 38, 1860 Meise, Belgium
- Michaela Schmull
  - Harvard University Herbaria, 22 Divinity Ave., Cambridge, MA 02138, USA
- Alan Forrest
  - Centre for Middle Eastern Plants, Royal Botanic Garden Edinburgh, 20A Inverleith Row, Edinburgh EH3 5LR, UK
- Alessia Guggisberg
  - ETH Zürich, Institut für Integrative Biologie, Universitätstrasse 16, 8092 Zürich, Switzerland
- Sue Zmarzty
  - Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK
- Brendan J Lepschi
  - Australian National Herbarium, Centre for Australian National Biodiversity Research, Clunies Ross St, Acton, ACT 2601, Australia
- Neville Scarlett
  - La Trobe University, Plenty Road and Kingsbury Dr., Bundoora, VIC 3086, Australia
- Fred W Stauffer
  - Conservatory and Botanic Gardens of Geneva, CP 60, Chambésy, 1292 Geneva, Switzerland
- Ines Schönberger
  - Manaaki Whenua Landcare Research, Allan Herbarium, PO Box 69040, Lincoln, New Zealand
- Peter Heenan
  - Manaaki Whenua Landcare Research, Allan Herbarium, PO Box 69040, Lincoln, New Zealand
- Félix Forest
  - Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK
- Klaus Mummenhoff
  - Department of Biology, Botany, University of Osnabrück, Barbarastraße 11, 49076 Osnabrück, Germany
- Frederic Lens
  - Functional Traits Group, Naturalis Biodiversity Center, Darwinweg 2, 2333 CR Leiden, the Netherlands; Institute of Biology Leiden, Plant Sciences, Leiden University, Sylviusweg 72, 2333 BE Leiden, the Netherlands
|
22
|
Jensen CG, Sumner JA, Kleinstein SH, Hoehn KB. Inferring B cell phylogenies from paired heavy and light chain BCR sequences with Dowser. bioRxiv 2023:2023.09.29.560187. [PMID: 37873135 PMCID: PMC10592837 DOI: 10.1101/2023.09.29.560187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Antibodies are vital to human immune responses and are composed of genetically variable heavy and light chains. These structures are initially expressed as B cell receptors (BCRs). BCR diversity is shaped through somatic hypermutation and selection during immune responses. This evolutionary process produces B cell clones, cells that descend from a common ancestor but differ by mutations. Phylogenetic trees inferred from BCR sequences can reconstruct the history of mutations within a clone. Until recently, BCR sequencing technologies separated heavy and light chains, but advancements in single cell sequencing now pair heavy and light chains from individual cells. However, it is unclear how these separate genes should be combined to infer B cell phylogenies. In this study, we investigated strategies for using paired heavy and light chain sequences to build phylogenetic trees. We found incorporating light chains significantly improved tree accuracy and reproducibility across all methods tested. This improvement was greater than the difference between tree building methods and persisted even when mixing bulk and single cell sequencing data. However, we also found that many phylogenetic methods estimated significantly biased branch lengths when some light chains were missing, such as when mixing single cell and bulk BCR data. This bias was eliminated using maximum likelihood methods with separate branch lengths for heavy and light chain gene partitions. Thus, we recommend using maximum likelihood methods with separate heavy and light chain partitions, especially when mixing data types. We implemented these methods in the R package Dowser: https://dowser.readthedocs.io.
Affiliation(s)
- Cole G. Jensen
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
- Jacob A. Sumner
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
  - Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, 06520, USA
- Steven H. Kleinstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
  - Department of Pathology, Yale School of Medicine, New Haven, CT 06520, USA
  - Department of Immunobiology, Yale School of Medicine, New Haven, CT 06520, USA
- Kenneth B. Hoehn
  - Department of Pathology, Yale School of Medicine, New Haven, CT 06520, USA
  - Current address: Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
|
23
|
Simmons MP, Goloboff PA, Stöver BC, Springer MS, Gatesy J. Quantification of congruence among gene trees with polytomies using overall success of resolution for phylogenomic coalescent analyses. Cladistics 2023; 39:418-436. [PMID: 37096985 DOI: 10.1111/cla.12540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 02/22/2023] [Accepted: 03/24/2023] [Indexed: 04/26/2023] Open
Abstract
Gene-tree-inference error can cause species-tree-inference artefacts in summary phylogenomic coalescent analyses. Here we integrate two ways of accommodating these inference errors: collapsing arbitrarily or dubiously resolved gene-tree branches, and subsampling gene trees based on their pairwise congruence. We tested the effect of collapsing gene-tree branches with 0% approximate-likelihood-ratio-test (SH-like aLRT) support in likelihood analyses and strict consensus trees for parsimony, and then subsampled those partially resolved trees based on congruence measures that do not penalize polytomies. For this purpose we developed a new TNT script for congruence sorting (congsort), and used it to calculate topological incongruence for eight phylogenomic datasets using three distance measures: standard Robinson-Foulds (RF) distances; overall success of resolution (OSR), which is based on counting both matching and contradicting clades; and RF contradictions, which only counts contradictory clades. As expected, we found that gene-tree incongruence was often concentrated in clades that are arbitrarily or dubiously resolved and that there was greater congruence between the partially collapsed gene trees and the coalescent and concatenation topologies inferred from those genes. Coalescent branch lengths typically increased as the most incongruent gene trees were excluded, although branch supports typically did not. We investigated two successful and complementary approaches to prioritizing genes for investigation of alignment or homology errors. Coalescent-tree clades that contradicted concatenation-tree clades were generally less robust to gene-tree subsampling than congruent clades. Our preferred approach to collapsing likelihood gene-tree clades (0% SH-like aLRT support) and subsampling those trees (OSR) generally outperformed competing approaches for a large fungal dataset with respect to branch lengths, support and congruence. 
We recommend widespread application of this approach (and strict consensus trees for parsimony-based analyses) for improving quantification of gene-tree congruence/conflict, estimating coalescent branch lengths, testing robustness of coalescent analyses to gene-tree-estimation error, and improving topological robustness of summary coalescent analyses. This approach is quick and easy to implement, even for huge datasets.
Affiliation(s)
- Mark P Simmons
  - Department of Biology, Colorado State University, Fort Collins, CO, 80523, USA
- Pablo A Goloboff
  - CONICET, INSUE, Fundación Miguel Lillo, Miguel Lillo 251, 4000, S.M. de Tucumán, Argentina
- Ben C Stöver
  - Institute for Evolution and Biodiversity, WMU Münster, 48149, Münster, Germany
- Mark S Springer
  - Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA, 92521, USA
- John Gatesy
  - Division of Vertebrate Zoology, American Museum of Natural History, New York, NY, 10024, USA
|
24
|
Prusokiene A, Prusokas A, Retkute R. Machine learning based lineage tree reconstruction improved with knowledge of higher level relationships between cells and genomic barcodes. NAR Genom Bioinform 2023; 5:lqad077. [PMID: 37608801 PMCID: PMC10440785 DOI: 10.1093/nargab/lqad077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Revised: 06/26/2023] [Accepted: 08/11/2023] [Indexed: 08/24/2023] Open
Abstract
Tracking cells as they divide and progress through differentiation is a fundamental step in understanding many biological processes, such as the development of organisms and progression of diseases. In this study, we investigate a machine learning approach to reconstruct lineage trees in experimental systems based on mutating synthetic genomic barcodes. We refine previously proposed methodology by embedding information of higher level relationships between cells and single-cell barcode values into a feature space. We test performance of the algorithm on shallow trees (up to 100 cells) and deep trees (up to 10 000 cells). Our proposed algorithm can improve tree reconstruction accuracy in comparison to reconstructions based on a maximum parsimony method, but this comes at a higher computational time requirement.
Affiliation(s)
- Alisa Prusokiene
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Renata Retkute
  - Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge CB2 3EA, UK
|
25
|
Baldwin E, McNair M, Leebens-Mack J. Rampant chloroplast capture in Sarracenia revealed by plastome phylogeny. Front Plant Sci 2023; 14:1237749. [PMID: 37711293 PMCID: PMC10497973 DOI: 10.3389/fpls.2023.1237749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 07/20/2023] [Indexed: 09/16/2023]
Abstract
Introgression can produce novel genetic variation in organisms that hybridize. Sympatric species pairs in the carnivorous plant genus Sarracenia L. frequently hybridize, and all known hybrids are fertile. Despite being a desirable system for studying the evolutionary consequences of hybridization, knowledge of the extent to which introgression occurs in the genus is limited to a few species in only two field sites. Previous phylogenomic analysis of Sarracenia estimated a highly resolved species tree from 199 nuclear genes, but revealed a plastid genome that is highly discordant with the species tree. Such cytonuclear discordance could be caused by chloroplast introgression (i.e. chloroplast capture) or incomplete lineage sorting (ILS). To better understand the extent to which introgression is occurring in Sarracenia, the chloroplast capture and ILS hypotheses were formally evaluated. Plastomes were assembled de novo from sequencing reads generated from 17 individuals in addition to reads obtained from the previous study. Assemblies of 14 whole plastomes were generated and annotated, and the remaining fragmented assemblies were scaffolded to these whole-plastome assemblies. Coding sequences from 79 homologous genes were aligned and concatenated for maximum-likelihood phylogeny estimation. The plastome tree is extremely discordant with the published species tree. Plastome trees were simulated under the coalescent and tree distance from the species tree was calculated to generate a null distribution of discordance that is expected under ILS alone. A t-test rejected the null hypothesis that ILS could cause the level of discordance seen in the plastome tree, suggesting that chloroplast capture must be invoked to explain the discordance. Due to the extreme level of discordance in the plastome tree, it is likely that chloroplast capture has been common in the evolutionary history of Sarracenia.
Affiliation(s)
- Ethan Baldwin
  - Department of Plant Biology, University of Georgia, Athens, GA, United States
- Mason McNair
  - Department of Plant & Environmental Science, Clemson University, Florence, SC, United States
- Jim Leebens-Mack
  - Department of Plant Biology, University of Georgia, Athens, GA, United States
|
26
|
Struck TH, Golombek A, Hoesel C, Dimitrov D, Elgetany AH. Mitochondrial Genome Evolution in Annelida-A Systematic Study on Conservative and Variable Gene Orders and the Factors Influencing its Evolution. Syst Biol 2023; 72:925-945. [PMID: 37083277 PMCID: PMC10405356 DOI: 10.1093/sysbio/syad023] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/17/2022] [Revised: 04/15/2023] [Accepted: 04/18/2023] [Indexed: 04/22/2023] Open
Abstract
The mitochondrial genomes of Bilateria are relatively conserved in their protein-coding, rRNA, and tRNA gene complement, but the order of these genes can range from very conserved to very variable depending on the taxon. The supposedly conserved gene order of Annelida has been used to support the placement of some taxa within Annelida. Recently, authors have cast doubts on the conserved nature of the annelid gene order. Various factors may influence gene order variability including, among others, increased substitution rates, base composition differences, structure of noncoding regions, parasitism, living in extreme habitats, short generation times, and biomineralization. However, these analyses were neither done systematically nor based on well-established reference trees. Several focused on only a few of these factors and biological factors were usually explored ad-hoc without rigorous testing or correlation analyses. Herein, we investigated the variability and evolution of the annelid gene order and the factors that potentially influenced its evolution, using a comprehensive and systematic approach. The analyses were based on 170 genomes, including 33 previously unrepresented species. Our analyses included 706 different molecular properties, 20 life-history and ecological traits, and a reference tree corresponding to recent improvements concerning the annelid tree. The results showed that the gene order with and without tRNAs is generally conserved. However, individual taxa exhibit higher degrees of variability. None of the analyzed life-history and ecological traits explained the observed variability across mitochondrial gene orders. In contrast, the combination and interaction of the best-predicting factors for substitution rate and base composition explained up to 30% of the observed variability. 
Accordingly, correlation analyses of different molecular properties of the mitochondrial genomes showed an intricate network of direct and indirect correlations between the different molecular factors. Hence, gene order evolution seems to be driven by molecular evolutionary aspects rather than by life history or ecology. On the other hand, variability of the gene order does not predict if a taxon is difficult to place in molecular phylogenetic reconstructions using sequence data or not. We also discuss the molecular properties of annelid mitochondrial genomes considering canonical views on gene evolution and potential reasons why the canonical views do not always fit to the observed patterns without making some adjustments. [Annelida; compositional biases; ecology; gene order; life history; macroevolution; mitochondrial genomes; substitution rates.].
Affiliation(s)
- Torsten H Struck
  - Natural History Museum, University of Oslo, P.O. Box 1172, Blindern, 0318 Oslo, Norway
  - Centre of Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Bonn 53113, Germany
  - FB05 Biology/Chemistry, University of Osnabrück, Osnabrück 49069, Germany
- Anja Golombek
  - Centre of Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Bonn 53113, Germany
  - FB05 Biology/Chemistry, University of Osnabrück, Osnabrück 49069, Germany
- Christoph Hoesel
  - FB05 Biology/Chemistry, University of Osnabrück, Osnabrück 49069, Germany
- Dimitar Dimitrov
  - Department of Natural History, University Museum of Bergen, University of Bergen, P.O. Box 7800, 5020 Bergen, Norway
- Asmaa Haris Elgetany
  - Natural History Museum, University of Oslo, P.O. Box 1172, Blindern, 0318 Oslo, Norway
  - Zoology Department, Faculty of Science, Damietta University, New Damietta, Central zone, 34517, Egypt
|
27
|
Guerrini V, Conte A, Grossi R, Liti G, Rosone G, Tattini L. phyBWT2: phylogeny reconstruction via eBWT positional clustering. Algorithms Mol Biol 2023; 18:11. [PMID: 37537624 PMCID: PMC10399073 DOI: 10.1186/s13015-023-00232-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 06/10/2023] [Indexed: 08/05/2023] Open
Abstract
BACKGROUND: Molecular phylogenetics studies the evolutionary relationships among the individuals of a population through their biological sequences. It may provide insights about the origin and the evolution of viral diseases, or highlight complex evolutionary trajectories. A key task is inferring phylogenetic trees from any type of sequencing data, including raw short reads. Yet, several tools require pre-processed input data, e.g. from complex computational pipelines based on de novo assembly or from mappings against a reference genome. As sequencing technologies keep becoming cheaper, this puts increasing pressure on designing methods that perform analysis directly on their outputs. From this viewpoint, there is a growing interest in alignment-, assembly-, and reference-free methods that could work on several data types, including raw read data.
RESULTS: We present phyBWT2, a newly improved version of phyBWT (Guerrini et al. in 22nd International Workshop on Algorithms in Bioinformatics (WABI) 242:23:1-23:19, 2022). Both of them directly reconstruct phylogenetic trees bypassing both the alignment against a reference genome and de novo assembly. They exploit the combinatorial properties of the extended Burrows-Wheeler Transform (eBWT) and the corresponding eBWT positional clustering framework to detect relevant blocks of the longest shared substrings of varying length (unlike the k-mer-based approaches that need to fix the length k a priori). As a result, they provide novel alignment-, assembly-, and reference-free methods that build partition trees without relying on the pairwise comparison of sequences, thus avoiding the use of a distance matrix to infer phylogeny. In addition, phyBWT2 outperforms phyBWT in terms of running time, as the former reconstructs phylogenetic trees step-by-step by considering multiple partitions, instead of just one partition at a time, as previously done by the latter.
CONCLUSIONS: Based on the results of the experiments on sequencing data, we conclude that our method can produce trees of quality comparable to the benchmark phylogeny by handling datasets of different types (short reads, contigs, or entire genomes). Overall, the experiments confirm the effectiveness of phyBWT2 that improves the performance of its previous version phyBWT, while preserving the accuracy of the results.
Affiliation(s)
- Alessio Conte
  - Dipartimento di Informatica, University of Pisa, Pisa, Italy
- Roberto Grossi
  - Dipartimento di Informatica, University of Pisa, Pisa, Italy
- Gianni Liti
  - CNRS UMR 7284, INSERM U1081 Université Côte d'Azur, Nice, France
- Giovanna Rosone
  - Dipartimento di Informatica, University of Pisa, Pisa, Italy
- Lorenzo Tattini
  - CNRS UMR 7284, INSERM U1081 Université Côte d'Azur, Nice, France
|
28
|
Weisbecker V, Beck RMD, Guillerme T, Harrington AR, Lange-Hodgson L, Lee MSY, Mardon K, Phillips MJ. Multiple modes of inference reveal less phylogenetic signal in marsupial basicranial shape compared with the rest of the cranium. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220085. [PMID: 37183893 PMCID: PMC10184248 DOI: 10.1098/rstb.2022.0085] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/15/2022] [Accepted: 12/17/2022] [Indexed: 05/16/2023] Open
Abstract
Incorporating morphological data into modern phylogenies allows integration of fossil evidence, facilitating divergence dating and macroevolutionary inferences. Improvements in the phylogenetic utility of morphological data have been sought via Procrustes-based geometric morphometrics (GMM), but with mixed success and little clarity over what anatomical areas are most suitable. Here, we assess GMM-based phylogenetic reconstructions in a heavily sampled source of discrete characters for mammalian phylogenetics (the basicranium) in 57 species of marsupial mammals, compared with the remainder of the cranium. We show less phylogenetic signal in the basicranium compared with a 'Rest of Cranium' partition, using diverse metrics of phylogenetic signal (Kmult, phylogenetically aligned principal components analysis, comparisons of UPGMA/neighbour-joining/parsimony trees and cophenetic distances to a reference phylogeny) for scaled, Procrustes-aligned landmarks and allometry-corrected residuals. Surprisingly, a similar pattern emerged from parsimony-based analyses of discrete cranial characters. The consistent results across methods suggest that easily computed metrics such as Kmult can provide good guidance on phylogenetic information in a landmarking configuration. In addition, GMM data may be less informative for intricate but conservative anatomical regions such as the basicranium, while better (but not necessarily novel) phylogenetic information can be expected for broadly characterized shapes such as entire bones. This article is part of the theme issue 'The mammalian skull: development, structure and function'.
Affiliation(s)
- Vera Weisbecker
  - College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Robin M. D. Beck
  - School of Science, Engineering and Environment, University of Salford, Salford, M5 4WT, UK
- Thomas Guillerme
  - School of Biosciences, University of Sheffield, Sheffield, S10 2TN, UK
- Leonie Lange-Hodgson
  - School of Biological Sciences, University of Queensland, Saint Lucia, Queensland, 4072, Australia
- Michael S. Y. Lee
  - College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
  - Earth Sciences Section, South Australian Museum, Adelaide, South Australia 5000, Australia
- Karine Mardon
  - Centre of Advanced Imaging, University of Queensland, Saint Lucia, Queensland, 4072, Australia
- Matthew J. Phillips
  - School of Biology & Environmental Science, Queensland University of Technology, Brisbane, Queensland, 4000, Australia
|
29
|
Arora J, Buček A, Hellemans S, Beránková T, Arias JR, Fisher BL, Clitheroe C, Brune A, Kinjo Y, Šobotník J, Bourguignon T. Evidence of cospeciation between termites and their gut bacteria on a geological time scale. Proc Biol Sci 2023; 290:20230619. [PMID: 37339742 DOI: 10.1098/rspb.2023.0619] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/14/2023] [Accepted: 05/24/2023] [Indexed: 06/22/2023] Open
Abstract
Termites host diverse communities of gut microbes, including many bacterial lineages only found in this habitat. The bacteria endemic to termite guts are transmitted via two routes: a vertical route from parent colonies to daughter colonies and a horizontal route between colonies sometimes belonging to different termite species. The relative importance of both transmission routes in shaping the gut microbiota of termites remains unknown. Using bacterial marker genes derived from the gut metagenomes of 197 termites and one Cryptocercus cockroach, we show that bacteria endemic to termite guts are mostly transferred vertically. We identified 18 lineages of gut bacteria showing cophylogenetic patterns with termites over tens of millions of years. Horizontal transfer rates estimated for 16 bacterial lineages were within the range of those estimated for 15 mitochondrial genes, suggesting that horizontal transfers are uncommon and vertical transfers are the dominant transmission route in these lineages. Some of these associations probably date back more than 150 million years and are an order of magnitude older than the cophylogenetic patterns between mammalian hosts and their gut bacteria. Our results suggest that termites have cospeciated with their gut bacteria since first appearing in the geological record.
Affiliation(s)
- Jigyasa Arora
  - Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan
- Aleš Buček
  - Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan
  - Faculty of Tropical AgriScience, Czech University of Life Sciences, Kamýcká 129, Suchdol, 165 00, Prague 6, Czech Republic
- Simon Hellemans
  - Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan
- Tereza Beránková
  - Faculty of Tropical AgriScience, Czech University of Life Sciences, Kamýcká 129, Suchdol, 165 00, Prague 6, Czech Republic
- Johanna Romero Arias
  - Faculty of Tropical AgriScience, Czech University of Life Sciences, Kamýcká 129, Suchdol, 165 00, Prague 6, Czech Republic
- Brian L Fisher
  - Madagascar Biodiversity Center, Parc Botanique et Zoologique de Tsimbazaza, Antananarivo 101, Madagascar
  - California Academy of Sciences, San Francisco, CA, USA
- Crystal Clitheroe
  - Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan
- Andreas Brune
  - Research Group Insect Gut Microbiology and Symbiosis, Max Planck Institute for Terrestrial Microbiology, Marburg, 35043, Germany
- Yukihiro Kinjo
  - Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan
  - College of Economics and Environmental Policy, Okinawa International University, 2-6-1 Ginowan, Ginowan, 901-2701, Okinawa, Japan
- Jan Šobotník
  - Faculty of Tropical AgriScience, Czech University of Life Sciences, Kamýcká 129, Suchdol, 165 00, Prague 6, Czech Republic
  - College of Economics and Environmental Policy, Okinawa International University, 2-6-1 Ginowan, Ginowan, 901-2701, Okinawa, Japan
- Thomas Bourguignon
  - Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan
  - Faculty of Tropical AgriScience, Czech University of Life Sciences, Kamýcká 129, Suchdol, 165 00, Prague 6, Czech Republic
|
30
|
Simões TR, Vernygora OV, de Medeiros BAS, Wright AM. Handling Logical Character Dependency in Phylogenetic Inference: Extensive Performance Testing of Assumptions and Solutions Using Simulated and Empirical Data. Syst Biol 2023; 72:662-680. [PMID: 36773019 PMCID: PMC10276625 DOI: 10.1093/sysbio/syad006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Revised: 12/08/2022] [Accepted: 02/09/2023] [Indexed: 02/12/2023] Open
Abstract
Logical character dependency is a major conceptual and methodological problem in phylogenetic inference of morphological data sets, as it violates the assumption of character independence that is common to all phylogenetic methods. It is more frequently observed in higher-level phylogenies or in data sets characterizing major evolutionary transitions, as these represent parts of the tree of life where (primary) anatomical characters either originate or disappear entirely. As a result, secondary traits related to these primary characters become "inapplicable" across all sampled taxa in which that character is absent. Various solutions have been explored over the last three decades to handle character dependency, such as alternative character coding schemes and, more recently, new algorithmic implementations. However, the accuracy of the proposed solutions, or the impact of character dependency across distinct optimality criteria, has never been directly tested using standard performance measures. Here, we utilize simple and complex simulated morphological data sets analyzed under different maximum parsimony optimization procedures and Bayesian inference to test the accuracy of various coding and algorithmic solutions to character dependency. This is complemented by empirical analyses using a recoded data set on palaeognathid birds. We find that in small, simulated data sets, absent coding performs better than other popular coding strategies available (contingent and multistate), whereas in more complex simulations (larger data sets controlled for different tree structure and character distribution models) contingent coding is favored more frequently. Under contingent coding, a recently proposed weighting algorithm produces the most accurate results for maximum parsimony. 
However, Bayesian inference outperforms all parsimony-based solutions to handle character dependency due to fundamental differences in their optimization procedures, a simple alternative that has been long overlooked. Yet, we show that the more primary characters bearing secondary (dependent) traits there are in a data set, the harder it is to estimate the true phylogenetic tree, regardless of the optimality criterion, owing to a considerable expansion of the tree parameter space. [Bayesian inference, character dependency, character coding, distance metrics, morphological phylogenetics, maximum parsimony, performance, phylogenetic accuracy.]
Affiliation(s)
- Tiago R Simões
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts, USA
| | - Oksana V Vernygora
- Department of Entomology, University of Kentucky, Lexington, Kentucky, USA
| | | | - April M Wright
- Department of Biological Sciences, Southeastern Louisiana University, Hammond, Louisiana, USA
| |
|
31
|
Stokes MF, Kim D, Gallen SF, Benavides E, Keck BP, Wood J, Goldberg SL, Larsen IJ, Mollish JM, Simmons JW, Near TJ, Perron JT. Erosion of heterogeneous rock drives diversification of Appalachian fishes. Science 2023; 380:855-859. [PMID: 37228195 DOI: 10.1126/science.add9791] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2022] [Accepted: 04/25/2023] [Indexed: 05/27/2023]
Abstract
The high levels of biodiversity supported by mountains suggest a possible link between geologic processes and biological evolution. Freshwater biodiversity is high not only in tectonically active settings but also in tectonically quiescent montane regions such as the Appalachian Mountains. We show that erosion through different rock types drove allopatric divergence between lineages of the Greenfin Darter (Nothonotus chlorobranchius), a fish species endemic to rivers draining metamorphic rocks in the Tennessee River basin in the United States. In the past, metamorphic rock preferred by N. chlorobranchius was more widespread, but as erosion exposed other rock types, lineages of this species were progressively isolated in tributaries farther upstream, where metamorphic rock remained. Our results suggest a geologic mechanism for initiating allopatric diversification in mountains long after tectonic activity ceases.
Affiliation(s)
- Maya F Stokes
- Yale Institute for Biospheric Studies, New Haven, CT 06511, USA
- Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Earth, Ocean, and Atmospheric Science, Florida State University, Tallahassee, FL 32304, USA
| | - Daemin Kim
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06511, USA
| | - Sean F Gallen
- Department of Geosciences, Colorado State University, Fort Collins, CO 80523, USA
| | - Edgar Benavides
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06511, USA
| | - Benjamin P Keck
- Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, TN 37996, USA
| | - Julia Wood
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06511, USA
| | - Samuel L Goldberg
- Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Isaac J Larsen
- Department of Earth, Geographic, and Climate Sciences, University of Massachusetts, Amherst, MA 01003, USA
| | - Jon Michael Mollish
- Fisheries and Aquatic Monitoring, Tennessee Valley Authority, Chattanooga, TN 37415, USA
| | - Jeffrey W Simmons
- Fisheries and Aquatic Monitoring, Tennessee Valley Authority, Chattanooga, TN 37415, USA
| | - Thomas J Near
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06511, USA
| | - J Taylor Perron
- Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| |
|
32
|
Ziegler C, Martin J, Sinner C, Morcos F. Latent generative landscapes as maps of functional diversity in protein sequence space. Nat Commun 2023; 14:2222. [PMID: 37076519 PMCID: PMC10113739 DOI: 10.1038/s41467-023-37958-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2022] [Accepted: 04/05/2023] [Indexed: 04/21/2023] Open
Abstract
Variational autoencoders are unsupervised learning models with generative capabilities; when applied to protein data, they classify sequences by phylogeny and generate de novo sequences which preserve statistical properties of protein composition. While previous studies focus on clustering and generative features, here, we evaluate the underlying latent manifold in which sequence information is embedded. To investigate properties of the latent manifold, we utilize direct coupling analysis and a Potts Hamiltonian model to construct a latent generative landscape. We showcase how this landscape captures phylogenetic groupings, functional and fitness properties of several systems including Globins, β-lactamases, ion channels, and transcription factors. We provide support on how the landscape helps us understand the effects of sequence variability observed in experimental data and provides insights on directed and natural protein evolution. We propose that combining generative properties and functional predictive power of variational autoencoders and coevolutionary analysis could be beneficial in applications for protein engineering and design.
Affiliation(s)
- Cheyenne Ziegler
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
| | - Jonathan Martin
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
| | - Claude Sinner
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
| | - Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA.
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX, 75080, USA.
- Center for Systems Biology, University of Texas at Dallas, Richardson, TX, 75080, USA.
| |
|
33
|
Moravec JC, Lanfear R, Spector DL, Diermeier SD, Gavryushkin A. Testing for Phylogenetic Signal in Single-Cell RNA-Seq Data. J Comput Biol 2023; 30:518-537. [PMID: 36475926 PMCID: PMC10125402 DOI: 10.1089/cmb.2022.0357] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Abstract
Phylogenetic methods are emerging as a useful tool to understand cancer evolutionary dynamics, including tumor structure, heterogeneity, and progression. Most currently used approaches utilize either bulk whole genome sequencing or single-cell DNA sequencing and are based on calling copy number alterations and single nucleotide variants (SNVs). Single-cell RNA sequencing (scRNA-seq) is commonly applied to explore differential gene expression of cancer cells throughout tumor progression. The method exacerbates the single-cell sequencing problem of low yield per cell with uneven expression levels. This accounts for low and uneven sequencing coverage and makes SNV detection and phylogenetic analysis challenging. In this article, we demonstrate for the first time that scRNA-seq data contain sufficient evolutionary signal and can also be utilized in phylogenetic analyses. We explore and compare results of such analyses based on both expression levels and SNVs called from scRNA-seq data. Both techniques are shown to be useful for reconstructing phylogenetic relationships between cells, reflecting the clonal composition of a tumor. Both standardized expression values and SNVs appear to be equally capable of reconstructing a similar pattern of phylogenetic relationship. This pattern is stable even when phylogenetic uncertainty is taken into account. Our results open up a new direction of somatic phylogenetics based on scRNA-seq data. Further research is required to refine and improve these approaches to capture the full picture of somatic evolutionary dynamics in cancer.
Affiliation(s)
- Jiří C. Moravec
- Department of Computer Science, University of Otago, Dunedin, New Zealand
- School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
| | - Robert Lanfear
- Division of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australia
| | | | | | - Alex Gavryushkin
- School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
| |
|
34
|
Laslo M, Just J, Angelini DR. Theme and variation in the evolution of insect sex determination. JOURNAL OF EXPERIMENTAL ZOOLOGY. PART B, MOLECULAR AND DEVELOPMENTAL EVOLUTION 2023; 340:162-181. [PMID: 35239250 PMCID: PMC10078687 DOI: 10.1002/jez.b.23125] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/07/2021] [Revised: 11/24/2021] [Accepted: 01/03/2022] [Indexed: 11/07/2022]
Abstract
The development of dimorphic adult sexes is a critical process for most animals, one that is subject to intense selection. Work in vertebrate and insect model species has revealed that sex determination mechanisms vary widely among animal groups. However, this variation is not uniform, with a limited number of conserved factors. Therefore, sex determination offers an excellent context to consider themes and variations in gene network evolution. Here we review the literature describing sex determination in diverse insects. We have screened public genomic sequence databases for orthologs and duplicates of 25 genes involved in insect sex determination, identifying patterns of presence and absence. These genes and a reference set of 43 others were used to infer phylogenies and compared to accepted organismal relationships to examine patterns of congruence and divergence. The function of candidate genes for roles in sex determination (virilizer, female-lethal-2-d, transformer-2) and sex chromosome dosage compensation (male specific lethal-1, msl-2, msl-3) were tested using RNA interference in the milkweed bug, Oncopeltus fasciatus. None of these candidate genes exhibited conserved roles in these processes. Amidst this variation we wish to highlight the following themes for the evolution of sex determination: (1) Unique features within taxa influence network evolution. (2) Their position in the network influences a component's evolution. Our analyses also suggest an inverse association of protein sequence conservation with functional conservation.
Affiliation(s)
- Mara Laslo
- Department of Cell Biology, Curriculum Fellows Program, Harvard Medical School, 25 Shattuck St, Boston, Massachusetts, USA
| | - Josefine Just
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford St, Cambridge, Massachusetts, USA
- Department of Biology, Colby College, 5734 Mayflower Hill Dr, Waterville, Maine, USA
| | - David R. Angelini
- Department of Biology, Colby College, 5734 Mayflower Hill Dr, Waterville, Maine, USA
| |
|
35
|
Zhang Z, Smith MR, Ren X. The Cambrian cirratuliform Iotuba denotes an early annelid radiation. Proc Biol Sci 2023; 290:20222014. [PMID: 36722078 PMCID: PMC9890102 DOI: 10.1098/rspb.2022.2014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023] Open
Abstract
The principal animal lineages (phyla) diverged in the Cambrian, but most diversity at lower taxonomic ranks arose more gradually over the subsequent 500 Myr. Annelid worms seem to exemplify this pattern, based on molecular analyses and the fossil record: Cambrian Burgess Shale-type deposits host a single, early-diverging crown-group annelid alongside a morphologically and taxonomically conservative stem group; the polychaete sub-classes diverge in the Ordovician; and many orders and families are first documented in Carboniferous Lagerstätten. Fifteen new fossils of the 'phoronid' Iotuba (=Eophoronis) chengjiangensis from the early Cambrian Chengjiang Lagerstätte challenge this picture. A chaetal cephalic cage surrounds a retractile head with branchial plates, affiliating Iotuba with the derived polychaete families 'Flabelligeridae' and Acrocirridae. Unless this similarity represents profound convergent evolution, this relationship would pull back the origin of the nested crown groups of Cirratuliformia, Sedentaria and Pleistoannelida by tens of millions of years, indicating a dramatic unseen origin of modern annelid diversity in the heat of the Cambrian 'explosion'.
Affiliation(s)
- ZhiFei Zhang
- State Key Laboratory of Continental Dynamics, Shaanxi Key Laboratory of Early Life and Environments and Department of Geology, Northwest University, Xi'an 710069, People's Republic of China
| | - Martin R. Smith
- Department of Earth Sciences, Durham University, Mountjoy Site, South Road, Durham DH1 3LE, UK
| | - XinYi Ren
- State Key Laboratory of Continental Dynamics, Shaanxi Key Laboratory of Early Life and Environments and Department of Geology, Northwest University, Xi'an 710069, People's Republic of China
| |
Collapse
|
36
|
Sharifi Far S, Inácio V, Paulin D, de Carvalho M, Augustin N, Allerhand M, Robertson G. Consultancy Style Dissertations in Statistics and Data Science: Why and How. AM STAT 2023. [DOI: 10.1080/00031305.2022.2163689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
Affiliation(s)
- Serveh Sharifi Far
- School of Mathematics and Maxwell Institute for Mathematical Sciences, University of Edinburgh, Edinburgh, UK
| | - Vanda Inácio
- School of Mathematics and Maxwell Institute for Mathematical Sciences, University of Edinburgh, Edinburgh, UK
| | - Daniel Paulin
- School of Mathematics and Maxwell Institute for Mathematical Sciences, University of Edinburgh, Edinburgh, UK
| | - Miguel de Carvalho
- School of Mathematics and Maxwell Institute for Mathematical Sciences, University of Edinburgh, Edinburgh, UK
| | - Nicole Augustin
- School of Mathematics and Maxwell Institute for Mathematical Sciences, University of Edinburgh, Edinburgh, UK
| | - Mike Allerhand
- School of Mathematics and Maxwell Institute for Mathematical Sciences, University of Edinburgh, Edinburgh, UK
| | - Gail Robertson
- School of Mathematics and Maxwell Institute for Mathematical Sciences, University of Edinburgh, Edinburgh, UK
| |
|
37
|
Ancient origin and constrained evolution of the division and cell wall gene cluster in Bacteria. Nat Microbiol 2022; 7:2114-2127. [PMID: 36411352 DOI: 10.1038/s41564-022-01257-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2022] [Accepted: 09/23/2022] [Indexed: 11/22/2022]
Abstract
The division and cell wall (dcw) gene cluster in Bacteria comprises 17 genes encoding key steps in peptidoglycan synthesis and cytokinesis. To understand the origin and evolution of this cluster, we analysed its presence in over 1,000 bacterial genomes. We show that the dcw gene cluster is strikingly conserved in both gene content and gene order across all Bacteria and has undergone only a few rearrangements in some phyla, potentially linked to cell envelope specificities, but not directly to cell shape. A large concatenation of the 12 most conserved dcw cluster genes produced a robust tree of Bacteria that is largely consistent with recent phylogenies based on frequently used markers. Moreover, evolutionary divergence analyses show that the dcw gene cluster offers advantages in defining high-rank taxonomic boundaries and indicate at least two main phyla in the Candidate Phyla Radiation (CPR) matching a sharp dichotomy in dcw gene cluster arrangement. Our results place the origin of the dcw gene cluster in the Last Bacterial Common Ancestor and show that it has evolved vertically for billions of years, similar to major cellular machineries such as the ribosome. The strong phylogenetic signal, combined with conserved genomic synteny at large evolutionary distances, makes the dcw gene cluster a robust alternative set of markers to resolve the ever-growing tree of Bacteria.
|
38
|
Li T, Yin Y. Critical assessment of pan-genomic analysis of metagenome-assembled genomes. Brief Bioinform 2022; 23:6702672. [PMID: 36124775 PMCID: PMC9677465 DOI: 10.1093/bib/bbac413] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2022] [Revised: 08/23/2022] [Accepted: 08/26/2022] [Indexed: 12/30/2022] Open
Abstract
Pan-genome analyses of metagenome-assembled genomes (MAGs) may suffer from the known issues with MAGs: fragmentation, incompleteness and contamination. Here, we conducted a critical assessment of pan-genomics of MAGs, by comparing pan-genome analysis results of complete bacterial genomes and simulated MAGs. We found that incompleteness led to significant core gene (CG) loss. The CG loss remained when using different pan-genome analysis tools (Roary, BPGA, Anvi'o) and when using a mixture of MAGs and complete genomes. Contamination had little effect on core genome size (except for Roary, due to its gene clustering issue) but had a major influence on accessory genomes. Importantly, the CG loss was partially alleviated by lowering the CG threshold and using gene prediction algorithms that consider fragmented genes, but to a lesser degree when incompleteness was higher than 5%. The CG loss also led to incorrect pan-genome functional predictions and inaccurate phylogenetic trees. Our main findings were supported by a study of real MAG-isolate genome data. We conclude that lowering the CG threshold and predicting genes in metagenome mode (as Anvi'o does with Prodigal) are necessary in pan-genome analysis of MAGs. Development of new pan-genome analysis tools specifically for MAGs is needed in future studies.
Affiliation(s)
- Tang Li
- Nebraska Food for Health Center, Department of Food Science and Technology, University of Nebraska - Lincoln, Lincoln, NE, 68508, USA
| | - Yanbin Yin
- Corresponding author. Yanbin Yin, Nebraska Food for Health Center, Department of Food Science and Technology, University of Nebraska - Lincoln, Lincoln, NE 68508, USA. Tel.: +1-402-472-4303; E-mail:
| |
|
39
|
Kaufmann TL, Petkovic M, Watkins TBK, Colliver EC, Laskina S, Thapa N, Minussi DC, Navin N, Swanton C, Van Loo P, Haase K, Tarabichi M, Schwarz RF. MEDICC2: whole-genome doubling aware copy-number phylogenies for cancer evolution. Genome Biol 2022; 23:241. [PMID: 36376909 PMCID: PMC9661799 DOI: 10.1186/s13059-022-02794-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2021] [Accepted: 10/12/2022] [Indexed: 11/16/2022] Open
Abstract
Aneuploidy, chromosomal instability, somatic copy-number alterations, and whole-genome doubling (WGD) play key roles in cancer evolution and provide information for the complex task of phylogenetic inference. We present MEDICC2, a method for inferring evolutionary trees and WGD using haplotype-specific somatic copy-number alterations from single-cell or bulk data. MEDICC2 eschews simplifications such as the infinite sites assumption, allowing multiple mutations and parallel evolution, and does not treat adjacent loci as independent, allowing overlapping copy-number events. Using simulations and multiple data types from 2780 tumors, we use MEDICC2 to demonstrate accurate inference of phylogenies, clonal and subclonal WGD, and ancestral copy-number states.
Affiliation(s)
- Tom L Kaufmann
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125, Berlin, Germany.
- Department of Electrical Engineering & Computer Science, Technische Universität Berlin, Marchstr. 23, 10587, Berlin, Germany.
- BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany.
| | - Marina Petkovic
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125, Berlin, Germany
- Department of Biology, Humboldt University of Berlin, Unter den Linden 6, 10099, Berlin, Germany
- Division of Oncology and Hematology, Department of Pediatrics, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Augustenburger Platz 1, 13353, Berlin, Germany
| | | | | | - Sofya Laskina
- Department of Mathematics and Computer Science, Free University of Berlin, Berlin, Germany
| | - Nisha Thapa
- UCL Medical School, University College London, London, UK
| | - Darlan C Minussi
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Nicholas Navin
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Charles Swanton
- The Francis Crick Institute, London, UK
- Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK
- Department of Medical Oncology, University College London Hospitals, London, UK
| | - Peter Van Loo
- The Francis Crick Institute, London, UK
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Kerstin Haase
- Division of Oncology and Hematology, Department of Pediatrics, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Augustenburger Platz 1, 13353, Berlin, Germany
- German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Maxime Tarabichi
- The Francis Crick Institute, London, UK
- Institute for Interdisciplinary Research, Université Libre de Bruxelles, Brussels, Belgium
| | - Roland F Schwarz
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125, Berlin, Germany.
- BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany.
- Institute for Computational Cancer Biology, Center for Integrated Oncology (CIO) and Cancer Research Center Cologne Essen (CCCE), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany.
| |
|
40
|
Balancing Trade-Offs Imposed by Growth Media and Mass Spectrometry for Bacterial Exometabolomics. Appl Environ Microbiol 2022; 88:e0092222. [PMID: 36197102 PMCID: PMC9599359 DOI: 10.1128/aem.00922-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
The bacterial exometabolome consists of a vast array of specialized metabolites, many of which are only produced in response to specific environmental stimuli. For this reason, it is desirable to control the extracellular environment with a defined growth medium composed of pure ingredients. However, complex (undefined) media are expected to support the robust growth of a greater variety of microorganisms than defined media. Here, we investigate the trade-offs inherent to a range of complex and defined solid media for the growth of soil microorganisms, production of specialized metabolites, and detection of these compounds using direct infusion mass spectrometry. We find that complex media support growth of more soil microorganisms, as well as allowing for the detection of more previously discovered natural products as a fraction of total m/z features detected in each sample. However, the use of complex media often caused mass spectrometer injection failures and poor-quality mass spectra, which in some cases resulted in over a quarter of samples being removed from analysis. Defined media, while more limiting in growth, generated higher quality spectra and yielded more m/z features after background subtraction. These results inform future exometabolomic experiments requiring a medium that supports the robust growth of many soil microorganisms. IMPORTANCE Bacteria are capable of producing and secreting a rich diversity of specialized metabolites. Yet, much of their exometabolome remains hidden due to challenges associated with eliciting specialized metabolite production, labor-intensive sample preparation, and time-consuming analysis techniques. Using our versatile three-dimensional (3D)-printed culturing platform, SubTap, we demonstrate that rapid exometabolomic data collection from a diverse set of environmental bacteria is feasible. 
We optimized our platform by surveying Streptomyces isolated from soil on a variety of media types to assess viability, degree of specialized metabolite production, and compatibility with downstream LESA-DIMS analysis. Ultimately, this will enable data-rich experimentation, allowing for a better understanding of bacterial exometabolomes.
|
41
|
Smith BT, Merwin J, Provost KL, Thom G, Brumfield RT, Ferreira M, Mauck III WM, Moyle RG, Wright T, Joseph L. Phylogenomic analysis of the parrots of the world distinguishes artifactual from biological sources of gene tree discordance. Syst Biol 2022; 72:228-241. [PMID: 35916751 DOI: 10.1093/sysbio/syac055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2021] [Revised: 02/22/2022] [Accepted: 07/22/2022] [Indexed: 11/14/2022] Open
Abstract
Gene tree discordance is expected in phylogenomic trees and biological processes are often invoked to explain it. However, heterogeneous levels of phylogenetic signal among individuals within datasets may cause artifactual sources of topological discordance. We examined how the information content in tips and subclades impacts topological discordance in the parrots (Order: Psittaciformes), a diverse and highly threatened clade of nearly 400 species. Using ultraconserved elements from 96% of the clade's species-level diversity, we estimated concatenated and species trees for 382 ingroup taxa. We found that discordance among tree topologies was most common at nodes dating between the late Miocene and Pliocene, and often at the taxonomic level of genus. Accordingly, we used two metrics to characterize information content in tips and assess the degree to which conflict between trees was being driven by lower quality samples. Most instances of topological conflict and non-monophyletic genera in the species tree could be objectively identified using these metrics. For subclades still discordant after tip-based filtering, we used a machine learning approach to determine whether phylogenetic signal or noise was the more important predictor of metrics supporting the alternative topologies. We found that when signal favored one of the topologies, noise was the most important variable in poorly performing models that favored the alternative topology. In sum, we show that artifactual sources of gene tree discordance, which are likely a common phenomenon in many datasets, can be distinguished from biological sources by quantifying the information content in each tip and modeling which factors support each topology.
Affiliation(s)
- Brian Tilston Smith
- Department of Ornithology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA
| | - Jon Merwin
- Department of Ornithology, Academy of Natural Sciences of Drexel University, 1900 Benjamin Franklin Parkway, Philadelphia, PA 19103, USA
- Department of Biodiversity, Earth, and Environmental Science, Drexel University, Philadelphia, PA 19103, USA
| | - Kaiya L Provost
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, 318 W. 12th Avenue, Columbus, OH 43210, USA
| | - Gregory Thom
- Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
| | - Robb T Brumfield
- Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
| | - Mateus Ferreira
- Centro de Estudos da Biodiversidade, Universidade Federal de Roraima, Av. Cap. Ene Garcez, 2413, Boa Vista, RR, Brazil
| | - William M Mauck III
- Department of Ornithology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA
| | - Robert G Moyle
- Department of Ecology and Evolutionary Biology and Biodiversity Institute, University of Kansas, 1345 Jayhawk Blvd., Lawrence, KS 66045, USA
| | - Timothy Wright
- Department of Biology, New Mexico State University, Las Cruces, NM, 88003, USA
| | - Leo Joseph
- Australian National Wildlife Collection, National Research Collections Australia, CSIRO, GPO Box 1700, Canberra, ACT, 2601, Australia
| |
|
42
|
Raimondi S, Candeliere F, Amaretti A, Costa S, Vertuani S, Spampinato G, Rossi M. Phylogenomic analysis of the genus Leuconostoc. Front Microbiol 2022; 13:897656. [PMID: 35958134 PMCID: PMC9358442 DOI: 10.3389/fmicb.2022.897656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2022] [Accepted: 06/28/2022] [Indexed: 11/17/2022] Open
Abstract
Leuconostoc is a genus of saccharolytic heterofermentative lactic acid bacteria that inhabit plant-derived matrices and a variety of fermented foods (dairy products, dough, milk, vegetables, and meats), contributing to desired fermentation processes or playing a role in food spoilage. At present, the genus encompasses 17 recognized species. In total, 216 deposited genome sequences of Leuconostoc were analyzed, to check the delineation of species and to infer their evolutive genealogy utilizing a minimum evolution tree of Average Nucleotide Identity (ANI) and the core genome alignment. Phylogenomic relationships were compared to those obtained from the analysis of 16S rRNA, pheS, and rpoA genes. All the phylograms were subjected to split decomposition analysis and their topologies were compared to check the ambiguities in the inferred phylogenesis. The minimum evolution ANI tree exhibited the most similar topology with the core genome tree, while single gene trees were less adherent and provided a weaker phylogenetic signal. In particular, the 16S rRNA gene failed to resolve several bifurcations and Leuconostoc species. Based on an ANI threshold of 95%, the organization of the genus Leuconostoc could be amended, redefining the boundaries of the species L. inhae, L. falkenbergense, L. gelidum, L. lactis, L. mesenteroides, and L. pseudomesenteroides. Two strains currently recognized as L. mesenteroides were split into a separate lineage representing a putative species (G16), phylogenetically related to both L. mesenteroides (G18) and L. suionicum (G17). Differences among the four subspecies of L. mesenteroides were not pinpointed by ANI or by the conserved genes. The strains of L. pseudomesenteroides were ascribed to two putative species, G13 and G14, the former including also all the strains presently belonging to L. falkenbergense. L. lactis was split into two phylogenetically related lineages, G9 and G10, putatively corresponding to separate species and both including subgroups that may correspond to subspecies. The species L. gelidum and L. gasicomitatum were closely related but separated into different species, the latter including also L. inhae strains. These results, integrating information of ANI, core genome, and housekeeping genes, complemented the taxonomic delineation with solid information on the phylogenetic lineages evolved within the genus Leuconostoc.
Affiliation(s)
- Stefano Raimondi
- Department of Life Sciences, University of Modena and Reggio Emilia, Modena, Italy
| | - Francesco Candeliere
- Department of Life Sciences, University of Modena and Reggio Emilia, Modena, Italy
| | - Alberto Amaretti
- Department of Life Sciences, University of Modena and Reggio Emilia, Modena, Italy
- Biogest Siteia, University of Modena and Reggio Emilia, Reggio Emilia, Italy
| | - Stefania Costa
- Department of Chemical, Pharmaceutical and Agricultural Sciences—DOCPAS, University of Ferrara, Ferrara, Italy
| | - Silvia Vertuani
- Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
| | - Gloria Spampinato
- Department of Life Sciences, University of Modena and Reggio Emilia, Modena, Italy
| | - Maddalena Rossi
- Department of Life Sciences, University of Modena and Reggio Emilia, Modena, Italy
- Biogest Siteia, University of Modena and Reggio Emilia, Reggio Emilia, Italy
- *Correspondence: Maddalena Rossi
| |
|
43
|
Aledo JC. Phylogenies from unaligned proteomes using sequence environments of amino acid residues. Sci Rep 2022; 12:7497. [PMID: 35523825 PMCID: PMC9076898 DOI: 10.1038/s41598-022-11370-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2022] [Accepted: 04/21/2022] [Indexed: 11/09/2022] Open
Abstract
Alignment-free methods for sequence comparison and phylogeny inference have attracted a great deal of attention in recent years. Several algorithms have been implemented in diverse software packages. Despite the great number of existing methods, most of them are based on word statistics. Although they propose different filtering and weighting strategies and explore different metrics, their performance may be limited by the phylogenetic signal preserved in these words. Herein, we present a different approach based on the species-specific amino acid neighborhood preferences. These differential preferences can be assessed in the context of vector spaces. In this way, a distance-based method to build phylogenies has been developed and implemented into an easy-to-use R package. Tests run on real-world datasets show that this method can reconstruct phylogenetic relationships with high accuracy, and often outperforms other alignment-free approaches. Furthermore, we present evidence that the new method can perform reliably on datasets formed by non-orthologous protein sequences, that is, the method not only does not require the identification of orthologous proteins, but also does not require their presence in the analyzed dataset. These results suggest that the neighborhood preference of amino acids conveys a phylogenetic signal that may be of great utility in phylogenomics.
Affiliation(s)
- Juan Carlos Aledo
- Department of Molecular Biology and Biochemistry, University of Málaga, 29071, Málaga, Spain.
| |
|
44
|
Briand S, Dessimoz C, El-Mabrouk N, Nevers Y. A Linear Time Solution to the Labeled Robinson-Foulds Distance Problem. Syst Biol 2022; 71:1391-1403. [PMID: 35426933 PMCID: PMC9557742 DOI: 10.1093/sysbio/syac028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/19/2021] [Revised: 04/02/2022] [Accepted: 04/07/2022] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION A large variety of pairwise measures of similarity or dissimilarity have been developed for comparing phylogenetic trees, e.g. species trees or gene trees. Due to its intuitive definition in terms of tree clades and bipartitions and its computational efficiency, the Robinson-Foulds (RF) distance is the most widely used for trees with unweighted edges and labels restricted to leaves (representing the genetic elements being compared). However, in the case of gene trees, an important piece of information revealing the nature of the homologous relation between gene pairs (orthologs, paralogs, xenologs) is the type of event associated with each internal node of the tree, typically speciations or duplications, but other types of events may also be considered, such as horizontal gene transfers. This labeling of internal nodes is usually inferred from a gene tree/species tree reconciliation method. Here, we address the problem of comparing such event-labeled trees. The problem differs from the classical problem of comparing uniformly labeled trees (all labels belonging to the same alphabet), which may be done using the Tree Edit Distance (TED), mainly because, in our case, two different alphabets are considered for the leaves and internal nodes of the tree, and leaves are not affected by edit operations. RESULTS We propose an extension of the RF distance to event-labeled trees, based on edit operations comparable to those considered for TED: node insertion, node deletion and label substitution. We show that this new Labeled Robinson-Foulds (LRF) distance can be computed in linear time, in addition to maintaining other desirable properties: being a metric, reducing to RF for trees with no labels on internal nodes and maintaining an intuitive interpretation. The algorithm for computing the LRF distance enables novel analyses on event-labeled trees such as reconciled gene trees. Here, we use it to study the impact of taxon sampling on labeled gene tree inference, and conclude that denser taxon sampling yields trees with better topology but worse labeling.
Affiliation(s)
- Samuel Briand
- Département d'informatique et de recherche opérationnelle (DIRO), Université de Montréal, Canada
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, Switzerland
- Centre for Life's Origins and Evolution, Genetics Evolution and Environment, University College London, UK
- Department of Computer Science, University College London, UK
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Nadia El-Mabrouk
- Département d'informatique et de recherche opérationnelle (DIRO), Université de Montréal, Canada
| | - Yannis Nevers
- Department of Computational Biology, University of Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
45
|
Pohle A, Kröger B, Warnock RCM, King AH, Evans DH, Aubrechtová M, Cichowolski M, Fang X, Klug C. Early cephalopod evolution clarified through Bayesian phylogenetic inference. BMC Biol 2022; 20:88. [PMID: 35421982 PMCID: PMC9008929 DOI: 10.1186/s12915-022-01284-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/07/2021] [Accepted: 03/22/2022] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Despite the excellent fossil record of cephalopods, their early evolution is poorly understood. Different, partly incompatible phylogenetic hypotheses have been proposed in the past, which reflected individual authors' opinions on the importance of certain characters but were not based on thorough cladistic analyses. At the same time, methods of phylogenetic inference have undergone substantial improvements. For fossil datasets, which typically only include morphological data, Bayesian inference and in particular the introduction of the fossilized birth-death model have opened new possibilities. Nevertheless, many tree topologies recovered from these new methods reflect large uncertainties, which have led to discussions on how to best summarize the information contained in the posterior set of trees. RESULTS We present a large, newly compiled morphological character matrix of Cambrian and Ordovician cephalopods to conduct a comprehensive phylogenetic analysis and resolve existing controversies. Our results recover three major monophyletic groups, which correspond to the previously recognized Endoceratoidea, Multiceratoidea, and Orthoceratoidea, though comprising slightly different taxa. In addition, many Cambrian and Early Ordovician representatives of the Ellesmerocerida and Plectronocerida were recovered near the root. The Ellesmerocerida is para- and polyphyletic, with some of its members recovered among the Multiceratoidea and early Endoceratoidea. These relationships are robust against modifications of the dataset. While our trees initially seem to reflect large uncertainties, these are mainly a consequence of the way clade support is measured. We show that clade posterior probabilities and tree similarity metrics often underestimate congruence between trees, especially if wildcard taxa are involved. CONCLUSIONS Our results provide important insights into the earliest evolution of cephalopods and clarify evolutionary pathways. We provide a classification scheme that is based on a robust phylogenetic analysis. Moreover, we provide some general insights on the application of Bayesian phylogenetic inference to morphological datasets. We support earlier findings that quartet similarity metrics should be preferred over the Robinson-Foulds distance when higher-level phylogenetic relationships are of interest, and propose that using a posteriori pruned maximum clade credibility trees helps in assessing support for phylogenetic relationships among a set of relevant taxa, because they provide clade support values that better reflect the phylogenetic signal.
Affiliation(s)
- Alexander Pohle
- Paläontologisches Institut und Museum, Universität Zürich, Karl-Schmid-Strasse 4, CH-8006, Zürich, Switzerland.
| | - Björn Kröger
- Finnish Museum of Natural History, University of Helsinki, P.O. Box 44, Jyrängöntie 2, FI-00014, Helsinki, Finland
| | - Rachel C M Warnock
- GeoZentrum Nordbayern, Friedrich-Alexander Universität Erlangen-Nürnberg, Loewenichstrasse 28, 91054, Erlangen, Germany
| | - Andy H King
- Geckoella Ltd, Suite 323, 7 Bridge Street, Taunton, TA1 1TG, UK
| | - David H Evans
- Natural England, Rivers House, East Quay, Bridgwater, TA6 4YS, UK
| | - Martina Aubrechtová
- Institute of Geology and Palaeontology, Faculty of Science, Charles University, Albertov 6, 12843, Prague, Czech Republic
- Institute of Geology, Czech Academy of Sciences, Rozvojová 269, 16500, Prague, Czech Republic
| | - Marcela Cichowolski
- Instituto de Estudios Andinos "Don Pablo Groeber", CONICET and Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Pab. 2, C1428EGA, Buenos Aires, Argentina
| | - Xiang Fang
- State Key Laboratory of Palaeobiology and Stratigraphy, Nanjing Institute of Geology and Palaeontology and Center for Excellence in Life and Paleoenvironment, Chinese Academy of Sciences, 39 East Beijing Road, Nanjing, 210008, China
| | - Christian Klug
- Paläontologisches Institut und Museum, Universität Zürich, Karl-Schmid-Strasse 4, CH-8006, Zürich, Switzerland
| |
|
46
|
Acker M, Hogle SL, Berube PM, Hackl T, Coe A, Stepanauskas R, Chisholm SW, Repeta DJ. Phosphonate production by marine microbes: Exploring new sources and potential function. Proc Natl Acad Sci U S A 2022; 119:e2113386119. [PMID: 35254902 PMCID: PMC8931226 DOI: 10.1073/pnas.2113386119] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 07/21/2021] [Accepted: 01/20/2022] [Indexed: 12/12/2022] Open
Abstract
SIGNIFICANCE Phosphonates are a class of phosphorus metabolites characterized by a highly stable C-P bond. Phosphonates accumulate to high concentrations in seawater, fuel a large fraction of marine methane production, and serve as a source of phosphorus to microbes inhabiting nutrient-limited regions of the oligotrophic ocean. Here, we show that 15% of all bacterioplankton in the surface ocean have genes for phosphonate synthesis and that most belong to the abundant groups Prochlorococcus and SAR11. Genomic and chemical evidence suggests that phosphonates are incorporated into cell-surface phosphonoglycoproteins that may act to mitigate cell mortality by grazing and viral lysis. These results underscore the large global biogeochemical impact of relatively rare but highly expressed traits in numerically abundant groups of marine bacteria.
Affiliation(s)
- Marianne Acker
- Massachusetts Institute of Technology-Woods Hole Oceanographic Institution Joint Program in Oceanography/Applied Ocean Science and Engineering, Woods Hole Oceanographic Institution, Woods Hole, MA 02543
- Department of Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole, MA 02543
| | - Shane L. Hogle
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Biology, University of Turku, Turku 20500, Finland
| | - Paul M. Berube
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
| | - Thomas Hackl
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
| | - Allison Coe
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
| | - Ramunas Stepanauskas
- Single Cell Genomics Center, Bigelow Laboratory for Ocean Sciences, East Boothbay, ME 04544
| | - Sallie W. Chisholm
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
| | - Daniel J. Repeta
- Department of Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole, MA 02543
| |
|
47
|
Monti M, Fiorentino J, Milanetti E, Gosti G, Tartaglia GG. Prediction of Time Series Gene Expression and Structural Analysis of Gene Regulatory Networks Using Recurrent Neural Networks. Entropy (Basel, Switzerland) 2022; 24:141. [PMID: 35205437 PMCID: PMC8871363 DOI: 10.3390/e24020141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2021] [Revised: 01/14/2022] [Accepted: 01/15/2022] [Indexed: 11/17/2022]
Abstract
Methods for time series prediction and classification of gene regulatory networks (GRNs) from gene expression data have been treated separately so far. The recent emergence of attention-based recurrent neural network (RNN) models boosted the interpretability of RNN parameters, making them appealing for the understanding of gene interactions. In this work, we generated synthetic time series gene expression data from a range of archetypal GRNs and we relied on a dual attention RNN to predict the gene temporal dynamics. We show that the prediction is extremely accurate for GRNs with different architectures. Next, we focused on the attention mechanism of the RNN and, using tools from graph theory, we found that its graph properties allow one to hierarchically distinguish different architectures of the GRN. We show that the GRN responded differently to the addition of noise in the prediction by the RNN and we related the noise response to the analysis of the attention mechanism. In conclusion, this work provides a way to understand and exploit the attention mechanism of RNNs and it paves the way to RNN-based methods for time series prediction and inference of GRNs from gene expression data.
Affiliation(s)
- Michele Monti
- RNA System Biology Lab, Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genoa, Italy
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
| | - Jonathan Fiorentino
- Center for Life Nano- & Neuro-Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161 Rome, Italy
| | - Edoardo Milanetti
- Center for Life Nano- & Neuro-Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161 Rome, Italy
- Department of Physics, Sapienza University of Rome, 00185 Rome, Italy
| | - Giorgio Gosti
- Center for Life Nano- & Neuro-Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161 Rome, Italy
- Department of Physics, Sapienza University of Rome, 00185 Rome, Italy
| | - Gian Gaetano Tartaglia
- RNA System Biology Lab, Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genoa, Italy
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Center for Life Nano- & Neuro-Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161 Rome, Italy
- Department of Biology and Biotechnology Charles Darwin, Sapienza University of Rome, 00185 Rome, Italy
| |
|
48
|
A multi-modal algorithm based on an NSGA-II scheme for phylogenetic tree inference. Biosystems 2022; 213:104606. [DOI: 10.1016/j.biosystems.2022.104606] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/30/2021] [Revised: 11/30/2021] [Accepted: 01/05/2022] [Indexed: 12/14/2022]
|
49
|
Smith MR. Robust analysis of phylogenetic tree space. Syst Biol 2021; 71:1255-1270. [PMID: 34963003 PMCID: PMC9366458 DOI: 10.1093/sysbio/syab100] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/02/2020] [Revised: 12/03/2021] [Accepted: 12/23/2021] [Indexed: 11/13/2022] Open
Abstract
Phylogenetic analyses often produce large numbers of trees. Mapping trees' distribution in 'tree space' can illuminate the behaviour and performance of search strategies, reveal distinct clusters of optimal trees, and expose differences between different data sources or phylogenetic methods - but the high-dimensional spaces defined by metric distances are necessarily distorted when represented in fewer dimensions. Here, I explore the consequences of this transformation in phylogenetic search results from 128 morphological datasets, using stratigraphic congruence - a complementary aspect of tree similarity - to evaluate the utility of low-dimensional mappings. I find that phylogenetic similarities between cladograms are most accurately depicted in tree spaces derived from information-theoretic tree distances or the quartet distance. Robinson-Foulds tree spaces exhibit prominent distortions and often fail to group trees according to phylogenetic similarity, whereas the strong influence of tree shape on the Kendall-Colijn distance makes its tree space unsuitable for many purposes. Distances mapped into two or even three dimensions often display little correspondence with true distances, which can lead to profound misrepresentation of clustering structure. Without explicit testing, one cannot be confident that a tree space mapping faithfully represents the true distribution of trees, nor that visually evident structure is valid. My recommendations for tree space validation and visualization are implemented in a new graphical user interface in the 'TreeDist' R package.
Affiliation(s)
- Martin R Smith
- Department of Earth Sciences, Durham University, Lower Mountjoy, Durham, DH1 3LE, UK
| |
|
50
|
Smith MR. Using information theory to detect rogue taxa and improve consensus trees. Syst Biol 2021; 71:1088-1094. [PMID: 34951650 PMCID: PMC9366444 DOI: 10.1093/sysbio/syab099] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/15/2021] [Revised: 11/29/2021] [Accepted: 12/17/2021] [Indexed: 11/28/2022] Open
Abstract
“Rogue” taxa of uncertain affinity can confound attempts to summarize the results of phylogenetic analyses. Rogues reduce resolution and support values in consensus trees, potentially obscuring strong evidence for relationships between other taxa. Information theory provides a principled means of assessing the congruence between a set of trees and their consensus, allowing rogue taxa to be identified more effectively than when using ad hoc measures of tree quality. A basic implementation of this approach in R recovers reduced consensus trees that are better resolved, more accurate, and more informative than those generated by existing methods. [Consensus trees; information theory; phylogenetic software; Rogue taxa.]
Affiliation(s)
- Martin R Smith
- Department of Earth Sciences, Durham University, Lower Mountjoy, Durham, DH1 3LE, UK
| |
|