1
Rubert DP, Braga MDV. Efficient gene orthology inference via large-scale rearrangements. Algorithms Mol Biol 2023; 18:14. [PMID: 37770945 PMCID: PMC10540461 DOI: 10.1186/s13015-023-00238-y] [Received: 12/20/2022] [Accepted: 08/17/2023] [Indexed: 09/30/2023]
Abstract
BACKGROUND Recently we developed a gene orthology inference tool based on genome rearrangements (Journal of Bioinformatics and Computational Biology 19:6, 2021). Given a set of genomes, our method first computes all pairwise gene similarities. It then runs pairwise ILP comparisons to compute optimal gene matchings, which minimize, taking the similarities into account, the weighted rearrangement distance between the analyzed genomes (an NP-hard problem). In the final step, the gene matchings are integrated into gene families. The ILP includes an optimal capping that connects each end of a linear segment of one genome to an end of a linear segment in the other genome, which causes an exponential increase of the search space. RESULTS In this work, we design and implement a heuristic capping algorithm that replaces the optimal capping by clustering the linear segments, based on their gene content intersections, into [Formula: see text] subsets whose ends are capped independently. Furthermore, within each subset, instead of allowing all possible connections, we let only the ends of content-related segments be connected. Although there is no guarantee that m is much bigger than one, and although the heuristic may yield sub-optimal instead of optimal gene matchings, it works very well in practice in terms of both speed and the quality of the computed solutions. Our experiments on primate and fruit fly genomes show two positive results. First, for complete assemblies of five primates, the version with heuristic capping reports orthologies that are very similar to those computed by the version of our tool with optimal capping. Second, we were able to efficiently analyze fruit fly genomes with incomplete assemblies distributed across hundreds or even thousands of contigs, obtaining gene families that are very similar to [Formula: see text] families.
Indeed, our tool inferred a higher number of complete cliques, with a higher intersection with [Formula: see text], than the gene families computed by other inference tools. We added a post-processing step that refines our ambiguous families (those with more than one gene per genome) with the aid of the [Formula: see text] algorithm, further improving the accuracy of our results. Our approach is implemented as a pipeline that incorporates the pre-computation of gene similarities and the post-processing refinement of ambiguous families with [Formula: see text]. Both the original version with optimal capping and the new version with heuristic capping, together with their detailed documentation, can be downloaded from https://gitlab.ub.uni-bielefeld.de/gi/FFGC or installed as a Conda package (https://anaconda.org/bioconda/ffgc).
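The clustering idea in the RESULTS paragraph can be illustrated with a small sketch. It is not the FFGC implementation: we assume each linear segment is described only by the set of gene-family labels it carries, call two segments from different genomes content-related when those sets intersect, and take the connected components of that relation (computed with a union-find structure) as the subsets. The function name `cluster_segments` and its input format are hypothetical.

```python
from itertools import product


def cluster_segments(segs_a, segs_b):
    """Group linear segments of two genomes into clusters of content-related segments.

    segs_a, segs_b: dicts mapping a segment name to the set of gene-family labels
    it contains (hypothetical input format). Two segments from different genomes
    are treated as content-related if they share at least one family; clusters
    are the connected components of that relation.
    Returns (clusters, related_pairs).
    """
    # Tag names with their genome so identical segment names cannot collide.
    nodes = [("A", s) for s in segs_a] + [("B", s) for s in segs_b]
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    related = []
    for (sa, fa), (sb, fb) in product(segs_a.items(), segs_b.items()):
        if fa & fb:  # non-empty gene-content intersection
            related.append((sa, sb))
            union(("A", sa), ("B", sb))

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values()), related
```

In this toy model, each cluster would be capped on its own, and only the segment pairs listed in `related` would be allowed to have their ends connected, mirroring the restriction described in the abstract.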
Affiliation(s)
- Diego P Rubert
- Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Campo Grande, Brazil
- Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Marília D V Braga
- Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany.
2
Wiberg RAW, Brand JN, Viktorin G, Mitchell JO, Beisel C, Schärer L. Genome assemblies of the simultaneously hermaphroditic flatworms Macrostomum cliftonense and Macrostomum hystrix. G3 (Bethesda) 2023; 13:jkad149. [PMID: 37398989 PMCID: PMC10468722 DOI: 10.1093/g3journal/jkad149] [Received: 04/28/2023] [Revised: 06/05/2023] [Accepted: 06/21/2023] [Indexed: 07/04/2023]
Abstract
The free-living, simultaneously hermaphroditic flatworms of the genus Macrostomum are increasingly used as model systems in various contexts. In particular, Macrostomum lignano, the only species of this group with a published genome assembly, has emerged as a model for the study of regeneration, reproduction, and stem-cell function. However, challenges have emerged because M. lignano is a hidden polyploid that has recently undergone whole-genome duplication and chromosome fusion events. This complex genome architecture presents a significant roadblock to the application of many modern genetic tools. Hence, additional genomic resources for this genus are needed. Here, we present such resources for Macrostomum cliftonense and Macrostomum hystrix, which represent the contrasting mating behaviors of reciprocal copulation and hypodermic insemination found in the genus. We use a combination of PacBio long-read sequencing and Illumina shotgun sequencing, along with several RNA-Seq data sets, to assemble and annotate highly contiguous genomes for both species. The assemblies span ∼227 and ∼220 Mb and are represented by 399 and 42 contigs for M. cliftonense and M. hystrix, respectively. Furthermore, high BUSCO completeness (∼84-85%), low BUSCO duplication rates (8.3-6.2%), and low k-mer multiplicity indicate that these assemblies do not suffer from the same assembly ambiguities as the M. lignano genome assembly, which can be attributed to the complex karyology of this species. We also show that these resources, in combination with the prior resources from M. lignano, offer an excellent foundation for comparative genomic research in this group of organisms.
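The abstract cites low k-mer multiplicity as evidence against collapsed or duplicated regions. As a rough illustration of what such a check tabulates, here is a toy sketch that counts canonical k-mers (a k-mer merged with its reverse complement) in assembled sequences and reports how many distinct k-mers occur once, twice, and so on. The abstract does not say which tool was used; this is not it, and the function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_multiplicity_histogram(sequences, k):
    """Map multiplicity -> number of distinct canonical k-mers with that count.

    k-mers containing characters other than A/C/G/T (e.g. N gaps) are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= _VALID:
                counts[canonical(kmer)] += 1
    # Histogram of the counts themselves, sorted by multiplicity.
    return dict(sorted(Counter(counts.values()).items()))
```

For example, `kmer_multiplicity_histogram(["ACGTACGT"], 3)` returns `{2: 1, 4: 1}`: ACG/CGT collapse to one canonical k-mer seen four times, and GTA/TAC to one seen twice. In a haploid-like assembly, most large-k k-mers would be expected at multiplicity 1.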
Affiliation(s)
- R Axel W Wiberg
- Department of Environmental Sciences, Zoological Institute, University of Basel, Basel 4051, Switzerland
- Jeremias N Brand
- Department of Environmental Sciences, Zoological Institute, University of Basel, Basel 4051, Switzerland
- Gudrun Viktorin
- Department of Environmental Sciences, Zoological Institute, University of Basel, Basel 4051, Switzerland
- Jack O Mitchell
- Department of Environmental Sciences, Zoological Institute, University of Basel, Basel 4051, Switzerland
- Christian Beisel
- Department of Biosystems Science and Engineering, ETH Zürich, Basel 4058, Switzerland
- Lukas Schärer
- Department of Environmental Sciences, Zoological Institute, University of Basel, Basel 4051, Switzerland
3
Perez M, Aroh O, Sun Y, Lan Y, Juniper SK, Young CR, Angers B, Qian PY. Third-Generation Sequencing Reveals the Adaptive Role of the Epigenome in Three Deep-Sea Polychaetes. Mol Biol Evol 2023; 40:msad172. [PMID: 37494294 PMCID: PMC10414810 DOI: 10.1093/molbev/msad172] [Received: 02/24/2023] [Revised: 06/16/2023] [Accepted: 07/17/2023] [Indexed: 07/28/2023]
Abstract
The roles of DNA methylation in invertebrates are poorly characterized, and critical data are missing for the phylum Annelida. We fill this knowledge gap by conducting the first genome-wide survey of DNA methylation in deep-sea polychaetes that dominate deep-sea vents and seeps: Paraescarpia echinospica, Ridgeia piscesae, and Paralvinella palmiformis. DNA methylation calls were inferred from Oxford Nanopore sequencing after assembling high-quality genomes of these animals. The genomes of these worms encode all the key enzymes of DNA methylation metabolism and possess a mosaic methylome similar to that of other invertebrates. Transcriptomic data from these polychaetes support the hypotheses that gene body methylation strengthens the expression of housekeeping genes and that promoter methylation acts as a silencing mechanism, but not the hypothesis that DNA methylation suppresses the activity of transposable elements. The conserved epigenetic profiles of genes responsible for maintaining homeostasis under extreme hydrostatic pressure suggest that DNA methylation plays an important adaptive role in these worms.
Affiliation(s)
- Maeva Perez
- Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, China
- Department of Ocean Science, The Hong Kong University of Science and Technology, Kowloon, China
- Department of Biological Sciences, Université de Montréal, Montréal, Canada
- Oluchi Aroh
- Department of Biological Sciences, Auburn University, Auburn, AL, USA
- Yanan Sun
- Laboratory of Marine Organism Taxonomy and Phylogeny, Chinese Academy of Sciences, Institute of Oceanology, Qingdao, China
- Yi Lan
- Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, China
- Department of Ocean Science, The Hong Kong University of Science and Technology, Kowloon, China
- Stanley Kim Juniper
- School of Earth and Ocean Sciences, University of Victoria, Victoria, Canada
- Bernard Angers
- Department of Biological Sciences, Université de Montréal, Montréal, Canada
- Pei-Yuan Qian
- Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, China
- Department of Ocean Science, The Hong Kong University of Science and Technology, Kowloon, China
4
Darnet E, Teixeira B, Schaller H, Rogez H, Darnet S. Elucidating the Mesocarp Drupe Transcriptome of Açai (Euterpe oleracea Mart.): An Amazonian Tree Palm Producer of Bioactive Compounds. Int J Mol Sci 2023; 24:ijms24119315. [PMID: 37298279 DOI: 10.3390/ijms24119315] [Received: 04/29/2023] [Revised: 05/13/2023] [Accepted: 05/16/2023] [Indexed: 06/12/2023]
Abstract
The Euterpe oleracea palm, endemic to the Amazon region, is well known for açai, a violet fruit beverage with nutritional and medicinal properties. During E. oleracea fruit ripening, anthocyanin accumulation is not related to sugar production, unlike in grape and blueberry. Ripened fruits have a high content of anthocyanins, isoprenoids, fibers, and proteins, and are poor in sugars. E. oleracea is proposed as a new genetic model for metabolism partitioning in the fruit. Approximately 255 million single-end-oriented reads were generated on an Ion Proton NGS platform combining fruit cDNA libraries at four ripening stages. The de novo transcriptome assembly was tested using six assemblers and 46 different combinations of parameters, with a pre-processing and a post-processing step. The multiple k-mer approach with TransABySS as the assembler and Evidential Gene as the post-processor showed the best results, with an N50 of 959 bp, a mean read coverage of 70x, a BUSCO complete sequence recovery of 36% and an RBMT of 61%. The fruit transcriptome dataset included 22,486 transcripts representing 18 Mbp, 87% of which had significant homology with other plant sequences. Approximately 904 new EST-SSRs were described; they were common to and transferable to Phoenix dactylifera and Elaeis guineensis, two other palm trees. The global GO classification of transcripts showed categories similar to those in P. dactylifera and E. guineensis fruit transcriptomes. For an accurate annotation and functional description of metabolism genes, a bioinformatic pipeline was developed to precisely identify orthologs, such as one-to-one orthologs between species, and to infer multigenic family evolution. The phylogenetic inference confirmed the occurrence of duplication events in the Arecaceae lineage and the presence of orphan genes in E. oleracea. The anthocyanin and tocopherol pathways were fully annotated.
Interestingly, the anthocyanin pathway showed a high number of paralogs, similar to grape, whereas the tocopherol pathway exhibited a low and conserved gene number and several predicted splicing forms. The release of this exhaustively annotated molecular dataset of E. oleracea provides a valuable tool for further studies of metabolism partitioning and opens new perspectives for studying fruit physiology with açai as a model.
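The assembly comparison above is summarized partly by N50 (959 bp for the best assembly). As a reminder of how that statistic is defined, here is a minimal sketch that computes it from a list of transcript lengths: N50 is the length L such that sequences of length at least L together cover at least half of the total assembled length. It is a generic illustration, not the pipeline used in the study, and the function name `n50` is our own.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and their lengths summed;
    N50 is the length at which the running sum first reaches half the total.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe comparison with total / 2
            return length
```

For example, `n50([100, 200, 300, 400, 500])` returns 400: the two longest sequences sum to 900, which is the first running total to reach half of 1,500.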
Affiliation(s)
- Elaine Darnet
- Centre for Valorization of Amazonian Bioactive Compounds (CVACBA) & Institute of Biological Sciences, Federal University of Pará (UFPA), Belém 66075-750, PA, Brazil
- International Associated Laboratory PALMHEAT, French Scientific Research National Center (CNRS)/UFPA, 75016 Paris, France
- Bruno Teixeira
- Centre for Valorization of Amazonian Bioactive Compounds (CVACBA) & Institute of Biological Sciences, Federal University of Pará (UFPA), Belém 66075-750, PA, Brazil
- Hubert Schaller
- International Associated Laboratory PALMHEAT, French Scientific Research National Center (CNRS)/UFPA, 75016 Paris, France
- Plant Isoprenoid Biology, Institute of Molecular Biology of Plants of the Scientific Research National Center, Strasbourg University, 67081 Strasbourg, France
- Hervé Rogez
- Centre for Valorization of Amazonian Bioactive Compounds (CVACBA) & Institute of Biological Sciences, Federal University of Pará (UFPA), Belém 66075-750, PA, Brazil
- Sylvain Darnet
- Centre for Valorization of Amazonian Bioactive Compounds (CVACBA) & Institute of Biological Sciences, Federal University of Pará (UFPA), Belém 66075-750, PA, Brazil
- International Associated Laboratory PALMHEAT, French Scientific Research National Center (CNRS)/UFPA, 75016 Paris, France
- Plant Isoprenoid Biology, Institute of Molecular Biology of Plants of the Scientific Research National Center, Strasbourg University, 67081 Strasbourg, France
5
Gout JF, Hao Y, Johri P, Arnaiz O, Doak TG, Bhullar S, Couloux A, Guérin F, Malinsky S, Potekhin A, Sawka N, Sperling L, Labadie K, Meyer E, Duharcourt S, Lynch M. Dynamics of Gene Loss following Ancient Whole-Genome Duplication in the Cryptic Paramecium Complex. Mol Biol Evol 2023; 40:msad107. [PMID: 37154524 PMCID: PMC10195154 DOI: 10.1093/molbev/msad107] [Received: 12/19/2022] [Revised: 03/30/2023] [Accepted: 05/05/2023] [Indexed: 05/10/2023]
Abstract
Whole-genome duplications (WGDs) have shaped the gene repertoire of many eukaryotic lineages. The redundancy created by WGDs typically results in a phase of massive gene loss. However, some WGD-derived paralogs are maintained over long evolutionary periods, and the relative contributions of different selective pressures to their maintenance are still debated. Previous studies have revealed a history of three successive WGDs in the lineage of the ciliate Paramecium tetraurelia and two of its sister species from the Paramecium aurelia complex. Here, we report the genome sequence and analysis of 10 additional P. aurelia species and 1 additional outgroup, revealing aspects of post-WGD evolution in 13 species sharing a common ancestral WGD. In contrast to the morphological radiation of vertebrates that putatively followed two WGD events, members of the cryptic P. aurelia complex have remained morphologically indistinguishable after hundreds of millions of years. Biases in gene retention compatible with dosage constraints appear to play a major role in opposing post-WGD gene loss across all 13 species. In addition, post-WGD gene loss has been slower in Paramecium than in other species that have experienced genome duplication, suggesting that the selective pressures against post-WGD gene loss are especially strong in Paramecium. A near-complete lack of recent single-gene duplications in Paramecium provides additional evidence for strong selective pressures against gene dosage changes. This exceptional data set of 13 species sharing an ancestral WGD and 2 closely related outgroup species will be a useful resource for future studies on Paramecium as a major model organism in evolutionary cell biology.
Affiliation(s)
- Jean-Francois Gout
- Department of Biology, Indiana University, Bloomington, IN
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ
- Department of Biological Sciences, Mississippi State University, Starkville, MS
- Yue Hao
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ
- Cancer and Cell Biology Division, Translational Genomics Research Institute, Phoenix, AZ
- Parul Johri
- Department of Biology, Indiana University, Bloomington, IN
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ
- School of Life Sciences, Arizona State University, Tempe, AZ
- Olivier Arnaiz
- Institute for Integrative Biology of the Cell (I2BC), Commissariat à l'Energie Atomique (CEA), CNRS, Université Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, France
- Thomas G Doak
- Department of Biology, Indiana University, Bloomington, IN
- National Center for Genome Analysis Support, Indiana University, Bloomington, IN
- Simran Bhullar
- Institut de biologie de l’ENS, Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, Université PSL, Paris, France
- Arnaud Couloux
- Génomique Métabolique, Genoscope, Institut François Jacob, Commissariat à l'Energie Atomique (CEA), CNRS, Univ Evry, Université Paris-Saclay, Evry, France
- Fréderic Guérin
- Université Paris Cité, CNRS, Institut Jacques Monod, Paris, France
- Sophie Malinsky
- Institut de biologie de l’ENS, Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, Université PSL, Paris, France
- Alexey Potekhin
- Department of Microbiology, Faculty of Biology, Saint Petersburg State University, Saint Petersburg, Russia
- Laboratory of Cellular and Molecular Protistology, Zoological Institute RAS, Saint Petersburg, Russia
- Natalia Sawka
- Institute of Systematics and Evolution of Animals, Polish Academy of Sciences, Krakow, Poland
- Linda Sperling
- Institute for Integrative Biology of the Cell (I2BC), Commissariat à l'Energie Atomique (CEA), CNRS, Université Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, France
- Karine Labadie
- Genoscope, Institut François Jacob, Commissariat à l'Energie Atomique (CEA), Université Paris-Saclay, Evry, France
- Eric Meyer
- Institut de biologie de l’ENS, Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, Université PSL, Paris, France
- Michael Lynch
- Department of Biology, Indiana University, Bloomington, IN
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ
6
Jia GS, Zhang WC, Liang Y, Liu XH, Rhind N, Pidoux A, Brysch-Herzberg M, Du LL. A high-quality reference genome for the fission yeast Schizosaccharomyces osmophilus. G3 (Bethesda) 2023; 13:jkad028. [PMID: 36748990 PMCID: PMC10085805 DOI: 10.1093/g3journal/jkad028] [Received: 12/06/2022] [Revised: 01/23/2023] [Accepted: 01/23/2023] [Indexed: 02/08/2023]
Abstract
Fission yeasts are an ancient group of fungal species that diverged from each other tens to hundreds of millions of years ago. Among them is the preeminent model organism Schizosaccharomyces pombe, which has significantly contributed to our understanding of the molecular mechanisms underlying fundamental cellular processes. The availability of the genomes of S. pombe and 3 other fission yeast species, S. japonicus, S. octosporus, and S. cryophilus, has enabled cross-species comparisons that provide insights into the evolution of genes, pathways, and genomes. Here, we performed genome sequencing on the type strain of the recently identified fission yeast species S. osmophilus and obtained a complete mitochondrial genome and a nuclear genome assembly with gaps only at rRNA gene arrays. A total of 5,098 protein-coding nuclear genes were annotated, and orthologs were identified for more than 95% of them. Genome-based phylogenetic analysis showed that S. osmophilus is most closely related to S. octosporus and that these 2 species diverged around 16 million years ago. To demonstrate the utility of this S. osmophilus reference genome, we conducted cross-species comparative analyses of centromeres, telomeres, transposons, the mating-type region, Cbp1 family proteins, and mitochondrial genomes. These analyses revealed conservation of repeat arrangements and sequence motifs in centromere cores, identified telomeric sequences composed of 2 types of repeats, delineated relationships among Tf1/sushi group retrotransposons, characterized the evolutionary origins and trajectories of Cbp1 family domesticated transposases, and discovered signs of interspecific transfer of 2 types of mitochondrial selfish elements.
Affiliation(s)
- Guo-Song Jia
- National Institute of Biological Sciences, Beijing 102206, China
- Wen-Cai Zhang
- National Institute of Biological Sciences, Beijing 102206, China
- Yue Liang
- National Institute of Biological Sciences, Beijing 102206, China
- Xi-Han Liu
- National Institute of Biological Sciences, Beijing 102206, China
- Nicholas Rhind
- Department of Biochemistry and Molecular Biotechnology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Alison Pidoux
- Wellcome Centre for Cell Biology, Institute of Cell Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3BF, Scotland, UK
- Michael Brysch-Herzberg
- Laboratory for Wine Microbiology, Department International Business, Heilbronn University, Heilbronn 74081, Germany
- Li-Lin Du
- National Institute of Biological Sciences, Beijing 102206, China
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
7
Zhilina TN, Sorokin DY, Toshchakov SV, Kublanov IV, Zavarzina DG. Natronogracilivirga saccharolytica gen. nov., sp. nov. and Cyclonatronum proteinivorum gen. nov., sp. nov., haloalkaliphilic organotrophic bacteroidetes from hypersaline soda lakes forming a new family Cyclonatronaceae fam. nov. in the order Balneolales. Syst Appl Microbiol 2023; 46:126403. [PMID: 36736145 DOI: 10.1016/j.syapm.2023.126403] [Received: 08/29/2022] [Revised: 01/16/2023] [Accepted: 01/22/2023] [Indexed: 01/26/2023]
Abstract
Two heterotrophic bacteroidetes strains were isolated as satellites from autotrophic enrichments inoculated with samples from hypersaline soda lakes in southwestern Siberia. Strain Z-1702T is an obligately anaerobic, fermentative, saccharolytic bacterium from an iron-reducing enrichment culture, while Ca. Cyclonatronum proteinivorum OmegaT is an obligately aerobic proteolytic microorganism from a cyanobacterial enrichment. Cells of the isolated bacteria have highly variable morphology. Both strains are chloride-independent, moderately salt-tolerant, obligately alkaliphilic mesophiles. Strain Z-1702T ferments glucose, maltose, fructose, mannose, sorbose, galactose, cellobiose, N-acetyl-glucosamine and alpha-glucans, including starch, glycogen, dextrin, and pullulan. Strain OmegaT is strictly proteolytic, utilizing a range of proteins and peptones. The main polar lipid fatty acid in both strains is iso-C15:0, while other major components are various C16 and C17 isomers. According to pairwise sequence alignments using BLAST, Gracilimonas was the nearest cultured relative of both strains (<90% 16S rRNA gene sequence identity). Phylogenetic analysis placed strain Z-1702T and strain OmegaT as two different genera in a deep-branching clade of new family level within the order Balneolales with genus. Based on the physiological characteristics and phylogenetic position of strain Z-1702T, it was proposed to represent a novel genus and species, Natronogracilivirga saccharolytica gen. nov., sp. nov. (= DSMZ 109061T = JCM 32930T = VKM B 3262T). Furthermore, the phylogenetic and phenotypic parameters of N. saccharolytica and C. proteinivorum gen. nov., sp. nov., strain OmegaT (= JCM 31662T = UNIQEM U979T), make it possible to include them in a new family with the proposed designation Cyclonatronaceae fam. nov.
Affiliation(s)
- Tatjana N Zhilina
- Winogradsky Institute of Microbiology, Federal Research Centre of Biotechnology RAS, 7/2 Prospekt 60-letiya Oktyabrya, 117312 Moscow, Russia
- Dimitry Y Sorokin
- Winogradsky Institute of Microbiology, Federal Research Centre of Biotechnology RAS, 7/2 Prospekt 60-letiya Oktyabrya, 117312 Moscow, Russia; Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Stepan V Toshchakov
- Kurchatov Center for Genome Research, National Research Center "Kurchatov Institute", 1 ac. Kurchatov square, 123098 Moscow, Russia
- Ilya V Kublanov
- Winogradsky Institute of Microbiology, Federal Research Centre of Biotechnology RAS, 7/2 Prospekt 60-letiya Oktyabrya, 117312 Moscow, Russia; Microbiology Department, Faculty of Biology, Lomonosov Moscow State University, Leninskie Gory 1 bld. 12, 119234 Moscow, Russia
- Daria G Zavarzina
- Winogradsky Institute of Microbiology, Federal Research Centre of Biotechnology RAS, 7/2 Prospekt 60-letiya Oktyabrya, 117312 Moscow, Russia.
8
Tanabe TS, Dahl C. HMS-S-S: a tool for the identification of sulfur metabolism-related genes and analysis of operon structures in genome and metagenome assemblies. Mol Ecol Resour 2022; 22:2758-2774. [PMID: 35579058 DOI: 10.1111/1755-0998.13642] [Received: 02/02/2022] [Revised: 04/25/2022] [Accepted: 05/11/2022] [Indexed: 11/26/2022]
Abstract
Sulfur compounds are used in a variety of biological processes, including respiration and photosynthesis. Sulfide and sulfur compounds of intermediary oxidation state can serve as electron donors for lithotrophic growth, while sulfate, thiosulfate and sulfur are used as electron acceptors in anaerobic respiration. The biochemistry underlying the manifold transformations of inorganic sulfur compounds in sulfur-metabolizing prokaryotes is astonishingly complex, and knowledge about it has increased immensely in recent years. The advent of next-generation sequencing approaches, together with the significant increase in data available in public databases, has shifted the focus of environmental microbiology toward probing the metabolic capacity of microbial communities by analyzing these sequence data. To facilitate these analyses, we created HMS-S-S, a comprehensive tool supported by equivalogous hidden Markov models (HMMs). Protein sequences related to sulfur compound oxidation, reduction, transport and intracellular transfer are efficiently detected, and related enzymes involved in dissimilatory sulfur oxidation as opposed to sulfur compound reduction can be confidently distinguished. HMM search results are coupled to the corresponding genes, which allows analysis of co-occurrence, synteny and genomic neighborhood. The HMMs were validated on an annotated test dataset and by cross-validation. We also demonstrated the tool's performance by exploring metagenome-assembled genomes recovered from environments with active sulfur cycling, including members of the cable bacteria, novel Acidobacteria and assemblies from a sulfur-rich glacier, and were able to replicate and extend previous reports.
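To make the idea of coupling HMM hits to genes for neighborhood analysis concrete, here is a small sketch that groups hits lying close together on the same contig into candidate operon-like clusters. It is not HMS-S-S code: the input format `(contig, gene_index, label)`, the `max_gap` rule, and the function name `neighborhood_clusters` are all assumptions made for illustration.

```python
from itertools import groupby


def neighborhood_clusters(hits, max_gap=1):
    """Group HMM hits into clusters of nearby genes on the same contig.

    hits: iterable of (contig, gene_index, label) tuples, where gene_index is
    the gene's position along the contig (hypothetical input format).
    Consecutive hits are joined when their gene indices differ by at most max_gap.
    Returns a list of clusters, each a list of (contig, gene_index, label).
    """
    clusters = []
    # Sort by contig, then by position along the contig.
    ordered = sorted(hits, key=lambda h: (h[0], h[1]))
    for _contig, group in groupby(ordered, key=lambda h: h[0]):
        current = []
        for hit in group:
            # Start a new cluster when the gap to the previous hit is too large.
            if current and hit[1] - current[-1][1] > max_gap:
                clusters.append(current)
                current = []
            current.append(hit)
        if current:
            clusters.append(current)
    return clusters
```

With hits on genes 1, 2, 5 and 6 of contig `c1` and gene 7 of contig `c2`, the default `max_gap=1` yields three clusters (genes 1-2, genes 5-6, and the lone hit on `c2`); relaxing `max_gap` merges more distant neighbors, which is the kind of trade-off any synteny heuristic has to expose.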
Affiliation(s)
- Tomohisa Sebastian Tanabe
- Institut für Mikrobiologie & Biotechnologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Christiane Dahl
- Institut für Mikrobiologie & Biotechnologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
9
Wu Y, Ren WT, Zhong YW, Guo LL, Zhou P, Xu XW. Thiosulfatihalobacter marinus gen. nov. sp. nov., a novel member of the family Roseobacteraceae, isolated from the West Pacific Ocean. Int J Syst Evol Microbiol 2022; 72. [DOI: 10.1099/ijsem.0.005286] [Indexed: 11/18/2022]
Abstract
Two strains (GL-11-2T and ZH2-Y79) were isolated from seawater collected from the West Pacific Ocean and the East China Sea, respectively. Cells were Gram-stain-negative, strictly aerobic, non-motile and rod-shaped. Cells grew in medium containing 0.5–7.5 % NaCl (w/v; optimum, 1.0–3.0 %), at pH 6.0–8.0 (optimum, pH 6.5–7.0) and at 4–40 °C (optimum, 30 °C). H2S production occurred in marine broth supplemented with sodium thiosulphate. The almost-complete 16S rRNA gene sequences of the two isolates were identical and exhibited the highest similarity to Pseudoruegeria aquimaris JCM 13603T (97.5 %), followed by Ruegeria conchae TW15T (97.2 %), Shimia aestuarii DSM 15283T (97.1 %) and Ruegeria lacuscaerulensis ITI-1157T (97.0 %). Phylogenetic analysis revealed that the isolates were affiliated with the family Roseobacteraceae and represented an independent lineage. The sole isoprenoid quinone was ubiquinone 10. The principal fatty acids were summed feature 8 (C18 : 1 ω7c and/or C18 : 1 ω6c) and cyclo-C19 : 0 ω8c. The major polar lipids were phosphatidylglycerol, phosphatidylethanolamine, phosphatidylcholine and diphosphatidylglycerol. The DNA G+C content was 62.3 mol%. The orthologous average nucleotide identity, in silico DNA–DNA hybridization and average amino acid identity values between the genome of strain GL-11-2T and those of the reference strains were 73.2–79.0, 20.3–22.5 and 66.0–80.8 %, respectively. Strains GL-11-2T and ZH2-Y79 possessed complete metabolic pathways for thiosulphate oxidation, dissimilatory nitrate reduction and denitrification. Phylogenetic distinctiveness, chemotaxonomic differences and phenotypic properties indicated that the isolates represent a novel genus and species of the family Roseobacteraceae, belonging to the class Alphaproteobacteria, for which the name Thiosulfatihalobacter marinus gen. nov., sp. nov. (type strain GL-11-2T = KCTC 82723T = MCCC M20691T) is proposed.
Affiliation(s)
- Yuehong Wu
- Key Laboratory of Marine Ecosystem Dynamics, Ministry of Natural Resources & Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, PR China
- School of Oceanography, Shanghai Jiao Tong University, Shanghai 200240, PR China
- Wen-Ting Ren
- Key Laboratory of Marine Ecosystem Dynamics, Ministry of Natural Resources & Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, PR China
- Ying-Wen Zhong
- School of Oceanography, Shanghai Jiao Tong University, Shanghai 200240, PR China
- Key Laboratory of Marine Ecosystem Dynamics, Ministry of Natural Resources & Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, PR China
- Li-Li Guo
- College of Life and Environmental Science, Hunan University of Arts and Science, Changde 415000, PR China
- Peng Zhou
- Key Laboratory of Marine Ecosystem Dynamics, Ministry of Natural Resources & Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, PR China
- Xue-Wei Xu
- School of Oceanography, Shanghai Jiao Tong University, Shanghai 200240, PR China
- Key Laboratory of Marine Ecosystem Dynamics, Ministry of Natural Resources & Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, PR China
10
Ahmed M, Roberts NG, Adediran F, Smythe AB, Kocot KM, Holovachov O. Phylogenomic Analysis of the Phylum Nematoda: Conflicts and Congruences With Morphology, 18S rRNA, and Mitogenomes. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2021.769565] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/23/2022]
Abstract
Phylogenetic relationships within many lineages of the phylum Nematoda remain unresolved, despite numerous morphology-based and molecular analyses. We performed several phylogenomic analyses using 286 published genomes and transcriptomes and 19 new transcriptomes by focusing on Trichinellida, Spirurina, Rhabditina, and Tylenchina separately, and by analyzing a selection of species from the whole phylum Nematoda. The phylogeny of Trichinellida supported the division of Trichinella into encapsulated and non-encapsulated species and placed them as sister to Trichuris. The Spirurina subtree supported the clades formed by species from Ascaridomorpha and Spiruromorpha respectively, but did not support Dracunculoidea. The analysis of Tylenchina supported a clade that included all sampled species from Tylenchomorpha and placed it as sister to clades that included sampled species from Cephalobomorpha and Panagrolaimomorpha, supporting the hypothesis of a single origin of the stomatostylet. The Rhabditina subtree placed a clade composed of all sampled species from Diplogastridae as sister to a lineage consisting of paraphyletic Rhabditidae, a single representative of Heterorhabditidae and a clade composed of sampled species belonging to Strongylida. It also strongly supported all suborders within Strongylida. In the phylum-wide analysis, a clade composed of all sampled species belonging to Enoplia was consistently placed as sister to Dorylaimia + Chromadoria. The topology of the Nematoda backbone was consistent with previous studies, including polyphyletic placement of sampled representatives of Monhysterida and Araeolaimida.
11
Rubert DP, Doerr D, Braga MDV. The potential of family-free rearrangements towards gene orthology inference. J Bioinform Comput Biol 2021; 19:2140014. [PMID: 34775922 DOI: 10.1142/s021972002140014x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022]
Abstract
Recently, we proposed an efficient ILP formulation [Rubert DP, Martinez FV, Braga MDV, Natural family-free genomic distance, Algorithms Mol Biol 16:4, 2021] for exactly computing the rearrangement distance of two genomes in a family-free setting. In such a setting, neither a prior classification of genes into families nor further restrictions on the genomes are imposed. Given two genomes, the mentioned ILP computes an optimal matching of the genes, taking into account simultaneously local mutations, given by gene similarities, and large-scale genome rearrangements. Here, we explore the potential of using this ILP for inferring groups of orthologs across several species. More precisely, given a set of genomes, our method first computes all pairwise optimal gene matchings, which are then integrated into gene families in the second step. Our approach is implemented in a pipeline that incorporates the pre-computation of gene similarities. It can be downloaded from gitlab.ub.uni-bielefeld.de/gi/FFGC. We obtained promising results in experiments on both simulated and real data.
Affiliation(s)
- Diego P Rubert
  - Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Campo Grande, Brazil
- Daniel Doerr
  - Faculty of Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Marília D V Braga
  - Faculty of Technology and CeBiTec, Bielefeld University, Bielefeld, Germany
12
Ren WT, Meng FX, Guo LL, Sun L, Xu XW, Zhou P, Wu YH. Luteirhabdus pelagi gen. nov., sp. nov., a novel member of the family Flavobacteriaceae, isolated from the West Pacific Ocean. Arch Microbiol 2021; 203:6021-6031. [PMID: 34698880 PMCID: PMC8590676 DOI: 10.1007/s00203-021-02557-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/26/2021] [Revised: 08/23/2021] [Accepted: 08/26/2021] [Indexed: 12/02/2022]
Abstract
A Gram-stain-negative, aerobic, and yellow-pigmented bacterium, designated A3-108T, was isolated from seawater of the West Pacific Ocean. Cells were non-motile and rod-shaped, with carotenoid-type pigments. Strain A3-108T grew at pH 6.0–8.5 (optimum 6.5) and 15–40 °C (optimum 28 °C), in the presence of 0.5–10% (w/v) NaCl (optimum 1.0%). It was able to produce H2S. Based on 16S rRNA gene analysis, strain A3-108T exhibited the highest similarity to Aureisphaera salina A6D-50T (90.6%). Phylogenetic analysis showed that strain A3-108T was affiliated with members of the family Flavobacteriaceae and represented an independent lineage. The principal fatty acids were iso-C15:0, iso-C17:0 3-OH, iso-C15:1 G, and summed feature 3 (C16:1ω7c and/or C16:1ω6c). The sole isoprenoid quinone was MK-6. The major polar lipids were phosphatidylethanolamine, one unidentified aminophospholipid, one unidentified aminolipid and one unidentified lipid. The ANIb, in silico DDH and AAI values among the genomes of strain A3-108T and three reference strains were 67.3–71.1%, 18.7–22.1%, and 58.8–71.4%, respectively. The G + C content was 41.0%. Distinctness of the phylogenetic position as well as differentiating chemotaxonomic and other phenotypic traits revealed that strain A3-108T represented a novel genus and species of the family Flavobacteriaceae, for which the name Luteirhabdus pelagi gen. nov., sp. nov. is proposed (type strain A3-108T = CGMCC 1.18821T = KCTC 82563T).
Affiliation(s)
- Wen-Ting Ren
  - Key Laboratory of Marine Ecosystem Dynamics, Ministry of Natural Resources & Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou, 310012, People's Republic of China
- Fan-Xu Meng
  - Key Laboratory of Marine Ecosystem Dynamics, Ministry of Natural Resources & Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou, 310012, People's Republic of China
- Li-Li Guo
  - Key Laboratory of Marine Ecosystem Dynamics, Ministry of Natural Resources & Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou, 310012, People's Republic of China
  - College of Life and Environmental Science, Hunan University of Arts and Science, Changde, 415000, People's Republic of China
- Li Sun
  - State Research Center of Island Exploitation and Management, Ministry of Natural Resources & Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou, 310012, People's Republic of China
- Xue-Wei Xu
  - Key Laboratory of Marine Ecosystem Dynamics, Ministry of Natural Resources & Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou, 310012, People's Republic of China
  - School of Oceanography, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China
- Peng Zhou
  - Key Laboratory of Marine Ecosystem Dynamics, Ministry of Natural Resources & Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou, 310012, People's Republic of China
- Yue-Hong Wu
  - Key Laboratory of Marine Ecosystem Dynamics, Ministry of Natural Resources & Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou, 310012, People's Republic of China
  - School of Oceanography, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China
13
Cavassim MIA, Andersen SU, Bataillon T, Schierup MH. Recombination facilitates adaptive evolution in rhizobial soil bacteria. Mol Biol Evol 2021; 38:5480-5490. [PMID: 34410427 PMCID: PMC8662638 DOI: 10.1093/molbev/msab247] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/26/2022]
Abstract
Homologous recombination is expected to increase the efficacy of natural selection by decoupling the fates of beneficial and deleterious mutations and by readily creating new combinations of beneficial alleles. Here, we investigate how the proportion of amino acid substitutions fixed by adaptive evolution (α) depends on the recombination rate in bacteria. We analyze 3,086 core protein-coding sequences from 196 genomes belonging to five closely related species of the genus Rhizobium. These genes are found in all species and do not display any signs of introgression between species. We estimate α using the site frequency spectrum (SFS) and divergence data for all pairs of species. We evaluate the impact of recombination within each species by dividing genes into three equally sized recombination classes based on their average level of intragenic linkage disequilibrium. We find that α varies from 0.07 to 0.39 across species and is positively correlated with the level of recombination. This is due both to a higher estimated rate of adaptive evolution and to a lower estimated rate of nonadaptive evolution, suggesting that recombination both increases the fixation probability of advantageous variants and decreases the probability of fixation of deleterious variants. Our results demonstrate that homologous recombination facilitates adaptive evolution, as measured by α, in the core genome of prokaryote species, in agreement with studies in eukaryotes.
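The quantity α in the abstract above can be made concrete with the classic McDonald-Kreitman-based estimator α = 1 − (Ds·Pn)/(Dn·Ps). This is a simpler, swapped-in technique. The study itself estimates α from the site frequency spectrum and divergence data, and this sketch does not reproduce that method. The only element taken from the abstract is the split of genes into three equally sized classes by mean intragenic linkage disequilibrium. The per-gene tuple layout, the pooling of counts within each class and the function names are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical per-gene record: (mean_ld, dn, ds, pn, ps)
#   mean_ld: average intragenic linkage disequilibrium (high LD ~ low recombination)
#   dn, ds:  nonsynonymous / synonymous fixed differences between species
#   pn, ps:  nonsynonymous / synonymous polymorphisms within a species
Gene = Tuple[float, int, int, int, int]


def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """McDonald-Kreitman-based estimate: alpha = 1 - (Ds * Pn) / (Dn * Ps)."""
    if dn <= 0 or ps <= 0:
        raise ValueError("Dn and Ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


def alpha_by_recombination_class(genes: List[Gene], n_classes: int = 3) -> List[float]:
    """Split genes into n_classes equally sized classes by mean LD and return
    one pooled alpha per class, ordered from highest LD (lowest recombination)
    to lowest LD (highest recombination)."""
    if len(genes) < n_classes:
        raise ValueError("need at least one gene per class")
    ordered = sorted(genes, key=lambda g: g[0], reverse=True)
    size = len(ordered) // n_classes
    alphas = []
    for k in range(n_classes):
        # The last class absorbs any remainder when len(genes) % n_classes != 0.
        end = (k + 1) * size if k < n_classes - 1 else len(ordered)
        chunk = ordered[k * size:end]
        dn = sum(g[1] for g in chunk)
        ds = sum(g[2] for g in chunk)
        pn = sum(g[3] for g in chunk)
        ps = sum(g[4] for g in chunk)
        alphas.append(mk_alpha(dn, ds, pn, ps))
    return alphas


if __name__ == "__main__":
    print(mk_alpha(dn=80, ds=100, pn=30, ps=60))  # 1 - 3000/4800 = 0.375
```

This simple estimator is known to be biased downward when slightly deleterious nonsynonymous polymorphisms segregate. That bias is one motivation for SFS-based approaches such as the one the abstract describes.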
Affiliation(s)
- Maria Izabel A Cavassim
  - Bioinformatics Research Centre, Aarhus University, Aarhus, 8000, Denmark
  - Department of Molecular Biology and Genetics, Aarhus University, Aarhus, 8000, Denmark
- Stig U Andersen
  - Department of Molecular Biology and Genetics, Aarhus University, Aarhus, 8000, Denmark
- Thomas Bataillon
  - Bioinformatics Research Centre, Aarhus University, Aarhus, 8000, Denmark
14
Schaller D, Geiß M, Hellmuth M, Stadler PF. Heuristic algorithms for best match graph editing. Algorithms Mol Biol 2021; 16:19. [PMID: 34404422 PMCID: PMC8369769 DOI: 10.1186/s13015-021-00196-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2021] [Accepted: 06/26/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever y is one of the phylogenetically closest relatives of x in Y. BMGs can be approximated with the help of similarity measures between gene sequences, albeit not without errors. Empirical estimates will therefore usually violate the theoretical properties of BMGs. The corresponding graph editing problem can be used to guide error correction for best match data. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data. RESULTS Since BMGs have a characterization in terms of consistency of a certain set of rooted triples (binary trees on three vertices) defined on the set of genes, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem. This connection leads to a class of top-down recursive algorithms that are similar to Aho's supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing. CONCLUSION Noisy BMG data can be corrected with sufficient accuracy and efficiency to make BMGs an attractive alternative to classical phylogenetic methods.
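The abstract notes that BMGs are characterized by the consistency of a set of rooted triples, and that the authors' recursive algorithms resemble Aho's supertree algorithm. As a hedged illustration of that building block only, the following minimal Python sketch implements the classical BUILD algorithm of Aho et al., which decides whether a set of rooted triples is consistent. It is not the authors' BMG editing heuristic. The encoding of ab|c as the tuple (a, b, c), the requirement that each triple names three distinct leaves, and the nested-list tree representation are assumptions made for this example.

```python
from typing import List, Optional, Set, Tuple, Union

# A rooted triple (a, b, c) encodes ab|c: a and b share an ancestor that excludes c.
Triple = Tuple[str, str, str]
# A tree is either a leaf label or a list of child subtrees.
Tree = Union[str, List["Tree"]]


def _connected_components(leaves: Set[str], edges: List[Tuple[str, str]]) -> List[Set[str]]:
    """Connected components of the undirected graph (leaves, edges), via DFS."""
    adj = {x: set() for x in leaves}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen: Set[str] = set()
    components = []
    for start in sorted(leaves):  # sorted only to make the output deterministic
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            x = stack.pop()
            comp.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        components.append(comp)
    return components


def build(leaves: Set[str], triples: List[Triple]) -> Optional[Tree]:
    """Aho et al.'s BUILD: return a tree displaying all triples, or None
    if the triple set is inconsistent."""
    if len(leaves) == 1:
        return next(iter(leaves))
    relevant = [t for t in triples if set(t) <= leaves]
    # Aho graph: join a and b for every triple ab|c restricted to these leaves.
    components = _connected_components(leaves, [(a, b) for a, b, _ in relevant])
    if len(components) == 1:
        return None  # the root cannot be split, so the triples are inconsistent
    children: List[Tree] = []
    for comp in components:
        sub = build(comp, [t for t in relevant if set(t) <= comp])
        if sub is None:
            return None
        children.append(sub)
    return children
```

For instance, `build({"a", "b", "c", "d"}, [("a", "b", "c"), ("c", "d", "a")])` returns `[["a", "b"], ["c", "d"]]`. The conflicting pair ab|c and ac|b yields `None`, because their Aho graph on {a, b, c} is connected.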
15
Aviña-Padilla K, Ramírez-Rafael JA, Herrera-Oropeza GE, Muley VY, Valdivia DI, Díaz-Valenzuela E, García-García A, Varela-Echavarría A, Hernández-Rosales M. Evolutionary Perspective and Expression Analysis of Intronless Genes Highlight the Conservation of Their Regulatory Role. Front Genet 2021; 12:654256. [PMID: 34306008 PMCID: PMC8302217 DOI: 10.3389/fgene.2021.654256] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/15/2021] [Accepted: 06/01/2021] [Indexed: 11/13/2022]
Abstract
The structure of eukaryotic genes is generally a combination of exons interrupted by intragenic non-coding DNA regions (introns), which are removed by RNA splicing to generate the mature mRNA. A fraction of genes, however, either comprise a single coding exon with introns in their untranslated regions or are intronless genes (IGs), lacking introns entirely. The latter code for essential proteins involved in development, growth, and cell proliferation, and their expression has been proposed to be highly specialized for neuro-specific functions and linked to cancer, neuropathies, and developmental disorders. The abundant presence of introns in eukaryotic genomes is pivotal for the precise control of gene expression. Nevertheless, because IGs are exempt from splicing, they entail higher transcriptional fidelity, which makes them even more valuable for regulatory roles. This work aimed to infer the functional role and evolutionary history of IGs, centered on the mouse genome. IGs belong to a subgroup of single-exon genes, comprising coding genes, non-coding genes, and pseudogenes, that make up approximately 6% of a total of 21,527 genes. To understand their prevalence, biological relevance, and evolution, we identified and studied 1,116 IG functional proteins, validating their differential expression in transcriptomic data of embryonic mouse telencephalon. Our results showed that overall expression levels of IGs are lower than those of multiple-exon genes (MEGs). However, strongly up-regulated IGs include transcription factors (TFs) such as class 3 POU (HMG Box), Neurog1, Olig1, BHLHe22 and BHLHe23, among other essential genes, including the β-cluster of protocadherins. Most striking was the finding that IG-encoded BHLH TFs fit the criteria to be classified as microproteins. Finally, predicted protein orthologs in six other genomes confirmed high conservation in Vertebrata of IGs associated with the regulation of neural processes, with chromatin organization and with epigenetic regulation.
Moreover, this study highlights that IGs are essential modulators, at the transcriptional and post-translational levels, of regulatory processes such as the Wnt signaling pathway and of biological processes as pivotal as sensory organ development. Overall, our results suggest that IG proteins have specialized, prevalent, and unique biological roles and that functional divergence between IGs and MEGs is likely to be the result of specific evolutionary constraints.
Affiliation(s)
- Katia Aviña-Padilla
  - Instituto de Neurobiología, Universidad Nacional Autónoma de México, Querétaro, Mexico
  - Centro de Investigación y de Estudios Avanzados del IPN, Unidad Irapuato, Guanajuato, Mexico
- Gabriel Emilio Herrera-Oropeza
  - Instituto de Neurobiología, Universidad Nacional Autónoma de México, Querétaro, Mexico
  - Centre for Developmental Neurobiology, Institute of Psychiatry, Psychology, and Neuroscience, King’s College London, London, United Kingdom
- Dulce I. Valdivia
  - Centro de Investigación y de Estudios Avanzados del IPN, Unidad Irapuato, Guanajuato, Mexico
- Erik Díaz-Valenzuela
  - Centro de Investigación y de Estudios Avanzados del IPN, Unidad Irapuato, Guanajuato, Mexico
- Andrés García-García
  - Centro de Física Aplicada y Tecnología Avanzada, Universidad Nacional Autónoma de México, Querétaro, Mexico
16
Berkemer SJ, McGlynn SE. A New Analysis of Archaea-Bacteria Domain Separation: Variable Phylogenetic Distance and the Tempo of Early Evolution. Mol Biol Evol 2021; 37:2332-2340. [PMID: 32316034 PMCID: PMC7403611 DOI: 10.1093/molbev/msaa089] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/29/2022]
Abstract
Comparative genomics and molecular phylogenetics are foundational for understanding biological evolution. Although many studies have aimed to understand the genomic contents of early life, uncertainty remains. A study by Weiss et al. (Weiss MC, Sousa FL, Mrnjavac N, Neukirchen S, Roettger M, Nelson-Sathi S, Martin WF. 2016. The physiology and habitat of the last universal common ancestor. Nat Microbiol. 1(9):16116.) identified a number of protein families in the last universal common ancestor of archaea and bacteria (LUCA) that had not been found in previous works. Here, we report new research suggesting that the clustering approaches used in this previous study undersampled protein families, resulting in incomplete phylogenetic trees that do not reflect protein family evolution. Phylogenetic analysis of protein families that include more sequence homologs rejects a simple LUCA hypothesis, based on phylogenetic separation of the bacterial and archaeal domains, for a majority of the previously identified LUCA proteins (∼82%). To address the limitations of phylogenetic inference derived from incompletely populated orthologous groups, and to test the hypothesis of a period of rapid evolution preceding the separation of the domains, we compared phylogenetic distances both within and between domains for thousands of orthologous groups. We find a substantial diversity of interdomain versus intradomain branch lengths, even among protein families that exhibit a single domain-separating branch and are thought to be associated with the LUCA. Additionally, phylogenetic trees with long interdomain branches relative to intradomain branches are enriched in information categories of protein families in comparison to those associated with metabolic functions. These results provide a new view of protein family evolution and temper claims about the phenotype and habitat of the LUCA.
Affiliation(s)
- Sarah J Berkemer
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  - Bioinformatics Group, Department of Computer Science, University Leipzig, Leipzig, Germany
  - Competence Center for Scalable Data Services and Solutions, Dresden/Leipzig, Germany
- Shawn E McGlynn
  - Earth-Life Science Institute, Tokyo Institute of Technology, Meguro, Tokyo, Japan
  - Blue Marble Space Institute of Science, Seattle, WA
  - RIKEN Center for Sustainable Resource Science (CSRS), Saitama, Japan
17
Feurtey A, Lorrain C, Croll D, Eschenbrenner C, Freitag M, Habig M, Haueisen J, Möller M, Schotanus K, Stukenbrock EH. Genome compartmentalization predates species divergence in the plant pathogen genus Zymoseptoria. BMC Genomics 2020; 21:588. [PMID: 32842972 PMCID: PMC7448473 DOI: 10.1186/s12864-020-06871-w] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 12/13/2019] [Accepted: 06/26/2020] [Indexed: 11/25/2022]
Abstract
Background Antagonistic co-evolution can drive rapid adaptation in pathogens and shape genome architecture. Comparative genome analyses of several fungal pathogens revealed highly variable genomes, in many species characterized by specific repeat-rich genome compartments with exceptionally high sequence variability. Dynamic genome structure may enable fast adaptation to host genetics. The wheat pathogen Zymoseptoria tritici, with its highly variable genome, has emerged as a model organism for studying genome evolution in plant pathogens. Here, we compared genomes of Z. tritici isolates and of sister species infecting wild grasses to address the evolution of genome composition and structure. Results Using long-read technology, we sequenced and assembled genomes of Z. ardabiliae, Z. brevis, Z. pseudotritici and Z. passerinii, together with two isolates of Z. tritici. We report a high extent of genome collinearity among Zymoseptoria species and high conservation of genomic, transcriptomic and epigenomic signatures of compartmentalization. We identify high gene content variability both within and between species. This variability is mainly limited to the accessory chromosomes and accessory compartments. Despite strong host specificity and non-overlapping host ranges between species, predicted effectors are mainly shared among Zymoseptoria species, yet exhibit a high level of presence-absence polymorphism within Z. tritici. Using in planta transcriptomic data from Z. tritici, we suggest different roles for the shared orthologs and for the accessory genes during infection of their hosts. Conclusion Despite previous reports of high genomic plasticity in Z. tritici, we describe here a high level of conservation in genomic, epigenomic and transcriptomic composition and structure across the genus Zymoseptoria. The compartmentalized genome allows the maintenance of a functional core genome co-occurring with a highly variable accessory genome.
Affiliation(s)
- Alice Feurtey
  - Environmental Genomics, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany
  - Environmental Genomics, Christian-Albrechts University of Kiel, 24118, Kiel, Germany
- Cécile Lorrain
  - Environmental Genomics, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany
  - Environmental Genomics, Christian-Albrechts University of Kiel, 24118, Kiel, Germany
  - INRA Centre Grand Est - Nancy, UMR 1136 INRA/Université de Lorraine Interactions Arbres/Microorganismes, 54280, Champenoux, France
- Daniel Croll
  - Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, 2000, Neuchâtel, Switzerland
- Christoph Eschenbrenner
  - Environmental Genomics, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany
  - Environmental Genomics, Christian-Albrechts University of Kiel, 24118, Kiel, Germany
- Michael Freitag
  - Department of Biochemistry and Biophysics, Oregon State University, Corvallis, OR, USA
- Michael Habig
  - Environmental Genomics, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany
  - Environmental Genomics, Christian-Albrechts University of Kiel, 24118, Kiel, Germany
- Janine Haueisen
  - Environmental Genomics, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany
  - Environmental Genomics, Christian-Albrechts University of Kiel, 24118, Kiel, Germany
- Mareike Möller
  - Environmental Genomics, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany
  - Environmental Genomics, Christian-Albrechts University of Kiel, 24118, Kiel, Germany
  - Department of Biochemistry and Biophysics, Oregon State University, Corvallis, OR, USA
- Klaas Schotanus
  - Environmental Genomics, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany
  - Environmental Genomics, Christian-Albrechts University of Kiel, 24118, Kiel, Germany
  - Department of Molecular Genetics and Microbiology, Duke University, Duke University Medical Center, Durham, NC, 27710, USA
- Eva H Stukenbrock
  - Environmental Genomics, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany
  - Environmental Genomics, Christian-Albrechts University of Kiel, 24118, Kiel, Germany
18
Lafond M, Hellmuth M. Reconstruction of time-consistent species trees. Algorithms Mol Biol 2020; 15:16. [PMID: 32843891 PMCID: PMC7439642 DOI: 10.1186/s13015-020-00175-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/29/2020] [Accepted: 07/25/2020] [Indexed: 02/04/2023]
Abstract
BACKGROUND The history of gene families, which are equivalent to event-labeled gene trees, can to some extent be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralogous or xenologous genes. The question then arises as to whether inferred event-labeled gene trees are "biologically feasible", which is the case if one can find a species tree with which the gene tree can be reconciled in a time-consistent way. RESULTS In this contribution, we consider event-labeled gene trees that contain speciations, duplications and horizontal gene transfers (HGT), and we assume that the species tree is unknown. Although many problems become NP-hard as soon as HGT and time-consistency are involved, we show, in contrast, that the problem of finding a time-consistent species tree for a given event-labeled gene tree can be solved in polynomial time. We provide a cubic-time algorithm that decides whether a "time-consistent" species tree exists for a given event-labeled gene tree and, if so, constructs such a species tree within the same time complexity.
Affiliation(s)
- Manuel Lafond
  - Department of Computer Science, Université de Sherbrooke, 2500 Boul. de l’Université, Sherbrooke, J1K 2R1 Canada
- Marc Hellmuth
  - School of Computing, University of Leeds, E C Stoner Building, Leeds, LS2 9JT UK
19
Dual RNA-seq of Orientia tsutsugamushi informs on host-pathogen interactions for this neglected intracellular human pathogen. Nat Commun 2020; 11:3363. [PMID: 32620750 PMCID: PMC7335160 DOI: 10.1038/s41467-020-17094-8] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 11/01/2019] [Accepted: 06/11/2020] [Indexed: 12/12/2022]
Abstract
Studying emerging or neglected pathogens is often challenging due to insufficient information and the absence of genetic tools. Dual RNA-seq provides insights into host-pathogen interactions and is particularly informative for intracellular organisms. Here we apply dual RNA-seq to Orientia tsutsugamushi (Ot), an obligate intracellular bacterium that causes the vector-borne human disease scrub typhus. Half of the Ot genome is composed of repetitive DNA, and there is minimal collinearity in gene order between strains. Integrating RNA-seq, comparative genomics, proteomics, and machine learning to study the transcriptional architecture of Ot, we find evidence for widespread post-transcriptional antisense regulation. Comparing the host response to two clinical isolates, we identify distinct immune response networks for each strain, leading to predictions of relative virulence that are validated in a mouse infection model. Thus, dual RNA-seq can provide insight into the biology and host-pathogen interactions of a poorly characterized and genetically intractable organism such as Ot.
20
The first transcriptomic resource for the flatworm Triaenophorus nodulosus (Cestoda: Bothriocephalidea), a common parasite of holarctic freshwater fish. Mar Genomics 2020. [DOI: 10.1016/j.margen.2019.100702] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
21
Galperin MY, Kristensen DM, Makarova KS, Wolf YI, Koonin EV. Microbial genome analysis: the COG approach. Brief Bioinform 2020; 20:1063-1070. [PMID: 28968633 DOI: 10.1093/bib/bbx117] [Citation(s) in RCA: 152] [Impact Index Per Article: 38.0] [Received: 05/30/2017] [Revised: 08/01/2017] [Indexed: 11/15/2022]
Abstract
For the past 20 years, the Clusters of Orthologous Genes (COG) database has been a popular tool for microbial genome annotation and comparative genomics. Although initially created for the evolutionary classification of protein families, the COGs have been used, apart from straightforward functional annotation of sequenced genomes, for such tasks as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis.
22
Stadler PF, Geiß M, Schaller D, López Sánchez A, González Laffitte M, Valdivia DI, Hellmuth M, Hernández Rosales M. From pairs of most similar sequences to phylogenetic best matches. Algorithms Mol Biol 2020; 15:5. [PMID: 32308731 PMCID: PMC7147060 DOI: 10.1186/s13015-020-00165-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/16/2019] [Accepted: 03/26/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND Many of the commonly used methods for orthology detection start from mutually most similar pairs of genes (reciprocal best hits) as an approximation for evolutionarily most closely related pairs of genes (reciprocal best matches). This approximation of best matches by best hits becomes exact for ultrametric dissimilarities, i.e., under the Molecular Clock Hypothesis. It fails, however, whenever there are large lineage-specific rate variations among paralogous genes. In practice, this introduces a high level of noise into the input data for best-hit-based orthology detection methods. RESULTS If additive distances between genes are known, then evolutionarily most closely related pairs can be identified by considering certain quartets of genes, provided that in each quartet the outgroup relative to the remaining three genes is known. A priori knowledge of the underlying species phylogeny greatly facilitates the identification of the required outgroup. Although the workflow remains a heuristic, since the correct outgroup cannot be determined reliably in all cases, simulations with lineage-specific biases and rate asymmetries show that nearly perfect results can be achieved. In a realistic setting, where distance data have to be estimated from sequence data and hence are noisy, it is still possible to obtain highly accurate sets of best matches. CONCLUSION Improvements of tree-free orthology assessment methods can be expected from a combination of the accurate inference of best matches reported here and recent mathematical advances in the understanding of (reciprocal) best match graphs and orthology relations. AVAILABILITY Accompanying software is available at https://github.com/david-schaller/AsymmeTree.
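The reciprocal-best-hit approximation that the abstract starts from can be sketched in a few lines. This minimal Python illustration assumes a dictionary of pairwise similarity scores between genes of two genomes X and Y, where a higher score means more similar. Ties are kept, so a gene may have several best hits. The sketch implements only this best-hit baseline, not the quartet-based best-match inference the abstract describes, and the function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Scores = Dict[Tuple[str, str], float]  # (gene in X, gene in Y) -> similarity
Best = Dict[str, Tuple[float, Set[str]]]  # gene -> (best score, partners achieving it)


def _record(best: Best, key: str, other: str, score: float) -> None:
    """Keep, for `key`, the highest score seen so far and all partners achieving it."""
    if key not in best or score > best[key][0]:
        best[key] = (score, {other})
    elif score == best[key][0]:
        best[key][1].add(other)


def reciprocal_best_hits(sim: Scores) -> Set[Tuple[str, str]]:
    """Pairs (x, y) where y is a best hit of x among Y genes and x is a best hit of y among X genes."""
    best_in_y: Best = {}  # x -> best hits among Y genes
    best_in_x: Best = {}  # y -> best hits among X genes
    for (x, y), s in sim.items():
        _record(best_in_y, x, y, s)
        _record(best_in_x, y, x, s)
    return {
        (x, y)
        for x, (_, ys) in best_in_y.items()
        for y in ys
        if x in best_in_x[y][1]
    }


if __name__ == "__main__":
    sim = {("x1", "y1"): 0.90, ("x1", "y2"): 0.50,
           ("x2", "y1"): 0.95, ("x2", "y2"): 0.60}
    print(reciprocal_best_hits(sim))  # {('x2', 'y1')}
```

In the example, y1 is the best hit of both x1 and x2, so only x2 forms a reciprocal pair, and x1 and y2 remain unpaired. The abstract's point is that under lineage-specific rate variation, such similarity-based choices need not coincide with the evolutionarily closest relatives.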
Affiliation(s)
- Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, 04107 Leipzig, Germany
- Competence Center for Scalable Data Services and Solutions Dresden/Leipzig, Interdisciplinary Center for Bioinformatics, German Centre for Integrative Biodiversity Research (iDiv), and Leipzig Research Center for Civilization Diseases, Universität Leipzig, Augustusplatz 12, 04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103 Leipzig, Germany
- Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, 1090 Vienna, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Ciudad Universitaria, 111321 Bogotá, D.C. Colombia
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
- Manuela Geiß
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, 04107 Leipzig, Germany
- Software Competence Center Hagenberg GmbH, Softwarepark 21, 4232 Hagenberg, Austria
- David Schaller
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, 04107 Leipzig, Germany
- Alitzel López Sánchez
- CONACYT-Instituto de Matemáticas, UNAM Juriquilla, Blvd. Juriquilla 3001, 76230 Juriquilla, Querétaro, QRO México
- Marcos González Laffitte
- CONACYT-Instituto de Matemáticas, UNAM Juriquilla, Blvd. Juriquilla 3001, 76230 Juriquilla, Querétaro, QRO México
- Dulce I. Valdivia
- Departamento de Ingeniería Genética, Centro de Investigación y de Estudios Avanzados del IPN (CINVESTAV), Km. 9.6 Libramiento Norte Carretera Irapuato-León, 36821 Irapuato, GTO México
- Marc Hellmuth
- School of Computing, University of Leeds, E C Stoner Building, Leeds, LS2 9JT UK
- Maribel Hernández Rosales
- CONACYT-Instituto de Matemáticas, UNAM Juriquilla, Blvd. Juriquilla 3001, 76230 Juriquilla, Querétaro, QRO México
23
Cavassim MIA, Moeskjær S, Moslemi C, Fields B, Bachmann A, Vilhjálmsson BJ, Schierup MH, W. Young JP, Andersen SU. Symbiosis genes show a unique pattern of introgression and selection within a Rhizobium leguminosarum species complex. Microb Genom 2020; 6:e000351. [PMID: 32176601 PMCID: PMC7276703 DOI: 10.1099/mgen.0.000351] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 11/29/2019] [Accepted: 02/17/2020] [Indexed: 12/22/2022]
Abstract
Rhizobia supply legumes with fixed nitrogen using a set of symbiosis genes. These can cross rhizobium species boundaries, but it is unclear how many other genes show similar mobility. Here, we investigate inter-species introgression using de novo assembly of 196 Rhizobium leguminosarum sv. trifolii genomes. The 196 strains constituted a five-species complex, and we calculated introgression scores based on gene-tree traversal to identify 171 genes that frequently cross species boundaries. Rather than relying on the gene order of a single reference strain, we clustered the introgressing genes into four blocks based on population structure-corrected linkage disequilibrium patterns. The two largest blocks comprised 125 genes and included the symbiosis genes, a smaller block contained 43 mainly chromosomal genes, and the last block consisted of three genes with variable genomic location. All introgression events were likely mediated by conjugation, but only the genes in the symbiosis linkage blocks displayed overrepresentation of distinct, high-frequency haplotypes. The three genes in the last block were core genes essential for symbiosis that had, in some cases, been mobilized on symbiosis plasmids. Inter-species introgression is thus not limited to symbiosis genes and plasmids, but other cases are infrequent and show distinct selection signatures.
Affiliation(s)
- Maria Izabel A. Cavassim
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Sara Moeskjær
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Camous Moslemi
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Asger Bachmann
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Stig U. Andersen
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
24
Geiß M, Laffitte MEG, Sánchez AL, Valdivia DI, Hellmuth M, Rosales MH, Stadler PF. Best match graphs and reconciliation of gene trees with species trees. J Math Biol 2020; 80:1459-1495. [PMID: 32002659 PMCID: PMC7052050 DOI: 10.1007/s00285-020-01469-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/19/2019] [Revised: 01/08/2020] [Indexed: 11/19/2022]
Abstract
A wide variety of problems in computational biology, most notably the assessment of orthology, are solved with the help of reciprocal best matches. Using an evolutionary definition of best matches that captures the intuition behind the concept, we rigorously clarify the relationships between reciprocal best matches, orthology, and evolutionary events under the assumption of duplication/loss scenarios. We show that the orthology graph is a subgraph of the reciprocal best match graph (RBMG). We furthermore give conditions under which an RBMG that is a cograph identifies the correct orthology relation. Using computer simulations, we find that most false-positive orthology assignments can be identified as so-called good quartets (and thus corrected) in the absence of horizontal transfer. Horizontal transfer, however, may also introduce false-negative orthology assignments.
Affiliation(s)
- Manuela Geiß
- Bioinformatics Group, Department of Computer Science, Interdisciplinary Center of Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany
- Marcos E. González Laffitte
- CONACYT-Instituto de Matemáticas, UNAM Juriquilla, Blvd. Juriquilla 3001, 76230 Juriquilla, Querétaro, QRO Mexico
- Alitzel López Sánchez
- CONACYT-Instituto de Matemáticas, UNAM Juriquilla, Blvd. Juriquilla 3001, 76230 Juriquilla, Querétaro, QRO Mexico
- Dulce I. Valdivia
- Centro de Ciencias Básicas, Universidad Autónoma de Aguascalientes, Av. Universidad 940, 20131 Aguascalientes, AGS México
- Instituto de Matemáticas, UNAM Juriquilla, Blvd. Juriquilla 3001, 76230 Juriquilla, Querétaro, QRO Mexico
- Marc Hellmuth
- Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Straße 47, 17487 Greifswald, Germany
- Center for Bioinformatics, Saarland University, Building E 2.1, P.O. Box 151150, 66041 Saarbrücken, Germany
- Maribel Hernández Rosales
- CONACYT-Instituto de Matemáticas, UNAM Juriquilla, Blvd. Juriquilla 3001, 76230 Juriquilla, Querétaro, QRO Mexico
- Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, Interdisciplinary Center of Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
- Competence Center for Scalable Data Services and Solutions, Leipzig Research Center for Civilization Diseases, Leipzig University, Härtelstraße 16-18, 04107 Leipzig, Germany
- Max-Planck-Institute for Mathematics in the Sciences, Inselstraße 22, 04103 Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090 Wien, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501 USA
25
Geiß M, Stadler PF, Hellmuth M. Reciprocal best match graphs. J Math Biol 2019; 80:865-953. [PMID: 31691135 DOI: 10.1007/s00285-019-01444-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/19/2019] [Revised: 06/10/2019] [Indexed: 11/24/2022]
Abstract
Reciprocal best matches play an important role in numerous applications in computational biology, in particular as the basis of many widely used tools for orthology assessment. Nevertheless, very little is known about their mathematical structure. Here, we investigate the structure of reciprocal best match graphs (RBMGs). In order to abstract from the details of measuring distances, we define reciprocal best matches here as pairwise most closely related leaves in a gene tree, arguing that conceptually this is the notion that is pragmatically approximated by distance- or similarity-based heuristics. We start by showing that a graph G is an RBMG if and only if its quotient graph w.r.t. a certain thinness relation is an RBMG. Furthermore, it is necessary and sufficient that all connected components of G are RBMGs. The main result of this contribution is a complete characterization of RBMGs with 3 colors/species that can be checked in polynomial time. For 3 colors, there are three distinct classes of trees that are related to the structure of the phylogenetic trees explaining them. We derive an approach to recognize RBMGs with an arbitrary number of colors; it remains open, however, whether a polynomial-time algorithm for RBMG recognition exists. In addition, we show that RBMGs that are at the same time cographs (co-RBMGs) can be recognized in polynomial time. Co-RBMGs are characterized in terms of hierarchically colored cographs, a particular class of vertex-colored cographs that is introduced here. The (least resolved) trees that explain co-RBMGs can be constructed in polynomial time.
Affiliation(s)
- Manuela Geiß
- Bioinformatics Group, Department of Computer Science, Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
- Interdisciplinary Center of Bioinformatics, Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
- Interdisciplinary Center of Bioinformatics, Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
- Competence Center for Scalable Data Services and Solutions, Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
- Leipzig Research Center for Civilization Diseases, Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
- Max-Planck-Institute for Mathematics in the Sciences, Inselstraße 22, 04103, Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090, Vienna, Austria
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA
- Marc Hellmuth
- Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Straße 47, 17487, Greifswald, Germany
- Center for Bioinformatics, Saarland University, Building E 2.1, P.O. Box 151150, 66041, Saarbrücken, Germany
26
Hellmuth M, Huber KT, Moulton V. Reconciling event-labeled gene trees with MUL-trees and species networks. J Math Biol 2019; 79:1885-1925. [PMID: 31410552 DOI: 10.1007/s00285-019-01414-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/21/2018] [Revised: 05/08/2019] [Indexed: 11/30/2022]
Abstract
Phylogenomics commonly aims to construct evolutionary trees from genomic sequence information. One way to approach this problem is to first estimate event-labeled gene trees (i.e., rooted trees whose non-leaf vertices are labeled by speciation or gene duplication events), and to then look for a species tree which can be reconciled with this tree through a reconciliation map between the trees. In practice, however, it can happen that there is no such map from a given event-labeled tree to any species tree. An important situation where this might arise is where the species evolution is better represented by a network instead of a tree. In this paper, we therefore consider the problem of reconciling event-labeled trees with species networks. In particular, we prove that any event-labeled gene tree can be reconciled with some network and that, under certain mild assumptions on the gene tree, the network can even be assumed to be multi-arc free. To prove this result, we show that we can always reconcile the gene tree with some multi-labeled (MUL-)tree, which can then be "folded up" to produce the desired reconciliation and network. In addition, we study the interplay between reconciliation maps from event-labeled gene trees to MUL-trees and networks. Our results could be useful for understanding how genomes have evolved after undergoing complex evolutionary events such as polyploidy.
Affiliation(s)
- Marc Hellmuth
- Institute of Mathematics and Computer Science, University of Greifswald, Greifswald, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Katharina T Huber
- School of Computing Sciences, University of East Anglia, Norwich, UK
- Vincent Moulton
- School of Computing Sciences, University of East Anglia, Norwich, UK
27
Slabaugh E, Desai JS, Sartor RC, Lawas LMF, Jagadish SVK, Doherty CJ. Analysis of differential gene expression and alternative splicing is significantly influenced by choice of reference genome. RNA 2019; 25:669-684. [PMID: 30872414 PMCID: PMC6521602 DOI: 10.1261/rna.070227.118] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/05/2019] [Accepted: 03/06/2019] [Indexed: 05/19/2023]
Abstract
RNA-seq analysis has enabled the evaluation of transcriptional changes in many species including nonmodel organisms. However, in most species only a single reference genome is available and RNA-seq reads from highly divergent varieties are typically aligned to this reference. Here, we quantify the impacts of the choice of mapping genome in rice where three high-quality reference genomes are available. We aligned RNA-seq data from a popular productive rice variety to three different reference genomes and found that the identification of differentially expressed genes differed depending on which reference genome was used for mapping. Furthermore, the ability to detect differentially used transcript isoforms was profoundly affected by the choice of reference genome: Only 30% of the differentially used splicing features were detected when reads were mapped to the more commonly used, but more distantly related reference genome. This demonstrated that gene expression and splicing analysis varies considerably depending on the mapping reference genome, and that analysis of individuals that are distantly related to an available reference genome may be improved by acquisition of new genomic reference material. We observed that these differences in transcriptome analysis are, in part, due to the presence of single nucleotide polymorphisms between the sequenced individual and each respective reference genome, as well as annotation differences between the reference genomes that exist even between syntenic orthologs. We conclude that even between two closely related genomes of similar quality, using the reference genome that is most closely related to the species being sampled significantly improves transcriptome analysis.
Affiliation(s)
- Erin Slabaugh
- Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, North Carolina 27695, USA
- Jigar S Desai
- Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, North Carolina 27695, USA
- Ryan C Sartor
- Crop and Soil Science Department, North Carolina State University, Raleigh, North Carolina 27695, USA
- Lovely Mae F Lawas
- International Rice Research Institute (IRRI), DAPO Box 7777, Metro Manila, Philippines
- Max Planck Institute of Molecular Plant Physiology, D-14476, Potsdam, Germany
- S V Krishna Jagadish
- International Rice Research Institute (IRRI), DAPO Box 7777, Metro Manila, Philippines
- Department of Agronomy, Kansas State University, Manhattan, Kansas 66506, USA
- Colleen J Doherty
- Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, North Carolina 27695, USA
28
Hellmuth M, Seemann CR. Alternative characterizations of Fitch's xenology relation. J Math Biol 2019; 79:969-986. [PMID: 31111195 DOI: 10.1007/s00285-019-01384-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/30/2018] [Revised: 05/08/2019] [Indexed: 11/25/2022]
Abstract
Horizontal gene transfer (HGT) is an important factor for the evolution of prokaryotes as well as eukaryotes. According to Walter M. Fitch, two genes are xenologs if they are separated by at least one HGT. This concept is formalized through Fitch relations, which are defined as binary relations that comprise all pairs (x, y) of genes x and y for which y has been horizontally transferred at least once since it diverged from the last common ancestor of x and y. This definition, in particular, preserves the directional character of the transfer. Fitch relations are characterized by a small set of forbidden induced subgraphs on three vertices and can be recognized in linear time. The mathematical characterization of Fitch relations is crucial to understand whether putative xenology relations are at least to some extent "biologically feasible". In this contribution, we provide two novel characterizations of Fitch relations. In particular, these results allow us to directly reconstruct gene trees (together with the location of the horizontal transfer events) that explain the underlying Fitch relation. As a biological side result, we can conclude that the phylogenetic signal to infer these gene trees is entirely contained in those pairs of genes x and y for which no directional transfer has taken place in the common history of y and the last common ancestor of x and y. In other words, non-HGT events provide the essential information about the gene trees. In addition, we utilize the new characterizations to present an alternative, short and elegant proof of the characterization theorem established by Geiß et al. (J Math Biol 77(5), 2018).
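The Fitch relation defined above can be evaluated directly on a gene tree whose transfer edges are already known. The sketch below is a brute-force illustration of that definition only, not the characterizations or the linear-time recognition discussed in the abstract; it assumes a rooted tree given as a hypothetical child-to-parent map and identifies each edge by its lower endpoint.

```python
def ancestors(v, parent):
    """Return [v, parent(v), ..., root] for a child-to-parent map."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path


def lca(x, y, parent):
    """Last common ancestor of x and y."""
    anc_x = set(ancestors(x, parent))
    for v in ancestors(y, parent):
        if v in anc_x:
            return v
    raise ValueError("x and y are not in the same tree")


def fitch_relation(leaves, parent, transfer_edges):
    """All ordered pairs (x, y) whose path from lca(x, y) down to y contains
    at least one transfer edge. Edge (parent[v], v) is identified by v."""
    relation = set()
    for x in leaves:
        for y in leaves:
            if x == y:
                continue
            top = lca(x, y, parent)
            v = y
            # Walk upward from y, inspecting every edge strictly below the lca.
            while v != top:
                if v in transfer_edges:
                    relation.add((x, y))
                    break
                v = parent[v]
    return relation


# Hypothetical tree ((a, b)u, c)r in which the edge r -> c is a transfer.
parent = {"u": "r", "c": "r", "a": "u", "b": "u"}
print(sorted(fitch_relation(["a", "b", "c"], parent, {"c"})))
# [('a', 'c'), ('b', 'c')]
```

The pair (c, a) is absent because the transfer lies only on the path leading to c, which reflects the directional character of the relation.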
Affiliation(s)
- Marc Hellmuth
- Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Straße 47, 17487, Greifswald, Germany
- Center for Bioinformatics, Saarland University, Building E 2.1, P.O. Box 151150, 66041, Saarbrücken, Germany
- Carsten R Seemann
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center of Bioinformatics, Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103, Leipzig, Germany
29
Drukewitz SH, von Reumont BM. The Significance of Comparative Genomics in Modern Evolutionary Venomics. Front Ecol Evol 2019. [DOI: 10.3389/fevo.2019.00163] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 12/21/2022]
30
Prost S, Armstrong EE, Nylander J, Thomas GWC, Suh A, Petersen B, Dalen L, Benz BW, Blom MPK, Palkopoulou E, Ericson PGP, Irestedt M. Comparative analyses identify genomic features potentially involved in the evolution of birds-of-paradise. Gigascience 2019; 8:giz003. [PMID: 30689847 PMCID: PMC6497032 DOI: 10.1093/gigascience/giz003] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 05/04/2018] [Revised: 10/30/2018] [Accepted: 01/10/2019] [Indexed: 12/14/2022]
Abstract
The diverse array of phenotypes and courtship displays exhibited by birds-of-paradise has long fascinated scientists and nonscientists alike. Remarkably, almost nothing is known about the genomics of this iconic radiation. There are 41 species in 16 genera currently recognized within the birds-of-paradise family (Paradisaeidae), most of which are endemic to the island of New Guinea. In this study, we sequenced genomes of representatives from all five major clades within this family to characterize genomic changes that may have played a role in the evolution of the group's extensive phenotypic diversity. We found genes important for coloration, morphology, and feather and eye development to be under positive selection. In birds-of-paradise with complex lekking systems and strong sexual dimorphism, the core birds-of-paradise, we found Gene Ontology categories for "startle response" and "olfactory receptor activity" to be enriched among the gene families expanding significantly faster compared to the other birds in our study. Furthermore, we found novel families of retrovirus-like retrotransposons active in all three de novo genomes since the early diversification of the birds-of-paradise group, which might have played a role in the evolution of this fascinating group of birds.
Affiliation(s)
- Stefan Prost
- Department of Biodiversity and Genetics, Swedish Museum of Natural History, Frescativaegen 40, 114 18 Stockholm, Sweden
- Department of Integrative Biology, University of California, 3040 Valley Life Science Building, Berkeley, CA 94720-3140, USA
- Ellie E Armstrong
- Department of Biology, Stanford University, 371 Serra Mall, Stanford, CA 94305-5020, USA
- Johan Nylander
- Department of Biodiversity and Genetics, Swedish Museum of Natural History, Frescativaegen 40, 114 18 Stockholm, Sweden
- Gregg W C Thomas
- Department of Biology and School of Informatics, Computing, and Engineering, Indiana University, 1001 E. Third Street, Bloomington, IN 47405, USA
- Alexander Suh
- Department of Evolutionary Biology (EBC), Uppsala University, Norbyvaegen 14-18, 75236 Uppsala, Sweden
- Bent Petersen
- Natural History Museum of Denmark, University of Copenhagen, Oster Voldgade 5-7, 1353 Copenhagen, Denmark
- Centre of Excellence for Omics-Driven Computational Biodiscovery, Faculty of Applied Sciences, Asian Institute of Medicine, Science and Technology, Jalan Bedong-Semeling, 08100 Bedong, Kedah, Malaysia
- Love Dalen
- Department of Biodiversity and Genetics, Swedish Museum of Natural History, Frescativaegen 40, 114 18 Stockholm, Sweden
- Brett W Benz
- Department of Ornithology, American Museum of Natural History, Central Park West, New York, NY 10024, USA
- Mozes P K Blom
- Department of Biodiversity and Genetics, Swedish Museum of Natural History, Frescativaegen 40, 114 18 Stockholm, Sweden
- Eleftheria Palkopoulou
- Department of Biodiversity and Genetics, Swedish Museum of Natural History, Frescativaegen 40, 114 18 Stockholm, Sweden
- Per G P Ericson
- Department of Biodiversity and Genetics, Swedish Museum of Natural History, Frescativaegen 40, 114 18 Stockholm, Sweden
- Martin Irestedt
- Department of Biodiversity and Genetics, Swedish Museum of Natural History, Frescativaegen 40, 114 18 Stockholm, Sweden
31
Abstract
Best match graphs arise naturally as the first processing intermediate in algorithms for orthology detection. Let $T$ be a phylogenetic (gene) tree and $\sigma$ an assignment of the leaves of $T$ to species. The best match graph $(G,\sigma)$ is a digraph that contains an arc from x to y if the genes x and y reside in different species and y is one of possibly many (evolutionary) closest relatives of x compared to all other genes contained in the species $\sigma(y)$. Here, we characterize best match graphs and show that it can be decided in cubic time and quadratic space whether a given vertex-colored digraph $(G,\sigma)$ is derived from a tree in this manner. If the answer is affirmative, there is a unique least resolved tree that explains $(G,\sigma)$, which can also be constructed in cubic time.
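Because every last common ancestor lca(x, y) lies on the path from x to the root, "closest relative of x within species σ(y)" reduces to "deepest lca(x, y) among the genes of that species". A minimal Python sketch of this definition follows, assuming a hypothetical child-to-parent map for $T$ and a dict for $\sigma$. It is a brute-force construction of the graph from a known tree, not the cubic-time recognition algorithm from the abstract; the reciprocal best match graph is taken as the symmetric part of the arc set.

```python
def ancestors(v, parent):
    """Return [v, parent(v), ..., root] for a child-to-parent map."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path


def lca_depth(x, y, parent):
    """Depth (edges from the root) of the last common ancestor of x and y."""
    anc_x = ancestors(x, parent)
    anc_y = set(ancestors(y, parent))
    for i, v in enumerate(anc_x):
        if v in anc_y:
            # anc_x runs from x up to the root, which sits at index len(anc_x) - 1.
            return len(anc_x) - 1 - i
    raise ValueError("x and y are not in the same tree")


def best_match_graph(sigma, parent):
    """Arcs (x, y) with sigma[x] != sigma[y] such that lca(x, y) is at least as
    deep as lca(x, y') for every leaf y' of species sigma[y]."""
    arcs = set()
    for x in sigma:
        for s in set(sigma.values()) - {sigma[x]}:
            ys = [y for y in sigma if sigma[y] == s]
            deepest = max(lca_depth(x, y, parent) for y in ys)
            arcs.update((x, y) for y in ys if lca_depth(x, y, parent) == deepest)
    return arcs


def reciprocal_best_match_graph(arcs):
    """Undirected edges {x, y} for which both (x, y) and (y, x) are arcs."""
    return {frozenset((x, y)) for (x, y) in arcs if (y, x) in arcs}


# Hypothetical gene tree ((a1, b1)u, (b2, c1)w)r with species A, B, C.
parent = {"u": "r", "w": "r", "a1": "u", "b1": "u", "b2": "w", "c1": "w"}
sigma = {"a1": "A", "b1": "B", "b2": "B", "c1": "C"}
rbmg = reciprocal_best_match_graph(best_match_graph(sigma, parent))
print(sorted(tuple(sorted(e)) for e in rbmg))
# [('a1', 'b1'), ('a1', 'c1'), ('b2', 'c1')]
```

In this toy tree, b1 → c1 is an arc but c1 → b1 is not, because the closest B-gene of c1 is b2; this asymmetry is why best match graphs are directed.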
32
Armstrong EE, Taylor RW, Prost S, Blinston P, van der Meer E, Madzikanda H, Mufute O, Mandisodza-Chikerema R, Stuelpnagel J, Sillero-Zubiri C, Petrov D. Cost-effective assembly of the African wild dog (Lycaon pictus) genome using linked reads. Gigascience 2019; 8:5140148. [PMID: 30346553 PMCID: PMC6350039 DOI: 10.1093/gigascience/giy124] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 11/28/2017] [Accepted: 10/07/2018] [Indexed: 01/07/2023]
Abstract
Background A high-quality reference genome assembly is a valuable tool for the study of non-model organisms. Genomic techniques can provide important insights about past population sizes and local adaptation and can aid in the development of breeding management plans. This information is important for fields such as conservation genetics, where endangered species require critical and immediate attention. However, funding for genomic-based methods can be sparse for conservation projects, as costs for general species management can consume budgets. Findings Here, we report the generation of high-quality reference genomes for the African wild dog (Lycaon pictus) at a low cost (<$3000), thereby facilitating future studies of this endangered canid. We generated assemblies for three individuals using the linked-read 10x Genomics Chromium system. The most continuous assembly had a scaffold and contig N50 of 21 Mb and 83 Kb, respectively, and completely reconstructed 95% of a set of conserved mammalian genes. Additionally, we estimate the heterozygosity and demographic history of African wild dogs, revealing that although they have historically low effective population sizes, heterozygosity remains high. Conclusions We show that 10x Genomics Chromium data can be used to effectively generate high-quality genomes from Illumina short-read data of intermediate coverage (∼25x–50x). Interestingly, the wild dog shows higher heterozygosity than other species of conservation concern, possibly due to its behavioral ecology. The availability of reference genomes for non-model organisms will facilitate better genetic monitoring of threatened species such as the African wild dog and help conservationists to better understand the ecology and adaptability of those species in a changing environment.
Affiliation(s)
- Ellie E Armstrong
- Program for Conservation Genomics, Department of Biology, 385 Serra Mall, Stanford University, Stanford, CA, 94305, USA
- Ryan W Taylor
- Program for Conservation Genomics, Department of Biology, 385 Serra Mall, Stanford University, Stanford, CA, 94305, USA
- Stefan Prost
- Program for Conservation Genomics, Department of Biology, 385 Serra Mall, Stanford University, Stanford, CA, 94305, USA
- Department of Integrative Biology, 3040 Valley Life Science Building, University of California, Berkeley, CA, 94720-3140, USA
- Peter Blinston
- Painted Dog Conservation, PO Box 72, Dete, 00263, Zimbabwe
- Olivia Mufute
- The Zimbabwe Parks & Wildlife Management Authority, Corner Sandringham & Borrowdale Roads, Botanical Gardens, Causeway, Harare, 00263, Zimbabwe
- Roseline Mandisodza-Chikerema
- The Zimbabwe Parks & Wildlife Management Authority, Corner Sandringham & Borrowdale Roads, Botanical Gardens, Causeway, Harare, 00263, Zimbabwe
- John Stuelpnagel
- 10x Genomics, Inc., 7068 Koll Center Pkwy #401, Pleasanton, CA, 94566, USA
- Claudio Sillero-Zubiri
- Wildlife Conservation Research Unit, Zoology, University of Oxford, The Recanati-Kaplan Centre, Abingdon Road, Tubney House, Tubney, UK014
- Dmitri Petrov
- Program for Conservation Genomics, Department of Biology, 385 Serra Mall, Stanford University, Stanford, CA, 94305, USA
33
Georgescu CH, Manson AL, Griggs AD, Desjardins CA, Pironti A, Wapinski I, Abeel T, Haas BJ, Earl AM. SynerClust: a highly scalable, synteny-aware orthologue clustering tool. Microb Genom 2018; 4. [PMID: 30418868 PMCID: PMC6321874 DOI: 10.1099/mgen.0.000231] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/18/2022]
Abstract
Accurate orthologue identification is a vital component of bacterial comparative genomic studies, but many popular sequence-similarity-based approaches do not scale well to the large numbers of genomes that are now generated routinely. Furthermore, most approaches do not take gene synteny into account, which is useful information for disentangling paralogues. Here, we present SynerClust, a user-friendly synteny-aware tool based on synergy that can process thousands of genomes. SynerClust was designed to analyse genomes with high levels of local synteny, particularly prokaryotes, which have operon structure. SynerClust’s run-time is optimized by selecting cluster representatives at each node in the phylogeny; thus, avoiding the need for exhaustive pairwise similarity searches. In benchmarking against Roary, Hieranoid2, PanX and Reciprocal Best Hit, SynerClust was able to more completely identify sets of core genes for datasets that included diverse strains, while using substantially less memory, and with scalability comparable to the fastest tools. Due to its scalability, ease of installation and use, and suitability for a variety of computing environments, orthogroup clustering using SynerClust will enable many large-scale prokaryotic comparative genomics efforts.
Affiliation(s)
- Thomas Abeel
- Broad Institute, Cambridge, MA, USA
- Delft University of Technology, Delft, The Netherlands
34
Nallu S, Hill JA, Don K, Sahagun C, Zhang W, Meslin C, Snell-Rood E, Clark NL, Morehouse NI, Bergelson J, Wheat CW, Kronforst MR. The molecular genetic basis of herbivory between butterflies and their host plants. Nat Ecol Evol 2018; 2:1418-1427. [PMID: 30076351 PMCID: PMC6149523 DOI: 10.1038/s41559-018-0629-9] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 08/23/2017] [Accepted: 07/02/2018] [Indexed: 12/30/2022]
Abstract
Interactions between herbivorous insects and their host-plants are a central component of terrestrial food webs and a critical topic in agriculture, where a substantial fraction of potential crop yield is lost annually to pests. Important insights into plant-insect interactions have come from research on specific plant defenses and insect detoxification mechanisms. Yet, much remains unknown about the molecular mechanisms that mediate plant-insect interactions. Here we use multiple genome-wide approaches to map the molecular basis of herbivory from both plant and insect perspectives, focusing on butterflies and their larval host-plants. Parallel genome-wide association studies in the Cabbage White butterfly, Pieris rapae, and its host-plant, Arabidopsis thaliana, pinpointed a small number of butterfly and plant genes that influenced herbivory. These genes, along with much of the genome, were regulated in a dynamic way over the time course of the feeding interaction. Comparative analyses, including diverse butterfly/plant systems, showed a variety of genome-wide responses to herbivory, yet a core set of highly conserved genes in butterflies as well as their host-plants. These results greatly expand our understanding of the genomic causes and evolutionary consequences of ecological interactions across two of nature’s most diverse taxa, butterflies and flowering plants.
Affiliation(s)
- Sumitha Nallu
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Jason A Hill
- Department of Zoology, Stockholm University, Stockholm, Sweden
- Kristine Don
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Carlos Sahagun
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Wei Zhang
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Peking-Tsinghua Center for Life Sciences, State Key Laboratory of Protein and Plant Gene Research, and School of Life Sciences, Peking University, Beijing, China
- Camille Meslin
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Institut National de la Recherche Agronomique (INRA), Institute of Ecology and Environmental Sciences of Paris (IEES-Paris), Versailles, France
- Emilie Snell-Rood
- Department of Ecology, Evolution and Behavior, University of Minnesota, Saint Paul, MN, USA
- Nathan L Clark
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Nathan I Morehouse
- Department of Biological Sciences, University of Cincinnati, Cincinnati, OH, USA
- Joy Bergelson
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Marcus R Kronforst
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
35
Abstract
This chapter covers the theory and practice of ortholog gene set computation. In the theoretical part we give detailed and formal descriptions of the relevant concepts. We also cover the topic of graph-based clustering as a tool to compute ortholog gene sets. In the second part we provide an overview of practical considerations intended for researchers who need to determine orthologous genes from a collection of annotated genomes, briefly describing some of the most popular programs and resources currently available for this task.
36
Waldl M, Thiel BC, Ochsenreiter R, Holzenleiter A, de Araujo Oliveira JV, Walter MEMT, Wolfinger MT, Stadler PF. TERribly Difficult: Searching for Telomerase RNAs in Saccharomycetes. Genes (Basel) 2018; 9:genes9080372. [PMID: 30049970 PMCID: PMC6115765 DOI: 10.3390/genes9080372] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 05/15/2018] [Revised: 07/17/2018] [Accepted: 07/18/2018] [Indexed: 11/20/2022]
Abstract
The telomerase RNA in yeasts is large, usually >1000 nt, and contains functional elements that have been extensively studied experimentally in several disparate species. Nevertheless, they are very difficult to detect by homology-based methods and so far have escaped annotation in the majority of the genomes of Saccharomycotina. This is a consequence of sequences that evolve rapidly at nucleotide level, are subject to large variations in size, and are highly plastic with respect to their secondary structures. Here, we report on a survey that was aimed at closing this gap in RNA annotation. Despite considerable efforts and the combination of a variety of different methods, it was only partially successful. While 27 new telomerase RNAs were identified, we had to restrict our efforts to the subgroup Saccharomycetaceae because even this narrow subgroup was diverse enough to require different search models for different phylogenetic subgroups. More distant branches of the Saccharomycotina remain without annotated telomerase RNA.
Affiliation(s)
- Maria Waldl
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria.
- Bernhard C Thiel
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria.
- Roman Ochsenreiter
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria.
- Alexander Holzenleiter
- BioInformatics Group, Fakultät CB Hochschule Mittweida, Technikumplatz 17, D-09648 Mittweida, Germany.
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany.
- João Victor de Araujo Oliveira
- Departamento de Ciência da Computação, Instituto de Ciências Exatas, Universidade de Brasília, Campus Universitário, Asa Norte, Brasília, DF CEP: 70910-900, Brazil.
- Maria Emília M T Walter
- Departamento de Ciência da Computação, Instituto de Ciências Exatas, Universidade de Brasília, Campus Universitário, Asa Norte, Brasília, DF CEP: 70910-900, Brazil.
- Michael T Wolfinger
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria.
- Center for Anatomy and Cell Biology, Medical University of Vienna, Währingerstraße 13, 1090 Vienna, Austria.
- Peter F Stadler
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Center for Scalable Data Services and Solutions, and Leipzig Research Center for Civilization Diseases, Universität Leipzig, D-04107 Leipzig, Germany.
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany.
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA.
37
Fertin G, Hüffner F, Komusiewicz C, Sorge M. Matching algorithms for assigning orthologs after genome duplication events. Comput Biol Chem 2018; 74:379-390. [PMID: 29650458 DOI: 10.1016/j.compbiolchem.2018.03.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2018] [Accepted: 03/13/2018] [Indexed: 11/25/2022]
Abstract
In this paper, we introduce and analyze two graph-based models for assigning orthologs in the presence of whole-genome duplications, using similarity information between pairs of genes. The common feature of our two models is that genes of the first genome may be assigned two orthologs from the second genome, which has undergone a whole-genome duplication. Additionally, our models incorporate the new notion of duplication bonus, a parameter that reflects how assigning two orthologs to a given gene should be rewarded or penalized. Our work is mainly focused on developing exact and reasonably time-consuming algorithms for these two models: we show that the first one is polynomial-time solvable, while the second is NP-hard. For the latter, we thus design two fixed-parameter algorithms, i.e. exact algorithms whose running times are exponential only with respect to a small and well-chosen input parameter. Finally, for both models, we evaluate our algorithms on pairs of plant genomes. Our experiments show that the NP-hard model yields a better cluster quality at the cost of lower coverage, due to the fact that our instances cannot be completely solved by our algorithms. However, our results are altogether encouraging and show that our methods yield biologically significant predictions of orthologs when the duplication bonus value is properly chosen.
Affiliation(s)
- Christian Komusiewicz
- Fachbereich für Mathematik und Informatik, Philipps-Universität Marburg, Marburg, Germany.
- Manuel Sorge
- Ben-Gurion University of the Negev, Beer Sheva, Israel; Technische Universität Berlin, Berlin, Germany.
38
Nøjgaard N, Geiß M, Merkle D, Stadler PF, Wieseke N, Hellmuth M. Time-consistent reconciliation maps and forbidden time travel. Algorithms Mol Biol 2018; 13:2. [PMID: 29441122 PMCID: PMC5800358 DOI: 10.1186/s13015-018-0121-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/21/2017] [Accepted: 01/20/2018] [Indexed: 12/04/2022]
Abstract
Background In the absence of horizontal gene transfer it is possible to reconstruct the history of gene families from empirically determined orthology relations, which are equivalent to event-labeled gene trees. Knowledge of the event labels considerably simplifies the problem of reconciling a gene tree T with a species tree S, relative to the reconciliation problem without prior knowledge of the event types. It is well-known that optimal reconciliations in the unlabeled case may violate time-consistency and thus are not biologically feasible. Here we investigate the mathematical structure of the event-labeled reconciliation problem with horizontal transfer. Results We investigate the issue of time-consistency for the event-labeled version of the reconciliation problem, provide a convenient axiomatic framework, and derive a complete characterization of time-consistent reconciliations. This characterization depends on certain weak conditions on the event-labeled gene trees that reflect conditions under which evolutionary events are observable at least in principle. We give an $\mathcal{O}(|V(T)|\log(|V(S)|))$-time algorithm to decide whether a time-consistent reconciliation map exists. It does not require the construction of explicit timing maps, but relies entirely on the comparatively easy task of checking whether a small auxiliary graph is acyclic. The algorithms are implemented in C++ using the Boost Graph Library and are freely available at https://github.com/Nojgaard/tc-recon. Significance The combinatorial characterization of time consistency and thus biologically feasible reconciliation is an important step towards the inference of gene family histories with horizontal transfer from orthology data, i.e., without presupposed gene and species trees. The fast algorithm to decide time consistency is useful in a broader context because it constitutes an attractive component for all tools that address tree reconciliation problems.
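The decision procedure above reduces to an acyclicity test on a small auxiliary graph. The paper's construction of that graph from T and S is not reproduced here; as a generic, hedged sketch of why the final check is cheap, the Python below tests a directed graph for cycles with Kahn's topological-sort algorithm in time linear in vertices plus edges. The adjacency-dict input format and the name `is_acyclic` are illustrative assumptions, not part of tc-recon.

```python
from collections import deque


def is_acyclic(graph: dict[str, list[str]]) -> bool:
    """Return True if the directed graph has no cycle (Kahn's algorithm).

    `graph` maps each vertex to the list of its successors; vertices that
    appear only as successors are added automatically. This is a generic
    check, not the auxiliary-graph construction from the paper.
    """
    # Count incoming edges for every vertex.
    indegree = {v: 0 for v in graph}
    for succs in graph.values():
        for w in succs:
            indegree[w] = indegree.get(w, 0) + 1

    # Repeatedly remove vertices that have no remaining incoming edges.
    queue = deque(v for v, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for w in graph.get(v, []):
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)

    # All vertices can be removed exactly when the graph has no cycle.
    return removed == len(indegree)
```

For example, `is_acyclic({"a": ["b"], "b": ["c"]})` is True, while `is_acyclic({"a": ["b"], "b": ["a"]})` is False because a and b lie on a common cycle.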
39
Palmer JM, Drees KP, Foster JT, Lindner DL. Extreme sensitivity to ultraviolet light in the fungal pathogen causing white-nose syndrome of bats. Nat Commun 2018; 9:35. [PMID: 29295979 PMCID: PMC5750222 DOI: 10.1038/s41467-017-02441-z] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 03/29/2017] [Accepted: 11/30/2017] [Indexed: 02/08/2023]
Abstract
Bat white-nose syndrome (WNS), caused by the fungal pathogen Pseudogymnoascus destructans, has decimated North American hibernating bats since its emergence in 2006. Here, we utilize comparative genomics to examine the evolutionary history of this pathogen in comparison to six closely related nonpathogenic species. P. destructans displays a large reduction in carbohydrate-utilizing enzymes (CAZymes) and in the predicted secretome (~50%), and an increase in lineage-specific genes. The pathogen has lost a key enzyme, UVE1, in the alternate excision repair (AER) pathway, which is known to contribute to repair of DNA lesions induced by ultraviolet (UV) light. Consistent with a nonfunctional AER pathway, P. destructans is extremely sensitive to UV light, as well as the DNA alkylating agent methyl methanesulfonate (MMS). The differential susceptibility of P. destructans to UV light in comparison to other hibernacula-inhabiting fungi represents a potential “Achilles’ heel” of P. destructans that might be exploited for treatment of bats with WNS. White-nose syndrome, caused by the fungus Pseudogymnoascus destructans, is decimating North American bats. Here, Palmer et al. use comparative genomics to examine the evolutionary history of this pathogen, and show that it has lost a crucial DNA repair enzyme and is extremely sensitive to UV light.
Affiliation(s)
- Jonathan M Palmer
- Center for Forest Mycology Research, Northern Research Station, US Forest Service, Madison, WI, 53726, USA
- Kevin P Drees
- Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, NH, 03824, USA
- Jeffrey T Foster
- Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, NH, 03824, USA
- Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Daniel L Lindner
- Center for Forest Mycology Research, Northern Research Station, US Forest Service, Madison, WI, 53726, USA.
40
Zheng A, Jiang B, Li Y, Zhang X, Ding C. Elastic K-means using posterior probability. PLoS One 2017; 12:e0188252. [PMID: 29240756 PMCID: PMC5730165 DOI: 10.1371/journal.pone.0188252] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/25/2017] [Accepted: 11/05/2017] [Indexed: 11/30/2022]
Abstract
The widely used K-means clustering is a hard clustering algorithm. Here we propose an Elastic K-means clustering model (EKM) using posterior probability with soft capability, where each data point can belong to multiple clusters fractionally, and show the benefit of the proposed Elastic K-means. Furthermore, in many applications, besides vector attribute information, pairwise relations (graph information) are also available. Thus we integrate EKM with Normalized Cut graph clustering into a single clustering formulation. Finally, we provide several matrix inequalities that are useful for matrix formulations of learning models. Based on these results, we prove the correctness and the convergence of EKM algorithms. Experimental results on six benchmark datasets demonstrate the effectiveness of the proposed EKM and its integrated model.
Affiliation(s)
- Yan Li
- Anhui Broadcasting Movie and Television College, Hefei, China
41
Jahangiri-Tazehkand S, Wong L, Eslahchi C. OrthoGNC: A Software for Accurate Identification of Orthologs Based on Gene Neighborhood Conservation. Genomics Proteomics Bioinformatics 2017; 15:361-370. [PMID: 29133277 PMCID: PMC5828658 DOI: 10.1016/j.gpb.2017.07.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 04/22/2017] [Revised: 07/17/2017] [Accepted: 07/28/2017] [Indexed: 11/17/2022]
Abstract
Orthology relations can be used to transfer annotations from one gene (or protein) to another. Hence, detecting orthology relations has become an important task in the post-genomic era. Various genomic events, such as duplication and horizontal gene transfer, can cause erroneous assignment of orthology relations. In closely-related species, gene neighborhood information can be used to resolve many ambiguities in orthology inference. Here we present OrthoGNC, a software for accurately predicting pairwise orthology relations based on gene neighborhood conservation. Analyses on simulated and real data reveal the high accuracy of OrthoGNC. In addition to orthology detection, OrthoGNC can be employed to investigate the conservation of genomic context among potential orthologs detected by other methods. OrthoGNC is freely available online at http://bs.ipm.ir/softwares/orthognc and http://tinyurl.com/orthoGNC.
Affiliation(s)
- Limsoon Wong
- School of Computing, National University of Singapore, Singapore 117417, Singapore
- Changiz Eslahchi
- Department of Computer Science, Shahid Beheshti University, Tehran 1983969411, Iran.
42
Genome-Guided Phylo-Transcriptomic Methods and the Nuclear Phylogentic Tree of the Paniceae Grasses. Sci Rep 2017; 7:13528. [PMID: 29051622 PMCID: PMC5648822 DOI: 10.1038/s41598-017-13236-z] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 05/05/2017] [Accepted: 09/20/2017] [Indexed: 11/23/2022]
Abstract
The past few years have witnessed a paradigm shift in molecular systematics from phylogenetic methods (using one or a few genes) to those that can be described as phylogenomics (phylogenetic inference with entire genomes). One approach that has recently emerged is phylo-transcriptomics (transcriptome-based phylogenetic inference). As in any phylogenetics experiment, accurate orthology inference is critical to phylo-transcriptomics. To date, most analyses have inferred orthology based either on pure sequence similarity or using gene-tree approaches. The use of conserved genome synteny in orthology detection has been relatively under-employed in phylogenetics, mainly due to the cost of sequencing genomes. While current trends focus on the quantity of genes included in an analysis, the use of synteny is likely to improve the quality of ortholog inference. In this study, we combine de novo transcriptome data and sequenced genomes from an economically important group of grass species, the tribe Paniceae, to make phylogenomic inferences. This method, which we call “genome-guided phylo-transcriptomics”, is compared to other recently published orthology inference pipelines, and benchmarked using a set of sequenced genomes from across the grasses. These comparisons provide a framework for future researchers to evaluate the costs and benefits of adding sequenced genomes to transcriptome data sets.
43
Sharma A, Wai CM, Ming R, Yu Q. Diurnal Cycling Transcription Factors of Pineapple Revealed by Genome-Wide Annotation and Global Transcriptomic Analysis. Genome Biol Evol 2017; 9:2170-2190. [PMID: 28922793 PMCID: PMC5737478 DOI: 10.1093/gbe/evx161] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Accepted: 08/22/2017] [Indexed: 12/22/2022]
Abstract
The circadian clock provides a fitness advantage by coordinating internal metabolic and physiological processes to external cyclic environments. Core clock components exhibit daily rhythmic changes in gene expression, and the majority of them are transcription factors (TFs) and transcription coregulators (TCs). We annotated 1,398 TFs from 67 TF families and 80 TCs from 20 TC families in pineapple, and analyzed their tissue-specific and diurnal expression patterns. Approximately 42% of TFs and 45% of TCs displayed diel rhythmic expression, including 177 TF/TCs cycling only in the nonphotosynthetic leaf tissue, 247 cycling only in the photosynthetic leaf tissue, and 201 cycling in both. We identified 68 TF/TCs whose cycling expression was tightly coupled between the photosynthetic and nonphotosynthetic leaf tissues. These TF/TCs likely coordinate key biological processes in pineapple as we demonstrated that this group is enriched in homologous genes that form the core circadian clock in Arabidopsis and includes a STOP1 homolog. Two lines of evidence support the important role of the STOP1 homolog in regulating CAM photosynthesis in pineapple. First, STOP1 responds to acidic pH and regulates a malate channel in multiple plant species. Second, the cycling expression pattern of the pineapple STOP1 and the diurnal pattern of malate accumulation in pineapple leaf are correlated. We further examined duplicate-gene retention and loss in major known circadian genes and refined their evolutionary relationships between pineapple and other plants. Significant variations in duplicate-gene retention and loss were observed for most clock genes in both monocots and dicots.
Affiliation(s)
- Anupma Sharma
- Texas A&M AgriLife Research Center at Dallas, Texas A&M University System, Dallas
- Ching Man Wai
- Department of Plant Biology, University of Illinois at Urbana-Champaign
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China
- Ray Ming
- Department of Plant Biology, University of Illinois at Urbana-Champaign
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China
- Qingyi Yu
- Texas A&M AgriLife Research Center at Dallas, Texas A&M University System, Dallas
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China
- Department of Plant Pathology and Microbiology, Texas A&M University
44
Hellmuth M. Biologically feasible gene trees, reconciliation maps and informative triples. Algorithms Mol Biol 2017; 12:23. [PMID: 28861118 PMCID: PMC5576477 DOI: 10.1186/s13015-017-0114-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 06/12/2017] [Accepted: 08/16/2017] [Indexed: 11/17/2022]
Abstract
BACKGROUND The history of gene families, which are equivalent to event-labeled gene trees, can be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralogous or xenologous genes. The question then arises as to whether inferred event-labeled gene trees are biologically feasible, that is, whether there is a possible true history that would explain a given gene tree. In practice, this problem boils down to finding a reconciliation map (also known as a DTL-scenario) between the event-labeled gene trees and a (possibly unknown) species tree. RESULTS In this contribution, we first characterize whether there is a valid reconciliation map for binary event-labeled gene trees T that contain speciation, duplication and horizontal gene transfer events and some unknown species tree S in terms of "informative" triples that are displayed in T and provide information on the topology of S. These informative triples are used to infer the unknown species tree S for T. We obtain a similar result for non-binary gene trees. To this end, however, the reconciliation map needs to be further restricted. We provide a polynomial-time algorithm to decide whether there is a species tree for a given event-labeled gene tree, and in the positive case, to construct the species tree and the respective (restricted) reconciliation map. However, informative triples as well as DTL-scenarios have their limitations when they are used to explain the biological feasibility of gene trees. While reconciliation maps imply biological feasibility, we show that the converse is not true in general. Moreover, we show that informative triples provide enough information to characterize neither "relaxed" DTL-scenarios nor non-restricted reconciliation maps for non-binary biologically feasible gene trees.
Affiliation(s)
- Marc Hellmuth
- Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Strasse 47, 17487 Greifswald, Germany
- Center for Bioinformatics, Saarland University, Building E 2.1, P.O. Box 151150, 66041 Saarbrücken, Germany
45
Positive diversifying selection is a pervasive adaptive force throughout the Drosophila radiation. Mol Phylogenet Evol 2017; 112:230-243. [DOI: 10.1016/j.ympev.2017.04.023] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 11/25/2016] [Revised: 04/26/2017] [Accepted: 04/26/2017] [Indexed: 01/02/2023]
46
Battenberg K, Lee EK, Chiu JC, Berry AM, Potter D. OrthoReD: a rapid and accurate orthology prediction tool with low computational requirement. BMC Bioinformatics 2017. [PMID: 28633662 PMCID: PMC5479036 DOI: 10.1186/s12859-017-1726-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
Background Identifying orthologous genes is an initial step required for phylogenetics, and it is also a common strategy employed in functional genetics to find candidates for functionally equivalent genes across multiple species. At the same time, in silico orthology prediction tools often require large computational resources only available on computing clusters. Here we present OrthoReD, an open-source orthology prediction tool with accuracy comparable to published tools that requires only a desktop computer. The low computational resource requirement of OrthoReD is achieved by repeating orthology searches on one gene of interest at a time, thereby generating a reduced dataset to limit the scope of orthology search for each gene of interest. Results The output of OrthoReD was highly similar to the outputs of two other published orthology prediction tools, OrthologID and/or OrthoDB, for the three datasets tested, which represented three phyla with different ranges of species diversity and different numbers of genomes included. Median CPU time for ortholog prediction per gene by OrthoReD executed on a desktop computer was <15 min even for the largest dataset tested, which included all coding sequences of 100 bacterial species. Conclusions With high-throughput sequencing, unprecedented numbers of genes from non-model organisms are available with an increasing need for clear information about their orthologies and/or functional equivalents in model organisms. OrthoReD is not only fast and accurate as an orthology prediction tool, but also gives researchers flexibility in the number of genes analyzed at a time, without requiring a high-performance computing cluster.
Affiliation(s)
- Kai Battenberg
- Department of Plant Sciences, University of California, Davis, CA, USA.
- Ernest K Lee
- Department of Entomology and Nematology, University of California, Davis, CA, USA
- Joanna C Chiu
- Department of Entomology and Nematology, University of California, Davis, CA, USA
- Alison M Berry
- Department of Plant Sciences, University of California, Davis, CA, USA
- Daniel Potter
- Department of Plant Sciences, University of California, Davis, CA, USA
47
Doerr D, Kowada LAB, Araujo E, Deshpande S, Dantas S, Moret BME, Stoye J. New Genome Similarity Measures based on Conserved Gene Adjacencies. J Comput Biol 2017; 24:616-634. [PMID: 28590847 DOI: 10.1089/cmb.2017.0065] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
Many important questions in molecular biology, evolution, and biomedicine can be addressed by comparative genomic approaches. One of the basic tasks when comparing genomes is the definition of measures of similarity (or dissimilarity) between two genomes, for example, to elucidate the phylogenetic relationships between species. The power of different genome comparison methods varies with the underlying formal model of a genome. The simplest models impose the strong restriction that each genome under study must contain the same genes, each in exactly one copy. More realistic models allow several copies of a gene in a genome. One speaks of gene families, and comparative genomic methods that allow this kind of input are called gene family-based. The most powerful-but also most complex-models avoid this preprocessing of the input data and instead integrate the family assignment within the comparative analysis. Such methods are called gene family-free. In this article, we study an intermediate approach between family-based and family-free genomic similarity measures. Introducing this simpler model, called gene connections, we focus on the combinatorial aspects of gene family-free genome comparison. While in most cases, the computational costs to the general family-free case are the same, we also find an instance where the gene connections model has lower complexity. Within the gene connections model, we define three variants of genomic similarity measures that have different expression powers. We give polynomial-time algorithms for two of them, while we show NP-hardness for the third, most powerful one. We also generalize the measures and algorithms to make them more robust against recent local disruptions in gene order. Our theoretical findings are supported by experimental results, proving the applicability and performance of our newly defined similarity measures.
Affiliation(s)
- Daniel Doerr
- École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Eloi Araujo
- Universidade Federal de Mato Grosso do Sul, Campo Grande, Brazil
- Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Shachi Deshpande
- École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Department of Computer Science and Engineering, IIT Bombay, Mumbai, India
- Jens Stoye
- Universidade Federal Fluminense, Niterói, Brazil
- Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
48
Doerr D, Balaban M, Feijão P, Chauve C. The gene family-free median of three. Algorithms Mol Biol 2017; 12:14. [PMID: 28559921 PMCID: PMC5446766 DOI: 10.1186/s13015-017-0106-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/06/2017] [Accepted: 05/18/2017] [Indexed: 11/20/2022]
Abstract
Background The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes. Methods We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. We study its computational complexity and we describe an integer linear program (ILP) for its exact solution. We further discuss a related problem called family-free adjacencies for k genomes for the special case of $k \le 3$ and present an ILP for its solution. However, for this problem, the computation of exact solutions remains intractable for sufficiently large instances. We then proceed to describe a heuristic method, FFAdj-AM, which performs well in practice. Results The developed methods compute accurate positional orthologs for genomes comparable in size to bacterial genomes on simulated data and genomic data acquired from the OMA orthology database. In particular, FFAdj-AM performs equally or better when compared to the well-established gene family prediction tool MultiMSOAR. Conclusions We study the computational complexity of a new family-free model and present algorithms for its solution. With FFAdj-AM, we propose an appealing alternative to established tools for identifying higher confidence positional orthologs.
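The objective above generalizes the classical breakpoint distance. For reference, a minimal Python sketch of that classical, family-based distance follows. It assumes a single linear chromosome in which every gene occurs exactly once in both genomes, encodes genes as signed nonzero integers, ignores telomeres, and counts the adjacencies of the first genome that are missing from the second (one common convention). It does not implement the family-free, similarity-weighted score or FFAdj-AM; the function names are illustrative.

```python
def adjacencies(genome: list[int]) -> set[frozenset]:
    """Adjacencies of a signed linear chromosome such as [1, -3, 2].

    Gene g has a tail extremity "gt" and a head extremity "gh"; +g is read
    tail->head and -g head->tail. Each adjacency joins the right extremity
    of one gene to the left extremity of the next, stored unordered so that
    reading the chromosome backwards yields the same set.
    """
    def left(g: int) -> str:
        return f"{abs(g)}{'t' if g > 0 else 'h'}"

    def right(g: int) -> str:
        return f"{abs(g)}{'h' if g > 0 else 't'}"

    return {frozenset((right(a), left(b))) for a, b in zip(genome, genome[1:])}


def breakpoint_distance(a: list[int], b: list[int]) -> int:
    """Number of adjacencies of genome `a` that are absent from genome `b`."""
    return len(adjacencies(a) - adjacencies(b))
```

For example, inverting the middle gene of [1, 2, 3] breaks both of its adjacencies, so `breakpoint_distance([1, 2, 3], [1, -2, 3])` is 2, while reading the whole chromosome backwards, as in [-3, -2, -1], leaves the distance at 0.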
49
Leimbach A, Poehlein A, Vollmers J, Görlich D, Daniel R, Dobrindt U. No evidence for a bovine mastitis Escherichia coli pathotype. BMC Genomics 2017; 18:359. [PMID: 28482799 PMCID: PMC5422975 DOI: 10.1186/s12864-017-3739-x] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 12/27/2016] [Accepted: 04/27/2017] [Indexed: 11/30/2022]
Abstract
Background Escherichia coli bovine mastitis is a disease of significant economic importance in the dairy industry. Molecular characterization of mastitis-associated E. coli (MAEC) did not result in the identification of common traits. Nevertheless, a mammary pathogenic E. coli (MPEC) pathotype has been proposed suggesting virulence traits that differentiate MAEC from commensal E. coli. The present study was designed to investigate the MPEC pathotype hypothesis by comparing the genomes of MAEC and commensal bovine E. coli. Results We sequenced the genomes of eight E. coli isolated from bovine mastitis cases and six fecal commensal isolates from udder-healthy cows. We analyzed the phylogenetic history of bovine E. coli genomes by supplementing this strain panel with eleven bovine-associated E. coli from public databases. The majority of the isolates originate from phylogroups A and B1, but neither MAEC nor commensal strains could be unambiguously distinguished by phylogenetic lineage. The gene content of both MAEC and commensal strains is highly diverse and dominated by their phylogenetic background. Although individual strains carry some typical E. coli virulence-associated genes, no traits important for pathogenicity could be specifically attributed to MAEC. Instead, both commensal strains and MAEC have very few gene families enriched in either pathotype. Only the aerobactin siderophore gene cluster was enriched in commensal E. coli within our strain panel. Conclusions This is the first characterization of a phylogenetically diverse strain panel including several MAEC and commensal isolates. With our comparative genomics approach we could not confirm previous studies that argue for a positive selection of specific traits enabling MAEC to elicit bovine mastitis. Instead, MAEC are facultative and opportunistic pathogens recruited from the highly diverse bovine gastrointestinal microbiota. 
Virulence-associated genes implicated in mastitis are a by-product of commensalism with the primary function to enhance fitness in the bovine gastrointestinal tract. Therefore, we put the definition of the MPEC pathotype into question and suggest designating corresponding isolates as MAEC. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3739-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Andreas Leimbach
- Institute of Hygiene, University of Münster, Mendelstrasse 7, 48149 Münster, Germany; Department of Genomic and Applied Microbiology, Göttingen Genomics Laboratory, Institute of Microbiology and Genetics, Georg-August-University of Göttingen, Göttingen, Germany; Institute for Molecular Infection Biology, Julius-Maximilians-University of Würzburg, Würzburg, Germany
- Anja Poehlein
- Department of Genomic and Applied Microbiology, Göttingen Genomics Laboratory, Institute of Microbiology and Genetics, Georg-August-University of Göttingen, Göttingen, Germany
- John Vollmers
- Leibniz Institute DSMZ, German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
- Dennis Görlich
- Institute of Biostatistics and Clinical Research, University of Münster, Münster, Germany
- Rolf Daniel
- Department of Genomic and Applied Microbiology, Göttingen Genomics Laboratory, Institute of Microbiology and Genetics, Georg-August-University of Göttingen, Göttingen, Germany
- Ulrich Dobrindt
- Institute of Hygiene, University of Münster, Mendelstrasse 7, 48149 Münster, Germany; Institute for Molecular Infection Biology, Julius-Maximilians-University of Würzburg, Würzburg, Germany
50
Yue JX, Li J, Aigrain L, Hallin J, Persson K, Oliver K, Bergström A, Coupland P, Warringer J, Lagomarsino MC, Fischer G, Durbin R, Liti G. Contrasting evolutionary genome dynamics between domesticated and wild yeasts. Nat Genet 2017; 49:913-924. [PMID: 28416820 PMCID: PMC5446901 DOI: 10.1038/ng.3847] [Citation(s) in RCA: 208] [Impact Index Per Article: 29.7] [Received: 10/20/2016] [Accepted: 03/22/2017] [Indexed: 12/13/2022]
Abstract
Structural rearrangements have long been recognized as an important source of genetic variation, with implications in phenotypic diversity and disease, yet their detailed evolutionary dynamics remain elusive. Here we use long-read sequencing to generate end-to-end genome assemblies for 12 strains representing major subpopulations of the partially domesticated yeast Saccharomyces cerevisiae and its wild relative Saccharomyces paradoxus. These population-level high-quality genomes with comprehensive annotation enable precise definition of chromosomal boundaries between cores and subtelomeres and a high-resolution view of evolutionary genome dynamics. In chromosomal cores, S. paradoxus shows faster accumulation of balanced rearrangements (inversions, reciprocal translocations and transpositions), whereas S. cerevisiae accumulates unbalanced rearrangements (novel insertions, deletions and duplications) more rapidly. In subtelomeres, both species show extensive interchromosomal reshuffling, with a higher tempo in S. cerevisiae. Such striking contrasts between wild and domesticated yeasts are likely to reflect the influence of human activities on structural genome evolution.
Affiliation(s)
- Jia-Xing Yue
- Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France
- Jing Li
- Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France
- Johan Hallin
- Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France
- Karl Persson
- Department of Chemistry and Molecular Biology, Gothenburg University, Gothenburg, Sweden
- Jonas Warringer
- Department of Chemistry and Molecular Biology, Gothenburg University, Gothenburg, Sweden
- Marco Cosentino Lagomarsino
- Laboratory of Computational and Quantitative Biology, Institut de Biologie Paris-Seine, UPMC University Paris 06, Sorbonne Universités, CNRS, Paris, France
- Gilles Fischer
- Laboratory of Computational and Quantitative Biology, Institut de Biologie Paris-Seine, UPMC University Paris 06, Sorbonne Universités, CNRS, Paris, France
- Gianni Liti
- Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France