1. Parra-Rincón E, Velandia-Huerto CA, Gittenberger A, Fallmann J, Gatter T, Brown FD, Stadler PF, Bermúdez-Santana CI. The Genome of the "Sea Vomit" Didemnum vexillum. Life (Basel) 2021; 11:life11121377. [PMID: 34947908 PMCID: PMC8704543 DOI: 10.3390/life11121377] [Received: 09/03/2021] [Revised: 12/02/2021] [Accepted: 12/03/2021] [Indexed: 11/25/2022]
Abstract
Tunicates are the sister group of vertebrates and thus occupy a key position for investigations into vertebrate innovations as well as into the consequences of the vertebrate-specific genome duplications. Nevertheless, tunicate genomes have not been studied extensively in the past, and comparative studies of tunicate genomes have remained scarce. The carpet sea squirt Didemnum vexillum, commonly known as “sea vomit”, is a colonial tunicate considered an invasive species with substantial ecological and economic risk. We report the assembly of the D. vexillum genome using a hybrid approach that combines 28.5 Gb of Illumina and 12.35 Gb of PacBio data. The new hybrid scaffolded assembly has a total size of 517.55 Mb and increases contig length about eightfold compared to the previous, Illumina-only assembly. As a consequence of an unusually high genetic diversity of the colonies and the moderate length of the PacBio reads, presumably caused by the unusually acidic milieu of the tunic, the assembly is highly fragmented (L50 = 25,284, N50 = 6539). It is sufficient, however, for comprehensive annotations of both protein-coding genes and non-coding RNAs. Despite its shortcomings, the draft assembly of the “sea vomit” genome provides a valuable resource for comparative tunicate genomics and for the study of the specific properties of colonial ascidians.
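The contiguity statistics quoted in this abstract follow the usual definitions: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly, and L50 is the number of contigs in that set. The following is a minimal Python sketch of that calculation. The function name `n50_l50` and the toy contig lengths are illustrative assumptions and are not taken from the paper.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)   # longest contigs first
    half_total = sum(ordered) / 2             # half of the assembly size
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is the shortest contig needed to reach half the assembly,
            # 'count' is how many contigs were needed to get there.
            return length, count
    raise ValueError("no contig lengths given")


# Toy example with made-up contig lengths (not data from the paper).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50}, L50 = {l50}")
```

A large L50 combined with a small N50, as reported above, indicates that many short contigs are needed to cover half of the assembly.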
Affiliation(s)
- Ernesto Parra-Rincón
  - Biology Department, Universidad Nacional de Colombia, Carrera 45 # 26-85, Edif. Uriel Gutiérrez, Bogotá D.C 111321, Colombia
- Cristian A. Velandia-Huerto
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, 04107 Leipzig, Germany
  - Correspondence: (C.A.V.-H.); (C.I.B.-S.)
- Adriaan Gittenberger
  - GiMaRIS, Rijksstraatweg 75, 2171 AK Sassenheim, The Netherlands
  - Institute of Biology, Leiden University, P.O. Box 9505, 2300 RA Leiden, The Netherlands
  - Naturalis Biodiversity Center, Darwinweg 2, 2333 CR Leiden, The Netherlands
- Jörg Fallmann
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, 04107 Leipzig, Germany
- Thomas Gatter
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, 04107 Leipzig, Germany
- Federico D. Brown
  - Departamento de Zoologia, Instituto Biociências, Universidade de São Paulo, Rua do Matão, Tr. 14 no. 101, São Paulo 05508-090, Brazil
  - Centro de Biologia Marinha, Universidade de São Paulo, Rod. Manuel Hypólito do Rego km. 131.5, São Sebastião 11612-109, Brazil
- Peter F. Stadler
  - Biology Department, Universidad Nacional de Colombia, Carrera 45 # 26-85, Edif. Uriel Gutiérrez, Bogotá D.C 111321, Colombia
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, 04107 Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany
  - Institute for Theoretical Chemistry, University of Vienna, 1090 Vienna, Austria
  - Santa Fe Institute, Santa Fe, NM 87506, USA
- Clara I. Bermúdez-Santana
  - Biology Department, Universidad Nacional de Colombia, Carrera 45 # 26-85, Edif. Uriel Gutiérrez, Bogotá D.C 111321, Colombia
  - Correspondence: (C.A.V.-H.); (C.I.B.-S.)
2. Evolution and Phylogeny of MicroRNAs - Protocols, Pitfalls, and Problems. Methods Mol Biol 2021. [PMID: 34432281 DOI: 10.1007/978-1-0716-1170-8_11] [Indexed: 09/17/2023]
Abstract
MicroRNAs are important regulators in many eukaryotic lineages. Typical miRNAs have a length of about 22 nt and are processed from precursors that form a characteristic hairpin structure. Once they appear in a genome, miRNAs are among the best-conserved elements in both animal and plant genomes. Functionally, they play an important role, particularly in development. In contrast to protein-coding genes, miRNAs frequently emerge de novo. The genomes of animals and plants harbor hundreds of mutually unrelated families of homologous miRNAs that tend to be persistent throughout evolution. The evolution of the genomic miRNA complement closely correlates with important morphological innovations. In addition, miRNAs have been used as valuable characters in phylogenetic studies. An accurate and comprehensive annotation of miRNAs is required as a basis for understanding their impact on phenotypic evolution. Since experimental data on miRNA expression are limited to relatively few species and are subject to unavoidable ascertainment biases, miRNA sequencing must be complemented by homology-based annotation methods. This chapter reviews the state-of-the-art workflows for homology-based miRNA annotation, with an emphasis on their limitations and open problems.
3. Velandia-Huerto CA, Fallmann J, Stadler PF. miRNAture-Computational Detection of microRNA Candidates. Genes (Basel) 2021; 12:348. [PMID: 33673400 PMCID: PMC7996739 DOI: 10.3390/genes12030348] [Received: 02/01/2021] [Revised: 02/19/2021] [Accepted: 02/20/2021] [Indexed: 12/16/2022]
Abstract
Homology-based annotation of short RNAs, including microRNAs, is a difficult problem because their inherently small size limits the available information. Highly sensitive methods, including parameter-optimized blast, nhmmer, or cmsearch runs designed to increase sensitivity, inevitably lead to large numbers of false positives, which can be detected only by detailed analysis of specific features typical for an RNA family and/or the analysis of conservation patterns in structure-annotated multiple sequence alignments. The miRNAture pipeline implements a workflow specific to animal microRNAs that automates homology search and validation steps. The miRNAture pipeline yields very good results for a large number of "typical" miRBase families. However, it also highlights difficulties with atypical cases, in particular microRNAs deriving from repetitive elements and microRNAs with unusual, branched precursor structures and atypical locations of the mature product, which require specific curation by domain experts.
Affiliation(s)
- Cristian A. Velandia-Huerto
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, D-04107 Leipzig, Germany
- Jörg Fallmann
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, D-04107 Leipzig, Germany
- Peter F. Stadler
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, D-04107 Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany
  - Institute for Theoretical Chemistry, University of Vienna, A-1090 Wien, Austria
  - Facultad de Ciencias, Universidad Nacional de Colombia, CO-111321 Bogotá, Colombia
  - Santa Fe Institute, Santa Fe, NM 87501, USA
4. Balogh G, Bernhart SH, Stadler PF, Schor J. A probabilistic version of Sankoff's maximum parsimony algorithm. J Bioinform Comput Biol 2020; 18:2050004. [PMID: 32336248 DOI: 10.1142/s0219720020500043] [Indexed: 12/30/2022]
Abstract
The number of genes belonging to a multi-gene family usually varies substantially over their evolutionary history as a consequence of gene duplications and losses. A first step toward analyzing these histories in detail is the inference of the changes in copy number that take place along the individual edges of the underlying phylogenetic tree. The corresponding maximum parsimony approach minimizes the total number of changes along the edges of the species tree. Incorrectly determined numbers of family members, however, may influence the estimates drastically. We therefore augment the analysis by introducing a probabilistic model that also considers suboptimal assignments of changes. Technically, this amounts to a partition function variant of Sankoff's parsimony algorithm. As a showcase application, we reanalyze the gain and loss patterns of metazoan microRNA families. As expected, the differences between the probabilistic and the parsimony method are moderate; in the limit T→0, i.e., with very little tolerance for deviations from parsimony, the total number of reconstructed changes is the same. However, we find that the partition function approach systematically predicts fewer gains and more loss events, showing that the data admit co-optimal solutions among which the parsimony approach selects biased representatives.
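To illustrate the idea in this abstract, the sketch below contrasts a Sankoff-style minimum-cost recursion with a partition function version of the same recursion on a toy tree. It is not the authors' implementation. It assumes that an edge costs the absolute difference in copy number between parent and child, that each assignment is weighted by exp(-cost/T), and that all root states are equally likely a priori. The tree encoding, the function names, and the copy numbers are hypothetical.

```python
import math

# Hypothetical toy tree: internal nodes map to a list of children,
# leaves map to an observed copy number (made-up values).
TOY_TREE = {"root": ["x", "C"], "x": ["A", "B"], "A": 2, "B": 2, "C": 0}


def min_costs(tree, node, max_copies):
    """Sankoff recursion: entry a is the minimal number of changes in the
    subtree below 'node' if 'node' carries a copies."""
    entry = tree[node]
    states = range(max_copies + 1)
    if isinstance(entry, int):  # leaf: only the observed state is allowed
        return [0.0 if a == entry else math.inf for a in states]
    costs = [0.0 for _ in states]
    for child in entry:
        child_costs = min_costs(tree, child, max_copies)
        for a in states:
            # Assumed edge cost: |a - b| gains or losses along the edge.
            costs[a] += min(abs(a - b) + child_costs[b] for b in states)
    return costs


def partition_sums(tree, node, max_copies, temperature):
    """Same recursion with (min, +) replaced by (+, *): entry a sums the
    weights exp(-cost / T) of all assignments below 'node'."""
    entry = tree[node]
    states = range(max_copies + 1)
    if isinstance(entry, int):
        return [1.0 if a == entry else 0.0 for a in states]
    sums = [1.0 for _ in states]
    for child in entry:
        child_sums = partition_sums(tree, child, max_copies, temperature)
        for a in states:
            sums[a] *= sum(
                math.exp(-abs(a - b) / temperature) * child_sums[b]
                for b in states
            )
    return sums


def free_energy(tree, root, max_copies, temperature):
    """-T * log(Z); approaches the parsimony score as T goes to 0."""
    z = sum(partition_sums(tree, root, max_copies, temperature))
    return -temperature * math.log(z)


print("parsimony score:", min(min_costs(TOY_TREE, "root", 2)))
print("free energy at T = 0.05:", free_energy(TOY_TREE, "root", 2, 0.05))
```

Under these assumptions, the free energy at small T approaches the parsimony score, while larger T gives suboptimal assignments a noticeable share of the total weight.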
Affiliation(s)
- Gábor Balogh
  - Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University Leipzig, Härtelstrasse 16-18, D-04107 Leipzig, Germany
- Stephan H Bernhart
  - Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University Leipzig, Härtelstrasse 16-18, D-04107 Leipzig, Germany
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Center for Scalable Data Services and Solutions, Leipzig Research Center for Civilization Diseases (LIFE), University Leipzig, Härtelstrasse 16-18, D-04107 Leipzig, Germany
  - Max-Planck-Institute for Mathematics in the Sciences, Inselstraße 22, D-04109 Leipzig, Germany
  - Department of Theoretical Chemistry of the University of Vienna, Währingerstrasse 17, A-1090 Vienna, Austria
  - Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Ciudad Universitaria, COL-111321, Bogotá, D.C., Colombia
  - Santa Fe Institute, 1399 Hyde Park Road, Santa Fe NM 87501, USA
- Jana Schor
  - Young Investigators Group Bioinformatics and Transcriptomics, Department of Molecular Systems Biology, Helmholtz Centre for Environmental Research - UFZ, Permoserstraße 15, D-04318 Leipzig, Germany