Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Cited by Other Article(s)

Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024;23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024]

Tang T, Liu Y, Zheng B, Li R, Zhang X, Liu Y. Integration of hybrid and self-correction method improves the quality of long-read sequencing data. Brief Funct Genomics 2024;23:249-255. [PMID: 37340778 DOI: 10.1093/bfgp/elad026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 06/04/2023] [Accepted: 06/05/2023] [Indexed: 06/22/2023]

Grybchuk D, Galan A, Klocek D, Macedo DH, Wolf YI, Votýpka J, Butenko A, Lukeš J, Neri U, Záhonová K, Kostygov AY, Koonin EV, Yurchenko V. Identification of diverse RNA viruses in Obscuromonas flagellates (Euglenozoa: Trypanosomatidae: Blastocrithidiinae). Virus Evol 2024;10:veae037. [PMID: 38774311 PMCID: PMC11108086 DOI: 10.1093/ve/veae037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 04/03/2024] [Accepted: 04/29/2024] [Indexed: 05/24/2024]

Affiliation(s)

Danyil Grybchuk Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava 710 00, Czechia Central European Institute of Technology, Masaryk University, Brno 625 00, Czechia
Arnau Galan Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava 710 00, Czechia
Donnamae Klocek Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava 710 00, Czechia
Diego H Macedo Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava 710 00, Czechia
Yuri I Wolf National Center for Biotechnology Information, NLM, National Institutes of Health, Bethesda 20894, USA
Jan Votýpka Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice 370 05, Czechia Department of Parasitology, Faculty of Science, Charles University, Prague 128 00, Czechia
Anzhelika Butenko Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava 710 00, Czechia Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice 370 05, Czechia Faculty of Science, University of South Bohemia, České Budějovice 370 05, Czechia
Julius Lukeš Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice 370 05, Czechia Faculty of Science, University of South Bohemia, České Budějovice 370 05, Czechia
Uri Neri The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Tel Aviv 39040, Israel
Kristína Záhonová Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava 710 00, Czechia Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice 370 05, Czechia Department of Parasitology, Faculty of Science, Charles University, BIOCEV, Vestec 252 50, Czechia Division of Infectious Diseases, Department of Medicine, University of Alberta, Edmonton, Alberta T6G 2G3, Canada
Alexei Yu Kostygov Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava 710 00, Czechia Zoological Institute of the Russian Academy of Sciences, St. Petersburg 199034, Russia
Eugene V Koonin National Center for Biotechnology Information, NLM, National Institutes of Health, Bethesda 20894, USA
Vyacheslav Yurchenko Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava 710 00, Czechia

Eitel M, Osigus H, Brenzinger B, Wörheide G. Beauty in the beast - Placozoan biodiversity explored through molluscan predator genomics. Ecol Evol 2024;14:e11220. [PMID: 38606341 PMCID: PMC11007570 DOI: 10.1002/ece3.11220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 03/17/2024] [Accepted: 03/20/2024] [Indexed: 04/13/2024]

Sami A, El-Metwally S, Rashad MZ. MAC-ErrorReads: machine learning-assisted classifier for filtering erroneous NGS reads. BMC Bioinformatics 2024;25:61. [PMID: 38321434 PMCID: PMC10848413 DOI: 10.1186/s12859-024-05681-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 01/29/2024] [Indexed: 02/08/2024]

Abstract

BACKGROUND

The rapid advancement of next-generation sequencing (NGS) machines in terms of speed and affordability has led to the generation of a massive amount of biological data at the expense of data quality, as errors become more prevalent. This introduces the need for different approaches to detect and filter errors, and data quality assurance moves from the hardware space to the software preprocessing stages.

RESULTS

We introduce MAC-ErrorReads, a novel Machine learning-Assisted Classifier designed for filtering Erroneous NGS Reads. MAC-ErrorReads transforms the erroneous NGS read filtration process into a robust binary classification task, employing five supervised machine learning algorithms. These models are trained on features extracted through the computation of Term Frequency-Inverse Document Frequency (TF_IDF) values from various datasets such as E. coli, GAGE S. aureus, H. Chr14, Arabidopsis thaliana Chr1 and Metriaclima zebra. Notably, Naive Bayes demonstrated robust performance across various datasets, displaying high accuracy, precision, recall, F1-score, MCC, and ROC values. The MAC-ErrorReads NB model accurately classified S. aureus reads, surpassing most error correction tools with a 38.69% alignment rate. For H. Chr14, tools like Lighter, Karect, CARE, Pollux, and MAC-ErrorReads showed rates above 99%. BFC and RECKONER exceeded 98%, while Fiona had 95.78%. For the Arabidopsis thaliana Chr1, Pollux, Karect, RECKONER, and MAC-ErrorReads demonstrated good alignment rates of 92.62%, 91.80%, 91.78%, and 90.87%, respectively. For the Metriaclima zebra, Pollux achieved a high alignment rate of 91.23%, despite having the lowest number of mapped reads. MAC-ErrorReads, Karect, and RECKONER demonstrated good alignment rates of 83.76%, 83.71%, and 83.67%, respectively, while also producing reasonable numbers of mapped reads to the reference genome.
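The TF-IDF feature extraction mentioned above can be illustrated with a minimal sketch. The abstract does not say how reads are tokenized, so treating overlapping k-mers as "words" and each read as a "document" is an assumption here, as are the function names `kmers` and `tfidf`, the value `k=3`, and the toy reads. This is not the authors' implementation; a supervised classifier such as Naive Bayes would be trained on vectors of this kind.

```python
import math
from collections import Counter

def kmers(read, k):
    """Return the overlapping k-mers of a read (assumed tokenization)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def tfidf(reads, k=3):
    """Compute one {k-mer: TF-IDF weight} dict per read.

    TF  = count of the k-mer in the read / number of k-mers in the read
    IDF = log(N / df), where df is the number of reads containing the k-mer
    """
    docs = [Counter(kmers(r, k)) for r in reads]
    df = Counter()
    for doc in docs:
        df.update(doc.keys())  # each read counts once per distinct k-mer
    n = len(docs)
    features = []
    for doc in docs:
        total = sum(doc.values())
        features.append(
            {km: (c / total) * math.log(n / df[km]) for km, c in doc.items()}
        )
    return features

# Hypothetical toy reads, for illustration only.
reads = ["ACGTACGT", "ACGTACGA", "ACGTTCGT"]
features = tfidf(reads, k=3)
```

With this unsmoothed IDF, a k-mer present in every read gets weight zero, while a k-mer unique to one read (for example, one created by a sequencing error) gets a positive weight. That is the intuition for why such features might help separate erroneous reads from correct ones.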

CONCLUSIONS

This study demonstrates that machine learning approaches for filtering NGS reads effectively identify and retain the most accurate reads, significantly enhancing assembly quality and genomic coverage. The integration of genomics and artificial intelligence through machine learning algorithms holds promise for enhancing NGS data quality, advancing downstream data analysis accuracy, and opening new opportunities in genetics, genomics, and personalized medicine research.

Lee WK, Chan BKK, Kim JY, Ju SJ, Kim SJ. Comparative genomics reveals the dynamic evolutionary history of cement protein genes of barnacles from intertidal to deep-sea hydrothermal vents. Mol Ecol Resour 2024;24:e13895. [PMID: 37955198 DOI: 10.1111/1755-0998.13895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2022] [Revised: 10/16/2023] [Accepted: 10/30/2023] [Indexed: 11/14/2023]

Dimens PV, Jones KL, Margulies D, Scholey V, Cusatti S, McPeak B, Hildahl TE, Saillant EAE. Genomic resources for the Yellowfin tuna Thunnus albacares. Mol Biol Rep 2024;51:232. [PMID: 38281308 DOI: 10.1007/s11033-023-09117-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Accepted: 12/06/2023] [Indexed: 01/30/2024]

Długosz M, Deorowicz S. Illumina reads correction: evaluation and improvements. Sci Rep 2024;14:2232. [PMID: 38278837 DOI: 10.1038/s41598-024-52386-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 01/18/2024] [Indexed: 01/28/2024]

Albanaz ATS, Carrington M, Frolov AO, Ganyukova AI, Gerasimov ES, Kostygov AY, Lukeš J, Malysheva MN, Votýpka J, Zakharova A, Záhonová K, Zimmer SL, Yurchenko V, Butenko A. Shining the spotlight on the neglected: new high-quality genome assemblies as a gateway to understanding the evolution of Trypanosomatidae. BMC Genomics 2023;24:471. [PMID: 37605127 PMCID: PMC10441713 DOI: 10.1186/s12864-023-09591-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/08/2023] [Accepted: 08/15/2023] [Indexed: 08/23/2023]

Abstract

BACKGROUND

Protists of the family Trypanosomatidae (phylum Euglenozoa) have gained notoriety as parasites affecting humans, domestic animals, and agricultural plants. However, the true extent of the group's diversity spreads far beyond the medically and veterinary relevant species. We address several knowledge gaps in trypanosomatid research by undertaking sequencing, assembly, and analysis of genomes from previously overlooked representatives of this protistan group.

RESULTS

We assembled genomes for twenty-one trypanosomatid species, with a primary focus on insect parasites and Trypanosoma spp. parasitizing non-human hosts. The assemblies exhibit sizes consistent with previously sequenced trypanosomatid genomes, ranging from approximately 18 Mb for Obscuromonas modryi to 35 Mb for Crithidia brevicula and Zelonia costaricensis. Despite being the smallest, the genome of O. modryi has the highest content of repetitive elements, contributing nearly half of its total size. Conversely, the highest proportion of unique DNA is found in the genomes of Wallacemonas spp., with repeats accounting for less than 8% of the assembly length. The majority of examined species exhibit varying degrees of aneuploidy, with trisomy being the most frequently observed condition after disomy.

CONCLUSIONS

The genome of Obscuromonas modryi represents a very unusual, if not unique, example of evolution driven by two antidromous forces: i) increasing dependence on the host leading to genomic shrinkage and ii) expansion of repeats causing genome enlargement. The observed variation in somy within and between trypanosomatid genera suggests that these flagellates are largely predisposed to aneuploidy and, apparently, exploit it to gain a fitness advantage. High heterogeneity in the genome size, repeat content, and variation in chromosome copy numbers in the newly-sequenced species highlight the remarkable genome plasticity exhibited by trypanosomatid flagellates. These new genome assemblies are a robust foundation for future research on the genetic basis of life cycle changes and adaptation to different hosts in the family Trypanosomatidae.

Affiliation(s)

Amanda T S Albanaz Life Science Research Centre, Faculty of Science, University of Ostrava, 710 00, Ostrava, Czech Republic
Mark Carrington Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QW, UK
Alexander O Frolov Zoological Institute of the Russian Academy of Sciences, 199034, St. Petersburg, Russia
Anna I Ganyukova Zoological Institute of the Russian Academy of Sciences, 199034, St. Petersburg, Russia
Evgeny S Gerasimov Faculty of Biology, M. V. Lomonosov Moscow State University, 119991, Moscow, Russia Martsinovsky Institute of Medical Parasitology, Sechenov University, 119435, Moscow, Russia
Alexei Y Kostygov Life Science Research Centre, Faculty of Science, University of Ostrava, 710 00, Ostrava, Czech Republic
Julius Lukeš Institute of Parasitology, Czech Academy of Sciences, 370 05, České Budějovice, Czech Republic Faculty of Sciences, University of South Bohemia, 370 05, České Budějovice, Czech Republic
Marina N Malysheva Zoological Institute of the Russian Academy of Sciences, 199034, St. Petersburg, Russia
Jan Votýpka Institute of Parasitology, Czech Academy of Sciences, 370 05, České Budějovice, Czech Republic Department of Parasitology, Faculty of Science, Charles University, 128 44, Prague, Czech Republic
Alexandra Zakharova Life Science Research Centre, Faculty of Science, University of Ostrava, 710 00, Ostrava, Czech Republic
Kristína Záhonová Life Science Research Centre, Faculty of Science, University of Ostrava, 710 00, Ostrava, Czech Republic Institute of Parasitology, Czech Academy of Sciences, 370 05, České Budějovice, Czech Republic Department of Parasitology, Faculty of Science, Charles University, BIOCEV, 252 50, Vestec, Czech Republic Division of Infectious Diseases, Department of Medicine, University of Alberta, Edmonton, T6G 2G3, Canada
Sara L Zimmer Duluth Campus, University of Minnesota Medical School, Duluth, MN, 55812, USA
Vyacheslav Yurchenko Life Science Research Centre, Faculty of Science, University of Ostrava, 710 00, Ostrava, Czech Republic.
Anzhelika Butenko Life Science Research Centre, Faculty of Science, University of Ostrava, 710 00, Ostrava, Czech Republic. Institute of Parasitology, Czech Academy of Sciences, 370 05, České Budějovice, Czech Republic. Faculty of Sciences, University of South Bohemia, 370 05, České Budějovice, Czech Republic.

Francis WR, Eitel M, Vargas S, Garcia-Escudero CA, Conci N, Deister F, Mah JL, Guiglielmoni N, Krebs S, Blum H, Leys SP, Wörheide G. The genome of the reef-building glass sponge Aphrocallistes vastus provides insights into silica biomineralization. ROYAL SOCIETY OPEN SCIENCE 2023;10:230423. [PMID: 37351491 PMCID: PMC10282587 DOI: 10.1098/rsos.230423] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/04/2023] [Accepted: 05/26/2023] [Indexed: 06/24/2023]

Affiliation(s)

Warren R. Francis Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany
Michael Eitel Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany
Sergio Vargas Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany
Catalina A. Garcia-Escudero Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany
Nicola Conci Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany
Fabian Deister Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany
Jasmine L. Mah Department of Biological Sciences, University of Alberta, Edmonton, Canada T6G 2E9
Nadège Guiglielmoni Service Evolution Biologique et Ecologie, Université libre de Bruxelles (ULB), 1050 Brussels, Belgium
Stefan Krebs Laboratory for Functional Genome Analysis (LAFUGA), Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
Helmut Blum Laboratory for Functional Genome Analysis (LAFUGA), Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
Sally P. Leys Department of Biological Sciences, University of Alberta, Edmonton, Canada T6G 2E9
Gert Wörheide Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany GeoBio-Center, Ludwig-Maximilians-Universität München, Munich, Germany Staatliche Naturwissenschaftliche Sammlungen Bayerns (SNSB)–Bayerische Staatssammlung für Paläontologie und Geologie, Munich, Germany

Cai X, Lan T, Ping P, Oliver B, Li J. Intra-Host Co-Existing Strains of SARS-CoV-2 Reference Genome Uncovered by Exhaustive Computational Search. Viruses 2023;15:v15051065. [PMID: 37243151 DOI: 10.3390/v15051065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Revised: 04/24/2023] [Accepted: 04/24/2023] [Indexed: 05/28/2023]

Beres SB, Olsen RJ, Long SW, Eraso JM, Boukthir S, Faili A, Kayal S, Musser JM. Analysis of the Genomics and Mouse Virulence of an Emergent Clone of Streptococcus dysgalactiae Subspecies equisimilis. Microbiol Spectr 2023;11:e0455022. [PMID: 36971562 PMCID: PMC10100674 DOI: 10.1128/spectrum.04550-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Accepted: 03/04/2023] [Indexed: 03/29/2023]

Abstract

Streptococcus dysgalactiae subsp. equisimilis is a bacterial pathogen that is increasingly recognized as a cause of severe human infections. Much less is known about the genomics and infection pathogenesis of S. dysgalactiae subsp. equisimilis strains compared to the closely related bacterium Streptococcus pyogenes. To address these knowledge deficits, we sequenced to closure the genomes of seven S. dysgalactiae subsp. equisimilis human isolates, including six that were emm type stG62647. Recently, for unknown reasons, strains of this emm type have emerged and caused an increasing number of severe human infections in several countries. The genomes of these seven strains vary between 2.15 and 2.21 Mbp. The core chromosomes of these six S. dysgalactiae subsp. equisimilis stG62647 strains are closely related, differing on average by only 495 single-nucleotide polymorphisms, consistent with a recent descent from a common progenitor. The largest source of genetic diversity among these seven isolates is differences in putative mobile genetic elements, both chromosomal and extrachromosomal. Consistent with the epidemiological observations of increased frequency and severity of infections, both stG62647 strains studied were significantly more virulent than a strain of emm type stC74a in a mouse model of necrotizing myositis, as assessed by bacterial CFU burden, lesion size, and survival curves. Taken together, our genomic and pathogenesis data show the strains of emm type stG62647 we studied are closely genetically related and have enhanced virulence in a mouse model of severe invasive disease. Our findings underscore the need for expanded study of the genomics and molecular pathogenesis of S. dysgalactiae subsp. equisimilis strains causing human infections.

IMPORTANCE

Our studies addressed a critical knowledge gap in understanding the genomics and virulence of the bacterial pathogen Streptococcus dysgalactiae subsp. equisimilis. S. dysgalactiae subsp. equisimilis strains are responsible for a recent increase in severe human infections in some countries. We determined that certain S. dysgalactiae subsp. equisimilis strains are genetically descended from a common ancestor and that these strains can cause severe infections in a mouse model of necrotizing myositis. Our findings highlight the need for expanded studies on the genomics and pathogenic mechanisms of this understudied subspecies of the Streptococcus family.

Affiliation(s)

Stephen B. Beres Laboratory of Molecular and Translational Human Infectious Disease Research, Center for Infectious Diseases, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute and Houston Methodist Hospital, Houston, Texas, USA
Randall J. Olsen Laboratory of Molecular and Translational Human Infectious Disease Research, Center for Infectious Diseases, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute and Houston Methodist Hospital, Houston, Texas, USA Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York, USA Department of Microbiology and Immunology, Weill Cornell Medical College, New York, New York, USA
S. Wesley Long Laboratory of Molecular and Translational Human Infectious Disease Research, Center for Infectious Diseases, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute and Houston Methodist Hospital, Houston, Texas, USA Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York, USA Department of Microbiology and Immunology, Weill Cornell Medical College, New York, New York, USA
Jesus M. Eraso Laboratory of Molecular and Translational Human Infectious Disease Research, Center for Infectious Diseases, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute and Houston Methodist Hospital, Houston, Texas, USA Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York, USA
Sarrah Boukthir CHU de Rennes, Service de Bacteriologie-Hygiène Hospitalière, Rennes, France INSERM, CIC 1414, Rennes, France Université Rennes 1, Faculté de Médecine, Rennes, France
Ahmad Faili INSERM, CIC 1414, Rennes, France Université Rennes 1, Faculté de Pharmacie, Rennes, France Chemistry, Oncogenesis, Stress, and Signaling, INSERM 1242, Rennes, France
Samer Kayal CHU de Rennes, Service de Bacteriologie-Hygiène Hospitalière, Rennes, France INSERM, CIC 1414, Rennes, France Université Rennes 1, Faculté de Médecine, Rennes, France Chemistry, Oncogenesis, Stress, and Signaling, INSERM 1242, Rennes, France
James M. Musser Laboratory of Molecular and Translational Human Infectious Disease Research, Center for Infectious Diseases, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute and Houston Methodist Hospital, Houston, Texas, USA Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York, USA Department of Microbiology and Immunology, Weill Cornell Medical College, New York, New York, USA

Nesterenko M, Miroliubov A. From head to rootlet: comparative transcriptomic analysis of a rhizocephalan barnacle Peltogaster reticulata (Crustacea: Rhizocephala). F1000Res 2023;11:583. [PMID: 36447930 PMCID: PMC9664023 DOI: 10.12688/f1000research.110492.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/04/2023] [Indexed: 01/11/2023]

Abstract

Background: Rhizocephalan barnacles stand out in the diverse world of metazoan parasites. The body of a rhizocephalan female is modified beyond revealing any recognizable morphological features, consisting of the interna, a system of rootlets, and the externa, a sac-like reproductive body. Moreover, rhizocephalans have an outstanding ability to control their hosts, literally turning them into "zombies". Despite all these amazing traits, there are no genomic or transcriptomic data about any Rhizocephala. Methods: We collected transcriptomes from four body parts of an adult female rhizocephalan Peltogaster reticulata: the externa, and the main, growing, and thoracic parts of the interna. We used all prepared data for the de novo assembly of the reference transcriptome. Next, a set of encoded proteins was determined, the expression levels of protein-coding genes in different parts of the parasite's body were calculated and lists of enriched bioprocesses were identified. We also in silico identified and analyzed sets of potential excretory / secretory proteins. Finally, we applied phylostratigraphy and evolutionary transcriptomics approaches to our data. Results: The assembled reference transcriptome included transcripts of 12,620 protein-coding genes and was the first for any rhizocephalan. Based on the results obtained, the spatial heterogeneity of protein-coding gene expression in different regions of the adult female body of P. reticulata was established. The results of both transcriptomic analysis and histological studies indicated the presence of germ-like cells in the lumen of the interna. The potential molecular basis of the interaction between the nervous system of the host and the parasite's interna was also determined. Given the prolonged expression of development-associated genes, we suggest that rhizocephalans "got stuck in their metamorphosis", even at the reproductive stage. 
Conclusions: The results of the first comparative transcriptomic analysis for Rhizocephala not only clarified but also expanded the existing ideas about the biology of these extraordinary parasites.

Platova S, Poliushkevich L, Kulakova M, Nesterenko M, Starunov V, Novikova E. Gotta Go Slow: Two Evolutionarily Distinct Annelids Retain a Common Hedgehog Pathway Composition, Outlining Its Pan-Bilaterian Core. Int J Mol Sci 2022;23:ijms232214312. [PMID: 36430788 PMCID: PMC9695228 DOI: 10.3390/ijms232214312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 11/11/2022] [Accepted: 11/13/2022] [Indexed: 11/19/2022]

Kaya Y, Aydın ZU, Cai X, Wang X, Dönmez AA. Genome-wide characterization of two Aubrieta taxa: Aubrieta canescens subsp. canescens and Au. macrostyla (Brassicaceae). AOB PLANTS 2022;14:plac035. [PMID: 36196394 PMCID: PMC9521481 DOI: 10.1093/aobpla/plac035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Accepted: 09/09/2022] [Indexed: 06/16/2023]

Genome sequence assembly algorithms and misassembly identification methods. Mol Biol Rep 2022;49:11133-11148. [PMID: 36151399 DOI: 10.1007/s11033-022-07919-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2022] [Accepted: 09/05/2022] [Indexed: 10/14/2022]

K-Mer Spectrum-Based Error Correction Algorithm for Next-Generation Sequencing Data. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022;2022:8077664. [PMID: 35875730 PMCID: PMC9303089 DOI: 10.1155/2022/8077664] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/18/2022] [Accepted: 06/13/2022] [Indexed: 11/26/2022]

Tang T, Hutvagner G, Wang W, Li J. Simultaneous compression of multiple error-corrected short-read sets for faster data transmission and better de novo assemblies. Brief Funct Genomics 2022;21:387-398. [PMID: 35848773 DOI: 10.1093/bfgp/elac016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 06/10/2022] [Accepted: 06/14/2022] [Indexed: 11/14/2022]

Abstract

Next-Generation Sequencing has produced incredible amounts of short-read sequence data for de novo genome assembly over the last decades. For efficient transmission of these huge datasets, high-performance compression algorithms have been intensively studied. As both de novo assembly and error correction methods utilize the overlaps between reads, a concern is whether the sequencing errors that negatively affect genome assemblies also affect the compression of NGS data. This work addresses two problems: how current error correction algorithms can enable compression algorithms to make the sequence data much more compact, and whether the reads modified by error-correction algorithms lead to quality improvements in de novo contig assembly. As multiple sets of short reads are often produced by a single biomedical project in practice, we propose a graph-based method to reorder the files in a collection of multiple sets and then compress them simultaneously for a further compression improvement after error correction. We use examples to illustrate that accurate error correction algorithms can significantly reduce the number of mismatched nucleotides in reference-free compression, and hence can greatly improve compression performance. Extensive tests on practical collections of multiple short-read sets confirm that compression performance on the error-corrected data (with unchanged size) significantly outperforms that on the original data, and that the file reordering idea contributes a further improvement. Error correction of the original reads has also resulted in quality improvements of the genome assemblies, sometimes remarkably. However, it is still an open question how to combine appropriate error correction methods with an assembly algorithm so that the assembly performance can always be significantly improved.

Kallenborn F, Cascitti J, Schmidt B. CARE 2.0: reducing false-positive sequencing error corrections using machine learning. BMC Bioinformatics 2022;23:227. [PMID: 35698033 PMCID: PMC9195321 DOI: 10.1186/s12859-022-04754-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/03/2022] [Accepted: 05/30/2022] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Next-generation sequencing pipelines often perform error correction as a preprocessing step to obtain cleaned input data. State-of-the-art error correction programs are able to reliably detect and correct the majority of sequencing errors. However, they also introduce new errors by making false-positive corrections. These correction mistakes can have negative impact on downstream analysis, such as k-mer statistics, de-novo assembly, and variant calling. This motivates the need for more precise error correction tools.

RESULTS

We present CARE 2.0, a context-aware read error correction tool based on multiple sequence alignment targeting Illumina datasets. In addition to a number of newly introduced optimizations its most significant change is the replacement of CARE 1.0's hand-crafted correction conditions with a novel classifier based on random decision forests trained on Illumina data. This results in up to two orders-of-magnitude fewer false-positive corrections compared to other state-of-the-art error correction software. At the same time, CARE 2.0 is able to achieve high numbers of true-positive corrections comparable to its competitors. On a simulated full human dataset with 914M reads CARE 2.0 generates only 1.2M false positives (FPs) (and 801.4M true positives (TPs)) at a highly competitive runtime while the best corrections achieved by other state-of-the-art tools contain at least 3.9M FPs and at most 814.5M TPs. Better de-novo assembly and improved k-mer analysis show the applicability of CARE 2.0 to real-world data.
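The reported counts can be turned into a correction precision, TP / (TP + FP), as a quick worked check. The sketch below uses only the figures quoted above. The helper name `precision` is introduced here for illustration, and the second value is an upper bound derived by pairing the "at most 814.5M TPs" and "at least 3.9M FPs" limits, which need not belong to the same tool.

```python
def precision(tp, fp):
    """Fraction of applied corrections that were correct: TP / (TP + FP)."""
    return tp / (tp + fp)

# CARE 2.0 on the simulated human dataset (counts from the abstract).
care2 = precision(801.4e6, 1.2e6)

# Precision rises with TP and falls with FP, so combining the largest
# reported TP count with the smallest reported FP count gives an upper
# bound on the precision of the other tools' best corrections.
best_other_upper_bound = precision(814.5e6, 3.9e6)

print(f"CARE 2.0 precision:        {care2:.4f}")
print(f"Other tools (upper bound): {best_other_upper_bound:.4f}")
```

Both values are close to 1 because true positives dominate, so the gap is easier to see in the raw false-positive counts than in the precision ratio.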

CONCLUSION

False-positive corrections can negatively influence downstream analysis. The precision of CARE 2.0 greatly reduces the number of those corrections compared to other state-of-the-art programs including BFC, Karect, Musket, Bcool, SGA, and Lighter. The resulting higher-quality datasets improve k-mer analysis and de-novo assembly on real-world data, which demonstrates the applicability of machine learning techniques to sequencing read error correction. CARE 2.0 is written in C++/CUDA for Linux systems and can be run on the CPU as well as on CUDA-enabled GPUs. It is available at https://github.com/fkallen/CARE.
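As a rough illustration of the figures quoted in the RESULTS paragraph of this abstract, the precision implied by the reported TP and FP counts can be worked out directly. This is only a sketch and not part of CARE 2.0: the helper name `precision` is hypothetical, and the competitor value is an upper bound obtained by pairing the largest reported TP count (814.5M) with the smallest reported FP count (3.9M), which may not come from the same tool.

```python
def precision(true_positives: float, false_positives: float) -> float:
    """Fraction of all corrections made that were correct: TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)


# Counts quoted in the abstract (simulated human dataset, 914M reads).
care2_precision = precision(801.4e6, 1.2e6)         # about 0.9985

# Upper bound for the other tools: at most 814.5M TPs, at least 3.9M FPs.
# Precision grows with TP and shrinks with FP, so this pairing is a ceiling.
others_precision_bound = precision(814.5e6, 3.9e6)  # about 0.9952

# Ratio of false positives, best competitor versus CARE 2.0.
fp_ratio = 3.9e6 / 1.2e6                            # 3.25

print(f"CARE 2.0 precision:        {care2_precision:.4f}")
print(f"Other tools (upper bound): {others_precision_bound:.4f}")
print(f"FP ratio (others / CARE):  {fp_ratio:.2f}")
```

On this particular dataset the gap in false positives is a factor of about 3.25; the "up to two orders of magnitude" figure in the abstract refers to the best case across the evaluated datasets, which this arithmetic does not reproduce.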

Nesterenko M, Miroliubov A. From head to rootlet: comparative transcriptomic analysis of a rhizocephalan barnacle Peltogaster reticulata (Crustacea: Rhizocephala). F1000Res 2022;11:583. [PMID: 36447930 PMCID: PMC9664023 DOI: 10.12688/f1000research.110492.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 01/04/2023] [Indexed: 09/16/2023] Open

Abstract

Background: Rhizocephalan barnacles stand out in the diverse world of metazoan parasites. The body of a rhizocephalan female is modified beyond revealing any recognizable morphological features, consisting of the interna, a system of rootlets, and the externa, a sac-like reproductive body. Moreover, rhizocephalans have an outstanding ability to control their hosts, literally turning them into "zombies". Despite all these amazing traits, there are no genomic or transcriptomic data about any Rhizocephala. Methods: We collected transcriptomes from four body parts of an adult female rhizocephalan Peltogaster reticulata: the externa, and the main, growing, and thoracic parts of the interna. We used all prepared data for the de novo assembly of the reference transcriptome. Next, a set of encoded proteins was determined, the expression levels of protein-coding genes in different parts of the parasite's body were calculated and lists of enriched bioprocesses were identified. We also in silico identified and analyzed sets of potential excretory / secretory proteins. Finally, we applied phylostratigraphy and evolutionary transcriptomics approaches to our data. Results: The assembled reference transcriptome included transcripts of 12,620 protein-coding genes and was the first for any rhizocephalan. Based on the results obtained, the spatial heterogeneity of protein-coding gene expression in different regions of the adult female body of P. reticulata was established. The results of both transcriptomic analysis and histological studies indicated the presence of germ-like cells in the lumen of the interna. The potential molecular basis of the interaction between the nervous system of the host and the parasite's interna was also determined. Given the prolonged expression of development-associated genes, we suggest that rhizocephalans "got stuck in their metamorphosis", even at the reproductive stage. 
Conclusions: The results of the first comparative transcriptomic analysis for Rhizocephala not only clarified but also expanded the existing ideas about the biology of these extraordinary parasites.

Kim MJ, Park JS, Kim H, Kim SR, Kim SW, Kim KY, Kwak W, Kim I. Phylogeographic Relationships among Bombyx mandarina (Lepidoptera: Bombycidae) Populations and Their Relationships to B. mori Inferred from Mitochondrial Genomes. Biology 2022;11:68. [PMID: 35053066 PMCID: PMC8773246 DOI: 10.3390/biology11010068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/08/2021] [Revised: 12/19/2021] [Accepted: 12/30/2021] [Indexed: 02/01/2023]

Complete Mitochondrial Genomes of Metcalfa pruinosa and Salurnis marginella (Hemiptera: Flatidae): Genomic Comparison and Phylogenetic Inference in Fulgoroidea. Curr Issues Mol Biol 2021;43:1391-1418. [PMID: 34698117 PMCID: PMC8929015 DOI: 10.3390/cimb43030099] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/10/2021] [Revised: 09/25/2021] [Accepted: 09/27/2021] [Indexed: 12/30/2022] Open

Wu J, Zhang S, Zhang T, Liu Y. HD-Code: End-to-End High Density Code for DNA Storage. IEEE Trans Nanobioscience 2021;20:455-463. [PMID: 34343096 DOI: 10.1109/tnb.2021.3102122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]

Charlesworth D, Graham C, Trivedi U, Gardner J, Bergero R. PromethION sequencing and assembly of the genome of Micropoecilia picta, a fish with a highly degenerated Y chromosome. Genome Biol Evol 2021;13:6326803. [PMID: 34297069 PMCID: PMC8449826 DOI: 10.1093/gbe/evab171] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 07/19/2021] [Indexed: 11/13/2022] Open

Zhang X, Ping P, Hutvagner G, Blumenstein M, Li J. Aberration-corrected ultrafine analysis of miRNA reads at single-base resolution: a k-mer lattice approach. Nucleic Acids Res 2021;49:e106. [PMID: 34291293 PMCID: PMC8631080 DOI: 10.1093/nar/gkab610] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/21/2020] [Revised: 07/01/2021] [Accepted: 07/06/2021] [Indexed: 12/21/2022] Open

Muthukumarasamy U, Preusse M, Kordes A, Koska M, Schniederjans M, Khaledi A, Häussler S. Single-Nucleotide Polymorphism-Based Genetic Diversity Analysis of Clinical Pseudomonas aeruginosa Isolates. Genome Biol Evol 2021;12:396-406. [PMID: 32196089 PMCID: PMC7197496 DOI: 10.1093/gbe/evaa059] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Accepted: 03/19/2020] [Indexed: 01/26/2023] Open

Kallenborn F, Hildebrandt A, Schmidt B. CARE: context-aware sequencing read error correction. Bioinformatics 2021;37:889-895. [PMID: 32818262 DOI: 10.1093/bioinformatics/btaa738] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/08/2020] [Revised: 07/14/2020] [Accepted: 08/14/2020] [Indexed: 11/14/2022] Open

Zhang X, Liu Y, Yu Z, Blumenstein M, Hutvagner G, Li J. Instance-based error correction for short reads of disease-associated genes. BMC Bioinformatics 2021;22:142. [PMID: 34078284 PMCID: PMC8170817 DOI: 10.1186/s12859-021-04058-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/16/2021] [Accepted: 03/02/2021] [Indexed: 12/12/2022] Open

He K, Eastman TG, Czolacz H, Li S, Shinohara A, Kawada SI, Springer MS, Berenbrink M, Campbell KL. Myoglobin primary structure reveals multiple convergent transitions to semi-aquatic life in the world's smallest mammalian divers. eLife 2021;10:e66797. [PMID: 33949308 PMCID: PMC8205494 DOI: 10.7554/elife.66797] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/22/2021] [Accepted: 05/04/2021] [Indexed: 01/01/2023] Open

Hosseini ZZ, Rahimi SK, Forouzan E, Baraani A. RMI-DBG algorithm: A more agile iterative de Bruijn graph algorithm in short read genome assembly. J Bioinform Comput Biol 2021;19:2150005. [PMID: 33866959 DOI: 10.1142/s0219720021500050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

Oh HK, Hwang YJ, Hong HW, Myung H. Comparison of Enterococcus faecalis Biofilm Removal Efficiency among Bacteriophage PBEF129, Its Endolysin, and Cefotaxime. Viruses 2021;13:v13030426. [PMID: 33800040 PMCID: PMC7999683 DOI: 10.3390/v13030426] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/17/2020] [Revised: 03/01/2021] [Accepted: 03/03/2021] [Indexed: 02/07/2023] Open

Álvarez-Pérez S, Dhami MK, Pozo MI, Crauwels S, Verstrepen KJ, Herrera CM, Lievens B, Jacquemyn H. Genetic admixture increases phenotypic diversity in the nectar yeast Metschnikowia reukaufii. Fungal Ecol 2021. [DOI: 10.1016/j.funeco.2020.101016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]

Fehrer J, Slavíková R, Paštová L, Josefiová J, Mráz P, Chrtek J, Bertrand YJK. Molecular Evolution and Organization of Ribosomal DNA in the Hawkweed Tribe Hieraciinae (Cichorieae, Asteraceae). Front Plant Sci 2021;12:647375. [PMID: 33777082 PMCID: PMC7994888 DOI: 10.3389/fpls.2021.647375] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/04/2021] [Accepted: 02/19/2021] [Indexed: 05/14/2023]

Abstract

Molecular evolution of ribosomal DNA can be highly dynamic. Hundreds to thousands of copies in the genome are subject to concerted evolution, which homogenizes sequence variants to different degrees. If well homogenized, sequences are suitable for phylogeny reconstruction; if not, sequence polymorphism has to be handled appropriately. Here we investigate non-coding rDNA sequences (ITS/ETS, 5S-NTS) along with the chromosomal organization of their respective loci (45S and 5S rDNA) in diploids of the Hieraciinae. The subtribe consists of genera Hieracium, Pilosella, Andryala, and Hispidella and has a complex evolutionary history characterized by ancient intergeneric hybridization, allele sharing among species, and incomplete lineage sorting. Direct or cloned Sanger sequences and phased alleles derived from Illumina genome sequencing were subjected to phylogenetic analyses. Patterns of homogenization and tree topologies based on the three regions were compared. In contrast to most other plant groups, 5S-NTS sequences were generally better homogenized than ITS and ETS sequences. A novel case of ancient intergeneric hybridization between Hispidella and Hieracium was inferred, and some further incongruences between the trees were found, suggesting independent evolution of these regions. In some species, homogenization of ITS/ETS and 5S-NTS sequences proceeded in different directions although the 5S rDNA locus always occurred on the same chromosome with one 45S rDNA locus. The ancestral rDNA organization in the Hieraciinae comprised 4 loci of 45S rDNA in terminal positions and 2 loci of 5S rDNA in interstitial positions per diploid genome. In Hieracium, some deviations from this general pattern were found (3, 6, or 7 loci of 45S rDNA; three loci of 5S rDNA). Some of these deviations concerned intraspecific variation, and most of them occurred at the tips of the tree or independently in different lineages. 
This indicates that the organization of rDNA loci is more dynamic than the evolution of sequences contained in them and that locus number is therefore largely unsuitable to inform about species relationships in Hieracium. No consistent differences in the degree of sequence homogenization and the number of 45S rDNA loci were found, suggesting interlocus concerted evolution.

Mycoviral diversity and characteristics of a negative-stranded RNA virus LeNSRV1 in the edible mushroom Lentinula edodes. Virology 2020;555:89-101. [PMID: 33308828 DOI: 10.1016/j.virol.2020.11.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/29/2020] [Revised: 11/04/2020] [Accepted: 11/12/2020] [Indexed: 11/23/2022]

Bennett EP, Petersen BL, Johansen IE, Niu Y, Yang Z, Chamberlain CA, Met Ö, Wandall HH, Frödin M. INDEL detection, the 'Achilles heel' of precise genome editing: a survey of methods for accurate profiling of gene editing induced indels. Nucleic Acids Res 2020;48:11958-11981. [PMID: 33170255 PMCID: PMC7708060 DOI: 10.1093/nar/gkaa975] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 12/20/2019] [Revised: 10/05/2020] [Accepted: 10/15/2020] [Indexed: 12/11/2022] Open

Abstract

Advances in genome editing technologies have enabled manipulation of genomes at the single base level. These technologies are based on programmable nucleases (PNs) that include meganucleases, zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated 9 (Cas9) nucleases and have given researchers the ability to delete, insert or replace genomic DNA in cells, tissues and whole organisms. The great flexibility in re-designing the genomic target specificity of PNs has vastly expanded the scope of gene editing applications in life science, and shows great promise for development of the next generation gene therapies. PN technologies share the principle of inducing a DNA double-strand break (DSB) at a user-specified site in the genome, followed by cellular repair of the induced DSB. PN-elicited DSBs are mainly repaired by the non-homologous end joining (NHEJ) and the microhomology-mediated end joining (MMEJ) pathways, which can elicit a variety of small insertion or deletion (indel) mutations. If indels are elicited in a protein coding sequence and shift the reading frame, targeted gene knock out (KO) can readily be achieved using either of the available PNs. Despite the ease by which gene inactivation in principle can be achieved, in practice, successful KO is not only determined by the efficiency of NHEJ and MMEJ repair; it also depends on the design and properties of the PN utilized, delivery format chosen, the preferred indel repair outcomes at the targeted site, the chromatin state of the target site and the relative activities of the repair pathways in the edited cells. These variables preclude accurate prediction of the nature and frequency of PN induced indels. 
A key step of any gene KO experiment therefore becomes the detection, characterization and quantification of the indel(s) induced at the targeted genomic site in cells, tissues or whole organisms. In this survey, we briefly review naturally occurring indels and their detection. Next, we review the methods that have been developed for detection of PN-induced indels. We briefly outline the experimental steps and describe the pros and cons of the various methods to help users decide a suitable method for their editing application. We highlight recent advances that enable accurate and sensitive quantification of indel events in cells regardless of their genome complexity, turning a complex pool of different indel events into informative indel profiles. Finally, we review what has been learned about PN-elicited indel formation through the use of the new methods and how this insight is helping to further advance the genome editing field.

Nesterenko MA, Starunov VV, Shchenkov SV, Maslova AR, Denisova SA, Granovich AI, Dobrovolskij AA, Khalturin KV. Molecular signatures of the rediae, cercariae and adult stages in the complex life cycles of parasitic flatworms (Digenea: Psilostomatidae). Parasit Vectors 2020;13:559. [PMID: 33168070 PMCID: PMC7653818 DOI: 10.1186/s13071-020-04424-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/18/2019] [Accepted: 10/24/2020] [Indexed: 11/10/2022] Open

Abstract

Background

Parasitic flatworms (Trematoda: Digenea) represent one of the most remarkable examples of drastic morphological diversity among the stages within a life cycle. Which genes are responsible for extreme differences in anatomy, physiology, behavior, and ecology among the stages? Here we report a comparative transcriptomic analysis of parthenogenetic and amphimictic generations in two evolutionarily informative species of Digenea belonging to the family Psilostomatidae.

Methods

In this study, the transcriptomes of rediae, cercariae and adult worm stages of Psilotrema simillimum and Sphaeridiotrema pseudoglobulus were sequenced and analyzed. High-quality transcriptomes were generated, and the reference sets of protein-coding genes were used for differential expression analysis in order to identify stage-specific genes. Comparative analysis of gene sets, their expression dynamics and Gene Ontology enrichment analysis were performed for three life stages within each species and between the two species.

Results

Reference transcriptomes for P. simillimum and S. pseudoglobulus include 21,433 and 46,424 sequences, respectively. Among 14,051 orthologous groups (OGs), 1354 are common and specific to the two analyzed psilostomatid species, whereas 13 and 43 OGs were unique to P. simillimum and S. pseudoglobulus, respectively. In contrast to P. simillimum, where more than 60% of analyzed genes were active in the redia, cercaria and adult worm stages, in S. pseudoglobulus less than 40% of genes had such a ubiquitous expression pattern. In general, 7805 (36.41%) and 30,622 (65.96%) of genes were preferentially expressed in one of the analyzed stages of P. simillimum and S. pseudoglobulus, respectively. In both species, 12 clusters of co-expressed genes were identified, and more than half of the genes belonging to the reference sets were included in these clusters. Functional specialization of the life cycle stages was clearly supported by Gene Ontology enrichment analysis.

Conclusions

During the life cycles of the two species studied, most of the genes change their expression levels considerably; consequently, the molecular signature of a stage is not only a unique set of expressed genes, but also the specific levels of their expression. Our results indicate an unexpectedly high level of plasticity in gene regulation between closely related species. Transcriptomes of P. simillimum and S. pseudoglobulus provide a high-quality reference resource for future evolutionary studies and comparative analyses.

Metatranscriptomics by In Situ RNA Stabilization Directly and Comprehensively Revealed Episymbiotic Microbial Communities of Deep-Sea Squat Lobsters. mSystems 2020;5:5/5/e00551-20. [PMID: 33024051 PMCID: PMC8534475 DOI: 10.1128/msystems.00551-20] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 02/03/2023] Open

Abstract

Shinkaia crosnieri is an invertebrate that inhabits an area around deep-sea hydrothermal vents in the Okinawa Trough in Japan by harboring episymbiotic microbes as its primary source of nutrition. To reveal the physiology and phylogenetic composition of the active episymbiotic populations, metatranscriptomics is expected to be a powerful approach. However, this has been hindered by substantial perturbation (e.g., RNA degradation) during time-consuming retrieval from the deep sea. Here, we conducted direct metatranscriptomic analysis of S. crosnieri episymbionts by applying in situ RNA stabilization equipment. As expected, we obtained RNA expression profiles that were substantially different from those obtained by conventional metatranscriptomics (i.e., stabilization after retrieval). The episymbiotic community members were dominated by three orders, namely, Thiotrichales, Methylococcales, and Campylobacterales, and the Campylobacterales members were mostly dominated by the Sulfurovum genus. At a finer phylogenetic scale, the episymbiotic communities on different host individuals shared many species, indicating that the episymbionts on each host individual are not descendants of a few founder cells but are horizontally exchanged. Furthermore, our analysis revealed the key metabolisms of the community: two carbon fixation pathways, a formaldehyde assimilation pathway, and utilization of five electron donors (sulfide, thiosulfate, sulfur, methane, and ammonia) and two electron acceptors (oxygen and nitrate/nitrite). Importantly, it was suggested that Thiotrichales episymbionts can utilize intercellular sulfur globules even when sulfur compounds are not usable, possibly also in a detached and free-living state.

IMPORTANCE Deep-sea hydrothermal vent ecosystems remain mysterious. To depict in detail the enigmatic life of chemosynthetic microbes, which are key primary producers in these ecosystems, metatranscriptomic analysis is expected to be a promising approach. However, this has been hindered by substantial perturbation (e.g., RNA degradation) during time-consuming retrieval from the deep sea. In this study, we conducted direct metatranscriptome analysis of microbial episymbionts of deep-sea squat lobsters (Shinkaia crosnieri) by applying in situ RNA stabilization equipment. Compared to conventional metatranscriptomics (i.e., RNA stabilization after retrieval), our method provided substantially different RNA expression profiles. Moreover, we discovered that S. crosnieri and its episymbiotic microbes constitute complex and resilient ecosystems, where closely related but various episymbionts are stably maintained by horizontal exchange and partly by their sulfur storage ability for survival even when sulfur compounds are not usable, likely also in a detached and free-living state.

Extreme Viral Partitioning in a Marine-Derived High Arctic Lake. mSphere 2020;5:5/3/e00334-20. [PMID: 32404515 PMCID: PMC7227771 DOI: 10.1128/msphere.00334-20] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/05/2023] Open

Abstract

High-latitude, perennially stratified (meromictic) lakes are likely to be especially vulnerable to climate warming because of the importance of ice in maintaining their water column structure and associated distribution of microbial communities. This study aimed to characterize viral abundance, diversity, and distribution in a meromictic lake of marine origin on the far northern coast of Ellesmere Island, in the Canadian High Arctic. We collected triplicate samples for double-stranded DNA (dsDNA) viromics from five depths that encompassed the major features of the lake, as determined by limnological profiling of the water column. Viral abundance and virus-to-prokaryote ratios were highest at greater depths, while bacterial and cyanobacterial counts were greatest in the surface waters. The viral communities from each zone of the lake defined by salinity, temperature, and dissolved oxygen concentrations were markedly distinct, suggesting that there was little exchange of viral types among lake strata. Ten viral assembled genomes were obtained from our libraries, and these also segregated with depth. This well-defined structure of viral communities was consistent with that of potential hosts. Viruses from the monimolimnion, a deep layer of ancient Arctic Ocean seawater, were more diverse and relatively abundant, with few similarities to available viral sequences. The Lake A viral communities also differed from published records from the Arctic Ocean and meromictic Ace Lake in Antarctica. This first characterization of viral diversity from this sentinel environment underscores the microbial richness and complexity of an ecosystem type that is increasingly exposed to major perturbations in the fast-changing Arctic.

IMPORTANCE The Arctic is warming at an accelerating pace, and the rise in temperature has increasing impacts on the Arctic biome. Lakes are integrators of their surroundings and thus excellent sentinels of environmental change. Despite their importance in the regulation of key microbial processes, viruses remain largely uncharacterized in Arctic lacustrine environments. We sampled a highly stratified meromictic lake near the northern limit of the Canadian High Arctic, a region in rapid transition due to climate change. We found that the different layers of the lake harbored viral communities that were strikingly dissimilar and highly divergent from known viruses. Viruses were more abundant in the deepest part of the lake containing ancient Arctic Ocean seawater that was trapped during glacial retreat and were genomically unlike any viruses previously described. This research demonstrates the complexity and novelty of viral communities in an environment that is vulnerable to ongoing perturbation.

Liao X, Li M, Luo J, Zou Y, Wu FX, Pan Y, Luo F, Wang J. Improving de novo Assembly Based on Read Classification. IEEE/ACM Trans Comput Biol Bioinform 2020;17:177-188. [PMID: 30059317 DOI: 10.1109/tcbb.2018.2861380] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 06/08/2023]

Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol 2019;20:278. [PMID: 31842956 PMCID: PMC6912988 DOI: 10.1186/s13059-019-1910-1] [Citation(s) in RCA: 716] [Impact Index Per Article: 143.2] [Received: 09/10/2019] [Accepted: 12/02/2019] [Indexed: 11/13/2022] Open

Athena: Automated Tuning of k-mer based Genomic Error Correction Algorithms using Language Models. Sci Rep 2019;9:16157. [PMID: 31695060 PMCID: PMC6834855 DOI: 10.1038/s41598-019-52196-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/01/2019] [Accepted: 10/07/2019] [Indexed: 01/30/2023] Open

Abstract

The performance of most error-correction (EC) algorithms that operate on genomics reads is dependent on the proper choice of its configuration parameters, such as the value of k in k-mer based techniques. In this work, we target the problem of finding the best values of these configuration parameters to optimize error correction and consequently improve genome assembly. We perform this in an adaptive manner, adapted to different datasets and to EC tools, due to the observation that different configuration parameters are optimal for different datasets, i.e., from different platforms and species, and vary with the EC algorithm being applied. We use language modeling techniques from the Natural Language Processing (NLP) domain in our algorithmic suite, Athena, to automatically tune the performance-sensitive configuration parameters. Through the use of N-Gram and Recurrent Neural Network (RNN) language modeling, we validate the intuition that the EC performance can be computed quantitatively and efficiently using the “perplexity” metric, repurposed from NLP. After training the language model, we show that the perplexity metric calculated from a sample of the test (or production) data has a strong negative correlation with the quality of error correction of erroneous NGS reads. Therefore, we use the perplexity metric to guide a hill climbing-based search, converging toward the best configuration parameter value. Our approach is suitable for both de novo and comparative sequencing (resequencing), eliminating the need for a reference genome to serve as the ground truth. We find that Athena can automatically find the optimal value of k with a very high accuracy for 7 real datasets and using 3 different k-mer based EC algorithms, Lighter, Blue, and Racer. 
The inverse relation between the perplexity metric and alignment rate exists under all our tested conditions—for real and synthetic datasets, for all kinds of sequencing errors (insertion, deletion, and substitution), and for high and low error rates. The absolute value of that correlation is at least 73%. In our experiments, the best value of k found by Athena achieves an alignment rate within 0.53% of the oracle best value of k found through brute force searching (i.e., scanning through the entire range of k values). Athena’s selected value of k lies within the top-3 best k values using N-Gram models and the top-5 best k values using RNN models. With best parameter selection by Athena, the assembly quality (NG50) is improved by a geometric mean of 4.72X across the 7 real datasets.
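The idea described in this abstract, scoring candidate values of k by the perplexity of a language model and hill climbing toward the best value, can be sketched as follows. This is a minimal illustration under stated assumptions and not Athena's implementation: the function names (`train_ngram`, `perplexity`, `hill_climb_k`), the character trigram model with add-one smoothing, and the unit step size are all hypothetical choices. In real use, `score(k)` would run an error-correction tool with that k and return the perplexity of the corrected sample.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: reads contain only these four bases


def train_ngram(reads, n=3):
    """Count n-grams and their (n-1)-character contexts over the reads."""
    ngrams, contexts = Counter(), Counter()
    for read in reads:
        for i in range(len(read) - n + 1):
            ngrams[read[i:i + n]] += 1
            contexts[read[i:i + n - 1]] += 1
    return ngrams, contexts


def perplexity(reads, model, n=3):
    """Perplexity of the reads under the n-gram model (add-one smoothing)."""
    ngrams, contexts = model
    log_prob, count = 0.0, 0
    for read in reads:
        for i in range(len(read) - n + 1):
            gram = read[i:i + n]
            p = (ngrams[gram] + 1) / (contexts[gram[:-1]] + len(ALPHABET))
            log_prob += math.log(p)
            count += 1
    return math.exp(-log_prob / count) if count else float("inf")


def hill_climb_k(score, k_start, k_min, k_max):
    """Step k by +/-1 while the score (lower is better) keeps improving."""
    k, best = k_start, score(k_start)
    while True:
        neighbours = [c for c in (k - 1, k + 1) if k_min <= c <= k_max]
        if not neighbours:
            return k
        candidate = min(neighbours, key=score)
        candidate_score = score(candidate)
        if candidate_score >= best:
            return k  # local optimum reached
        k, best = candidate, candidate_score


# Toy usage: a model trained on a repetitive read assigns lower perplexity
# to similar reads than to reads it has never seen.
model = train_ngram(["ACGT" * 10])
seen = perplexity(["ACGT" * 10], model)
unseen = perplexity(["A" * 40], model)

# Toy score with a single minimum at k = 21, standing in for
# "perplexity of the sample after correction with this k".
chosen_k = hill_climb_k(lambda k: (k - 21) ** 2, k_start=15, k_min=11, k_max=31)
```

Hill climbing only finds a local optimum, so a sketch like this relies on the score being roughly unimodal in k over the searched range.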

Chen J, Shang J, Wang J, Sun Y. A binning tool to reconstruct viral haplotypes from assembled contigs. BMC Bioinformatics 2019;20:544. [PMID: 31684876 PMCID: PMC6829986 DOI: 10.1186/s12859-019-3138-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/01/2019] [Accepted: 10/09/2019] [Indexed: 11/21/2022] Open

Abstract

Background

Infections by RNA viruses such as influenza and HIV still pose a serious threat to human health despite extensive research on viral diseases. One challenge for producing effective prevention and treatment strategies is high intra-species genetic diversity. As different strains may have different biological properties, characterizing the genetic diversity is important to vaccine and drug design. Next-generation sequencing technology enables comprehensive characterization of both known and novel strains and has been widely adopted for sequencing viral populations. However, genome-scale reconstruction of haplotypes is still a challenging problem. In particular, haplotype assembly programs often produce contigs rather than full genomes. As a mutation in one gene can mask the phenotypic effects of a mutation at another locus, clustering these contigs into genome-scale haplotypes is still needed.

Results

We developed a contig binning tool, VirBin, which clusters contigs into different groups so that each group represents a haplotype. Commonly used features based on sequence composition and contig coverage cannot effectively distinguish viral haplotypes because of their high sequence similarity and heterogeneous sequencing coverage for RNA viruses. VirBin applies prototype-based clustering to regions that are more likely to contain mutations specific to a haplotype. The tool was tested on multiple simulated sequencing datasets with different haplotype abundance distributions and contig sizes, and also on mock quasispecies sequencing data. Benchmarks against other contig binning tools demonstrated the superior sensitivity and precision of VirBin in contig binning for viral haplotype reconstruction.
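As a generic illustration of what "prototype-based clustering" means, the sketch below runs one-dimensional k-means (Lloyd's algorithm), in which each cluster is represented by a single prototype value. It is not VirBin's method: the abstract does not state which features or which clustering model VirBin uses, so the choice of k-means, the function name `kmeans_1d`, and the example abundance values are all assumptions.

```python
def kmeans_1d(values, prototypes, iterations=20):
    """Lloyd's algorithm in one dimension.

    Each value is assigned to its nearest prototype, then every prototype
    moves to the mean of the values assigned to it. Ties go to the
    prototype with the lower index.
    """
    prototypes = list(prototypes)

    def nearest(v):
        return min(range(len(prototypes)), key=lambda i: abs(v - prototypes[i]))

    for _ in range(iterations):
        groups = [[] for _ in prototypes]
        for v in values:
            groups[nearest(v)].append(v)
        # An empty group keeps its previous prototype.
        updated = [sum(g) / len(g) if g else p for g, p in zip(groups, prototypes)]
        if updated == prototypes:
            break  # converged
        prototypes = updated

    labels = [nearest(v) for v in values]
    return prototypes, labels


# Hypothetical per-region abundance values for two haplotypes.
abundances = [10, 12, 11, 50, 52, 48]
centers, labels = kmeans_1d(abundances, prototypes=[0, 100])
```

In this toy input the two groups of values are well separated, so the prototypes settle on the two group means after a few iterations.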

Conclusions

In this work, we presented VirBin, a new contig binning tool for distinguishing contigs from different viral haplotypes with high sequence similarity. It competes favorably with other tools on viral contig binning. The source code is available at: https://github.com/chjiao/VirBin.


Morisse P, Lecroq T, Lefebvre A. Hybrid correction of highly noisy long reads using a variable-order de Bruijn graph. Bioinformatics 2019;34:4213-4222. [PMID: 29955770 DOI: 10.1093/bioinformatics/bty521] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 12/22/2017] [Accepted: 06/27/2018] [Indexed: 12/31/2022] Open

Jin AH, Muttenthaler M, Dutertre S, Himaya SWA, Kaas Q, Craik DJ, Lewis RJ, Alewood PF. Conotoxins: Chemistry and Biology. Chem Rev 2019;119:11510-11549. [PMID: 31633928 DOI: 10.1021/acs.chemrev.9b00207] [Citation(s) in RCA: 161] [Impact Index Per Article: 32.2] [Indexed: 02/06/2023]

Chen J, Zhao Y, Sun Y. De novo haplotype reconstruction in viral quasispecies using paired-end read guided path finding. Bioinformatics 2019;34:2927-2935. [PMID: 29617936 DOI: 10.1093/bioinformatics/bty202] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 08/28/2017] [Accepted: 04/02/2018] [Indexed: 12/29/2022] Open

Chen J, Huang J, Sun Y. TAR-VIR: a pipeline for TARgeted VIRal strain reconstruction from metagenomic data. BMC Bioinformatics 2019;20:305. [PMID: 31164077 PMCID: PMC6549370 DOI: 10.1186/s12859-019-2878-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/10/2018] [Accepted: 05/07/2019] [Indexed: 12/15/2022] Open

Current challenges and solutions of de novo assembly. QUANTITATIVE BIOLOGY 2019. [DOI: 10.1007/s40484-019-0166-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 10/27/2022]

Heydari M, Miclotte G, Van de Peer Y, Fostier J. Illumina error correction near highly repetitive DNA regions improves de novo genome assembly. BMC Bioinformatics 2019;20:298. [PMID: 31159722 PMCID: PMC6545690 DOI: 10.1186/s12859-019-2906-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 01/15/2019] [Accepted: 05/17/2019] [Indexed: 11/10/2022] Open

Abstract

Background

Several standalone error correction tools have been proposed to correct sequencing errors in Illumina data in order to facilitate de novo genome assembly. However, in a recent survey, we showed that state-of-the-art assemblers often did not benefit from this pre-correction step. We found that many error correction tools introduce new errors in reads that overlap highly repetitive DNA regions such as low-complexity patterns or short homopolymers, ultimately leading to a more fragmented assembly.

Results

We propose BrownieCorrector, an error correction tool for Illumina sequencing data that focuses on the correction of only those reads that overlap short DNA patterns that are highly repetitive in the genome. BrownieCorrector extracts all reads that contain such a pattern and clusters them into different groups using a community detection algorithm that takes into account both the sequence similarity between overlapping reads and their respective paired-end reads. Each cluster holds reads that originate from the same genomic region and hence each cluster can be corrected individually, thus providing a consistent correction for all reads within that cluster.

Conclusions

BrownieCorrector is benchmarked using six real Illumina datasets for different eukaryotic genomes. The prior use of BrownieCorrector improves assembly results over the use of uncorrected reads in all cases. In comparison with other error correction tools, BrownieCorrector leads to the best assembly results in most cases even though less than 2% of the reads within a dataset are corrected. Additionally, we investigate the impact of error correction on hybrid assembly where the corrected Illumina reads are supplemented with PacBio data. Our results confirm that BrownieCorrector improves the quality of hybrid genome assembly as well. BrownieCorrector is written in standard C++11 and released under the GPL license. BrownieCorrector relies on multithreading to take advantage of multi-core/multi-CPU systems. The source code is available at https://github.com/biointec/browniecorrector.

Electronic supplementary material

The online version of this article (10.1186/s12859-019-2906-2) contains supplementary material, which is available to authorized users.


Transcriptomic-Proteomic Correlation in the Predation-Evoked Venom of the Cone Snail, Conus imperialis. Mar Drugs 2019;17:md17030177. [PMID: 30893765 PMCID: PMC6471084 DOI: 10.3390/md17030177] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 02/05/2019] [Revised: 03/12/2019] [Accepted: 03/14/2019] [Indexed: 12/23/2022] Open

Wang W, Schalamun M, Morales-Suarez A, Kainer D, Schwessinger B, Lanfear R. Assembly of chloroplast genomes with long- and short-read data: a comparison of approaches using Eucalyptus pauciflora as a test case. BMC Genomics 2018;19:977. [PMID: 30594129 PMCID: PMC6311037 DOI: 10.1186/s12864-018-5348-8] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 05/15/2018] [Accepted: 12/03/2018] [Indexed: 12/23/2022] Open

Abstract

BACKGROUND

Chloroplasts are organelles that conduct photosynthesis in plant and algal cells. The information contained in chloroplast genomes is widely used in agriculture and in studies of evolution and ecology. Correctly assembling chloroplast genomes can be challenging because the chloroplast genome contains a pair of long inverted repeats (10-30 kb). Typically, it is simply assumed that the gross structure of the chloroplast genome matches the most commonly observed structure of two single-copy regions separated by a pair of inverted repeats. The advent of long-read sequencing technologies should remove the need to make this assumption by providing sufficient information to completely span the inverted repeat regions. Yet long reads tend to have higher error rates than short reads, and relatively little is known about the best way to combine long and short reads to obtain the most accurate chloroplast genome assemblies. Using Eucalyptus pauciflora, the snow gum, as a test case, we evaluated the effect of multiple parameters, such as different coverage of long (Oxford Nanopore) and short (Illumina) reads, different long-read lengths, and different assembly pipelines, with a view to determining the most accurate and efficient approach to chloroplast genome assembly.

RESULTS

Hybrid assemblies combining at least 20x coverage of both long reads and short reads generated a single contig spanning the entire chloroplast genome with few or no detectable errors. Short-read-only assemblies generated three contigs (the long single copy, short single copy and inverted repeat regions) of the chloroplast genome. These contigs contained few single-base errors but tended to exclude several bases at the beginning or end of each contig. Long-read-only assemblies tended to create multiple contigs with a much higher single-base error rate. The chloroplast genome of Eucalyptus pauciflora is 159,942 bp long and contains 131 genes of known function.

CONCLUSIONS

Our results suggest that very accurate assemblies of chloroplast genomes can be achieved using a combination of at least 20x coverage each of long and short reads, provided that the long reads include at least ~5x coverage of reads longer than the inverted repeat region. We show that further increases in coverage give little or no improvement in accuracy, and that hybrid assemblies are more accurate than long-read-only or short-read-only assemblies.
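The coverage thresholds in this conclusion reduce to simple arithmetic: coverage is the total number of sequenced bases divided by the genome length. The sketch below checks a read set against the 20x and ~5x figures quoted above. The function names, the default thresholds as parameters, and the example read lengths are assumptions for illustration; the inverted repeat length is passed in as a parameter because the abstract gives only the typical 10-30 kb range.

```python
def coverage(read_lengths, genome_size, longer_than=0):
    """Mean depth contributed by reads strictly longer than `longer_than`."""
    return sum(n for n in read_lengths if n > longer_than) / genome_size


def meets_thresholds(long_reads, short_reads, genome_size, repeat_length,
                     min_cov=20.0, min_spanning_cov=5.0):
    """True if both read sets reach min_cov and the long reads that exceed
    the inverted repeat length reach min_spanning_cov."""
    return (
        coverage(long_reads, genome_size) >= min_cov
        and coverage(short_reads, genome_size) >= min_cov
        and coverage(long_reads, genome_size, longer_than=repeat_length)
        >= min_spanning_cov
    )


# Toy numbers (hypothetical): a 1,000 bp genome with a 250 bp repeat.
long_reads = [300] * 100    # 30,000 bases, 30x, every read spans the repeat
short_reads = [100] * 250   # 25,000 bases, 25x
ok = meets_thresholds(long_reads, short_reads, genome_size=1000, repeat_length=250)
```

For the genome size reported in the Results (159,942 bp), 20x coverage corresponds to roughly 3.2 million sequenced bases per read type.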
