Reference Citation Analysis

For: Dubinkina VB, Ischenko DS, Ulyantsev VI, Tyakht AV, Alexeev DG. Assessment of k-mer spectrum applicability for metagenomic dissimilarity analysis. BMC Bioinformatics 2016;17:38. [PMID: 26774270 PMCID: PMC4715287 DOI: 10.1186/s12859-015-0875-7] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 09/07/2015] [Accepted: 12/14/2015] [Indexed: 12/11/2022] Open

Cited by Other Article(s)

Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024;23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024] Open

Reynolds G, Mumey B, Strnadova‐Neeley V, Lachowiec J. Hijacking a rapid and scalable metagenomic method reveals subgenome dynamics and evolution in polyploid plants. Applications in Plant Sciences 2024;12:e11581. [PMID: 39184200 PMCID: PMC11342227 DOI: 10.1002/aps3.11581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 11/26/2023] [Accepted: 12/20/2023] [Indexed: 08/27/2024]

Abstract

Premise

The genomes of polyploid plants archive the evolutionary events leading to their present forms. However, plant polyploid genomes present numerous hurdles to the genome comparison algorithms for classification of polyploid types and exploring genome dynamics.

Methods

Here, the problem of intra- and inter-genome comparison for examining polyploid genomes is reframed as a metagenomic problem, enabling the use of the rapid and scalable MinHashing approach. To determine how types of polyploidy are described by this metagenomic approach, plant genomes were examined from across the polyploid spectrum for both k-mer composition and frequency with a range of k-mer sizes. In this approach, no subgenome-specific k-mers are identified; rather, whole-chromosome k-mer subspaces were utilized.

Results

Given chromosome-scale genome assemblies with sufficient subgenome-specific repetitive element content, literature-verified subgenomic and genomic evolutionary relationships were revealed, including distinguishing auto- from allopolyploidy and putative progenitor genome assignment. The sequences responsible were the rapidly evolving landscape of transposable elements. An investigation into the MinHashing parameters revealed that the downsampled k-mer space (genomic signatures) produced excellent approximations of sequence similarity. Furthermore, the clustering approach used for comparison of the genomic signatures is scrutinized to ensure applicability of the metagenomics-based method.

Discussion

The easily implementable and highly computationally efficient MinHashing-based sequence comparison strategy enables comparative subgenomics and genomics for large and complex polyploid plant genomes. Such comparisons provide evidence for polyploidy-type subgenomic assignments. In cases where subgenome-specific repeat signal may not be adequate given a chromosome's global k-mer profile, alternative methods that are more specific but more computationally complex outperform this approach.
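
The MinHashing idea described in this abstract, comparing sequences through a downsampled k-mer space, can be illustrated with a minimal bottom-s sketch. This is a sketch under stated assumptions and not the authors' pipeline: the function names, the BLAKE2b hash, and the toy sequences are illustrative choices that do not appear in the source.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Stable 64-bit hash of a k-mer (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k, s):
    """Bottom-s sketch: the s smallest hash values among the k-mers of seq."""
    return sorted(hash64(km) for km in kmers(seq, k))[:s]


def jaccard_estimate(sketch_a, sketch_b, s):
    """Estimate the Jaccard similarity of two k-mer sets from their sketches."""
    # The s smallest hashes of the merged sketches form a sketch of the union.
    merged = sorted(set(sketch_a) | set(sketch_b))[:s]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


# Hypothetical toy sequences; real inputs would be whole chromosomes.
seq_a = "ACGTACGTGG"
seq_b = "ACGTACGTCC"
estimate = jaccard_estimate(sketch(seq_a, 4, 100), sketch(seq_b, 4, 100), 100)
```

When the sketch size s is at least the number of distinct k-mers, as in this toy case, the estimate equals the exact Jaccard similarity (barring hash collisions); smaller s trades accuracy for memory.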

Roberts M, Josephs EB. Previously unmeasured genetic diversity explains part of Lewontin's paradox in a k-mer-based meta-analysis of 112 plant species. bioRxiv 2024:2024.05.17.594778. [PMID: 38798362 PMCID: PMC11118579 DOI: 10.1101/2024.05.17.594778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]

Tian Q, Zhang P, Zhai Y, Wang Y, Zou Q. Application and Comparison of Machine Learning and Database-Based Methods in Taxonomic Classification of High-Throughput Sequencing Data. Genome Biol Evol 2024;16:evae102. [PMID: 38748485 PMCID: PMC11135637 DOI: 10.1093/gbe/evae102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/12/2024] [Indexed: 05/30/2024] Open

Abstract

The advent of high-throughput sequencing technologies has not only revolutionized the field of bioinformatics but has also heightened the demand for efficient taxonomic classification. Despite technological advancements, efficiently processing and analyzing the deluge of sequencing data for precise taxonomic classification remains a formidable challenge. Existing classification approaches primarily fall into two categories, database-based methods and machine learning methods, each presenting its own set of challenges and advantages. On this basis, the aim of our study was to conduct a comparative analysis between these two methods while also investigating the merits of integrating multiple database-based methods. Through an in-depth comparative study, we evaluated the performance of both methodological categories in taxonomic classification by utilizing simulated data sets. Our analysis revealed that database-based methods excel in classification accuracy when backed by a rich and comprehensive reference database. Conversely, while machine learning methods show superior performance in scenarios where reference sequences are sparse or lacking, they generally show inferior performance compared with database methods under most conditions. Moreover, our study confirms that integrating multiple database-based methods does, in fact, enhance classification accuracy. These findings shed new light on the taxonomic classification of high-throughput sequencing data and bear substantial implications for the future development of computational biology. For those interested in further exploring our methods, the source code of this study is publicly available on https://github.com/LoadStar822/Genome-Classifier-Performance-Evaluator. Additionally, a dedicated webpage showcasing our collected database, data sets, and various classification software can be found at http://lab.malab.cn/~tqz/project/taxonomic/.

Ponsero AJ, Miller M, Hurwitz BL. Comparison of k-mer-based de novo comparative metagenomic tools and approaches. Microbiome Research Reports 2023;2:27. [PMID: 38058765 PMCID: PMC10696585 DOI: 10.20517/mrr.2023.26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 06/28/2023] [Accepted: 07/12/2023] [Indexed: 12/08/2023]

Abstract

Aim: Comparative metagenomic analysis requires measuring a pairwise similarity between metagenomes in the dataset. Reference-based methods that compute a beta-diversity distance between two metagenomes are highly dependent on the quality and completeness of the reference database, and their application on less studied microbiota can be challenging. On the other hand, de-novo comparative metagenomic methods only rely on the sequence composition of metagenomes to compare datasets. While each one of these approaches has its strengths and limitations, their comparison is currently limited. Methods: We developed sets of simulated short-reads metagenomes to (1) compare k-mer-based and taxonomy-based distances and evaluate the impact of technical and biological variables on these metrics and (2) evaluate the effect of k-mer sketching and filtering. We used a real-world metagenomic dataset to provide an overview of the currently available tools for de novo metagenomic comparative analysis. Results: Using simulated metagenomes of known composition and controlled error rate, we showed that k-mer-based distance metrics were well correlated to the taxonomic distance metric for quantitative Beta-diversity metrics, but the correlation was low for presence/absence distances. The community complexity in terms of taxa richness and the sequencing depth significantly affected the quality of the k-mer-based distances, while the impact of low amounts of sequence contamination and sequencing error was limited. Finally, we benchmarked currently available de-novo comparative metagenomic tools and compared their output on two datasets of fecal metagenomes and showed that most k-mer-based tools were able to recapitulate the data structure observed using taxonomic approaches. 
Conclusion: This study expands our understanding of the strength and limitations of k-mer-based de novo comparative metagenomic approaches and aims to provide concrete guidelines for researchers interested in applying these approaches to their metagenomic datasets.
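
The contrast this abstract draws between quantitative and presence/absence distances can be made concrete with a small example. The sketch below computes a Bray-Curtis dissimilarity on raw k-mer counts and a Jaccard distance on k-mer presence for two toy samples. It is an illustration only and is not taken from any of the benchmarked tools: the function names, the use of unnormalized counts, and the toy reads are all assumptions.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def bray_curtis(a, b):
    """Quantitative dissimilarity: 1 - 2 * (sum of shared minima) / (total counts)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(a[km], b[km]) for km in a.keys() & b.keys())
    return 1.0 - 2.0 * shared / total


def jaccard_distance(a, b):
    """Presence/absence dissimilarity: 1 - |intersection| / |union| of k-mer sets."""
    union = a.keys() | b.keys()
    if not union:
        return 0.0
    return 1.0 - len(a.keys() & b.keys()) / len(union)


# Two hypothetical samples with the same k-mers at different abundances.
sample_a = kmer_counts(["ACGT"] * 3 + ["TTTT"], 4)
sample_b = kmer_counts(["ACGT"] + ["TTTT"] * 3, 4)
bc = bray_curtis(sample_a, sample_b)
jd = jaccard_distance(sample_a, sample_b)
```

In this toy case the two samples contain exactly the same k-mers, so the presence/absence distance is 0, while the abundance-aware Bray-Curtis dissimilarity is 0.5. This shows how the two families of metrics can disagree about the same pair of samples.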

Price C, Russell JA. AMAnD: an automated metagenome anomaly detection methodology utilizing DeepSVDD neural networks. Front Public Health 2023;11:1181911. [PMID: 37497030 PMCID: PMC10368493 DOI: 10.3389/fpubh.2023.1181911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 06/12/2023] [Indexed: 07/28/2023] Open

Abstract

The composition of metagenomic communities within the human body often reflects localized medical conditions such as upper respiratory diseases and gastrointestinal diseases. Fast and accurate computational tools to flag anomalous metagenomic samples from typical samples are desirable to understand different phenotypes, especially in contexts where repeated, long-duration temporal sampling is done. Here, we present Automated Metagenome Anomaly Detection (AMAnD), which utilizes two types of Deep Support Vector Data Description (DeepSVDD) models; one trained on taxonomic feature space output by the Pan-Genomics for Infectious Agents (PanGIA) taxonomy classifier and one trained on kmer frequency counts. AMAnD's semi-supervised one-class approach makes no assumptions about what an anomaly may look like, allowing the flagging of potentially novel anomaly types. Three diverse datasets are profiled. The first dataset is hosted on the National Center for Biotechnology Information's (NCBI) Sequence Read Archive (SRA) and contains nasopharyngeal swabs from healthy and COVID-19-positive patients. The second dataset is also hosted on SRA and contains gut microbiome samples from normal controls and from patients with slow transit constipation (STC). AMAnD can learn a typical healthy nasopharyngeal or gut microbiome profile and reliably flag the anomalous COVID+ or STC samples in both feature spaces. The final dataset is a synthetic metagenome created by the Critical Assessment of Metagenome Annotation Simulator (CAMISIM). A control dataset of 50 well-characterized organisms was submitted to CAMISIM to generate 100 synthetic control class samples. The experimental conditions included 12 different spiked-in contaminants that are taxonomically similar to organisms present in the laboratory blank sample ranging from one strain tree branch taxonomic distance away to one family tree branch taxonomic distance away. This experiment was repeated in triplicate at three different coverage levels to probe the dependence on sample coverage. AMAnD was again able to flag the contaminant inserts as anomalous. AMAnD's assumption-free flagging of metagenomic anomalies, the real-time model training update potential of the deep learning approach, and the strong performance even with lightweight models of low sample cardinality would make AMAnD well-suited to a wide array of applied metagenomics biosurveillance use-cases, from environmental to clinical utility.

Simons AL, Theroux S, Osborne M, Nuzhdin S, Mazor R, Steele J. Zeta diversity patterns in metabarcoded lotic algal assemblages as a tool for bioassessment. Ecological Applications 2023;33:e2812. [PMID: 36708145 DOI: 10.1002/eap.2812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Revised: 12/07/2022] [Accepted: 12/20/2022] [Indexed: 06/18/2023]

Abstract

Assessments of the ecological health of algal assemblages in streams typically focus on measures of their local diversity and classify individuals by morphotaxonomy. Such assemblages are often connected through various ecological processes, such as dispersal, and may be more accurately assessed as components of regional-, rather than local-scale assemblages. With recent declines in the costs of sequencing and computation, it has also become increasingly feasible to use metabarcoding to more accurately classify algal species and perform regional-scale bioassessments. Recently, zeta diversity has been explored as a novel method of constructing regional bioassessments for groups of streams. Here, we model the use of zeta diversity to investigate whether stream health can be determined by the landscape diversity of algal assemblages. We also compare the use of DNA metabarcoding and morphotaxonomy classifications in these zeta diversity-based bioassessments of regional stream health. From 96 stream samples in California, we used various orders of zeta diversity to construct models of biotic integrity for multiple assemblages of diatoms, as well as hybrid assemblages of diatoms in combination with soft-bodied algae, using taxonomy data generated with both DNA sequencing as well as traditional morphotaxonomic approaches. We compared our ability to evaluate the ecological health of streams with the performance of multiple algal indices of biological condition. Our zeta diversity-based models of regional biotic integrity were more strongly correlated with existing indices for algal assemblages classified using metabarcoding compared to morphotaxonomy. Metabarcoding for diatoms and hybrid algal assemblages involved rbcL and 18S V9 primers, respectively. Importantly, we also found that these algal assemblages, independent of the classification method, are more likely to be assembled under a process of niche differentiation rather than stochastically. Taken together, these results suggest the potential for zeta diversity patterns of algal assemblages classified using metabarcoding to inform stream bioassessments.

Pradhan UK, Meher PK, Naha S, Rao AR, Gupta A. ASLncR: a novel computational tool for prediction of abiotic stress-responsive long non-coding RNAs in plants. Funct Integr Genomics 2023;23:113. [PMID: 37000299 DOI: 10.1007/s10142-023-01040-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 03/23/2023] [Accepted: 03/24/2023] [Indexed: 04/01/2023]

Abstract

Abiotic stresses are detrimental to plant growth and development and have a major negative impact on crop yields. A growing body of evidence indicates that a large number of long non-coding RNAs (lncRNAs) are key to many abiotic stress responses. Thus, identifying abiotic stress-responsive lncRNAs is essential in crop breeding programs in order to develop crop cultivars resistant to abiotic stresses. In this study, we have developed the first machine learning-based computational model for predicting abiotic stress-responsive lncRNAs. The lncRNA sequences which were responsive and non-responsive to abiotic stresses served as the two classes of the dataset for binary classification using the machine learning algorithms. The training dataset was created using 263 stress-responsive and 263 non-stress-responsive sequences, whereas the independent test set consists of 101 sequences from both classes. As the machine learning model can adopt only the numeric data, the Kmer features ranging from sizes 1 to 6 were utilized to represent lncRNAs in numeric form. To select important features, four different feature selection strategies were utilized. Among the seven learning algorithms, the support vector machine (SVM) achieved the highest cross-validation accuracy with the selected feature sets. The observed 5-fold cross-validation accuracy, AU-ROC, and AU-PRC were found to be 68.84, 72.78, and 75.86%, respectively. Furthermore, the robustness of the developed model (SVM with the selected feature) was evaluated using an independent test dataset, where the overall accuracy, AU-ROC, and AU-PRC were found to be 76.23, 87.71, and 88.49%, respectively. The developed computational approach was also implemented in an online prediction tool ASLncR accessible at https://iasri-sg.icar.gov.in/aslncr/ . The proposed computational model and the developed prediction tool are believed to supplement the existing effort for the identification of abiotic stress-responsive lncRNAs in plants.
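
The step of representing lncRNA sequences in numeric form with k-mer features of sizes 1 to 6 can be sketched as follows. This is an illustrative sketch and not the ASLncR implementation: the function name, the normalization by the number of windows, and the conversion of T to U are assumptions, since the abstract does not specify them.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; assumed here, DNA input is converted T -> U


def kmer_features(seq, k_min=1, k_max=6):
    """Return normalized k-mer frequencies for every k from k_min to k_max.

    For each k the block has 4**k entries in lexicographic k-mer order.
    Windows containing other symbols (e.g. N) count toward the denominator
    but contribute to no feature.
    """
    seq = seq.upper().replace("T", "U")
    features = []
    for k in range(k_min, k_max + 1):
        windows = max(len(seq) - k + 1, 0)
        counts = {}
        for i in range(windows):
            kmer = seq[i:i + k]
            counts[kmer] = counts.get(kmer, 0) + 1
        for combo in product(ALPHABET, repeat=k):
            kmer = "".join(combo)
            features.append(counts.get(kmer, 0) / windows if windows else 0.0)
    return features


# Hypothetical toy sequence; real lncRNAs are hundreds of nucleotides or longer.
vector = kmer_features("ACGUACGU")
```

With k from 1 to 6 the vector has 4 + 16 + 64 + 256 + 1024 + 4096 = 5460 entries, which is one reason a feature selection step, as the abstract describes, is useful before training a classifier on a few hundred sequences.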

Panda A, Tuller T. Determinants of associations between codon and amino acid usage patterns of microbial communities and the environment inferred based on a cross-biome metagenomic analysis. NPJ Biofilms Microbiomes 2023;9:5. [PMID: 36693851 PMCID: PMC9873608 DOI: 10.1038/s41522-023-00372-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2022] [Accepted: 01/11/2023] [Indexed: 01/25/2023] Open

Zhai H, Fukuyama J. A convenient correspondence between k-mer-based metagenomic distances and phylogenetically-informed β-diversity measures. PLoS Comput Biol 2023;19:e1010821. [PMID: 36608056 PMCID: PMC9879504 DOI: 10.1371/journal.pcbi.1010821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Revised: 01/26/2023] [Accepted: 12/16/2022] [Indexed: 01/07/2023] Open

Xie XH, Huang YJ, Han GS, Yu ZG, Ma YL. Microbial characterization based on multifractal analysis of metagenomes. Front Cell Infect Microbiol 2023;13:1117421. [PMID: 36779183 PMCID: PMC9910082 DOI: 10.3389/fcimb.2023.1117421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Accepted: 01/09/2023] [Indexed: 01/28/2023] Open

Chakoory O, Comtet-Marre S, Peyret P. RiboTaxa: combined approaches for rRNA genes taxonomic resolution down to the species level from metagenomics data revealing novelties. NAR Genom Bioinform 2022;4:lqac070. [PMID: 36159175 PMCID: PMC9492272 DOI: 10.1093/nargab/lqac070] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/06/2022] [Revised: 08/04/2022] [Accepted: 08/31/2022] [Indexed: 11/13/2022] Open

Strain identification and quantitative analysis in microbial communities. J Mol Biol 2022;434:167582. [DOI: 10.1016/j.jmb.2022.167582] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/01/2022] [Revised: 03/31/2022] [Accepted: 04/03/2022] [Indexed: 12/14/2022]

Vanni C, Schechter MS, Acinas SG, Barberán A, Buttigieg PL, Casamayor EO, Delmont TO, Duarte CM, Eren AM, Finn RD, Kottmann R, Mitchell A, Sánchez P, Siren K, Steinegger M, Gloeckner FO, Fernàndez-Guerra A. Unifying the known and unknown microbial coding sequence space. eLife 2022;11:e67667. [PMID: 35356891 PMCID: PMC9132574 DOI: 10.7554/elife.67667] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 02/18/2021] [Accepted: 03/30/2022] [Indexed: 12/02/2022] Open

Affiliation(s)

Chiara Vanni Microbial Genomics and Bioinformatics Research G, Max Planck Institute for Marine Microbiology, Bremen, Germany; Jacobs University Bremen, Bremen, Germany
Matthew S Schechter Microbial Genomics and Bioinformatics Research G, Max Planck Institute for Marine Microbiology, Bremen, Germany; Department of Medicine, University of Chicago, Chicago, United States
Silvia G Acinas Department of Marine Biology and Oceanography, Institut de Ciències del Mar (CSIC), Barcelona, Spain
Albert Barberán Department of Environmental Science, University of Arizona, Tucson, United States
Pier Luigi Buttigieg Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Alfred Wegener Institute, Bremerhaven, Germany
Emilio O Casamayor Center for Advanced Studies of Blanes CEAB-CSIC, Spanish Council for Research, Blanes, Spain
Tom O Delmont Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, France
Carlos M Duarte Red Sea Research Centre and Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
A Murat Eren Department of Medicine, University of Chicago, Chicago, United States; Josephine Bay Paul Center, Marine Biological Laboratory, Woods Hole, United States
Robert D Finn European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
Renzo Kottmann Microbial Genomics and Bioinformatics Research G, Max Planck Institute for Marine Microbiology, Bremen, Germany
Alex Mitchell European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
Pablo Sánchez Department of Marine Biology and Oceanography, Institut de Ciències del Mar (CSIC), Barcelona, Spain
Kimmo Siren Section for Evolutionary Genomics, The GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
Martin Steinegger School of Biological Sciences, Seoul National University, Seoul, Republic of Korea; Institute of Molecular Biology and Genetics, Seoul National University, Seoul, Republic of Korea
Frank Oliver Gloeckner Jacobs University Bremen, Bremen, Germany; University of Bremen and Life Sciences and Chemistry, Bremen, Germany; Computing Center, Helmholtz Center for Polar and Marine Research, Bremerhaven, Germany
Antonio Fernàndez-Guerra Microbial Genomics and Bioinformatics Research G, Max Planck Institute for Marine Microbiology, Bremen, Germany; Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark

Bennett C, Thornton M, Park C, Henry G, Zhang Y, Malladi V, Kim D. SeqWho: reliable, rapid determination of sequence file identity using k-mer frequencies in Random Forest classifiers. Bioinformatics 2022;38:1830-1837. [PMID: 35134110 PMCID: PMC8963323 DOI: 10.1093/bioinformatics/btac050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2021] [Revised: 01/12/2022] [Accepted: 01/26/2022] [Indexed: 02/05/2023] Open

Tay AP, Hosking B, Hosking C, Bauer DC, Wilson LO. INSIDER: alignment-free detection of foreign DNA sequences. Comput Struct Biotechnol J 2021;19:3810-3816. [PMID: 34285780 PMCID: PMC8273350 DOI: 10.1016/j.csbj.2021.06.045] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/15/2021] [Revised: 06/28/2021] [Accepted: 06/28/2021] [Indexed: 11/21/2022] Open

Holding ML, Strickland JL, Rautsaw RM, Hofmann EP, Mason AJ, Hogan MP, Nystrom GS, Ellsworth SA, Colston TJ, Borja M, Castañeda-Gaytán G, Grünwald CI, Jones JM, Freitas-de-Sousa LA, Viala VL, Margres MJ, Hingst-Zaher E, Junqueira-de-Azevedo ILM, Moura-da-Silva AM, Grazziotin FG, Gibbs HL, Rokyta DR, Parkinson CL. Phylogenetically diverse diets favor more complex venoms in North American pitvipers. Proc Natl Acad Sci U S A 2021;118:e2015579118. [PMID: 33875585 PMCID: PMC8092465 DOI: 10.1073/pnas.2015579118] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 01/17/2023] Open

Affiliation(s)

Matthew L Holding Department of Biological Sciences, Clemson University, Clemson, SC 29634; Department of Biological Science, Florida State University, Tallahassee, FL 32306
Jason L Strickland Department of Biological Sciences, Clemson University, Clemson, SC 29634
Rhett M Rautsaw Department of Biological Sciences, Clemson University, Clemson, SC 29634
Erich P Hofmann Department of Biological Sciences, Clemson University, Clemson, SC 29634
Andrew J Mason Department of Biological Sciences, Clemson University, Clemson, SC 29634; Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH 43210
Michael P Hogan Department of Biological Science, Florida State University, Tallahassee, FL 32306
Gunnar S Nystrom Department of Biological Science, Florida State University, Tallahassee, FL 32306
Schyler A Ellsworth Department of Biological Science, Florida State University, Tallahassee, FL 32306
Timothy J Colston Department of Biological Science, Florida State University, Tallahassee, FL 32306
Miguel Borja Facultad de Ciencias Biológicas, Universidad Juárez del Estado de Durango, C.P. 35010 Gómez Palacio, Dgo., Mexico
Gamaliel Castañeda-Gaytán Facultad de Ciencias Biológicas, Universidad Juárez del Estado de Durango, C.P. 35010 Gómez Palacio, Dgo., Mexico
Christoph I Grünwald HERP.MX A.C., Villa del Álvarez, Colima 28973, Mexico
Jason M Jones HERP.MX A.C., Villa del Álvarez, Colima 28973, Mexico
Luciana A Freitas-de-Sousa Laboratório de Imunopatologia, Instituto Butantan, São Paulo 05503-900, Brazil
Vincent Louis Viala Laboratório de Toxinologia Aplicada, Instituto Butantan, São Paulo 05503-900, Brazil; Center of Toxins, Immune-Response and Cell Signaling, São Paulo 05503-900, Brazil
Mark J Margres Department of Biological Sciences, Clemson University, Clemson, SC 29634; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
Erika Hingst-Zaher Museu Biológico, Instituto Butantan, São Paulo 05503-900, Brazil
Inácio L M Junqueira-de-Azevedo Laboratório de Toxinologia Aplicada, Instituto Butantan, São Paulo 05503-900, Brazil; Center of Toxins, Immune-Response and Cell Signaling, São Paulo 05503-900, Brazil
Ana M Moura-da-Silva Laboratório de Imunopatologia, Instituto Butantan, São Paulo 05503-900, Brazil; Instituto de Pesquisa Clínica Carlos Borborema, Fundação de Medicina Tropical Doutor Heitor Vieira Dourado, Manaus 69040, Brazil
Felipe G Grazziotin Laboratório de Coleções Zoológicas, Instituto Butantan, São Paulo 05503-900, Brazil
H Lisle Gibbs Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH 43210
Darin R Rokyta Department of Biological Science, Florida State University, Tallahassee, FL 32306
Christopher L Parkinson Department of Biological Sciences, Clemson University, Clemson, SC 29634; Department of Forestry and Environmental Conservation, Clemson University, Clemson, SC 29634

Bakhtiari M, Park J, Ding YC, Shleizer-Burko S, Neuhausen SL, Halldórsson BV, Stefánsson K, Gymrek M, Bafna V. Variable number tandem repeats mediate the expression of proximal genes. Nat Commun 2021;12:2075. [PMID: 33824302 PMCID: PMC8024321 DOI: 10.1038/s41467-021-22206-z] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 05/13/2020] [Accepted: 02/17/2021] [Indexed: 12/12/2022] Open

Bize A, Midoux C, Mariadassou M, Schbath S, Forterre P, Da Cunha V. Exploring short k-mer profiles in cells and mobile elements from Archaea highlights the major influence of both the ecological niche and evolutionary history. BMC Genomics 2021;22:186. [PMID: 33726663 PMCID: PMC7962313 DOI: 10.1186/s12864-021-07471-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/13/2020] [Accepted: 02/24/2021] [Indexed: 12/16/2022] Open

Abstract

BACKGROUND

K-mer-based methods have greatly advanced in recent years, largely driven by the realization of their biological significance and by the advent of next-generation sequencing. Their speed and their independence from the annotation process are major advantages. Their utility in the study of the mobilome has recently emerged and they seem a priori adapted to the patchy gene distribution and the lack of universal marker genes of viruses and plasmids. To provide a framework for the interpretation of results from k-mer based methods applied to archaea or their mobilome, we analyzed the 5-mer DNA profiles of close to 600 archaeal cells, viruses and plasmids. Archaea is one of the three domains of life. Archaea seem enriched in extremophiles and are associated with a high diversity of viral and plasmid families, many of which are specific to this domain. We explored the dataset structure by multivariate and statistical analyses, seeking to identify the underlying factors.

RESULTS

For cells, the 5-mer profiles were inconsistent with the phylogeny of archaea. At a finer taxonomic level, the influence of the taxonomy and the environmental constraints on 5-mer profiles was very strong. These two factors were interdependent to a significant extent, and the respective weights of their contributions varied according to the clade. A convergent adaptation was observed for the class Halobacteria, for which a strong 5-mer signature was identified. For mobile elements, coevolution with the host had a clear influence on their 5-mer profile. This enabled us to identify one previously known and one new case of recent host transfer based on the atypical composition of the mobile elements involved. Beyond the effect of coevolution, extrachromosomal elements strikingly retain the specific imprint of their own viral or plasmid taxonomic family in their 5-mer profile.

CONCLUSION

This specific imprint confirms that the evolution of extrachromosomal elements is driven by multiple parameters and is not restricted to host adaptation. In addition, we detected only recent host transfer events, suggesting the fast evolution of short k-mer profiles. This calls for caution when using k-mers for host prediction, metagenomic binning or phylogenetic reconstruction.

Kirk JM, Sprague D, Calabrese JM. Classification of Long Noncoding RNAs by k-mer Content. Methods Mol Biol 2021;2254:41-60. [PMID: 33326069 PMCID: PMC7850294 DOI: 10.1007/978-1-0716-1158-6_4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 05/18/2023]

Yang Z, Li H, Jia Y, Zheng Y, Meng H, Bao T, Li X, Luo L. Intrinsic laws of k-mer spectra of genome sequences and evolution mechanism of genomes. BMC Evol Biol 2020;20:157. [PMID: 33228538 PMCID: PMC7684957 DOI: 10.1186/s12862-020-01723-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/04/2020] [Accepted: 11/10/2020] [Indexed: 11/17/2022] Open

Kirzhner V, Toledano-Kitai D, Volkovich Z. Evaluating the number of different genomes in a metagenome by means of the compositional spectra approach. PLoS One 2020;15:e0237205. [PMID: 33156862 PMCID: PMC7647110 DOI: 10.1371/journal.pone.0237205] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/19/2020] [Accepted: 10/22/2020] [Indexed: 01/02/2023] Open

Comin M, Di Camillo B, Pizzi C, Vandin F. Comparison of microbiome samples: methods and computational challenges. Brief Bioinform 2020;22:88-95. [PMID: 32577746 DOI: 10.1093/bib/bbaa121] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/22/2019] [Revised: 05/09/2020] [Accepted: 05/18/2020] [Indexed: 12/14/2022] Open

Beier S, Ulpinnis C, Schwalbe M, Münch T, Hoffie R, Koeppel I, Hertig C, Budhagatapalli N, Hiekel S, Pathi KM, Hensel G, Grosse M, Chamas S, Gerasimova S, Kumlehn J, Scholz U, Schmutzer T. Kmasker plants - a tool for assessing complex sequence space in plant species. Plant J 2020;102:631-642. [PMID: 31823436 DOI: 10.1111/tpj.14645] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/25/2019] [Revised: 11/27/2019] [Accepted: 11/28/2019] [Indexed: 06/10/2023]

Affiliation(s)

Sebastian Beier Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
Chris Ulpinnis Leibniz Institute of Plant Biochemistry, Bioinformatics and Scientific Data, 06120, Halle, Germany
Markus Schwalbe Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
Thomas Münch Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
Robert Hoffie Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
Iris Koeppel Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
Christian Hertig Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
Nagaveni Budhagatapalli Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
Stefan Hiekel Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
Krishna M Pathi Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
Goetz Hensel Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
Martin Grosse Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
Sindy Chamas Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
Sophia Gerasimova Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
Jochen Kumlehn Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
Uwe Scholz Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
Thomas Schmutzer Department of Natural Sciences III, Institute for Agricultural and Nutritional Sciences, Martin Luther University Halle-Wittenberg, 06120, Halle, Germany

Peng H. CFSP: a collaborative frequent sequence pattern discovery algorithm for nucleic acid sequence classification. PeerJ 2020;8:e8965. [PMID: 32341900 PMCID: PMC7179567 DOI: 10.7717/peerj.8965] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/03/2020] [Accepted: 03/24/2020] [Indexed: 12/19/2022] Open

Abstract

BACKGROUND

Conserved nucleic acid sequences play an essential role in transcriptional regulation. The motifs/templates derived from nucleic acid sequence datasets are usually used as biomarkers to predict biochemical properties such as protein binding sites or to identify specific non-coding RNAs. In many cases, template-based nucleic acid sequence classification performs better than some feature extraction methods, such as N-gram and k-spaced pairs classification. The availability of large-scale experimental data provides an unprecedented opportunity to improve motif extraction methods. The process for pattern extraction from large-scale data is crucial for the creation of predictive models.

METHODS

In this article, a Teiresias-like feature extraction algorithm to discover frequent sub-sequences (CFSP) is proposed. Although gaps are allowed in some motif discovery algorithms, the distance and number of gaps are limited. The proposed algorithm can find frequent sequence pairs with a larger gap. The combinations of frequent sub-sequences in given protracted sequences capture the long-distance correlation, which implies a specific molecular biological property. Hence, the proposed algorithm intends to discover the combinations. A set of frequent sub-sequences derived from nucleic acid sequences with order is used as a base frequent sub-sequence array. The mutation information is attached to each sub-sequence array to implement fuzzy matching. Thus, a mutate records a single nucleotide variant or nucleotides insertion/deletion (indel) to encode a slight difference between frequent sequences and a matched subsequence of a sequence under investigation.

CONCLUSIONS

The proposed algorithm has been validated with several nucleic acid sequence prediction case studies. These data demonstrate better results than recently available feature-descriptor-based methods on experimental data sets such as miRNA, piRNA, and Sigma 54 promoters. CFSP is implemented in C++ and shell script; the source code and related data are available at https://github.com/HePeng2016/CFSP.
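To make the idea of frequent sub-sequence pairs separated by an arbitrary gap more concrete, the following is a deliberately simplified sketch. It is not the CFSP implementation: it uses fixed-length k-mers, exact matching only (no mutation records or fuzzy matching), and a plain support threshold. All function names and parameters are hypothetical.

```python
from collections import Counter


def frequent_kmers(seqs, k, min_support):
    """k-mers that occur in at least min_support of the input sequences."""
    support = Counter()
    for s in seqs:
        # A set ensures each sequence contributes at most once per k-mer.
        support.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return {km for km, n in support.items() if n >= min_support}


def frequent_gapped_pairs(seqs, k=3, min_support=2):
    """Ordered pairs (a, b) of frequent k-mers where some occurrence of b
    starts at or after the end of some occurrence of a, with any gap length,
    in at least min_support sequences."""
    frequent = frequent_kmers(seqs, k, min_support)
    pair_support = Counter()
    for s in seqs:
        first, last = {}, {}
        for i in range(len(s) - k + 1):
            km = s[i:i + k]
            if km in frequent:
                first.setdefault(km, i)  # earliest start of this k-mer
                last[km] = i             # latest start of this k-mer
        pairs = {(a, b) for a in first for b in last
                 if first[a] + k <= last[b]}  # non-overlapping, a before b
        pair_support.update(pairs)
    return {p: n for p, n in pair_support.items() if n >= min_support}
```

For example, `frequent_gapped_pairs(["ACGTTTTGCA", "ACGAAGCA", "CCCCCC"])` reports the pair `("ACG", "GCA")` with support 2, even though the gap between the two k-mers differs between the sequences.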

Goussarov G, Cleenwerck I, Mysara M, Leys N, Monsieurs P, Tahon G, Carlier A, Vandamme P, Van Houdt R. PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing. Bioinformatics 2020;36:2337-2344. [PMID: 31899493 PMCID: PMC7178395 DOI: 10.1093/bioinformatics/btz964] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/19/2019] [Revised: 11/21/2019] [Accepted: 12/30/2019] [Indexed: 11/13/2022] Open

Sun JH, Ai SM, Liu SQ. Methylation-driven model for analysis of dinucleotide evolution in genomes. Theor Biol Med Model 2020;17:3. [PMID: 32264909 PMCID: PMC7140373 DOI: 10.1186/s12976-020-00122-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2019] [Accepted: 03/10/2020] [Indexed: 11/16/2022] Open

LaPierre N, Ju CJT, Zhou G, Wang W. MetaPheno: A critical evaluation of deep learning and machine learning in metagenome-based disease prediction. Methods 2019;166:74-82. [PMID: 30885720 PMCID: PMC6708502 DOI: 10.1016/j.ymeth.2019.03.003] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 12/01/2018] [Revised: 02/14/2019] [Accepted: 03/04/2019] [Indexed: 01/21/2023] Open

Dougan TJ, Quake SR. Viral taxonomy derived from evolutionary genome relationships. PLoS One 2019;14:e0220440. [PMID: 31412051 PMCID: PMC6693820 DOI: 10.1371/journal.pone.0220440] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/21/2018] [Accepted: 07/16/2019] [Indexed: 11/23/2022] Open

Rowe WPM, Carrieri AP, Alcon-Giner C, Caim S, Shaw A, Sim K, Kroll JS, Hall LJ, Pyzer-Knapp EO, Winn MD. Streaming histogram sketching for rapid microbiome analytics. Microbiome 2019;7:40. [PMID: 30878035 PMCID: PMC6420756 DOI: 10.1186/s40168-019-0653-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/25/2018] [Accepted: 03/01/2019] [Indexed: 06/09/2023]

Abstract

BACKGROUND

The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, augmentation of novel datasets and reanalysis of published works. This vast amount of microbiome data, as well as the widespread proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need to develop analytics that can process huge amounts of data in a short amount of time. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra. These sketches allow for dissimilarity estimation, rapid microbiome catalogue searching and classification of microbiome samples in near real time.

RESULTS

We apply streaming histogram sketching to microbiome samples as a form of dimensionality reduction, creating a compressed 'histosketch' that can efficiently represent microbiome k-mer spectra. Using public microbiome datasets, we show that histosketches can be clustered by sample type using the pairwise Jaccard similarity estimation, consequently allowing for rapid microbiome similarity searches via a locality sensitive hashing indexing scheme. Furthermore, we use a 'real life' example to show that histosketches can train machine learning classifiers to accurately label microbiome samples. Specifically, using a collection of 108 novel microbiome samples from a cohort of premature neonates, we trained and tested a random forest classifier that could accurately predict whether the neonate had received antibiotic treatment (97% accuracy, 96% precision) and could subsequently be used to classify microbiome data streams in less than 3 s.

CONCLUSIONS

Our method offers a new approach to rapidly process microbiome data streams, allowing samples to be rapidly clustered, indexed and classified. We also provide our implementation, Histosketching Using Little K-mers (HULK), which can histosketch a typical 2 GB microbiome in 50 s on a standard laptop using four cores, with the sketch occupying 3000 bytes of disk space. ( https://github.com/will-rowe/hulk ).
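For orientation, the sketch below computes a k-mer spectrum from a set of reads and an exact weighted (min-max) Jaccard similarity between two spectra. This is not the HULK implementation and performs no sketching at all; it only illustrates the kind of similarity that a similarity-preserving sketch is meant to approximate on much smaller representations. The function names, and the choice of the weighted form of the Jaccard index, are assumptions made for illustration.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every k-mer across a collection of reads (hypothetical helper)."""
    counts = Counter()
    for r in reads:
        counts.update(r[i:i + k] for i in range(len(r) - k + 1))
    return counts


def weighted_jaccard(a, b):
    """Sum of per-k-mer minima divided by sum of per-k-mer maxima.

    Returns 1.0 for two empty spectra (treated as identical by convention).
    """
    keys = set(a) | set(b)
    num = sum(min(a[x], b[x]) for x in keys)
    den = sum(max(a[x], b[x]) for x in keys)
    return num / den if den else 1.0
```

Computing this exactly requires holding both full spectra in memory, which is the cost that fixed-size sketches are designed to avoid.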

Choi I, Ponsero AJ, Bomhoff M, Youens-Clark K, Hartman JH, Hurwitz BL. Libra: scalable k-mer-based tool for massive all-vs-all metagenome comparisons. Gigascience 2019;8:5266304. [PMID: 30597002 PMCID: PMC6354030 DOI: 10.1093/gigascience/giy165] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 08/24/2018] [Accepted: 12/17/2018] [Indexed: 11/23/2022] Open

Tyakht AV, Manolov AI, Kanygina AV, Ischenko DS, Kovarsky BA, Popenko AS, Pavlenko AV, Elizarova AV, Rakitina DV, Baikova JP, Ladygina VG, Kostryukova ES, Karpova IY, Semashko TA, Larin AK, Grigoryeva TV, Sinyagina MN, Malanin SY, Shcherbakov PL, Kharitonova AY, Khalif IL, Shapina MV, Maev IV, Andreev DN, Belousova EA, Buzunova YM, Alexeev DG, Govorun VM. Genetic diversity of Escherichia coli in gut microbiota of patients with Crohn's disease discovered using metagenomic and genomic analyses. BMC Genomics 2018;19:968. [PMID: 30587114 PMCID: PMC6307143 DOI: 10.1186/s12864-018-5306-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 04/20/2018] [Accepted: 11/23/2018] [Indexed: 12/12/2022] Open

Affiliation(s)

Alexander V Tyakht Federal Research and Clinical Centre of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435, Russia.,Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow Region, Russian Federation, 141700.,ITMO University, 49 Kronverkskiy pr, Saint-Petersburg, Russian Federation, 197101
Alexander I Manolov Federal Research and Clinical Centre of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435, Russia
Alexandra V Kanygina Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow Region, Russian Federation, 141700
Dmitry S Ischenko Federal Research and Clinical Centre of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435, Russia.,Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow Region, Russian Federation, 141700
Boris A Kovarsky Federal Research and Clinical Centre of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435, Russia
Anna S Popenko Federal Research and Clinical Centre of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435, Russia
Alexander V Pavlenko Federal Research and Clinical Centre of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435, Russia
Anna V Elizarova Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow Region, Russian Federation, 141700
Daria V Rakitina Federal Research and Clinical Centre of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435, Russia
Julia P Baikova Federal Research and Clinical Centre of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435, Russia
Valentina G Ladygina Federal Research and Clinical Centre of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435, Russia
Elena S Kostryukova Federal Research and Clinical Centre of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435, Russia.,Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow Region, Russian Federation, 141700
Irina Y Karpova Federal Research and Clinical Centre of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435, Russia
Tatyana A Semashko Federal Research and Clinical Centre of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435, Russia.,Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow Region, Russian Federation, 141700
Andrei K Larin Federal Research and Clinical Centre of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435, Russia
Tatyana V Grigoryeva Kazan Federal University, 18 Kremlyovskaya St., Kazan, Russian Federation, 420008
Mariya N Sinyagina Kazan Federal University, 18 Kremlyovskaya St., Kazan, Russian Federation, 420008
Sergei Y Malanin Kazan Federal University, 18 Kremlyovskaya St., Kazan, Russian Federation, 420008
Petr L Shcherbakov Moscow Clinical Scientific Center, 86 Shosse Entuziastov St., Moscow, Russian Federation, 111123
Anastasiya Y Kharitonova Clinical and Research Institute of Emergency Children's Surgery and Trauma, 22 Bolshaya Polyanka St., Moscow, Russian Federation, 119180
Igor L Khalif State Scientific Center of Coloproctology, 2 Salam Adil St., Moscow, Russian Federation, 123423
Marina V Shapina State Scientific Center of Coloproctology, 2 Salam Adil St., Moscow, Russian Federation, 123423
Igor V Maev Moscow State University of Medicine and Dentistry, Build. 6, 20 Delegatskaya St., Moscow, Russian Federation, 127473
Dmitriy N Andreev Moscow State University of Medicine and Dentistry, Build. 6, 20 Delegatskaya St., Moscow, Russian Federation, 127473
Elena A Belousova Moscow Regional Research and Clinical Institute, 61/2 Shchepkina str, Moscow, Russian Federation, 129110
Yulia M Buzunova Moscow Regional Research and Clinical Institute, 61/2 Shchepkina str, Moscow, Russian Federation, 129110
Dmitry G Alexeev Federal Research and Clinical Centre of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435, Russia.,Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow Region, Russian Federation, 141700
Vadim M Govorun Federal Research and Clinical Centre of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435, Russia.,Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow Region, Russian Federation, 141700.,M.M. Shemyakin - Yu.A. Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 16/10 Miklukho-Maklaya St., Moscow, Russian Federation, 117997

Şener DD, Santoni D, Felici G, Oğul H. A Content-Based Retrieval Framework for Whole Metagenome Sequencing Samples. J Integr Bioinform 2018;15:/j/jib.ahead-of-print/jib-2017-0067/jib-2017-0067.xml. [PMID: 30367805 PMCID: PMC6348744 DOI: 10.1515/jib-2017-0067] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/31/2017] [Accepted: 04/11/2018] [Indexed: 11/15/2022] Open

Wang Z, Lou H, Wang Y, Shamir R, Jiang R, Chen T. GePMI: A statistical model for personal intestinal microbiome identification. NPJ Biofilms Microbiomes 2018;4:20. [PMID: 30210803 PMCID: PMC6123480 DOI: 10.1038/s41522-018-0065-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/26/2018] [Revised: 07/19/2018] [Accepted: 08/02/2018] [Indexed: 02/07/2023] Open

Hirai M, Nishi S, Tsuda M, Sunamura M, Takaki Y, Nunoura T. Library Construction from Subnanogram DNA for Pelagic Sea Water and Deep-Sea Sediments. Microbes Environ 2017;32:336-343. [PMID: 29187708 PMCID: PMC5745018 DOI: 10.1264/jsme2.me17132] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 11/12/2022] Open

Affiliation(s)

Miho Hirai Research and Development (R&D) Center for Marine Biosciences, Japan Agency for Marine-Earth Science and Technology (JAMSTEC)
Shinro Nishi Research and Development (R&D) Center for Marine Biosciences, Japan Agency for Marine-Earth Science and Technology (JAMSTEC).,Ecosystem Observation and Evaluation Methodology Research Unit, Project Team for Development of New-generation Research Protocol for Submarine Resources, Japan Agency for Marine-Earth Science and Technology (JAMSTEC)
Miwako Tsuda Ecosystem Observation and Evaluation Methodology Research Unit, Project Team for Development of New-generation Research Protocol for Submarine Resources, Japan Agency for Marine-Earth Science and Technology (JAMSTEC)
Michinari Sunamura Ecosystem Observation and Evaluation Methodology Research Unit, Project Team for Development of New-generation Research Protocol for Submarine Resources, Japan Agency for Marine-Earth Science and Technology (JAMSTEC).,Department of Earth and Planetary Science, The University of Tokyo
Yoshihiro Takaki Research and Development (R&D) Center for Marine Biosciences, Japan Agency for Marine-Earth Science and Technology (JAMSTEC).,Ecosystem Observation and Evaluation Methodology Research Unit, Project Team for Development of New-generation Research Protocol for Submarine Resources, Japan Agency for Marine-Earth Science and Technology (JAMSTEC).,Department of Subsurface Geobiological Analysis and Research, Japan Agency for Marine-Earth Science and Technology (JAMSTEC)
Takuro Nunoura Research and Development (R&D) Center for Marine Biosciences, Japan Agency for Marine-Earth Science and Technology (JAMSTEC).,Ecosystem Observation and Evaluation Methodology Research Unit, Project Team for Development of New-generation Research Protocol for Submarine Resources, Japan Agency for Marine-Earth Science and Technology (JAMSTEC)

Dubinkina VB, Tyakht AV, Odintsova VY, Yarygin KS, Kovarsky BA, Pavlenko AV, Ischenko DS, Popenko AS, Alexeev DG, Taraskina AY, Nasyrova RF, Krupitsky EM, Shalikiani NV, Bakulin IG, Shcherbakov PL, Skorodumova LO, Larin AK, Kostryukova ES, Abdulkhakov RA, Abdulkhakov SR, Malanin SY, Ismagilova RK, Grigoryeva TV, Ilina EN, Govorun VM. Links of gut microbiota composition with alcohol dependence syndrome and alcoholic liver disease. Microbiome 2017;5:141. [PMID: 29041989 PMCID: PMC5645934 DOI: 10.1186/s40168-017-0359-2] [Citation(s) in RCA: 283] [Impact Index Per Article: 40.4] [Received: 01/19/2017] [Accepted: 10/02/2017] [Indexed: 05/21/2023]

Abstract

BACKGROUND

Alcohol abuse has deleterious effects on human health by disrupting the functions of many organs and systems. Gut microbiota has been implicated in the pathogenesis of alcohol-related liver diseases, with its composition manifesting pronounced dysbiosis in patients suffering from alcoholic dependence. Due to its inherent plasticity, gut microbiota is an important target for prevention and treatment of these diseases. Identification of the impact of alcohol abuse with associated psychiatric symptoms on the gut community structure is confounded by the liver dysfunction. In order to differentiate the effects of these two factors, we conducted a comparative "shotgun" metagenomic survey of 99 patients with the alcohol dependence syndrome represented by two cohorts, with and without liver cirrhosis. The taxonomic and functional composition of the gut microbiota was subjected to a multifactor analysis including comparison with the external control group.

RESULTS

Alcoholic dependence and liver cirrhosis were associated with profound shifts in gut community structures and metabolic potential across the patients. The specific effects on species-level community composition were remarkably different between cohorts with and without liver cirrhosis. In both cases, the commensal microbiota was found to be depleted. Alcoholic dependence was inversely associated with the levels of butyrate-producing species from the Clostridiales order, while cirrhosis was inversely associated with multiple members of the Bacteroidales order. The opportunist pathogens linked to alcoholic dependence included pro-inflammatory Enterobacteriaceae, while the hallmarks of cirrhosis included an increase of oral microbes in the gut and more frequent occurrence of abnormal community structures. Interestingly, each of the two factors was associated with pronounced enrichment in many Bifidobacterium and Lactobacillus species, but the exact set of species differed between alcoholic dependence and liver cirrhosis. At the level of functional potential, the patients showed different patterns of increase in functions related to alcohol metabolism and virulence factors, as well as pathways related to inflammation.

CONCLUSIONS

Multiple shifts in the community structure and metabolic potential suggest a strong negative influence of alcohol dependence and associated liver dysfunction on gut microbiota. The identified differences in patterns of impact between these two factors are important for planning personalized treatment and prevention of these pathologies via microbiota modulation. In particular, the expansion of Bifidobacterium and Lactobacillus suggests that probiotic interventions for patients with alcohol-related disorders using representatives of the same taxa should be considered with caution. Taxonomic and functional analysis shows an increased propensity of the gut microbiota to synthesize toxic acetaldehyde, suggesting a higher risk of colorectal cancer and other pathologies in alcoholics.

Affiliation(s)

Veronika B. Dubinkina Moscow Institute of Physics and Technology, Institutskiy per. 9, Dolgoprudny, Moscow Region, 141700 Russia.,Federal Research and Clinical Center of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435 Russia.,Department of Bioengineering, University of Illinois at Urbana-Champaign, 1304 W. Springfield Avenue Urbana, Champaign, IL 61801 USA.,Carl R. Woese Institute for Genomic Biology, 1206 West Gregory Drive, Urbana, IL 61801 USA
Alexander V. Tyakht Federal Research and Clinical Center of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435 Russia.,ITMO University, Kronverkskiy pr. 49, Saint-Petersburg, 197101 Russia
Vera Y. Odintsova Federal Research and Clinical Center of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435 Russia
Konstantin S. Yarygin Moscow Institute of Physics and Technology, Institutskiy per. 9, Dolgoprudny, Moscow Region, 141700 Russia.,Federal Research and Clinical Center of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435 Russia
Boris A. Kovarsky Federal Research and Clinical Center of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435 Russia
Alexander V. Pavlenko Moscow Institute of Physics and Technology, Institutskiy per. 9, Dolgoprudny, Moscow Region, 141700 Russia.,Federal Research and Clinical Center of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435 Russia
Dmitry S. Ischenko Moscow Institute of Physics and Technology, Institutskiy per. 9, Dolgoprudny, Moscow Region, 141700 Russia.,Federal Research and Clinical Center of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435 Russia
Anna S. Popenko Federal Research and Clinical Center of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435 Russia
Dmitry G. Alexeev Moscow Institute of Physics and Technology, Institutskiy per. 9, Dolgoprudny, Moscow Region, 141700 Russia.,Federal Research and Clinical Center of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435 Russia
Anastasiya Y. Taraskina Saint-Petersburg Bekhterev Psychoneurological Research Institute, Bekhtereva 3, Saint-Petersburg, 192019 Russia
Regina F. Nasyrova Saint-Petersburg Bekhterev Psychoneurological Research Institute, Bekhtereva 3, Saint-Petersburg, 192019 Russia
Evgeny M. Krupitsky Saint-Petersburg Bekhterev Psychoneurological Research Institute, Bekhtereva 3, Saint-Petersburg, 192019 Russia
Nino V. Shalikiani Moscow Clinical Scientific Center, Shosse Entuziastov 86, Moscow, 111123 Russia
Igor G. Bakulin Moscow Clinical Scientific Center, Shosse Entuziastov 86, Moscow, 111123 Russia
Petr L. Shcherbakov Moscow Clinical Scientific Center, Shosse Entuziastov 86, Moscow, 111123 Russia
Lyubov O. Skorodumova Federal Research and Clinical Center of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435 Russia
Andrei K. Larin Federal Research and Clinical Center of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435 Russia
Elena S. Kostryukova Moscow Institute of Physics and Technology, Institutskiy per. 9, Dolgoprudny, Moscow Region, 141700 Russia.,Federal Research and Clinical Center of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435 Russia
Rustam A. Abdulkhakov Kazan State Medical University, Butlerova 49, Kazan, 420012 Russia
Sayar R. Abdulkhakov Kazan State Medical University, Butlerova 49, Kazan, 420012 Russia.,Kazan Federal University, Kremlyovskaya 18, Kazan, 420008 Russia
Sergey Y. Malanin Kazan Federal University, Kremlyovskaya 18, Kazan, 420008 Russia
Ruzilya K. Ismagilova Kazan Federal University, Kremlyovskaya 18, Kazan, 420008 Russia
Tatiana V. Grigoryeva Kazan Federal University, Kremlyovskaya 18, Kazan, 420008 Russia
Elena N. Ilina Federal Research and Clinical Center of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435 Russia
Vadim M. Govorun Moscow Institute of Physics and Technology, Institutskiy per. 9, Dolgoprudny, Moscow Region, 141700 Russia.,Federal Research and Clinical Center of Physical-Chemical Medicine, Malaya Pirogovskaya 1a, Moscow, 119435 Russia

Beisser D, Graupner N, Grossmann L, Timm H, Boenigk J, Rahmann S. TaxMapper: an analysis tool, reference database and workflow for metatranscriptome analysis of eukaryotic microorganisms. BMC Genomics 2017;18:787. [PMID: 29037173 PMCID: PMC5644092 DOI: 10.1186/s12864-017-4168-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/13/2017] [Accepted: 10/05/2017] [Indexed: 12/17/2022] Open

Abstract

Background

High-throughput sequencing (HTS) technologies are increasingly applied to analyse complex microbial ecosystems by mRNA sequencing of whole communities, also known as metatranscriptome sequencing. This approach is at the moment largely limited to prokaryotic communities and communities of few eukaryotic species with sequenced genomes. For eukaryotes the analysis is hindered mainly by a low and fragmented coverage of the reference databases to infer the community composition, but also by lack of automated workflows for the task.

Results

From the databases of the National Center for Biotechnology Information and Marine Microbial Eukaryote Transcriptome Sequencing Project, 142 references were selected in such a way that the taxa represent the main lineages within each of the seven supergroups of eukaryotes and possess predominantly complete transcriptomes or genomes. From these references, we created an annotated microeukaryotic reference database. We developed a tool called TaxMapper for reliable mapping of sequencing reads against this database and filtering of unreliable assignments. For filtering, a classifier was trained and tested on each of the following: sequences of taxa in the database, sequences of taxa related to those in the database, and random sequences. Additionally, TaxMapper is part of a metatranscriptomic Snakemake workflow developed to perform quality assessment, functional and taxonomic annotation and (multivariate) statistical analysis including environmental data. The workflow is provided and described in detail to empower researchers to apply it for metatranscriptome analysis of any environmental sample.

Conclusions

TaxMapper shows superior performance compared to standard approaches, resulting in a higher number of true positive taxonomic assignments. Both the TaxMapper tool and the workflow are available as open-source code at Bitbucket under the MIT license: https://bitbucket.org/dbeisser/taxmapper and as a Bioconda package: https://bioconda.github.io/recipes/taxmapper/README.html.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-017-4168-6) contains supplementary material, which is available to authorized users.

Philips A, Stolarek I, Kuczkowska B, Juras A, Handschuh L, Piontek J, Kozlowski P, Figlerowicz M. Comprehensive analysis of microorganisms accompanying human archaeological remains. Gigascience 2017;6:1-13. [PMID: 28609785 PMCID: PMC5965364 DOI: 10.1093/gigascience/gix044] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 03/14/2017] [Revised: 05/09/2017] [Accepted: 06/11/2017] [Indexed: 02/01/2023] Open

Abstract

Metagenome analysis has become a common source of information about microbial communities that occupy a wide range of niches, including archaeological specimens. It has been shown that the vast majority of DNA extracted from ancient samples comes from bacteria (presumably modern contaminants). However, characterization of microbial DNA accompanying human remains has never been done systematically for a wide range of different samples. We used metagenomic approaches to perform comparative analyses of microorganism communities present in 161 archaeological human remains. DNA samples were isolated from the teeth of human skeletons dated from 100 AD to 1200 AD. The skeletons were collected from 7 archaeological sites in Central Europe and stored under different conditions. The majority of identified microbes were ubiquitous environmental bacteria that most likely contaminated the host remains not long ago. We observed that the composition of microbial communities was sample-specific and not correlated with the sample's temporal or geographical origin. Additionally, traces of bacteria and archaea typical for human oral/gut flora, as well as potential pathogens, were identified in two-thirds of the samples. The genetic material of human-related species, in contrast to the environmental species that accounted for the majority of identified bacteria, displayed DNA damage patterns comparable with endogenous human ancient DNA, which suggested that these microbes might have accompanied the individual before death. Our study showed that the microbiome observed in an individual sample is not reliant on the method or duration of sample storage. Moreover, shallow sequencing of DNA extracted from ancient specimens and subsequent bioinformatics analysis allowed both the identification of ancient microbial species, including potential pathogens, and their differentiation from contemporary species that colonized human remains more recently.

Forsdyke DR. Base Composition, Speciation, and Why the Mitochondrial Barcode Precisely Classifies. Biol Theory 2017. [DOI: 10.1007/s13752-017-0267-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 01/08/2023]

Liu S, Zheng J, Migeon P, Ren J, Hu Y, He C, Liu H, Fu J, White FF, Toomajian C, Wang G. Unbiased K-mer Analysis Reveals Changes in Copy Number of Highly Repetitive Sequences During Maize Domestication and Improvement. Sci Rep 2017;7:42444. [PMID: 28186206 PMCID: PMC5301235 DOI: 10.1038/srep42444] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 10/24/2016] [Accepted: 01/10/2017] [Indexed: 12/15/2022] Open

Rosani U, Gerdol M. A bioinformatics approach reveals seven nearly-complete RNA-virus genomes in bivalve RNA-seq data. Virus Res 2016;239:33-42. [PMID: 27769778 DOI: 10.1016/j.virusres.2016.10.009] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 08/15/2016] [Revised: 10/17/2016] [Accepted: 10/17/2016] [Indexed: 01/17/2023]