1
Lim B, Xu J, Wierzbicki IH, Gonzalez CG, Chen Z, Gonzalez DJ, Gao X, Goodman AL. A human gut bacterium antagonizes neighboring bacteria by altering their protein-folding ability. Cell Host Microbe 2025; 33:200-217.e24. [PMID: 39909037 DOI: 10.1016/j.chom.2025.01.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2024] [Revised: 01/06/2025] [Accepted: 01/14/2025] [Indexed: 02/07/2025]
Abstract
Antagonistic interactions play a key role in determining microbial community dynamics. Here, we report that one of the most widespread contact-dependent effectors in human gut microbiomes, Bte1, directly targets the PpiD-YfgM periplasmic chaperone complex in related microbes. Structural, biochemical, and genetic characterization of this interaction reveals that Bte1 reverses the activity of the chaperone complex, promoting substrate aggregation and toxicity. Using Bacteroides, we show that Bte1 is active in the mammalian gut, conferring a fitness advantage to expressing strains. Recipient cells targeted by Bte1 exhibit sensitivity to membrane-compromising conditions, and human gut microbes can use this effector to exploit pathogen-induced inflammation in the gut. Further, Bte1 allelic variation in gut metagenomes provides evidence for an arms race between Bte1-encoding and immunity-encoding strains in humans. Together, these studies demonstrate that human gut microbes alter the protein-folding capacity of neighboring cells and suggest strategies for manipulating community dynamics.
Affiliation(s)
- Bentley Lim
  - Department of Microbial Pathogenesis and Microbial Sciences Institute, Yale University School of Medicine, New Haven, CT 06536, USA
- Jinghua Xu
  - State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, China
- Igor H Wierzbicki
  - Department of Pharmacology and the Skaggs School of Pharmacy and Pharmaceutical Sciences, Center of Microbiome Innovation, University of California, San Diego, La Jolla, San Diego, CA 92093, USA
- Carlos G Gonzalez
  - Department of Pharmacology and the Skaggs School of Pharmacy and Pharmaceutical Sciences, Center of Microbiome Innovation, University of California, San Diego, La Jolla, San Diego, CA 92093, USA
- Zhe Chen
  - State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, China
- David J Gonzalez
  - Department of Pharmacology and the Skaggs School of Pharmacy and Pharmaceutical Sciences, Center of Microbiome Innovation, University of California, San Diego, La Jolla, San Diego, CA 92093, USA
- Xiang Gao
  - State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, China
- Andrew L Goodman
  - Department of Microbial Pathogenesis and Microbial Sciences Institute, Yale University School of Medicine, New Haven, CT 06536, USA
2
Abdill RJ, Graham SP, Rubinetti V, Ahmadian M, Hicks P, Chetty A, McDonald D, Ferretti P, Gibbons E, Rossi M, Krishnan A, Albert FW, Greene CS, Davis S, Blekhman R. Integration of 168,000 samples reveals global patterns of the human gut microbiome. Cell 2025:S0092-8674(24)01430-2. [PMID: 39848248 DOI: 10.1016/j.cell.2024.12.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 09/09/2024] [Accepted: 12/13/2024] [Indexed: 01/25/2025]
Abstract
The factors shaping human microbiome variation are a major focus of biomedical research. While other fields have used large sequencing compendia to extract insights requiring otherwise impractical sample sizes, the microbiome field has lacked a comparably sized resource for the 16S rRNA gene amplicon sequencing commonly used to quantify microbiome composition. To address this gap, we processed 168,464 publicly available human gut microbiome samples with a uniform pipeline. We use this compendium to evaluate geographic and technical effects on microbiome variation. We find that regions such as Central and Southern Asia differ significantly from the more thoroughly characterized microbiomes of Europe and Northern America and that composition alone can be used to predict a sample's region of origin. We also find strong associations between microbiome variation and technical factors such as primers and DNA extraction. We anticipate this growing work, the Human Microbiome Compendium, will enable advanced applied and methodological research.
Affiliation(s)
- Richard J Abdill
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Samantha P Graham
  - Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN, USA
- Vincent Rubinetti
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA; Center for Health Artificial Intelligence (CHAI), University of Colorado School of Medicine, Aurora, CO, USA
- Mansooreh Ahmadian
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, School of Public Health, Aurora, CO, USA
- Parker Hicks
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Ashwin Chetty
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Daniel McDonald
  - Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA, USA
- Pamela Ferretti
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Elizabeth Gibbons
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Marco Rossi
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Arjun Krishnan
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA; Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, School of Public Health, Aurora, CO, USA
- Frank W Albert
  - Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN, USA
- Casey S Greene
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA; Center for Health Artificial Intelligence (CHAI), University of Colorado School of Medicine, Aurora, CO, USA
- Sean Davis
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA; Center for Health Artificial Intelligence (CHAI), University of Colorado School of Medicine, Aurora, CO, USA
- Ran Blekhman
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
3
Jiang Y, Wang Y, Che L, Yang S, Zhang X, Lin Y, Shi Y, Zou N, Wang S, Zhang Y, Zhao Z, Li S. GutMetaNet: an integrated database for exploring horizontal gene transfer and functional redundancy in the human gut microbiome. Nucleic Acids Res 2025; 53:D772-D782. [PMID: 39526401 PMCID: PMC11701528 DOI: 10.1093/nar/gkae1007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Revised: 10/09/2024] [Accepted: 10/16/2024] [Indexed: 11/16/2024] Open
Abstract
Metagenomic studies have revealed the critical roles of complex microbial interactions, including horizontal gene transfer (HGT) and functional redundancy (FR), in shaping the gut microbiome's functional capacity and resilience. However, the lack of comprehensive data integration and systematic analysis approaches has limited the in-depth exploration of HGT and FR dynamics across large-scale gut microbiome datasets. To address this gap, we present GutMetaNet (https://gutmetanet.deepomics.org/), a first-of-its-kind database integrating extensive human gut microbiome data with comprehensive HGT and FR analyses. GutMetaNet contains 21 567 human gut metagenome samples with whole-genome shotgun sequencing data related to various health conditions. Through systematic analysis, we have characterized the taxonomic profiles and FR profiles, and identified 14 636 HGT events using a shared reference genome database across the collected samples. These HGT events have been curated into 8049 clusters, which are annotated with categorized mobile genetic elements, including transposons, prophages, integrative mobilizable elements, genomic islands, integrative conjugative elements and group II introns. Additionally, GutMetaNet incorporates automated analyses and visualizations for the HGT events and FR, serving as an efficient platform for in-depth exploration of the interactions among gut microbiome taxa and their implications for human health.
Affiliation(s)
- Yiqi Jiang
  - City University of Hong Kong Shenzhen Research Institute, 8 Yue Xing Yi Road, Nanshan District, Shenzhen, 518057, China
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong
- Yanfei Wang
  - City University of Hong Kong Shenzhen Research Institute, 8 Yue Xing Yi Road, Nanshan District, Shenzhen, 518057, China
- Lijia Che
  - City University of Hong Kong Shenzhen Research Institute, 8 Yue Xing Yi Road, Nanshan District, Shenzhen, 518057, China
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong
- Shuo Yang
  - City University of Hong Kong Shenzhen Research Institute, 8 Yue Xing Yi Road, Nanshan District, Shenzhen, 518057, China
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong
- Xianglilan Zhang
  - State Key Laboratory of Pathogen and Biosafety, 20 East Street, Fengtai District, Beijing, 100071, China
- Yu Lin
  - State Key Laboratory of Pathogen and Biosafety, 20 East Street, Fengtai District, Beijing, 100071, China
  - Beijing University of Chemical Technology, 15 Beisanhuan East Road, Chaoyang District, Beijing, 100029, China
- Yucheng Shi
  - City University of Hong Kong Shenzhen Research Institute, 8 Yue Xing Yi Road, Nanshan District, Shenzhen, 518057, China
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong
- Nanhe Zou
  - City University of Hong Kong Shenzhen Research Institute, 8 Yue Xing Yi Road, Nanshan District, Shenzhen, 518057, China
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong
- Shuai Wang
  - City University of Hong Kong Shenzhen Research Institute, 8 Yue Xing Yi Road, Nanshan District, Shenzhen, 518057, China
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong
- Yuanzheng Zhang
  - City University of Hong Kong Shenzhen Research Institute, 8 Yue Xing Yi Road, Nanshan District, Shenzhen, 518057, China
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong
- Zicheng Zhao
  - OmicLab Limited, Unit 917, 19 Science Park West Avenue, New Territories, Hong Kong
- Shuai Cheng Li
  - City University of Hong Kong Shenzhen Research Institute, 8 Yue Xing Yi Road, Nanshan District, Shenzhen, 518057, China
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong
4
Aplakidou E, Vergoulidis N, Chasapi M, Venetsianou NK, Kokoli M, Panagiotopoulou E, Iliopoulos I, Karatzas E, Pafilis E, Georgakopoulos-Soares I, Kyrpides NC, Pavlopoulos GA, Baltoumas FA. Visualizing metagenomic and metatranscriptomic data: A comprehensive review. Comput Struct Biotechnol J 2024; 23:2011-2033. [PMID: 38765606 PMCID: PMC11101950 DOI: 10.1016/j.csbj.2024.04.060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 04/25/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024] Open
Abstract
The fields of metagenomics and metatranscriptomics involve the examination of complete nucleotide sequences, gene identification, and analysis of potential biological functions within diverse organisms or environmental samples. Despite the vast opportunities for discovery in metagenomics, the sheer volume and complexity of sequence data often present challenges in processing, analysis, and visualization. This article highlights the critical role of advanced visualization tools in enabling effective exploration, querying, and analysis of these complex datasets. Emphasizing the importance of accessibility, the article categorizes various visualizers based on their intended applications and highlights their utility in empowering bioinformaticians and non-bioinformaticians to interpret and derive insights from meta-omics data effectively.
Affiliation(s)
- Eleni Aplakidou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nikolaos Vergoulidis
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Chasapi
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nefeli K. Venetsianou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Kokoli
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Eleni Panagiotopoulou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Ioannis Iliopoulos
  - Department of Basic Sciences, School of Medicine, University of Crete, 71003 Heraklion, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikos C. Kyrpides
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Center of New Biotechnologies & Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Greece
  - Hellenic Army Academy, 16673 Vari, Greece
- Fotis A. Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
5
Flamholz ZN, Li C, Kelly L. Improving viral annotation with artificial intelligence. mBio 2024; 15:e0320623. [PMID: 39230289 PMCID: PMC11481560 DOI: 10.1128/mbio.03206-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024] Open
Abstract
Viruses of bacteria, "phages," are fundamental, poorly understood components of microbial community structure and function. Additionally, their dependence on hosts for replication positions phages as unique sensors of ecosystem features and environmental pressures. High-throughput sequencing approaches have begun to give us access to the diversity and range of phage populations in complex microbial community samples, and metagenomics is currently the primary tool with which we study phage populations. The study of phages by metagenomic sequencing, however, is fundamentally limited by viral diversity, which results in the vast majority of viral genomes and metagenome-annotated genomes lacking annotation. To harness bacteriophages for applications in human and environmental health and disease, we need new methods to organize and annotate viral sequence diversity. We recently demonstrated that methods that leverage self-supervised representation learning can supplement statistical sequence representations for remote viral protein homology detection in the ocean virome and propose that consideration of the functional content of viral sequences allows for the identification of similarity in otherwise sequence-diverse viruses and viral-like elements for biological discovery. In this review, we describe the potential and pitfalls of large language models for viral annotation. We describe the need for new approaches to annotate viral sequences in metagenomes, the fundamentals of what protein language models are and how one can use them for sequence annotation, the strengths and weaknesses of these models, and future directions toward developing better models for viral annotation more broadly.
Affiliation(s)
- Zachary N. Flamholz
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
- Charlotte Li
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
- Libusha Kelly
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
  - Department of Microbiology and Immunology, Albert Einstein College of Medicine, Bronx, New York, USA
6
Rocha U, Kasmanas JC, Toscan R, Sanches DS, Magnusdottir S, Saraiva JP. Simulation of 69 microbial communities indicates sequencing depth and false positives are major drivers of bias in prokaryotic metagenome-assembled genome recovery. PLoS Comput Biol 2024; 20:e1012530. [PMID: 39436938 PMCID: PMC11530072 DOI: 10.1371/journal.pcbi.1012530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 11/01/2024] [Accepted: 10/01/2024] [Indexed: 10/25/2024] Open
Abstract
We hypothesize that sample species abundance, sequencing depth, and taxonomic relatedness influence the recovery of metagenome-assembled genomes (MAGs). To test this hypothesis, we assessed MAG recovery in three in silico microbial communities composed of 42 species with the same richness but different sample species abundance, sequencing depth, and taxonomic distribution profiles using three different pipelines for MAG recovery. The pipeline developed by Parks and colleagues (8K) generated the highest number of MAGs and the lowest number of true positives per community profile. The pipeline by Karst and colleagues (DT) showed the most accurate results (~92%), outperforming the 8K and Multi-Metagenome pipeline (MM) developed by Albertsen and collaborators. Sequencing depth influenced the accurate recovery of genomes when using the 8K and MM, even with contrasting patterns: the MM pipeline recovered more MAGs found in the original communities when employing sequencing depths up to 60 million reads, while the 8K recovered more true positives in communities sequenced above 60 million reads. DT showed the best species recovery from the same genus, even though closely related species have a low recovery rate in all pipelines. Our results highlight that more bins do not translate to the actual community composition and that sequencing depth plays a role in MAG recovery and increased community resolution. Even low MAG recovery error rates can significantly impact biological inferences. Our data indicate that the scientific community should curate their findings from MAG recovery, especially when asserting novel species or metabolic traits.
Affiliation(s)
- Ulisses Rocha
  - Department of Applied Microbial Ecology, Helmholtz Center for Environmental Research-UFZ, Leipzig, Germany
- Jonas Coelho Kasmanas
  - Department of Applied Microbial Ecology, Helmholtz Center for Environmental Research-UFZ, Leipzig, Germany
- Rodolfo Toscan
  - Department of Applied Microbial Ecology, Helmholtz Center for Environmental Research-UFZ, Leipzig, Germany
- Danilo S. Sanches
  - Department of Computer Science, Federal University of Technology—Paraná, UTFPR, Cornélio Procópio, Brazil
- Stefania Magnusdottir
  - Department of Applied Microbial Ecology, Helmholtz Center for Environmental Research-UFZ, Leipzig, Germany
- Joao Pedro Saraiva
  - Department of Applied Microbial Ecology, Helmholtz Center for Environmental Research-UFZ, Leipzig, Germany
7
Ren K, Zhou F, Zhang F, Yin M, Zhu Y, Wang S, Chen Y, Huang T, Wu Z, He J, Zhang A, Guo C, Huang Z. Discovery and structural mechanism of DNA endonucleases guided by RAGATH-18-derived RNAs. Cell Res 2024; 34:370-385. [PMID: 38575718 PMCID: PMC11061315 DOI: 10.1038/s41422-024-00952-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 03/09/2024] [Indexed: 04/06/2024] Open
Abstract
CRISPR-Cas systems and IS200/IS605 transposon-associated TnpBs have been utilized for the development of genome editing technologies. Using bioinformatics analysis and biochemical experiments, here we present a new family of RNA-guided DNA endonucleases. Our bioinformatics analysis initially identifies the stable co-occurrence of conserved RAGATH-18-derived RNAs (reRNAs) and their upstream IS607 TnpBs with an average length of 390 amino acids. IS607 TnpBs form programmable DNases through interaction with reRNAs. We discover the robust dsDNA interference activity of IS607 TnpB systems in bacteria and human cells. Further characterization of the Firmicutes bacteria IS607 TnpB system (ISFba1 TnpB) reveals that its dsDNA cleavage activity is remarkably sensitive to single mismatches between the guide and target sequences in human cells. Our findings demonstrate that a length of 20 nt in the guide sequence of reRNA achieves the highest DNA cleavage activity for ISFba1 TnpB. A cryo-EM structure of the ISFba1 TnpB effector protein bound by its cognate RAGATH-18 motif-containing reRNA and a dsDNA target reveals the mechanisms underlying reRNA recognition by ISFba1 TnpB, reRNA-guided dsDNA targeting, and the sensitivity of the ISFba1 TnpB system to base mismatches between the guide and target DNA. Collectively, this study identifies the IS607 TnpB family of compact and specific RNA-guided DNases with great potential for application in gene editing.
Affiliation(s)
- Kuan Ren
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Fengxia Zhou
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
  - Westlake Center for Genome Editing, Westlake Laboratory of Life Sciences and Biomedicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China
- Fan Zhang
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Mingyu Yin
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Yuwei Zhu
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Shouyu Wang
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Yan Chen
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Tengjin Huang
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Zixuan Wu
  - Westlake Center for Genome Editing, Westlake Laboratory of Life Sciences and Biomedicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China
- Jiale He
  - Westlake Center for Genome Editing, Westlake Laboratory of Life Sciences and Biomedicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China
- Anqi Zhang
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Changyou Guo
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Zhiwei Huang
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
  - Westlake Center for Genome Editing, Westlake Laboratory of Life Sciences and Biomedicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China
  - New Cornerstone Science Laboratory, Shenzhen, Guangdong, China
8
Kan CM, Tsang HF, Pei XM, Ng SSM, Yim AKY, Yu ACS, Wong SCC. Enhancing Clinical Utility: Utilization of International Standards and Guidelines for Metagenomic Sequencing in Infectious Disease Diagnosis. Int J Mol Sci 2024; 25:3333. [PMID: 38542307 PMCID: PMC10970082 DOI: 10.3390/ijms25063333] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/02/2024] [Revised: 03/11/2024] [Accepted: 03/12/2024] [Indexed: 11/11/2024] Open
Abstract
Metagenomic sequencing has emerged as a transformative tool in infectious disease diagnosis, offering a comprehensive and unbiased approach to pathogen detection. Leveraging international standards and guidelines is essential for ensuring the quality and reliability of metagenomic sequencing in clinical practice. This review explores the implications of international standards and guidelines for the application of metagenomic sequencing in infectious disease diagnosis. By adhering to established standards, such as those outlined by regulatory bodies and expert consensus, healthcare providers can enhance the accuracy and clinical utility of metagenomic sequencing. The integration of international standards and guidelines into metagenomic sequencing workflows can streamline diagnostic processes, improve pathogen identification, and optimize patient care. Strategies in implementing these standards for infectious disease diagnosis using metagenomic sequencing are discussed, highlighting the importance of standardized approaches in advancing precision infectious disease diagnosis initiatives.
Affiliation(s)
- Chau-Ming Kan
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China; (C.-M.K.); (H.F.T.)
- Hin Fung Tsang
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China; (C.-M.K.); (H.F.T.)
- Xiao Meng Pei
  - Department of Applied Biology & Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, China
- Simon Siu Man Ng
  - Department of Surgery, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China
- Allen Chi-Shing Yu
  - Codex Genetics Limited, Shatin, Hong Kong, China; (A.K.-Y.Y.); (A.C.-S.Y.)
- Sze Chuen Cesar Wong
  - Department of Applied Biology & Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, China
9
Vázquez X, Lumbreras-Iglesias P, Rodicio MR, Fernández J, Bernal T, Moreno AF, de Ugarriza PL, Fernández-Verdugo A, Margolles A, Sabater C. Study of the intestinal microbiota composition and the effect of treatment with intensive chemotherapy in patients recovered from acute leukemia. Sci Rep 2024; 14:5585. [PMID: 38454103 PMCID: PMC10920697 DOI: 10.1038/s41598-024-56054-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 03/01/2024] [Indexed: 03/09/2024] Open
Abstract
A dataset comprising metagenomes of outpatients (n = 28) with acute leukemia (AL) and healthy controls (n = 14) was analysed to investigate the associations between gut microbiota composition and metabolic activity and AL. No significant differences in microbial diversity between AL outpatients and healthy controls were found. However, significant differences in the abundance of specific microbial clades between healthy controls and AL outpatients were found. At the taxon level, the relative abundance of Enterobacteriaceae, Prevotellaceae and Rikenellaceae was increased in AL outpatients, while that of Bacteroidaceae, Bifidobacteriaceae and Lachnospiraceae was decreased. Interestingly, the abundances of several taxa including Bacteroides and Faecalibacterium species showed variations based on recovery time from the last cycle of chemotherapy. Functional annotation of metagenome-assembled genomes (MAGs) revealed the presence of functional domains corresponding to therapeutic enzymes including L-asparaginase in a wide range of genera including Prevotella, Ruminococcus, Faecalibacterium, Alistipes and Akkermansia. Metabolic network modelling revealed potential symbiotic relationships between Veillonella parvula and Levyella massiliensis and several species found in the microbiota of AL outpatients. These results may contribute to the development of strategies for the recovery of microbiota composition profiles in the treatment of patients with AL.
Grants
- FIS PI21/01590 Fondo de Investigación Sanitaria, Instituto de Salud Carlos III, Ministerio de Economía y Competitividad, Spain
- GRUPIN IDI/2022/000033 Regional Ministry of Science of Asturias
Affiliation(s)
- Xenia Vázquez
- Dairy Research Institute of Asturias (IPLA), Spanish National Research Council, (CSIC), Villaviciosa, Asturias, Spain
- Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), MicroHealth Group, Oviedo, Spain
| | - Pilar Lumbreras-Iglesias
- Traslational Microbiology Group, Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), Oviedo, Spain
- Department of Clinical Microbiology, Hospital Universitario Central de Asturias (HUCA), Oviedo, Spain
- Department of Hematology Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), Instituto de Oncología del Principado de Asturias (IUOPA), Hospital Universitario Central de Asturias (HUCA), 33011, Oviedo, Spain
| | - M Rosario Rodicio
- Traslational Microbiology Group, Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), Oviedo, Spain
- Department of Functional Biology, Microbiology Area, University of Oviedo, Oviedo, Spain
| | - Javier Fernández
- Traslational Microbiology Group, Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), Oviedo, Spain
- Department of Clinical Microbiology, Hospital Universitario Central de Asturias (HUCA), Oviedo, Spain
- Research & Innovation, Artificial Intelligence and Statistical Department, Pragmatech AI Solutions, Oviedo, Spain
- Centro de Investigación Biomédica en Red-Enfermedades Respiratorias, Madrid, Spain
| | - Teresa Bernal
- Department of Hematology Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), Instituto de Oncología del Principado de Asturias (IUOPA), Hospital Universitario Central de Asturias (HUCA), 33011, Oviedo, Spain
| | - Ainhoa Fernández Moreno
- Department of Hematology Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), Instituto de Oncología del Principado de Asturias (IUOPA), Hospital Universitario Central de Asturias (HUCA), 33011, Oviedo, Spain
| | - Paula López de Ugarriza
- Department of Hematology Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), Instituto de Oncología del Principado de Asturias (IUOPA), Hospital Universitario Central de Asturias (HUCA), 33011, Oviedo, Spain
| | - Ana Fernández-Verdugo
- Traslational Microbiology Group, Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), Oviedo, Spain
- Department of Clinical Microbiology, Hospital Universitario Central de Asturias (HUCA), Oviedo, Spain
| | - Abelardo Margolles
- Dairy Research Institute of Asturias (IPLA), Spanish National Research Council, (CSIC), Villaviciosa, Asturias, Spain
- Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), MicroHealth Group, Oviedo, Spain
| | - Carlos Sabater
- Dairy Research Institute of Asturias (IPLA), Spanish National Research Council, (CSIC), Villaviciosa, Asturias, Spain.
- Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), MicroHealth Group, Oviedo, Spain.
| |
|
10
|
Kumar B, Lorusso E, Fosso B, Pesole G. A comprehensive overview of microbiome data in the light of machine learning applications: categorization, accessibility, and future directions. Front Microbiol 2024; 15:1343572. [PMID: 38419630 PMCID: PMC10900530 DOI: 10.3389/fmicb.2024.1343572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 01/29/2024] [Indexed: 03/02/2024] Open
Abstract
Metagenomics, metabolomics, and metaproteomics have significantly advanced our knowledge of microbial communities by providing culture-independent insights into their composition and functional potential. However, a critical challenge in this field is the lack of standard and comprehensive metadata associated with raw data, hindering the ability to perform robust data stratifications and consider confounding factors. In this comprehensive review, we categorize publicly available microbiome data into five types: shotgun sequencing, amplicon sequencing, metatranscriptomic, metabolomic, and metaproteomic data. We explore the importance of metadata for data reuse and address the challenges in collecting standardized metadata. We also assess the limitations in metadata collection of existing public repositories collecting metagenomic data. This review emphasizes the vital role of metadata in interpreting and comparing datasets and highlights the need for standardized metadata protocols to fully leverage metagenomic data's potential. Furthermore, we explore future directions for implementing Machine Learning (ML) in metadata retrieval, offering promising avenues for a deeper understanding of microbial communities and their ecological roles. Leveraging these tools will enhance our insights into microbial functional capabilities and ecological dynamics in diverse ecosystems. Finally, we emphasize the crucial role of metadata in ML model development.
Affiliation(s)
- Bablu Kumar
- Università degli Studi di Milano, Milan, Italy
- Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
| | - Erika Lorusso
- Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
- National Research Council, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Bari, Italy
| | - Bruno Fosso
- Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
| | - Graziano Pesole
- Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
- National Research Council, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Bari, Italy
| |
|
11
|
Rocha U, Coelho Kasmanas J, Kallies R, Saraiva JP, Toscan RB, Štefanič P, Bicalho MF, Borim Correa F, Baştürk MN, Fousekis E, Viana Barbosa LM, Plewka J, Probst AJ, Baldrian P, Stadler PF. MuDoGeR: Multi-Domain Genome recovery from metagenomes made easy. Mol Ecol Resour 2024; 24:e13904. [PMID: 37994269 DOI: 10.1111/1755-0998.13904] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/12/2022] [Revised: 10/18/2023] [Accepted: 11/13/2023] [Indexed: 11/24/2023]
Abstract
Several computational frameworks and workflows exist that recover genomes of prokaryotes, eukaryotes and viruses from metagenomes. Yet, it is difficult for scientists with little bioinformatics experience to evaluate quality, annotate genes, dereplicate, assign taxonomy and calculate relative abundance and coverage of genomes belonging to different domains. MuDoGeR is a user-friendly tool tailored for those familiar with the Unix command-line environment that makes it easy to recover genomes of prokaryotes, eukaryotes and viruses from metagenomes, either alone or in combination. We tested MuDoGeR using 24 individual-isolated genomes and 574 metagenomes, demonstrating its applicability both to a few samples and to high-throughput studies. While MuDoGeR can recover eukaryotic viral sequences, its characterization is predominantly skewed towards bacterial and archaeal viruses, reflecting the field's current state. However, acting as a dynamic wrapper, MuDoGeR is designed to constantly incorporate updates and integrate new tools, ensuring its ongoing relevance in the rapidly evolving field. MuDoGeR is open-source software available at https://github.com/mdsufz/MuDoGeR. MuDoGeR is also available as a Singularity container.
Affiliation(s)
- Ulisses Rocha
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
| | - Jonas Coelho Kasmanas
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil
| | - René Kallies
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
| | - Joao Pedro Saraiva
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
| | - Rodolfo Brizola Toscan
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
| | - Polonca Štefanič
- Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
| | - Marcos Fleming Bicalho
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
| | - Felipe Borim Correa
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
| | - Merve Nida Baştürk
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
| | - Efthymios Fousekis
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
| | - Luiz Miguel Viana Barbosa
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
| | - Julia Plewka
- Environmental Microbiology and Biotechnology, Department of Chemistry, University of Duisburg-Essen, Essen, Germany
| | - Alexander J Probst
- Environmental Microbiology and Biotechnology, Department of Chemistry, University of Duisburg-Essen, Essen, Germany
| | - Petr Baldrian
- Laboratory of Environmental Microbiology, Institute of Microbiology of the Czech Academy of Sciences, Praha 4, Czech Republic
| | - Peter F Stadler
- Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- The Santa Fe Institute, Santa Fe, New Mexico, USA
| |
|
12
|
Kiran A, Hanachi M, Alsayed N, Fassatoui M, Oduaran OH, Allali I, Maslamoney S, Meintjes A, Zass L, Rocha JD, Kefi R, Benkahla A, Ghedira K, Panji S, Mulder N, Fadlelmola FM, Souiai O. The African Human Microbiome Portal: a public web portal of curated metagenomic metadata. Database (Oxford) 2024; 2024:baad092. [PMID: 38204360 PMCID: PMC10782148 DOI: 10.1093/database/baad092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Revised: 11/03/2023] [Accepted: 12/21/2023] [Indexed: 01/12/2024]
Abstract
There is growing evidence that comprehensive and harmonized metadata are fundamental for effective public data reusability. However, it is often challenging to extract accurate metadata from public repositories. Of particular concern is the metagenomic data related to African individuals, which often omit important information about the particular features of these populations. As part of a collaborative consortium, H3ABioNet, we created a web portal, namely the African Human Microbiome Portal (AHMP), exclusively dedicated to metadata related to African human microbiome samples. Metadata were collected from various public repositories prior to cleaning, curation and harmonization according to a pre-established guideline and using ontology terms. These metadata sets can be accessed at https://microbiome.h3abionet.org/. This web portal is open access and offers an interactive visualization of 14 889 records from 70 bioprojects associated with 72 peer-reviewed research articles. It also offers the ability to download harmonized metadata according to the user's applied filters. The AHMP thereby supports metadata search and retrieval, facilitating access to relevant studies linked to the African human microbiome. Database URL: https://microbiome.h3abionet.org/.
Affiliation(s)
| | - Mariem Hanachi
- Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR16IPT09), Institute Pasteur of Tunis, University Tunis El Manar, Tunis 1002, Tunisia
- Faculty of Science of Bizerte, University of Carthage, Tunis, Tunisia
| | - Nihad Alsayed
- Kush Centre for Genomics and Biomedical Informatics, Biotechnology Perspectives Organization, Khartoum, Sudan
| | - Meriem Fassatoui
- Laboratory of Biomedical Genomics & Oncogenetics, Institut Pasteur de Tunis, University Tunis El Manar, Tunis 1002, Tunisia
| | - Ovokeraye H Oduaran
- The Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
| | - Imane Allali
- Laboratory of Human Pathologies Biology, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
| | - Suresh Maslamoney
- Computational Biology Division, Department of Integrative Biomedical Sciences and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
| | - Ayton Meintjes
- Computational Biology Division, Department of Integrative Biomedical Sciences and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
| | - Lyndon Zass
- Computational Biology Division, Department of Integrative Biomedical Sciences and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
| | - Jorge Da Rocha
- The Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
| | - Rym Kefi
- Laboratory of Biomedical Genomics & Oncogenetics, Institut Pasteur de Tunis, University Tunis El Manar, Tunis 1002, Tunisia
| | - Alia Benkahla
- Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR16IPT09), Institute Pasteur of Tunis, University Tunis El Manar, Tunis 1002, Tunisia
| | - Kais Ghedira
- Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR16IPT09), Institute Pasteur of Tunis, University Tunis El Manar, Tunis 1002, Tunisia
| | - Sumir Panji
- Computational Biology Division, Department of Integrative Biomedical Sciences and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
| | - Nicola Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
| | - Faisal M Fadlelmola
- Kush Centre for Genomics and Biomedical Informatics, Biotechnology Perspectives Organization, Khartoum, Sudan
| | - Oussema Souiai
- Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR16IPT09), Institute Pasteur of Tunis, University Tunis El Manar, Tunis 1002, Tunisia
- Malawi-Liverpool-Wellcome Trust, Blantyre 3, Malawi
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool CH64 7TE, UK
| |
|
13
|
Ma C, Zhang Y, Jiang S, Teng F, Huang S, Zhang J. Cross-cohort single-nucleotide-variant profiling of gut microbiota suggests a novel gut-health assessment approach. mSystems 2023; 8:e0082823. [PMID: 37905808 PMCID: PMC10734426 DOI: 10.1128/msystems.00828-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Accepted: 09/21/2023] [Indexed: 11/02/2023] Open
Abstract
IMPORTANCE Most studies have focused on changes in abundance and have often failed to explain the microbiome variation related to disease conditions. Herein, we argue that microbial genetic changes can precede the ecological changes associated with host physiological changes and, thus, would offer a new information layer from metagenomic data for predictive modeling of diseases. Interestingly, we preliminarily found that a few genetic biomarkers related to SCFA production can cover most chronic diseases involved in the meta-analysis. In the future, it is of both scientific and clinical significance to further explore the dynamic interactions between adaptive evolution and ecology of gut microbiota associated with host health status.
Affiliation(s)
- Chenchen Ma
- Key Laboratory of Food Nutrition and Functional Food of Hainan Province, School of Food Science and Engineering, Hainan University, Haikou, China
- School of Medicine, Southern University of Science and Technology, Shenzhen, China
| | - Yufeng Zhang
- Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China
| | - Shuaiming Jiang
- Key Laboratory of Food Nutrition and Functional Food of Hainan Province, School of Food Science and Engineering, Hainan University, Haikou, China
| | - Fei Teng
- Qingdao Stomatological Hospital Affiliated to Qingdao University, Qingdao, China
| | - Shi Huang
- Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China
| | - Jiachao Zhang
- Key Laboratory of Food Nutrition and Functional Food of Hainan Province, School of Food Science and Engineering, Hainan University, Haikou, China
- One Health Institute, Hainan University, Haikou, Hainan, China
| |
|
14
|
Kallies R, Hu D, Abdulkadir N, Schloter M, Rocha U. Identification of Huge Phages from Wastewater Metagenomes. Viruses 2023; 15:2330. [PMID: 38140571 PMCID: PMC10747093 DOI: 10.3390/v15122330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 11/20/2023] [Accepted: 11/22/2023] [Indexed: 12/24/2023] Open
Abstract
Huge phages have genomes larger than 200 kilobases, which are particularly interesting for their genetic inventory and evolution. We screened 165 wastewater metagenomes for the presence of viral sequences. After identifying over 600 potential huge phage genomes, we reduced the dataset using manual curation by excluding viral contigs that did not contain viral protein-coding genes or consisted of concatemers of several small phage genomes. This dataset showed seven fully annotated huge phage genomes. The phages grouped into distinct phylogenetic clades, likely forming new genera and families. A phylogenomic analysis between our huge phages and phages with smaller genomes, i.e., less than 200 kb, supported the hypothesis that huge phages have undergone convergent evolution. The genomes contained typical phage protein-coding genes, sequential gene cassettes for metabolic pathways, and complete inventories of tRNA genes covering all standard and rare amino acids. Our study showed a pipeline for huge phage analyses that may lead to new enzymes for therapeutic or biotechnological applications.
Affiliation(s)
- René Kallies
- Department for Environmental Microbiology, Helmholtz Centre for Environmental Research, Permoserstr. 15, D-04318 Leipzig, Germany; (D.H.); (N.A.)
| | - Die Hu
- Department for Environmental Microbiology, Helmholtz Centre for Environmental Research, Permoserstr. 15, D-04318 Leipzig, Germany; (D.H.); (N.A.)
| | - Nafi’u Abdulkadir
- Department for Environmental Microbiology, Helmholtz Centre for Environmental Research, Permoserstr. 15, D-04318 Leipzig, Germany; (D.H.); (N.A.)
| | - Michael Schloter
- Department of Environmental Health, Helmholtz Munich, Ingolstaedter Landstr. 1, D-85758 Neuherberg, Germany;
| | - Ulisses Rocha
- Department for Environmental Microbiology, Helmholtz Centre for Environmental Research, Permoserstr. 15, D-04318 Leipzig, Germany; (D.H.); (N.A.)
| |
|
15
|
Zhao K, Farrell K, Mashiku M, Abay D, Tang K, Oberste MS, Burns CC. A search-based geographic metadata curation pipeline to refine sequencing institution information and support public health. Front Public Health 2023; 11:1254976. [PMID: 38035280 PMCID: PMC10683794 DOI: 10.3389/fpubh.2023.1254976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Accepted: 10/19/2023] [Indexed: 12/02/2023] Open
Abstract
Background The National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) has amassed a vast reservoir of genetic data since its inception in 2007. These public data hold immense potential for supporting pathogen surveillance and control. However, the lack of standardized metadata and inconsistent submission practices in SRA may impede the data's utility in public health. Methods To address this issue, we introduce the Search-based Geographic Metadata Curation (SGMC) pipeline. SGMC utilized Python and web scraping to extract geographic data of sequencing institutions from NCBI SRA in the Cloud and its website. It then harnessed ChatGPT to refine the sequencing institution and location assignments. To illustrate the pipeline's utility, we examined the geographic distribution of the sequencing institutions and their countries relevant to polio eradication and categorized them. Results SGMC successfully identified 7,649 sequencing institutions and their global locations from a random selection of 2,321,044 SRA accessions. These institutions were distributed across 97 countries, with strong representation in the United States, the United Kingdom and China. However, there was a lack of data from African, Central Asian, and Central American countries, indicating potential disparities in sequencing capabilities. Comparison with manually curated data for U.S. institutions reveals SGMC's accuracy rates of 94.8% for institutions, 93.1% for countries, and 74.5% for geographic coordinates. Conclusion SGMC may represent a novel approach using a generative AI model to enhance geographic data (country and institution assignments) for large numbers of samples within SRA datasets. This information can be utilized to bolster public health endeavors.
Affiliation(s)
- Kun Zhao
- Division of Viral Diseases, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States
| | - Katie Farrell
- Cherokee Nation Businesses, Contracting Agency to the Division of Viral Diseases, Centers for Disease Control and Prevention, Catoosa, OK, United States
| | - Melchizedek Mashiku
- Cherokee Nation Businesses, Contracting Agency to the Division of Viral Diseases, Centers for Disease Control and Prevention, Catoosa, OK, United States
| | - Dawit Abay
- Cherokee Nation Businesses, Contracting Agency to the Division of Viral Diseases, Centers for Disease Control and Prevention, Catoosa, OK, United States
| | - Kevin Tang
- Division of Scientific Resources, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States
| | - M Steven Oberste
- Division of Viral Diseases, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States
| | - Cara C Burns
- Division of Viral Diseases, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States
| |
|
16
|
Duitama González C, Vicedomini R, Lemane T, Rascovan N, Richard H, Chikhi R. decOM: similarity-based microbial source tracking of ancient oral samples using k-mer-based methods. Microbiome 2023; 11:243. [PMID: 37926832 PMCID: PMC10626679 DOI: 10.1186/s40168-023-01670-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Accepted: 09/13/2023] [Indexed: 11/07/2023]
Abstract
BACKGROUND The analysis of ancient oral metagenomes from archaeological human and animal samples is largely confounded by contaminant DNA sequences from modern and environmental sources. Existing methods for Microbial Source Tracking (MST) estimate the proportions of environmental sources, but do not perform well on ancient metagenomes. We developed a novel method called decOM for Microbial Source Tracking and classification of ancient and modern metagenomic samples using k-mer matrices. RESULTS We analysed a collection of 360 ancient oral, modern oral, sediment/soil and skin metagenomes, using stratified five-fold cross-validation. decOM estimates the contributions of these source environments in ancient oral metagenomic samples with high accuracy, outperforming two state-of-the-art methods for source tracking, FEAST and mSourceTracker. CONCLUSIONS decOM is a high-accuracy microbial source tracking method, suitable for ancient oral metagenomic data sets. The decOM method is generic and could also be adapted for MST of other ancient and modern types of metagenomes. We anticipate that decOM will be a valuable tool for MST of ancient metagenomic studies. Video Abstract.
Affiliation(s)
- Camila Duitama González
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Université Paris Cité, Sorbonne Université, Paris, F-75015, France.
| | - Riccardo Vicedomini
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Université Paris Cité, Sorbonne Université, Paris, F-75015, France
- Université de Rennes, Inria, CNRS, IRISA, Rennes, France
| | - Téo Lemane
- Université de Rennes, Inria, CNRS, IRISA, Rennes, France
| | - Nicolas Rascovan
- Institut Pasteur, Université de Paris Cité, CNRS UMR 2000, Microbial Paleogenomics Unit, Paris, F-75015, France
| | - Hugues Richard
- Bioinformatics unit (MF1), Robert Koch Institute, Nordufer, 20, 13353, Berlin, Germany
| | - Rayan Chikhi
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Université Paris Cité, Sorbonne Université, Paris, F-75015, France
| |
|
17
|
Meng D, Ai S, Spanos M, Shi X, Li G, Cretoiu D, Zhou Q, Xiao J. Exercise and microbiome: From big data to therapy. Comput Struct Biotechnol J 2023; 21:5434-5445. [PMID: 38022690 PMCID: PMC10665598 DOI: 10.1016/j.csbj.2023.10.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 10/16/2023] [Accepted: 10/17/2023] [Indexed: 12/01/2023] Open
Abstract
Exercise is a vital component in maintaining optimal health and serves as a prospective therapeutic intervention for various diseases. The human microbiome, comprised of trillions of microorganisms, plays a crucial role in overall health. Given the advancements in microbiome research, substantial databases have been created to decipher the functionality and mechanisms of the microbiome in health and disease contexts. This review presents an initial overview of microbiomics development and related databases, followed by an in-depth description of the multi-omics technologies for microbiome. It subsequently synthesizes the research pertaining to exercise-induced modifications of the microbiome and diseases that impact the microbiome. Finally, it highlights the potential therapeutic implications of an exercise-modulated microbiome in intestinal disease, obesity and diabetes, cardiovascular disease, and immune/inflammation-related diseases.
Affiliation(s)
- Danni Meng
- Institute of Geriatrics (Shanghai University), Affiliated Nantong Hospital of Shanghai University (The Sixth People’s Hospital of Nantong), School of Medicine, Shanghai University, Nantong 226011, China
- Cardiac Regeneration and Ageing Lab, Institute of Cardiovascular Sciences, Shanghai Engineering Research Center of Organ Repair, School of Life Science, Shanghai University, Shanghai 200444, China
| | - Songwei Ai
- Institute of Geriatrics (Shanghai University), Affiliated Nantong Hospital of Shanghai University (The Sixth People’s Hospital of Nantong), School of Medicine, Shanghai University, Nantong 226011, China
- Cardiac Regeneration and Ageing Lab, Institute of Cardiovascular Sciences, Shanghai Engineering Research Center of Organ Repair, School of Life Science, Shanghai University, Shanghai 200444, China
| | - Michail Spanos
- Cardiovascular Division of the Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - Xiaohui Shi
- Institute of Geriatrics (Shanghai University), Affiliated Nantong Hospital of Shanghai University (The Sixth People’s Hospital of Nantong), School of Medicine, Shanghai University, Nantong 226011, China
- Cardiac Regeneration and Ageing Lab, Institute of Cardiovascular Sciences, Shanghai Engineering Research Center of Organ Repair, School of Life Science, Shanghai University, Shanghai 200444, China
| | - Guoping Li
- Cardiovascular Division of the Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - Dragos Cretoiu
- Department of Medical Genetics, Carol Davila University of Medicine and Pharmacy, Bucharest 020031, Romania
- Materno-Fetal Assistance Excellence Unit, Alessandrescu-Rusescu National Institute for Mother and Child Health, Bucharest 011062, Romania
| | - Qiulian Zhou
- Institute of Geriatrics (Shanghai University), Affiliated Nantong Hospital of Shanghai University (The Sixth People’s Hospital of Nantong), School of Medicine, Shanghai University, Nantong 226011, China
- Cardiac Regeneration and Ageing Lab, Institute of Cardiovascular Sciences, Shanghai Engineering Research Center of Organ Repair, School of Life Science, Shanghai University, Shanghai 200444, China
| | - Junjie Xiao
- Institute of Geriatrics (Shanghai University), Affiliated Nantong Hospital of Shanghai University (The Sixth People’s Hospital of Nantong), School of Medicine, Shanghai University, Nantong 226011, China
- Cardiac Regeneration and Ageing Lab, Institute of Cardiovascular Sciences, Shanghai Engineering Research Center of Organ Repair, School of Life Science, Shanghai University, Shanghai 200444, China
| |
|
18
|
Abdill RJ, Graham SP, Rubinetti V, Albert FW, Greene CS, Davis S, Blekhman R. Integration of 168,000 samples reveals global patterns of the human gut microbiome. bioRxiv 2023:2023.10.11.560955. [PMID: 37873416 PMCID: PMC10592789 DOI: 10.1101/2023.10.11.560955] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/25/2023]
Abstract
Understanding the factors that shape variation in the human microbiome is a major goal of research in biology. While other genomics fields have used large, pre-compiled compendia to extract systematic insights requiring otherwise impractical sample sizes, there has been no comparable resource for the 16S rRNA sequencing data commonly used to quantify microbiome composition. To help close this gap, we have assembled a set of 168,484 publicly available human gut microbiome samples, processed with a single pipeline and combined into the largest unified microbiome dataset to date. We use this resource, which is freely available at microbiomap.org, to shed light on global variation in the human gut microbiome. We find that Firmicutes, particularly Bacilli and Clostridia, are almost universally present in the human gut. At the same time, the relative abundance of the 65 most common microbial genera differ between at least two world regions. We also show that gut microbiomes in undersampled world regions, such as Central and Southern Asia, differ significantly from the more thoroughly characterized microbiomes of Europe and Northern America. Moreover, humans in these overlooked regions likely harbor hundreds of taxa that have not yet been discovered due to this undersampling, highlighting the need for diversity in microbiome studies. We anticipate that this new compendium can serve the community and enable advanced applied and methodological research.
Affiliation(s)
- Richard J. Abdill
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, USA
- Samantha P. Graham
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, USA
- Vincent Rubinetti
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Center for Health Artificial Intelligence (CHAI), University of Colorado School of Medicine, Aurora, CO, USA
- Frank W. Albert
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, USA
- Casey S. Greene
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Center for Health Artificial Intelligence (CHAI), University of Colorado School of Medicine, Aurora, CO, USA
- Sean Davis
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Center for Health Artificial Intelligence (CHAI), University of Colorado School of Medicine, Aurora, CO, USA
- Ran Blekhman
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, USA
19
Avila Santos AP, Kabiru Nata'ala M, Kasmanas JC, Bartholomäus A, Keller-Costa T, Jurburg SD, Tal T, Camarinha-Silva A, Saraiva JP, Ponce de Leon Ferreira de Carvalho AC, Stadler PF, Sipoli Sanches D, Rocha U. The AnimalAssociatedMetagenomeDB reveals a bias towards livestock and developed countries and blind spots in functional-potential studies of animal-associated microbiomes. Anim Microbiome 2023; 5:48. [PMID: 37798675 PMCID: PMC10552293 DOI: 10.1186/s42523-023-00267-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/09/2022] [Accepted: 09/18/2023] [Indexed: 10/07/2023] Open
Abstract
BACKGROUND Metagenomic data can shed light on animal-microbiome relationships and the functional potential of these communities. Over the past years, the generation of metagenomics data has increased exponentially, and so has the availability and reusability of data present in public repositories. However, identifying which datasets and associated metadata are available is not straightforward. We created the Animal-Associated Metagenome Metadata Database (AnimalAssociatedMetagenomeDB - AAMDB) to facilitate the identification and reuse of publicly available non-human, animal-associated metagenomic data, and metadata. Further, we used the AAMDB to (i) annotate common and scientific names of the species; (ii) determine the fraction of vertebrates and invertebrates; (iii) study their biogeography; and (iv) specify whether the animals were wild, pets, livestock or used for medical research. RESULTS We manually selected metagenomes associated with non-human animals from SRA and MG-RAST. Next, we standardized and curated 51 metadata attributes (e.g., host, compartment, geographic coordinates, and country). The AAMDB version 1.0 contains 10,885 metagenomes associated with 165 different species from 65 different countries. From the collected metagenomes, 51.1% were recovered from animals associated with medical research or grown for human consumption (i.e., mice, rats, cattle, pigs, and poultry). Further, we observed an over-representation of animals collected in temperate regions (89.2%) and a lower representation of samples from the polar zones, with only 11 samples in total. The most common genus among invertebrate animals was Trichocerca (rotifers). CONCLUSION Our work may guide host species selection in novel animal-associated metagenome research, especially in biodiversity and conservation studies. 
The data available in our database will allow scientists to perform meta-analyses and test new hypotheses (e.g., host-specificity, strain heterogeneity, and biogeography of animal-associated metagenomes), leveraging existing data. The AAMDB WebApp is a user-friendly interface that is publicly available at https://webapp.ufz.de/aamdb/.
Affiliation(s)
- Anderson Paulo Avila Santos
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany
- Institute of Mathematics and Computer Sciences, University of Sao Paulo, Sao Carlos, Brazil
- Muhammad Kabiru Nata'ala
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany
- Department of Computer Science and Interdisciplinary Centre of Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Saxony, Germany
- Jonas Coelho Kasmanas
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany
- Department of Computer Science and Interdisciplinary Centre of Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Saxony, Germany
- Institute of Mathematics and Computer Sciences, University of Sao Paulo, Sao Carlos, Brazil
- Alexander Bartholomäus
- GFZ German Research Centre for Geosciences, Section 3.7 Geomicrobiology, 14473, Telegrafenberg, Potsdam, Germany
- Tina Keller-Costa
- Institute for Bioengineering and Biosciences (iBB) and Institute for Health and Bioeconomy (i4HB), Instituto Superior Tecnico (IST), Universidade de Lisboa, Lisbon, 1049-001, Portugal
- Stephanie D Jurburg
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany
- German Centre of Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstraße 4, Leipzig, 04103, Germany
- Tamara Tal
- Department of Bioanalytical Ecotoxicology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
- Amélia Camarinha-Silva
- Hohenheim Center for Livestock Microbiome Research (HoLMiR), University of Hohenheim, Stuttgart, Germany
- Institute of Animal Science, University of Hohenheim, Stuttgart, Germany
- João Pedro Saraiva
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany
- Peter F Stadler
- Department of Computer Science and Interdisciplinary Centre of Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Saxony, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße, 04103, Leipzig, Germany
- Institute for Theoretical Chemistry, Universität Wien, Währingerstraße 17, Vienna, A-1090, Austria
- Center for Scalable Data Analytics and Artificial Intelligence Dresden-Leipzig, Leipzig University, Leipzig, Germany
- Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Bogotá, Colombia
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- The Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA
- Ulisses Rocha
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany.
20
Liu Z, Wang Q, Ma A, Feng S, Chung D, Zhao J, Ma Q, Liu B. Inference of disease-associated microbial gene modules based on metagenomic and metatranscriptomic data. Comput Biol Med 2023; 165:107458. [PMID: 37703713 DOI: 10.1016/j.compbiomed.2023.107458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 08/22/2023] [Accepted: 09/04/2023] [Indexed: 09/15/2023]
Abstract
The identification of microbial characteristics associated with diseases is crucial for disease diagnosis and therapy. However, the presence of heterogeneity, high dimensionality, and large amounts of microbial data presents tremendous challenges in discovering key microbial features. In this paper, we present IDAM, a novel computational method for inferring disease-associated gene modules from metagenomic and metatranscriptomic data. This method integrates gene context conservation (uber-operons) and regulatory mechanisms (gene co-expression patterns) within a mathematical graph model to explore gene modules associated with specific diseases. It alleviates reliance on prior meta-data. We applied IDAM to publicly available datasets from inflammatory bowel disease, melanoma, type 1 diabetes mellitus, and irritable bowel syndrome. The results demonstrated the superior performance of IDAM in inferring disease-associated characteristics compared to existing popular tools. Furthermore, we showcased the high reproducibility of the gene modules inferred by IDAM using independent cohorts with inflammatory bowel disease. We believe that IDAM can be a highly advantageous method for exploring disease-associated microbial characteristics. The source code of IDAM is freely available at https://github.com/OSU-BMBL/IDAM, and the web server can be accessed at https://bmblx.bmi.osumc.edu/idam/.
Affiliation(s)
- Zhaoqian Liu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Qi Wang
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Anjun Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Shaohong Feng
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Dongjun Chung
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA; Pelotonia Institute for Immuno-Oncology, The Ohio State University, Columbus, OH, 43210, USA
- Jing Zhao
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Qin Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA; Pelotonia Institute for Immuno-Oncology, The Ohio State University, Columbus, OH, 43210, USA.
- Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China; Shandong National Center for Applied Mathematics, Jinan, Shandong, 250100, China.
21
Wei W, Millward A, Koslicki D. Finding phylogeny-aware and biologically meaningful averages of metagenomic samples: L2UniFrac. Bioinformatics 2023; 39:i57-i65. [PMID: 37387190 DOI: 10.1093/bioinformatics/btad238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION Metagenomic samples have high spatiotemporal variability. Hence, it is useful to summarize and characterize the microbial makeup of a given environment in a way that is biologically reasonable and interpretable. The UniFrac metric has been a robust and widely used metric for measuring the variability between metagenomic samples. We propose that the characterization of metagenomic environments can be improved by finding the average, a.k.a. the barycenter, among the samples with respect to the UniFrac distance. However, it is possible that such a UniFrac-average includes negative entries, making it no longer a valid representation of a metagenomic community. RESULTS To overcome this intrinsic issue, we propose a special version of the UniFrac metric, termed L2UniFrac, which inherits the phylogenetic nature of the traditional UniFrac and with respect to which one can easily compute the average, producing biologically meaningful environment-specific "representative samples." We demonstrate the usefulness of such representative samples as well as the extended usage of L2UniFrac in efficient clustering of metagenomic samples, and provide mathematical characterizations and proofs to the desired properties of L2UniFrac. AVAILABILITY AND IMPLEMENTATION A prototype implementation is provided at https://github.com/KoslickiLab/L2-UniFrac.git. All figures, data, and analysis can be reproduced at https://github.com/KoslickiLab/L2-UniFrac-Paper.
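The averaging idea in this abstract can be illustrated with a small sketch. The Python below is not the authors' implementation (their repository is linked above). It assumes, for illustration only, that the L2 variant weights squared differences in subtree abundance by branch length, and it uses a hypothetical toy tree (`TREE`) and made-up profiles. The helper names `push_up`, `push_down`, and `representative` are likewise chosen here for readability.

```python
import math

# Hypothetical toy tree: node -> (parent, branch length). "root" has no entry.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 0.5),
    "C": ("root", 2.0),
}


def _depth(node):
    """Number of edges between a node and the root."""
    d = 0
    while node in TREE:
        node = TREE[node][0]
        d += 1
    return d


def push_up(profile):
    """Map node abundances to per-edge values sqrt(length) * mass below the edge."""
    mass = {node: profile.get(node, 0.0) for node in TREE}
    # Visit deepest nodes first so a child's total is final before it is added to its parent.
    for node in sorted(TREE, key=_depth, reverse=True):
        parent = TREE[node][0]
        if parent in mass:
            mass[parent] += mass[node]
    return {node: math.sqrt(TREE[node][1]) * mass[node] for node in TREE}


def push_down(edge_values):
    """Invert push_up: recover the abundance placed directly at each node."""
    mass = {n: edge_values[n] / math.sqrt(TREE[n][1]) for n in TREE}
    own = dict(mass)
    for node, (parent, _) in TREE.items():
        if parent in own:
            own[parent] -= mass[node]
    return own


def l2_distance(p, q):
    """Euclidean distance between pushed-up profiles (assumed L2-style UniFrac)."""
    up, uq = push_up(p), push_up(q)
    return math.sqrt(sum((up[n] - uq[n]) ** 2 for n in TREE))


def representative(profiles):
    """Average in pushed-up space, then map back to node abundances."""
    ups = [push_up(p) for p in profiles]
    mean_up = {n: sum(u[n] for u in ups) / len(ups) for n in TREE}
    return push_down(mean_up)
```

With the two toy profiles `{"A": 1.0}` and `{"C": 1.0}`, `representative` places about half of the abundance on `A` and half on `C`, with no negative entries. In this simplified setup the push-up map is linear, so the result coincides with the arithmetic mean of the profiles; the paper should be consulted for the general construction and its proofs.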
Affiliation(s)
- Wei Wei
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, United States
- Andrew Millward
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, United States
- David Koslicki
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, United States
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, United States
- Department of Biology, Pennsylvania State University, University Park, PA 16802, United States
22
Baltoumas FA, Karatzas E, Paez-Espino D, Venetsianou NK, Aplakidou E, Oulas A, Finn RD, Ovchinnikov S, Pafilis E, Kyrpides NC, Pavlopoulos GA. Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters. Front Bioinform 2023; 3:1157956. [PMID: 36959975 PMCID: PMC10029925 DOI: 10.3389/fbinf.2023.1157956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 02/21/2023] [Indexed: 03/06/2023] Open
Abstract
Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.
Affiliation(s)
- Fotis A. Baltoumas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Evangelos Karatzas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- David Paez-Espino
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
- Nefeli K. Venetsianou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Eleni Aplakidou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Anastasis Oulas
- The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Robert D. Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, United States
- Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
- Nikos C. Kyrpides
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
- Georgios A. Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Center of New Biotechnologies and Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Athens, Greece
- Hellenic Army Academy, Vari, Greece
23
Ferrocino I, Rantsiou K, McClure R, Kostic T, de Souza RSC, Lange L, FitzGerald J, Kriaa A, Cotter P, Maguin E, Schelkle B, Schloter M, Berg G, Sessitsch A, Cocolin L. The need for an integrated multi-OMICs approach in microbiome science in the food system. Compr Rev Food Sci Food Saf 2023; 22:1082-1103. [PMID: 36636774 DOI: 10.1111/1541-4337.13103] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 07/10/2022] [Revised: 12/05/2022] [Accepted: 12/19/2022] [Indexed: 01/14/2023]
Abstract
Microbiome science as an interdisciplinary research field has evolved rapidly over the past two decades, becoming a popular topic not only in the scientific community and among the general public, but also in the food industry due to the growing demand for microbiome-based technologies that provide added-value solutions. Microbiome research has expanded in the context of food systems, strongly driven by methodological advances in different -omics fields that leverage our understanding of microbial diversity and function. However, managing and integrating different complex -omics layers are still challenging. Within the Coordinated Support Action MicrobiomeSupport (https://www.microbiomesupport.eu/), a project supported by the European Commission, the workshop "Metagenomics, Metaproteomics and Metabolomics: the need for data integration in microbiome research" gathered 70 participants from different microbiome research fields relevant to food systems, to discuss challenges in microbiome research and to promote a switch from microbiome-based descriptive studies to functional studies, elucidating the biology and interactive roles of microbiomes in food systems. A combination of technologies is proposed. This will reduce the biases resulting from each individual technology and result in a more comprehensive view of the biological system as a whole. Although combinations of different datasets are still rare, advanced bioinformatics tools and artificial intelligence approaches can contribute to understanding, prediction, and management of the microbiome, thereby providing the basis for the improvement of food quality and safety.
Affiliation(s)
- Ilario Ferrocino
- Department of Agriculture, Forest and Food Science, University of Turin, Grugliasco, Italy
- Kalliopi Rantsiou
- Department of Agriculture, Forest and Food Science, University of Turin, Grugliasco, Italy
- Ryan McClure
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Tanja Kostic
- AIT Austrian Institute of Technology GmbH, Bioresources Unit, Tulln, Austria
- Rafael Soares Correa de Souza
- Genomics for Climate Change Research Center (GCCRC), Universidade Estadual de Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Lene Lange
- BioEconomy, Research & Advisory, Valby, Denmark
- Jamie FitzGerald
- Teagasc Food Research Centre, Moorepark, Fermoy, County Cork, Ireland
- Aicha Kriaa
- MICALIS, INRA, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France
- Paul Cotter
- Teagasc Food Research Centre, Moorepark, Fermoy, County Cork, Ireland
- Emmanuelle Maguin
- MICALIS, INRA, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France
- Gabriele Berg
- Institute of Environmental Biotechnology, Graz University of Technology, Graz, Austria
- Angela Sessitsch
- AIT Austrian Institute of Technology GmbH, Bioresources Unit, Tulln, Austria
- Luca Cocolin
- Department of Agriculture, Forest and Food Science, University of Turin, Grugliasco, Italy
24
Saraiva JP, Bartholomäus A, Toscan RB, Baldrian P, Nunes da Rocha U. Recovery of 197 eukaryotic bins reveals major challenges for eukaryote genome reconstruction from terrestrial metagenomes. Mol Ecol Resour 2023. [PMID: 36847735 DOI: 10.1111/1755-0998.13776] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/07/2022] [Revised: 01/23/2023] [Accepted: 02/21/2023] [Indexed: 03/01/2023]
Abstract
As most eukaryotic genomes are yet to be sequenced, the mechanisms underlying their contribution to different ecosystem processes remain untapped. Although approaches to recovering Prokaryotic genomes have become common in genome biology, few studies have tackled the recovery of eukaryotic genomes from metagenomes. This study assessed the reconstruction of microbial eukaryotic genomes using 6000 metagenomes from terrestrial and some transition environments using the EukRep pipeline. Only 215 metagenomic libraries yielded eukaryotic bins. From a total of 447 eukaryotic bins recovered 197 were classified at the phylum level. Streptophytes and fungi were the most represented clades with 83 and 73 bins, respectively. More than 78% of the obtained eukaryotic bins were recovered from samples whose biomes were classified as host-associated, aquatic, and anthropogenic terrestrial. However, only 93 bins were taxonomically assigned at the genus level and 17 bins at the species level. Completeness and contamination estimates were obtained for a total of 193 bins and consisted of 44.64% (σ = 27.41%) and 3.97% (σ = 6.53%), respectively. Micromonas commoda was the most frequent taxon found while Saccharomyces cerevisiae presented the highest completeness, probably because more reference genomes are available. Current measures of completeness are based on the presence of single-copy genes. However, mapping of the contigs from the recovered eukaryotic bins to the chromosomes of the reference genomes showed many gaps, suggesting that completeness measures should also include chromosome coverage. Recovering eukaryotic genomes will benefit significantly from long-read sequencing, development of tools for dealing with repeat-rich genomes, and improved reference genomes databases.
Affiliation(s)
- Joao Pedro Saraiva
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research-UFZ GmbH, Leipzig, Germany
- Rodolfo Brizola Toscan
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research-UFZ GmbH, Leipzig, Germany
- Petr Baldrian
- Laboratory of Environmental Microbiology, Institute of Microbiology of the Czech Academy of Sciences, Praha, Czech Republic
- Ulisses Nunes da Rocha
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research-UFZ GmbH, Leipzig, Germany
25
Wei W, Millward A, Koslicki D. Finding phylogeny-aware and biologically meaningful averages of metagenomic samples: L2UniFrac. bioRxiv 2023:2023.02.02.526854. [PMID: 36778267 PMCID: PMC9915697 DOI: 10.1101/2023.02.02.526854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2023]
Abstract
Metagenomic samples have high spatiotemporal variability. Hence, it is useful to summarize and characterize the microbial makeup of a given environment in a way that is biologically reasonable and interpretable. The UniFrac metric has been a robust and widely-used metric for measuring the variability between metagenomic samples. We propose that the characterization of metagenomic environments can be achieved by finding the average, a.k.a. the barycenter, among the samples with respect to the UniFrac distance. However, it is possible that such a UniFrac-average includes negative entries, making it no longer a valid representation of a metagenomic community. To overcome this intrinsic issue, we propose a special version of the UniFrac metric, termed L2UniFrac, which inherits the phylogenetic nature of the traditional UniFrac and with respect to which one can easily compute the average, producing biologically meaningful environment-specific "representative samples". We demonstrate the usefulness of such representative samples as well as the extended usage of L2UniFrac in efficient clustering of metagenomic samples, and provide mathematical characterizations and proofs to the desired properties of L2UniFrac. A prototype implementation is provided at: https://github.com/KoslickiLab/L2-UniFrac.git.
Affiliation(s)
- Wei Wei
- Huck Institutes of Life Sciences, Pennsylvania State University
- Andrew Millward
- Department of Computer Science and Engineering, Pennsylvania State University
- David Koslicki
- Huck Institutes of Life Sciences, Pennsylvania State University
- Department of Computer Science and Engineering, Pennsylvania State University
- Department of Biology, Pennsylvania State University
26
Abdulkadir N, Saraiva JP, Schattenberg F, Toscan RB, Borim Correa F, Harms H, Müller S, da Rocha UN. Combining Flow Cytometry and Metagenomics Improves Recovery of Metagenome-Assembled Genomes in a Cell Culture from Activated Sludge. Microorganisms 2023; 11:175. [PMID: 36677467 PMCID: PMC9864227 DOI: 10.3390/microorganisms11010175] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/24/2022] [Revised: 01/02/2023] [Accepted: 01/05/2023] [Indexed: 01/13/2023] Open
Abstract
The recovery of metagenome-assembled genomes is biased towards the most abundant species in a given community. To improve the identification of species, even if only dominant species are recovered, we investigated the integration of flow cytometry cell sorting with bioinformatics tools to recover metagenome-assembled genomes. We used a cell culture of a wastewater microbial community as our model system. Cells were separated based on fluorescence signals via flow cytometry cell sorting into sub-communities: dominant gates, low abundant gates, and outer gates into subsets of the original community. Metagenome sequencing was performed for all groups. The unsorted community was used as control. We recovered a total of 24 metagenome-assembled genomes (MAGs) representing 11 species-level genome operational taxonomic units (gOTUs). In addition, 57 ribosomal operational taxonomic units (rOTUs) affiliated with 29 taxa at species level were reconstructed from metagenomic libraries. Our approach suggests a two-fold increase in the resolution when comparing sorted and unsorted communities. Our results also indicate that species abundance is one determinant of genome recovery from metagenomes as we can recover taxa in the sorted libraries that are not present in the unsorted community. In conclusion, a combination of cell sorting and metagenomics allows the recovery of MAGs undetected without cell sorting.
Affiliation(s)
- Nafi’u Abdulkadir
- Department of Environmental Microbiology, Helmholtz Center for Environmental Research, Permoserstrasse 15, 04318 Leipzig, Germany
- Department of Biochemistry, Faculty of Natural Science, University of Leipzig, Bruderstrasse 34, 04103 Leipzig, Germany
- Joao Pedro Saraiva
- Department of Environmental Microbiology, Helmholtz Center for Environmental Research, Permoserstrasse 15, 04318 Leipzig, Germany
- Florian Schattenberg
- Department of Environmental Microbiology, Helmholtz Center for Environmental Research, Permoserstrasse 15, 04318 Leipzig, Germany
- Rodolfo Brizola Toscan
- Department of Environmental Microbiology, Helmholtz Center for Environmental Research, Permoserstrasse 15, 04318 Leipzig, Germany
- Felipe Borim Correa
- Department of Environmental Microbiology, Helmholtz Center for Environmental Research, Permoserstrasse 15, 04318 Leipzig, Germany
- Hauke Harms
- Department of Environmental Microbiology, Helmholtz Center for Environmental Research, Permoserstrasse 15, 04318 Leipzig, Germany
- Department of Biochemistry, Faculty of Natural Science, University of Leipzig, Bruderstrasse 34, 04103 Leipzig, Germany
- Susann Müller
- Department of Environmental Microbiology, Helmholtz Center for Environmental Research, Permoserstrasse 15, 04318 Leipzig, Germany
- Ulisses Nunes da Rocha
- Department of Environmental Microbiology, Helmholtz Center for Environmental Research, Permoserstrasse 15, 04318 Leipzig, Germany
27
Gálvez-Merchán Á, Min KHJ, Pachter L, Booeshaghi AS. Metadata retrieval from sequence databases with ffq. Bioinformatics 2023; 39:6971839. [PMID: 36610997 PMCID: PMC9883619 DOI: 10.1093/bioinformatics/btac667] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/26/2022] [Revised: 08/15/2022] [Accepted: 10/07/2022] [Indexed: 01/09/2023] Open
Abstract
MOTIVATION Several genomic databases host data and metadata for an ever-growing collection of sequence datasets. While these databases have a shared hierarchical structure, there are no tools specifically designed to leverage it for metadata extraction. RESULTS We present a command-line tool, called ffq, for querying user-generated data and metadata from sequence databases. Given an accession or a paper's DOI, ffq efficiently fetches metadata and links to raw data in JSON format. ffq's modularity and simplicity make it extensible to any genomic database exposing its data for programmatic access. AVAILABILITY AND IMPLEMENTATION ffq is free and open source, and the code can be found here: https://github.com/pachterlab/ffq.
Affiliation(s)
- Ángel Gálvez-Merchán
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Kyung Hoi (Joseph) Min
- Department of Computer Science and Electrical Engineering, Massachusetts Institute of Technology, Cambridge, MA 91125, USA
28
Li M, Liu J, Zhu J, Wang H, Sun C, Gao NL, Zhao XM, Chen WH. Performance of Gut Microbiome as an Independent Diagnostic Tool for 20 Diseases: Cross-Cohort Validation of Machine-Learning Classifiers. Gut Microbes 2023; 15:2205386. [PMID: 37140125 PMCID: PMC10161951 DOI: 10.1080/19490976.2023.2205386] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 05/05/2023] Open
Abstract
Cross-cohort validation is essential for gut-microbiome-based disease stratification but was only performed for limited diseases. Here, we systematically evaluated the cross-cohort performance of gut microbiome-based machine-learning classifiers for 20 diseases. Using single-cohort classifiers, we obtained high predictive accuracies in intra-cohort validation (~0.77 AUC), but low accuracies in cross-cohort validation, except the intestinal diseases (~0.73 AUC). We then built combined-cohort classifiers trained on samples combined from multiple cohorts to improve the validation of non-intestinal diseases, and estimated the required sample size to achieve validation accuracies of >0.7. In addition, we observed higher validation performance for classifiers using metagenomic data than 16S amplicon data in intestinal diseases. We further quantified the cross-cohort marker consistency using a Marker Similarity Index and observed similar trends. Together, our results supported the gut microbiome as an independent diagnostic tool for intestinal diseases and revealed strategies to improve cross-cohort performance based on identified determinants of consistent cross-cohort gut microbiome alterations.
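The difference between intra-cohort and cross-cohort validation described in this abstract can be sketched in a few lines. The Python below is a toy illustration, not the classifiers or data used in the study: the nearest-centroid model, the two cohorts, and the two-feature abundance vectors are all hypothetical, and the same-cohort score here is computed on the training samples themselves rather than on a held-out split.

```python
def auc(scores, labels):
    """Area under the ROC curve as the fraction of case/control pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroids(X, y):
    """Return the mean feature vector of cases and of controls."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    cases = [x for x, label in zip(X, y) if label == 1]
    controls = [x for x, label in zip(X, y) if label == 0]
    return mean(cases), mean(controls)


def score(model, X):
    """Higher score means closer to the case centroid than to the control centroid."""
    case, ctrl = model
    def d2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [d2(x, ctrl) - d2(x, case) for x in X]


def cross_cohort_auc(cohorts):
    """Train on each cohort in turn and evaluate on every cohort, including itself."""
    results = {}
    for train_name, (X_train, y_train) in cohorts.items():
        model = fit_centroids(X_train, y_train)
        for test_name, (X_test, y_test) in cohorts.items():
            results[(train_name, test_name)] = auc(score(model, X_test), y_test)
    return results


# Hypothetical toy cohorts: two relative-abundance features, label 1 = case.
COHORTS = {
    "cohort_a": ([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]], [1, 1, 0, 0]),
    "cohort_b": ([[0.6, 0.4], [0.3, 0.7], [0.5, 0.5], [0.2, 0.8]], [1, 1, 0, 0]),
}
```

Calling `cross_cohort_auc(COHORTS)` returns one AUC per (training cohort, test cohort) pair, so the diagonal entries correspond to same-cohort evaluation and the off-diagonal entries to cross-cohort validation. A combined-cohort classifier of the kind the abstract mentions could be sketched the same way by concatenating the training cohorts before calling `fit_centroids`.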
Affiliation(s)
- Min Li
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Jinxin Liu
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
| | - Jiaying Zhu
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Huarui Wang
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Chuqing Sun
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Na L Gao
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Xing-Ming Zhao
- Department of Neurology, Zhongshan Hospital, Fudan University, Shanghai, China
- State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- International Human Phenome Institutes (Shanghai), Shanghai, China
| | - Wei-Hua Chen
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- College of Life Science, Henan Normal University, Xinxiang, China
- Institution of Medical Artificial Intelligence, Binzhou Medical University, Yantai, China
|
29
|
Nata’ala MK, Avila Santos AP, Coelho Kasmanas J, Bartholomäus A, Saraiva JP, Godinho Silva S, Keller-Costa T, Costa R, Gomes NCM, Ponce de Leon Ferreira de Carvalho AC, Stadler PF, Sipoli Sanches D, Nunes da Rocha U. MarineMetagenomeDB: a public repository for curated and standardized metadata for marine metagenomes. Environmental Microbiome 2022; 17:57. [PMID: 36401317 PMCID: PMC9675116 DOI: 10.1186/s40793-022-00449-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/08/2022] [Accepted: 09/15/2022] [Indexed: 05/17/2023]
Abstract
BACKGROUND Metagenomics is an expanding field within microbial ecology, microbiology, and related disciplines. The number of metagenomes deposited in major public repositories such as Sequence Read Archive (SRA) and Metagenomic Rapid Annotations using Subsystems Technology (MG-RAST) is rising exponentially. However, data mining and interpretation can be challenging due to mis-annotated and misleading metadata entries. In this study, we describe the Marine Metagenome Metadata Database (MarineMetagenomeDB) to help researchers identify marine metagenomes of interest for re-analysis and meta-analysis. To this end, we have manually curated the associated metadata of several thousands of microbial metagenomes currently deposited at SRA and MG-RAST. RESULTS In total, 125 terms were curated according to 17 different classes (e.g., biome, material, oceanic zone, geographic feature and oceanographic phenomena). Other standardized features include sample attributes (e.g., salinity, depth), sample location (e.g., latitude, longitude), and sequencing features (e.g., sequencing platform, sequence count). MarineMetagenomeDB version 1.0 contains 11,449 marine metagenomes from SRA and MG-RAST distributed across all oceans and several seas. Most samples were sequenced using Illumina sequencing technology (84.33%). More than 55% of the samples were collected from the Pacific and the Atlantic Oceans. About 40% of the samples had their biomes assigned as 'ocean'. The 'Quick Search' and 'Advanced Search' tabs allow users to use different filters to select samples of interest dynamically in the web app. The interactive map allows the visualization of samples based on their location on the world map. The web app is also equipped with a novel download tool (on both Windows and Linux operating systems), that allows easy download of raw sequence data of selected samples from their respective repositories. As a use case, we demonstrated how to use the MarineMetagenomeDB web app to select estuarine metagenomes for potential large-scale microbial biogeography studies. CONCLUSION The MarineMetagenomeDB is a powerful resource for non-bioinformaticians to find marine metagenome samples with curated metadata and stimulate meta-studies involving marine microbiomes. Our user-friendly web app is publicly available at https://webapp.ufz.de/marmdb/ .
Affiliation(s)
- Muhammad Kabiru Nata’ala
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research – UFZ GmbH, 04318 Leipzig, Saxony Germany
- Department of Computer Science and Interdisciplinary Centre of Bioinformatics, University of Leipzig, 04107 Leipzig, Saxony Germany
| | - Anderson P. Avila Santos
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research – UFZ GmbH, 04318 Leipzig, Saxony Germany
- Institute of Mathematics and Computer Sciences, University of Sao Paulo, São Carlos, Brazil
| | - Jonas Coelho Kasmanas
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research – UFZ GmbH, 04318 Leipzig, Saxony Germany
- Department of Computer Science and Interdisciplinary Centre of Bioinformatics, University of Leipzig, 04107 Leipzig, Saxony Germany
- Institute of Mathematics and Computer Sciences, University of Sao Paulo, São Carlos, Brazil
| | - Alexander Bartholomäus
- Section 3.7 Geomicrobiology, GFZ German Research Centre for Geosciences, 14473 Telegrafenberg, Potsdam Germany
| | - João Pedro Saraiva
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research – UFZ GmbH, 04318 Leipzig, Saxony Germany
| | - Sandra Godinho Silva
- Department of Bioengineering and Institute for Bioengineering and Biosciences, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal
| | - Tina Keller-Costa
- Department of Bioengineering and Institute for Bioengineering and Biosciences, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal
| | - Rodrigo Costa
- Department of Bioengineering and Institute for Bioengineering and Biosciences, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal
| | - Newton C. M. Gomes
- Department of Biology and Centre for Environmental and Marine Studies (CESAM), University of Aveiro, 3810-193 Aveiro, Portugal
| | | | - Peter F. Stadler
- Department of Computer Science and Interdisciplinary Centre of Bioinformatics, University of Leipzig, 04107 Leipzig, Saxony Germany
| | | | - Ulisses Nunes da Rocha
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research – UFZ GmbH, 04318 Leipzig, Saxony Germany
|
30
|
Nassar M, Rogers AB, Talo' F, Sanchez S, Shafique Z, Finn RD, McEntyre J. A machine learning framework for discovery and enrichment of metagenomics metadata from open access publications. Gigascience 2022; 11:giac077. [PMID: 35950838 PMCID: PMC9366992 DOI: 10.1093/gigascience/giac077] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/16/2022] [Revised: 06/13/2022] [Accepted: 07/12/2022] [Indexed: 11/17/2022] Open
Abstract
Metagenomics is a culture-independent method for studying the microbes inhabiting a particular environment. Comparing the composition of samples (functionally/taxonomically), either from a longitudinal study or cross-sectional studies, can provide clues into how the microbiota has adapted to the environment. However, a recurring challenge, especially when comparing results between independent studies, is that key metadata about the sample and molecular methods used to extract and sequence the genetic material are often missing from sequence records, making it difficult to account for confounding factors. Nevertheless, these missing metadata may be found in the narrative of publications describing the research. Here, we describe a machine learning framework that automatically extracts essential metadata for a wide range of metagenomics studies from the literature contained in Europe PMC. This framework has enabled the extraction of metadata from 114,099 publications in Europe PMC, including 19,900 publications describing metagenomics studies in European Nucleotide Archive (ENA) and MGnify. Using this framework, a new metagenomics annotations pipeline was developed and integrated into Europe PMC to regularly enrich up-to-date ENA and MGnify metagenomics studies with metadata extracted from research articles. These metadata are now available for researchers to explore and retrieve in the MGnify and Europe PMC websites, as well as Europe PMC annotations API.
Affiliation(s)
- Maaly Nassar
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Current affiliation: SciBite - an Elsevier Company, Wellcome Genome Campus, Hinxton, Cambridge CB10 1DR, UK
| | - Alexander B Rogers
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Francesco Talo'
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Santiago Sanchez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Zunaira Shafique
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Johanna McEntyre
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
31
|
Biderre‐Petit C, Charvy J, Bronner G, Chauvet M, Debroas D, Gardon H, Hennequin C, Jouan‐Dufournel I, Moné A, Monjot A, Ravet V, Vellet A, Lepère C. FreshOmics: a manually curated and standardized –omics database for investigating freshwater microbiomes. Mol Ecol Resour 2022; 23:222-232. [DOI: 10.1111/1755-0998.13692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Revised: 07/22/2022] [Accepted: 07/25/2022] [Indexed: 11/30/2022]
Affiliation(s)
- Corinne Biderre‐Petit
- CNRS, Laboratoire Microorganismes: Génome et Environnement Université Clermont Auvergne Clermont‐Ferrand France
| | - Jean‐Christophe Charvy
- CNRS, Laboratoire Microorganismes: Génome et Environnement Université Clermont Auvergne Clermont‐Ferrand France
| | - Gisèle Bronner
- CNRS, Laboratoire Microorganismes: Génome et Environnement Université Clermont Auvergne Clermont‐Ferrand France
| | - Marina Chauvet
- CNRS, Laboratoire Microorganismes: Génome et Environnement Université Clermont Auvergne Clermont‐Ferrand France
| | - Didier Debroas
- CNRS, Laboratoire Microorganismes: Génome et Environnement Université Clermont Auvergne Clermont‐Ferrand France
| | - Hélène Gardon
- CNRS, Laboratoire Microorganismes: Génome et Environnement Université Clermont Auvergne Clermont‐Ferrand France
| | - Claire Hennequin
- CNRS, Laboratoire Microorganismes: Génome et Environnement Université Clermont Auvergne Clermont‐Ferrand France
| | - Isabelle Jouan‐Dufournel
- CNRS, Laboratoire Microorganismes: Génome et Environnement Université Clermont Auvergne Clermont‐Ferrand France
| | - Anne Moné
- CNRS, Laboratoire Microorganismes: Génome et Environnement Université Clermont Auvergne Clermont‐Ferrand France
| | - Arthur Monjot
- CNRS, Laboratoire Microorganismes: Génome et Environnement Université Clermont Auvergne Clermont‐Ferrand France
| | - Viviane Ravet
- CNRS, Laboratoire Microorganismes: Génome et Environnement Université Clermont Auvergne Clermont‐Ferrand France
| | - Agnès Vellet
- CNRS, Laboratoire Microorganismes: Génome et Environnement Université Clermont Auvergne Clermont‐Ferrand France
| | - Cécile Lepère
- CNRS, Laboratoire Microorganismes: Génome et Environnement Université Clermont Auvergne Clermont‐Ferrand France
|
32
|
Arabinoxylan and Pectin Metabolism in Crohn’s Disease Microbiota: An In Silico Study. Int J Mol Sci 2022; 23:ijms23137093. [PMID: 35806099 PMCID: PMC9266297 DOI: 10.3390/ijms23137093] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/23/2022] [Revised: 06/20/2022] [Accepted: 06/22/2022] [Indexed: 12/03/2022] Open
Abstract
Inflammatory bowel disease is a chronic disorder including ulcerative colitis and Crohn’s disease (CD). Gut dysbiosis is often associated with CD, and metagenomics allows a better understanding of the microbial communities involved. The objective of this study was to reconstruct in silico carbohydrate metabolic capabilities from metagenome-assembled genomes (MAGs) obtained from healthy and CD individuals. This computational method was developed as a means to aid rationally designed prebiotic interventions to rebalance CD dysbiosis, with a focus on metabolism of emergent prebiotics derived from arabinoxylan and pectin. Up to 1196 and 1577 MAGs were recovered from CD and healthy people, respectively. MAGs of Akkermansia muciniphila, Barnesiella viscericola DSM 18177 and Paraprevotella xylaniphila YIT 11841 showed a wide range of unique and specific enzymes acting on arabinoxylan and pectin. These glycosidases were also found in MAGs recovered from CD patients. Interestingly, these arabinoxylan and pectin degraders are predicted to exhibit metabolic interactions with other gut microbes reduced in CD. Thus, administration of arabinoxylan and pectin may ameliorate dysbiosis in CD by promoting species with key metabolic functions, capable of cross-feeding other beneficial species. These computational methods may be of special interest for the rational design of prebiotic ingredients targeting CD.
|
33
|
Abstract
With the rapid development of high-throughput sequencing technology, the amount of metagenomic data (including both 16S and whole-genome sequencing data) in public repositories is increasing exponentially. However, owing to the large and decentralized nature of the data, it is still difficult for users to mine, compare, and analyze the data. The animal metagenome database (AnimalMetagenome DB) integrates metagenomic sequencing data with host information, making it easier for users to find data of interest. The AnimalMetagenome DB is designed to contain all public metagenomic data from animals, and the data are divided into domestic and wild animal categories. Users can browse, search, and download animal metagenomic data of interest based on different attributes of the metadata such as animal species, sample site, study purpose, and DNA extraction method. The AnimalMetagenome DB version 1.0 includes metadata for 82,097 metagenomes from 4 domestic animals (pigs, bovines, horses, and sheep) and 540 wild animals. These metagenomes cover 15 years of experiments, 73 countries, 1,044 studies, 63,214 amplicon sequencing data, and 10,672 whole genome sequencing data. All data in the database are hosted and available in figshare 10.6084/m9.figshare.19728619.
- Measurement(s): Metagenome metadata
- Technology Type(s): Collection and integration the metagenomic information of multiple animal species
- Factor Type(s): animal
- Sample Characteristic - Organism: animal
- Sample Characteristic - Environment: metagenome
- Sample Characteristic - Location: United States of America • People’s Republic of China • Canada
|
34
|
Agostinetto G, Bozzi D, Porro D, Casiraghi M, Labra M, Bruno A. SKIOME Project: a curated collection of skin microbiome datasets enriched with study-related metadata. Database (Oxford) 2022; 2022:6586378. [PMID: 35576001 PMCID: PMC9216470 DOI: 10.1093/database/baac033] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/08/2021] [Revised: 02/25/2022] [Accepted: 05/09/2022] [Indexed: 04/07/2023]
Abstract
Large amounts of data from microbiome-related studies have been (and are currently being) deposited on international public databases. These datasets represent a valuable resource for the microbiome research community and could serve future researchers interested in integrating multiple datasets into powerful meta-analyses. However, this huge amount of data lacks harmonization and it is far from being completely exploited in its full potential to build a foundation that places microbiome research at the nexus of many subdisciplines within and beyond biology. Thus, it urges the need for data accessibility and reusability, according to findable, accessible, interoperable and reusable (FAIR) principles, as supported by National Microbiome Data Collaborative and FAIR Microbiome. To tackle the challenge of accelerating discovery and advances in skin microbiome research, we collected, integrated and organized existing microbiome data resources from human skin 16S rRNA amplicon-sequencing experiments. We generated a comprehensive collection of datasets, enriched in metadata, and organized this information into data frames ready to be integrated into microbiome research projects and advanced post-processing analyses, such as data science applications (e.g. machine learning). Furthermore, we have created a data retrieval and curation framework built on three different stages to maximize the retrieval of datasets and metadata associated with them. Lastly, we highlighted some caveats regarding metadata retrieval and suggested ways to improve future metadata submissions. Overall, our work resulted in a curated skin microbiome datasets collection accompanied by a state-of-the-art analysis of the last 10 years of the skin microbiome field. Database URL: https://github.com/giuliaago/SKIOMEMetadataRetrieval.
Affiliation(s)
- Giulia Agostinetto
- *Corresponding authors: Giulia Agostinetto and Antonia Bruno. Tel: +0039 0264483413
| | | | - Danilo Porro
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza, 2, Milan 20126, Italy
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), via Fratelli Cervi, 93, Segrate (MI) 20054, Italy
| | - Maurizio Casiraghi
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza, 2, Milan 20126, Italy
| | - Massimo Labra
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza, 2, Milan 20126, Italy
| | - Antonia Bruno
- *Corresponding authors: Giulia Agostinetto and Antonia Bruno. Tel: +0039 0264483413
|
35
|
Xu Y, Lei B, Zhang Q, Lei Y, Li C, Li X, Yao R, Hu R, Liu K, Wang Y, Cui Y, Wang L, Dai J, Li L, Ni W, Zhou P, Liu ZX, Hu S. ADDAGMA: A Database for Domestic Animal Gut Microbiome Atlas. Comput Struct Biotechnol J 2022; 20:891-898. [PMID: 35222847 PMCID: PMC8858777 DOI: 10.1016/j.csbj.2022.02.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/14/2021] [Revised: 02/08/2022] [Accepted: 02/08/2022] [Indexed: 12/12/2022] Open
Abstract
We curated all publicly available high-throughput sequencing data on gut microbiomes for four domestic animal species. We compiled data for multiple levels of microbial taxa and classified the associated animal phenotypes in detail. The database exhibits the dynamic changes of animal gut microbes under different conditions. We developed a user-friendly website for browsing, searching, and displaying dynamic changes in animal gut microbes under different conditions.
Animal gut microbiomes play important roles in the health, diseases, and production of animal hosts. The volume of animal gut metagenomic data, including both 16S amplicon and metagenomic sequencing data, has been increasing exponentially in recent years, making it increasingly difficult for researchers to query, retrieve, and reanalyze experimental data and explore new hypotheses. We designed a database called the domestic animal gut microbiome atlas (ADDAGMA) to house all publicly available, high-throughput sequencing data for the gut microbiome in domestic animals. ADDAGMA enhances the availability and accessibility of the rapidly growing body of metagenomic data. We annotated microbial and metadata from four domestic animals (cattle, horse, pig, and chicken) from 356 published papers to construct a comprehensive database that is equipped with browse and search functions, enabling users to make customized, complicated, biologically relevant queries. Users can quickly and accurately obtain experimental information on sample types, conditions, and sequencing platforms, and experimental results including microbial relative abundances, microbial taxon-associated host phenotype, and P-values for gut microbes of interest. The current version of ADDAGMA includes 290,422 quantification events (changes in abundance) for 3215 microbial taxa associated with 48 phenotypes. ADDAGMA presently covers gut microbiota sequencing data from pig, cattle, horse, and chicken, but will be expanded to include other domestic animals. ADDAGMA is freely available at (http://addagma.omicsbio.info/).
Affiliation(s)
- Yueren Xu
- College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
| | - Bingbing Lei
- College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
| | - Qingfeng Zhang
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
| | - Yunjiao Lei
- College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
| | - Cunyuan Li
- College of Animal Science and Technology, Shihezi University, Shihezi, Xinjiang 832003, China
| | - Xiaoyue Li
- College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
| | - Rui Yao
- College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
| | - Ruirui Hu
- College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
| | - Kaiping Liu
- College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
| | - Yue Wang
- College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
| | - Yuying Cui
- College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
| | - Limin Wang
- State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, Xinjiang 832003, China
| | - Jihong Dai
- College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
| | - Lei Li
- College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
| | - Wei Ni
- College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
| | - Ping Zhou
- State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, Xinjiang 832003, China
- Corresponding authors.
| | - Ze-Xian Liu
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
- Corresponding authors.
| | - Shengwei Hu
- College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
- Corresponding authors.
|
36
|
Molina NM, Sola-Leyva A, Haahr T, Aghajanova L, Laudanski P, Castilla JA, Altmäe S. Analysing endometrial microbiome: methodological considerations and recommendations for good practice. Hum Reprod 2021; 36:859-879. [DOI: 10.1093/humrep/deab009] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 10/09/2020] [Revised: 12/02/2020] [Indexed: 12/12/2022] Open
Abstract
There is growing evidence that the upper female genital tract is not sterile, harbouring its own microbial communities. However, the significance and the potential effect of endometrial microorganisms on reproductive functions remain to be fully elucidated. Analysing the endometrial microbiome, the microbes and their genetic material present in the endometrium, is an emerging area of study. The initial studies suggest it is associated with poor reproductive outcomes and with different gynaecological pathologies. Nevertheless, when studying a low-biomass microbial niche such as the endometrium, the challenge is to conduct well-designed and well-controlled experiments in order to avoid and adjust for the risk of contamination, especially from the lower genital tract. Herein, we aim to highlight methodological considerations and propose good practice recommendations for future endometrial microbiome studies.
Affiliation(s)
- Nerea M Molina
- Department of Biochemistry and Molecular Biology, Faculty of Sciences, University of Granada, Granada 18071, Spain
- Instituto de Investigación Biosanitaria ibs.GRANADA, Granada 18014, Spain
| | - Alberto Sola-Leyva
- Department of Biochemistry and Molecular Biology, Faculty of Sciences, University of Granada, Granada 18071, Spain
- Instituto de Investigación Biosanitaria ibs.GRANADA, Granada 18014, Spain
| | - Thor Haahr
- The Fertility Clinic, Skive Regional Hospital, Skive 7800, Denmark
| | - Lusine Aghajanova
- Division of Reproductive Endocrinology and Infertility, Department of Obstetrics and Gynecology, Stanford School of Medicine, Sunnyvale, CA 94087, USA
| | - Piotr Laudanski
- Department of Obstetrics and Gynecology, Medical University of Warsaw, Warsaw 02-015, Poland
| | - Jose Antonio Castilla
- Instituto de Investigación Biosanitaria ibs.GRANADA, Granada 18014, Spain
- Unidad de Reproducción, UGC de Obstetricia y Ginecología, Hospital Universitario Virgen de las Nieves, Granada 18012, Spain
- CEIFER Biobanco—NextClinics, Granada 18004, Spain
| | - Signe Altmäe
- Department of Biochemistry and Molecular Biology, Faculty of Sciences, University of Granada, Granada 18071, Spain
- Instituto de Investigación Biosanitaria ibs.GRANADA, Granada 18014, Spain
- Competence Centre on Health Technologies, Tartu 50410, Estonia
|
37
|
Rigden DJ, Fernández XM. The 2021 Nucleic Acids Research database issue and the online molecular biology database collection. Nucleic Acids Res 2021; 49:D1-D9. [PMID: 33396976 PMCID: PMC7778882 DOI: 10.1093/nar/gkaa1216] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Indexed: 12/13/2022] Open
Abstract
The 2021 Nucleic Acids Research database Issue contains 189 papers spanning a wide range of biological fields and investigation. It includes 89 papers reporting on new databases and 90 covering recent changes to resources previously published in the Issue. A further ten are updates on databases most recently published elsewhere. Seven new databases focus on COVID-19 and SARS-CoV-2 and many others offer resources for studying the virus. Major returning nucleic acid databases include NONCODE, Rfam and RNAcentral. Protein family and domain databases include COG, Pfam, SMART and Panther. Protein structures are covered by RCSB PDB and dispersed proteins by PED and MobiDB. In metabolism and signalling, STRING, KEGG and WikiPathways are featured, along with returning KLIFS and new DKK and KinaseMD, all focused on kinases. IMG/M and IMG/VR update in the microbial and viral genome resources section, while human and model organism genomics resources include Flybase, Ensembl and UCSC Genome Browser. Cancer studies are covered by updates from canSAR and PINA, as well as newcomers CNCdatabase and Oncovar for cancer drivers. Plant comparative genomics is catered for by updates from Gramene and GreenPhylDB. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been substantially updated, revisiting nearly 1000 entries, adding 90 new resources and eliminating 86 obsolete databases, bringing the current total to 1641 databases. It is available at https://www.oxfordjournals.org/nar/database/c/.
Affiliation(s)
- Daniel J Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK
|