1
Jiang Y, McDonald D, Perry D, Knight R, Mirarab S. Scaling DEPP phylogenetic placement to ultra-large reference trees: a tree-aware ensemble approach. Bioinformatics 2024; 40:btae361. [PMID: 38870525 PMCID: PMC11193062 DOI: 10.1093/bioinformatics/btae361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Revised: 04/09/2024] [Accepted: 06/12/2024] [Indexed: 06/15/2024] Open
Abstract
MOTIVATION Phylogenetic placement of a query sequence on a backbone tree is increasingly used across biomedical sciences to identify the content of a sample from its DNA content. The accuracy of such analyses depends on the density of the backbone tree, making it crucial that placement methods scale to very large trees. Moreover, a new paradigm has been recently proposed to place sequences on the species tree using single-gene data. The goal is to better characterize the samples and to enable combined analyses of marker-gene (e.g., 16S rRNA gene amplicon) and genome-wide data. The recent method DEPP enables performing such analyses using metric learning. However, metric learning is hampered by a need to compute and save a quadratically growing matrix of pairwise distances during training. Thus, the training phase of DEPP does not scale to more than roughly 10 000 backbone species, a problem that we faced when trying to use our recently released Greengenes2 (GG2) reference tree containing 331 270 species. RESULTS This paper explores divide-and-conquer for training ensembles of DEPP models, culminating in a method called C-DEPP. While divide-and-conquer has been extensively used in phylogenetics, applying divide-and-conquer to data-hungry machine-learning methods needs nuance. C-DEPP uses carefully crafted techniques to enable quasi-linear scaling while maintaining accuracy. C-DEPP enables placing 20 million 16S fragments on the GG2 reference tree in 41 h of computation. AVAILABILITY AND IMPLEMENTATION The dataset and C-DEPP software are freely available at https://github.com/yueyujiang/dataset_cdepp/.
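To make the quadratic bottleneck described in this abstract concrete, the following back-of-the-envelope sketch counts the unordered pairwise distances among n backbone species, n(n-1)/2, for the two tree sizes the abstract mentions. The 4-byte entry size and the helper names `pairwise_count` and `matrix_gib` are illustrative assumptions, not details taken from DEPP or C-DEPP.

```python
def pairwise_count(n: int) -> int:
    """Number of unordered pairs among n backbone species: n(n-1)/2."""
    return n * (n - 1) // 2


def matrix_gib(n: int, bytes_per_entry: int = 4) -> float:
    """Memory for the upper triangle only, assuming 4-byte (float32) entries."""
    return pairwise_count(n) * bytes_per_entry / 2**30


# Sizes quoted in the abstract: the rough DEPP training limit and the GG2 tree.
for n in (10_000, 331_270):
    print(f"{n:>7} species: {pairwise_count(n):,} distances, "
          f"about {matrix_gib(n):.1f} GiB")
```

Under these assumptions the number of stored distances grows roughly 1,100-fold between the two sizes, which illustrates why a single model trained on the full GG2 tree is impractical and why the training is divided across an ensemble.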
Affiliation(s)
- Yueyu Jiang
- Electrical and Computer Engineering Department, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, United States
- Daniel McDonald
- Pediatrics Department, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, United States
- Daniela Perry
- Pediatrics Department, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, United States
- Rob Knight
- Pediatrics Department, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, United States
- Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, United States
- Siavash Mirarab
- Electrical and Computer Engineering Department, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, United States
- Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, United States
2
Balakrishnan K, Krishnaa D, Balakrishnan G, Manickam M, Abdulkader AM, Dharumadurai D. Association of Bacterial Communities with Psychedelic Mushroom and Soil as Revealed in 16S rRNA Gene Sequencing. Appl Biochem Biotechnol 2024; 196:2566-2590. [PMID: 37103739 DOI: 10.1007/s12010-023-04527-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/11/2023] [Indexed: 04/28/2023]
Abstract
Microbial communities resident in the mushroom fruiting body and the soil around it play critical roles in the growth and propagation of the mushroom. Among the microbial communities associated with psychedelic mushrooms and the rhizosphere soil, bacterial communities are considered vital since their presence greatly influences the health of the mushrooms. The present study aimed at finding the microbiota present in the psychedelic mushroom Psilocybe cubensis and the soil the mushroom inhabits. The study was conducted at two different locations in Kodaikanal, Tamil Nadu, India. The composition and structure of microbial communities in the mushroom fruiting body and the soil were deciphered. The genomes of the microbial communities were directly assessed. High-throughput amplicon sequencing revealed distinct microbial diversity in the mushroom and the related soil. The interaction of environmental and anthropogenic factors appeared to have a significant impact on the mushroom and soil microbiome. The most abundant bacterial genera were Ochrobactrum, Stenotrophomonas, Achromobacter, and Brevundimonas. Thus, the study advances the knowledge of the composition of the microbiome and microbial ecology of a psychedelic mushroom, and paves the way for in-depth investigation of the influence of microbiota on the mushroom, with special emphasis on the impact of bacterial communities on mushroom growth. Further studies are required for a deeper understanding of the microbial communities that influence the growth of the P. cubensis mushroom.
Affiliation(s)
- Karthiyayini Balakrishnan
- Department of Microbiology, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India
- National Centre for alternatives to Animal Experiments, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India
- Dheebhashriee Krishnaa
- Department of Microbiology, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India
- Gowdhami Balakrishnan
- National Centre for alternatives to Animal Experiments, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India
- Department of Animal Science, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India
- Muthuselvam Manickam
- Department of Biotechnology, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India
- Akbarsha Mohammad Abdulkader
- Mahatma Gandhi-Dorenkamp Centre, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India
- Department of Biotechnology & Research Coordinator, National College (Autonomous), Tiruchirappalli, Tamil Nadu, India
3
Hang Y, Qu H, Yang J, Li Z, Ma S, Tang C, Wu C, Bao Y, Jiang F, Shu J. Exploration of programmed cell death-associated characteristics and immune infiltration in neonatal sepsis: new insights from bioinformatics analysis and machine learning. BMC Pediatr 2024; 24:67. [PMID: 38245687 PMCID: PMC10799360 DOI: 10.1186/s12887-024-04555-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 01/11/2024] [Indexed: 01/22/2024] Open
Abstract
BACKGROUND Neonatal sepsis, a perilous medical situation, is typified by the malfunction of organs and serves as the primary reason for neonatal mortality. Nevertheless, the mechanisms underlying newborn sepsis remain ambiguous. Programmed cell death (PCD) has a connection with numerous infectious illnesses and holds a significant function in newborn sepsis, potentially serving as a marker for diagnosing the condition. METHODS From the GEO public repository, we selected two groups, which we referred to as the training and validation sets, for our analysis of neonatal sepsis. We obtained PCD-related genes from 12 different patterns, including databases and published literature. We first obtained differentially expressed genes (DEGs) for neonatal sepsis and controls. Three advanced machine learning techniques, namely LASSO, SVM-RFE, and RF, were employed to identify potential genes connected to PCD. To further validate the results, PPI networks were constructed, and artificial neural networks and consensus clustering were used. Subsequently, a neonatal sepsis diagnostic prediction model was developed and evaluated. We conducted an analysis of immune cell infiltration to examine immune cell dysregulation in neonatal sepsis, and we established a ceRNA network based on the identified marker genes. RESULTS Within the context of neonatal sepsis, a total of 49 genes exhibited an intersection between the differentially expressed genes (DEGs) and those associated with programmed cell death (PCD). Utilizing three distinct machine learning techniques, six genes were identified as common to both DEGs and PCD-associated genes. A diagnostic model was subsequently constructed by integrating differential expression profiles, and then validated using artificial neural networks and consensus clustering. Receiver operating characteristic (ROC) curves were employed to assess the diagnostic merit of the model, which yielded promising results.
The immune infiltration analysis revealed notable disparities in patients diagnosed with neonatal sepsis. Furthermore, based on the identified marker genes, the ceRNA network revealed an intricate regulatory interplay. CONCLUSION In our investigation, we methodically identified six marker genes (AP3B2, STAT3, TSPO, S100A9, GNS, and CX3CR1). An effective diagnostic prediction model emerged from an exhaustive analysis within the training group (AUC 0.930, 95%CI 0.887-0.965) and the validation group (AUC 0.977, 95%CI 0.935-1.000).
Affiliation(s)
- Yun Hang
- Department of Pediatrics, The Fourth Affiliated Hospital of Jiangsu University, Zhenjiang, 212001, China
- Huanxia Qu
- Department of Blood Transfusion, Zhenjiang First People's Hospital, Zhenjiang, China
- Juanzhi Yang
- Department of Pediatrics, The Fourth Affiliated Hospital of Jiangsu University, Zhenjiang, 212001, China
- Zhang Li
- Department of Pediatrics, The Fourth Affiliated Hospital of Jiangsu University, Zhenjiang, 212001, China
- Shiqi Ma
- Department of Pediatrics, The Fourth Affiliated Hospital of Jiangsu University, Zhenjiang, 212001, China
- Chenlu Tang
- Department of Pediatrics, The Fourth Affiliated Hospital of Jiangsu University, Zhenjiang, 212001, China
- Chuyan Wu
- Department of Rehabilitation Medicine, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Yunlei Bao
- Department of Neonatology, Obstetrics and Gynecology Hospital of Fudan University, Shanghai, 200011, China
- Feng Jiang
- Department of Neonatology, Obstetrics and Gynecology Hospital of Fudan University, Shanghai, 200011, China
- Jin Shu
- Department of Pediatrics, The Fourth Affiliated Hospital of Jiangsu University, Zhenjiang, 212001, China
4
Kiran A, Hanachi M, Alsayed N, Fassatoui M, Oduaran OH, Allali I, Maslamoney S, Meintjes A, Zass L, Rocha JD, Kefi R, Benkahla A, Ghedira K, Panji S, Mulder N, Fadlelmola FM, Souiai O. The African Human Microbiome Portal: a public web portal of curated metagenomic metadata. Database (Oxford) 2024; 2024:baad092. [PMID: 38204360 PMCID: PMC10782148 DOI: 10.1093/database/baad092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Revised: 11/03/2023] [Accepted: 12/21/2023] [Indexed: 01/12/2024]
Abstract
There is growing evidence that comprehensive and harmonized metadata are fundamental for effective public data reusability. However, it is often challenging to extract accurate metadata from public repositories. Of particular concern is the metagenomic data related to African individuals, which often omit important information about the particular features of these populations. As part of a collaborative consortium, H3ABioNet, we created a web portal, namely the African Human Microbiome Portal (AHMP), exclusively dedicated to metadata related to African human microbiome samples. Metadata were collected from various public repositories prior to cleaning, curation and harmonization according to a pre-established guideline and using ontology terms. These metadata sets can be accessed at https://microbiome.h3abionet.org/. This web portal is open access and offers an interactive visualization of 14 889 records from 70 bioprojects associated with 72 peer-reviewed research articles. It also offers the ability to download harmonized metadata according to the user's applied filters. The AHMP thereby supports metadata search and retrieval, facilitating access to relevant studies linked to the African human microbiome. Database URL: https://microbiome.h3abionet.org/.
Affiliation(s)
- Mariem Hanachi
- Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR16IPT09), Institute Pasteur of Tunis, University Tunis El Manar, Tunis 1002, Tunisia
- Faculty of Science of Bizerte, University of Carthage, Tunis, Tunisia
- Nihad Alsayed
- Kush Centre for Genomics and Biomedical Informatics, Biotechnology Perspectives Organization, Khartoum, Sudan
- Meriem Fassatoui
- Laboratory of Biomedical Genomics & Oncogenetics, Institut Pasteur de Tunis, University Tunis El Manar, Tunis 1002, Tunisia
- Ovokeraye H Oduaran
- The Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
- Imane Allali
- Laboratory of Human Pathologies Biology, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
- Suresh Maslamoney
- Computational Biology Division, Department of Integrative Biomedical Sciences and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
- Ayton Meintjes
- Computational Biology Division, Department of Integrative Biomedical Sciences and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
- Lyndon Zass
- Computational Biology Division, Department of Integrative Biomedical Sciences and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
- Jorge Da Rocha
- The Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
- Rym Kefi
- Laboratory of Biomedical Genomics & Oncogenetics, Institut Pasteur de Tunis, University Tunis El Manar, Tunis 1002, Tunisia
- Alia Benkahla
- Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR16IPT09), Institute Pasteur of Tunis, University Tunis El Manar, Tunis 1002, Tunisia
- Kais Ghedira
- Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR16IPT09), Institute Pasteur of Tunis, University Tunis El Manar, Tunis 1002, Tunisia
- Sumir Panji
- Computational Biology Division, Department of Integrative Biomedical Sciences and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
- Nicola Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
- Faisal M Fadlelmola
- Kush Centre for Genomics and Biomedical Informatics, Biotechnology Perspectives Organization, Khartoum, Sudan
- Oussema Souiai
- Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR16IPT09), Institute Pasteur of Tunis, University Tunis El Manar, Tunis 1002, Tunisia
- Malawi-Liverpool-Wellcome Trust, Blantyre 3, Malawi
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool CH64 7TE, UK
5
Gao X, Chen K, Xiong J, Zou D, Yang F, Ma Y, Jiang C, Gao X, Wang G, Gu S, Zhang P, Luo S, Huang K, Bao Y, Zhang Z, Ma L, Miao W. The P10K database: a data portal for the protist 10 000 genomes project. Nucleic Acids Res 2024; 52:D747-D755. [PMID: 37930867 PMCID: PMC10767852 DOI: 10.1093/nar/gkad992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 10/03/2023] [Accepted: 10/17/2023] [Indexed: 11/08/2023] Open
Abstract
Protists, a highly diverse group of microscopic eukaryotic organisms distinct from fungi, animals and plants, exert crucial roles within the earth's biosphere. However, the genomes of only a small fraction of known protist species have been published and made publicly accessible. To address this constraint, the Protist 10 000 Genomes Project (P10K) was initiated, implementing a specialized pipeline for single-cell genome/transcriptome assembly, decontamination and annotation of protists. The resultant P10K database (https://ngdc.cncb.ac.cn/p10k/) serves as a comprehensive platform, collating and disseminating genome sequences and annotations from diverse protist groups. Currently, the P10K database has incorporated 2959 genomes and transcriptomes, including 1101 newly sequenced datasets by P10K and 1858 publicly available datasets. Notably, it covers 45% of the protist orders, with a significant representation (53% coverage) of ciliates, featuring nearly a thousand genomes/transcriptomes. Intriguingly, analysis of the unique codon table usage among ciliates has revealed differences compared to the NCBI taxonomy system, suggesting a need to revise the codon tables used for these species. Collectively, the P10K database serves as a valuable repository of genetic resources for protist research and aims to expand its collection by incorporating more sequenced data and advanced analysis tools to benefit protist studies worldwide.
Affiliation(s)
- Xinxin Gao
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Kai Chen
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Jie Xiong
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture, Chinese Academy of Sciences, Wuhan 430072, China
- Dong Zou
- China National Center for Bioinformation, Beijing 100101, China
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Fangdian Yang
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Yingke Ma
- China National Center for Bioinformation, Beijing 100101, China
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Chuanqi Jiang
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Xiaoxuan Gao
- Shandong University of Technology, Zibo 255000, China
- Guangying Wang
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Siyu Gu
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Peng Zhang
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Shuai Luo
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Kaiyao Huang
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Key laboratory of Lake and Watershed Science for Water Security, Chinese Academy of Sciences, Nanjing 210008, China
- Yiming Bao
- University of Chinese Academy of Sciences, Beijing 100049, China
- China National Center for Bioinformation, Beijing 100101, China
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Zhang Zhang
- University of Chinese Academy of Sciences, Beijing 100049, China
- China National Center for Bioinformation, Beijing 100101, China
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Lina Ma
- University of Chinese Academy of Sciences, Beijing 100049, China
- China National Center for Bioinformation, Beijing 100101, China
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Wei Miao
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Key laboratory of Lake and Watershed Science for Water Security, Chinese Academy of Sciences, Nanjing 210008, China
- Hubei Hongshan Laboratory, Wuhan 430070, China
6
Song Y, Zhong S, Li Y, Jiang M, Wei Q. Constructing an Interactive and Integrated Analysis and Identification Platform for Pathogenic Microorganisms to Support Surveillance Capacity. Genes (Basel) 2023; 14:2156. [PMID: 38136978 PMCID: PMC10742832 DOI: 10.3390/genes14122156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 11/09/2023] [Accepted: 11/27/2023] [Indexed: 12/24/2023] Open
Abstract
INTRODUCTION Whole genome sequencing (WGS) holds significant promise for epidemiological inquiries, as it enables the identification and tracking of pathogenic origins and dissemination through comprehensive genome analysis. This method is widely preferred for investigating outbreaks and monitoring pathogen activity. However, the effective utilization of microbiome sequencing data remains a challenge for clinical and public health experts. Through the National Pathogen Resource Center, we have constructed a dynamic and interactive online analysis platform to facilitate the in-depth analysis and use of pathogen genomic data, by public health and associated professionals, to support infectious disease surveillance framework building and capacity warnings. METHOD The platform was implemented using the Java programming language, and the front-end pages were developed using the VUE framework, following the MVC (Model-View-Controller) pattern to enable interactive service functionalities for front-end data collection and back-end data computation. Cloud computing services were employed to integrate biological information analysis tools for conducting fundamental analysis on sequencing data. RESULT The platform achieved the goal of non-programming analysis, providing an interactive visual interface that allows users to visually obtain results by setting parameters in web pages. Moreover, the platform allows users to export results in various formats to further support their research. DISCUSSION We have established a dynamic and interactive online platform for bioinformatics analysis. By encapsulating the complex background experiments and analysis processes in a cloud-based service platform, we present them to the end-user in a simple and interactive manner.
It facilitates real-time data mining and analysis by allowing users to independently select parameters and generate analysis results at the click of a button, based on their needs, without the need for a programming foundation.
Affiliation(s)
- Yang Song
- National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Chinese Center for Disease Control and Prevention, Beijing 102206, China
- Songchao Zhong
- National Pathogen Resource Center, Chinese Center for Disease Control and Prevention, Beijing 102206, China
- Yixiao Li
- National Pathogen Resource Center, Chinese Center for Disease Control and Prevention, Beijing 102206, China
- Mengnan Jiang
- National Pathogen Resource Center, Chinese Center for Disease Control and Prevention, Beijing 102206, China
- Qiang Wei
- National Pathogen Resource Center, Chinese Center for Disease Control and Prevention, Beijing 102206, China
7
Kim J, Koh H. MiTree: A Unified Web Cloud Analytic Platform for User-Friendly and Interpretable Microbiome Data Mining Using Tree-Based Methods. Microorganisms 2023; 11:2816. [PMID: 38004827 PMCID: PMC10672986 DOI: 10.3390/microorganisms11112816] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/11/2023] [Revised: 11/05/2023] [Accepted: 11/17/2023] [Indexed: 11/26/2023] Open
Abstract
The advent of next-generation sequencing has greatly accelerated the field of human microbiome studies. Currently, investigators are seeking, struggling and competing to find new ways to diagnose, treat and prevent human diseases through the human microbiome. Machine learning is a promising approach to help such an effort, especially due to the high complexity of microbiome data. However, many of the current machine learning algorithms are in a "black box", i.e., they are difficult to understand and interpret. In addition, clinicians, public health practitioners and biologists are not usually skilled at computer programming, and they do not always have high-end computing devices. Thus, in this study, we introduce a unified web cloud analytic platform, named MiTree, for user-friendly and interpretable microbiome data mining. MiTree employs tree-based learning methods, including decision tree, random forest and gradient boosting, that are well understood and suited to human microbiome studies. We also stress that MiTree can address both classification and regression problems through covariate-adjusted or unadjusted analysis. MiTree should serve as an easy-to-use and interpretable data mining tool for microbiome-based disease prediction modeling, and should provide new insights into microbiome-based diagnostics, treatment and prevention. MiTree is an open-source software that is available on our web server.
8
Avila Santos AP, Kabiru Nata'ala M, Kasmanas JC, Bartholomäus A, Keller-Costa T, Jurburg SD, Tal T, Camarinha-Silva A, Saraiva JP, Ponce de Leon Ferreira de Carvalho AC, Stadler PF, Sipoli Sanches D, Rocha U. The AnimalAssociatedMetagenomeDB reveals a bias towards livestock and developed countries and blind spots in functional-potential studies of animal-associated microbiomes. Anim Microbiome 2023; 5:48. [PMID: 37798675 PMCID: PMC10552293 DOI: 10.1186/s42523-023-00267-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/09/2022] [Accepted: 09/18/2023] [Indexed: 10/07/2023] Open
Abstract
BACKGROUND Metagenomic data can shed light on animal-microbiome relationships and the functional potential of these communities. Over the past years, the generation of metagenomics data has increased exponentially, and so has the availability and reusability of data present in public repositories. However, identifying which datasets and associated metadata are available is not straightforward. We created the Animal-Associated Metagenome Metadata Database (AnimalAssociatedMetagenomeDB - AAMDB) to facilitate the identification and reuse of publicly available non-human, animal-associated metagenomic data, and metadata. Further, we used the AAMDB to (i) annotate common and scientific names of the species; (ii) determine the fraction of vertebrates and invertebrates; (iii) study their biogeography; and (iv) specify whether the animals were wild, pets, livestock or used for medical research. RESULTS We manually selected metagenomes associated with non-human animals from SRA and MG-RAST. Next, we standardized and curated 51 metadata attributes (e.g., host, compartment, geographic coordinates, and country). The AAMDB version 1.0 contains 10,885 metagenomes associated with 165 different species from 65 different countries. From the collected metagenomes, 51.1% were recovered from animals associated with medical research or grown for human consumption (i.e., mice, rats, cattle, pigs, and poultry). Further, we observed an over-representation of animals collected in temperate regions (89.2%) and a lower representation of samples from the polar zones, with only 11 samples in total. The most common genus among invertebrate animals was Trichocerca (rotifers). CONCLUSION Our work may guide host species selection in novel animal-associated metagenome research, especially in biodiversity and conservation studies. 
The data available in our database will allow scientists to perform meta-analyses and test new hypotheses (e.g., host-specificity, strain heterogeneity, and biogeography of animal-associated metagenomes), leveraging existing data. The AAMDB WebApp is a user-friendly interface that is publicly available at https://webapp.ufz.de/aamdb/.
Affiliation(s)
- Anderson Paulo Avila Santos
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany
- Institute of Mathematics and Computer Sciences, University of Sao Paulo, Sao Carlos, Brazil
- Muhammad Kabiru Nata'ala
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany
- Department of Computer Science and Interdisciplinary Centre of Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Saxony, Germany
- Jonas Coelho Kasmanas
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany
- Department of Computer Science and Interdisciplinary Centre of Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Saxony, Germany
- Institute of Mathematics and Computer Sciences, University of Sao Paulo, Sao Carlos, Brazil
- Alexander Bartholomäus
- GFZ German Research Centre for Geosciences, Section 3.7 Geomicrobiology, 14473, Telegrafenberg, Potsdam, Germany
- Tina Keller-Costa
- Institute for Bioengineering and Biosciences (iBB) and Institute for Health and Bioeconomy (i4HB), Instituto Superior Tecnico (IST), Universidade de Lisboa, Lisbon, 1049-001, Portugal
- Stephanie D Jurburg
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany
- German Centre of Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstraße 4, Leipzig, 04103, Germany
- Tamara Tal
- Department of Bioanalytical Ecotoxicology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
- Amélia Camarinha-Silva
- Hohenheim Center for Livestock Microbiome Research (HoLMiR), University of Hohenheim, Stuttgart, Germany
- Institute of Animal Science, University of Hohenheim, Stuttgart, Germany
- João Pedro Saraiva
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany
- Peter F Stadler
- Department of Computer Science and Interdisciplinary Centre of Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Saxony, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße, 04103, Leipzig, Germany
- Institute for Theoretical Chemistry, Universität Wien, Währingerstraße 17, Vienna, A-1090, Austria
- Center for Scalable Data Analytics and Artificial Intelligence Dresden-Leipzig, Leipzig University, Leipzig, Germany
- Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Bogotá, Colombia
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- The Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA
| | | | - Ulisses Rocha
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany.
|
9
|
Li C, Ma L, Zou D, Zhang R, Bai X, Li L, Wu G, Huang T, Zhao W, Jin E, Bao Y, Song S. RCoV19: A One-stop Hub for SARS-CoV-2 Genome Data Integration, Variant Monitoring, and Risk Pre-warning. Genomics, Proteomics & Bioinformatics 2023; 21:1066-1079. [PMID: 37898309 PMCID: PMC10928372 DOI: 10.1016/j.gpb.2023.10.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 10/17/2023] [Accepted: 10/19/2023] [Indexed: 10/30/2023]
Abstract
The Resource for Coronavirus 2019 (RCoV19) is an open-access information resource dedicated to providing valuable data on the genomes, mutations, and variants of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In this updated implementation of RCoV19, we have made significant improvements and advancements over the previous version. Firstly, we have implemented a highly refined genome data curation model. This model now features an automated integration pipeline and optimized curation rules, enabling efficient daily updates of data in RCoV19. Secondly, we have developed a global and regional lineage evolution monitoring platform, alongside an outbreak risk pre-warning system. These additions provide a comprehensive understanding of SARS-CoV-2 evolution and transmission patterns, enabling better preparedness and response strategies. Thirdly, we have developed a powerful interactive mutation spectrum comparison module. This module allows users to compare and analyze mutation patterns, assisting in the detection of potential new lineages. Furthermore, we have incorporated a comprehensive knowledgebase on mutation effects. This knowledgebase serves as a valuable resource for retrieving information on the functional implications of specific mutations. In summary, RCoV19 serves as a vital scientific resource, providing access to valuable data, relevant information, and technical support in the global fight against COVID-19. The complete contents of RCoV19 are available to the public at https://ngdc.cncb.ac.cn/ncov/.
Affiliation(s)
- Cuiping Li
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Lina Ma
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Dong Zou
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Rongqin Zhang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xue Bai
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Lun Li
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Gangao Wu
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Tianhao Huang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Wei Zhao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Enhui Jin
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yiming Bao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
| | - Shuhui Song
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100049, China.
|
10
|
Li J, Wei C, Zhou T, Mo C, Wang G, He F, Wang P, Qin L, Peng F. A display and analysis platform for gut microbiomes of minority people and phenotypic data in China. Sci Rep 2023; 13:14247. [PMID: 37648696 PMCID: PMC10469205 DOI: 10.1038/s41598-023-36754-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Accepted: 06/09/2023] [Indexed: 09/01/2023] Open
Abstract
The minority people panmicrobial community database (MPPCD; website: http://mppmcdb.cloudna.cn/) is the first microbe-disease association database of Chinese ethnic minorities. To study the relationships between intestinal microbes and disease or health in ethnic minorities, we also collected microbial data from the Han people for comparison. Based on data such as age among the different ethnic groups in different regions of Sichuan Province, MPPCD provides not only the gut microbial composition but also relative abundance values at the phylum, class, order, family and genus levels in different groups. In addition, differential analysis of microbes between two groups was performed, which helps to explore differences in intestinal microbial structure between the groups. A series of related factors, including age, sex, body mass index, ethnicity, physical condition, and living altitude, are included in MPPCD, with a special focus on living altitude. To date, this is the first intestinal microbe database to introduce altitude features. In conclusion, we hope that MPPCD will serve as fundamental research support for studying the relationship between human gut microbes and host health and disease, especially in ethnic minorities.
Affiliation(s)
- Jun Li
- Department of Gastroenterology, The First Affiliated Hospital of Chengdu Medical College, 278# Bao Guang Road, Xindu District, Chengdu, 610000, Sichuan, People's Republic of China.
| | - Chunxue Wei
- Department of Gastroenterology, The First Affiliated Hospital of Chengdu Medical College, 278# Bao Guang Road, Xindu District, Chengdu, 610000, Sichuan, People's Republic of China
| | - Ting Zhou
- Department of Gastroenterology, The Sixth People's Hospital of Chengdu, Chengdu, Sichuan, China
| | - Chunfen Mo
- Department of Immunology, School of Basic Medical Sciences, Chengdu Medical College, Chengdu, Sichuan, China
| | - Guanjun Wang
- Department of Gastroenterology, The First Affiliated Hospital of Chengdu Medical College, 278# Bao Guang Road, Xindu District, Chengdu, 610000, Sichuan, People's Republic of China
| | - Feng He
- Department of Gastroenterology, The First Affiliated Hospital of Chengdu Medical College, 278# Bao Guang Road, Xindu District, Chengdu, 610000, Sichuan, People's Republic of China
| | - Pengyu Wang
- College of Pharmacy, Chengdu Medical College, Chengdu, Sichuan, China
| | - Ling Qin
- Department of Gastroenterology, The First Affiliated Hospital of Chengdu Medical College, 278# Bao Guang Road, Xindu District, Chengdu, 610000, Sichuan, People's Republic of China
| | - Fujun Peng
- Institute of Basic Medicine, Weifang Medical University, 7166# Baotong West Road, Weifang, 261053, Shandong, People's Republic of China.
|
11
|
Liu Y, Chen L, Ma T, Li X, Zheng M, Zhou X, Chen L, Qian X, Xi J, Lu H, Cao H, Ma X, Bian B, Zhang P, Wu J, Gan R, Jia B, Sun L, Ju Z, Gao Y, Wen T, Chen T. EasyAmplicon: An easy-to-use, open-source, reproducible, and community-based pipeline for amplicon data analysis in microbiome research. iMeta 2023; 2:e83. [PMID: 38868346 PMCID: PMC10989771 DOI: 10.1002/imt2.83] [Citation(s) in RCA: 43] [Impact Index Per Article: 43.0] [Received: 10/10/2022] [Revised: 01/01/2023] [Accepted: 01/10/2023] [Indexed: 06/14/2024]
Abstract
It is difficult for beginners to learn and use amplicon analysis software because there are so many software tools to choose from, and all of them need multiple steps of operation. Herein, we provide a cross-platform, open-source, and community-supported analysis pipeline EasyAmplicon. EasyAmplicon has most of the modules needed for an amplicon analysis, including data quality control, merging of paired-end reads, dereplication, clustering or denoising, chimera detection, generation of feature tables, taxonomic diversity analysis, compositional analysis, biomarker discovery, and publication-quality visualization. EasyAmplicon includes more than 30 cross-platform modules and R packages commonly used in the field. All steps of the pipeline are integrated into RStudio, which reduces learning costs, keeps the flexibility of the analysis process, and facilitates personalized analysis. The pipeline is maintained and updated by the authors and editors of WeChat official account "Meta-genome." Our team will regularly release the latest tutorials both in Chinese and English, read the feedback from users, and provide help to them in the WeChat account and GitHub. The pipeline can be deployed on various platforms, and the installation time is less than half an hour. On an ordinary laptop, the whole analysis process for dozens of samples can be completed within 3 h. The pipeline is available at GitHub (https://github.com/YongxinLiu/EasyAmplicon) and Gitee (https://gitee.com/YongxinLiu/EasyAmplicon).
Affiliation(s)
- Yong‐Xin Liu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Lei Chen
- Department of Vascular Surgery, Fu Xing Hospital, Capital Medical University, Beijing, China
| | - Tengfei Ma
- State Key Laboratory of Grassland Agro‐ecosystems, Centre for Grassland Microbiome, College of Pastoral Agricultural Science and Technology, Lanzhou University, Lanzhou, Gansu, China
| | - Xiaofang Li
- Centre for Agricultural Resources Research, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Shijiazhuang, China
| | - Maosheng Zheng
- College of Environmental Science and Engineering, North China Electric Power University, Beijing, China
| | - Xin Zhou
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Liang Chen
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Xubo Qian
- Department of Pediatrics, Affiliated Jinhua Hospital, Zhejiang University School of Medicine, Jinhua, Zhejiang, China
| | - Jiao Xi
- College of Natural Resources and Environment, Northwest A&F University, Yangling, Shaanxi, China
| | - Hongye Lu
- Key Laboratory of Oral Biomedical Research of Zhejiang Province, Cancer Center of Zhejiang University, Clinical Research Center for Oral Diseases of Zhejiang Province, School of Stomatology, Zhejiang University School of Medicine, Stomatology Hospital, Hangzhou, Zhejiang, China
| | - Huiluo Cao
- Department of Microbiology, University of Hong Kong, Hong Kong, China
| | - Xiaoya Ma
- Center of Excellence in Fungal Research, Mae Fah Luang University, Chiang Rai, Thailand
| | - Bian Bian
- Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan
| | - Pengfan Zhang
- Department of Plant‐Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany
| | - Jiqiu Wu
- APC Microbiome Institute, University College Cork, Cork, Ireland
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Ren‐You Gan
- Singapore Institute of Food and Biotechnology Innovation (SIFBI), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Baolei Jia
- Department of Life Science, Chung‐Ang University, Seoul, Republic of Korea
| | - Linyang Sun
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Zhicheng Ju
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Yunyun Gao
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Tao Wen
- The Key Laboratory of Plant Immunity, Jiangsu Provincial Key Lab for Organic Solid Waste Utilization, Jiangsu Collaborative Innovation Center for Solid Organic Waste Resource Utilization, National Engineering Research Center for Organic‐Based Fertilizers, Nanjing Agricultural University, Nanjing, China
| | - Tong Chen
- National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
|
12
|
Lyu F, Han F, Ge C, Mao W, Chen L, Hu H, Chen G, Lang Q, Fang C. OmicStudio: A composable bioinformatics cloud platform with real-time feedback that can generate high-quality graphs for publication. iMeta 2023; 2:e85. [PMID: 38868333 PMCID: PMC10989813 DOI: 10.1002/imt2.85] [Citation(s) in RCA: 46] [Impact Index Per Article: 46.0] [Received: 08/12/2022] [Revised: 01/06/2023] [Accepted: 01/12/2023] [Indexed: 06/14/2024]
Abstract
OmicStudio focuses on speed, quality, and flexibility. It not only meets users' needs for routine bioinformatics data analysis, statistics, and visualization, but also gives them the freedom to mine data beyond the developers' framework. Additionally, users are not limited to the developers' aesthetics and can produce more elegant graphs through customization. Available online at https://www.omicstudio.cn.
Affiliation(s)
- Fengye Lyu
- Operation Department, LC‐Bio Technology Co., Ltd., Hangzhou, China
| | - Feiran Han
- Operation Department, LC‐Bio Technology Co., Ltd., Hangzhou, China
| | - Changli Ge
- Operation Department, LC‐Bio Technology Co., Ltd., Hangzhou, China
| | - Weikang Mao
- Operation Department, LC‐Bio Technology Co., Ltd., Hangzhou, China
| | - Li Chen
- Operation Department, LC‐Bio Technology Co., Ltd., Hangzhou, China
| | - Huipeng Hu
- Operation Department, LC‐Bio Technology Co., Ltd., Hangzhou, China
| | - Guoguo Chen
- Operation Department, LC‐Bio Technology Co., Ltd., Hangzhou, China
| | - Qiulei Lang
- Operation Department, LC‐Bio Technology Co., Ltd., Hangzhou, China
| | - Chao Fang
- Operation Department, LC‐Bio Technology Co., Ltd., Hangzhou, China
|
13
|
Li M, Liu J, Zhu J, Wang H, Sun C, Gao NL, Zhao XM, Chen WH. Performance of Gut Microbiome as an Independent Diagnostic Tool for 20 Diseases: Cross-Cohort Validation of Machine-Learning Classifiers. Gut Microbes 2023; 15:2205386. [PMID: 37140125 PMCID: PMC10161951 DOI: 10.1080/19490976.2023.2205386] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 05/05/2023] Open
Abstract
Cross-cohort validation is essential for gut-microbiome-based disease stratification but was only performed for limited diseases. Here, we systematically evaluated the cross-cohort performance of gut microbiome-based machine-learning classifiers for 20 diseases. Using single-cohort classifiers, we obtained high predictive accuracies in intra-cohort validation (~0.77 AUC), but low accuracies in cross-cohort validation, except the intestinal diseases (~0.73 AUC). We then built combined-cohort classifiers trained on samples combined from multiple cohorts to improve the validation of non-intestinal diseases, and estimated the required sample size to achieve validation accuracies of >0.7. In addition, we observed higher validation performance for classifiers using metagenomic data than 16S amplicon data in intestinal diseases. We further quantified the cross-cohort marker consistency using a Marker Similarity Index and observed similar trends. Together, our results supported the gut microbiome as an independent diagnostic tool for intestinal diseases and revealed strategies to improve cross-cohort performance based on identified determinants of consistent cross-cohort gut microbiome alterations.
Affiliation(s)
- Min Li
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Jinxin Liu
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
| | - Jiaying Zhu
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Huarui Wang
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Chuqing Sun
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Na L Gao
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Xing-Ming Zhao
- Department of Neurology, Zhongshan Hospital, Fudan University, Shanghai, China
- State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- International Human Phenome Institutes (Shanghai), Shanghai, China
| | - Wei-Hua Chen
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- College of Life Science, Henan Normal University, Xinxiang, China
- Institution of Medical Artificial Intelligence, Binzhou Medical University, Yantai, China
|
14
|
Baltoumas FA, Karatzas E, Paez-Espino D, Venetsianou NK, Aplakidou E, Oulas A, Finn RD, Ovchinnikov S, Pafilis E, Kyrpides NC, Pavlopoulos GA. Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters. Frontiers in Bioinformatics 2023; 3:1157956. [PMID: 36959975 PMCID: PMC10029925 DOI: 10.3389/fbinf.2023.1157956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 02/21/2023] [Indexed: 03/06/2023] Open
Abstract
Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.
Affiliation(s)
- Fotis A. Baltoumas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- *Correspondence: Fotis A. Baltoumas; Nikos C. Kyrpides; Georgios A. Pavlopoulos
| | - Evangelos Karatzas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - David Paez-Espino
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
| | - Nefeli K. Venetsianou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - Eleni Aplakidou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - Anastasis Oulas
- The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - Robert D. Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
| | - Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, United States
| | - Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
| | - Nikos C. Kyrpides
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
| | - Georgios A. Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Center of New Biotechnologies and Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Athens, Greece
- Hellenic Army Academy, Vari, Greece
|
15
|
Nata’ala MK, Avila Santos AP, Coelho Kasmanas J, Bartholomäus A, Saraiva JP, Godinho Silva S, Keller-Costa T, Costa R, Gomes NCM, Ponce de Leon Ferreira de Carvalho AC, Stadler PF, Sipoli Sanches D, Nunes da Rocha U. MarineMetagenomeDB: a public repository for curated and standardized metadata for marine metagenomes. Environmental Microbiome 2022; 17:57. [PMID: 36401317 PMCID: PMC9675116 DOI: 10.1186/s40793-022-00449-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/08/2022] [Accepted: 09/15/2022] [Indexed: 05/17/2023]
Abstract
BACKGROUND Metagenomics is an expanding field within microbial ecology, microbiology, and related disciplines. The number of metagenomes deposited in major public repositories such as Sequence Read Archive (SRA) and Metagenomic Rapid Annotations using Subsystems Technology (MG-RAST) is rising exponentially. However, data mining and interpretation can be challenging due to mis-annotated and misleading metadata entries. In this study, we describe the Marine Metagenome Metadata Database (MarineMetagenomeDB) to help researchers identify marine metagenomes of interest for re-analysis and meta-analysis. To this end, we have manually curated the associated metadata of several thousands of microbial metagenomes currently deposited at SRA and MG-RAST. RESULTS In total, 125 terms were curated according to 17 different classes (e.g., biome, material, oceanic zone, geographic feature and oceanographic phenomena). Other standardized features include sample attributes (e.g., salinity, depth), sample location (e.g., latitude, longitude), and sequencing features (e.g., sequencing platform, sequence count). MarineMetagenomeDB version 1.0 contains 11,449 marine metagenomes from SRA and MG-RAST distributed across all oceans and several seas. Most samples were sequenced using Illumina sequencing technology (84.33%). More than 55% of the samples were collected from the Pacific and the Atlantic Oceans. About 40% of the samples had their biomes assigned as 'ocean'. The 'Quick Search' and 'Advanced Search' tabs allow users to use different filters to select samples of interest dynamically in the web app. The interactive map allows the visualization of samples based on their location on the world map. The web app is also equipped with a novel download tool (on both Windows and Linux operating systems) that allows easy download of raw sequence data of selected samples from their respective repositories. As a use case, we demonstrated how to use the MarineMetagenomeDB web app to select estuarine metagenomes for potential large-scale microbial biogeography studies. CONCLUSION The MarineMetagenomeDB is a powerful resource for non-bioinformaticians to find marine metagenome samples with curated metadata and stimulate meta-studies involving marine microbiomes. Our user-friendly web app is publicly available at https://webapp.ufz.de/marmdb/.
Affiliation(s)
- Muhammad Kabiru Nata’ala
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research – UFZ GmbH, 04318 Leipzig, Saxony Germany
- Department of Computer Science and Interdisciplinary Centre of Bioinformatics, University of Leipzig, 04107 Leipzig, Saxony Germany
| | - Anderson P. Avila Santos
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research – UFZ GmbH, 04318 Leipzig, Saxony Germany
- Institute of Mathematics and Computer Sciences, University of Sao Paulo, São Carlos, Brazil
| | - Jonas Coelho Kasmanas
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research – UFZ GmbH, 04318 Leipzig, Saxony Germany
- Department of Computer Science and Interdisciplinary Centre of Bioinformatics, University of Leipzig, 04107 Leipzig, Saxony Germany
- Institute of Mathematics and Computer Sciences, University of Sao Paulo, São Carlos, Brazil
| | - Alexander Bartholomäus
- Section 3.7 Geomicrobiology, GFZ German Research Centre for Geosciences, 14473 Telegrafenberg, Potsdam Germany
| | - João Pedro Saraiva
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research – UFZ GmbH, 04318 Leipzig, Saxony Germany
| | - Sandra Godinho Silva
- Department of Bioengineering and Institute for Bioengineering and Biosciences, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal
| | - Tina Keller-Costa
- Department of Bioengineering and Institute for Bioengineering and Biosciences, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal
| | - Rodrigo Costa
- Department of Bioengineering and Institute for Bioengineering and Biosciences, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal
| | - Newton C. M. Gomes
- Department of Biology and Centre for Environmental and Marine Studies (CESAM), University of Aveiro, 3810-193 Aveiro, Portugal
| | | | - Peter F. Stadler
- Department of Computer Science and Interdisciplinary Centre of Bioinformatics, University of Leipzig, 04107 Leipzig, Saxony Germany
| | | | - Ulisses Nunes da Rocha
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research – UFZ GmbH, 04318 Leipzig, Saxony Germany
|
16
|
Review of the Current State of Freely Accessible Web Tools for the Analysis of 16S rRNA Sequencing of the Gut Microbiome. Int J Mol Sci 2022; 23:ijms231810865. [PMID: 36142775 PMCID: PMC9501225 DOI: 10.3390/ijms231810865] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/12/2022] [Revised: 09/13/2022] [Accepted: 09/15/2022] [Indexed: 11/16/2022] Open
Abstract
Owing to the emergence and improvement of high-throughput technology and the associated reduction in costs, next-generation sequencing (NGS) technology has made large-scale sampling and sequencing possible. With the large volume of data produced, the processing and downstream analysis of data are important for ensuring meaningful results and interpretation. Problems in data analysis may be encountered if researchers have little experience in using programming languages, especially if they are clinicians and beginners in the field. A strategy for solving this problem involves ensuring easy access to commercial software and tools. Here, we observed the current status of free web-based tools for microbiome analysis that can help users analyze and handle microbiome data effortlessly. We limited our search to freely available web-based tools and identified MicrobiomeAnalyst, Mian, gcMeta, VAMPS, and Microbiome Toolbox. We also highlighted the various analyses that each web tool offers, how users can analyze their data using each web tool, and noted some of their limitations. From the abovementioned list, gcMeta, VAMPS, and Microbiome Toolbox had several issues that made the analysis more difficult. Over time, as more data are generated and accessed, more users will analyze microbiome data. Thus, the availability of free and easily accessible web tools can enable the easy use and analysis of microbiome data, especially for those users with less experience in using command-line interfaces.
|
17
|
Agostinetto G, Bozzi D, Porro D, Casiraghi M, Labra M, Bruno A. SKIOME Project: a curated collection of skin microbiome datasets enriched with study-related metadata. Database (Oxford) 2022; 2022:6586378. [PMID: 35576001 PMCID: PMC9216470 DOI: 10.1093/database/baac033] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/08/2021] [Revised: 02/25/2022] [Accepted: 05/09/2022] [Indexed: 04/07/2023]
Abstract
Large amounts of data from microbiome-related studies have been (and are currently being) deposited on international public databases. These datasets represent a valuable resource for the microbiome research community and could serve future researchers interested in integrating multiple datasets into powerful meta-analyses. However, this huge amount of data lacks harmonization and it is far from being completely exploited in its full potential to build a foundation that places microbiome research at the nexus of many subdisciplines within and beyond biology. Thus, it urges the need for data accessibility and reusability, according to findable, accessible, interoperable and reusable (FAIR) principles, as supported by National Microbiome Data Collaborative and FAIR Microbiome. To tackle the challenge of accelerating discovery and advances in skin microbiome research, we collected, integrated and organized existing microbiome data resources from human skin 16S rRNA amplicon-sequencing experiments. We generated a comprehensive collection of datasets, enriched in metadata, and organized this information into data frames ready to be integrated into microbiome research projects and advanced post-processing analyses, such as data science applications (e.g. machine learning). Furthermore, we have created a data retrieval and curation framework built on three different stages to maximize the retrieval of datasets and metadata associated with them. Lastly, we highlighted some caveats regarding metadata retrieval and suggested ways to improve future metadata submissions. Overall, our work resulted in a curated skin microbiome datasets collection accompanied by a state-of-the-art analysis of the last 10 years of the skin microbiome field. Database URL: https://github.com/giuliaago/SKIOMEMetadataRetrieval.
Affiliation(s)
- Giulia Agostinetto
- *Corresponding authors: Giulia Agostinetto and Antonia Bruno. Tel: +0039 0264483413
| | - Danilo Porro
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza, 2, Milan 20126, Italy
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), via Fratelli Cervi, 93, Segrate (MI) 20054, Italy
| | - Maurizio Casiraghi
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza, 2, Milan 20126, Italy
| | - Massimo Labra
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza, 2, Milan 20126, Italy
| | - Antonia Bruno
- *Corresponding authors: Giulia Agostinetto and Antonia Bruno. Tel: +0039 0264483413
| |
|
18
|
Chen Z, Azman AS, Chen X, Zou J, Tian Y, Sun R, Xu X, Wu Y, Lu W, Ge S, Zhao Z, Yang J, Leung DT, Domman DB, Yu H. Global landscape of SARS-CoV-2 genomic surveillance and data sharing. Nat Genet 2022; 54:499-507. [PMID: 35347305 PMCID: PMC9005350 DOI: 10.1038/s41588-022-01033-y] [Citation(s) in RCA: 116] [Impact Index Per Article: 58.0] [Received: 09/22/2021] [Accepted: 02/11/2022] [Indexed: 12/02/2022]
Abstract
Genomic surveillance has shaped our understanding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants. We performed a global landscape analysis on SARS-CoV-2 genomic surveillance and genomic data using a collection of country-specific data. Here, we characterize increasing circulation of the Alpha variant in early 2021, subsequently replaced by the Delta variant around May 2021. SARS-CoV-2 genomic surveillance and sequencing availability varied markedly across countries, with 45 countries performing a high level of routine genomic surveillance and 96 countries with a high availability of SARS-CoV-2 sequencing. We also observed a marked heterogeneity of sequencing percentage, sequencing technologies, turnaround time and completeness of released metadata across regions and income groups. A total of 37% of countries with explicit reporting on variants shared less than half of their sequences of variants of concern (VOCs) in public repositories. Our findings indicate an urgent need to increase timely and full sharing of sequences, the standardization of metadata files and support for countries with limited sequencing and bioinformatics capacity.
Affiliation(s)
- Zhiyuan Chen
- Department of Infectious Diseases, Huashan Hospital, School of Public Health, Fudan University, Shanghai, China
- Key Laboratory of Public Health Safety, Fudan University, Ministry of Education, Shanghai, China
| | - Andrew S Azman
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Institute of Global Health, Faculty of Medicine, University of Geneva, Geneva, Switzerland
| | - Xinhua Chen
- Department of Infectious Diseases, Huashan Hospital, School of Public Health, Fudan University, Shanghai, China
- Key Laboratory of Public Health Safety, Fudan University, Ministry of Education, Shanghai, China
| | - Junyi Zou
- Department of Infectious Diseases, Huashan Hospital, School of Public Health, Fudan University, Shanghai, China
- Key Laboratory of Public Health Safety, Fudan University, Ministry of Education, Shanghai, China
| | - Yuyang Tian
- Department of Infectious Diseases, Huashan Hospital, School of Public Health, Fudan University, Shanghai, China
- Key Laboratory of Public Health Safety, Fudan University, Ministry of Education, Shanghai, China
| | - Ruijia Sun
- Department of Infectious Diseases, Huashan Hospital, School of Public Health, Fudan University, Shanghai, China
- Key Laboratory of Public Health Safety, Fudan University, Ministry of Education, Shanghai, China
| | - Xiangyanyu Xu
- Department of Infectious Diseases, Huashan Hospital, School of Public Health, Fudan University, Shanghai, China
- Key Laboratory of Public Health Safety, Fudan University, Ministry of Education, Shanghai, China
| | - Yani Wu
- School of Public Health, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
| | - Wanying Lu
- Department of Infectious Diseases, Huashan Hospital, School of Public Health, Fudan University, Shanghai, China
- Key Laboratory of Public Health Safety, Fudan University, Ministry of Education, Shanghai, China
| | - Shijia Ge
- Department of Infectious Diseases, Huashan Hospital, School of Public Health, Fudan University, Shanghai, China
| | - Zeyao Zhao
- Department of Infectious Diseases, Huashan Hospital, School of Public Health, Fudan University, Shanghai, China
- Key Laboratory of Public Health Safety, Fudan University, Ministry of Education, Shanghai, China
| | - Juan Yang
- Department of Infectious Diseases, Huashan Hospital, School of Public Health, Fudan University, Shanghai, China
- Key Laboratory of Public Health Safety, Fudan University, Ministry of Education, Shanghai, China
| | - Daniel T Leung
- Division of Infectious Diseases, University of Utah School of Medicine, Salt Lake City, UT, USA
- Division of Microbiology & Immunology, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Daryl B Domman
- Center for Global Health, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM, USA
| | - Hongjie Yu
- Department of Infectious Diseases, Huashan Hospital, School of Public Health, Fudan University, Shanghai, China.
- Key Laboratory of Public Health Safety, Fudan University, Ministry of Education, Shanghai, China.
- Shanghai Institute of Infectious Disease and Biosecurity, Fudan University, Shanghai, China.
- National Medical Center for Infectious Diseases, Huashan Hospital, Fudan University, Shanghai, China.
| |
|
19
|
Chen Y, Li J, Zhang Y, Zhang M, Sun Z, Jing G, Huang S, Su X. Parallel-Meta Suite: Interactive and rapid microbiome data analysis on multiple platforms. IMETA 2022; 1:e1. [PMID: 38867729 PMCID: PMC10989749 DOI: 10.1002/imt2.1] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 11/30/2021] [Revised: 12/13/2021] [Accepted: 12/17/2021] [Indexed: 06/14/2024]
Abstract
Massive microbiome sequencing data has been generated, which elucidates associations between microbes and their environmental phenotypes such as host health or ecosystem status. Outstanding bioinformatic tools are the basis to decipher the biological information hidden under microbiome data. However, most approaches placed difficulties on the accessibility to nonprofessional users. On the other side, the computing throughput has become a significant bottleneck of many analytical pipelines in processing large-scale datasets. In this study, we introduce Parallel-Meta Suite (PMS), an interactive software package for fast and comprehensive microbiome data analysis, visualization, and interpretation. It covers a wide array of functions for data preprocessing, statistics, visualization by state-of-the-art algorithms in a user-friendly graphical interface, which is accessible to diverse users. To meet the rapidly increasing computational demands, the entire procedure of PMS has been optimized by a parallel computing scheme, enabling the rapid processing of thousands of samples. PMS is compatible with multiple platforms, and an installer has been integrated for full-automatic installation.
Affiliation(s)
- Yuzhu Chen
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
| | - Jian Li
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
| | - Yufeng Zhang
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
| | - Mingqian Zhang
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
| | - Zheng Sun
- Single‐Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
| | - Gongchao Jing
- Single‐Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
| | - Shi Huang
- Faculty of Dentistry, The University of Hong Kong, Hong Kong, Hong Kong SAR, China
| | - Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
- Single‐Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
| |
|
20
|
Abdill RJ, Adamowicz EM, Blekhman R. Public human microbiome data are dominated by highly developed countries. PLoS Biol 2022; 20:e3001536. [PMID: 35167588 PMCID: PMC8846514 DOI: 10.1371/journal.pbio.3001536] [Citation(s) in RCA: 74] [Impact Index Per Article: 37.0] [Received: 09/17/2021] [Accepted: 01/11/2022] [Indexed: 02/07/2023] Open
Abstract
The importance of sampling from globally representative populations has been well established in human genomics. In human microbiome research, however, we lack a full understanding of the global distribution of sampling in research studies. This information is crucial to better understand global patterns of microbiome-associated diseases and to extend the health benefits of this research to all populations. Here, we analyze the country of origin of all 444,829 human microbiome samples that are available from the world’s 3 largest genomic data repositories, including the Sequence Read Archive (SRA). The samples are from 2,592 studies of 19 body sites, including 220,017 samples of the gut microbiome. We show that more than 71% of samples with a known origin come from Europe, the United States, and Canada, including 46.8% from the US alone, despite the country representing only 4.3% of the global population. We also find that central and southern Asia is the most underrepresented region: Countries such as India, Pakistan, and Bangladesh account for more than a quarter of the world population but make up only 1.8% of human microbiome samples. These results demonstrate a critical need to ensure more global representation of participants in microbiome studies. The importance of sampling from globally representative populations has been well established in human genomics, but what about the microbiome? This study shows that metadata from almost half a million samples reveals worldwide human microbiome research is skewed heavily in favor of Europe and North America and excludes large but less developed nations in Asia and Africa.
Affiliation(s)
- Richard J. Abdill
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Elizabeth M. Adamowicz
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Ran Blekhman
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Department of Ecology, Evolution and Behavior, University of Minnesota, St. Paul, Minnesota, United States of America
| |
|
21
|
Ma L, Li H, Lan J, Hao X, Liu H, Wang X, Huang Y. Comprehensive analyses of bioinformatics applications in the fight against COVID-19 pandemic. Comput Biol Chem 2021; 95:107599. [PMID: 34773807 PMCID: PMC8560182 DOI: 10.1016/j.compbiolchem.2021.107599] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 08/12/2020] [Revised: 09/24/2021] [Accepted: 10/29/2021] [Indexed: 02/07/2023]
Abstract
Novel coronavirus disease 2019 (COVID-19) is a global pandemic caused by severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2), which can be transmitted from person to person. As of September 21, 2021, over 228 million cases were diagnosed as COVID-19 infection in more than 200 countries and regions worldwide. The death toll is more than 4.69 million and the mortality rate has reached about 2.05% as it has gradually become a global plague, and the numbers are growing. Therefore, it is important to gain a deeper understanding of the genome and protein characteristics, clinical diagnostics, pathogenic mechanisms, and the development of antiviral drugs and vaccines against the novel coronavirus to deal with the COVID-19 pandemic. The traditional biology technologies are limited for COVID-19-related studies to understand the pandemic happening. Bioinformatics is the application of computational methods and analytical tools in the field of biological research which has obvious advantages in predicting the structure, product, function, and evolution of unknown genes and proteins, and in screening drugs and vaccines from a large amount of sequence information. Here, we comprehensively summarized several of the most important methods and applications relating to COVID-19 based on currently available reports of bioinformatics technologies, focusing on future research for overcoming the virus pandemic. Based on the next-generation sequencing (NGS) and third-generation sequencing (TGS) technology, not only virus can be detected, but also high quality SARS-CoV-2 genome could be obtained quickly. The emergence of data of genome sequences, variants, haplotypes of SARS-CoV-2 help us to understand genome and protein structure, variant calling, mutation, and other biological characteristics. After sequencing alignment and phylogenetic analysis, the bat may be the natural host of the novel coronavirus. 
Single-cell RNA sequencing provides an abundant resource for discovering the mechanism of the immune response induced by COVID-19. As an entry receptor, angiotensin-converting enzyme 2 (ACE2) can be used as a potential drug target to treat COVID-19. Molecular dynamics simulation, molecular docking, and artificial intelligence (AI) methods based on drug databases for SARS-CoV-2 can accelerate the development of drugs. Meanwhile, computational approaches help to identify suitable vaccines to prevent COVID-19 infection through reverse vaccinology, immunoinformatics, and structural vaccinology.
Affiliation(s)
- Lifei Ma
- State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
- College of Lab Medicine, Hebei North University, Zhangjiakou, Hebei 075000, China
- Corresponding author at: State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
| | - Huiyang Li
- Tianjin Key Laboratory of Biomaterial Research, Institute of Biomedical Engineering, Chinese Academy of Medical Science and Peking Union Medical College, Tianjin 300192, China
| | - Jinping Lan
- College of Lab Medicine, Hebei North University, Zhangjiakou, Hebei 075000, China
| | - Xiuqing Hao
- The First Affiliated Hospital of Hebei North University, Zhangjiakou, Hebei 075000, China
| | - Huiying Liu
- The First Affiliated Hospital of Hebei North University, Zhangjiakou, Hebei 075000, China
| | - Xiaoman Wang
- State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
- Corresponding authors
| | - Yong Huang
- College of Lab Medicine, Hebei North University, Zhangjiakou, Hebei 075000, China
- Corresponding authors
| |
|
22
|
Dai D, Zhu J, Sun C, Li M, Liu J, Wu S, Ning K, He LJ, Zhao XM, Chen WH. GMrepo v2: a curated human gut microbiome database with special focus on disease markers and cross-dataset comparison. Nucleic Acids Res 2021; 50:D777-D784. [PMID: 34788838 PMCID: PMC8728112 DOI: 10.1093/nar/gkab1019] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 07/28/2021] [Revised: 10/05/2021] [Accepted: 10/13/2021] [Indexed: 02/07/2023] Open
Abstract
GMrepo (data repository for Gut Microbiota) is a database of curated and consistently annotated human gut metagenomes. Its main purposes are to increase the reusability and accessibility of human gut metagenomic data, and enable cross-project and phenotype comparisons. To achieve these goals, we performed manual curation on the meta-data and organized the datasets in a phenotype-centric manner. GMrepo v2 contains 353 projects and 71,642 runs/samples, which are significantly increased from the previous version. Among these runs/samples, 45,111 and 26,531 were obtained by 16S rRNA amplicon and whole-genome metagenomics sequencing, respectively. We also increased the number of phenotypes from 92 to 133. In addition, we introduced disease-marker identification and cross-project/phenotype comparison. We first identified disease markers between two phenotypes (e.g. health versus diseases) on a per-project basis for selected projects. We then compared the identified markers for each phenotype pair across datasets to facilitate the identification of consistent microbial markers across datasets. Finally, we provided a marker-centric view to allow users to check if a marker has different trends in different diseases. So far, GMrepo includes 592 marker taxa (350 species and 242 genera) for 47 phenotype pairs, identified from 83 selected projects. GMrepo v2 is freely available at: https://gmrepo.humangut.info.
Affiliation(s)
- Die Dai
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Jiaying Zhu
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Chuqing Sun
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Min Li
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Jinxin Liu
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
| | - Sicheng Wu
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Kang Ning
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Li-Jie He
- Department of Oncology, The People's Hospital of Liaoning Province, People's Hospital of China Medical University, Shenyang 110016, China
| | - Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, China
- Research Institute of Intelligent Complex System, Fudan University, Shanghai 200433, China
| | - Wei-Hua Chen
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Institution of Medical Artificial Intelligence, Binzhou Medical University, Yantai 264003, China
| |
|
23
|
Shao L, Liao J, Qian J, Chen W, Fan X. MetaGeneBank: a standardized database to study deep sequenced metagenomic data from human fecal specimen. BMC Microbiol 2021; 21:263. [PMID: 34592929 PMCID: PMC8485520 DOI: 10.1186/s12866-021-02321-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/18/2021] [Accepted: 08/23/2021] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND Microbiome big data from population-scale cohorts holds the key to unleash the power of microbiomes to overcome critical challenges in disease control, treatment and precision medicine. However, variations introduced during data generation and processing limit the comparisons among independent studies in respect of interpretability. Although multiple databases have been constructed as platforms for data reuse, they are of limited value since only raw sequencing files are considered. DESCRIPTION Here, we present MetaGeneBank, a standardized database that provides details on sample collection and sequencing, and abundances of genes, microbiota and molecular functions for 4470 raw sequencing files (over 12 TB) collected from 16 studies covering over 10 types of diseases and 14 countries using a unified data-processing pipeline. The incorporation of tools that enable browsing and searching with descriptive attributes, gene sequences, microbiota and functions makes the database user-friendly. We found that the source of specimen contributes more than sequencing centers or platforms to the variations of microbiota. Special attention should be paid when re-analyzing sequencing files from different countries. CONCLUSIONS Collectively, MetaGeneBank provides a gateway to utilize the untapped potential of gut metagenomic data in helping fighting against human diseases. With the continuous updating of the database in terms of data volume, data types and sample types, MetaGeneBank would undoubtedly be the benchmarking database in the future in respect of data reuse, and would be valuable in translational science.
Affiliation(s)
- Li Shao
- Hangzhou Normal University, Institute of Translational Medicine, The Affiliated Hospital of Hangzhou Normal University, Hangzhou, 311121, Zhejiang, China
- iMedicine Lab, Alibaba-Zhejiang University Joint Research Center for Future Digital Health, Hangzhou, 310018, Zhejiang, China
| | - Jie Liao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310003, China
| | - Jingyang Qian
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310003, China
| | - Wenbin Chen
- The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310003, Zhejiang, China
| | - Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310003, China
- iMedicine Lab, Alibaba-Zhejiang University Joint Research Center for Future Digital Health, Hangzhou, 310018, Zhejiang, China
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, 310058, China
| |
|
24
|
Yu H, Chen Z, Azman A, Chen X, Zou J, Tian Y, Sun R, Xu X, Wu Y, Lu W, Ge S, Zhao Z, Yang J, Leung D, Domman D. Global landscape of SARS-CoV-2 genomic surveillance, public availability extent of genomic data, and epidemic shaped by variants. RESEARCH SQUARE 2021:rs.3.rs-927070. [PMID: 34611660 PMCID: PMC8491853 DOI: 10.21203/rs.3.rs-927070/v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/27/2022]
Abstract
Genomic surveillance has shaped our understanding of SARS-CoV-2 variants, which have proliferated globally in 2021. We collected country-specific data on SARS-CoV-2 genomic surveillance, sequencing capabilities, public genomic data from multiple public repositories, and aggregated publicly available variant data. Then, different proxies were used to estimate the sequencing coverage and public availability extent of genomic data, in addition to describing the global dissemination of variants. We found that the COVID-19 global epidemic clearly featured increasing circulation of Alpha since the start of 2021, which was rapidly replaced by the Delta variant starting around May 2021. SARS-CoV-2 genomic surveillance and sequencing availability varied markedly across countries, with 63 countries performing routine genomic surveillance and 79 countries with high availability of SARS-CoV-2 sequencing. We also observed a marked heterogeneity of sequenced coverage across regions and countries. Across different variants, 21-46% of countries with explicit reporting on variants shared less than half of their variant sequences in public repositories. Our findings indicated an urgent need to expand sequencing capacity of virus isolates, enhance the sharing of sequences, the standardization of metadata files, and supportive networks for countries with no sequencing capability.
Affiliation(s)
- Andrew Azman
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Daniel Leung
- University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Daryl Domman
- Center for Global Health, Department of Internal Medicine, University of New Mexico Health Sciences Center, New Mexico, USA
| |
|
25
|
Rachtman E, Bafna V, Mirarab S. CONSULT: accurate contamination removal using locality-sensitive hashing. NAR Genom Bioinform 2021; 3:lqab071. [PMID: 34377979 PMCID: PMC8340999 DOI: 10.1093/nargab/lqab071] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/10/2021] [Revised: 06/30/2021] [Accepted: 07/19/2021] [Indexed: 12/27/2022] Open
Abstract
A fundamental question appears in many bioinformatics applications: Does a sequencing read belong to a large dataset of genomes from some broad taxonomic group, even when the closest match in the set is evolutionarily divergent from the query? For example, low-coverage genome sequencing (skimming) projects either assemble the organelle genome or compute genomic distances directly from unassembled reads. Using unassembled reads needs contamination detection because samples often include reads from unintended groups of species. Similarly, assembling the organelle genome needs distinguishing organelle and nuclear reads. While k-mer-based methods have shown promise in read-matching, prior studies have shown that existing methods are insufficiently sensitive for contamination detection. Here, we introduce a new read-matching tool called CONSULT that tests whether k-mers from a query fall within a user-specified distance of the reference dataset using locality-sensitive hashing. Taking advantage of large memory machines available nowadays, CONSULT libraries accommodate tens of thousands of microbial species. Our results show that CONSULT has higher true-positive and lower false-positive rates of contamination detection than leading methods such as Kraken-II and improves distance calculation from genome skims. We also demonstrate that CONSULT can distinguish organelle reads from nuclear reads, leading to dramatic improvements in skim-based mitochondrial assemblies.
Affiliation(s)
- Eleonora Rachtman
- Bioinformatics and Systems Biology Graduate Program, UC San Diego, CA 92093, USA
| | - Vineet Bafna
- Department of Computer Science and Engineering, UC San Diego, CA 92093, USA
| | - Siavash Mirarab
- Department of Electrical and Computer Engineering, UC San Diego, CA 92093, USA
| |
|
26
|
Sun L, Bao L, Phurbu D, Qiao S, Sun S, Perma Y, Liu H. Amelioration of metabolic disorders by a mushroom-derived polyphenols correlates with the reduction of Ruminococcaceae in gut of DIO mice. FOOD SCIENCE AND HUMAN WELLNESS 2021. [DOI: 10.1016/j.fshw.2021.04.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/14/2022]
|
27
|
Liu YX, Qin Y, Chen T, Lu M, Qian X, Guo X, Bai Y. A practical guide to amplicon and metagenomic analysis of microbiome data. Protein Cell 2021; 12:315-330. [PMID: 32394199 PMCID: PMC8106563 DOI: 10.1007/s13238-020-00724-8] [Citation(s) in RCA: 326] [Impact Index Per Article: 108.7] [Received: 02/04/2020] [Accepted: 04/10/2020] [Indexed: 12/22/2022] Open
Abstract
Advances in high-throughput sequencing (HTS) have fostered rapid developments in the field of microbiome research, and massive microbiome datasets are now being generated. However, the diversity of software tools and the complexity of analysis pipelines make it difficult to access this field. Here, we systematically summarize the advantages and limitations of microbiome methods. Then, we recommend specific pipelines for amplicon and metagenomic analyses, and describe commonly-used software and databases, to help researchers select the appropriate tools. Furthermore, we introduce statistical and visualization methods suitable for microbiome analysis, including alpha- and beta-diversity, taxonomic composition, difference comparisons, correlation, networks, machine learning, evolution, source tracing, and common visualization styles to help researchers make informed choices. Finally, a step-by-step reproducible analysis guide is introduced. We hope this review will allow researchers to carry out data analysis more effectively and to quickly select the appropriate tools in order to efficiently mine the biological significance behind the data.
Affiliation(s)
- Yong-Xin Liu
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.
- CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing, 100049, China.
- CAS-JIC Centre of Excellence for Plant and Microbial Science, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.
| | - Yuan Qin
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing, 100049, China
- CAS-JIC Centre of Excellence for Plant and Microbial Science, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Tong Chen
- National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
| | - Meiping Lu
- Department of Rheumatology Immunology & Allergy, Children's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province, 310053, China
| | - Xubo Qian
- Department of Rheumatology Immunology & Allergy, Children's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province, 310053, China
| | - Xiaoxuan Guo
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing, 100049, China
- CAS-JIC Centre of Excellence for Plant and Microbial Science, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
| | - Yang Bai
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.
- CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing, 100049, China.
- CAS-JIC Centre of Excellence for Plant and Microbial Science, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.
- College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China.
| |
|
28
|
Yang J, Chun J. Taxonomic composition and variation in the gut microbiota of laboratory mice. Mamm Genome 2021; 32:297-310. [PMID: 33893864 DOI: 10.1007/s00335-021-09871-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/23/2020] [Accepted: 04/10/2021] [Indexed: 12/14/2022]
Abstract
The gut microbiota can affect host health, including humans. Mouse models have been used extensively to study the relationships between the host and the gut microbiota. With the development of cost-effective high-throughput DNA sequencing, several methods have been used to identify members of the gut microbiota of laboratory mice. In recent years, the amount of research and knowledge about the mouse gut microbiota has exploded, leading to significant breakthroughs in understanding of the taxonomic composition of and variation in this community. In addition, the rapidly increasing volume of data has allowed the development of public resources for exploring the mouse gut microbiota. In this review, we describe the concepts and pros and cons of basic methodologies that can be used to determine the gut bacterial profile in laboratory mice. We also present the key bacterial components of the mouse gut microbiota from the phylum to the species level and then compare them with those identified in other references. Additionally, we discuss variations in the mouse gut microbiota and their association with experiments using mice. Finally, we summarize the properties and functions of currently available public resources for exploring the mouse gut microbiota.
Affiliation(s)
- Junwon Yang
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Korea; Institute of Molecular Biology & Genetics, Seoul National University, Seoul, 08826, Korea; Department of Biological Sciences, Seoul National University, Seoul, 08826, Korea
| | - Jongsik Chun
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Korea; Institute of Molecular Biology & Genetics, Seoul National University, Seoul, 08826, Korea; Department of Biological Sciences, Seoul National University, Seoul, 08826, Korea
| |
|
29
|
Zeng T, Yu X, Chen Z. Applying artificial intelligence in the microbiome for gastrointestinal diseases: A review. J Gastroenterol Hepatol 2021; 36:832-840. [PMID: 33880762 DOI: 10.1111/jgh.15503] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/09/2021] [Revised: 03/18/2021] [Accepted: 03/18/2021] [Indexed: 12/20/2022]
Abstract
For a long time, gut bacteria have been recognized for their important roles in the occurrence and progression of gastrointestinal diseases like colorectal cancer, and the ever-increasing amounts of microbiome data combined with other high-quality clinical and imaging datasets are leading the study of gastrointestinal diseases into an era of biomedical big data. The "omics" technologies used for microbiome analysis continuously evolve, and the machine learning or artificial intelligence technologies are key to extract the relevant information from microbiome data. This review intends to provide a focused summary of recent research and applications of microbiome big data and to discuss the use of artificial intelligence to combat gastrointestinal diseases.
Affiliation(s)
- Tao Zeng
- CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, China
| | - Xiangtian Yu
- Clinical Research Center, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
| | - Zhangran Chen
- Institute for Microbial Ecology, School of Medicine, Xiamen University, Xiamen, China
| |
|
30
|
Tang J, Wu X, Mou M, Wang C, Wang L, Li F, Guo M, Yin J, Xie W, Wang X, Wang Y, Ding Y, Xue W, Zhu F. GIMICA: host genetic and immune factors shaping human microbiota. Nucleic Acids Res 2021; 49:D715-D722. [PMID: 33045729 PMCID: PMC7779047 DOI: 10.1093/nar/gkaa851] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 08/14/2020] [Revised: 09/09/2020] [Accepted: 10/08/2020] [Indexed: 01/09/2023] Open
Abstract
Besides the environmental factors that have tremendous impacts on the composition of the microbial community, host factors have recently gained extensive attention for their roles in shaping the human microbiota. There are two major types of host factors: host genetic factors (HGFs) and host immune factors (HIFs). Factors of each type are essential for defining the chemical and physical landscapes inhabited by microbiota, and the collective consideration of both types has great implications for comprehensive health management. However, no database was available to provide comprehensive factors of both types. Herein, a database entitled 'Host Genetic and Immune Factors Shaping Human Microbiota (GIMICA)' was constructed. Based on the 4257 microbes confirmed to inhabit nine sites of the human body, 2851 HGFs (1368 single nucleotide polymorphisms (SNPs), 186 copy number variations (CNVs), and 1297 non-coding ribonucleic acids (RNAs)) modulating the expression of 370 microbes were collected, and 549 HIFs (126 lymphocytes and phagocytes, 387 immune proteins, and 36 immune pathways) regulating the abundance of 455 microbes were also provided. All in all, GIMICA enables the collective consideration not only of different types of host factors but also of host and environmental ones, and is freely accessible without login requirement at: https://idrblab.org/gimica/.
Affiliation(s)
- Jing Tang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; College of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Xianglu Wu
- Joint International Research Lab of Reproductive and Development, Department of Reproductive Biology, School of Public Health, Chongqing Medical University, Chongqing 400016, China
| | - Minjie Mou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Chuan Wang
- College of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Lidan Wang
- College of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Fengcheng Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Maiyuan Guo
- College of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Jiayi Yin
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Wenqin Xie
- College of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Xiaona Wang
- School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
| | - Yingxiong Wang
- College of Basic Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Lab of Reproductive and Development, Department of Reproductive Biology, School of Public Health, Chongqing Medical University, Chongqing 400016, China
| | - Yubin Ding
- Joint International Research Lab of Reproductive and Development, Department of Reproductive Biology, School of Public Health, Chongqing Medical University, Chongqing 400016, China
| | - Weiwei Xue
- School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| |
|
31
|
Kasmanas JC, Bartholomäus A, Corrêa FB, Tal T, Jehmlich N, Herberth G, von Bergen M, Stadler PF, Carvalho ACPDLFD, Nunes da Rocha U. HumanMetagenomeDB: a public repository of curated and standardized metadata for human metagenomes. Nucleic Acids Res 2021; 49:D743-D750. [PMID: 33221926 PMCID: PMC7778935 DOI: 10.1093/nar/gkaa1031] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 08/03/2020] [Revised: 10/15/2020] [Accepted: 10/21/2020] [Indexed: 12/30/2022] Open
Abstract
Metagenomics became a standard strategy to comprehend the functional potential of microbial communities, including the human microbiome. Currently, the number of metagenomes in public repositories is increasing exponentially. The Sequence Read Archive (SRA) and the MG-RAST are the two main repositories for metagenomic data. These databases allow scientists to reanalyze samples and explore new hypotheses. However, mining samples from them can be a limiting factor, since the metadata available in these repositories is often misannotated, misleading, and decentralized, creating an overly complex environment for sample reanalysis. The main goal of the HumanMetagenomeDB is to simplify the identification and use of public human metagenomes of interest. HumanMetagenomeDB version 1.0 contains metadata of 69 822 metagenomes. We standardized 203 attributes, based on standardized ontologies, describing host characteristics (e.g. sex, age and body mass index), diagnosis information (e.g. cancer, Crohn's disease and Parkinson), location (e.g. country, longitude and latitude), sampling site (e.g. gut, lung and skin) and sequencing attributes (e.g. sequencing platform, average length and sequence quality). Further, HumanMetagenomeDB version 1.0 metagenomes encompass 58 countries, 9 main sample sites (i.e. body parts), 58 diagnoses and multiple ages, ranging from just born to 91 years old. The HumanMetagenomeDB is publicly available at https://webapp.ufz.de/hmgdb/.
Affiliation(s)
- Jonas Coelho Kasmanas
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil; Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, Leipzig, Saxony 04318, Germany; Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, Saxony 04107, Germany
| | - Alexander Bartholomäus
- GFZ German Research Centre for Geosciences, Section 3.7 Geomicrobiology, Telegrafenberg, 14473 Potsdam, Germany
| | - Felipe Borim Corrêa
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, Leipzig, Saxony 04318, Germany; Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, Saxony 04107, Germany
| | - Tamara Tal
- Department of Bioanalytical Ecotoxicology, Helmholtz Centre for Environmental Research - UFZ GmbH, Leipzig, Saxony 04318, Germany
| | - Nico Jehmlich
- Department of Molecular Systems Biology, Helmholtz Centre for Environmental Research - UFZ GmbH, Leipzig, Saxony 04318, Germany
| | - Gunda Herberth
- Department of Environmental Immunology, Helmholtz Centre for Environmental Research - UFZ GmbH, Leipzig, Saxony 04318, Germany
| | - Martin von Bergen
- Department of Molecular Systems Biology, Helmholtz Centre for Environmental Research - UFZ GmbH, Leipzig, Saxony 04318, Germany; Institute of Biochemistry, Faculty of Life Sciences, University of Leipzig, Leipzig, Saxony 04107, Germany
| | - Peter F Stadler
- Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, Saxony 04107, Germany
| | | | - Ulisses Nunes da Rocha
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, Leipzig, Saxony 04318, Germany
| |
|
32
|
Microbiome Search Engine 2: a Platform for Taxonomic and Functional Search of Global Microbiomes on the Whole-Microbiome Level. mSystems 2021; 6:6/1/e00943-20. [PMID: 33468706 PMCID: PMC7820668 DOI: 10.1128/msystems.00943-20] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/20/2022] Open
Abstract
Metagenomic data sets from diverse environments have been growing rapidly. To ensure accessibility and reusability, tools that quickly and informatively correlate new microbiomes with existing ones are in demand. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes in the global metagenome data space based on the taxonomic or functional similarity of a whole microbiome to those in the database. MSE 2 consists of (i) a well-organized and regularly updated microbiome database that currently contains over 250,000 metagenomic shotgun and 16S rRNA gene amplicon samples associated with unified metadata collected from 798 studies, (ii) an enhanced search engine that enables real-time and fast (<0.5 s per query) searches against the entire database for best-matched microbiomes using overall taxonomic or functional profiles, and (iii) a Web-based graphical user interface for user-friendly searching, data browsing, and tutoring. MSE 2 is freely accessible via http://mse.ac.cn. For standalone searches of customized microbiome databases, the kernel of the MSE 2 search engine is provided at GitHub (https://github.com/qibebt-bioinfo/meta-storms). IMPORTANCE A search-based strategy is useful for large-scale mining of microbiome data sets, such as a bird’s-eye view of the microbiome data space and disease diagnosis via microbiome big data. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes against the existing microbiome data sets on the basis of their similarity in taxonomic structure or functional profile. Key improvements include database extension, data compatibility, a search engine kernel, and a user interface. The new ability to search the microbiome space via functional similarity greatly expands the scope of search-based mining of the microbiome big data.
|
33
|
Ghosh A, Firdous S, Saha S. Bioinformatics for Human Microbiome. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022] Open
|
34
|
Song S, Ma L, Zou D, Tian D, Li C, Zhu J, Chen M, Wang A, Ma Y, Li M, Teng X, Cui Y, Duan G, Zhang M, Jin T, Shi C, Du Z, Zhang Y, Liu C, Li R, Zeng J, Hao L, Jiang S, Chen H, Han D, Xiao J, Zhang Z, Zhao W, Xue Y, Bao Y. The Global Landscape of SARS-CoV-2 Genomes, Variants, and Haplotypes in 2019nCoVR. GENOMICS, PROTEOMICS & BIOINFORMATICS 2020; 18:749-759. [PMID: 33704069 PMCID: PMC7836967 DOI: 10.1016/j.gpb.2020.09.001] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 07/18/2020] [Revised: 09/17/2020] [Accepted: 09/24/2020] [Indexed: 01/24/2023]
Abstract
On January 22, 2020, China National Center for Bioinformation (CNCB) released the 2019 Novel Coronavirus Resource (2019nCoVR), an open-access information resource for the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). 2019nCoVR features a comprehensive integration of sequence and clinical information for all publicly available SARS-CoV-2 isolates, which are manually curated with value-added annotations and quality evaluated by an automated in-house pipeline. Of particular note, 2019nCoVR offers systematic analyses to generate a dynamic landscape of SARS-CoV-2 genomic variations at a global scale. It provides all identified variants and their detailed statistics for each virus isolate, and congregates the quality score, functional annotation, and population frequency for each variant. Spatiotemporal change for each variant can be visualized and historical viral haplotype network maps for the course of the outbreak are also generated based on all complete and high-quality genomes available. Moreover, 2019nCoVR provides a full collection of SARS-CoV-2 relevant literature on the coronavirus disease 2019 (COVID-19), including published papers from PubMed as well as preprints from services such as bioRxiv and medRxiv through Europe PMC. Furthermore, by linking with relevant databases in CNCB, 2019nCoVR offers data submission services for raw sequence reads and assembled genomes, and data sharing with NCBI. Collectively, 2019nCoVR is updated daily to collect the latest information on genome sequences, variants, haplotypes, and literature for a timely reflection, making 2019nCoVR a valuable resource for the global research community. 2019nCoVR is accessible at https://bigd.big.ac.cn/ncov/.
Affiliation(s)
- Shuhui Song
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Lina Ma
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Dong Zou
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Dongmei Tian
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Cuiping Li
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Junwei Zhu
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Meili Chen
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Anke Wang
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Yingke Ma
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Mengwei Li
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xufei Teng
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Ying Cui
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Guangya Duan
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Mochen Zhang
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Tong Jin
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Chengmin Shi
- China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Zhenglin Du
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Yadong Zhang
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Chuandong Liu
- China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Rujiao Li
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Jingyao Zeng
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Lili Hao
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Shuai Jiang
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Hua Chen
- China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
| | - Dali Han
- China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Jingfa Xiao
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhang Zhang
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
| | - Wenming Zhao
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
| | - Yongbiao Xue
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
| | - Yiming Bao
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
| |
|
35
|
Zhou Z, Ge S, Li Y, Ma W, Liu Y, Hu S, Zhang R, Ma Y, Du K, Syed A, Chen P. Human Gut Microbiome-Based Knowledgebase as a Biomarker Screening Tool to Improve the Predicted Probability for Colorectal Cancer. Front Microbiol 2020; 11:596027. [PMID: 33329482 PMCID: PMC7717945 DOI: 10.3389/fmicb.2020.596027] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/18/2020] [Accepted: 10/29/2020] [Indexed: 12/19/2022] Open
Abstract
Colorectal cancer (CRC) is a common clinical malignancy globally ranked as the fourth leading cause of cancer mortality. Some microbes are known to contribute to adenoma-carcinoma transition and possess diagnostic potential. Advances in high-throughput sequencing technology and functional studies have provided significant insights into the landscape of the gut microbiome and the fundamental roles of its components in carcinogenesis. Integration of scattered knowledge is highly beneficial for future progress. In this study, literature review and information extraction were performed, with the aim of integrating the available data resources and facilitating comparative research. A knowledgebase of the human CRC microbiome was compiled to facilitate understanding of diagnosis, and the global signatures of CRC microbes, sample types, algorithms, differential microorganisms and various panels of markers plus their diagnostic performance were evaluated based on statistical and phylogenetic analyses. Additionally, prospects about current challenges and solution strategies were outlined for identifying future research directions. This type of data integration strategy presents an effective platform for inquiry and comparison of relevant information, providing a tool for further study of CRC-related microbes and exploration of factors promoting clinical transformation (available at: http://gsbios.com/index/experimental/dts_mben?id=1).
Affiliation(s)
- Zhongkun Zhou
- School of Pharmacy, Lanzhou University, Lanzhou, China
| | - Shiqiang Ge
- Department of Electronic Information Engineering, Lanzhou Vocational Technical College, Lanzhou, China
| | - Yang Li
- School of Pharmacy, Lanzhou University, Lanzhou, China
| | - Wantong Ma
- School of Pharmacy, Lanzhou University, Lanzhou, China
| | - Yuheng Liu
- School of Pharmacy, Lanzhou University, Lanzhou, China
| | - Shujian Hu
- School of Pharmacy, Lanzhou University, Lanzhou, China
| | - Rentao Zhang
- School of Pharmacy, Lanzhou University, Lanzhou, China
| | - Yunhao Ma
- School of Pharmacy, Lanzhou University, Lanzhou, China
| | - Kangjia Du
- School of Pharmacy, Lanzhou University, Lanzhou, China
| | | | - Peng Chen
- School of Pharmacy, Lanzhou University, Lanzhou, China
|
36
|
Su X, Jing G, Zhang Y, Wu S. Method development for cross-study microbiome data mining: Challenges and opportunities. Comput Struct Biotechnol J 2020; 18:2075-2080. [PMID: 32802279 PMCID: PMC7419250 DOI: 10.1016/j.csbj.2020.07.020] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 06/04/2020] [Revised: 07/22/2020] [Accepted: 07/24/2020] [Indexed: 01/26/2023] Open
Abstract
During the past decade, a tremendous amount of microbiome sequencing data has been generated to study the dynamic associations between microbial profiles and environments. How to precisely and efficiently decipher large-scale microbiome data and take further advantage of it has become one of the most essential bottlenecks for microbiome research at present. In this mini-review, we focus on the three key steps of analyzing cross-study microbiome datasets: microbiome profiling, data integration and data mining. By introducing the current bioinformatics approaches and discussing their limitations, we survey the opportunities in the development of computational methods for the three steps, and propose promising solutions to multi-omics data analysis for comprehensive understanding and rapid investigation of the microbiome from different angles, which could potentially promote data-driven research by providing a broader view of the "microbiome data space".
Affiliation(s)
- Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071 China
- Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong 266101 China
| | - Gongchao Jing
- Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong 266101 China
| | - Yufeng Zhang
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071 China
- Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong 266101 China
| | - Shunyao Wu
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071 China
|
37
|
Zhao D, Zhang S, Xue Q, Chen J, Zhou J, Cheng F, Li M, Zhu Y, Yu H, Hu S, Zheng Y, Liu S, Xiang H. Abundant Taxa and Favorable Pathways in the Microbiome of Soda-Saline Lakes in Inner Mongolia. Front Microbiol 2020; 11:1740. [PMID: 32793172 PMCID: PMC7393216 DOI: 10.3389/fmicb.2020.01740] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 02/02/2020] [Accepted: 07/03/2020] [Indexed: 12/14/2022] Open
Abstract
Soda-saline lakes are a special type of alkaline lake in which the chloride concentration is greater than the carbonate/bicarbonate concentration. Due to the high pH and a usually higher osmotic pressure than that of a normal soda lake, the microbes may need more energy to thrive in such a double-extreme environment. In this study, we systematically investigated the microbiome of the brine and sediment samples of nine artificially separated ponds (salinities from 5.5% to saturation) within two soda-saline lakes in Inner Mongolia of China, assisted by deep metagenomic sequencing. The main inorganic ions shaped the microbial community in both the brines and sediments, and the chloride concentration exhibited the most significant effect. A total of 385 metagenome-assembled genomes (MAGs) were generated, of which 38 MAGs were revealed as the abundant species in at least one of the eighteen different samples. Interestingly, these abundant species also represented most branches of the microbiome of the soda-saline lakes at the phylum level. These abundant taxa were close relatives of microorganisms from classic soda lakes and neutral saline environments, but formed a combination of both habitats. Notably, approximately half of the abundant MAGs had the potential to drive dissimilatory sulfur cycling. These MAGs included four autotrophic Ectothiorhodospiraceae MAGs, one Cyanobacteria MAG and nine heterotrophic MAGs with the potential to oxidize sulfur, as well as four abundant MAGs containing genes for elemental sulfur respiration. The possible reason is that reductive sulfur compounds could provide additional energy for the related species, and reductions of oxidative sulfur compounds are more prone to occur under alkaline conditions which support the sulfur cycling.
In addition, a unique 1,4-alpha-glucan phosphorylation pathway, rather than the usual hydrolysis pathway, was found in the abundant Candidatus Nanohaloarchaeota MAG NHA-1, which would produce more energy in polysaccharide degradation. In summary, this work has revealed the abundant taxa and favorable pathways in the soda-saline lakes, indicating that efficient energy regeneration pathways may increase the capacity for environmental adaptation in such saline-alkaline environments. These findings may help to elucidate the relationship between microbial metabolism and adaptation to extreme environments.
Affiliation(s)
- Dahe Zhao
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Shengjie Zhang
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Qiong Xue
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Junyu Chen
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Jian Zhou
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Feiyue Cheng
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Ming Li
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Yaxin Zhu
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Haiying Yu
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Songnian Hu
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Yanning Zheng
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Shuangjiang Liu
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Hua Xiang
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
|
38
|
Zhang Q, Yu K, Li S, Zhang X, Zhao Q, Zhao X, Liu Z, Cheng H, Liu ZX, Li X. gutMEGA: a database of the human gut MEtaGenome Atlas. Brief Bioinform 2020; 22:5851266. [PMID: 32496513 DOI: 10.1093/bib/bbaa082] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 11/19/2019] [Revised: 04/03/2020] [Accepted: 04/21/2020] [Indexed: 02/07/2023] Open
Abstract
The gut microbiota plays important roles in human health through regulating both physiological homeostasis and disease emergence. The accumulation of metagenomic sequencing studies enables us to better understand the temporal and spatial variations of the gut microbiota under different physiological and pathological conditions. However, it is inconvenient for scientists to query and retrieve published data; thus, a comprehensive resource for the quantitative gut metagenome is urgently needed. In this study, we developed gut MEtaGenome Atlas (gutMEGA), a well-annotated comprehensive database, to curate and host published quantitative gut microbiota datasets from Homo sapiens. By carefully curating the gut microbiota composition, phenotypes and experimental information, gutMEGA finally integrated 59 132 quantification events for 6457 taxa at seven different levels (kingdom, phylum, class, order, family, genus and species) under 776 conditions. Moreover, with various browsing and search functions, gutMEGA provides a fast and simple way for users to obtain the relative abundances of intestinal microbes among phenotypes. Overall, gutMEGA is a convenient and comprehensive resource for gut metagenome research, which can be freely accessed at http://gutmega.omicsbio.info.
|
39
|
Wu S, Sun C, Li Y, Wang T, Jia L, Lai S, Yang Y, Luo P, Dai D, Yang YQ, Luo Q, Gao NL, Ning K, He LJ, Zhao XM, Chen WH. GMrepo: a database of curated and consistently annotated human gut metagenomes. Nucleic Acids Res 2020; 48:D545-D553. [PMID: 31504765 PMCID: PMC6943048 DOI: 10.1093/nar/gkz764] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 07/15/2019] [Revised: 08/20/2019] [Accepted: 08/30/2019] [Indexed: 12/29/2022] Open
Abstract
GMrepo (data repository for Gut Microbiota) is a database of curated and consistently annotated human gut metagenomes. Its main purpose is to facilitate the reusability and accessibility of the rapidly growing human metagenomic data. This is achieved by consistently annotating the microbial contents of collected samples using state-of-the-art toolsets and by manual curation of the meta-data of the corresponding human hosts. GMrepo organizes the collected samples according to their associated phenotypes and includes all possible related meta-data such as age, sex, country, body-mass-index (BMI) and recent antibiotics usage. To make relevant information easier to access, GMrepo is equipped with a graphical query builder, enabling users to make customized, complex and biologically relevant queries. For example, users can find (1) samples from healthy individuals of 18 to 25 years old with BMIs between 18.5 and 24.9, or (2) projects that are related to colorectal neoplasms, with each containing >100 samples and both patients and healthy controls. Precomputed species/genus relative abundances, prevalence within and across phenotypes, and pairwise co-occurrence information are all available at the website and accessible through programmable interfaces. So far, GMrepo contains 58 903 human gut samples/runs (including 17 618 metagenomes and 41 285 amplicons) from 253 projects concerning 92 phenotypes. GMrepo is freely available at: https://gmrepo.humangut.info.
Affiliation(s)
- Sicheng Wu
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, 430074 Wuhan, Hubei, China
| | - Chuqing Sun
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, 430074 Wuhan, Hubei, China
| | - Yanze Li
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, 430074 Wuhan, Hubei, China
| | - Teng Wang
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, 430074 Wuhan, Hubei, China
| | - Longhao Jia
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, 430074 Wuhan, Hubei, China
| | - Senying Lai
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, 430074 Wuhan, Hubei, China
| | - Yaling Yang
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, 430074 Wuhan, Hubei, China; Shenzhen Digital Life Institute, 518053 Shenzhen, Guangdong, China
| | - Pengyu Luo
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, 430074 Wuhan, Hubei, China
| | - Die Dai
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, 430074 Wuhan, Hubei, China
| | - Yong-Qing Yang
- Huazhong University of Science and Technology School of Physics, 430070 Wuhan, Hubei, China
| | - Qibin Luo
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
| | - Na L Gao
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, 430074 Wuhan, Hubei, China; Institute for Computer Science and Dept. of Biology, Heinrich Heine University, 40225 Duesseldorf, Germany
| | - Kang Ning
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, 430074 Wuhan, Hubei, China; Huazhong University of Science and Technology Ezhou Industrial Technology Research Institute, 436044 Ezhou, Hubei, China
| | - Li-Jie He
- Department of Medical Oncology, People's Hospital of Liaoning Province, 110016 Shenyang, China
| | - Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, 200433 Shanghai, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, China
| | - Wei-Hua Chen
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, 430074 Wuhan, Hubei, China; Huazhong University of Science and Technology Ezhou Industrial Technology Research Institute, 436044 Ezhou, Hubei, China; College of Life Science, HeNan Normal University, 453007 Xinxiang, Henan, China
|
40
|
Jo J, Oh J, Park C. Microbial community analysis using high-throughput sequencing technology: a beginner's guide for microbiologists. J Microbiol 2020; 58:176-192. [PMID: 32108314 DOI: 10.1007/s12275-020-9525-5] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 11/05/2019] [Revised: 12/11/2019] [Accepted: 12/16/2019] [Indexed: 12/19/2022]
Abstract
Microbial communities present in diverse environments from deep seas to human body niches play significant roles in the complex ecosystem and human health. Characterizing their structural and functional diversities is indispensable, and many approaches, such as microscopic observation, DNA fingerprinting, and PCR-based marker gene analysis, have been successfully applied to identify microorganisms. Since the revolutionary improvement of DNA sequencing technologies, direct and high-throughput analysis of genomic DNA from a whole environmental community without prior cultivation has become the mainstream approach, overcoming the constraints of the classical approaches. Here, we first briefly review the history of environmental DNA analysis applications with a focus on profiling the taxonomic composition and functional potentials of microbial communities. To this end, we aim to introduce the shotgun metagenomic sequencing (SMS) approach, which is used for the untargeted ("shotgun") sequencing of all ("meta") microbial genomes ("genomic") present in a sample. SMS data analyses are performed in silico using various software programs; however, in silico analysis is typically regarded as a burden on wet-lab experimental microbiologists. Therefore, in this review, we present microbiologists who are unfamiliar with in silico analyses with a basic and practical SMS data analysis protocol. This protocol covers all the bioinformatics processes of the SMS analysis in terms of data preprocessing, taxonomic profiling, functional annotation, and visualization.
Affiliation(s)
- Jihoon Jo
- School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea
| | - Jooseong Oh
- School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea
| | - Chungoo Park
- School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea.
|
41
|
Li J, Chen Z, Wang Y. Contents, Construction Methods, Data Resources, and Functions Comparative Analysis of Bacteria Databases. Int J Biol Sci 2020; 16:838-848. [PMID: 32071553 PMCID: PMC7019132 DOI: 10.7150/ijbs.39289] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/13/2019] [Accepted: 11/11/2019] [Indexed: 11/20/2022] Open
Abstract
Many bacteria-related databases have been developed to meet researchers' needs to search and analyze bacterial information. However, these databases differ in data resources, construction methods, data formats, and analysis tools, which makes it difficult for researchers to select appropriate databases and analysis tools for their research. In this paper, we compare the contents, construction methods, data sources, update frequency, scope and scale of data, analysis tools, and features of nine well-known bacterial databases (CARD, EffectiveDB, MBGD, MPD, PATRIC, PHI-base, VFDB, gcMeta and SILVA) to help researchers make better use of them. We also hope this review can help researchers develop more comprehensive databases and better tools.
Affiliation(s)
- Jie Li
- School of Computer Science and Technology, Harbin Institute of Technology, China
| | - Zhuo Chen
- School of Computer Science and Technology, Harbin Institute of Technology, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, China
|
42
|
Badal VD, Wright D, Katsis Y, Kim HC, Swafford AD, Knight R, Hsu CN. Challenges in the construction of knowledge bases for human microbiome-disease associations. Microbiome 2019; 7:129. [PMID: 31488215 PMCID: PMC6728997 DOI: 10.1186/s40168-019-0742-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 01/03/2019] [Accepted: 08/20/2019] [Indexed: 05/05/2023]
Abstract
The last few years have seen tremendous growth in human microbiome research, with a particular focus on the links to both mental and physical health and disease. Medical and experimental settings provide initial sources of information about these links, but individual studies produce disconnected pieces of knowledge bounded in context by the perspective of expert researchers reading full-text publications. Building a knowledge base (KB) consolidating these disconnected pieces is an essential first step to democratize and accelerate the process of accessing the collective discoveries of human disease connections to the human microbiome. In this article, we survey the existing tools and development efforts that have been produced to capture portions of the information needed to construct a KB of all known human microbiome-disease associations and highlight the need for additional innovations in natural language processing (NLP), text mining, taxonomic representations, and field-wide vocabulary standardization in human microbiome research. Addressing these challenges will enable the construction of KBs that help identify new insights amenable to experimental validation and potentially clinical decision support.
Affiliation(s)
- Varsha Dave Badal
- Center for Microbiome Innovation, Jacobs School of Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 USA
| | - Dustin Wright
- Center for Microbiome Innovation, Jacobs School of Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 USA
- Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 USA
| | - Yannis Katsis
- Scalable Knowledge Intelligence, IBM Research-Almaden, 650 Harry Road, San Jose, CA 95120 USA
| | - Ho-Cheol Kim
- Scalable Knowledge Intelligence, IBM Research-Almaden, 650 Harry Road, San Jose, CA 95120 USA
| | - Austin D. Swafford
- Center for Microbiome Innovation, Jacobs School of Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 USA
| | - Rob Knight
- Center for Microbiome Innovation, Jacobs School of Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 USA
- Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 USA
- UCSD Health Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 USA
- Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 USA
| | - Chun-Nan Hsu
- Center for Microbiome Innovation, Jacobs School of Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 USA
- Department of Neurosciences and Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 USA
|
43
|
Lajoie G, Kembel SW. Making the Most of Trait-Based Approaches for Microbial Ecology. Trends Microbiol 2019; 27:814-823. [PMID: 31296406 DOI: 10.1016/j.tim.2019.06.003] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 03/04/2019] [Revised: 06/12/2019] [Accepted: 06/13/2019] [Indexed: 12/13/2022]
Abstract
There is an increasing interest in applying trait-based approaches to microbial ecology, but the question of how and why to do it is still lagging behind. By anchoring our discussion of these questions in a framework derived from epistemology, we broaden the scope of trait-based approaches to microbial ecology from one oriented mostly around explanation towards one inclusive of the predictive and integrative potential of these approaches. We use case studies from macro-organismal ecology to concretely show how these goals for knowledge development can be fulfilled and propose clear directions, adapted to the biological reality of microbes, to make the most of recent advancements in the measurement of microbial phenotypes and traits.
Affiliation(s)
- Geneviève Lajoie
- Département des Sciences Biologiques, Université du Québec à Montréal, 141 Avenue du Président-Kennedy, Montréal, Canada, H2X 1Y4.
| | - Steven W Kembel
- Département des Sciences Biologiques, Université du Québec à Montréal, 141 Avenue du Président-Kennedy, Montréal, Canada, H2X 1Y4
|
44
|
Morton JT, Marotz C, Washburne A, Silverman J, Zaramela LS, Edlund A, Zengler K, Knight R. Establishing microbial composition measurement standards with reference frames. Nat Commun 2019; 10:2719. [PMID: 31222023 PMCID: PMC6586903 DOI: 10.1038/s41467-019-10656-5] [Citation(s) in RCA: 356] [Impact Index Per Article: 71.2] [Received: 03/25/2019] [Accepted: 05/14/2019] [Indexed: 12/30/2022] Open
Abstract
Differential abundance analysis is controversial throughout microbiome research. Gold standard approaches require laborious measurements of total microbial load, or absolute number of microorganisms, to accurately determine taxonomic shifts. Therefore, most studies rely on relative abundance data. Here, we demonstrate common pitfalls in comparing relative abundance across samples and identify two solutions that reveal microbial changes without the need to estimate total microbial load. We define the notion of "reference frames", which provide deep intuition about the compositional nature of microbiome data. In an oral time series experiment, reference frames alleviate false positives and produce consistent results on both raw and cell-count normalized data. Furthermore, reference frames identify consistent, differentially abundant microbes previously undetected in two independent published datasets from subjects with atopic dermatitis. These methods allow reassessment of published relative abundance data to reveal reproducible microbial changes from standard sequencing output without the need for new assays.
Affiliation(s)
- James T Morton
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92093, USA
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, 92093, USA
| | - Clarisse Marotz
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92093, USA
| | - Alex Washburne
- Department of Microbiology and Immunology, Montana State University, Bozeman, MT, 59717, USA
| | - Justin Silverman
- Program in Computational Biology and Bioinformatics, Duke University, Durham, 27708, USA
- Medical Scientist Training Program, Duke University, Durham, 27708, USA
- Center for Genomic and Computational Biology, Duke University, Durham, 27708, USA
| | - Livia S Zaramela
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92093, USA
| | - Anna Edlund
- J. Craig Venter Institute, Genomic Medicine Group, La Jolla, CA, 92037, USA
| | - Karsten Zengler
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92093, USA.
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, 92093, USA.
- Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, 92093, USA.
| | - Rob Knight
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92093, USA.
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, 92093, USA.
- Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, 92093, USA.
|