1
Zheng F, Lin Y, Qiu L, Zheng Y, Zeng M, Lin X, He Q, Lin Y, Chen L, Lin X, Chen X, Lin L, Wang L, He J, Lin F, Yang K, Wang N, Lin M, Lian S, Wang Z. Age at onset mediates genetic impact on disease severity in facioscapulohumeral muscular dystrophy. Brain 2024:awae309. [PMID: 39711249 DOI: 10.1093/brain/awae309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Revised: 07/24/2024] [Accepted: 09/08/2024] [Indexed: 12/24/2024]
Abstract
Facioscapulohumeral muscular dystrophy type 1 (FSHD1) patients exhibit marked variability in both age at onset (AAO) and disease severity. Early-onset FSHD1 patients are at an increased risk of severe weakness, and early onset has been tentatively linked to the length of D4Z4 repeat units (RUs) and methylation levels. The present study explored potential relationships among genetic characteristics, AAO and disease severity in FSHD1. This retrospective and observational cohort study was conducted at the Fujian Neuromedical Centre (FNMC) in China. Genetically confirmed participants with FSHD1 recruited from 2001 to 2023 underwent distal D4Z4 methylation assessment. Disease severity was assessed by FSHD clinical score, age-corrected clinical severity score (ACSS) and onset age of lower extremity involvement. Mediation analyses were used to explore relationships among genetic characteristics, AAO and disease severity. Finally, machine learning was employed to explore AAO prediction in FSHD1. A total of 874 participants (including 804 symptomatic patients and 70 asymptomatic carriers) were included. Multivariate Cox regression analyses indicated that male gender, low D4Z4 RUs, low CpG6 methylation levels, non-mosaic mutation and de novo mutation were independently associated with early onset in FSHD1. Early-onset patients (AAO < 10 years) had both a significantly higher proportion and an earlier median onset age of lower extremity involvement compared to the typical adolescent onset (10 ≤ AAO < 20 years), typical adult onset (20 ≤ AAO < 30 years) and late onset (AAO ≥ 30 years) subgroups. AAO was negatively correlated with both clinical score and ACSS. We found that AAO exerted mediation effects, accounting for 12.2% of the total effect of D4Z4 RUs and CpG6 methylation levels on ACSS and 38.6% of the total effect of D4Z4 RUs and CpG6 methylation levels on onset age of lower extremity involvement.
A random forest model that incorporated variables including gender, age at examination, inheritance pattern, mosaic mutation, D4Z4 RUs and D4Z4 methylation levels (at CpG3, CpG6 and CpG10 loci) performed well for AAO prediction. The predicted AAO (pAAO) was negatively correlated with ACSS (Spearman's ρ = -0.692). Our study revealed independent contributions of D4Z4 RUs, D4Z4 methylation levels, mosaic mutation and inheritance pattern to AAO variation in FSHD1. AAO mediates the effects of D4Z4 RUs and methylation levels on disease severity. The pAAO values from our random forest model informatively reflect disease severity, offering insights that can support efficacious patient management.
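The mediation percentages above (for example, 12.2% of the total effect) are proportions mediated. As a hedged illustration only, the snippet below computes a proportion mediated for one exposure, one mediator and one continuous outcome with the classic product-of-coefficients approach and ordinary least squares. The abstract does not describe the study's actual mediation models (several genetic exposures, covariates, outcome types), so the helper `proportion_mediated`, the simple linear setup and the toy data are all assumptions.

```python
from statistics import mean


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def proportion_mediated(x, m, y):
    """Hypothetical helper: linear mediation X -> M -> Y fitted by OLS.

    a       : slope of M ~ X
    total   : slope c of Y ~ X
    direct  : slope c' of X in Y ~ X + M
    b       : slope of M in Y ~ X + M
    indirect = a * b, proportion = indirect / total
    """
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    a = sxm / sxx                            # effect of X on the mediator
    total = sxy / sxx                        # total effect c
    det = sxx * smm - sxm ** 2               # 2x2 normal equations for Y ~ X + M
    direct = (smm * sxy - sxm * smy) / det   # c': X -> Y holding M fixed
    b = (sxx * smy - sxm * sxy) / det        # M -> Y holding X fixed
    indirect = a * b
    return total, direct, indirect, indirect / total


# Toy data (not from the study): M tracks X, and Y depends on both.
x = [1, 2, 3, 4, 5, 6]
m = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
y = [xi + 3 * mi for xi, mi in zip(x, m)]
total, direct, indirect, prop = proportion_mediated(x, m, y)
print(f"proportion mediated = {prop:.1%}")
```

With OLS the decomposition total = direct + indirect holds exactly, which is why the ratio can be read as the share of the exposure's effect that passes through the mediator.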
Affiliation(s)
- Fuze Zheng
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Yawen Lin
- College of Computer and Data Science, Fuzhou University, and Fujian Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou 350108, China
- Liangliang Qiu
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Department of Neurology, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China
- Ying Zheng
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Minghui Zeng
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Xiaodan Lin
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Qifang He
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Yuhua Lin
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Long Chen
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Xin Lin
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Department of Neurology, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China
- Xinyue Chen
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Lin Lin
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Lili Wang
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Junjie He
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Feng Lin
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Department of Neurology, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China
- Kang Yang
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Department of Neurology, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China
- Ning Wang
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Department of Neurology, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China
- Minting Lin
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Department of Neurology, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China
- Sheng Lian
- College of Computer and Data Science, Fuzhou University, and Fujian Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou 350108, China
- Zhiqiang Wang
- Department of Neurology and Institute of Neurology of First Affiliated Hospital, Institute of Neuroscience, and Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou 350005, China
- Department of Neurology, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China
2
Jiang C, Yang J, Peng X, Li X. A permutable MLP-like architecture for disease prediction from gut metagenomic data. BMC Bioinformatics 2024; 25:246. [PMID: 39048979 PMCID: PMC11270793 DOI: 10.1186/s12859-024-05856-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Accepted: 07/05/2024] [Indexed: 07/27/2024]
Abstract
Metagenomic data play a crucial role in analyzing the relationship between microbes and diseases. However, the limited number of samples, high dimensionality, and sparsity of metagenomic data pose significant challenges for applying deep learning to data classification and prediction. Previous studies have shown that using the phylogenetic tree structure to transform metagenomic abundance data into a 2D matrix input for convolutional neural networks (CNNs) improves classification performance. Inspired by the success of a Permutable MLP-like architecture in visual recognition, we propose Metagenomic Permutator (MetaP), which applies the Permutable MLP-like network structure to capture the phylogenetic information of microbes within the 2D matrix formed by the phylogenetic tree. Our experiments demonstrate that our model achieves competitive performance compared with other deep neural networks and traditional machine learning methods, and shows good prospects for multi-class classification and large sample sizes. Furthermore, we use the SHAP (SHapley Additive exPlanations) method to interpret our model's predictions, identifying the microbial features that are associated with diseases.
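SHAP attributes a model prediction to its input features using Shapley values. The sketch below computes exact Shapley values by enumerating feature subsets for a tiny model; features "absent" from a coalition are replaced by a baseline value, which is one common choice of value function and is an assumption here. The SHAP library itself uses faster approximations that are not shown, and `shapley_values` and the toy model are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a baseline input.

    Cost grows exponentially with len(x), so this is only for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition take their observed value, the rest the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Toy "abundance" model with an interaction between taxa 0 and 1 (illustrative only).
def toy_model(z):
    return z[0] * z[1] + 2 * z[2]


phi = shapley_values(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print(phi)  # the interaction credit is split evenly between features 0 and 1
```

The attributions always sum to model(x) minus model(baseline), which is the efficiency property that makes Shapley-based explanations additive.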
Affiliation(s)
- Cong Jiang
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China
- Jian Yang
- Beijing Key Laboratory of Mental Disorders, National Clinical Research Center for Mental Disorders and National Center for Mental Disorders, Beijing Anding Hospital, Capital Medical University, Beijing, China
- Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, China
- Xiaogang Peng
- National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China
- Xiaozheng Li
- College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, China
- JCY Biotech Ltd., Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, China
3
Masuda S, Gan P, Kiguchi Y, Anda M, Sasaki K, Shibata A, Iwasaki W, Suda W, Shirasu K. Uncovering microbiomes of the rice phyllosphere using long-read metagenomic sequencing. Commun Biol 2024; 7:357. [PMID: 38538803 PMCID: PMC10973392 DOI: 10.1038/s42003-024-05998-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/17/2023] [Accepted: 02/29/2024] [Indexed: 12/14/2024]
Abstract
The plant microbiome is crucial for plant growth, yet many important questions remain, such as the identification of specific bacterial species in plants, their genetic content, and the location of these genes on chromosomes or plasmids. To gain insights into the genetic makeup of the rice phyllosphere, we perform a metagenomic analysis using long-read sequences. Here, 1.8 Gb of reads are assembled into 26,067 contigs, including 142 circular sequences. Within these contigs, 669 complete 16S rRNA genes are clustered into 166 bacterial species, 121 of which show low identity (<97%) to defined sequences, suggesting novel species. The circular contigs contain novel chromosomes and a megaplasmid, and most of the smaller circular contigs are defined as novel plasmids or bacteriophages. One circular contig represents the complete chromosome of a member of the difficult-to-culture Candidatus Saccharibacteria. Our findings demonstrate the efficacy of long-read-based metagenomics for profiling microbial communities and discovering novel sequences in plant-microbiome studies.
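Grouping 16S rRNA genes into species-level clusters is commonly done with a sequence-identity threshold, and 97% is the conventional cutoff the abstract refers to. The sketch below shows greedy centroid clustering in the spirit of tools such as CD-HIT or UCLUST. It assumes the sequences are already aligned to equal length (real pipelines compute identity from pairwise alignments), and the helper names are hypothetical, not the authors' pipeline.

```python
def identity(a, b):
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold.

    A sequence that matches no existing centroid becomes a new centroid.
    Returns one cluster label per input sequence.
    """
    centroids, labels = [], []
    for seq in seqs:
        for label, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                labels.append(label)
                break
        else:
            centroids.append(seq)
            labels.append(len(centroids) - 1)
    return labels


base = "ACGT" * 25                  # 100 bp toy "gene"
near = "T" + base[1:]               # 1 mismatch  -> 99% identity to base
far = "T" * 10 + base[10:]          # 8 mismatches -> 92% identity to base
labels = greedy_cluster([base, near, far])
```

Because the result depends on input order, tools of this kind usually sort sequences (for example by length or abundance) before clustering.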
Affiliation(s)
- Sachiko Masuda
- RIKEN Center for Sustainable Resource Science, Kanagawa, Japan
- Pamela Gan
- RIKEN Center for Sustainable Resource Science, Kanagawa, Japan
- Yuya Kiguchi
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Cooperative Major in Advanced Health Science, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
- RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
- Mizue Anda
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Kazuhiro Sasaki
- Institute for Sustainable Agro-ecosystem Services, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Japan International Research Center for Agricultural Sciences, Ibaraki, Japan
- Arisa Shibata
- RIKEN Center for Sustainable Resource Science, Kanagawa, Japan
- Wataru Iwasaki
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Wataru Suda
- RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
- Ken Shirasu
- RIKEN Center for Sustainable Resource Science, Kanagawa, Japan
- Graduate School of Science, The University of Tokyo, Tokyo, Japan
4
Chakraborty N. Metabolites: a converging node of host and microbe to explain meta-organism. Front Microbiol 2024; 15:1337368. [PMID: 38505556 PMCID: PMC10949987 DOI: 10.3389/fmicb.2024.1337368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 02/13/2024] [Indexed: 03/21/2024]
Abstract
Meta-organisms, encompassing the host and its resident microbiota, play a significant role in combating diseases and responding to stress. Hence, there is growing interest in building a knowledge base about this ecosystem, particularly to characterize the bidirectional relationship between the host and microbiota. In this context, metabolomics has emerged as the major converging node of this entire ecosystem. Systematic comprehension of this resourceful omics component can elucidate the organism-specific response trajectory and the communication grid across the ecosystem embodying meta-organisms. Efforts to translate this knowledge into the design of nutraceuticals and next-generation therapies are ongoing. Their major hindrance is a significant knowledge gap about the underlying mechanisms that maintain a delicate balance within this ecosystem. To bridge this gap, a holistic picture of the available information is presented, with a primary focus on microbiota-metabolite relationship dynamics. The central theme of this article is the gut-brain axis and the participating microbial metabolites that affect cerebral functions.
Affiliation(s)
- Nabarun Chakraborty
- Medical Readiness Systems Biology, CMPN, WRAIR, Silver Spring, MD, United States
5
Miao Y, Sun Z, Ma C, Lin C, Wang G, Yang C. VirGrapher: a graph-based viral identifier for long sequences from metagenomes. Brief Bioinform 2024; 25:bbae036. [PMID: 38343326 PMCID: PMC10859693 DOI: 10.1093/bib/bbae036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 01/15/2024] [Accepted: 01/18/2024] [Indexed: 02/15/2024]
Abstract
Viruses are the most abundant biological entities on earth and are important components of microbial communities. A metagenome contains all microorganisms from an environmental sample, and correctly identifying viruses among these mixed sequences is critical in viral analyses. Identification is commonly applied to long sequences that have already passed through assembly and binning pipelines. Existing deep learning-based methods divide these long sequences into short subsequences and identify them separately. This ignores the relationships between the subsequences, leading to poor performance in identifying long viral sequences. In this paper, VirGrapher is proposed to improve the identification of long viral sequences by constructing relationships among the short subsequences derived from them. VirGrapher treats a long sequence as a graph and uses a Graph Convolutional Network (GCN) model to learn multilayer connections between nodes, after a GCN-based node embedding model. VirGrapher achieves higher AUC and accuracy on the validation set than three benchmark methods.
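VirGrapher's idea of treating one long sequence as a graph of its short subsequences can be sketched as follows. Every concrete choice here is an assumption for illustration, not VirGrapher's published design: fixed-length chunks as nodes, normalised k-mer frequencies as node features, edges only between consecutive chunks, one standard GCN propagation step, ReLU(D^-1/2 (A + I) D^-1/2 H W), with random weights, and mean pooling as the graph readout. All function names are hypothetical.

```python
import random
from itertools import product
from math import sqrt


def kmer_profile(seq, k=3):
    """Normalised k-mer frequency vector over the ACGT alphabet."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:              # skip k-mers containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[km] / total for km in kmers]


def sequence_to_graph(seq, chunk=100, k=3):
    """Split a long sequence into chunks (nodes) and link neighbouring chunks."""
    chunks = [seq[i:i + chunk] for i in range(0, len(seq), chunk)]
    features = [kmer_profile(c, k) for c in chunks]
    n = len(chunks)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        adj[i][i + 1] = adj[i + 1][i] = 1.0
    return features, adj


def gcn_layer(features, adj, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    dim_in, dim_out = len(weight), len(weight[0])
    # Aggregate each node's neighbourhood, then apply the linear map and ReLU.
    agg = [[sum(norm[i][u] * features[u][f] for u in range(n)) for f in range(dim_in)]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(dim_in)))
             for o in range(dim_out)] for i in range(n)]


def mean_pool(node_embeddings):
    """Graph-level readout: average the node embeddings into one vector."""
    n = len(node_embeddings)
    return [sum(col) / n for col in zip(*node_embeddings)]


random.seed(0)
seq = "".join(random.choice("ACGT") for _ in range(450))
feats, adj = sequence_to_graph(seq, chunk=100, k=3)    # 5 nodes, the last one 50 bp
w = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(64)]
graph_vec = mean_pool(gcn_layer(feats, adj, w))
```

In a trained model the weights would be learned and the pooled vector would feed a viral/non-viral classifier; the point here is only that each chunk's embedding now mixes in information from its neighbours instead of being scored in isolation.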
Affiliation(s)
- Yan Miao
- College of Computer and Control Engineering, Northeast Forestry University, Hexing Road, 150040, Heilongjiang Province, China
- Zhenyuan Sun
- College of Computer and Control Engineering, Northeast Forestry University, Hexing Road, 150040, Heilongjiang Province, China
- Chenjing Ma
- College of Computer and Control Engineering, Northeast Forestry University, Hexing Road, 150040, Heilongjiang Province, China
- Chen Lin
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiangannan Road, 361104, Fujian Province, China
- Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Hexing Road, 150040, Heilongjiang Province, China
- Chunxue Yang
- College of Landscape Architecture, Northeast Forestry University, Hexing Road, 150040, Heilongjiang Province, China
6
Monshizadeh M, Ye Y. Incorporating metabolic activity, taxonomy and community structure to improve microbiome-based predictive models for host phenotype prediction. Gut Microbes 2024; 16:2302076. [PMID: 38214657 PMCID: PMC10793686 DOI: 10.1080/19490976.2024.2302076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 01/02/2024] [Indexed: 01/13/2024]
Abstract
We developed MicroKPNN, a prior-knowledge guided interpretable neural network for microbiome-based human host phenotype prediction. The prior knowledge used in MicroKPNN includes the metabolic activities of different bacterial species, phylogenetic relationships, and bacterial community structure, all encoded in a shallow neural network. Application of MicroKPNN to seven gut microbiome datasets (involving five different human diseases: inflammatory bowel disease, type 2 diabetes, liver cirrhosis, colorectal cancer, and obesity) shows that incorporating prior knowledge helped improve microbiome-based host phenotype prediction. MicroKPNN outperformed fully connected neural network-based approaches in all seven cases, with the largest accuracy improvement in the prediction of type 2 diabetes. MicroKPNN also outperformed DeepMicro, a recently developed deep-learning-based approach that selects the best combination of autoencoder and machine learning approach to make predictions, in all seven cases. Importantly, we showed that MicroKPNN provides a way to interpret the predictive models. Using importance scores estimated for the hidden nodes, MicroKPNN could provide explanations for prior research findings by highlighting the roles of specific microbiome components in phenotype predictions. In addition, it may suggest potential future research directions for studying the impacts of the microbiome on host health and diseases. MicroKPNN is publicly available at https://github.com/mgtools/MicroKPNN.
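A prior-knowledge guided network of this kind can be pictured as a dense layer whose connections are masked: a species input is wired only to hidden nodes it is known to relate to (for example, a metabolic function or the taxon it belongs to). The forward pass below is a minimal sketch under that assumption; the node names, mask and weights are hypothetical, and training, the additional hidden layers and MicroKPNN's importance scores are omitted.

```python
def masked_layer(x, weights, bias, mask):
    """Forward pass of a dense layer whose connections are restricted by a 0/1 mask.

    x       : abundances of n_in species
    weights : n_out x n_in weight matrix
    bias    : n_out biases
    mask    : n_out x n_in; mask[j][i] == 1 only if prior knowledge links species i to node j
    """
    out = []
    for w_row, m_row, b in zip(weights, mask, bias):
        z = b + sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        out.append(max(0.0, z))  # ReLU
    return out


species = ["Bacteroides", "Prevotella", "Faecalibacterium"]   # illustrative inputs
hidden = ["butyrate_producers", "Bacteroidales"]              # hypothetical knowledge nodes
mask = [
    [0, 0, 1],   # only Faecalibacterium feeds the butyrate-producer node
    [1, 1, 0],   # Bacteroides and Prevotella feed the Bacteroidales (order) node
]
weights = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
bias = [0.0, 0.0]
h = masked_layer([0.2, 0.3, 0.5], weights, bias, mask)
```

Because each hidden node only sees the inputs named by the mask, its activation can be read as a statement about one biological entity, which is what makes node-level importance scores interpretable.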
Affiliation(s)
- Mahsa Monshizadeh
- Computer Science Department, Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
- Yuzhen Ye
- Computer Science Department, Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
7
Liao H, Shang J, Sun Y. GDmicro: classifying host disease status with GCN and deep adaptation network based on the human gut microbiome data. Bioinformatics 2023; 39:btad747. [PMID: 38085234 PMCID: PMC10749762 DOI: 10.1093/bioinformatics/btad747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 11/16/2023] [Accepted: 12/11/2023] [Indexed: 12/27/2023]
Abstract
MOTIVATION: With advances in metagenomic sequencing technologies, accumulating studies have revealed associations between the human gut microbiome and some human diseases. These associations shed light on using gut microbiome data to distinguish case and control samples of a specific disease, a task also called host disease status classification. Importantly, using learning-based models to distinguish disease and control samples is expected to identify important biomarkers more accurately than abundance-based statistical analysis. However, available tools have not fully addressed two challenges associated with this task: limited labeled microbiome data and decreased accuracy in cross-study settings. Confounding factors, such as diet and technical biases in sample collection and sequencing across different studies or cohorts, often jeopardize the generalization of the learning model.
RESULTS: To address these challenges, we develop a new tool, GDmicro, which combines semi-supervised learning and domain adaptation to achieve a more generalized model using limited labeled samples. We evaluated GDmicro on human gut microbiome data from 11 cohorts covering 5 different diseases. The results show that GDmicro has better performance and robustness than state-of-the-art tools. In particular, it improves the AUC from 0.783 to 0.949 in identifying inflammatory bowel disease. Furthermore, GDmicro can identify potential biomarkers with greater accuracy than abundance-based statistical analysis methods. It also reveals the contribution of these biomarkers to the host's disease status.
AVAILABILITY AND IMPLEMENTATION: https://github.com/liaoherui/GDmicro.
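Deep adaptation networks, as commonly formulated, reduce the gap between source-domain and target-domain feature distributions (here, different cohorts) by penalising a maximum mean discrepancy (MMD) term. As an assumption-labelled illustration, the sketch computes a biased estimate of squared MMD with a single Gaussian kernel; DAN-style methods typically use a multi-kernel variant on learned features. This is not GDmicro's code, and `gaussian_kernel`, `mmd2`, the bandwidth and the toy cohorts are hypothetical.

```python
from math import exp


def gaussian_kernel(a, b, bandwidth=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((u - v) ** 2 for u, v in zip(a, b))
    return exp(-sq_dist / (2 * bandwidth ** 2))


def _mean_kernel(xs, ys, bandwidth):
    return sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))


def mmd2(source, target, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples of feature vectors."""
    return (_mean_kernel(source, source, bandwidth)
            + _mean_kernel(target, target, bandwidth)
            - 2 * _mean_kernel(source, target, bandwidth))


cohort_a = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1]]            # toy embedded samples, study A
cohort_b = [[v + 1.5 for v in row] for row in cohort_a]    # same shape, shifted by a batch effect
print(mmd2(cohort_a, cohort_a), mmd2(cohort_a, cohort_b))
```

During training such a term would be added to the classification loss so that the learned features of different cohorts become harder to tell apart.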
Affiliation(s)
- Herui Liao
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong (SAR), 518057, China
- Jiayu Shang
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong (SAR), 518057, China
- Yanni Sun
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong (SAR), 518057, China
8
Chang YS, Li CW, Chen L, Wang XA, Lee MS, Chao YH. Early Gut Microbiota Profile in Healthy Neonates: Microbiome Analysis of the First-Pass Meconium Using Next-Generation Sequencing Technology. Children (Basel) 2023; 10:1260. [PMID: 37508757 PMCID: PMC10377966 DOI: 10.3390/children10071260] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/04/2023] [Revised: 07/18/2023] [Accepted: 07/20/2023] [Indexed: 07/30/2023]
Abstract
Gut microbiome development during early life has significant long-term effects on health later in life. The first-pass meconium is not sterile, and it is important to know the initial founders of the subsequent gut microbiome. However, there are limited data on the microbiota profile of the first-pass meconium in healthy neonates. To determine the early gut microbiota profile, we analyzed 39 samples of the first-pass meconium from healthy neonates using 16S rRNA sequencing. Our results showed a similar profile of the microbiota composition in the first-pass meconium samples. Pseudomonas was the most abundant genus in most samples. The evenness of the microbial communities in the first-pass meconium was extremely poor, and the average Shannon diversity index was 1.31. An analysis of the relationship between perinatal characteristics and the meconium microbiome revealed that babies of primigravidae had a significantly higher Shannon diversity index (p = 0.041), and the Bacteroidales order was a biomarker for the first-pass meconium of these neonates. The Shannon diversity index was not affected by the mode of delivery, maternal intrapartum antibiotic treatment, prolonged rupture of membranes, or birth weight. Our study extends previous research with further characterization of the gut microbiome in very early life.
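The Shannon diversity index and evenness mentioned above can be computed from per-taxon read counts as sketched below. The natural logarithm and Pielou's evenness J = H' / ln(S) are assumptions here, since the abstract does not state the log base or which evenness measure was used; the counts are toy values, not study data.

```python
from math import log


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's J = H' / ln(S), where S is the number of observed taxa (needs S >= 2)."""
    observed = sum(1 for c in counts if c > 0)
    return shannon_index(counts) / log(observed)


even = [25, 25, 25, 25]        # four genera in equal proportions
skewed = [970, 10, 10, 10]     # one dominant genus, three rare ones
print(shannon_index(even), pielou_evenness(even))
print(shannon_index(skewed), pielou_evenness(skewed))
```

Both toy communities have the same richness (four genera), yet the dominated one has a far lower H' and J, which is the pattern "extremely poor evenness" describes.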
Affiliation(s)
- Yi-Sheng Chang
- Department of Research and Development, AllBio Life Incorporation, Taichung 402, Taiwan
- Chang-Wei Li
- Department of Research and Development, AllBio Life Incorporation, Taichung 402, Taiwan
- Ling Chen
- Department of Research and Development, AllBio Life Incorporation, Taichung 402, Taiwan
- Xing-An Wang
- Department of Pediatrics, Chung Shan Medical University Hospital, Taichung 402, Taiwan
- Maw-Sheng Lee
- Department of Obstetrics and Gynecology, Lee Women's Hospital, Taichung 406, Taiwan
- School of Medicine, Chung Shan Medical University, Taichung 402, Taiwan
- Yu-Hua Chao
- Department of Pediatrics, Chung Shan Medical University Hospital, Taichung 402, Taiwan
- School of Medicine, Chung Shan Medical University, Taichung 402, Taiwan
- Department of Clinical Pathology, Chung Shan Medical University Hospital, Taichung 402, Taiwan
9
Wang Z, Huang P, You R, Sun F, Zhu S. MetaBinner: a high-performance and stand-alone ensemble binning method to recover individual genomes from complex microbial communities. Genome Biol 2023; 24:1. [PMID: 36609515 PMCID: PMC9817263 DOI: 10.1186/s13059-022-02832-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 08/04/2021] [Accepted: 12/05/2022] [Indexed: 01/09/2023]
Abstract
Binning aims to recover microbial genomes from metagenomic data. For complex metagenomic communities, available binning methods are far from satisfactory, as they usually do not fully use different types of features and important biological knowledge. We developed a novel ensemble binner, MetaBinner, which generates component results with multiple types of features by k-means and uses single-copy gene information for initialization. It then employs a two-stage ensemble strategy based on single-copy genes to integrate the component results efficiently and effectively. Extensive experimental results on three large-scale simulated datasets and one real-world dataset demonstrate that MetaBinner significantly outperforms state-of-the-art binners.
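The component step described above, k-means on contig features initialised from single-copy gene information, can be sketched as follows. The seeding idea assumed here is that contigs carrying different copies of one single-copy marker gene should come from different genomes, so they make natural initial centroids and suggest k. The toy features (coverage, GC fraction), the seeding rule and the function names are hypothetical simplifications, real binners normalise features first, and the two-stage ensemble is not shown.

```python
from math import dist


def kmeans(points, init_centroids, iters=100):
    """Lloyd's k-means starting from the given centroids; returns one label per point."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:        # converged: assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:                 # keep the old centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def scg_seeds(points, marker_hits):
    """Use contigs carrying the same single-copy marker gene as initial centroids.

    marker_hits: indices of contigs on which one single-copy gene was found; each hit
    is assumed to come from a different genome, so k = len(marker_hits).
    """
    return [points[i] for i in marker_hits]


# Toy contig features (coverage, GC fraction) from two genomes.
contigs = [[10.0, 0.40], [11.0, 0.42], [9.5, 0.41], [50.0, 0.65], [52.0, 0.66], [49.0, 0.64]]
labels = kmeans(contigs, scg_seeds(contigs, marker_hits=[0, 3]))
```

Seeding from marker-gene carriers avoids the random initialisations that can merge two genomes into one bin or split one genome across several.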
Affiliation(s)
- Ziye Wang
- The Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, China
- School of Mathematical Science, Fudan University, Shanghai, China
- Pingqin Huang
- School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
- Ronghui You
- The Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, China
- Fengzhu Sun
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, USA
- Shanfeng Zhu
- The Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, China
- Shanghai Qi Zhi Institute, Shanghai, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
- MOE Frontiers Center for Brain Science and Shanghai Institute of Artificial Intelligence Algorithms, Fudan University, Shanghai, China
- Zhangjiang Fudan International Innovation Center, Shanghai, China
10
Bai X, Ren J, Sun F. MLR-OOD: A Markov Chain Based Likelihood Ratio Method for Out-Of-Distribution Detection of Genomic Sequences. J Mol Biol 2022; 434:167586. [PMID: 35427634 PMCID: PMC10433695 DOI: 10.1016/j.jmb.2022.167586] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/16/2022] [Revised: 04/05/2022] [Accepted: 04/05/2022] [Indexed: 12/23/2022]
Abstract
Machine learning or deep learning models have been widely used for taxonomic classification of metagenomic sequences, and many studies have reported high classification accuracy. Such models are usually trained on sequences from several training classes in the hope of accurately classifying unknown sequences into these classes. However, when the classification models are deployed on real test data sets, sequences that do not belong to any of the training classes may be present and be falsely assigned to one of the training classes with high confidence. Such sequences are referred to as out-of-distribution (OOD) sequences and are ubiquitous in metagenomic studies. To address this problem, we develop a deep generative model-based method, MLR-OOD, that measures the probability of a test sequence belonging to OOD by the likelihood ratio of the maximum of the in-distribution (ID) class-conditional likelihoods and the Markov chain likelihood of the test sequence, which measures sequence complexity. We compile three different microbial data sets consisting of bacterial, viral, and plasmid sequences for comprehensively benchmarking OOD detection methods. We show that MLR-OOD achieves state-of-the-art performance, demonstrating its generality across various types of microbial data sets. MLR-OOD is also shown to be robust to GC content, which is a major confounding effect for OOD detection of genomic sequences. In conclusion, MLR-OOD will greatly reduce false positives caused by OOD sequences in metagenomic sequence classification.
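The MLR-OOD score is a log-likelihood ratio: the best in-distribution class-conditional log-likelihood minus a Markov chain log-likelihood that accounts for sequence complexity. In the sketch below, simple order-1 Markov chains stand in for the paper's deep generative class-conditional models (a deliberate substitution), and the complexity term is a chain fitted on the test sequence itself, which is our assumption about how that term is estimated. Function names, the pseudocount and the toy sequences are hypothetical.

```python
from math import log

ALPHABET = "ACGT"


def train_markov(seqs, order=1, pseudocount=1.0):
    """Fit a k-th order Markov chain with additive smoothing; return a log-likelihood function."""
    counts = {}
    for s in seqs:
        for i in range(order, len(s)):
            ctx, nxt = s[i - order:i], s[i]
            if nxt in ALPHABET and all(c in ALPHABET for c in ctx):
                row = counts.setdefault(ctx, dict.fromkeys(ALPHABET, 0.0))
                row[nxt] += 1

    def log_likelihood(seq):
        total = 0.0
        for i in range(order, len(seq)):
            ctx, nxt = seq[i - order:i], seq[i]
            if nxt not in ALPHABET:
                continue
            row = counts.get(ctx, dict.fromkeys(ALPHABET, 0.0))   # unseen context -> uniform
            denom = sum(row.values()) + pseudocount * len(ALPHABET)
            total += log((row[nxt] + pseudocount) / denom)
        return total

    return log_likelihood


def mlr_score(seq, class_models, order=1):
    """max_c log p(seq | c) minus the log-likelihood of a chain fitted to seq itself.

    Lower scores suggest the sequence is out of distribution.
    """
    best_id = max(model(seq) for model in class_models)
    complexity = train_markov([seq], order=order)(seq)
    return best_id - complexity


class_a = train_markov(["AC" * 10])     # toy in-distribution class A
class_b = train_markov(["GT" * 10])     # toy in-distribution class B
id_score = mlr_score("ACACACACAC", [class_a, class_b])
ood_score = mlr_score("AGCTAGCTAG", [class_a, class_b])
```

Subtracting the sequence's own Markov chain likelihood is what keeps simple, low-complexity sequences from receiving high in-distribution likelihoods merely because they are easy to model; a detection threshold on the score would be chosen on validation data.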
Affiliation(s)
- Xin Bai
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Jie Ren
- Google Research, Brain Team, USA
- Fengzhu Sun
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
11
Colorectal cancer: risk factors and potential of dietary probiotics in its prevention. Proceedings of the Indian National Science Academy 2022. [DOI: 10.1007/s43538-022-00083-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
12
|
Li X, Wang X, Huang R, Stucky A, Chen X, Sun L, Wen Q, Zeng Y, Fletcher H, Wang C, Xu Y, Cao H, Sun F, Li SC, Zhang X, Zhong JF. The Machine-Learning-Mediated Interface of Microbiome and Genetic Risk Stratification in Neuroblastoma Reveals Molecular Pathways Related to Patient Survival. Cancers (Basel) 2022; 14:cancers14122874. [PMID: 35740540 PMCID: PMC9220810 DOI: 10.3390/cancers14122874] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2022] [Revised: 05/23/2022] [Accepted: 05/30/2022] [Indexed: 02/01/2023] Open
Abstract
Simple Summary: Neuroblastoma is a highly heterogeneous malignancy with outcomes ranging from spontaneous regression to fatal chemoresistant disease. It is currently treated according to the risk stratification of the Children’s Oncology Group (COG), and because predictors of treatment response are lacking, some high COG-risk patients receive excessive treatment. Here, we sought to complement COG risk classification by using the tumor intracellular microbiome, which is part of the tumor’s molecular signature. We determined that an intra-tumor microbial gene abundance score, namely M-score, separates the high COG-risk patients into two subpopulations (Mhigh and Mlow) with higher accuracy in risk stratification than the current COG risk assessment, thus sparing a subset of high COG-risk patients from traditional high-risk therapies.
Abstract: Currently, most neuroblastoma patients are treated according to the Children’s Oncology Group (COG) risk group assignment; however, because of neuroblastoma’s heterogeneity, few predictors of treatment response are available, resulting in excessive treatment. Here, we sought to couple COG risk classification with the tumor intracellular microbiome, which is part of the molecular signature of a tumor. We determined that an intra-tumor microbial gene abundance score, namely M-score, separates the high COG-risk patients into two subpopulations (Mhigh and Mlow) with higher accuracy in risk stratification than the current COG risk assessment, thus sparing a subset of high COG-risk patients from traditional high-risk therapies. Mechanistically, the classification power of M-scores points to an effect of CREB over-activation, which may influence critical genes involved in cellular proliferation, anti-apoptosis, and angiogenesis, affecting tumor cell proliferation, survival, and metastasis. Thus, intracellular microbiota abundance in neuroblastoma regulates intracellular signals to affect patients’ survival.
Affiliation(s)
- Xin Li
- Department of Basic Science, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
- Xiaoqi Wang
- Medical Center of Hematology, Xinqiao Hospital, State Key Laboratory of Trauma, Burn and Combined Injury, Army Medical University, Chongqing 400037, China
- Ruihao Huang
- Medical Center of Hematology, Xinqiao Hospital, State Key Laboratory of Trauma, Burn and Combined Injury, Army Medical University, Chongqing 400037, China
- Andres Stucky
- Department of Basic Science, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
- Xuelian Chen
- Department of Basic Science, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
- Lan Sun
- Department of Oncology, Bishan Hospital of Chongqing Medical University, the People’s Hospital of Bishan District, Chongqing 400037, China
- Qin Wen
- Medical Center of Hematology, Xinqiao Hospital, State Key Laboratory of Trauma, Burn and Combined Injury, Army Medical University, Chongqing 400037, China
- Yunjing Zeng
- Medical Center of Hematology, Xinqiao Hospital, State Key Laboratory of Trauma, Burn and Combined Injury, Army Medical University, Chongqing 400037, China
- Hansel Fletcher
- Department of Basic Science, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
- Charles Wang
- Department of Basic Science, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
- Yi Xu
- Divisions of Hematology and Oncology and Regenerative Medicine, Department of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
- Cancer Center of Loma Linda University, Loma Linda, CA 92350, USA
- Huynh Cao
- Divisions of Hematology and Oncology and Regenerative Medicine, Department of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
- Cancer Center of Loma Linda University, Loma Linda, CA 92350, USA
- Fengzhu Sun
- Quantitative and Computational Biology Department, University of Southern California, Los Angeles, CA 90089, USA
- Shengwen Calvin Li
- CHOC Children’s Research Institute, Children’s Hospital of Orange County (CHOC), 1201 La Veta Ave., Orange, CA 92868-3874, USA
- Department of Neurology, University of California, Irvine School of Medicine, 200 S. Manchester Ave. Ste. 206, Orange, CA 92868, USA
- Correspondence: S.C.L.; X.Z.; J.F.Z.
- Xi Zhang
- Medical Center of Hematology, Xinqiao Hospital, State Key Laboratory of Trauma, Burn and Combined Injury, Army Medical University, Chongqing 400037, China
- Correspondence: S.C.L.; X.Z.; J.F.Z.
- Jiang F. Zhong
- Department of Basic Science, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
- Cancer Center of Loma Linda University, Loma Linda, CA 92350, USA
- Correspondence: S.C.L.; X.Z.; J.F.Z.
13
Chen X, Zhu Z, Zhang W, Wang Y, Wang F, Yang J, Wong KC. Human disease prediction from microbiome data by multiple feature fusion and deep learning. iScience 2022; 25:104081. [PMID: 35372808 PMCID: PMC8971930 DOI: 10.1016/j.isci.2022.104081] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/11/2021] [Revised: 09/16/2021] [Accepted: 03/13/2022] [Indexed: 10/29/2022]
Abstract
Human disease prediction from microbiome data has broad implications in metagenomics. Existing methods rarely consider abundance profiles from both known and unknown microbial organisms or capture the taxonomic relationships among microbial taxa, leading to significant information loss. Deep learning, on the other hand, has shown unprecedented advantages in classification tasks because of its feature-learning ability. However, it faces the opposite situation in metagenome-based disease prediction, since high-dimensional, low-sample-size metagenomic datasets can lead to severe overfitting, and black-box models fail to provide biological explanations. To circumvent these problems, we developed MetaDR, a comprehensive machine learning-based framework that integrates various types of information with deep learning to predict human diseases. Experimental results indicate that MetaDR achieves competitive prediction performance with reduced running time and effectively discovers informative features with biological insights.
Affiliation(s)
- Xingjian Chen
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Zifan Zhu
- Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
- Weitong Zhang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Yuchen Wang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Fuzhou Wang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Hong Kong Institute for Data Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
14
Gao Y, Zhu Z, Sun F. Increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data. Synth Syst Biotechnol 2022; 7:574-585. [PMID: 35155839 PMCID: PMC8801753 DOI: 10.1016/j.synbio.2022.01.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/01/2021] [Revised: 12/14/2021] [Accepted: 01/19/2022] [Indexed: 12/14/2022]
Abstract
Dysfunction of microbial communities at various human body sites has been shown to be associated with a variety of diseases, raising the possibility of predicting diseases from metagenomic samples. Although many studies have investigated this problem, there is no consensus on the optimal approaches for predicting disease status from metagenomic samples. Using six human gut metagenomic datasets consisting of large numbers of colorectal cancer patients and healthy controls from different countries, we investigated different software packages for extracting relative abundances of known microbial genomes and for integrating mapping and assembly approaches to obtain the relative abundance profiles of both known and novel genomes. The random forests (RF) classification algorithm was then used to predict colorectal cancer status from the microbial relative abundance profiles. Based on within-dataset cross-validation and cross-dataset prediction, we show that RF prediction performance using the microbial relative abundance profiles estimated by Centrifuge is generally higher than that using the profiles estimated by MetaPhlAn2 and Bracken. We also develop a novel method to integrate the relative abundance profiles of both known and novel microbial organisms to further increase the prediction performance for colorectal cancer from metagenomes.
15
Ferravante C, Memoli D, Palumbo D, Ciaramella P, Di Loria A, D'Agostino Y, Nassa G, Rizzo F, Tarallo R, Weisz A, Giurato G. HOME-BIO (sHOtgun MEtagenomic analysis of BIOlogical entities): a specific and comprehensive pipeline for metagenomic shotgun sequencing data analysis. BMC Bioinformatics 2021; 22:106. [PMID: 34225648 PMCID: PMC8256542 DOI: 10.1186/s12859-021-04004-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/04/2021] [Accepted: 02/08/2021] [Indexed: 12/12/2022]
Abstract
Background: Next-generation sequencing (NGS) enables detection of microorganisms present in biological and other matrices of various origins, allowing not only the identification of known phyla and strains but also the discovery of novel ones. The large amount of metagenomic shotgun data produced by NGS requires comprehensive and user-friendly pipelines for data analysis that speed up the bioinformatics steps and relieve users from manually performing complex, time-consuming tasks.
Results: We describe here HOME-BIO (sHOtgun MEtagenomic analysis of BIOlogical entities), an exhaustive pipeline for metagenomic data analysis comprising three independent analytical modules designed for an inclusive analysis of large NGS datasets.
Conclusions: HOME-BIO is a powerful and easy-to-use tool that can also be run by users with limited computational expertise. It allows in-depth analyses by removing low-complexity/problematic reads and integrating the analytical steps that lead to a comprehensive taxonomic profile of each sample by querying different source databases, and it can be customized to specific users’ needs.
Affiliation(s)
- Carlo Ferravante
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Via S. Allende, 1, 84081, Baronissi, SA, Italy
- Department of Veterinary Medicine and Animal Production, University of Naples Federico II, Via Delpino 1, 80137, Naples, Italy
- Genomix4Life, via S. Allende 43/L, 84081, Baronissi, SA, Italy
- Domenico Memoli
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Via S. Allende, 1, 84081, Baronissi, SA, Italy
- Domenico Palumbo
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Via S. Allende, 1, 84081, Baronissi, SA, Italy
- Paolo Ciaramella
- Department of Veterinary Medicine and Animal Production, University of Naples Federico II, Via Delpino 1, 80137, Naples, Italy
- Antonio Di Loria
- Department of Veterinary Medicine and Animal Production, University of Naples Federico II, Via Delpino 1, 80137, Naples, Italy
- Ylenia D'Agostino
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Via S. Allende, 1, 84081, Baronissi, SA, Italy
- Giovanni Nassa
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Via S. Allende, 1, 84081, Baronissi, SA, Italy
- Francesca Rizzo
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Via S. Allende, 1, 84081, Baronissi, SA, Italy
- Roberta Tarallo
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Via S. Allende, 1, 84081, Baronissi, SA, Italy
- Alessandro Weisz
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Via S. Allende, 1, 84081, Baronissi, SA, Italy
- CRGS - Genome Research Center for Health, University of Salerno Campus of Medicine, 84081, Baronissi, SA, Italy
- Giorgio Giurato
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Via S. Allende, 1, 84081, Baronissi, SA, Italy
16
Nogueira T, Botelho A. Metagenomics and Other Omics Approaches to Bacterial Communities and Antimicrobial Resistance Assessment in Aquacultures. Antibiotics (Basel) 2021; 10:787. [PMID: 34203511 PMCID: PMC8300701 DOI: 10.3390/antibiotics10070787] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/07/2021] [Revised: 06/20/2021] [Accepted: 06/22/2021] [Indexed: 12/21/2022]
Abstract
The shortage of wild fishery resources and the rising demand for human nutrition have driven a great expansion of aquaculture over the last decades in terms of production and economic value. As such, sustainable aquaculture production is one of the main priorities of the European Union's 2030 agenda. However, the intensification of seafood farming has resulted in higher risks of disease outbreaks and in increased use of antimicrobials to control them. The selective pressure exerted by these drugs provides ideal conditions for the emergence of antimicrobial resistance hotspots in aquaculture facilities. Omics technology is an umbrella term for modern technologies such as genomics, metagenomics, transcriptomics, proteomics, culturomics, and metabolomics. These techniques have received increasing recognition because of their potential to unravel novel mechanisms in biological science. Metagenomics allows the study of the genomes of microbial communities within a given environment. Potential uses of metagenomics in aquaculture environments include the study of microbial diversity, microbial functions, and antibiotic resistance genes. This review presents a snapshot of these high-throughput technologies as applied to microbial diversity and antimicrobial resistance studies in aquaculture.
Affiliation(s)
- Teresa Nogueira
- Laboratory of Bacteriology and Mycology, INIAV-National Institute for Agrarian and Veterinary Research, 2780-157 Oeiras, Portugal
- cE3c-Centre for Ecology, Evolution and Environmental Changes, Evolutionary Ecology of Microorganisms Group, Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
- Ana Botelho
- Laboratory of Bacteriology and Mycology, INIAV-National Institute for Agrarian and Veterinary Research, 2780-157 Oeiras, Portugal
17
Computational Viromics: Applications of the Computational Biology in Viromics Studies. Virol Sin 2021; 36:1256-1260. [PMID: 34057678 PMCID: PMC8165334 DOI: 10.1007/s12250-021-00395-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/02/2020] [Accepted: 04/14/2021] [Indexed: 12/30/2022]
18
Wei ZG, Zhang XD, Cao M, Liu F, Qian Y, Zhang SW. Comparison of Methods for Picking the Operational Taxonomic Units From Amplicon Sequences. Front Microbiol 2021; 12:644012. [PMID: 33841367 PMCID: PMC8024490 DOI: 10.3389/fmicb.2021.644012] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 12/19/2020] [Accepted: 02/17/2021] [Indexed: 12/31/2022]
Abstract
With the advent of next-generation sequencing technology, it has become convenient and cost-efficient to thoroughly characterize the microbial diversity and taxonomic composition of various environmental samples. Millions of sequencing reads can be generated, and how to use this enormous sequence resource has become a critical concern for microbial ecologists. One particular challenge is picking OTUs (operational taxonomic units) in 16S rRNA sequence analysis. Fortunately, this challenge can be addressed directly by sequence clustering, which groups similar sequences. Numerous clustering methods have therefore been proposed to cluster 16S rRNA sequences into OTUs. However, each method has its own clustering mechanism, and different methods produce different outputs. Even a slight parameter change for the same method can generate distinct results, so choosing an appropriate method has become a challenge for inexperienced users, and much time and many resources can be wasted in selecting clustering tools and analyzing the clustering results. In this study, we review recent advances in clustering methods for OTU picking, focusing on three aspects: (i) the principles of existing clustering algorithms, (ii) benchmark dataset construction for OTU picking and evaluation metrics, and (iii) the performance of different methods with various distance thresholds on benchmark datasets. This paper aims to help biological researchers select appropriate clustering methods for analyzing their sequences and to help algorithm developers design more efficient sequence clustering methods.
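To make the idea of threshold-based OTU picking concrete, here is a minimal sketch of greedy centroid clustering. It is not any specific tool covered by the review and simplifies several things: identity is approximated as 1 minus the edit distance divided by the longer sequence length rather than taken from a proper pairwise alignment, reads are visited in decreasing abundance, and the function names and the (sequence, abundance) input format are hypothetical. The 0.97 default reflects the commonly used 97% identity cutoff.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling single-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Approximate identity: 1 - edit distance / length of the longer sequence."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b), 1)


def greedy_otu_clustering(reads, threshold=0.97):
    """Greedy centroid clustering of (sequence, abundance) pairs.

    Reads are visited in decreasing abundance (ties keep input order); each read
    joins the first existing centroid it matches at >= threshold identity,
    otherwise it founds a new OTU. Returns a list of clusters (lists of read
    indices), where the first index of each cluster is its centroid.
    """
    order = sorted(range(len(reads)), key=lambda i: -reads[i][1])
    clusters = []
    for i in order:
        seq = reads[i][0]
        for cluster in clusters:
            if identity(seq, reads[cluster[0]][0]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Because assignments depend on the visiting order and on the threshold, a small parameter change can reshuffle OTUs; with the toy reads `[("ACGTACGTAC", 10), ("ACGTACGTAA", 2), ("TTTTGGGGCC", 5)]`, a 0.85 threshold merges the first two reads, while a 0.95 threshold keeps all three apart.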
Affiliation(s)
- Ze-Gang Wei
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
- Xiao-Dan Zhang
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Ming Cao
- Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
- School of Mathematics and Statistics, Shaanxi Xueqian Normal University, Xi’an, China
- Fei Liu
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Yu Qian
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
19
Liu Z, Ma A, Mathé E, Merling M, Ma Q, Liu B. Network analyses in microbiome based on high-throughput multi-omics data. Brief Bioinform 2021; 22:1639-1655. [PMID: 32047891 PMCID: PMC7986608 DOI: 10.1093/bib/bbaa005] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 10/18/2019] [Revised: 01/07/2020] [Accepted: 01/08/2020] [Indexed: 02/06/2023]
Abstract
Together with various hosts and environments, ubiquitous microbes interact closely with each other, forming an intertwined system or community. Notably, shifts in the relationships between microbes and their hosts or environments are associated with critical diseases and ecological changes. While advances in high-throughput omics technologies offer a great opportunity for understanding the structures and functions of the microbiome, analysing and interpreting omics data remains challenging. Specifically, the heterogeneity and diversity of microbial communities, compounded by the large size of the datasets, make it very difficult to elucidate these complex communities mechanistically. Fortunately, network analyses provide an efficient way to tackle this problem, and several network approaches have recently been proposed to improve this understanding. Here, we systematically illustrate the network theories that have been used in biological and biomedical research. Then, we review existing network modelling methods for microbial studies at multiple layers, from metagenomics to metabolomics and further to multi-omics. Lastly, we discuss the limitations of present studies and provide a perspective on future directions to support the understanding of microbial communities.
Affiliation(s)
- Zhaoqian Liu
- Department of Biomedical Informatics, College of Medicine, the Ohio State University, Columbus, OH 43210, USA
- Anjun Ma
- Department of Biomedical Informatics, College of Medicine, the Ohio State University, Columbus, OH 43210, USA
- Ewy Mathé
- Department of Biomedical Informatics, College of Medicine, the Ohio State University, Columbus, OH 43210, USA
- Marlena Merling
- Department of Biomedical Informatics, College of Medicine, the Ohio State University, Columbus, OH 43210, USA
- Qin Ma
- Department of Biomedical Informatics, College of Medicine, the Ohio State University, Columbus, OH 43210, USA
- Bingqiang Liu
- Department of Biomedical Informatics, College of Medicine, the Ohio State University, Columbus, OH 43210, USA
20
Dalal N, Jalandra R, Sharma M, Prakash H, Makharia GK, Solanki PR, Singh R, Kumar A. Omics technologies for improved diagnosis and treatment of colorectal cancer: Technical advancement and major perspectives. Biomed Pharmacother 2020; 131:110648. [PMID: 33152902 DOI: 10.1016/j.biopha.2020.110648] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 07/16/2020] [Revised: 08/09/2020] [Accepted: 08/16/2020] [Indexed: 12/11/2022]
Abstract
Colorectal cancer (CRC) ranks third among the most commonly occurring cancers worldwide, and it causes half a million deaths annually. Alongside mechanistic studies of CRC detection and treatment using conventional techniques, new technologies have been developed to study CRC. These technologies include genomics, transcriptomics, proteomics, and metabolomics, which elucidate the DNA markers, RNA transcripts, proteins, and metabolites produced in the colon and rectum. Together these approaches form the omics arena, which presents a remarkable opportunity to discover novel prognostic, diagnostic, and therapeutic biomarkers and to delineate the underlying mechanisms of CRC causation, which may further help in devising treatment strategies. This review also covers the latest developments in metagenomics and culturomics, as emerging evidence suggests that gut microbiota metagenomics has profound implications for the causation, prognosis, and treatment of CRC. Because most bacteria remain unculturable and therefore cannot be studied directly, culturomics has been strengthened to develop culture conditions suitable for growing previously unculturable bacteria and to identify unknown bacteria. The overall purpose of this review is to succinctly evaluate the application of omics technologies in colorectal cancer research for improving diagnosis and treatment strategies.
Affiliation(s)
- Nishu Dalal
- Gene Regulation Laboratory, National Institute of Immunology, New Delhi 110067, India
- Department of Environmental Science, Satyawati College, Delhi University, Delhi 110052, India
- Rekha Jalandra
- Gene Regulation Laboratory, National Institute of Immunology, New Delhi 110067, India
- Department of Zoology, Maharshi Dayanand University, Rohtak 124001, India
- Minakshi Sharma
- Department of Zoology, Maharshi Dayanand University, Rohtak 124001, India
- Hridayesh Prakash
- Amity Institute of Virology and Immunology, Amity University, Sector 125, Noida 201313, Uttar Pradesh, India
- Govind K Makharia
- Department of Gastroenterology and Human Nutrition, All India Institute of Medical Sciences, New Delhi 110029, India
- Pratima R Solanki
- Special Centre for Nanoscience, Jawaharlal Nehru University, New Delhi 110067, India
- Rajeev Singh
- Department of Environmental Science, Satyawati College, Delhi University, Delhi 110052, India
- Anil Kumar
- Gene Regulation Laboratory, National Institute of Immunology, New Delhi 110067, India
21
Zhu Z, Ren J, Michail S, Sun F. Correction to: MicroPro: using metagenomic unmapped reads to provide insights into human microbiota and disease associations. Genome Biol 2019; 20:214. [PMID: 31640754 PMCID: PMC6805598 DOI: 10.1186/s13059-019-1826-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/31/2022]
Affiliation(s)
- Zifan Zhu
- Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
- Jie Ren
- Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
- Sonia Michail
- Department of Pediatrics, Division of Gastroenterology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Fengzhu Sun
- Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
22
Knight R, Ley RE, Raes J, Grice EA. Expanding the scope and scale of microbiome research. Genome Biol 2019; 20:191. [PMID: 31488207 PMCID: PMC6729039 DOI: 10.1186/s13059-019-1804-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/27/2019] [Accepted: 08/27/2019] [Indexed: 11/10/2022]
Affiliation(s)
- Rob Knight
- Department of Pediatrics, University of California, Gilman Drive, La Jolla, San Diego, CA, 92093, USA
- Center for Microbiome Innovation, Jacobs School of Engineering, University of California, Gilman Drive, La Jolla, San Diego, CA, 92093-0436, USA
- Department of Computer Science and Engineering, University of California, Gilman Drive, La Jolla, San Diego, CA, 92093-0404, USA
- Department of Bioengineering, University of California, La Jolla, San Diego, CA, 92093-0412, USA
- Ruth E Ley
- Department of Microbiome Science, Max Planck Institute for Developmental Biology, Max Planck Ring, 72076, Tübingen, Germany
- Jeroen Raes
- Laboratory of Molecular Bacteriology, Department of Microbiology and Immunology, Rega Institute, KU Leuven, Herestraat, 3000, Leuven, Belgium
- VIB-KU Leuven Center for Microbiology, Campus Gasthuisberg, Rega Instituut, Herestraat, 3000, Leuven, Belgium
- Elizabeth A Grice
- Department of Dermatology and Microbiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA