1
Teixeira M, Silva F, Ferreira RM, Pereira T, Figueiredo C, Oliveira HP. A review of machine learning methods for cancer characterization from microbiome data. NPJ Precis Oncol 2024; 8:123. [PMID: 38816569 PMCID: PMC11139966 DOI: 10.1038/s41698-024-00617-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 05/17/2024] [Indexed: 06/01/2024] Open
Abstract
Recent studies have shown that the microbiome can impact cancer development, progression, and response to therapies, suggesting microbiome-based approaches for cancer characterization. As cancer-related signatures are complex and implicate many taxa, their discovery often requires Machine Learning (ML) approaches. This review discusses ML methods for cancer characterization from microbiome data. It focuses on the implications of choices made during sample collection, feature selection, and pre-processing. It also discusses ML model selection, with guidance on how to choose an ML model, and model validation. Finally, it enumerates current limitations and how these may be overcome. Proposed methods, often based on Random Forests, show promising results, but these are not yet sufficient for widespread clinical use. Studies often report conflicting results, mainly because of ML models with poor generalizability. We expect that evaluating models with expanded, hold-out datasets, removing technical artifacts, exploring representations of the microbiome other than taxonomical profiles, leveraging advances in deep learning, and developing ML models better adapted to the characteristics of microbiome data will improve the performance and generalizability of models and enable their use in the clinic.
Affiliation(s)
- Marco Teixeira
  - Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
  - Faculty of Engineering, University of Porto, Porto, Portugal
- Francisco Silva
  - Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
  - Faculty of Science, University of Porto, Porto, Portugal
- Rui M Ferreira
  - Ipatimup - Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
  - Instituto de Investigação e Inovação em Saúde, University of Porto, Porto, Portugal
- Tania Pereira
  - Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
  - Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
- Ceu Figueiredo
  - Ipatimup - Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
  - Instituto de Investigação e Inovação em Saúde, University of Porto, Porto, Portugal
  - Faculty of Medicine, University of Porto, Porto, Portugal
- Hélder P Oliveira
  - Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
  - Faculty of Science, University of Porto, Porto, Portugal
2
Feng J, Yang K, Liu X, Song M, Zhan P, Zhang M, Chen J, Liu J. Machine learning: a powerful tool for identifying key microbial agents associated with specific cancer types. PeerJ 2023; 11:e16304. [PMID: 37901464 PMCID: PMC10601900 DOI: 10.7717/peerj.16304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Accepted: 09/26/2023] [Indexed: 10/31/2023] Open
Abstract
Machine learning (ML) includes a broad class of computer programs that improve with experience and show unique strengths in performing tasks such as clustering, classification and regression. Over the past decade, microbial communities have been implicated in influencing the onset, progression, metastasis, and therapeutic response of multiple cancers. Host-microbe interaction may be a physiological pathway contributing to cancer development. With the accumulation of a large volume of high-throughput data, ML has been successfully applied to the study of human cancer microbiomics in an attempt to reveal the complex mechanisms behind cancer. In this review, we begin with a brief overview of the data sources included in cancer microbiomics studies. Next, the characteristics of ML algorithms are briefly introduced. We then review the progress of ML applications in cancer microbiomics. Finally, we highlight the challenges and future prospects facing ML in cancer microbiomics. On this basis, we conclude that the development of cancer microbiomics cannot be achieved without ML, and that ML can be used to develop tumor-targeting microbial therapies, ultimately contributing to personalized and precision medicine.
Affiliation(s)
- Jia Feng
  - Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Sichuan Province Engineering Technology Research Center of Molecular Diagnosis of Clinical Diseases, Molecular Diagnosis of Clinical Diseases Key Laboratory of Luzhou, Sichuan, China
- Kailan Yang
  - Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Sichuan Province Engineering Technology Research Center of Molecular Diagnosis of Clinical Diseases, Molecular Diagnosis of Clinical Diseases Key Laboratory of Luzhou, Sichuan, China
- Xuexue Liu
  - Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Sichuan Province Engineering Technology Research Center of Molecular Diagnosis of Clinical Diseases, Molecular Diagnosis of Clinical Diseases Key Laboratory of Luzhou, Sichuan, China
- Min Song
  - Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Sichuan Province Engineering Technology Research Center of Molecular Diagnosis of Clinical Diseases, Molecular Diagnosis of Clinical Diseases Key Laboratory of Luzhou, Sichuan, China
- Ping Zhan
  - Department of Obstetrics, The Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan, China
- Mi Zhang
  - Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Sichuan Province Engineering Technology Research Center of Molecular Diagnosis of Clinical Diseases, Molecular Diagnosis of Clinical Diseases Key Laboratory of Luzhou, Sichuan, China
- Jinsong Chen
  - Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Sichuan Province Engineering Technology Research Center of Molecular Diagnosis of Clinical Diseases, Molecular Diagnosis of Clinical Diseases Key Laboratory of Luzhou, Sichuan, China
- Jinbo Liu
  - Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Sichuan Province Engineering Technology Research Center of Molecular Diagnosis of Clinical Diseases, Molecular Diagnosis of Clinical Diseases Key Laboratory of Luzhou, Sichuan, China
3
Pan P, Li J, Wang B, Tan X, Yin H, Han Y, Wang H, Shi X, Li X, Xie C, Chen L, Chen L, Bai Y, Li Z, Tian G. Molecular characterization of colorectal adenoma and colorectal cancer via integrated genomic transcriptomic analysis. Front Oncol 2023; 13:1067849. [PMID: 37546388 PMCID: PMC10401844 DOI: 10.3389/fonc.2023.1067849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Accepted: 06/21/2023] [Indexed: 08/08/2023] Open
Abstract
Introduction: Colorectal adenoma can develop into colorectal cancer. Determining the risk of tumorigenesis in colorectal adenoma would be critical for avoiding the development of colorectal cancer; however, genomic features that could help predict the risk of tumorigenesis remain uncertain. Methods: In this work, DNA and RNA parallel capture sequencing data covering 519 genes from colorectal adenoma and colorectal cancer samples were collected. The somatic mutation profiles were obtained from DNA sequencing data, and the expression profiles were obtained from RNA sequencing data. Results: Despite some similarities between the adenoma samples and the cancer samples, different mutation frequencies, co-occurrences, and mutually exclusive patterns were detected in the mutation profiles of patients with colorectal adenoma and colorectal cancer. Differentially expressed genes were also detected between the two patient groups using RNA sequencing. Finally, two random forest classification models were built, one based on mutation profiles and one based on expression profiles. The models distinguished adenoma and cancer samples with accuracy levels of 81.48% and 100.00%, respectively, showing the potential of the 519-gene panel for monitoring adenoma patients in clinical practice. Conclusion: This study revealed molecular characteristics and correlations between colorectal adenoma and colorectal cancer, and it demonstrated that the 519-gene panel may be used for early monitoring of the progression of colorectal adenoma to cancer.
Affiliation(s)
- Peng Pan
  - Department of Gastroenterology, Shanghai Changhai Hospital, Shanghai, China
- Jingnan Li
  - Department of Gastroenterology, Peking Union Medical College Hospital, Beijing, China
- Bo Wang
  - Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Xiaoyan Tan
  - Department of Gastroenterology, Maoming People's Hospital, Maoming, China
- Hekun Yin
  - Department of Gastroenterology, Jiangmen Central Hospital, Jiangmen, China
- Yingmin Han
  - Department of Bioinformatics, Boke Biotech Co., Ltd., Wuxi, China
- Haobin Wang
  - Department of Bioinformatics, Boke Biotech Co., Ltd., Wuxi, China
- Xiaoli Shi
  - Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Xiaoshuang Li
  - Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Cuinan Xie
  - Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Longfei Chen
  - Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Lanyou Chen
  - Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Yu Bai
  - Department of Gastroenterology, Shanghai Changhai Hospital, Shanghai, China
- Zhaoshen Li
  - Department of Gastroenterology, Shanghai Changhai Hospital, Shanghai, China
- Geng Tian
  - Department of Bioinformatics, Boke Biotech Co., Ltd., Wuxi, China
4
Bostanci E, Kocak E, Unal M, Guzel MS, Acici K, Asuroglu T. Machine Learning Analysis of RNA-seq Data for Diagnostic and Prognostic Prediction of Colon Cancer. Sensors (Basel, Switzerland) 2023; 23:3080. [PMID: 36991790 PMCID: PMC10052105 DOI: 10.3390/s23063080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Revised: 03/09/2023] [Accepted: 03/11/2023] [Indexed: 06/19/2023]
Abstract
Data from omics studies have been used for prediction and classification of various diseases in biomedical and bioinformatics research. In recent years, Machine Learning (ML) algorithms have been used in many different fields related to healthcare systems, especially for disease prediction and classification tasks. Integration of molecular omics data with ML algorithms has offered a great opportunity to evaluate clinical data. RNA sequence (RNA-seq) analysis has emerged as the gold standard for transcriptomics analysis and is now widely used in clinical research. In our present work, RNA-seq data of extracellular vesicles (EV) from healthy individuals and colon cancer patients are analyzed. Our aim is to develop models for prediction and classification of colon cancer stages. Five different canonical ML classifiers and three Deep Learning (DL) classifiers are used to predict colon cancer in an individual from processed RNA-seq data. The classes of data are formed on the basis of both colon cancer stages and cancer presence (healthy or cancer). The canonical ML classifiers, which are k-Nearest Neighbor (kNN), Logistic Model Tree (LMT), Random Tree (RT), Random Committee (RC), and Random Forest (RF), are tested with both forms of the data. In addition, to compare the performance with canonical ML models, One-Dimensional Convolutional Neural Network (1-D CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM) DL models are utilized. Hyper-parameters of the DL models are optimized using a genetic meta-heuristic optimization algorithm (GA). The best accuracy in cancer prediction, 97.33%, is obtained with the RC, LMT, and RF canonical ML algorithms, while RT and kNN reach 95.33%. The best accuracy in cancer stage classification is achieved with RF, at 97.33%. This result is followed by LMT, RC, kNN, and RT with 96.33%, 96%, 94.66%, and 94%, respectively. According to the results of the experiments with DL algorithms, the best accuracy in cancer prediction is obtained with 1-D CNN, at 97.67%. BiLSTM and LSTM show 94.33% and 93.67% performance, respectively. In classification of the cancer stages, the best accuracy is achieved with BiLSTM, at 98%. 1-D CNN and LSTM show 97% and 94.33% performance, respectively. The results reveal that both canonical ML and DL models may outperform each other for different numbers of features.
Affiliation(s)
- Erkan Bostanci
  - Department of Computer Engineering, Faculty of Engineering, Ankara University, 06830 Ankara, Turkey
- Engin Kocak
  - Department of Analytical Chemistry, Faculty of Gülhane Pharmacy, University of Health Sciences, 06018 Ankara, Turkey
- Metehan Unal
  - Department of Computer Engineering, Faculty of Engineering, Ankara University, 06830 Ankara, Turkey
- Mehmet Serdar Guzel
  - Department of Computer Engineering, Faculty of Engineering, Ankara University, 06830 Ankara, Turkey
- Koray Acici
  - Department of Artificial Intelligence and Data Engineering, Faculty of Engineering, Ankara University, 06830 Ankara, Turkey
- Tunc Asuroglu
  - Faculty of Medicine and Health Technology, Tampere University, 33720 Tampere, Finland
5
Cakmak A, Ayaz H, Arıkan S, Ibrahimzada AR, Demirkol Ş, Sönmez D, Hakan MT, Sürmen ST, Horozoğlu C, Doğan MB, Küçükhüseyin Ö, Cacına C, Kıran B, Zeybek Ü, Baysan M, Yaylım İ. Predicting the predisposition to colorectal cancer based on SNP profiles of immune phenotypes using supervised learning models. Med Biol Eng Comput 2023; 61:243-258. [PMID: 36357628 DOI: 10.1007/s11517-022-02707-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Accepted: 10/22/2022] [Indexed: 11/12/2022]
Abstract
This study explores the machine learning-based assessment of predisposition to colorectal cancer based on single nucleotide polymorphisms (SNP). Such a computational approach may be used as a risk indicator and an auxiliary diagnosis method that complements traditional methods such as biopsy and CT scan. Moreover, it may be used to develop a low-cost screening test for the early detection of colorectal cancers to improve public health. We employ several supervised classification algorithms. We also apply data imputation to fill in the missing genotype values. The employed dataset includes SNPs observed in particular colorectal cancer-associated genomic loci that are located within DNA regions of 11 selected genes obtained from 115 individuals. We make the following observations: (i) A random forest-based classifier using one-hot encoding and K-nearest neighbor (KNN)-based imputation performs the best among the studied classifiers, with an F1 score of 89% and an area under the curve (AUC) score of 0.96. (ii) One-hot encoding together with KNN-based data imputation increases the F1 scores by around 26% in comparison to the baseline approach, which does not employ them. (iii) The proposed model outperforms a commonly employed state-of-the-art approach, ColonFlag, under all evaluated settings by up to 24% in terms of the AUC score. Based on the high accuracy of the constructed predictive models, the studied 11 genes may be considered a gene panel candidate for colon cancer risk screening.
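The two pre-processing steps named in this abstract, KNN-based imputation of missing genotypes and one-hot encoding, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the genotype alphabet `GENOTYPES`, the mismatch-count distance, and the majority vote among neighbors are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical genotype alphabet for a biallelic SNP (assumption, not from the paper).
GENOTYPES = ("AA", "AG", "GG")


def mismatches(a, b):
    """Count differing genotypes over positions observed in both samples."""
    return sum(1 for x, y in zip(a, b)
               if x is not None and y is not None and x != y)


def knn_impute(samples, k=3):
    """Fill each missing genotype (None) with the majority genotype at that
    position among the k nearest samples that have it observed."""
    imputed = []
    for i, row in enumerate(samples):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [s for n, s in enumerate(samples)
                      if n != i and s[j] is not None]
            donors.sort(key=lambda s: mismatches(row, s))
            votes = Counter(s[j] for s in donors[:k])
            if votes:
                new_row[j] = votes.most_common(1)[0][0]
        imputed.append(new_row)
    return imputed


def one_hot(row):
    """Expand each genotype into len(GENOTYPES) binary indicator features."""
    return [1 if value == g else 0 for value in row for g in GENOTYPES]
```

The encoded rows could then be passed to any classifier, such as a random forest; one-hot encoding avoids imposing an arbitrary numeric order on the genotype categories.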
Affiliation(s)
- Ali Cakmak
  - Department of Computer Engineering, Istanbul Technical University, Ayazaga Campus, Reşitpaşa, 34467, Sarıyer, Istanbul, Turkey
- Soykan Arıkan
  - Başakşehir Çam and Sakura City Hospital, Istanbul, Turkey
- Dilara Sönmez
  - Aziz Sancar Institute of Experimental Medicine, Istanbul University, Istanbul, Turkey
- Mehmet T Hakan
  - Aziz Sancar Institute of Experimental Medicine, Istanbul University, Istanbul, Turkey
- Saime T Sürmen
  - Aziz Sancar Institute of Experimental Medicine, Istanbul University, Istanbul, Turkey
- Mehmet B Doğan
  - Istanbul Research and Training Hospital, Istanbul, Turkey
- Özlem Küçükhüseyin
  - Aziz Sancar Institute of Experimental Medicine, Istanbul University, Istanbul, Turkey
- Canan Cacına
  - Aziz Sancar Institute of Experimental Medicine, Istanbul University, Istanbul, Turkey
- Ümit Zeybek
  - Aziz Sancar Institute of Experimental Medicine, Istanbul University, Istanbul, Turkey
- Mehmet Baysan
  - Department of Computer Engineering, Istanbul Technical University, Ayazaga Campus, Reşitpaşa, 34467, Sarıyer, Istanbul, Turkey
- İlhan Yaylım
  - Aziz Sancar Institute of Experimental Medicine, Istanbul University, Istanbul, Turkey
6
Liu X, Yuan P, Li R, Zhang D, An J, Ju J, Liu C, Ren F, Hou R, Li Y, Yang J. Predicting breast cancer recurrence and metastasis risk by integrating color and texture features of histopathological images and machine learning technologies. Comput Biol Med 2022; 146:105569. [DOI: 10.1016/j.compbiomed.2022.105569] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/12/2022] [Revised: 04/24/2022] [Accepted: 04/25/2022] [Indexed: 12/11/2022]
7
Lin YC, Salleb-Aouissi A, Hooven TA. Interpretable prediction of necrotizing enterocolitis from machine learning analysis of premature infant stool microbiota. BMC Bioinformatics 2022; 23:104. [PMID: 35337258 PMCID: PMC8953333 DOI: 10.1186/s12859-022-04618-w] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 10/26/2021] [Accepted: 02/23/2022] [Indexed: 12/18/2022] Open
Abstract
Background: Necrotizing enterocolitis (NEC) is a common, potentially catastrophic intestinal disease among very low birthweight premature infants. Affecting up to 15% of neonates born weighing less than 1500 g, NEC causes sudden-onset, progressive intestinal inflammation and necrosis, which can lead to significant bowel loss, multi-organ injury, or death. No unifying cause of NEC has been identified, nor is there any reliable biomarker that indicates an individual patient’s risk of the disease. Without a way to predict NEC in advance, the current medical strategy involves close clinical monitoring in an effort to treat babies with NEC as quickly as possible before irrecoverable intestinal damage occurs. In this report, we describe a novel machine learning application for generating dynamic, individualized NEC risk scores based on intestinal microbiota data, which can be determined from sequencing bacterial DNA from otherwise discarded infant stool. A central insight that differentiates our work from past efforts was the recognition that disease prediction from stool microbiota represents a specific subtype of machine learning problem known as multiple instance learning (MIL). Results: We used a neural network-based MIL architecture, which we tested on independent datasets from two cohorts encompassing 3595 stool samples from 261 at-risk infants. Our report also introduces a new concept called the “growing bag” analysis, which applies MIL over time, allowing incorporation of past data into each new risk calculation. This approach allowed early, accurate NEC prediction, with a mean sensitivity of 86% and specificity of 90%. True-positive NEC predictions occurred an average of 8 days before disease onset. We also demonstrate that an attention-gated mechanism incorporated into our MIL algorithm permits interpretation of NEC risk, identifying several bacterial taxa that past work has associated with NEC, and potentially pointing the way toward new hypotheses about NEC pathogenesis. Our system is flexible, accepting microbiota data generated from targeted 16S or “shotgun” whole-genome DNA sequencing. It performs well in the setting of common, potentially confounding preterm neonatal clinical events such as perinatal cardiopulmonary depression, antibiotic administration, feeding disruptions, or transitions between breast feeding and formula. Conclusions: We have developed and validated a robust MIL-based system for NEC prediction from harmlessly collected premature infant stool. While this system was developed for NEC prediction, our MIL approach may also be applicable to other diseases characterized by changes in the human microbiota. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04618-w.
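The "growing bag" idea and attention-based pooling over a bag of samples can be illustrated with a small sketch. It is a simplification under stated assumptions, not the published architecture: the scoring vector `w` stands in for learned parameters, plain softmax attention replaces the attention-gated mechanism, and the function names are hypothetical.

```python
import math


def attention_pool(instances, w):
    """Pool a bag of instance feature vectors into one vector.

    Each instance is scored by a dot product with w (a stand-in for learned
    parameters); a softmax over the scores gives the attention weights, which
    indicate how much each sample contributes to the bag representation.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in instances]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instances[0])
    pooled = [sum(a * x[d] for a, x in zip(weights, instances))
              for d in range(dim)]
    return pooled, weights


def growing_bags(samples):
    """Yield, for each time point, the bag of all samples collected so far."""
    for t in range(1, len(samples) + 1):
        yield samples[:t]
```

Pooling each growing bag in turn gives one representation per time point, so a downstream classifier could update a risk score every time a new stool sample arrives.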
Affiliation(s)
- Yun Chao Lin
  - Department of Computer Science, Columbia University, 1214 Amsterdam Ave., Mailcode 0401, New York, 10027, USA
- Ansaf Salleb-Aouissi
  - Department of Computer Science, Columbia University, 1214 Amsterdam Ave., Mailcode 0401, New York, 10027, USA
- Thomas A Hooven
  - Department of Pediatrics, University of Pittsburgh School of Medicine, Pittsburgh, USA
  - Richard King Mellon Institute for Pediatric Research, UPMC Children's Hospital of Pittsburgh, Pittsburgh, USA
8
Yu T, Su S, Hu J, Zhang J, Xianyu Y. A New Strategy for Microbial Taxonomic Identification through Micro-Biosynthetic Gold Nanoparticles and Machine Learning. Advanced Materials (Deerfield Beach, Fla.) 2022; 34:e2109365. [PMID: 34989446 DOI: 10.1002/adma.202109365] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/18/2021] [Revised: 12/28/2021] [Indexed: 06/14/2023]
Abstract
Microorganisms can serve as biological factories for the synthesis of inorganic nanomaterials that can become useful as nanocatalysts, energy-harvesting-storage components, antibacterial agents, and biomedical materials. Herein, the development of biosynthesis of inorganic nanomaterials into a simple, stable, and accurate strategy for distinguishing microorganisms at multiple classification levels (i.e., kingdom, order, genus, and species) without gene amplification, biochemical testing, or target recognition is reported. Gold nanoparticles (AuNPs) biosynthesized by different microorganisms differ in the color of the solution, and their features can be characterized, including the particle size, the surface plasmon resonance (SPR) spectrum, and the surface potential. The inter-relation between the features of micro-biosynthetic AuNPs and the classification of microorganisms is exploited at different levels through machine learning to establish a taxonomic model. This model agrees well with traditional classification methods and offers a new strategy for microbial taxonomic identification. The underlying mechanism of this strategy is related to the biomolecules produced by different microorganisms, including glucose, glutathione, and nicotinamide adenine dinucleotide phosphate-dependent reductase, that regulate the features of micro-biosynthetic AuNPs. This work broadens the application of biosynthesis of inorganic materials through micro-biosynthetic AuNPs and machine learning, which holds great promise as a tool for biomedical research.
Affiliation(s)
- Ting Yu
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, 310058, China
- Shixuan Su
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, 310058, China
- Jing Hu
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, 310058, China
- Jun Zhang
  - Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Yunlei Xianyu
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, 310058, China
  - Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - Ningbo Research Institute, Zhejiang University, Ningbo, Zhejiang, 315100, China
9
Qi X, Zuo J, Yan D, Hu G, Wang R, Chen J, Fu J. A NOD-Like Receptor Signaling-Based Gene Signature Identified as a Novel Prognostic Biomarker for Predicting Overall Survival of Colorectal Cancer Patients. Curr Bioinform 2022. [DOI: 10.2174/1574893616666211005122422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background: Colorectal Cancer (CRC) is the most frequently diagnosed gastrointestinal tract malignant tumor worldwide and is closely associated with distant metastasis and poor prognosis. Due to its high degree of heterogeneity, reliable prognostic biomarkers are urgently needed to guide the therapeutic intervention of CRC patients.
Objective: The present study aimed to develop a NOD-Like Receptors (NLRs) signaling-based gene signature that can successfully predict the overall survival of CRC patients.
Methods: Firstly, differentially expressed NLR signaling-related genes were identified between primary and metastatic human CRC samples. Genes with prognostic value were then screened through univariate Cox regression analysis. Next, the NLR signaling-based prognostic signature was constructed by LASSO-penalized Cox regression analysis, and its predictive ability was further confirmed in an independent cohort. Furthermore, functional studies including GO, GSEA, ssGSEA and chemotherapeutic response analyses were performed to explore the role of the NLR signaling-based signature in CRC pathogenesis and therapy.
Results: The established prognostic signature, which consisted of 7 NLR signaling-related genes, can effectively stratify the high-risk and low-risk CRC patients in both training and validation cohorts. Moreover, the signature proved to be an independent indicator of overall survival in CRC patients. Functional annotation and chemotherapeutic response analyses showed that the signature was closely associated with immune status and chemotherapeutic sensitivity of CRC patients.
Conclusion: The novel NLR signaling-based gene signature could serve as a potential tool for survival prediction and therapeutic evaluation, thereby contributing to the personalized prognostic management of CRC patients.
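A gene signature of this kind is typically applied as a linear risk score: each gene's expression is multiplied by its regression coefficient, the products are summed, and patients are split into high-risk and low-risk groups. The sketch below shows that arithmetic only. The gene names and coefficients are invented placeholders, and the median cutoff is an assumption, since the abstract states neither the coefficients nor the cutoff rule.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over signature genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def stratify(scores):
    """Label patients 'high' if their score exceeds the cohort median,
    otherwise 'low' (median cutoff is an assumption for this sketch)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Placeholder signature and cohort, for illustration only.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
cohort = {
    "p1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "p2": {"GENE_A": 4.0, "GENE_B": 0.0},
    "p3": {"GENE_A": 0.0, "GENE_B": 4.0},
}
scores = {pid: risk_score(expr, coefficients) for pid, expr in cohort.items()}
groups = stratify(scores)
```

In a validation cohort the coefficients would stay fixed and only the expression values would change, which is what makes confirmation in an independent cohort meaningful.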
Affiliation(s)
- Xin Qi
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, 215011 Suzhou, China
- Jiachen Zuo
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, 215011 Suzhou, China
- Donghui Yan
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, 215011 Suzhou, China
- Guang Hu
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, 215123 Suzhou, China
- Rui Wang
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, 215011 Suzhou, China
- Jiajia Chen
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, 215011 Suzhou, China
- Jiaolong Fu
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, 215011 Suzhou, China
10
Chen X, Liu L, Zhang W, Yang J, Wong KC. Human host status inference from temporal microbiome changes via recurrent neural networks. Brief Bioinform 2021; 22:6307015. [PMID: 34151933 DOI: 10.1093/bib/bbab223] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/24/2021] [Revised: 04/21/2021] [Accepted: 04/21/2021] [Indexed: 01/04/2023] Open
Abstract
With the rapid increase in sequencing data, human host status inference (e.g. healthy or sick) from microbiome data has become an important issue. Existing studies are mostly based on single-point microbiome composition, and host status is rarely predicted from longitudinal microbiome data. However, single-point-based methods cannot capture the dynamic patterns between the temporal changes and host status. Therefore, it remains challenging to build good predictive models and to scale them to different microbiome contexts. In addition, existing methods mainly target disease prediction and seldom investigate other host statuses. To fill the gap, we propose a comprehensive deep learning-based framework that utilizes longitudinal microbiome data as input to infer the human host status. Specifically, the framework is composed of specific data preparation strategies and a recurrent neural network tailored for longitudinal microbiome data. In experiments, we evaluated the proposed method on both semi-synthetic and real datasets based on different sequencing technologies and metagenomic contexts. The results indicate that our method achieves robust performance compared to other baseline and state-of-the-art classifiers and provides a significant reduction in prediction time.
Affiliation(s)
- Xingjian Chen
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR
- Lingjing Liu
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR
- Weitong Zhang
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR
- Jianyi Yang
  - School of Mathematical Sciences, Nankai University, Kowloon, Hong Kong SAR
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR
11
Wei ZG, Zhang XD, Cao M, Liu F, Qian Y, Zhang SW. Comparison of Methods for Picking the Operational Taxonomic Units From Amplicon Sequences. Front Microbiol 2021; 12:644012. [PMID: 33841367 PMCID: PMC8024490 DOI: 10.3389/fmicb.2021.644012] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/19/2020] [Accepted: 02/17/2021] [Indexed: 12/31/2022] Open
Abstract
With the advent of next-generation sequencing technology, it has become convenient and cost-efficient to thoroughly characterize the microbial diversity and taxonomic composition in various environmental samples. Millions of sequences can be generated, and how to utilize this enormous sequence resource has become a critical concern for microbial ecologists. One particular challenge is OTU (operational taxonomic unit) picking in 16S rRNA sequence analysis. Fortunately, this challenge can be directly addressed by sequence clustering, which attempts to group similar sequences. Therefore, numerous clustering methods have been proposed to cluster 16S rRNA sequences into OTUs. However, each method has its own clustering mechanism, and different methods produce diverse outputs. Even a slight parameter change for the same method can generate distinct results, and choosing an appropriate method has become a challenge for inexperienced users. A lot of time and resources can be wasted in selecting clustering tools and analyzing the clustering results. In this study, we introduce recent advances in clustering methods for OTU picking, focusing on three aspects: (i) the principles of existing clustering algorithms, (ii) benchmark dataset construction for OTU picking and evaluation metrics, and (iii) the performance of different methods with various distance thresholds on benchmark datasets. This paper aims to help biological researchers select reasonable clustering methods for analyzing their collected sequences and to help algorithm developers design more efficient sequence clustering methods.
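The sensitivity of OTU picking to the distance threshold can be seen in a toy greedy clustering sketch. This illustrates the general seed-based idea only and is not any of the tools compared in the paper: the identity measure assumes pre-aligned reads of equal length, and both function names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; assumes pre-aligned, equal-length reads."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def greedy_otus(reads, threshold=0.97):
    """Assign each read to the first OTU whose seed it matches at or above the
    identity threshold; a read that matches no seed starts a new OTU."""
    seeds = []
    otus = []
    for read in reads:
        for idx, seed in enumerate(seeds):
            if identity(read, seed) >= threshold:
                otus[idx].append(read)
                break
        else:
            seeds.append(read)
            otus.append([read])
    return otus
```

Because the result depends on both the threshold and the order in which reads are processed, two runs with slightly different settings can produce different OTU counts from the same data, which is the behavior the abstract warns about.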
Affiliation(s)
- Ze-Gang Wei
  - Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
- Xiao-Dan Zhang
  - Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Ming Cao
  - Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
  - School of Mathematics and Statistics, Shaanxi Xueqian Normal University, Xi’an, China
- Fei Liu
  - Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Yu Qian
  - Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Shao-Wu Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
12
Pet-Human Gut Microbiome Host Classifier Using Data from Different Studies. Microorganisms 2020; 8:microorganisms8101591. [PMID: 33076521 PMCID: PMC7602744 DOI: 10.3390/microorganisms8101591] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/18/2020] [Revised: 10/07/2020] [Accepted: 10/09/2020] [Indexed: 12/30/2022]
Abstract
(1) Background: microbiome host classification can be used to identify sources of contamination in environmental data. However, there is no ready-to-use host classifier. Here, we aimed to build a model able to discriminate between pet and human microbiome samples. The challenge of the study was to build a classifier using data solely from publicly available studies, which normally contain sequencing data for only one type of host. (2) Results: we have developed a random forest model that distinguishes human microbiota from domestic pet microbiota (cats and dogs) with 97% accuracy. To prevent overfitting, samples from several (at least four) different projects were necessary. Feature importance analysis revealed that the model relied on several taxa known to be key components of domestic cat and dog microbiomes (such as Fusobacteriaceae and Peptostreptococcaceae), as well as on some taxa found exclusively in humans (such as Akkermansiaceae). (3) Conclusion: we have shown that it is possible to build a reliable pet/human gut microbiome classifier from data collected in different studies.
13
Jiang Y, Song H, Jiang L, Qiao Y, Yang D, Wang D, Li J. Silybin Prevents Prostate Cancer by Inhibited the ALDH1A1 Expression in the Retinol Metabolism Pathway. Front Cell Dev Biol 2020; 8:574394. [PMID: 32984354 PMCID: PMC7487981 DOI: 10.3389/fcell.2020.574394] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/19/2020] [Accepted: 08/14/2020] [Indexed: 12/31/2022]
Abstract
Background: Silybin is known to inhibit prostate cancer, but the underlying mechanism remains largely unknown. This study was designed to identify potential targets of Silybin in prostate cancer and to explore the related mechanisms. Methods: First, we screened possible targets of Silybin using the PubChem database and Subpathway-GM. DU145 cells were then transfected to investigate the correlations among the related targets, and magnetic bead sorting and flow cytometry were used to sort and identify the cells. The proliferation, migration and invasion ability of DU145 cells was assessed by MTT assay, Transwell assay, plate clone formation and sphere formation assays. BALB/c nude mouse models with implanted sarcoma were constructed, and tumor volume was measured every 5 days, along with tumor weight. Protein levels were detected by Western blot and immunocytochemistry. RT-PCR was used to measure the mRNA expression of the proteins. Results: Screening showed that ALDH1A1 was highly correlated with subpathways of the Silybin risk metabolic pathway. Interfering with the ALDH1A1 gene showed that ALDH1A1 expression was positively correlated with RARα and Ets1. Importantly, ALDH1A1(+) cells showed proliferation, migration and invasion ability. In addition, Silybin inhibited prostate cells by suppressing their proliferation, migration and invasion ability in vitro. Silybin also reduced tumor volume and weight, and markedly reduced the protein and mRNA expression of ALDH1A1, RARα, Ets1 and MMP9. Conclusion: Our results indicate that Silybin inhibits prostate cancer and that the mechanism involves downregulating ALDH1A1 expression, thereby inhibiting the activation of RARα and preventing the activation of Ets1, which inhibits the growth and invasion of prostate cancer.
Affiliation(s)
- Ying Jiang
  - College of Basic Medicine, Heilongjiang University of Chinese Medicine, Harbin, China
- Hanbing Song
  - The First Affiliated Hospital, Heilongjiang University of Chinese Medicine, Harbin, China
- Ling Jiang
  - College of Basic Medicine, Heilongjiang University of Chinese Medicine, Harbin, China
- Yu Qiao
  - College of Basic Medicine, Heilongjiang University of Chinese Medicine, Harbin, China
- Dan Yang
  - College of Basic Medicine, Heilongjiang University of Chinese Medicine, Harbin, China
- Donghua Wang
  - Department of General Surgery, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Ji Li
  - College of Basic Medicine, Heilongjiang University of Chinese Medicine, Harbin, China
14
Tang J, Wang Y, Luo Y, Fu J, Zhang Y, Li Y, Xiao Z, Lou Y, Qiu Y, Zhu F. Computational advances of tumor marker selection and sample classification in cancer proteomics. Comput Struct Biotechnol J 2020; 18:2012-2025. [PMID: 32802273 PMCID: PMC7403885 DOI: 10.1016/j.csbj.2020.07.009] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 04/07/2020] [Revised: 07/06/2020] [Accepted: 07/08/2020] [Indexed: 12/11/2022]
Abstract
Cancer proteomics has become a powerful technique for characterizing the protein markers driving malignant transformation, tracing proteome variation triggered by therapeutics, and discovering novel targets and drugs for the treatment of oncologic diseases. To facilitate cancer diagnosis/prognosis and accelerate drug target discovery, a variety of methods for tumor marker identification and sample classification have been developed and successfully applied to cancer proteomic studies. This review article describes the most recent advances in these approaches together with their current applications in cancer-related studies. Firstly, a number of popular feature selection methods are reviewed, with an objective evaluation of their advantages and disadvantages. Secondly, these methods are grouped into three major classes based on their underlying algorithms. Finally, a variety of sample separation algorithms are discussed. This review provides a comprehensive overview of advances in tumor marker identification and patient/sample/tissue separation, which could guide research in cancer proteomics.
Key Words
- ANN, Artificial Neural Network
- ANOVA, Analysis of Variance
- CFS, Correlation-based Feature Selection
- Cancer proteomics
- Computational methods
- DAPC, Discriminant Analysis of Principal Components
- DT, Decision Trees
- EDA, Estimation of Distribution Algorithm
- FC, Fold Change
- GA, Genetic Algorithms
- GR, Gain Ratio
- HC, Hill Climbing
- HCA, Hierarchical Cluster Analysis
- IG, Information Gain
- LDA, Linear Discriminant Analysis
- LIMMA, Linear Models for Microarray Data
- MBF, Markov Blanket Filter
- MWW, Mann–Whitney–Wilcoxon test
- OPLS-DA, Orthogonal Partial Least Squares Discriminant Analysis
- PCA, Principal Component Analysis
- PLS-DA, Partial Least Squares Discriminant Analysis
- RF, Random Forest
- RF-RFE, Random Forest with Recursive Feature Elimination
- SA, Simulated Annealing
- SAM, Significance Analysis of Microarrays
- SBE, Sequential Backward Elimination
- SFS, Sequential Forward Selection
- SOM, Self-organizing Map
- SU, Symmetrical Uncertainty
- SVM, Support Vector Machine
- SVM-RFE, Support Vector Machine with Recursive Feature Elimination
- Sample classification
- Tumor marker selection
- sPLSDA, Sparse Partial Least Squares Discriminant Analysis
- t-SNE, t-Distributed Stochastic Neighbor Embedding
- χ2, Chi-square
Affiliation(s)
- Jing Tang
  - Department of Bioinformatics, Chongqing Medical University, Chongqing 400016, China
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yunxia Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yongchao Luo
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Jianbo Fu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yang Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - School of Pharmaceutical Sciences and Innovative Drug Research Centre, Chongqing University, Chongqing 401331, China
- Yi Li
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Ziyu Xiao
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yan Lou
  - Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, China
- Yunqing Qiu
  - Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, China
- Feng Zhu
  - Department of Bioinformatics, Chongqing Medical University, Chongqing 400016, China
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
15
Tang J, Mou M, Wang Y, Luo Y, Zhu F. MetaFS: Performance assessment of biomarker discovery in metaproteomics. Brief Bioinform 2020; 22:5854399. [PMID: 32510556 DOI: 10.1093/bib/bbaa105] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 03/24/2020] [Revised: 04/17/2020] [Accepted: 05/05/2020] [Indexed: 12/19/2022]
Abstract
Metaproteomics suffers from high dimensionality and sparsity. Data reduction methods can identify the relevant subset of significantly differential features and reduce data redundancy. Feature selection (FS) methods are applied to obtain this significantly differential subset, and a variety of them have been developed for metaproteomic studies. However, because the performance of FS depends heavily on the data characteristics of a given study, a suitable feature selection method must be chosen carefully to obtain reproducible differential proteins. Moreover, it is critical to evaluate the performance of each FS method against comprehensive criteria, because a single criterion is not sufficient to reflect its overall performance. Therefore, we developed an online tool named MetaFS, which provides 13 types of FS methods and conducts a comprehensive evaluation of these complex FS methods using four widely accepted and independent criteria. Furthermore, the function and reliability of MetaFS were systematically tested and validated via two case studies. In sum, MetaFS could be a valuable tool for identifying the FS method that performs well overall for selecting potential biomarkers in microbiome studies. The online tool is freely available at https://idrblab.org/metafs/.
16
Hooven TA, Lin AYC, Salleb-Aouissi A. Multiple Instance Learning for Predicting Necrotizing Enterocolitis in Premature Infants Using Microbiome Data. PROCEEDINGS OF THE ACM CONFERENCE ON HEALTH, INFERENCE, AND LEARNING 2020; 2020:99-109. [PMID: 34318306 PMCID: PMC8313028 DOI: 10.1145/3368555.3384466] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 02/05/2023]
Abstract
Necrotizing enterocolitis (NEC) is a life-threatening intestinal disease that primarily affects preterm infants during their first weeks after birth. Mortality rates associated with NEC are 15-30%, and surviving infants are susceptible to multiple serious, long-term complications. The disease is sporadic and, with currently available tools, unpredictable. We are creating an early warning system that uses stool microbiome features, combined with clinical and demographic information, to identify infants at high risk of developing NEC. Our approach uses a multiple instance learning, neural network-based system that could be used to generate daily or weekly NEC predictions for premature infants. The approach was selected to effectively utilize sparse and weakly annotated datasets characteristic of stool microbiome analysis. Here we describe initial validation of our system, using clinical and microbiome data from a nested case-control study of 161 preterm infants. We show receiver-operator curve areas above 0.9, with 75% of dominant predictive samples for NEC-affected infants identified at least 24 hours prior to disease onset. Our results pave the way for development of a real-time early warning system for NEC using a limited set of basic clinical and demographic details combined with stool microbiome data.
17
Zhang ZM, Tan JX, Wang F, Dao FY, Zhang ZY, Lin H. Early Diagnosis of Hepatocellular Carcinoma Using Machine Learning Method. Front Bioeng Biotechnol 2020; 8:254. [PMID: 32292778 PMCID: PMC7122481 DOI: 10.3389/fbioe.2020.00254] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 01/02/2020] [Accepted: 03/12/2020] [Indexed: 12/18/2022]
Abstract
Hepatocellular carcinoma (HCC) is a serious cancer that ranks fourth in cancer-related deaths worldwide. More accurate diagnostic models are therefore urgently needed to aid early HCC diagnosis in clinical scenarios and thus improve HCC treatment and survival. Several conventional methods have been used to discriminate HCC from cirrhosis tissues in patients without HCC (CwoHCC). However, their recognition success rates are still far from satisfactory. In this study, we applied a computational approach based on a machine learning method to a set of microarray data generated from 1091 HCC samples and 242 CwoHCC samples. The within-sample relative expression orderings (REOs) method was used to extract numerical descriptors from gene expression profile datasets. After removing unrelated features using minimum redundancy maximum relevance (mRMR) with incremental feature selection, we obtained an “11-gene-pair” signature that produced outstanding results. We further investigated the discriminative capability of the “11-gene-pair” for HCC recognition on several independent datasets. Excellent results were obtained, demonstrating that the selected gene pairs can serve as a signature for HCC. The proposed computational model can discriminate HCC and adjacent non-cancerous tissues from CwoHCC even for minimum biopsy specimens and inaccurately sampled specimens, which makes it practical and effective for aiding early HCC diagnosis at the individual level.
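The within-sample relative expression ordering (REO) descriptors mentioned in this abstract can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the gene names and expression values are hypothetical, every gene pair is encoded as a binary feature, and the subsequent mRMR and incremental feature selection steps are not shown.

```python
from itertools import combinations


def reo_features(expression, pairs):
    """Encode each gene pair (a, b) as 1 if a is expressed above b, else 0.

    `expression` maps gene name -> expression value for ONE sample, so the
    features depend only on the ordering of values within that sample.
    """
    return [1 if expression[a] > expression[b] else 0 for a, b in pairs]


# Hypothetical gene panel and all unordered gene pairs drawn from it.
genes = ["G1", "G2", "G3"]
pairs = list(combinations(genes, 2))

# Hypothetical expression values for a single sample.
sample = {"G1": 5.2, "G2": 3.1, "G3": 7.8}

# The same sample after a monotonic rescaling, for example a different
# normalization or a batch-level shift.
rescaled = {gene: 10 * value + 2 for gene, value in sample.items()}

print(reo_features(sample, pairs))    # [1, 0, 0]
print(reo_features(rescaled, pairs))  # [1, 0, 0]
```

Because only the ordering within a sample matters, the features are unchanged by any monotonic rescaling of that sample, which is one reason such descriptors are attractive for single-sample classification.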
Affiliation(s)
- Zi-Mei Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Jiu-Xin Tan
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Fang Wang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Fu-Ying Dao
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zhao-Yue Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hao Lin
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China