1
|
Bakir-Gungor B, Temiz M, Inal Y, Cicekyurt E, Yousef M. CCPred: Global and population-specific colorectal cancer prediction and metagenomic biomarker identification at different molecular levels using machine learning techniques. Comput Biol Med 2024; 182:109098. [PMID: 39293338 DOI: 10.1016/j.compbiomed.2024.109098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2024] [Revised: 08/29/2024] [Accepted: 08/31/2024] [Indexed: 09/20/2024]
Abstract
Colorectal cancer (CRC) ranks as the third most common cancer globally and the second leading cause of cancer-related deaths. Recent research highlights the pivotal role of the gut microbiota in CRC development and progression. Understanding the complex interplay between disease development and metagenomic data is essential for CRC diagnosis and treatment. Current computational models employ machine learning to identify metagenomic biomarkers associated with CRC, yet there is a need to improve their accuracy through a holistic biological knowledge perspective. This study aims to evaluate CRC-associated metagenomic data at species, enzymes, and pathway levels via conducting global and population-specific analyses. These analyses utilize relative abundance values from human gut microbiome sequencing data and robust classification models are built for disease prediction and biomarker identification. For global CRC prediction and biomarker identification, the features that are identified by SelectKBest (SKB), Information Gain (IG), and Extreme Gradient Boosting (XGBoost) methods are combined. Population-based analysis includes within-population, leave-one-dataset-out (LODO) and cross-population approaches. Four classification algorithms are employed for CRC classification. Random Forest achieved an AUC of 0.83 for species data, 0.78 for enzyme data and 0.76 for pathway data globally. On the global scale, potential taxonomic biomarkers include ruthenibacterium lactatiformanas; enzyme biomarkers include RNA 2' 3' cyclic 3' phosphodiesterase; and pathway biomarkers include pyruvate fermentation to acetone pathway. This study underscores the potential of machine learning models trained on metagenomic data for improved disease prediction and biomarker discovery. The proposed model and associated files are available at https://github.com/TemizMus/CCPRED.
Collapse
Affiliation(s)
- Burcu Bakir-Gungor
- Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, 38080, Turkey
| | - Mustafa Temiz
- Department of Electrical and Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, 38080, Turkey.
| | - Yasin Inal
- Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, 38080, Turkey
| | - Emre Cicekyurt
- Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, 38080, Turkey
| | - Malik Yousef
- Department of Information Systems, Zefat Academic College, Zefat, 13206, Israel; Galilee Digital Health Research Center (GDH), Zefat Academic College, Israel
| |
Collapse
|
2
|
Xiao Y, Tan M, Song J, Huang Y, Lv M, Liao M, Yu Z, Gao Z, Qu S, Liang W. Developmental validation of an mRNA kit: A 5-dye multiplex assay designed for body-fluid identification. Forensic Sci Int Genet 2024; 71:103045. [PMID: 38615496 DOI: 10.1016/j.fsigen.2024.103045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2023] [Revised: 03/25/2024] [Accepted: 03/29/2024] [Indexed: 04/16/2024]
Abstract
Identifying the sources of biosamples found at crime scenes is crucial for forensic investigations. Among the markers used for body fluid identification (BFI), mRNA has emerged as a well-studied marker because of its high specificity and remarkable stability. Despite this potential, commercially available mRNA kits specifically designed for BFI are lacking. Therefore, we developed an mRNA kit that includes 21 specific mRNA markers of body fluids, along with three housekeeping genes for BFI, to identify four forensic-relevant fluids (blood, semen, saliva, and vaginal fluids). In this study, we tested 451 single-body-fluid samples, validated the universality of the mRNA kit, and obtained a gene expression profile. We performed the validation studies in triplicates and determined the sensitivity, specificity, stability, precision, and repeatability of the mRNA kit. The sensitivity of the kit was found to be 0.1 ng. Our validation process involved the examination of 59 RNA mixtures, 60 body fluids mixtures, and 20 casework samples, which further established the reliability of the kit. Furthermore, we constructed five classifiers that can handle single-body fluids and mixtures using this kit. The classifiers output possibility values and identify the specific body fluids of interest. Our results showed the reliability and suitability of the BFI kit, and the Random Forest classifier performed the best, with 94% precision. In conclusion, we developed an mRNA kit for BFI which can be a promising tool for forensic practice.
Collapse
Affiliation(s)
- Yuanyuan Xiao
- Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, Sichuan 610041, PR China
| | - Mengyu Tan
- Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, Sichuan 610041, PR China
| | - Jinlong Song
- Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, Sichuan 610041, PR China
| | - Yihang Huang
- Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, Sichuan 610041, PR China
| | - Meili Lv
- Department of Immunology, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, Sichuan 610041, PR China
| | - Miao Liao
- Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, Sichuan 610041, PR China
| | - Zailiang Yu
- Suzhou Microread Genetics Co.,Ltd, Suzhou, Jiangsu, PR China
| | - Zhixiao Gao
- Suzhou Microread Genetics Co.,Ltd, Suzhou, Jiangsu, PR China
| | - Shengqiu Qu
- Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, Sichuan 610041, PR China.
| | - Weibo Liang
- Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, Sichuan 610041, PR China.
| |
Collapse
|
3
|
Roy G, Prifti E, Belda E, Zucker JD. Deep learning methods in metagenomics: a review. Microb Genom 2024; 10:001231. [PMID: 38630611 PMCID: PMC11092122 DOI: 10.1099/mgen.0.001231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2023] [Accepted: 03/27/2024] [Indexed: 04/19/2024] Open
Abstract
The ever-decreasing cost of sequencing and the growing potential applications of metagenomics have led to an unprecedented surge in data generation. One of the most prevalent applications of metagenomics is the study of microbial environments, such as the human gut. The gut microbiome plays a crucial role in human health, providing vital information for patient diagnosis and prognosis. However, analysing metagenomic data remains challenging due to several factors, including reference catalogues, sparsity and compositionality. Deep learning (DL) enables novel and promising approaches that complement state-of-the-art microbiome pipelines. DL-based methods can address almost all aspects of microbiome analysis, including novel pathogen detection, sequence classification, patient stratification and disease prediction. Beyond generating predictive models, a key aspect of these methods is also their interpretability. This article reviews DL approaches in metagenomics, including convolutional networks, autoencoders and attention-based models. These methods aggregate contextualized data and pave the way for improved patient care and a better understanding of the microbiome's key role in our health.
Collapse
Affiliation(s)
- Gaspar Roy
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
| | - Edi Prifti
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
| | - Eugeni Belda
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
| | - Jean-Daniel Zucker
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
| |
Collapse
|
4
|
Venugopal G, Khan ZH, Dash R, Tulsian V, Agrawal S, Rout S, Mahajan P, Ramadass B. Predictive association of gut microbiome and NLR in anemic low middle-income population of Odisha- a cross-sectional study. Front Nutr 2023; 10:1200688. [PMID: 37528994 PMCID: PMC10390256 DOI: 10.3389/fnut.2023.1200688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2023] [Accepted: 06/27/2023] [Indexed: 08/03/2023] Open
Abstract
Background Iron is abundant on earth but not readily available for colonizing bacteria due to its low solubility in the human body. Hosts and microbiota compete fiercely for iron. <15% Supplemented Iron is absorbed in the small bowel, and the remaining iron is a source of dysbiosis. The gut microbiome signatures to the level of predicting anemia among low-middle-income populations are unknown. The present study was conducted to identify gut microbiome signatures that have predictive potential in association with Neutrophil to lymphocytes ratio (NLR) and Mean corpuscular volume (MCV) in anemia. Methods One hundred and four participants between 10 and 70 years were recruited from Odisha's Low Middle-Income (LMI) rural population. Hematological parameters such as Hemoglobin (HGB), NLR, and MCV were measured, and NLR was categorized using percentiles. The microbiome signatures were analyzed from 61 anemic and 43 non-anemic participants using 16 s rRNA sequencing, followed by the Bioinformatics analysis performed to identify the diversity, correlations, and indicator species. The Multi-Layered Perceptron Neural Network (MLPNN) model were applied to predict anemia. Results Significant microbiome diversity among anemic participants was observed between the lower, middle, and upper Quartile NLR groups. For anemic participants with NLR in the lower quartile, alpha indices indicated bacterial overgrowth, and consistently, we identified R. faecis and B. uniformis were predominating. Using ROC analysis, R. faecis had better distinction (AUC = 0.803) to predict anemia with lower NLR. In contrast, E. biforme and H. parainfluenzae were indicators of the NLR in the middle and upper quartile, respectively. While in Non-anemic participants with low MCV, the bacterial alteration was inversely related to gender. Furthermore, our Multi-Layered Perceptron Neural Network (MLPNN) models also provided 89% accuracy in predicting Anemic or Non-Anemic from the top 20 OTUs, HGB level, NLR, MCV, and indicator species. Conclusion These findings strongly associate anemic hematological parameters and microbiome. Such predictive association between the gut microbiome and NLR could be further evaluated and utilized to design precision nutrition models and to predict Iron supplementation and dietary intervention responses in both community and clinical settings.
Collapse
Affiliation(s)
- Giriprasad Venugopal
- Center of Excellence for Clinical Microbiome Research (CCMR), All India Institute of Medical Sciences (AIIMS), Bhubaneswar, Odisha, India
| | - Zaiba Hasan Khan
- Center of Excellence for Clinical Microbiome Research (CCMR), All India Institute of Medical Sciences (AIIMS), Bhubaneswar, Odisha, India
| | - Rishikesh Dash
- Center of Excellence for Clinical Microbiome Research (CCMR), All India Institute of Medical Sciences (AIIMS), Bhubaneswar, Odisha, India
| | - Vinay Tulsian
- Center of Excellence for Clinical Microbiome Research (CCMR), All India Institute of Medical Sciences (AIIMS), Bhubaneswar, Odisha, India
| | - Siwani Agrawal
- Department of Biochemistry, All India Institute of Medical Sciences, Bhubaneswar, Odisha, India
| | - Sudeshna Rout
- Department of Biochemistry, All India Institute of Medical Sciences, Bhubaneswar, Odisha, India
| | - Preetam Mahajan
- Department of Community Medicine and Family Medicine, All India Institute of Medical Sciences (AIIMS), Bhubaneswar, Odisha, India
| | - Balamurugan Ramadass
- Center of Excellence for Clinical Microbiome Research (CCMR), All India Institute of Medical Sciences (AIIMS), Bhubaneswar, Odisha, India
- Department of Biochemistry, All India Institute of Medical Sciences, Bhubaneswar, Odisha, India
- Adelaide Medical School Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA, Australia
| |
Collapse
|
5
|
Shtossel O, Isakov H, Turjeman S, Koren O, Louzoun Y. Ordering taxa in image convolution networks improves microbiome-based machine learning accuracy. Gut Microbes 2023; 15:2224474. [PMID: 37345233 PMCID: PMC10288916 DOI: 10.1080/19490976.2023.2224474] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/15/2022] [Accepted: 06/08/2023] [Indexed: 06/23/2023] Open
Abstract
The human gut microbiome is associated with a large number of disease etiologies. As such, it is a natural candidate for machine-learning-based biomarker development for multiple diseases and conditions. The microbiome is often analyzed using 16S rRNA gene sequencing or shotgun metagenomics. However, several properties of microbial sequence-based studies hinder machine learning (ML), including non-uniform representation, a small number of samples compared with the dimension of each sample, and sparsity of the data, with the majority of taxa present in a small subset of samples. We show here using a graph representation that the cladogram structure is as informative as the taxa frequency. We then suggest a novel method to combine information from different taxa and improve data representation for ML using microbial taxonomy. iMic (image microbiome) translates the microbiome to images through an iterative ordering scheme, and applies convolutional neural networks to the resulting image. We show that iMic has a higher precision in static microbiome gene sequence-based ML than state-of-the-art methods. iMic also facilitates the interpretation of the classifiers through an explainable artificial intelligence (AI) algorithm to iMic to detect taxa relevant to each condition. iMic is then extended to dynamic microbiome samples by translating them to movies.
Collapse
Affiliation(s)
- Oshrit Shtossel
- Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
| | - Haim Isakov
- Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
| | - Sondra Turjeman
- The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
| | - Omry Koren
- The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
| | - Yoram Louzoun
- Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
| |
Collapse
|
6
|
Wang H, Zhao S, Cheng Y, Bi S, Zhu X. MTDeepM6A-2S: A two-stage multi-task deep learning method for predicting RNA N6-methyladenosine sites of Saccharomyces cerevisiae. Front Microbiol 2022; 13:999506. [PMID: 36274691 PMCID: PMC9579691 DOI: 10.3389/fmicb.2022.999506] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2022] [Accepted: 09/16/2022] [Indexed: 11/13/2022] Open
Abstract
N6-methyladenosine (m6A) is one of the most important RNA modifications, which is involved in many biological activities. Computational methods have been developed to detect m6A sites due to their high efficiency and low costs. As one of the most widely utilized model organisms, many methods have been developed for predicting m6A sites of Saccharomyces cerevisiae. However, the generalization of these methods was hampered by the limited size of the benchmark datasets. On the other hand, over 60,000 low resolution m6A sites and more than 10,000 base resolution m6A sites of Saccharomyces cerevisiae are recorded in RMBase and m6A-Atlas, respectively. The base resolution m6A sites are often obtained from low resolution results by post calibration. In view of these, we proposed a two-stage deep learning method, named MTDeepM6A-2S, to predict RNA m6A sites of Saccharomyces cerevisiae based on RNA sequence information. In the first stage, a multi-task model with convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) deep framework was built to not only detect the low resolution m6A sites but also assign a reasonable probability for the predicted site. In the second stage, a transfer-learning strategy was used to build the model to predict the base resolution m6A sites from those low resolution m6A sites. The effectiveness of our model was validated on both training and independent test sets. The results show that our model outperforms other state-of-the-art models on the independent test set, which indicates that our model holds high potential to become a useful tool for epitranscriptomics analysis.
Collapse
|
7
|
Bai X, Ren J, Sun F. MLR-OOD: A Markov Chain Based Likelihood Ratio Method for Out-Of-Distribution Detection of Genomic Sequences. J Mol Biol 2022; 434:167586. [PMID: 35427634 PMCID: PMC10433695 DOI: 10.1016/j.jmb.2022.167586] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2022] [Revised: 04/05/2022] [Accepted: 04/05/2022] [Indexed: 12/23/2022]
Abstract
Machine learning or deep learning models have been widely used for taxonomic classification of metagenomic sequences and many studies reported high classification accuracy. Such models are usually trained based on sequences in several training classes in hope of accurately classifying unknown sequences into these classes. However, when deploying the classification models on real testing data sets, sequences that do not belong to any of the training classes may be present and are falsely assigned to one of the training classes with high confidence. Such sequences are referred to as out-of-distribution (OOD) sequences and are ubiquitous in metagenomic studies. To address this problem, we develop a deep generative model-based method, MLR-OOD, that measures the probability of a testing sequencing belonging to OOD by the likelihood ratio of the maximum of the in-distribution (ID) class conditional likelihoods and the Markov chain likelihood of the testing sequence measuring the sequence complexity. We compose three different microbial data sets consisting of bacterial, viral, and plasmid sequences for comprehensively benchmarking OOD detection methods. We show that MLR-OOD achieves the state-of-the-art performance demonstrating the generality of MLR-OOD to various types of microbial data sets. It is also shown that MLR-OOD is robust to the GC content, which is a major confounding effect for OOD detection of genomic sequences. In conclusion, MLR-OOD will greatly reduce false positives caused by OOD sequences in metagenomic sequence classification.
Collapse
Affiliation(s)
- Xin Bai
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Jie Ren
- Google Research, Brain Team, USA
| | - Fengzhu Sun
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA.
| |
Collapse
|
8
|
Borgman J, Stark K, Carson J, Hauser L. Deep Learning Encoding for Rapid Sequence Identification on Microbiome Data. FRONTIERS IN BIOINFORMATICS 2022; 2:871256. [PMID: 36304316 PMCID: PMC9580936 DOI: 10.3389/fbinf.2022.871256] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2022] [Accepted: 05/30/2022] [Indexed: 11/18/2022] Open
Abstract
We present a novel approach for rapidly identifying sequences that leverages the representational power of Deep Learning techniques and is applied to the analysis of microbiome data. The method involves the creation of a latent sequence space, training a convolutional neural network to rapidly identify sequences by mapping them into that space, and we leverage the novel encoded latent space for denoising to correct sequencing errors. Using mock bacterial communities of known composition, we show that this approach achieves single nucleotide resolution, generating results for sequence identification and abundance estimation that match the best available microbiome algorithms in terms of accuracy while vastly increasing the speed of accurate processing. We further show the ability of this approach to support phenotypic prediction at the sample level on an experimental data set for which the ground truth for sequence identities and abundances is unknown, but the expected phenotypes of the samples are definitive. Moreover, this approach offers a potential solution for the analysis of data from other types of experiments that currently rely on computationally intensive sequence identification.
Collapse
|
9
|
Yogesh MJ, Karthikeyan J. Health Informatics: Engaging Modern Healthcare Units: A Brief Overview. Front Public Health 2022; 10:854688. [PMID: 35570921 PMCID: PMC9099090 DOI: 10.3389/fpubh.2022.854688] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2022] [Accepted: 03/31/2022] [Indexed: 11/13/2022] Open
Abstract
In the current scenario, with a large amount of unstructured data, Health Informatics is gaining traction, allowing Healthcare Units to leverage and make meaningful insights for doctors and decision-makers with relevant information to scale operations and predict the future view of treatments via Information Systems Communication. Now, around the world, massive amounts of data are being collected and analyzed for better patient diagnosis and treatment, improving public health systems and assisting government agencies in designing and implementing public health policies, instilling confidence in future generations who want to use better public health systems. This article provides an overview of the HL7 FHIR Architecture, including the workflow state, linkages, and various informatics approaches used in healthcare units. The article discusses future trends and directions in Health Informatics for successful application to provide public health safety. With the advancement of technology, healthcare units face new issues that must be addressed with appropriate adoption policies and standards.
Collapse
Affiliation(s)
- M. J. Yogesh
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
| | | |
Collapse
|
10
|
Bakir-Gungor B, Hacılar H, Jabeer A, Nalbantoglu OU, Aran O, Yousef M. Inflammatory bowel disease biomarkers of human gut microbiota selected via different feature selection methods. PeerJ 2022; 10:e13205. [PMID: 35497193 PMCID: PMC9048649 DOI: 10.7717/peerj.13205] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2021] [Accepted: 03/10/2022] [Indexed: 01/12/2023] Open
Abstract
The tremendous boost in next generation sequencing and in the "omics" technologies makes it possible to characterize the human gut microbiome-the collective genomes of the microbial community that reside in our gastrointestinal tract. Although some of these microorganisms are considered to be essential regulators of our immune system, the alteration of the complexity and eubiotic state of microbiota might promote autoimmune and inflammatory disorders such as diabetes, rheumatoid arthritis, Inflammatory bowel diseases (IBD), obesity, and carcinogenesis. IBD, comprising Crohn's disease and ulcerative colitis, is a gut-related, multifactorial disease with an unknown etiology. IBD presents defects in the detection and control of the gut microbiota, associated with unbalanced immune reactions, genetic mutations that confer susceptibility to the disease, and complex environmental conditions such as westernized lifestyle. Although some existing studies attempt to unveil the composition and functional capacity of the gut microbiome in relation to IBD diseases, a comprehensive picture of the gut microbiome in IBD patients is far from being complete. Due to the complexity of metagenomic studies, the applications of the state-of-the-art machine learning techniques became popular to address a wide range of questions in the field of metagenomic data analysis. In this regard, using IBD associated metagenomics dataset, this study utilizes both supervised and unsupervised machine learning algorithms, (i) to generate a classification model that aids IBD diagnosis, (ii) to discover IBD-associated biomarkers, (iii) to discover subgroups of IBD patients using k-means and hierarchical clustering approaches. To deal with the high dimensionality of features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization (CMIM), Fast Correlation Based Filter (FCBF), min redundancy max relevance (mRMR), Select K Best (SKB), Information Gain (IG) and Extreme Gradient Boosting (XGBoost). In our experiments with 100-fold Monte Carlo cross-validation (MCCV), XGBoost, IG, and SKB methods showed a considerable effect in terms of minimizing the microbiota used for the diagnosis of IBD and thus reducing the cost and time. We observed that compared to Decision Tree, Support Vector Machine, Logitboost, Adaboost, and stacking ensemble classifiers, our Random Forest classifier resulted in better performance measures for the classification of IBD. Our findings revealed potential microbiome-mediated mechanisms of IBD and these findings might be useful for the development of microbiome-based diagnostics.
Collapse
Affiliation(s)
- Burcu Bakir-Gungor
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
| | - Hilal Hacılar
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
| | - Amhar Jabeer
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
| | | | - Oya Aran
- TETAM, Bogazici University, Istanbul, Turkey
| | - Malik Yousef
- Zefat Academic College, Zefat, Israel,Galilee Digital Health Research Center, Zefat Academic College, Zefat, Israel
| |
Collapse
|
11
|
Feng Y, Cheng Z, Wei X, Chen M, Zhang J, Zhang Y, Xue L, Chen M, Li F, Shang Y, Liang T, Ding Y, Wu Q. Novel method for rapid identification of Listeria monocytogenes based on metabolomics and deep learning. Food Control 2022. [DOI: 10.1016/j.foodcont.2022.109042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
|
12
|
Zhou J, Ye Y, Jiang J. Kernel principal components based cascade forest towards disease identification with human microbiota. BMC Med Inform Decis Mak 2021; 21:360. [PMID: 34949186 PMCID: PMC8697468 DOI: 10.1186/s12911-021-01705-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2020] [Accepted: 11/30/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Numerous pieces of clinical evidence have shown that many phenotypic traits of human disease are related to their gut microbiome, i.e., inflammation, obesity, HIV, and diabetes. Through supervised classification, it is feasible to determine the human disease states by revealing the intestinal microbiota compositional information. However, the abundance matrix of microbiome data is so sparse, an interpretable deep model is crucial to further represent and mine the data for expansion, such as the deep forest model. What's more, overfitting can still exist in the original deep forest model when dealing with such "large p, small n" biology data. Feature reduction is considered to improve the ensemble forest model especially towards the disease identification in the human microbiota. METHODS In this work, we propose the kernel principal components based cascade forest method, so-called KPCCF, to classify the disease states of patients by using taxonomic profiles of the microbiome at the family level. In detail, the kernel principal components analysis method is first used to reduce the original dimension of human microbiota datasets. Besides, the processed data is fed into the cascade forest to preliminarily discriminate against the disease state of the samples. RESULTS The proposed KPCCF algorithm can represent the small-scale and high-dimension human microbiota datasets with the sparse feature matrix. Systematic comparison experiments demonstrate that our method consistently outperforms the state-of-the-art methods with the comparative study on 4 datasets. CONCLUSION Despite sharing some common characteristics, a one-size-fits-all solution does not exist in any space. The traditional depth model has limitations in the biological application of the unbalanced scale between small samples and high dimensions. KPCCF distinguishes from the standard deep forest model for its excellent performance in the microbiota field. Additionally, compared to other dimensionality reduction methods, the kernel principal components analysis method is more suitable for microbiota datasets.
Collapse
Affiliation(s)
- Jiayu Zhou
- National University of Defense Technology, Changsha, China
- Naval Submarine Academy, Qingdao, China
| | - Yanqing Ye
- Consulting Center for Strategic Assessment, Academy of Military Sciences, Beijing, China
| | - Jiang Jiang
- National University of Defense Technology, Changsha, China
| |
Collapse
|
13
|
Curry KD, Nute MG, Treangen TJ. It takes guts to learn: machine learning techniques for disease detection from the gut microbiome. Emerg Top Life Sci 2021; 5:815-827. [PMID: 34779841 PMCID: PMC8786294 DOI: 10.1042/etls20210213] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2021] [Revised: 09/29/2021] [Accepted: 10/06/2021] [Indexed: 02/01/2023]
Abstract
Associations between the human gut microbiome and expression of host illness have been noted in a variety of conditions ranging from gastrointestinal dysfunctions to neurological deficits. Machine learning (ML) methods have generated promising results for disease prediction from gut metagenomic information for diseases including liver cirrhosis and irritable bowel disease, but have lacked efficacy when predicting other illnesses. Here, we review current ML methods designed for disease classification from microbiome data. We highlight the computational challenges these methods have effectively overcome and discuss the biological components that have been overlooked to offer perspectives on future work in this area.
Collapse
Affiliation(s)
- Kristen D. Curry
- Department of Computer Science, Rice University, Houston, TX 77005, USA
| | - Michael G. Nute
- Department of Computer Science, Rice University, Houston, TX 77005, USA
| | - Todd J. Treangen
- Department of Computer Science, Rice University, Houston, TX 77005, USA
| |
Collapse
|
14
|
Ling W, Qi Y, Hua X, Wu MC. Deep ensemble learning over the microbial phylogenetic tree (DeepEn-Phy). PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE 2021; 2021:470-477. [PMID: 36704639 PMCID: PMC9875567 DOI: 10.1109/bibm52615.2021.9669654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]
Abstract
Successful prediction of clinical outcomes facilitates tailored diagnosis and treatment. The microbiome has been shown to be an important biomarker to predict host clinical outcomes. Further, the incorporation of microbial phylogeny, the evolutionary relationship among microbes, has been demonstrated to improve prediction accuracy. We propose a phylogeny-driven deep neural network (PhyNN) and develop an ensemble method, DeepEn-Phy, for host clinical outcome prediction. The method is designed to optimally extract features from phylogeny, thereby take full advantage of the information in phylogeny while harnessing the core principles of phylogeny (in contrast to taxonomy). We apply DeepEn-Phy to a real large microbiome data set to predict both categorical and continuous clinical outcomes. DeepEn-Phy demonstrates superior prediction performance to existing machine learning and deep learning approaches. Overall, DeepEn-Phy provides a new strategy for designing deep neural network architectures within the context of phylogeny-constrained microbiome data.
Collapse
Affiliation(s)
- Wodan Ling
- Fred Hutchinson, Cancer Research Center, Seattle, USA
| | | | - Xing Hua
- Fred Hutchinson, Cancer Research Center, Seattle, USA
| | - Michael C. Wu
- Fred Hutchinson, Cancer Research Center, Seattle, USA
| |
Collapse
|
15
|
Deng Z, Zhang J, Li J, Zhang X. Application of Deep Learning in Plant-Microbiota Association Analysis. Front Genet 2021; 12:697090. [PMID: 34691142 PMCID: PMC8531731 DOI: 10.3389/fgene.2021.697090] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2021] [Accepted: 08/31/2021] [Indexed: 01/04/2023] Open
Abstract
Unraveling the association between microbiome and plant phenotype can illustrate the effect of microbiome on host and then guide the agriculture management. Adequate identification of species and appropriate choice of models are two challenges in microbiome data analysis. Computational models of microbiome data could help in association analysis between the microbiome and plant host. The deep learning methods have been widely used to learn the microbiome data due to their powerful strength of handling the complex, sparse, noisy, and high-dimensional data. Here, we review the analytic strategies in the microbiome data analysis and describe the applications of deep learning models for plant–microbiome correlation studies. We also introduce the application cases of different models in plant–microbiome correlation analysis and discuss how to adapt the models on the critical steps in data processing. From the aspect of data processing manner, model structure, and operating principle, most deep learning models are suitable for the plant microbiome data analysis. The ability of feature representation and pattern recognition is the advantage of deep learning methods in modeling and interpretation for association analysis. Based on published computational experiments, the convolutional neural network and graph neural networks could be recommended for plant microbiome analysis.
Collapse
Affiliation(s)
- Zhiyu Deng
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China.,Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, China.,University of Chinese Academy of Sciences, Beijing, China
| | - Jinming Zhang
- Department of Infectious Diseases, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China
| | - Junya Li
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China.,Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, China.,University of Chinese Academy of Sciences, Beijing, China
| | - Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China.,Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, China
| |
Collapse
|
16
|
Zhao Z, Woloszynek S, Agbavor F, Mell JC, Sokhansanj BA, Rosen GL. Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network. PLoS Comput Biol 2021; 17:e1009345. [PMID: 34550967 PMCID: PMC8496832 DOI: 10.1371/journal.pcbi.1009345] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2020] [Revised: 10/07/2021] [Accepted: 08/12/2021] [Indexed: 01/04/2023] Open
Abstract
Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short and long term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences which are particularly informative for classification. As such, this novel approach can avoid pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a python package) and https://github.com/EESI/seq2att (a command line tool).
Collapse
Affiliation(s)
- Zhengqiao Zhao
- Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America
| | - Stephen Woloszynek
- Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
| | - Felix Agbavor
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, Pennsylvania, United States of America
| | - Joshua Chang Mell
- College of Medicine, Drexel University, Philadelphia, Pennsylvania, United States of America
| | - Bahrad A. Sokhansanj
- Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America
| | - Gail L. Rosen
- Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America
| |
Collapse
|
17
|
Hassanzadeh HR, Wang MD. An Integrated Deep Network for Cancer Survival Prediction Using Omics Data. Front Big Data 2021; 4:568352. [PMID: 34337396 PMCID: PMC8322661 DOI: 10.3389/fdata.2021.568352] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2020] [Accepted: 06/01/2021] [Indexed: 12/22/2022] Open
Abstract
As a highly sophisticated disease that humanity faces, cancer is known to be associated with dysregulation of cellular mechanisms in different levels, which demands novel paradigms to capture informative features from different omics modalities in an integrated way. Successful stratification of patients with respect to their molecular profiles is a key step in precision medicine and in tailoring personalized treatment for critically ill patients. In this article, we use an integrated deep belief network to differentiate high-risk cancer patients from the low-risk ones in terms of the overall survival. Our study analyzes RNA, miRNA, and methylation molecular data modalities from both labeled and unlabeled samples to predict cancer survival and subsequently to provide risk stratification. To assess the robustness of our novel integrative analytics, we utilize datasets of three cancer types with 836 patients and show that our approach outperforms the most successful supervised and semi-supervised classification techniques applied to the same cancer prediction problems. In addition, despite the preconception that deep learning techniques require large size datasets for proper training, we have illustrated that our model can achieve better results for moderately sized cancer datasets.
Collapse
Affiliation(s)
- Hamid Reza Hassanzadeh
- School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA, United States
| | - May D. Wang
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, United States
| |
Collapse
|
18
|
Chen X, Liu L, Zhang W, Yang J, Wong KC. Human host status inference from temporal microbiome changes via recurrent neural networks. Brief Bioinform 2021; 22:6307015. [PMID: 34151933 DOI: 10.1093/bib/bbab223] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2021] [Revised: 04/21/2021] [Accepted: 04/21/2021] [Indexed: 01/04/2023] Open
Abstract
With the rapid increase in sequencing data, human host status inference (e.g. healthy or sick) from microbiome data has become an important issue. Existing studies are mostly based on single-point microbiome composition, while it is rare that the host status is predicted from longitudinal microbiome data. However, single-point-based methods cannot capture the dynamic patterns between the temporal changes and host status. Therefore, it remains challenging to build good predictive models as well as scaling to different microbiome contexts. On the other hand, existing methods are mainly targeted for disease prediction and seldom investigate other host statuses. To fill the gap, we propose a comprehensive deep learning-based framework that utilizes longitudinal microbiome data as input to infer the human host status. Specifically, the framework is composed of specific data preparation strategies and a recurrent neural network tailored for longitudinal microbiome data. In experiments, we evaluated the proposed method on both semi-synthetic and real datasets based on different sequencing technologies and metagenomic contexts. The results indicate that our method achieves robust performance compared to other baseline and state-of-the-art classifiers and provides a significant reduction in prediction time.
Collapse
Affiliation(s)
- Xingjian Chen
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR
| | - Lingjing Liu
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR
| | - Weitong Zhang
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR
| | - Jianyi Yang
- School of Mathematical Sciences, Nankai University, Kowloon, Hong Kong SAR
| | - Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR
| |
Collapse
|
19
|
Lin Y, Wang G, Yu J, Sung JJY. Artificial intelligence and metagenomics in intestinal diseases. J Gastroenterol Hepatol 2021; 36:841-847. [PMID: 33880764 DOI: 10.1111/jgh.15501] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/07/2020] [Revised: 02/24/2021] [Accepted: 03/18/2021] [Indexed: 12/12/2022]
Abstract
Gut microbiota has been shown to associate with the development of gastrointestinal diseases. In the last decade, development in whole metagenome sequencing and 16S rRNA sequencing technology has dramatically accelerated the gut microbiome's research and revealed its association with gastrointestinal disorders. Because of high dimensionality and complexity's intrinsic data characteristics, traditional bioinformatical methods could only explain the most significant changes with limited prediction accuracy. In contrast, machine learning is the application of artificial intelligence that provides the computational systems to automatically learn and improve from experience (training cohort) without being explicitly programmed. It is thus capable of unwiring high dimensionality and complicated correlational hitches. With modern computation power, machine learning is widely utilized to analyze microorganisms related to disease onset and other clinical features. It could help explore and identify novel biomarkers or improve the accuracy rate of disease diagnostic. This review summarized the most recent research that utilized machine learning to reveal the role of gut microbiota in intestinal disorders.
Collapse
Affiliation(s)
- Yufeng Lin
- Institute of Digestive Disease and Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong
| | - Guoping Wang
- Institute of Digestive Disease and Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong
| | - Jun Yu
- Institute of Digestive Disease and Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong
| | - Joseph J Y Sung
- Institute of Digestive Disease and Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong
| |
Collapse
|
20
|
Ovur SE, Zhou X, Qi W, Zhang L, Hu Y, Su H, Ferrigno G, De Momi E. A novel autonomous learning framework to enhance sEMG-based hand gesture recognition using depth information. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102444] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
21
|
Reiman D, Farhat AM, Dai Y. Predicting Host Phenotype Based on Gut Microbiome Using a Convolutional Neural Network Approach. Methods Mol Biol 2021; 2190:249-266. [PMID: 32804370 DOI: 10.1007/978-1-0716-0826-5_12] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Accurate prediction of the host phenotypes from a microbial sample and identification of the associated microbial markers are important in understanding the impact of the microbiome on the pathogenesis and progression of various diseases within the host. A deep learning tool, PopPhy-CNN, has been developed for the task of predicting host phenotypes using a convolutional neural network (CNN). By representing samples as annotated taxonomic trees and further representing these trees as matrices, PopPhy-CNN utilizes the CNN's innate ability to explore locally similar microbes on the taxonomic tree. Furthermore, PopPhy-CNN can be used to evaluate the importance of each taxon in the prediction of host status. Here, we describe the underlying methodology, architecture, and core utility of PopPhy-CNN. We also demonstrate the use of PopPhy-CNN on a microbial dataset.
Collapse
Affiliation(s)
- Derek Reiman
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, USA
| | - Ali M Farhat
- College of Medicine, University of Illinois at Chicago, Chicago, IL, USA
| | - Yang Dai
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, USA.
| |
Collapse
|
22
|
Reiman D, Metwally AA, Sun J, Dai Y. PopPhy-CNN: A Phylogenetic Tree Embedded Architecture for Convolutional Neural Networks to Predict Host Phenotype From Metagenomic Data. IEEE J Biomed Health Inform 2020; 24:2993-3001. [PMID: 32396115 DOI: 10.1109/jbhi.2020.2993761] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
Accurate prediction of the host phenotype from a metagenomic sample and identification of the associated microbial markers are important in understanding potential host-microbiome interactions related to disease initiation and progression. We introduce PopPhy-CNN, a novel convolutional neural network (CNN) learning framework that effectively exploits phylogenetic structure in microbial taxa for host phenotype prediction. Our approach takes an input format of a 2D matrix representing the phylogenetic tree populated with the relative abundance of microbial taxa in a metagenomic sample. This conversion empowers CNNs to explore the spatial relationship of the taxonomic annotations on the tree and their quantitative characteristics in metagenomic data. We show the competitiveness of our model compared to other available methods using nine metagenomic datasets of moderate size for binary classification. With synthetic and biological datasets, we show the superior and robust performance of our model for multi-class classification. Furthermore, we design a novel scheme for feature extraction from the learned CNN models and demonstrate improved performance when the extracted features. PopPhy-CNN is a practical deep learning framework for the prediction of host phenotype with the ability of facilitating the retrieval of predictive microbial taxa.
Collapse
|
23
|
Jin S, Zeng X, Xia F, Huang W, Liu X. Application of deep learning methods in biological networks. Brief Bioinform 2020; 22:1902-1917. [PMID: 32363401 DOI: 10.1093/bib/bbaa043] [Citation(s) in RCA: 84] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/25/2019] [Revised: 02/19/2020] [Accepted: 03/05/2020] [Indexed: 01/07/2023] Open
Abstract
The increase in biological data and the formation of various biomolecule interaction databases enable us to obtain diverse biological networks. These biological networks provide a wealth of raw materials for further understanding of biological systems, the discovery of complex diseases and the search for therapeutic drugs. However, the increase in data also increases the difficulty of biological networks analysis. Therefore, algorithms that can handle large, heterogeneous and complex data are needed to better analyze the data of these network structures and mine their useful information. Deep learning is a branch of machine learning that extracts more abstract features from a larger set of training data. Through the establishment of an artificial neural network with a network hierarchy structure, deep learning can extract and screen the input information layer by layer and has representation learning ability. The improved deep learning algorithm can be used to process complex and heterogeneous graph data structures and is increasingly being applied to the mining of network data information. In this paper, we first introduce the used network data deep learning models. After words, we summarize the application of deep learning on biological networks. Finally, we discuss the future development prospects of this field.
Collapse
|
24
|
Using multi-layer perceptron with Laplacian edge detector for bladder cancer diagnosis. Artif Intell Med 2019; 102:101746. [PMID: 31980088 DOI: 10.1016/j.artmed.2019.101746] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2019] [Revised: 10/22/2019] [Accepted: 10/27/2019] [Indexed: 12/26/2022]
Abstract
In this paper, the urinary bladder cancer diagnostic method which is based on Multi-Layer Perceptron and Laplacian edge detector is presented. The aim of this paper is to investigate the implementation possibility of a simpler method (Multi-Layer Perceptron) alongside commonly used methods, such as Deep Learning Convolutional Neural Networks, for the urinary bladder cancer detection. The dataset used for this research consisted of 1997 images of bladder cancer and 986 images of non-cancer tissue. The results of the conducted research showed that using Multi-Layer Perceptron trained and tested with images pre-processed with Laplacian edge detector are achieving AUC value up to 0.99. When different image sizes are compared it can be seen that the best results are achieved if 50×50 and 100×100 images were used.
Collapse
|
25
|
LaPierre N, Ju CJT, Zhou G, Wang W. MetaPheno: A critical evaluation of deep learning and machine learning in metagenome-based disease prediction. Methods 2019; 166:74-82. [PMID: 30885720 PMCID: PMC6708502 DOI: 10.1016/j.ymeth.2019.03.003] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2018] [Revised: 02/14/2019] [Accepted: 03/04/2019] [Indexed: 01/21/2023] Open
Abstract
The human microbiome plays a number of critical roles, impacting almost every aspect of human health and well-being. Conditions in the microbiome have been linked to a number of significant diseases. Additionally, revolutions in sequencing technology have led to a rapid increase in publicly-available sequencing data. Consequently, there have been growing efforts to predict disease status from metagenomic sequencing data, with a proliferation of new approaches in the last few years. Some of these efforts have explored utilizing a powerful form of machine learning called deep learning, which has been applied successfully in several biological domains. Here, we review some of these methods and the algorithms that they are based on, with a particular focus on deep learning methods. We also perform a deeper analysis of Type 2 Diabetes and obesity datasets that have eluded improved results, using a variety of machine learning and feature extraction methods. We conclude by offering perspectives on study design considerations that may impact results and future directions the field can take to improve results and offer more valuable conclusions. The scripts and extracted features for the analyses conducted in this paper are available via GitHub:https://github.com/nlapier2/metapheno.
Collapse
Affiliation(s)
- Nathan LaPierre
- Department of Computer Science, University of California at Los Angeles, Los Angeles, CA 90095, USA
| | - Chelsea J-T Ju
- Department of Computer Science, University of California at Los Angeles, Los Angeles, CA 90095, USA
| | - Guangyu Zhou
- Department of Computer Science, University of California at Los Angeles, Los Angeles, CA 90095, USA
| | - Wei Wang
- Department of Computer Science, University of California at Los Angeles, Los Angeles, CA 90095, USA.
| |
Collapse
|
26
|
Radiomic features of glucose metabolism enable prediction of outcome in mantle cell lymphoma. Eur J Nucl Med Mol Imaging 2019; 46:2760-2769. [PMID: 31286200 PMCID: PMC6879438 DOI: 10.1007/s00259-019-04420-6] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2019] [Accepted: 06/11/2019] [Indexed: 12/14/2022]
Abstract
PURPOSE To determine whether [18F]FDG PET/CT-derived radiomic features alone or in combination with clinical, laboratory and biological parameters are predictive of 2-year progression-free survival (PFS) in patients with mantle cell lymphoma (MCL), and whether they enable outcome prognostication. METHODS Included in this retrospective study were 107 treatment-naive MCL patients scheduled to receive CD20 antibody-based immuno(chemo)therapy. Standardized uptake values (SUV), total lesion glycolysis, and 16 co-occurrence matrix radiomic features were extracted from metabolic tumour volumes on pretherapy [18F]FDG PET/CT scans. A multilayer perceptron neural network in combination with logistic regression analyses for feature selection was used for prediction of 2-year PFS. International prognostic indices for MCL (MIPI and MIPI-b) were calculated and combined with the radiomic data. Kaplan-Meier estimates with log-rank tests were used for PFS prognostication. RESULTS SUVmean (OR 1.272, P = 0.013) and Entropy (heterogeneity of glucose metabolism; OR 1.131, P = 0.027) were significantly predictive of 2-year PFS: median areas under the curve were 0.72 based on the two radiomic features alone, and 0.82 with the addition of clinical/laboratory/biological data. Higher SUVmean in combination with higher Entropy (SUVmean >3.55 and entropy >3.5), reflecting high "metabolic risk", was associated with a poorer prognosis (median PFS 20.3 vs. 39.4 months, HR 2.285, P = 0.005). The best PFS prognostication was achieved using the MIPI-bm (MIPI-b and metabolic risk combined): median PFS 43.2, 38.2 and 20.3 months in the low-risk, intermediate-risk and high-risk groups respectively (P = 0.005). CONCLUSION In MCL, the [18F]FDG PET/CT-derived radiomic features SUVmean and Entropy may improve prediction of 2-year PFS and PFS prognostication. The best results may be achieved using a combination of metabolic, clinical, laboratory and biological parameters.
Collapse
|
27
|
Zhou YH, Gallins P. A Review and Tutorial of Machine Learning Methods for Microbiome Host Trait Prediction. Front Genet 2019; 10:579. [PMID: 31293616 PMCID: PMC6603228 DOI: 10.3389/fgene.2019.00579] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2019] [Accepted: 06/04/2019] [Indexed: 12/19/2022] Open
Abstract
With the growing importance of microbiome research, there is increasing evidence that host variation in microbial communities is associated with overall host health. Advancement in genetic sequencing methods for microbiomes has coincided with improvements in machine learning, with important implications for disease risk prediction in humans. One aspect specific to microbiome prediction is the use of taxonomy-informed feature selection. In this review for non-experts, we explore the most commonly used machine learning methods, and evaluate their prediction accuracy as applied to microbiome host trait prediction. Methods are described at an introductory level, and R/Python code for the analyses is provided.
Collapse
Affiliation(s)
- Yi-Hui Zhou
- Department of Biological Sciences, North Carolina State University, Raleigh, NC, United States
| | - Paul Gallins
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, United States
| |
Collapse
|
28
|
Zhu Q, Li B, He T, Li G, Jiang X. Robust biomarker discovery for microbiome-wide association studies. Methods 2019; 173:44-51. [PMID: 31238097 DOI: 10.1016/j.ymeth.2019.06.012] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2019] [Revised: 06/06/2019] [Accepted: 06/13/2019] [Indexed: 01/03/2023] Open
Abstract
According to the advances of high-throughput sequencing technology, massive microbiome data accumulated from environmental investigations to human studies. The microbiome-wide association studies are to study the relationship between the microbiome and human health or environment. Recently, Deep Neural Networks (DNNs) are encouraging due to their layer-wise learning ability for representation learning. However, DNNs are considered as black boxes and they require a large amount of training data which makes them impractical to conduct microbiome-wide association studies directly. Meanwhile, the microbiome data is high dimension with many features and noise. A single feature selection method for dealing with the kind of dataset is often unstable. In this work, we introduced a deep learning model named Deep Forest to conduct the microbiome-wide association studies and an ensemble feature selection method is proposed to guide microbial biomarkers' identification. The experiments showed that our ensemble feature method based on Deep Forest had good stability and robustness. The results of feature selection could guide the discovery of microbial biomarkers and help to diagnose microbial-related diseases. The code is available at https://github.com/MicroAVA/MWAS-Biomarkers.git.
Collapse
Affiliation(s)
- Qiang Zhu
- School of Information Management, Central China Normal University, Wuhan, Hubei, China; Hubei Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei, China
| | - Bojing Li
- School of Computer, Central China Normal University, Wuhan, Hubei, China; Hubei Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei, China
| | - Tingting He
- School of Computer, Central China Normal University, Wuhan, Hubei, China; Hubei Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei, China
| | - Guangrong Li
- School of Business, Hunan University, Changsha, Hunan, China
| | - Xingpeng Jiang
- School of Computer, Central China Normal University, Wuhan, Hubei, China; Hubei Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei, China.
| |
Collapse
|
29
|
Cirillo D, Valencia A. Big data analytics for personalized medicine. Curr Opin Biotechnol 2019; 58:161-167. [PMID: 30965188 DOI: 10.1016/j.copbio.2019.03.004] [Citation(s) in RCA: 82] [Impact Index Per Article: 16.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2018] [Revised: 02/22/2019] [Accepted: 03/01/2019] [Indexed: 01/06/2023]
Abstract
Big Data are radically changing biomedical research. The unprecedented advances in automated collection of large-scale molecular and clinical data pose major challenges to data analysis and interpretation, calling for the development of new computational approaches. The creation of powerful systems for the effective use of biomedical Big Data in Personalized Medicine (a.k.a. Precision Medicine) will require significant scientific and technical developments, including infrastructure, engineering, project and financial management. We review here how the evolution of data-driven methods offers the possibility to address many of these problems, guiding the formulation of hypotheses on systems functioning and the generation of mechanistic models, and facilitating the design of clinical procedures in Personalized Medicine.
Collapse
Affiliation(s)
- Davide Cirillo
- Barcelona Supercomputing Center (BSC), C/Jordi Girona 29, 08034, Barcelona, Spain.
| | - Alfonso Valencia
- Barcelona Supercomputing Center (BSC), C/Jordi Girona 29, 08034, Barcelona, Spain; ICREA, Pg. Lluís Companys 23, 08010, Barcelona, Spain
| |
Collapse
|
30
|
Tang B, Pan Z, Yin K, Khateeb A. Recent Advances of Deep Learning in Bioinformatics and Computational Biology. Front Genet 2019; 10:214. [PMID: 30972100 PMCID: PMC6443823 DOI: 10.3389/fgene.2019.00214] [Citation(s) in RCA: 89] [Impact Index Per Article: 17.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2018] [Accepted: 02/27/2019] [Indexed: 01/18/2023] Open
Abstract
Extracting inherent valuable knowledge from omics big data remains as a daunting problem in bioinformatics and computational biology. Deep learning, as an emerging branch from machine learning, has exhibited unprecedented performance in quite a few applications from academia and industry. We highlight the difference and similarity in widely utilized models in deep learning studies, through discussing their basic structures, and reviewing diverse applications and disadvantages. We anticipate the work can serve as a meaningful perspective for further development of its theory, algorithm and application in bioinformatic and computational biology.
Collapse
Affiliation(s)
- Binhua Tang
- Epigenetics & Function Group, Hohai University, Nanjing, China.,School of Public Health, Shanghai Jiao Tong University, Shanghai, China
| | - Zixiang Pan
- Epigenetics & Function Group, Hohai University, Nanjing, China
| | - Kang Yin
- Epigenetics & Function Group, Hohai University, Nanjing, China
| | - Asif Khateeb
- Epigenetics & Function Group, Hohai University, Nanjing, China
| |
Collapse
|
31
|
Yu H, Samuels DC, Zhao YY, Guo Y. Architectures and accuracy of artificial neural network for disease classification from omics data. BMC Genomics 2019; 20:167. [PMID: 30832569 PMCID: PMC6399893 DOI: 10.1186/s12864-019-5546-z] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2018] [Accepted: 02/20/2019] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Deep learning has made tremendous successes in numerous artificial intelligence applications and is unsurprisingly penetrating into various biomedical domains. High-throughput omics data in the form of molecular profile matrices, such as transcriptomes and metabolomes, have long existed as a valuable resource for facilitating diagnosis of patient statuses/stages. It is timely imperative to compare deep learning neural networks against classical machine learning methods in the setting of matrix-formed omics data in terms of classification accuracy and robustness. RESULTS Using 37 high throughput omics datasets, covering transcriptomes and metabolomes, we evaluated the classification power of deep learning compared to traditional machine learning methods. Representative deep learning methods, Multi-Layer Perceptrons (MLP) and Convolutional Neural Networks (CNN), were deployed and explored in seeking optimal architectures for the best classification performance. Together with five classical supervised classification methods (Linear Discriminant Analysis, Multinomial Logistic Regression, Naïve Bayes, Random Forest, Support Vector Machine), MLP and CNN were comparatively tested on the 37 datasets to predict disease stages or to discriminate diseased samples from normal samples. MLPs achieved the highest overall accuracy among all methods tested. More thorough analyses revealed that single hidden layer MLPs with ample hidden units outperformed deeper MLPs. Furthermore, MLP was one of the most robust methods against imbalanced class composition and inaccurate class labels. CONCLUSION Our results concluded that shallow MLPs (of one or two hidden layers) with ample hidden neurons are sufficient to achieve superior and robust classification performance in exploiting numerical matrix-formed omics data for diagnosis purpose. Specific observations regarding optimal network width, class imbalance tolerance, and inaccurate labeling tolerance will inform future improvement of neural network applications on functional genomics data.
Collapse
Affiliation(s)
- Hui Yu
- Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131 USA
| | - David C. Samuels
- Vanderbilt Genetics Institute, Department of Molecular Physiology and Biophysics, Vanderbilt University Medical School, Nashville, TN 37232 USA
| | - Ying-yong Zhao
- Key Laboratory of Resource Biology and Biotechnology in Western China, School of Life Sciences, Northwest University, Xi’an, 710069 Shaanxi China
| | - Yan Guo
- Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131 USA
| |
Collapse
|
32
|
Metwally AA, Yu PS, Reiman D, Dai Y, Finn PW, Perkins DL. Utilizing longitudinal microbiome taxonomic profiles to predict food allergy via Long Short-Term Memory networks. PLoS Comput Biol 2019; 15:e1006693. [PMID: 30716085 PMCID: PMC6361419 DOI: 10.1371/journal.pcbi.1006693] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2018] [Accepted: 12/05/2018] [Indexed: 12/16/2022] Open
Abstract
Food allergy is usually difficult to diagnose in early life, and the inability to diagnose patients with atopic diseases at an early age may lead to severe complications. Numerous studies have suggested an association between the infant gut microbiome and development of allergy. In this work, we investigated the capacity of Long Short-Term Memory (LSTM) networks to predict food allergies in early life (0-3 years) from subjects' longitudinal gut microbiome profiles. Using the DIABIMMUNE dataset, we show an increase in predictive power using our model compared to Hidden Markov Model, Multi-Layer Perceptron Neural Network, Support Vector Machine, Random Forest, and LASSO regression. We further evaluated whether the training of LSTM networks benefits from reduced representations of microbial features. We considered sparse autoencoder for extraction of potential latent representations in addition to standard feature selection procedures based on Minimum Redundancy Maximum Relevance (mRMR) and variance prior to the training of LSTM networks. The comprehensive evaluation reveals that LSTM networks with the mRMR selected features achieve significantly better performance compared to the other tested machine learning models.
Collapse
Affiliation(s)
- Ahmed A. Metwally
- Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois, United States of America
- Department of Computer Science, University of Illinois at Chicago, Chicago, Illinois, United States of America
- Department of Medicine, University of Illinois at Chicago, Chicago, Illinois, United States of America
| | - Philip S. Yu
- Department of Computer Science, University of Illinois at Chicago, Chicago, Illinois, United States of America
| | - Derek Reiman
- Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois, United States of America
| | - Yang Dai
- Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois, United States of America
| | - Patricia W. Finn
- Department of Medicine, University of Illinois at Chicago, Chicago, Illinois, United States of America
- Department of Microbiology/Immunology, University of Illinois at Chicago, Chicago, Illinois, United States of America
| | - David L. Perkins
- Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois, United States of America
- Department of Medicine, University of Illinois at Chicago, Chicago, Illinois, United States of America
- Department of Surgery, University of Illinois at Chicago, Chicago, Illinois, United States of America
| |
Collapse
|
33
|
Asgari E, Garakani K, McHardy AC, Mofrad MRK. MicroPheno: predicting environments and host phenotypes from 16S rRNA gene sequencing using a k-mer based representation of shallow sub-samples. Bioinformatics 2018; 34:i32-i42. [PMID: 29950008 PMCID: PMC6022683 DOI: 10.1093/bioinformatics/bty296] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open
Abstract
Motivation Microbial communities play important roles in the function and maintenance of various biosystems, ranging from the human body to the environment. A major challenge in microbiome research is the classification of microbial communities of different environments or host phenotypes. The most common and cost-effective approach for such studies to date is 16S rRNA gene sequencing. Recent falls in sequencing costs have increased the demand for simple, efficient and accurate methods for rapid detection or diagnosis with proved applications in medicine, agriculture and forensic science. We describe a reference- and alignment-free approach for predicting environments and host phenotypes from 16S rRNA gene sequencing based on k-mer representations that benefits from a bootstrapping framework for investigating the sufficiency of shallow sub-samples. Deep learning methods as well as classical approaches were explored for predicting environments and host phenotypes. Results A k-mer distribution of shallow sub-samples outperformed Operational Taxonomic Unit (OTU) features in the tasks of body-site identification and Crohn's disease prediction. Aside from being more accurate, using k-mer features in shallow sub-samples allows (i) skipping computationally costly sequence alignments required in OTU-picking and (ii) provided a proof of concept for the sufficiency of shallow and short-length 16S rRNA sequencing for phenotype prediction. In addition, k-mer features predicted representative 16S rRNA gene sequences of 18 ecological environments, and 5 organismal environments with high macro-F1 scores of 0.88 and 0.87. For large datasets, deep learning outperformed classical methods such as Random Forest and Support Vector Machine. Availability and implementation The software and datasets are available at https://llp.berkeley.edu/micropheno. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Ehsaneddin Asgari
- Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, USA
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
| | - Kiavash Garakani
- Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, USA
| | - Alice C McHardy
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
| | - Mohammad R K Mofrad
- Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, USA
- Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Lab, Berkeley, CA, USA
| |
Collapse
|
34
|
A Survey of Data Mining and Deep Learning in Bioinformatics. J Med Syst 2018; 42:139. [DOI: 10.1007/s10916-018-1003-9] [Citation(s) in RCA: 81] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2018] [Accepted: 06/21/2018] [Indexed: 12/13/2022]
|
35
|
Fraser K, Bruckner DM, Dordick JS. Advancing Predictive Hepatotoxicity at the Intersection of Experimental, in Silico, and Artificial Intelligence Technologies. Chem Res Toxicol 2018; 31:412-430. [PMID: 29722533 DOI: 10.1021/acs.chemrestox.8b00054] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
Adverse drug reactions, particularly those that result in drug-induced liver injury (DILI), are a major cause of drug failure in clinical trials and drug withdrawals. Hepatotoxicity-mediated drug attrition occurs despite substantial investments of time and money in developing cellular assays, animal models, and computational models to predict its occurrence in humans. Underperformance in predicting hepatotoxicity associated with drugs and drug candidates has been attributed to existing gaps in our understanding of the mechanisms involved in driving hepatic injury after these compounds perfuse and are metabolized by the liver. Herein we assess in vitro, in vivo (animal), and in silico strategies used to develop predictive DILI models. We address the effectiveness of several two- and three-dimensional in vitro cellular methods that are frequently employed in hepatotoxicity screens and how they can be used to predict DILI in humans. We also explore how humanized animal models can recapitulate human drug metabolic profiles and associated liver injury. Finally, we highlight the maturation of computational methods for predicting hepatotoxicity, the untapped potential of artificial intelligence for improving in silico DILI screens, and how knowledge acquired from these predictions can shape the refinement of experimental methods.
Collapse
Affiliation(s)
- Keith Fraser
- Department of Chemical and Biological Engineering and Department of Biological Sciences Center for Biotechnology and Interdisciplinary Studies , Rensselaer Polytechnic Institute , Troy , New York 12180 , United States
| | - Dylan M Bruckner
- Department of Chemical and Biological Engineering and Department of Biological Sciences Center for Biotechnology and Interdisciplinary Studies , Rensselaer Polytechnic Institute , Troy , New York 12180 , United States
| | - Jonathan S Dordick
- Department of Chemical and Biological Engineering and Department of Biological Sciences Center for Biotechnology and Interdisciplinary Studies , Rensselaer Polytechnic Institute , Troy , New York 12180 , United States
| |
Collapse
|
36
|
Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow PM, Zietz M, Hoffman MM, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Lavender CA, Turaga SC, Alexandari AM, Lu Z, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Boca SM, Swamidass SJ, Huang A, Gitter A, Greene CS. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 2018; 15:20170387. [PMID: 29618526 PMCID: PMC5938574 DOI: 10.1098/rsif.2017.0387] [Citation(s) in RCA: 826] [Impact Index Per Article: 137.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2017] [Accepted: 03/07/2018] [Indexed: 11/12/2022] Open
Abstract
Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems-patient classification, fundamental biological processes and treatment of patients-and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.
Collapse
Affiliation(s)
- Travers Ching
- Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, USA
| | - Daniel S Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Brett K Beaulieu-Jones
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Alexandr A Kalinin
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | | | - Gregory P Way
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Enrico Ferrero
- Computational Biology and Stats, Target Sciences, GlaxoSmithKline, Stevenage, UK
| | | | - Michael Zietz
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Michael M Hoffman
- Princess Margaret Cancer Centre, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
| | - Wei Xie
- Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
| | - Gail L Rosen
- Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
| | - Benjamin J Lengerich
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Johnny Israeli
- Biophysics Program, Stanford University, Stanford, CA, USA
| | - Jack Lanchantin
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
| | - Stephen Woloszynek
- Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
| | - Anne E Carpenter
- Imaging Platform, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, USA
| | - Evan M Cofer
- Department of Computer Science, Trinity University, San Antonio, TX, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
| | - Christopher A Lavender
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
| | - Srinivas C Turaga
- Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA
| | - Amr M Alexandari
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Zhiyong Lu
- National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - David J Harris
- Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA
| | | | - Yanjun Qi
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
| | - Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Yifan Peng
- National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Laura K Wiley
- Division of Biomedical Informatics and Personalized Medicine, University of Colorado School of Medicine, Aurora, CO, USA
| | - Marwin H S Segler
- Institute of Organic Chemistry, Westfälische Wilhelms-Universität Münster, Münster, Germany
| | - Simina M Boca
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
| | - S Joshua Swamidass
- Department of Pathology and Immunology, Washington University in Saint Louis, St Louis, MO, USA
| | - Austin Huang
- Department of Medicine, Brown University, Providence, RI, USA
| | - Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Morgridge Institute for Research, Madison, WI, USA
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| |
Collapse
|
37
|
Fioravanti D, Giarratano Y, Maggio V, Agostinelli C, Chierici M, Jurman G, Furlanello C. Phylogenetic convolutional neural networks in metagenomics. BMC Bioinformatics 2018; 19:49. [PMID: 29536822 PMCID: PMC5850953 DOI: 10.1186/s12859-018-2033-5] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Convolutional Neural Networks can be effectively used only when data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case of pixels in images. We introduce here Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on the Convolutional Neural Networks, with the patristic distance defined on the phylogenetic tree being used as the proximity measure. The patristic distance between variables is used together with a sparsified version of MultiDimensional Scaling to embed the phylogenetic tree in a Euclidean space. RESULTS Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota of 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided in 6 subclasses. Classification performance is promising when compared to classical algorithms like Support Vector Machines and Random Forest and a baseline fully connected neural network, e.g. the Multi-Layer Perceptron. CONCLUSION Ph-CNN represents a novel deep learning approach for the classification of metagenomics data. Operatively, the algorithm has been implemented as a custom Keras layer taking care of passing to the following convolutional layer not only the data but also the ranked list of neighbourhood of each sample, thus mimicking the case of image data, transparently to the user.
Collapse
Affiliation(s)
- Diego Fioravanti
- Fondazione Bruno Kessler (FBK), Via Sommarive 18 Povo, Trento, I-38123 Italy
- Max Planck Institute for Intelligent Systems, Spemannstraße 34, Tübingen, 72076 Germany
| | - Ylenia Giarratano
- Centre for Medical Informatics, Usher Institute, University of Edinburgh, 9 Little France Road, Edinburgh, EH16 4UX UK
| | - Valerio Maggio
- Fondazione Bruno Kessler (FBK), Via Sommarive 18 Povo, Trento, I-38123 Italy
| | - Claudio Agostinelli
- Department of Mathematics, University of Trento, Via Sommarive 14 Povo, Trento, I-38123 Italy
| | - Marco Chierici
- Fondazione Bruno Kessler (FBK), Via Sommarive 18 Povo, Trento, I-38123 Italy
| | - Giuseppe Jurman
- Fondazione Bruno Kessler (FBK), Via Sommarive 18 Povo, Trento, I-38123 Italy
| | - Cesare Furlanello
- Fondazione Bruno Kessler (FBK), Via Sommarive 18 Povo, Trento, I-38123 Italy
| |
Collapse
|
38
|
Metwally AA, Yang J, Ascoli C, Dai Y, Finn PW, Perkins DL. MetaLonDA: a flexible R package for identifying time intervals of differentially abundant features in metagenomic longitudinal studies. MICROBIOME 2018; 6:32. [PMID: 29439731 PMCID: PMC5812052 DOI: 10.1186/s40168-018-0402-y] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/02/2017] [Accepted: 01/12/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND Microbial longitudinal studies are powerful experimental designs utilized to classify diseases, determine prognosis, and analyze microbial systems dynamics. In longitudinal studies, only identifying differential features between two phenotypes does not provide sufficient information to determine whether a change in the relative abundance is short-term or continuous. Furthermore, sample collection in longitudinal studies suffers from all forms of variability such as a different number of subjects per phenotypic group, a different number of samples per subject, and samples not collected at consistent time points. These inconsistencies are common in studies that collect samples from human subjects. RESULTS We present MetaLonDA, an R package that is capable of identifying significant time intervals of differentially abundant microbial features. MetaLonDA is flexible such that it can perform differential abundance tests despite inconsistencies associated with sample collection. Extensive experiments on simulated datasets quantitatively demonstrate the effectiveness of MetaLonDA with significant improvement over alternative methods. We applied MetaLonDA to the DIABIMMUNE cohort ( https://pubs.broadinstitute.org/diabimmune ) substantiating significant early lifetime intervals of exposure to Bacteroides and Bifidobacterium in Finnish and Russian infants. Additionally, we established significant time intervals during which novel differentially relative abundant microbial genera may contribute to aberrant immunogenicity and development of autoimmune disease. CONCLUSION MetaLonDA is computationally efficient and can be run on desktop machines. The identified differentially abundant features and their time intervals have the potential to distinguish microbial biomarkers that may be used for microbial reconstitution through bacteriotherapy, probiotics, or antibiotics. Moreover, MetaLonDA can be applied to any longitudinal count data such as metagenomic sequencing, 16S rRNA gene sequencing, or RNAseq. MetaLonDA is publicly available on CRAN ( https://CRAN.R-project.org/package=MetaLonDA ).
Collapse
Affiliation(s)
- Ahmed A. Metwally
- Department of Bioengineering, University of Illinois at Chicago, Chicago, 60607 IL USA
- Department of Medicine, University of Illinois at Chicago, Chicago, 60612 IL USA
- Department of Computer Science, University of Illinois at Chicago, Chicago, 60607 IL USA
| | - Jie Yang
- Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, 60607 IL USA
| | - Christian Ascoli
- Department of Medicine, University of Illinois at Chicago, Chicago, 60612 IL USA
| | - Yang Dai
- Department of Bioengineering, University of Illinois at Chicago, Chicago, 60607 IL USA
| | - Patricia W. Finn
- Department of Medicine, University of Illinois at Chicago, Chicago, 60612 IL USA
- Department of Microbiology and Immunology, University of Illinois at Chicago, Chicago, 60612 IL USA
| | - David L. Perkins
- Department of Bioengineering, University of Illinois at Chicago, Chicago, 60607 IL USA
- Department of Medicine, University of Illinois at Chicago, Chicago, 60612 IL USA
- Department of Surgery, University of Illinois at Chicago, Chicago, 60612 IL USA
| |
Collapse
|
39
|
Gene Prediction in Metagenomic Fragments with Deep Learning. BIOMED RESEARCH INTERNATIONAL 2017; 2017:4740354. [PMID: 29250541 PMCID: PMC5698827 DOI: 10.1155/2017/4740354] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 06/30/2017] [Accepted: 10/08/2017] [Indexed: 01/14/2023]
Abstract
Next generation sequencing technologies used in metagenomics yield numerous sequencing fragments which come from thousands of different species. Accurately identifying genes from metagenomics fragments is one of the most fundamental issues in metagenomics. In this article, by fusing multifeatures (i.e., monocodon usage, monoamino acid usage, ORF length coverage, and Z-curve features) and using deep stacking networks learning model, we present a novel method (called Meta-MFDL) to predict the metagenomic genes. The results with 10 CV and independent tests show that Meta-MFDL is a powerful tool for identifying genes from metagenomic fragments.
Collapse
|
40
|
Zhang B, Zhao J, Chen X, Wu J. ECG data compression using a neural network model based on multi-objective optimization. PLoS One 2017; 12:e0182500. [PMID: 28972986 PMCID: PMC5626036 DOI: 10.1371/journal.pone.0182500] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2017] [Accepted: 07/19/2017] [Indexed: 12/05/2022] Open
Abstract
Electrocardiogram (ECG) data analysis is of great significance to the diagnosis of cardiovascular disease. ECG compression should be processed in real time, and the data should be based on lossless compression and have high predictability. In terms of the real time aspect, short-time Fourier transformation is applied to the processing of signal wave for reducing computational time. For the lossless compression requirement, wavelet-transformation that is a coding algorithm can be used to avoid loss of data. In practice, compression is required to avoid storing redundant recording data that are not useful in the diagnosis platform. The obtained data can be preprocessed to remove noise by using wavelet transform, and then a multi-objective optimize neural network model is used to extract feature information. Compared with the existing traditional methods such as direct data processing method and transform method, our proposed compression model has self-learning ability to achieve high data compression ratio at 1:19 without losing important ECG information and compromising quality. Upon testing, we demonstrated that the proposed ECG data compression method based on multi-objective optimization neural network is effective and efficient in clinical practice.
Collapse
Affiliation(s)
- Bo Zhang
- Department of Ultrasound in Medicine, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
| | - Jiasheng Zhao
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Xiao Chen
- School of Computing and Communications, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, Australia
| | - Jianhuang Wu
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| |
Collapse
|
41
|
Reiman D, Metwally A. Using convolutional neural networks to explore the microbiome. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2017; 2017:4269-4272. [PMID: 29060840 DOI: 10.1109/embc.2017.8037799] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
The microbiome has been shown to have an impact on the development of various diseases in the host. Being able to make an accurate prediction of the phenotype of a genomic sample based on its microbial taxonomic abundance profile is an important problem for personalized medicine. In this paper, we examine the potential of using a deep learning framework, a convolutional neural network (CNN), for such a prediction. To facilitate the CNN learning, we explore the structure of abundance profiles by creating the phylogenetic tree and by designing a scheme to embed the tree to a matrix that retains the spatial relationship of nodes in the tree and their quantitative characteristics. The proposed CNN framework is highly accurate, achieving a 99.47% of accuracy based on the evaluation on a dataset 1967 samples of three phenotypes. Our result demonstrated the feasibility and promising aspect of CNN in the classification of sample phenotype.
Collapse
|
42
|
Vakanski A, Ferguson JM, Lee S. Metrics for Performance Evaluation of Patient Exercises during Physical Therapy. INTERNATIONAL JOURNAL OF PHYSICAL MEDICINE & REHABILITATION 2017; 5:403. [PMID: 28752104 PMCID: PMC5526359 DOI: 10.4172/2329-9096.1000403] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
OBJECTIVE The article proposes a set of metrics for evaluation of patient performance in physical therapy exercises. METHODS Taxonomy is employed that classifies the metrics into quantitative and qualitative categories, based on the level of abstraction of the captured motion sequences. Further, the quantitative metrics are classified into model-less and model-based metrics, in reference to whether the evaluation employs the raw measurements of patient performed motions, or whether the evaluation is based on a mathematical model of the motions. The reviewed metrics include root-mean square distance, Kullback Leibler divergence, log-likelihood, heuristic consistency, Fugl-Meyer Assessment, and similar. RESULTS The metrics are evaluated for a set of five human motions captured with a Kinect sensor. CONCLUSION The metrics can potentially be integrated into a system that employs machine learning for modelling and assessment of the consistency of patient performance in home-based therapy setting. Automated performance evaluation can overcome the inherent subjectivity in human performed therapy assessment, and it can increase the adherence to prescribed therapy plans, and reduce healthcare costs.
Collapse
Affiliation(s)
| | - Jake M. Ferguson
- Center for Modeling Complex Interactions, University of Idaho, Moscow, ID, USA
| | - Stephen Lee
- Department of Statistical Science, University of Idaho, Moscow, ID, USA
| |
Collapse
|
43
|
Ma X, Cheng Y, Hao S. Multi-stage classification method oriented to aerial image based on low-rank recovery and multi-feature fusion sparse representation. APPLIED OPTICS 2016; 55:10038-10044. [PMID: 27958408 DOI: 10.1364/ao.55.010038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Automatic classification of terrain surfaces from an aerial image is essential for an autonomous unmanned aerial vehicle (UAV) landing at an unprepared site by using vision. Diverse terrain surfaces may show similar spectral properties due to the illumination and noise that easily cause poor classification performance. To address this issue, a multi-stage classification algorithm based on low-rank recovery and multi-feature fusion sparse representation is proposed. First, color moments and Gabor texture feature are extracted from training data and stacked as column vectors of a dictionary. Then we perform low-rank matrix recovery for the dictionary by using augmented Lagrange multipliers and construct a multi-stage terrain classifier. Experimental results on an aerial map database that we prepared verify the classification accuracy and robustness of the proposed method.
Collapse
|
44
|
Scalable metagenomics alignment research tool (SMART): a scalable, rapid, and complete search heuristic for the classification of metagenomic sequences from complex sequence populations. BMC Bioinformatics 2016; 17:292. [PMID: 27465705 PMCID: PMC4963998 DOI: 10.1186/s12859-016-1159-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2015] [Accepted: 07/21/2016] [Indexed: 11/24/2022] Open
Abstract
Background Next generation sequencing technology has enabled characterization of metagenomics through massively parallel genomic DNA sequencing. The complexity and diversity of environmental samples such as the human gut microflora, combined with the sustained exponential growth in sequencing capacity, has led to the challenge of identifying microbial organisms by DNA sequence. We sought to validate a Scalable Metagenomics Alignment Research Tool (SMART), a novel searching heuristic for shotgun metagenomics sequencing results. Results After retrieving all genomic DNA sequences from the NCBI GenBank, over 1 × 1011 base pairs of 3.3 × 106 sequences from 9.25 × 105 species were indexed using 4 base pair hashtable shards. A MapReduce searching strategy was used to distribute the search workload in a computing cluster environment. In addition, a one base pair permutation algorithm was used to account for single nucleotide polymorphisms and sequencing errors. Simulated datasets used to evaluate Kraken, a similar metagenomics classification tool, were used to measure and compare precision and accuracy. Finally using a same set of training sequences we compared Kraken, CLARK, and SMART within the same computing environment. Utilizing 12 computational nodes, we completed the classification of all datasets in under 10 min each using exact matching with an average throughput of over 1.95 × 106 reads classified per minute. With permutation matching, we achieved sensitivity greater than 83 % and precision greater than 94 % with simulated datasets at the species classification level. We demonstrated the application of this technique applied to conjunctival and gut microbiome metagenomics sequencing results. In our head to head comparison, SMART and CLARK had similar accuracy gains over Kraken at the species classification level, but SMART required approximately half the amount of RAM of CLARK. Conclusions SMART is the first scalable, efficient, and rapid metagenomics classification algorithm capable of matching against all the species and sequences present in the NCBI GenBank and allows for a single step classification of microorganisms as well as large plant, mammalian, or invertebrate genomes from which the metagenomic sample may have been derived.
Collapse
|
45
|
Abstract
Increases in throughput and installed base of biomedical research equipment led to a massive accumulation of -omics data known to be highly variable, high-dimensional, and sourced from multiple often incompatible data platforms. While this data may be useful for biomarker identification and drug discovery, the bulk of it remains underutilized. Deep neural networks (DNNs) are efficient algorithms based on the use of compositional layers of neurons, with advantages well matched to the challenges -omics data presents. While achieving state-of-the-art results and even surpassing human accuracy in many challenging tasks, the adoption of deep learning in biomedicine has been comparatively slow. Here, we discuss key features of deep learning that may give this approach an edge over other machine learning methods. We then consider limitations and review a number of applications of deep learning in biomedical studies demonstrating proof of concept and practical utility.
Collapse
Affiliation(s)
- Polina Mamoshina
- Artificial Intelligence Research, Insilico Medicine, Inc, ETC, Johns Hopkins University , Baltimore, Maryland 21218, United States
| | - Armando Vieira
- RedZebra Analytics , 1 Quality Court, London, WC2A 1HR, U.K
| | - Evgeny Putin
- Artificial Intelligence Research, Insilico Medicine, Inc, ETC, Johns Hopkins University , Baltimore, Maryland 21218, United States
| | - Alex Zhavoronkov
- Artificial Intelligence Research, Insilico Medicine, Inc, ETC, Johns Hopkins University , Baltimore, Maryland 21218, United States
| |
Collapse
|