1
Guo M, Ye X, Huang D, Sakurai T. Robust feature learning using contractive autoencoders for multi-omics clustering in cancer subtyping. Methods 2025; 233:52-60. [PMID: 39577512 DOI: 10.1016/j.ymeth.2024.11.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2024] [Revised: 10/04/2024] [Accepted: 11/18/2024] [Indexed: 11/24/2024]
Abstract
Cancer can manifest in virtually any tissue or organ, necessitating precise subtyping of cancer patients to enhance diagnosis, treatment, and prognosis. With the accumulation of vast amounts of omics data, numerous studies have focused on integrating multi-omics data for cancer subtyping using clustering techniques. However, due to the heterogeneity of different omics data, extracting important features to effectively integrate these data for accurate clustering analysis remains a significant challenge. This study proposes a new multi-omics clustering framework for cancer subtyping, which utilizes a contractive autoencoder to extract robust features. By encouraging the learned representation to be less sensitive to small changes in the input, the contractive autoencoder learns robust feature representations from different omics. To incorporate survival information into the clustering analysis, Cox proportional hazards regression is used to further select the key features significantly associated with survival for integration. Finally, we apply K-means clustering to the integrated features to obtain the clustering result. The proposed framework is evaluated on ten different cancer datasets across four levels of omics data and compared to other existing methods. The experimental results indicate that the proposed framework effectively integrates the four omics datasets and outperforms other methods, achieving higher C-index scores and showing more significant differences between survival curves. Additionally, differential gene analysis and pathway enrichment analysis are performed to further demonstrate the effectiveness of the proposed framework.
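The "less sensitive to small changes" idea in this abstract is usually expressed as a penalty on the encoder's Jacobian. The following is a minimal sketch of that penalty for a single sigmoid layer, not the authors' implementation: the layer shape, the weights `W`, the bias `b`, and the sample `x` are all made-up illustrative values, and the function names are hypothetical. It computes the squared Frobenius norm of the Jacobian in closed form and checks it against a finite-difference estimate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(W, b, x):
    """Hidden activations h = sigmoid(W x + b) for one sample x."""
    return [sigmoid(sum(w * x_i for w, x_i in zip(row, x)) + b_j)
            for row, b_j in zip(W, b)]


def contractive_penalty(W, b, x):
    """Squared Frobenius norm of the encoder Jacobian dh/dx at x.

    For a sigmoid layer, dh_j/dx_i = h_j * (1 - h_j) * W[j][i], so the norm
    factorises into a per-unit gain times the squared row norm of W.
    """
    h = encode(W, b, x)
    return sum((h_j * (1.0 - h_j)) ** 2 * sum(w * w for w in row)
               for h_j, row in zip(h, W))


def numerical_penalty(W, b, x, eps=1e-6):
    """Central finite-difference estimate of the same quantity."""
    total = 0.0
    for i in range(len(x)):
        x_hi = list(x)
        x_lo = list(x)
        x_hi[i] += eps
        x_lo[i] -= eps
        h_hi, h_lo = encode(W, b, x_hi), encode(W, b, x_lo)
        total += sum(((a - c) / (2 * eps)) ** 2 for a, c in zip(h_hi, h_lo))
    return total


# Illustrative values only (assumptions, not taken from the paper).
W = [[0.5, -0.3, 0.8], [0.1, 0.4, -0.6]]
b = [0.0, 0.1]
x = [1.0, 2.0, -1.0]

analytic = contractive_penalty(W, b, x)
numeric = numerical_penalty(W, b, x)
```

In a training loop this term would typically be added to the reconstruction loss with a weighting coefficient; how the paper weights it, and which architecture it uses, is not stated in the abstract.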
Affiliation(s)
- Mengke Guo
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Dong Huang
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Tetsuya Sakurai
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
2
Bao X, Li Q, Chen D, Dai X, Liu C, Tian W, Zhang H, Jin Y, Wang Y, Cheng J, Lai C, Ye C, Xin S, Li X, Su G, Ding Y, Xiong Y, Xie J, Tano V, Wang Y, Fu W, Deng S, Fang W, Sheng J, Ruan J, Zhao P. A multiomics analysis-assisted deep learning model identifies a macrophage-oriented module as a potential therapeutic target in colorectal cancer. Cell Rep Med 2024; 5:101399. [PMID: 38307032 PMCID: PMC10897549 DOI: 10.1016/j.xcrm.2024.101399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 01/02/2024] [Accepted: 01/08/2024] [Indexed: 02/04/2024]
Abstract
Colorectal cancer (CRC) is a common malignancy involving multiple cellular components. The CRC tumor microenvironment (TME) has been characterized well at single-cell resolution. However, a spatial interaction map of the CRC TME is still elusive. Here, we integrate multiomics analyses and establish a spatial interaction map to improve the prognosis, prediction, and therapeutic development for CRC. We construct a CRC immune module (CCIM) that comprises FOLR2+ macrophages, exhausted CD8+ T cells, tolerant CD8+ T cells, exhausted CD4+ T cells, and regulatory T cells. Multiplex immunohistochemistry is performed to depict the CCIM. Based on this, we utilize advanced deep learning technology to establish a spatial interaction map and predict chemotherapy response. CCIM-Net is constructed, which demonstrates good predictive performance for chemotherapy response in both the training and testing cohorts. Lastly, FOLR2+ macrophage-targeting therapeutics are used to disrupt the immunosuppressive CCIM and enhance the chemotherapy response in vivo.
Affiliation(s)
- Xuanwen Bao
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
- Qiong Li
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
- Dong Chen
  - Department of Colorectal Surgery, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
- Xiaomeng Dai
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
- Chuan Liu
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
- Weihong Tian
  - Department of Immunology, School of Medicine, Jiangsu University, Zhenjiang, Jiangsu 212013, China
- Hangyu Zhang
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
- Yuzhi Jin
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
- Yin Wang
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
- Jinlin Cheng
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
- Chunyu Lai
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
- Chanqi Ye
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
- Shan Xin
  - Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
- Xin Li
  - Department of Chronic Inflammation and Cancer, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Ge Su
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
- Yongfeng Ding
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
- Yangyang Xiong
  - Department of Gastroenterology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
- Jindong Xie
  - Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou 510060, China
- Vincent Tano
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 637551, Republic of Singapore
- Yanfang Wang
  - Ludwig-Maximilians-Universität München (LMU), 80539 Munich, Germany
- Wenguang Fu
  - Department of Hepatobiliary Surgery, The Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan Province 646000, China
- Shuiguang Deng
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
- Weijia Fang
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
- Jianpeng Sheng
  - Zhejiang Provincial Key Laboratory of Pancreatic Disease, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
- Jian Ruan
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province 310003, China; Department of Hepatobiliary Surgery, The Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan Province 646000, China
- Peng Zhao
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province 310003, China
3
Štancl P, Karlić R. Machine learning for pan-cancer classification based on RNA sequencing data. Front Mol Biosci 2023; 10:1285795. [PMID: 38028533 PMCID: PMC10667476 DOI: 10.3389/fmolb.2023.1285795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 10/30/2023] [Indexed: 12/01/2023]
Abstract
Despite recent improvements in cancer diagnostics, 2%-5% of all malignancies are still cancers of unknown primary (CUP), for which the tissue-of-origin (TOO) cannot be determined at the time of presentation. Since the primary site of cancer leads to the choice of optimal treatment, CUP patients pose a significant clinical challenge with limited treatment options. Data produced by large-scale cancer genomics initiatives, which aim to determine the genomic, epigenomic, and transcriptomic characteristics of a large number of individual patients of multiple cancer types, have led to the introduction of various methods that use machine learning to predict the TOO of cancer patients. In this review, we assess the reproducibility, interpretability, and robustness of results obtained by 20 recent studies that utilize different machine learning methods for TOO prediction based on RNA sequencing data, including their reported performance on independent data sets and identification of important features. Our review investigates the strengths and weaknesses of different methods, checks the correspondence of their results, and identifies potential issues with datasets used for model training and testing, assessing their potential usefulness in a clinical setting and suggesting future improvements.
Affiliation(s)
- Rosa Karlić
  - Bioinformatics Group, Division of Molecular Biology, Department of Biology, Faculty of Science, University of Zagreb, Zagreb, Croatia
4
Alatrany AS, Khan W, Hussain AJ, Mustafina J, Al-Jumeily D. Transfer Learning for Classification of Alzheimer's Disease Based on Genome Wide Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:2700-2711. [PMID: 37018274 DOI: 10.1109/tcbb.2022.3233869] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/19/2023]
Abstract
Alzheimer's disease (AD) is a brain disorder regarded as a degenerative disease because its symptoms worsen over time. Single nucleotide polymorphisms (SNPs) have been identified as relevant biomarkers for this condition. This study aims to identify SNP biomarkers associated with AD in order to perform a reliable classification of AD. In contrast to existing related works, we utilize deep transfer learning with varying experimental analysis for reliable classification of AD. For this purpose, a convolutional neural network (CNN) is first trained on the genome-wide association studies (GWAS) dataset requested from the AD neuroimaging initiative. We then employ deep transfer learning to further train our CNN (as base model) on a different AD GWAS dataset, to extract the final set of features. The extracted features are then fed into a support vector machine for classification of AD. Detailed experiments are performed using multiple datasets and varying experimental configurations. The statistical outcomes indicate an accuracy of 89%, which is a significant improvement when benchmarked against existing related works.
5
Mahdi-Esferizi R, Haji Molla Hoseyni B, Mehrpanah A, Golzade Y, Najafi A, Elahian F, Zadeh Shirazi A, Gomez GA, Tahmasebian S. DeeP4med: deep learning for P4 medicine to predict normal and cancer transcriptome in multiple human tissues. BMC Bioinformatics 2023; 24:275. [PMID: 37403016 DOI: 10.1186/s12859-023-05400-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2023] [Accepted: 06/25/2023] [Indexed: 07/06/2023]
Abstract
BACKGROUND: P4 medicine (predict, prevent, personalize, and participate) is a new approach to diagnosing and predicting diseases on a patient-by-patient basis. For the prevention and treatment of diseases, prediction plays a fundamental role. One of the intelligent strategies is the design of deep learning models that can predict the state of the disease using gene expression data. RESULTS: We create an autoencoder deep learning model called DeeP4med, including a Classifier and a Transferor, that predicts cancer's gene expression (mRNA) matrix from its matched normal sample and vice versa. Depending on tissue type, the F1 score of the model ranges from 0.935 to 0.999 in the Classifier and from 0.944 to 0.999 in the Transferor. The accuracy of DeeP4med for tissue and disease classification was 0.986 and 0.992, respectively, better than seven classic machine learning models (Support Vector Classifier, Logistic Regression, Linear Discriminant Analysis, Naive Bayes, Decision Tree, Random Forest, K Nearest Neighbors). CONCLUSIONS: Based on the idea of DeeP4med, given the gene expression matrix of a normal tissue, we can predict its tumor gene expression matrix and, in this way, find the genes that are effective in transforming a normal tissue into a tumor tissue. Results of differentially expressed gene (DEG) and enrichment analysis on the predicted matrices for 13 types of cancer showed a good correlation with the literature and biological databases. This suggests that, when trained on gene expression matrices with each person's features in the normal and cancer states, the model could predict diagnosis from gene expression data of healthy tissue and be used to identify possible therapeutic interventions for those patients.
Affiliation(s)
- Roohallah Mahdi-Esferizi
  - Department of Medical Biotechnology, School of Advanced Technologies, Shahrekord University of Medical Sciences, Shahrekord, Iran
- Amir Mehrpanah
  - Faculty of Mathematics, Shahid Beheshti University, Tehran, Iran
- Yazdan Golzade
  - Department of Mathematics, Faculty of Basic Sciences, Iran University of Science and Technology (IUST), Tehran, Iran
- Ali Najafi
  - Molecular Biology Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Fatemeh Elahian
  - Department of Medical Biotechnology, School of Advanced Technologies, Shahrekord University of Medical Sciences, Shahrekord, Iran
- Amin Zadeh Shirazi
  - Centre for Cancer Biology, SA Pathology and University of South Australia, Adelaide, SA, 5000, Australia
- Guillermo A Gomez
  - Centre for Cancer Biology, SA Pathology and University of South Australia, Adelaide, SA, 5000, Australia
- Shahram Tahmasebian
  - Cellular and Molecular Research Center, Basic Health Sciences Institute, Shahrekord University of Medical Sciences, Shahrekord, Iran
6
Krieger KL, Mann EK, Lee KJ, Bolterstein E, Jebakumar D, Ittmann MM, Dal Zotto VL, Shaban M, Sreekumar A, Gassman NR. Spatial mapping of the DNA adducts in cancer. DNA Repair (Amst) 2023; 128:103529. [PMID: 37390674 PMCID: PMC10330576 DOI: 10.1016/j.dnarep.2023.103529] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/30/2023] [Revised: 06/19/2023] [Accepted: 06/21/2023] [Indexed: 07/02/2023]
Abstract
DNA adducts and strand breaks are induced by various exogenous and endogenous agents. Accumulation of DNA damage is implicated in many disease processes, including cancer, aging, and neurodegeneration. The continuous acquisition of DNA damage from exogenous and endogenous stressors coupled with defects in DNA repair pathways contribute to the accumulation of DNA damage within the genome and genomic instability. While mutational burden offers some insight into the level of DNA damage a cell may have experienced and subsequently repaired, it does not quantify DNA adducts and strand breaks. Mutational burden also infers the identity of the DNA damage. With advances in DNA adduct detection and quantification methods, there is an opportunity to identify DNA adducts driving mutagenesis and correlate with a known exposome. However, most DNA adduct detection methods require isolation or separation of the DNA and its adducts from the context of the nuclei. Mass spectrometry, comet assays, and other techniques precisely quantify lesion types but lose the nuclear context and even tissue context of the DNA damage. The growth in spatial analysis technologies offers a novel opportunity to leverage DNA damage detection with nuclear and tissue context. However, we lack a wealth of techniques capable of detecting DNA damage in situ. Here, we review the limited existing in situ DNA damage detection methods and examine their potential to offer spatial analysis of DNA adducts in tumors or other tissues. We also offer a perspective on the need for spatial analysis of DNA damage in situ and highlight Repair Assisted Damage Detection (RADD) as an in situ DNA adduct technique with the potential to integrate with spatial analysis and the challenges to be addressed.
Affiliation(s)
- Kimiko L Krieger
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA; Center for Translational Metabolism and Health Disparities (C-TMH), Baylor College of Medicine, Houston, TX 77030, USA
- Elise K Mann
  - Department of Physiology and Cell Biology, College of Medicine, University of South Alabama, Mobile, AL 36688, USA; Mitchell Cancer Institute, University of South Alabama, Mobile, AL 36604, USA
- Kevin J Lee
  - Department of Physiology and Cell Biology, College of Medicine, University of South Alabama, Mobile, AL 36688, USA; Mitchell Cancer Institute, University of South Alabama, Mobile, AL 36604, USA
- Elyse Bolterstein
  - Department of Biology, Northeastern Illinois University, Chicago, IL 60625, USA
- Deborah Jebakumar
  - Department of Anatomic Pathology, Baylor Scott & White Medical Center, Temple, TX 76508, USA; Texas A&M College of Medicine, Temple, TX 76508, USA
- Michael M Ittmann
  - Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX 77030, USA; Human Tissue Acquisition & Pathology Shared Resource, Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
- Valeria L Dal Zotto
  - Department of Pathology, College of Medicine, University of South Alabama, Mobile, AL 36688, USA
- Mohamed Shaban
  - Department of Electrical and Computer Engineering, University of South Alabama, Mobile, AL 36688, USA
- Arun Sreekumar
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA; Center for Translational Metabolism and Health Disparities (C-TMH), Baylor College of Medicine, Houston, TX 77030, USA; Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Natalie R Gassman
  - Department of Pharmacology and Toxicology, University of Alabama at Birmingham, Birmingham, AL 35294, USA
7
Heil BJ, Crawford J, Greene CS. The effect of non-linear signal in classification problems using gene expression. PLoS Comput Biol 2023; 19:e1010984. [PMID: 36972227 PMCID: PMC10079219 DOI: 10.1371/journal.pcbi.1010984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 04/06/2023] [Accepted: 02/28/2023] [Indexed: 03/29/2023]
Abstract
Those building predictive models from transcriptomic data are faced with two conflicting perspectives. The first, based on the inherent high dimensionality of biological systems, supposes that complex non-linear models such as neural networks will better match complex biological systems. The second, imagining that complex systems will still be well predicted by simple dividing lines, prefers linear models that are easier to interpret. We compare multi-layer neural networks and logistic regression across multiple prediction tasks on GTEx and Recount3 datasets and find evidence in favor of both possibilities. We verified the presence of non-linear signal when predicting tissue and metadata sex labels from expression data by removing the predictive linear signal with Limma, and showed that the removal ablated the performance of linear methods but not non-linear ones. However, we also found that the presence of non-linear signal was not necessarily sufficient for neural networks to outperform logistic regression. Our results demonstrate that while multi-layer neural networks may be useful for making predictions from gene expression data, including a linear baseline model is critical because while biological systems are high-dimensional, effective dividing lines for predictive models may not be.
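The distinction between linear and non-linear signal drawn in this abstract can be illustrated with a toy example. The sketch below is not the authors' procedure and does not use Limma, GTEx, or Recount3; the four "expression" values and labels are invented for illustration. It shows a feature that has zero linear correlation with the label, so that any single cut-off does poorly, while a simple non-linear rule on the same feature separates the classes.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def best_threshold_accuracy(xs, ys):
    """Best accuracy of any rule 'predict 1 on one side of a single cut-off'."""
    cuts = sorted(xs)
    candidates = ([cuts[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(cuts, cuts[1:])]
                  + [cuts[-1] + 1.0])
    best = 0.0
    for t in candidates:
        acc = sum((x > t) == bool(y) for x, y in zip(xs, ys)) / len(xs)
        # 1 - acc is the accuracy of the same cut-off with the sides swapped.
        best = max(best, acc, 1.0 - acc)
    return best


def magnitude_rule_accuracy(xs, ys, cutoff=1.5):
    """Accuracy of the non-linear rule 'predict 1 when |x| exceeds cutoff'."""
    return sum((abs(x) > cutoff) == bool(y) for x, y in zip(xs, ys)) / len(xs)


# Invented toy data: the label depends on the magnitude of x, not its sign.
expression = [-2.0, -1.0, 1.0, 2.0]
labels = [1, 0, 0, 1]

linear_corr = pearson(expression, labels)
linear_acc = best_threshold_accuracy(expression, labels)
nonlinear_acc = magnitude_rule_accuracy(expression, labels)
```

On real expression data the same comparison would be made with fitted models on held-out samples, which is why the abstract stresses reporting a linear baseline alongside any neural network.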
Affiliation(s)
- Benjamin J. Heil
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Pennsylvania, United States of America
- Jake Crawford
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Pennsylvania, United States of America
- Casey S. Greene
  - Department of Pharmacology, University of Colorado School of Medicine, Colorado, United States of America
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Colorado, United States of America
8
Bakrania A, Joshi N, Zhao X, Zheng G, Bhat M. Artificial intelligence in liver cancers: Decoding the impact of machine learning models in clinical diagnosis of primary liver cancers and liver cancer metastases. Pharmacol Res 2023; 189:106706. [PMID: 36813095 DOI: 10.1016/j.phrs.2023.106706] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 01/14/2023] [Revised: 02/17/2023] [Accepted: 02/19/2023] [Indexed: 02/22/2023]
Abstract
Liver cancers are the fourth leading cause of cancer-related mortality worldwide. In the past decade, breakthroughs in the field of artificial intelligence (AI) have inspired development of algorithms in the cancer setting. A growing body of recent studies has evaluated machine learning (ML) and deep learning (DL) algorithms for pre-screening, diagnosis and management of liver cancer patients through diagnostic image analysis, biomarker discovery and prediction of personalized clinical outcomes. Despite the promise of these early AI tools, there is a significant need to explain the 'black box' of AI and work towards deployment to enable ultimate clinical translatability. Certain emerging fields such as RNA nanomedicine for targeted liver cancer therapy may also benefit from application of AI, specifically in nano-formulation research and development, given that they are still largely reliant on lengthy trial-and-error experiments. In this paper, we put forward the current landscape of AI in liver cancers along with the challenges of AI in liver cancer diagnosis and management. Finally, we discuss the future perspectives of AI application in liver cancer and how a multidisciplinary approach using AI in nanomedicine could accelerate the transition of personalized liver cancer medicine from bench side to the clinic.
Affiliation(s)
- Anita Bakrania
  - Toronto General Hospital Research Institute, Toronto, ON, Canada; Ajmera Transplant Program, University Health Network, Toronto, ON, Canada; Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Xun Zhao
  - Toronto General Hospital Research Institute, Toronto, ON, Canada; Ajmera Transplant Program, University Health Network, Toronto, ON, Canada
- Gang Zheng
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada; Institute of Biomedical Engineering, University of Toronto, Toronto, ON, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Mamatha Bhat
  - Toronto General Hospital Research Institute, Toronto, ON, Canada; Ajmera Transplant Program, University Health Network, Toronto, ON, Canada; Division of Gastroenterology, Department of Medicine, University Health Network and University of Toronto, Toronto, ON, Canada; Department of Medical Sciences, Toronto, ON, Canada
9
A deep learning model to classify neoplastic state and tissue origin from transcriptomic data. Sci Rep 2022; 12:9669. [PMID: 35690622 PMCID: PMC9188604 DOI: 10.1038/s41598-022-13665-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/15/2021] [Accepted: 04/11/2022] [Indexed: 12/20/2022]
Abstract
Application of deep learning methods to transcriptomic data has the potential to enhance the accuracy and efficiency of tissue classification and cell state identification. Herein, we developed a multitask deep learning model for tissue classification combining publicly available whole transcriptomic (RNA-seq) datasets of non-neoplastic, neoplastic and peri-neoplastic tissue to classify disease state, tissue origin and neoplastic subclass. RNA-seq data from a total of 10,116 patient samples processed through a common pipeline were used for model training and validation. The model achieved 99% accuracy for disease state classification (ROC-AUC of 0.98) and 97% accuracy for tissue origin (ROC-AUC of 0.99). Moreover, the model achieved an accuracy of 92% (ROC-AUC 0.95) for neoplastic subclassification. This is the first multitask deep learning algorithm developed for tissue classification employing a uniform pipeline analysis of transcriptomic data with multiple tissue classifiers. This model serves as a framework for incorporating large transcriptomic datasets across conditions to facilitate clinical diagnosis and cell-based treatment strategies.
10
Go S, Wang Q, Wang B, Jiang Y, Bajalovic N, Loke DK. Continual Learning Electrical Conduction in Resistive‐Switching‐Memory Materials. Advanced Theory and Simulations 2022. [DOI: 10.1002/adts.202200226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Affiliation(s)
- Shao‐Xiang Go
  - Department of Science, Mathematics and Technology, Singapore University of Technology and Design, 487372 Singapore
- Qiang Wang
  - Department of Science, Mathematics and Technology, Singapore University of Technology and Design, 487372 Singapore
- Bo Wang
  - Department of Information Systems Technology and Design, Singapore University of Technology and Design, 487372 Singapore
- Yu Jiang
  - Department of Science, Mathematics and Technology, Singapore University of Technology and Design, 487372 Singapore
- Natasa Bajalovic
  - Department of Science, Mathematics and Technology, Singapore University of Technology and Design, 487372 Singapore
- Desmond K. Loke
  - Department of Science, Mathematics and Technology, Singapore University of Technology and Design, 487372 Singapore
11
Bhat GR, Sethi I, Rah B, Kumar R, Afroze D. Innovative in Silico Approaches for Characterization of Genes and Proteins. Front Genet 2022; 13:865182. [PMID: 35664302 PMCID: PMC9159363 DOI: 10.3389/fgene.2022.865182] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/29/2022] [Accepted: 04/11/2022] [Indexed: 11/13/2022]
Abstract
Bioinformatics is an amalgamation of biology, mathematics and computer science. It is a science that gathers information from biology at the molecular level and applies informatic techniques to that information in order to understand and organize the data in a useful manner. With the help of bioinformatics, the experimental data generated are stored in several databases available online, such as nucleotide databases, protein databases, GENBANK and others. The data stored in these databases are used as a reference for experimental evaluation and validation. To date, several online tools have been developed to analyze genomic, transcriptomic, proteomic, epigenomic and metabolomic data. Some of them include Human Splicing Finder (HSF), Exonic Splicing Enhancer Mutation taster, and others. A number of SNPs are observed in the non-coding, intronic regions and play a role in the regulation of genes, which may or may not directly affect protein expression. Many mutations are thought to influence the splicing mechanism by affecting the existing splice sites or creating new sites. HSF was developed to predict the effect of a mutation (SNP) on the splicing mechanism/signal. Thus, the tool is helpful in predicting the effect of mutations on splicing signals and can provide data for a better understanding of intronic mutations, which can be further validated experimentally. Additionally, rapid advancement in proteomics has steered researchers to organize the study of protein structure, function, relationships, and dynamics in space and time. The effective integration of all of these technological interventions will thus eventually lead to next-generation systems biology, which will provide valuable biological insights in research, diagnostics, therapeutics and the development of personalized medicine.
Affiliation(s)
- Gh. Rasool Bhat
  - Advanced Centre for Human Genetics, Sher-I-Kashmir Institute of Medical Sciences, Soura, India
- Itty Sethi
  - Institute of Human Genetics, University of Jammu, Jammu, India
- Bilal Rah
  - Advanced Centre for Human Genetics, Sher-I-Kashmir Institute of Medical Sciences, Soura, India
- Rakesh Kumar
  - School of Biotechnology, Shri Mata Vaishno Devi University, Katra, India
- Dil Afroze
  - Advanced Centre for Human Genetics, Sher-I-Kashmir Institute of Medical Sciences, Soura, India
12
Integration of Multimodal Data from Disparate Sources for Identifying Disease Subtypes. Biology 2022; 11:360. [PMID: 35336734 PMCID: PMC8945377 DOI: 10.3390/biology11030360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Revised: 02/17/2022] [Accepted: 02/23/2022] [Indexed: 11/17/2022]
Abstract
Simple Summary: The diagnostic and treatment strategies of cancer remain generally suboptimal, resulting in over-diagnosis or under-treatment. Though many attempts at optimizing treatment decisions by early prediction of disease progression have been undertaken, these efforts have yielded only modest success so far due to the heterogeneity of cancer with multifactorial etiology. Here, we propose a deep-learning based data integration model capable of predicting disease progression by integrating collective information available through multiple studies with different cohorts and heterogeneous data types. The results have shown that the proposed data integration pipeline is able to identify disease progression with higher accuracy and robustness compared to using a single cohort, by offering a more complete picture of the specific disease on patients with brain, blood, and pancreatic cancers.
Abstract: Studies over the past decade have generated a wealth of molecular data that can be leveraged to better understand cancer risk, progression, and outcomes. However, understanding the progression risk and differentiating long- and short-term survivors cannot be achieved by analyzing data from a single modality due to the heterogeneity of disease. Using a scientifically developed and tested deep-learning approach that leverages aggregate information collected from multiple repositories with multiple modalities (e.g., mRNA, DNA Methylation, miRNA) could lead to a more accurate and robust prediction of disease progression. Here, we propose an autoencoder based multimodal data fusion system, in which a fusion encoder flexibly integrates collective information available through multiple studies with partially coupled data. Our results on a fully controlled simulation-based study have shown that inferring the missing data through the proposed data fusion pipeline allows a predictor that is superior to other baseline predictors with missing modalities. Results have further shown that short- and long-term survivors of glioblastoma multiforme, acute myeloid leukemia, and pancreatic adenocarcinoma can be successfully differentiated with an AUC of 0.94, 0.75, and 0.96, respectively.
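For readers unfamiliar with the AUC values quoted in this abstract, the following is a minimal sketch of how an area under the ROC curve can be computed from predicted scores. It uses the pairwise (Mann-Whitney) formulation; the function name `roc_auc`, the label coding (1 for long-term survivor, 0 for short-term survivor), and the toy numbers are assumptions for illustration and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Estimate ROC AUC as the fraction of (positive, negative) pairs
    in which the positive sample receives the higher score.
    Ties between a positive and a negative score count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = long-term survivor, 0 = short-term survivor.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.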
13
Withnell E, Zhang X, Sun K, Guo Y. XOmiVAE: an interpretable deep learning model for cancer classification using high-dimensional omics data. Brief Bioinform 2021; 22:bbab315. [PMID: 34402865 PMCID: PMC8575033 DOI: 10.1093/bib/bbab315] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/26/2021] [Revised: 07/04/2021] [Accepted: 07/20/2021] [Indexed: 12/26/2022] Open
Abstract
The lack of explainability is one of the most prominent disadvantages of deep learning applications in omics. This 'black box' problem can undermine the credibility and limit the practical implementation of biomedical deep learning models. Here we present XOmiVAE, a variational autoencoder (VAE)-based interpretable deep learning model for cancer classification using high-dimensional omics data. XOmiVAE is capable of revealing the contribution of each gene and latent dimension for each classification prediction and the correlation between each gene and each latent dimension. It is also demonstrated that XOmiVAE can explain not only the supervised classification but also the unsupervised clustering results from the deep learning network. To the best of our knowledge, XOmiVAE is one of the first activation level-based interpretable deep learning models explaining novel clusters generated by VAE. The explainable results generated by XOmiVAE were validated by both the performance of downstream tasks and the biomedical knowledge. In our experiments, XOmiVAE explanations of deep learning-based cancer classification and clustering aligned with current domain knowledge including biological annotation and academic literature, which shows great potential for novel biomedical knowledge discovery from deep learning models.
Affiliation(s)
- Eloise Withnell
  - Data Science Institute, Imperial College London, SW7 2AZ London, UK
  - Department of Health Informatics, University College London, WC1E 6BT London, UK
- Xiaoyu Zhang
  - Data Science Institute, Imperial College London, SW7 2AZ London, UK
- Kai Sun
  - Data Science Institute, Imperial College London, SW7 2AZ London, UK
- Yike Guo
  - Data Science Institute, Imperial College London, SW7 2AZ London, UK
  - Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
14
Rincón-Riveros A, Morales D, Rodríguez JA, Villegas VE, López-Kleine L. Bioinformatic Tools for the Analysis and Prediction of ncRNA Interactions. Int J Mol Sci 2021; 22:11397. [PMID: 34768830 PMCID: PMC8583695 DOI: 10.3390/ijms222111397] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 09/11/2021] [Revised: 09/30/2021] [Accepted: 09/30/2021] [Indexed: 12/16/2022] Open
Abstract
Noncoding RNAs (ncRNAs) play prominent roles in the regulation of gene expression via their interactions with other biological molecules such as proteins and nucleic acids. Although much of our knowledge about how these ncRNAs operate in different biological processes has been obtained from experimental findings, computational biology can also clearly substantially boost this knowledge by suggesting possible novel interactions of these ncRNAs with other molecules. Computational predictions are thus used as an alternative source of new insights through a process of mutual enrichment because the information obtained through experiments continuously feeds through into computational methods. The results of these predictions in turn shed light on possible interactions that are subsequently validated experimentally. This review describes the latest advances in databases, bioinformatic tools, and new in silico strategies that allow the establishment or prediction of biological interactions of ncRNAs, particularly miRNAs and lncRNAs. The ncRNA species described in this work have a special emphasis on those found in humans, but information on ncRNA of other species is also included.
Affiliation(s)
- Andrés Rincón-Riveros
  - Bioinformatics and Systems Biology Group, Universidad Nacional de Colombia, Bogotá 111221, Colombia
- Duvan Morales
  - Centro de Investigaciones en Microbiología y Biotecnología-UR (CIMBIUR), Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá 111221, Colombia
- Josefa Antonia Rodríguez
  - Grupo de Investigación en Biología del Cáncer, Instituto Nacional de Cancerología, Bogotá 111221, Colombia
- Victoria E. Villegas
  - Centro de Investigaciones en Microbiología y Biotecnología-UR (CIMBIUR), Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá 111221, Colombia
- Liliana López-Kleine
  - Department of Statistics, Faculty of Science, Universidad Nacional de Colombia, Bogotá 111221, Colombia
15
Azimi SA, Afarideh H, Chai JS, Kalinowski M, Gheddou A, Hofman R. Classification of radioxenon spectra with deep learning algorithm. J Environ Radioact 2021; 237:106718. [PMID: 34425549 DOI: 10.1016/j.jenvrad.2021.106718] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2021] [Revised: 08/07/2021] [Accepted: 08/09/2021] [Indexed: 06/13/2023]
Abstract
In this study, we propose for the first time a model of classification for Beta-Gamma coincidence radioxenon spectra using a deep learning approach through the convolution neural network (CNN) technique. We utilize the entire spectrum of actual data from a noble gas system in Charlottesville (USX75 station) between 2012 and 2019. This study shows that the deep learning categorization can be done as an important pre-screening method without directly involving critical limits and abnormal thresholds. Our results demonstrate that the proposed approach of combining nuclear engineering and deep learning is a promising tool for assisting experts in accelerating and optimizing the review process of clean background and CTBT-relevant samples with high classification average accuracies of 92% and 98%, respectively.
Affiliation(s)
- Sepideh Alsadat Azimi
  - Amirkabir University of Technology, Faculty of Physics and Energy Engineering, No. 350, Hafez Ave, Valiasr Square, Tehran, Iran
- Hossein Afarideh
  - Amirkabir University of Technology, Faculty of Physics and Energy Engineering, No. 350, Hafez Ave, Valiasr Square, Tehran, Iran
- Jong-Seo Chai
  - Sungkyunkwan University, College of Information & Communication Engineering, Suwon-si, South Korea
- Martin Kalinowski
  - Preparatory Commission for the Comprehensive Nuclear-Test-Ban-Treaty Organization, Provisional Technical Secretariat, VIC, Vienna, Austria
- Abdelhakim Gheddou
  - Preparatory Commission for the Comprehensive Nuclear-Test-Ban-Treaty Organization, Provisional Technical Secretariat, VIC, Vienna, Austria
- Radek Hofman
  - Preparatory Commission for the Comprehensive Nuclear-Test-Ban-Treaty Organization, Provisional Technical Secretariat, VIC, Vienna, Austria
16
Nagy M, Radakovich N, Nazha A. Machine Learning in Oncology: What Should Clinicians Know? JCO Clin Cancer Inform 2021; 4:799-810. [PMID: 32926637 DOI: 10.1200/cci.20.00049] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Indexed: 12/16/2022] Open
Abstract
The volume and complexity of scientific and clinical data in oncology have grown markedly over recent years, including but not limited to the realms of electronic health data, radiographic and histologic data, and genomics. This growth holds promise for a deeper understanding of malignancy and, accordingly, more personalized and effective oncologic care. Such goals require, however, the development of new methods to fully make use of the wealth of available data. Improvements in computer processing power and algorithm development have positioned machine learning, a branch of artificial intelligence, to play a prominent role in oncology research and practice. This review provides an overview of the basics of machine learning and highlights current progress and challenges in applying this technology to cancer diagnosis, prognosis, and treatment recommendations, including a discussion of current takeaways for clinicians.
Affiliation(s)
- Matthew Nagy
  - Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, OH
- Nathan Radakovich
  - Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, OH
- Aziz Nazha
  - Center for Clinical Artificial Intelligence, Cleveland Clinic, Cleveland, OH
  - Department of Hematology and Medical Oncology, Cleveland Clinic, Cleveland, OH
17
Zhang X, Xing Y, Sun K, Guo Y. OmiEmbed: A Unified Multi-Task Deep Learning Framework for Multi-Omics Data. Cancers (Basel) 2021; 13:3047. [PMID: 34207255 PMCID: PMC8235477 DOI: 10.3390/cancers13123047] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 05/20/2021] [Revised: 06/12/2021] [Accepted: 06/16/2021] [Indexed: 02/06/2023] Open
Abstract
High-dimensional omics data contain intrinsic biomedical information that is crucial for personalised medicine. Nevertheless, it is challenging to capture them from the genome-wide data, due to the large number of molecular features and small number of available samples, which is also called "the curse of dimensionality" in machine learning. To tackle this problem and pave the way for machine learning-aided precision medicine, we proposed a unified multi-task deep learning framework named OmiEmbed to capture biomedical information from high-dimensional omics data with the deep embedding and downstream task modules. The deep embedding module learnt an omics embedding that mapped multiple omics data types into a latent space with lower dimensionality. Based on the new representation of multi-omics data, different downstream task modules were trained simultaneously and efficiently with the multi-task strategy to predict the comprehensive phenotype profile of each sample. OmiEmbed supports multiple tasks for omics data including dimensionality reduction, tumour type classification, multi-omics integration, demographic and clinical feature reconstruction, and survival prediction. The framework outperformed other methods on all three types of downstream tasks and achieved better performance with the multi-task strategy compared to training them individually. OmiEmbed is a powerful and unified framework that can be widely adapted to various applications of high-dimensional omics data and has great potential to facilitate more accurate and personalised clinical decision making.
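The idea of a shared low-dimensional embedding feeding several downstream task modules that are trained together can be sketched as below. This is only an illustrative toy, not the OmiEmbed implementation: the linear projection, the squared-error task loss, the weighted-sum combination of task losses, and all names (`embed`, `multitask_loss`, the task labels) are assumptions made for the example.

```python
def embed(sample, projection):
    """Map a concatenated multi-omics vector to a lower-dimensional
    embedding with a linear projection (one row per latent dimension)."""
    return [sum(w * x for w, x in zip(row, sample)) for row in projection]


def multitask_loss(embedding, heads, targets, task_weights):
    """Combine per-task squared errors into one scalar objective.
    Every task head reads the same shared embedding, so minimising the
    combined loss would update the embedding for all tasks at once."""
    total = 0.0
    for task, head in heads.items():
        prediction = sum(w * z for w, z in zip(head, embedding))
        total += task_weights[task] * (prediction - targets[task]) ** 2
    return total


# Hypothetical toy example: 4 input features, 2 latent dimensions, 2 tasks.
sample = [1.0, 2.0, 3.0, 4.0]
projection = [[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]]
heads = {"age": [1.0, 0.0], "survival_time": [0.0, 1.0]}
targets = {"age": 0.0, "survival_time": 5.0}
task_weights = {"age": 1.0, "survival_time": 0.5}

z = embed(sample, projection)
print(z, multitask_loss(z, heads, targets, task_weights))
```

In a real framework the projection and heads would be neural networks optimised by gradient descent; the sketch only shows how a single shared representation and a combined objective fit together.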
Affiliation(s)
- Xiaoyu Zhang
  - Data Science Institute, Imperial College London, London SW7 2AZ, UK
- Yuting Xing
  - Data Science Institute, Imperial College London, London SW7 2AZ, UK
- Kai Sun
  - Data Science Institute, Imperial College London, London SW7 2AZ, UK
- Yike Guo
  - Data Science Institute, Imperial College London, London SW7 2AZ, UK
  - Department of Computer Science, Hong Kong Baptist University, Hong Kong 999077, China
18
Zhou K, Arslanturk S, Craig DB, Heath E, Draghici S. Discovery of primary prostate cancer biomarkers using cross cancer learning. Sci Rep 2021; 11:10433. [PMID: 34001952 PMCID: PMC8128891 DOI: 10.1038/s41598-021-89789-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/18/2021] [Accepted: 04/30/2021] [Indexed: 02/03/2023] Open
Abstract
Prostate cancer (PCa), the second leading cause of cancer death in American men, is a relatively slow-growing malignancy with multiple early treatment options. Yet, a significant number of low-risk PCa patients are over-diagnosed and over-treated with significant and long-term quality of life effects. Further, there is ever increasing evidence of metastasis and higher mortality when hormone-sensitive or castration-resistant PCa tumors are treated indistinctively. Hence, the critical need is to discover clinically-relevant and actionable PCa biomarkers by better understanding the biology of PCa. In this paper, we have discovered novel biomarkers of PCa tumors through cross-cancer learning by leveraging the pathological and molecular similarities in the DNA repair pathways of ovarian, prostate, and breast cancer tumors. Cross-cancer disease learning enriches the study population and identifies genetic/phenotypic commonalities that are important across diseases with pathological and molecular similarities. Our results show that ADIRF, SLC2A5, C3orf86, HSPA1B are among the most significant PCa biomarkers, while MTRNR2L1, EEPD1, TEPP and VN1R2 are jointly important biomarkers across prostate, breast and ovarian cancers. Our validation results have further shown that the discovered biomarkers can predict the disease state better than any randomly selected subset of differentially expressed prostate cancer genes.
Affiliation(s)
- Kaiyue Zhou
  - Department of Computer Science, Wayne State University, Detroit, 48201, USA
- Suzan Arslanturk
  - Department of Computer Science, Wayne State University, Detroit, 48201, USA
- Douglas B Craig
  - Department of Oncology, Wayne State University, Detroit, 48201, USA
  - Bioinformatics and Biostatistics Core, Barbara Ann Karmanos Cancer Institute, Detroit, 48201, USA
- Elisabeth Heath
  - Department of Oncology, Wayne State University, Detroit, 48201, USA
  - Molecular Therapeutics Program, Barbara Ann Karmanos Cancer Institute, Detroit, 48201, USA
- Sorin Draghici
  - Department of Computer Science, Wayne State University, Detroit, 48201, USA
  - Department of Obstetrics and Gynecology, Wayne State University, Detroit, 48201, USA
19
Improving prediction for medical institution with limited patient data: Leveraging hospital-specific data based on multicenter collaborative research network. Artif Intell Med 2021; 113:102024. [PMID: 33685587 DOI: 10.1016/j.artmed.2021.102024] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/27/2020] [Revised: 11/25/2020] [Accepted: 01/18/2021] [Indexed: 12/18/2022]
Abstract
BACKGROUND AND OBJECTIVE Clinical decision support assisted by prediction models usually faces the challenges of limited clinical data and a lack of labels when the model is developed with data from a single medical institution. Accordingly, research on multicenter clinical collaborative networks, which can provide external medical data, has received increasing attention. With the increasing availability of machine learning techniques such as transfer learning, leveraging large-scale patient data from multiple hospitals to build data-driven predictive models with clinical application potential provides an alternative solution to address the problem of limited patient data. METHODS A multicenter hybrid semi-supervised transfer learning model (MHSTL) is proposed in this study on the basis of a unified common data model to ensure standardized representation of multicenter data. Then the hospital-specific features, along with the co-occurrence features across domains, are aligned through a representation learning architecture that is built based on deep neural networks and the newly proposed neural decision forest model. In this process, limited patient data from the target hospital, both labeled and unlabeled, are incorporated during the feature adaptation process, thereby contributing to better model performance. Without patient-level data sharing, the proposed model learning strategy, which overcomes feature misalignment and distribution divergence, enables the multi-source transfer learning process in the case of insufficient and unlabeled patient data at the target hospital. RESULTS The effectiveness of the proposed transfer learning model was evaluated on a collaborative research network of colorectal cancer patients in the US and China. The results demonstrate that the proposed model can achieve much better performance for predicting target risk with limited resources on patient data than baseline models. Better discrimination and calibration ability are also observed when sufficient labeled data are not available in the target hospital for prognosis prediction tasks. Further exploratory experiments show that the proposed approach exhibits good model generalizability regardless of the data heterogeneity. With the help of the SHapley Additive exPlanations for model interpretation, the effectiveness of incorporating hospital-specific features in the transfer learning model is shown. CONCLUSIONS In this study, the proposed method can develop prediction models from multiple source hospitals and exhibit good performance by leveraging cross-domain hospital-specific feature information, therefore enhancing the model prediction when applied to a single medical institution with limited patient data.
20
Koumakis L. Deep learning models in genomics; are we there yet? Comput Struct Biotechnol J 2020; 18:1466-1473. [PMID: 32637044 PMCID: PMC7327302 DOI: 10.1016/j.csbj.2020.06.017] [Citation(s) in RCA: 68] [Impact Index Per Article: 13.6] [Received: 03/10/2020] [Revised: 06/07/2020] [Accepted: 06/08/2020] [Indexed: 12/23/2022] Open
Abstract
With the evolution of biotechnology and the introduction of the high throughput sequencing, researchers have the ability to produce and analyze vast amounts of genomics data. Since genomics produce big data, most of the bioinformatics algorithms are based on machine learning methodologies, and lately deep learning, to identify patterns, make predictions and model the progression or treatment of a disease. Advances in deep learning created an unprecedented momentum in biomedical informatics and have given rise to new bioinformatics and computational biology research areas. It is evident that deep learning models can provide higher accuracies in specific tasks of genomics than the state of the art methodologies. Given the growing trend on the application of deep learning architectures in genomics research, in this mini review we outline the most prominent models, we highlight possible pitfalls and discuss future directions. We foresee deep learning accelerating changes in the area of genomics, especially for multi-scale and multimodal data analysis for precision medicine.
Affiliation(s)
- Lefteris Koumakis
  - Foundation for Research and Technology - Hellas (FORTH), Institute of Computer Science, Heraklion, Crete, Greece
21
Tkachev V, Sorokin M, Borisov C, Garazha A, Buzdin A, Borisov N. Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology. Int J Mol Sci 2020; 21:713. [PMID: 31979006 PMCID: PMC7037338 DOI: 10.3390/ijms21030713] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/23/2019] [Revised: 01/16/2020] [Accepted: 01/17/2020] [Indexed: 12/21/2022] Open
Abstract
(1) Background: Machine learning (ML) methods are rarely used for an omics-based prescription of cancer drugs, due to shortage of case histories with clinical outcome supplemented by high-throughput molecular data. This causes overtraining and high vulnerability of most ML methods. Recently, we proposed a hybrid global-local approach to ML termed floating window projective separator (FloWPS) that avoids extrapolation in the feature space. Its core property is data trimming, i.e., sample-specific removal of irrelevant features. (2) Methods: Here, we applied FloWPS to seven popular ML methods, including linear SVM, k nearest neighbors (kNN), random forest (RF), Tikhonov (ridge) regression (RR), binomial naïve Bayes (BNB), adaptive boosting (ADA) and multi-layer perceptron (MLP). (3) Results: We performed computational experiments for 21 high throughput gene expression datasets (41–235 samples per dataset) totally representing 1778 cancer patients with known responses on chemotherapy treatments. FloWPS essentially improved the classifier quality for all global ML methods (SVM, RF, BNB, ADA, MLP), where the area under the receiver-operator curve (ROC AUC) for the treatment response classifiers increased from 0.61–0.88 range to 0.70–0.94. We tested FloWPS-empowered methods for overtraining by interrogating the importance of different features for different ML methods in the same model datasets. (4) Conclusions: We showed that FloWPS increases the correlation of feature importance between the different ML methods, which indicates its robustness to overtraining. For all the datasets tested, the best performance of FloWPS data trimming was observed for the BNB method, which can be valuable for further building of ML classifiers in personalized oncology.
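The notion of data trimming as sample-specific feature removal that avoids extrapolation can be illustrated with a simplified sketch. It assumes one plausible rule: a feature is kept for a given test sample only if at least `m` training values lie on each side of the test value along that feature. The function name `trim_features`, the parameter `m`, and the toy numbers are hypothetical, and the sketch omits any other steps of FloWPS (such as neighbour selection), so it should not be read as the published algorithm.

```python
def trim_features(train_rows, test_row, m=1):
    """Return the indices of features on which the test sample is
    bracketed by the training data: at least m training values at or
    below it and at least m at or above it. Features where the test
    value would require extrapolation are dropped for this sample."""
    kept = []
    for j, value in enumerate(test_row):
        below = sum(1 for row in train_rows if row[j] <= value)
        above = sum(1 for row in train_rows if row[j] >= value)
        if below >= m and above >= m:
            kept.append(j)
    return kept


# Hypothetical toy data: three training samples, two features.
train = [[1.0, 10.0],
         [2.0, 20.0],
         [3.0, 30.0]]
test = [2.5, 35.0]  # feature 1 lies outside the training range

print(trim_features(train, test, m=1))
```

Because the kept feature set depends on the test sample, a classifier would be fitted on a different feature subset for each sample being predicted.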
Affiliation(s)
- Victor Tkachev
  - OmicsWayCorp, Walnut, CA 91788, USA
- Maxim Sorokin
  - OmicsWayCorp, Walnut, CA 91788, USA
  - Institute for Personalized Medicine, I.M. Sechenov First Moscow State Medical University, 119991 Moscow, Russia
- Constantin Borisov
  - National Research University—Higher School of Economics, 101000 Moscow, Russia
- Andrew Garazha
  - OmicsWayCorp, Walnut, CA 91788, USA
- Anton Buzdin
  - OmicsWayCorp, Walnut, CA 91788, USA
  - Institute for Personalized Medicine, I.M. Sechenov First Moscow State Medical University, 119991 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141701 Moscow Oblast, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, 117997 Moscow, Russia
- Nicolas Borisov
  - OmicsWayCorp, Walnut, CA 91788, USA
  - Institute for Personalized Medicine, I.M. Sechenov First Moscow State Medical University, 119991 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141701 Moscow Oblast, Russia
22
Moridi M, Ghadirinia M, Sharifi-Zarchi A, Zare-Mirakabad F. The assessment of efficient representation of drug features using deep learning for drug repositioning. BMC Bioinformatics 2019; 20:577. [PMID: 31726977 PMCID: PMC6854697 DOI: 10.1186/s12859-019-3165-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 09/15/2019] [Accepted: 10/21/2019] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND De novo drug discovery is a time-consuming and expensive process. Nowadays, drug repositioning is utilized as a common strategy to discover a new drug indication for existing drugs. This strategy is mostly used in cases with a limited number of candidate pairs of drugs and diseases. In other words, they are not scalable to a large number of drugs and diseases. Most of the in-silico methods mainly focus on linear approaches while non-linear models are still scarce for new indication predictions. Therefore, applying non-linear computational approaches can offer an opportunity to predict possible drug repositioning candidates. RESULTS In this study, we present a non-linear method for drug repositioning. We extract four drug features and two disease features to find the semantic relations between drugs and diseases. We utilize deep learning to extract an efficient representation for each feature. These representations reduce the dimension and heterogeneity of biological data. Then, we assess the performance of different combinations of drug features to introduce a pipeline for drug repositioning. In the available database, there are different numbers of known drug-disease associations corresponding to each combination of drug features. Our assessment shows that as the numbers of drug features increase, the numbers of available drugs decrease. Thus, the proposed method with large numbers of drug features is as accurate as small numbers. CONCLUSION Our pipeline predicts new indications for existing drugs systematically, in a more cost-effective way and shorter timeline. We assess the pipeline to discover the potential drug-disease associations based on cross-validation experiments and some clinical trial studies.
Affiliation(s)
- Mahroo Moridi
  - Department of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
- Marzieh Ghadirinia
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Ali Sharifi-Zarchi
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Fatemeh Zare-Mirakabad
  - Department of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran