1
Lian Y, Shi Y, Shang H, Zhan H. Predicting Treatment Outcomes in Patients with Low Back Pain Using Gene Signature-Based Machine Learning Models. Pain Ther 2024:10.1007/s40122-024-00700-8. [PMID: 39722081 DOI: 10.1007/s40122-024-00700-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2024] [Accepted: 12/11/2024] [Indexed: 12/28/2024] Open
Abstract
INTRODUCTION: Low back pain (LBP) is a significant global health burden, with variable treatment outcomes and an unclear underlying molecular mechanism. Effective prediction of treatment responses remains a challenge. In this study, we aimed to develop gene signature-based machine learning models using transcriptomic data from peripheral immune cells to predict treatment outcomes in patients with LBP.
METHODS: The transcriptomic data of patients with LBP from peripheral immune cells were retrieved from the GEO database. Patients with LBP were recruited, and treatment outcomes were assessed after 3 months. Patients were classified into two groups: those with resolved pain and those with persistent pain. Differentially expressed genes (DEGs) between the two groups were identified through bioinformatic analysis. Key genes were selected using five machine learning methods: Lasso, Elastic Net, Random Forest, SVM, and GBM. These key genes were then used to train 45 machine learning models by pairing each of the five selection methods with each of nine classification algorithms: Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Decision Tree, Random Forest, Gradient Boosting Machine, Multilayer Perceptron, Naive Bayes, and Linear Discriminant Analysis. Five-fold cross-validation was employed to ensure robust model evaluation and minimize overfitting. In each fold, the dataset was split into training and validation sets, with model performance assessed using multiple metrics including accuracy, precision, recall, and F1 score. The final model performance was reported as the mean and standard deviation across all five folds, providing a more reliable estimate of the models' ability to predict LBP treatment outcomes using gene expression data from peripheral immune cells.
RESULTS: A total of 61 DEGs were identified between patients with resolved and persistent pain. From these genes, 45 machine learning models were constructed using different combinations of feature selection methods and classification algorithms. The Elastic Net with Logistic Regression achieved the highest accuracy of 88.7% ± 8.0% (mean ± standard deviation), followed closely by Elastic Net with Linear Discriminant Analysis (88.7% ± 7.5%) and Lasso with Multilayer Perceptron (87.7% ± 6.7%). Overall, 15 models demonstrated robust performance with accuracy > 80%, suggesting the reliability of our machine learning approach in predicting LBP treatment outcomes. The SHapley Additive exPlanations (SHAP) method was used to visualize the contribution of core genes to model performance, highlighting their roles in predicting treatment outcomes.
CONCLUSION: The study demonstrates the potential of using transcriptomic data from peripheral immune cells and machine learning models to predict treatment outcomes in patients with LBP. The identification of key genes and the high accuracy of certain models provide a basis for future personalized treatment strategies in LBP management. Visualizing gene importance with SHAP adds interpretability to the predictive models, enhancing their clinical relevance.
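The evaluation design described in this abstract (feature selectors paired with classifiers, scored by five-fold cross-validation and reported as mean ± standard deviation) can be sketched roughly as follows. This is an illustration only, not the authors' code: the nearest-centroid classifier, the toy data, and the helper names `model_grid`, `k_fold_indices`, `fit_nearest_centroid`, and `cross_validate` are assumptions introduced here, and only the Python standard library is used.

```python
import itertools
import random
import statistics

# Method names as listed in the abstract; in this sketch they are labels only.
SELECTORS = ["Lasso", "Elastic Net", "Random Forest", "SVM", "GBM"]
CLASSIFIERS = [
    "Logistic Regression", "K-Nearest Neighbors", "Support Vector Machine",
    "Decision Tree", "Random Forest", "Gradient Boosting Machine",
    "Multilayer Perceptron", "Naive Bayes", "Linear Discriminant Analysis",
]


def model_grid():
    """Every (feature selector, classifier) pairing: 5 x 9 = 45 models."""
    return list(itertools.product(SELECTORS, CLASSIFIERS))


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_nearest_centroid(X, y):
    """Hypothetical stand-in classifier: predict the class with the closest mean."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def cross_validate(X, y, k=5, seed=0):
    """Return (mean, standard deviation) of accuracy across k held-out folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit_nearest_centroid([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return statistics.mean(scores), statistics.stdev(scores)


# Toy, well-separated data (not from the study): 10 samples per class.
X = [[0.1 * i] for i in range(10)] + [[10.0 + 0.1 * i] for i in range(10)]
y = [0] * 10 + [1] * 10
mean_acc, sd_acc = cross_validate(X, y)
print(f"{len(model_grid())} model pairings; accuracy {mean_acc:.3f} ± {sd_acc:.3f}")
```

In a real pipeline the feature selection step would normally be refit inside each training fold, since selecting genes on the full dataset before splitting leaks information into the validation folds.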
Affiliation(s)
- Youzhi Lian
  - Baoshan Hospital Affiliated to Shanghai University of Chinese Medicine, Shanghai, 201999, China
  - Baoshan District Integrated Traditional Chinese and Western Medicine Hospital, Shanghai, 201999, China
- Yinyu Shi
  - Shanghai University of Traditional Chinese Medicine Affiliated Shuguang Hospital, Shanghai, 200021, China
  - Shi's Orthopedic Medical Center, Shanghai, 200021, China
- Haibin Shang
  - Baoshan Hospital Affiliated to Shanghai University of Chinese Medicine, Shanghai, 201999, China
  - Baoshan District Integrated Traditional Chinese and Western Medicine Hospital, Shanghai, 201999, China
- Hongsheng Zhan
  - Shanghai University of Traditional Chinese Medicine Affiliated Shuguang Hospital, Shanghai, 200021, China
  - Shi's Orthopedic Medical Center, Shanghai, 200021, China
2
Ma W, Tang W, Kwok JS, Tong AH, Lo CW, Chu AT, Chung BH. A review on trends in development and translation of omics signatures in cancer. Comput Struct Biotechnol J 2024; 23:954-971. [PMID: 38385061 PMCID: PMC10879706 DOI: 10.1016/j.csbj.2024.01.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 01/31/2024] [Accepted: 01/31/2024] [Indexed: 02/23/2024] Open
Abstract
The field of cancer genomics and transcriptomics has evolved from targeted profiling to swift sequencing of individual tumor genomes and transcriptomes. The steady growth in genome-wide genome, epigenome, and transcriptome datasets has significantly increased our ability to capture signatures that represent both the intrinsic and extrinsic biological features of tumors. These biological differences can help in precise molecular subtyping of cancer and in predicting tumor progression, metastatic potential, and resistance to therapeutic agents. In this review, we summarize the current development of genomic, methylomic, transcriptomic, proteomic, and metabolic signatures in cancer research and highlight their potential in clinical applications to improve diagnosis, prognosis, and treatment decisions in cancer patients.
Affiliation(s)
- Wei Ma
  - Hong Kong Genome Institute, Hong Kong, China
- Wenshu Tang
  - Hong Kong Genome Institute, Hong Kong, China
- Brian H.Y. Chung
  - Hong Kong Genome Institute, Hong Kong, China
  - Department of Pediatrics and Adolescent Medicine, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Hong Kong Genome Project
  - Hong Kong Genome Institute, Hong Kong, China
  - Department of Pediatrics and Adolescent Medicine, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
3
Clark AJ, Lillard JW. A Comprehensive Review of Bioinformatics Tools for Genomic Biomarker Discovery Driving Precision Oncology. Genes (Basel) 2024; 15:1036. [PMID: 39202397 PMCID: PMC11353282 DOI: 10.3390/genes15081036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 07/25/2024] [Accepted: 07/30/2024] [Indexed: 09/03/2024] Open
Abstract
The rapid advancement of high-throughput technologies, particularly next-generation sequencing (NGS), has revolutionized cancer research by enabling the investigation of genetic variations such as SNPs, copy number variations, gene expression, and protein levels. These technologies have elevated the significance of precision oncology, creating a demand for biomarker identification and validation. This review explores the complex interplay of oncology, cancer biology, and bioinformatics tools, highlighting the challenges in statistical learning, experimental validation, data processing, and quality control that underpin this transformative field. This review outlines the methodologies and applications of bioinformatics tools in cancer genomics research, encompassing tools for data structuring, pathway analysis, network analysis, tools for analyzing biomarker signatures, somatic variant interpretation, genomic data analysis, and visualization tools. Open-source tools and repositories like The Cancer Genome Atlas (TCGA), Genomic Data Commons (GDC), cBioPortal, UCSC Genome Browser, Array Express, and Gene Expression Omnibus (GEO) have emerged to streamline cancer omics data analysis. Bioinformatics has significantly impacted cancer research, uncovering novel biomarkers, driver mutations, oncogenic pathways, and therapeutic targets. Integrating multi-omics data, network analysis, and advanced ML will be pivotal in future biomarker discovery and patient prognosis prediction.
Affiliation(s)
- James W. Lillard
  - Department of Microbiology, Biochemistry, and Immunology, Morehouse School of Medicine, Atlanta, GA 30310, USA
4
Jeong Y, Chu J, Kang J, Baek S, Lee JH, Jung DS, Kim WW, Kim YR, Kang J, Do IG. Application of Transcriptome-Based Gene Set Featurization for Machine Learning Model to Predict the Origin of Metastatic Cancer. Curr Issues Mol Biol 2024; 46:7291-7302. [PMID: 39057073 PMCID: PMC11276602 DOI: 10.3390/cimb46070432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Revised: 07/03/2024] [Accepted: 07/03/2024] [Indexed: 07/28/2024] Open
Abstract
Identifying the primary site of origin of metastatic cancer is vital for guiding treatment decisions, especially for patients with cancer of unknown primary (CUP). Despite advanced diagnostic techniques, CUP remains difficult to pinpoint and is responsible for a considerable number of cancer-related fatalities. Understanding its origin is crucial for effective management and potentially improving patient outcomes. This study introduces a machine learning framework, ONCOfind-AI, that leverages transcriptome-based gene set features to enhance the accuracy of predicting the origin of metastatic cancers. We demonstrate its potential to facilitate the integration of RNA sequencing and microarray data by using gene set scores for characterization of transcriptome profiles generated from different platforms. Integrating data from different platforms resulted in improved accuracy of machine learning models for predicting cancer origins. We validated our method using external data from clinical samples collected through the Kangbuk Samsung Medical Center and Gene Expression Omnibus. The external validation results demonstrate a top-1 accuracy ranging from 0.80 to 0.86, with a top-2 accuracy of 0.90. This study highlights that incorporating biological knowledge through curated gene sets can help to merge gene expression data from different platforms, thereby enhancing the compatibility needed to develop more effective machine learning prediction models.
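The top-1 and top-2 accuracy figures quoted in this abstract follow a standard definition: a prediction counts as correct when the true label is among the k highest-scoring classes. A minimal sketch of that metric is given below; the function name `top_k_accuracy` and the example scores are hypothetical and are not taken from ONCOfind-AI or its data.

```python
def top_k_accuracy(score_rows, true_labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(score_rows) != len(true_labels):
        raise ValueError("score_rows and true_labels must have the same length")
    hits = 0
    for scores, truth in zip(score_rows, true_labels):
        # Rank class names from highest to lowest score.
        ranked = sorted(scores, key=scores.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Hypothetical class scores for three samples (not data from the study).
scores = [
    {"lung": 0.6, "colon": 0.3, "breast": 0.1},
    {"lung": 0.5, "colon": 0.4, "breast": 0.1},
    {"lung": 0.2, "colon": 0.3, "breast": 0.5},
]
truth = ["lung", "colon", "lung"]
top1 = top_k_accuracy(scores, truth, k=1)
top2 = top_k_accuracy(scores, truth, k=2)
print(f"top-1 = {top1:.2f}, top-2 = {top2:.2f}")
```

Top-2 accuracy can never be lower than top-1 accuracy on the same predictions, which is consistent with the ordering of the reported values.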
Affiliation(s)
- Yeonuk Jeong
  - Oncocross Ltd., Seoul 04168, Republic of Korea
- Jinah Chu
  - Department of Pathology, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Republic of Korea
- Juwon Kang
  - Oncocross Ltd., Seoul 04168, Republic of Korea
  - Yonsei Institute of Pharmaceutical Sciences, College of Pharmacy, Yonsei University, Incheon 21983, Republic of Korea
- Seungjun Baek
  - Oncocross Ltd., Seoul 04168, Republic of Korea
- Jae-Hak Lee
  - Oncocross Ltd., Seoul 04168, Republic of Korea
- Dong-Sub Jung
  - Oncocross Ltd., Seoul 04168, Republic of Korea
- Won-Woo Kim
  - Oncocross Ltd., Seoul 04168, Republic of Korea
- Yi-Rang Kim
  - Oncocross Ltd., Seoul 04168, Republic of Korea
- Jihoon Kang
  - Oncocross Ltd., Seoul 04168, Republic of Korea
- In-Gu Do
  - Department of Pathology, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Republic of Korea
5
Xie J, Chen Y, Luo S, Yang W, Lin Y, Wang L, Ding X, Tong M, Yu R. Tracing unknown tumor origins with a biological-pathway-based transformer model. Cell Rep Methods 2024; 4:100797. [PMID: 38889685 PMCID: PMC11228371 DOI: 10.1016/j.crmeth.2024.100797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 02/01/2024] [Accepted: 05/21/2024] [Indexed: 06/20/2024]
Abstract
Cancer of unknown primary (CUP) represents metastatic cancer where the primary site remains unidentified despite standard diagnostic procedures. To determine the tumor origin in such cases, we developed BPformer, a deep learning method integrating the transformer model with prior knowledge of biological pathways. Trained on transcriptomes from 10,410 primary tumors across 32 cancer types, BPformer achieved accuracy rates of 94% in primary tumors, 92% in primary sites of metastatic tumors, and 89% in metastatic sites of metastatic tumors, surpassing existing methods. Additionally, BPformer was validated in a retrospective study, demonstrating consistency with tumor sites diagnosed through immunohistochemistry and histopathology. Furthermore, BPformer was able to rank pathways based on their contribution to tumor origin identification, which helped to classify oncogenic signaling pathways into those that are highly conserved among different cancers versus those that are highly variable depending on their origins.
Affiliation(s)
- Jiajing Xie
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Ying Chen
  - School of Informatics, Xiamen University, Xiamen, Fujian 361005, China
- Shijie Luo
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Wenxian Yang
  - Aginome Scientific, Xiamen, Fujian 361005, China
- Yuxiang Lin
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Liansheng Wang
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
  - School of Informatics, Xiamen University, Xiamen, Fujian 361005, China
- Xin Ding
  - Department of Pathology, Zhongshan Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, Fujian 361004, China
- Mengsha Tong
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian 361102, China
- Rongshan Yu
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
  - School of Informatics, Xiamen University, Xiamen, Fujian 361005, China
  - Aginome Scientific, Xiamen, Fujian 361005, China
6
Darmofal M, Suman S, Atwal G, Toomey M, Chen JF, Chang JC, Vakiani E, Varghese AM, Balakrishnan Rema A, Syed A, Schultz N, Berger MF, Morris Q. Deep-Learning Model for Tumor-Type Prediction Using Targeted Clinical Genomic Sequencing Data. Cancer Discov 2024; 14:1064-1081. [PMID: 38416134 PMCID: PMC11145170 DOI: 10.1158/2159-8290.cd-23-0996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 12/07/2023] [Accepted: 02/23/2024] [Indexed: 02/29/2024]
Abstract
Tumor type guides clinical treatment decisions in cancer, but histology-based diagnosis remains challenging. Genomic alterations are highly diagnostic of tumor type, and tumor-type classifiers trained on genomic features have been explored, but the most accurate methods are not clinically feasible, relying on features derived from whole-genome sequencing (WGS), or predicting across limited cancer types. We use genomic features from a data set of 39,787 solid tumors sequenced using a clinically targeted cancer gene panel to develop Genome-Derived-Diagnosis Ensemble (GDD-ENS): a hyperparameter ensemble for classifying tumor type using deep neural networks. GDD-ENS achieves 93% accuracy for high-confidence predictions across 38 cancer types, rivaling the performance of WGS-based methods. GDD-ENS can also guide diagnoses of rare types and cancers of unknown primary and incorporate patient-specific clinical information for improved predictions. Overall, integrating GDD-ENS into prospective clinical sequencing workflows could provide clinically relevant tumor-type predictions to guide treatment decisions in real time.
SIGNIFICANCE: We describe a highly accurate tumor-type prediction model, designed specifically for clinical implementation. Our model relies only on widely used cancer gene panel sequencing data, predicts across 38 distinct cancer types, and supports integration of patient-specific nongenomic information for enhanced decision support in challenging diagnostic situations. See related commentary by Garg, p. 906. This article is featured in Selected Articles from This Issue, p. 897.
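Accuracy "for high-confidence predictions" is a selective metric: it is computed only over the predictions whose confidence clears a cutoff, so it should be read together with the share of cases that clear it. The sketch below illustrates that general idea under stated assumptions; the function name `selective_accuracy`, the 0.75 cutoff, and the example predictions are hypothetical and are not taken from GDD-ENS.

```python
def selective_accuracy(predictions, threshold):
    """Return (coverage, accuracy) for predictions with confidence >= threshold.

    predictions: list of (predicted_label, confidence, true_label) tuples.
    Accuracy is None when no prediction clears the threshold.
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    kept = [(pred, true) for pred, conf, true in predictions if conf >= threshold]
    coverage = len(kept) / len(predictions)
    if not kept:
        return coverage, None
    accuracy = sum(pred == true for pred, true in kept) / len(kept)
    return coverage, accuracy


# Hypothetical predictions (label, confidence, ground truth); not study data.
preds = [
    ("lung", 0.95, "lung"),
    ("colon", 0.90, "colon"),
    ("breast", 0.55, "lung"),
    ("lung", 0.80, "lung"),
]
high_conf = selective_accuracy(preds, threshold=0.75)  # assumed cutoff
all_cases = selective_accuracy(preds, threshold=0.0)
print("high-confidence (coverage, accuracy):", high_conf)
print("all predictions (coverage, accuracy):", all_cases)
```

Raising the threshold typically trades coverage for accuracy, which is why both numbers are returned together.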
Affiliation(s)
- Madison Darmofal
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York
  - Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, New York
- Shalabh Suman
  - Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York
- Gurnit Atwal
  - Computational Biology Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - Vector Institute, Toronto, Ontario, Canada
- Michael Toomey
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York
  - Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, New York
- Jie-Fu Chen
  - Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York
- Jason C. Chang
  - Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York
- Efsevia Vakiani
  - Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York
- Anna M. Varghese
  - Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
- Aijazuddin Syed
  - Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York
- Nikolaus Schultz
  - Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, New York
  - Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York
  - Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York
- Michael F. Berger
  - Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York
  - Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, New York
  - Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York
- Quaid Morris
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York
7
Lotter W, Hassett MJ, Schultz N, Kehl KL, Van Allen EM, Cerami E. Artificial Intelligence in Oncology: Current Landscape, Challenges, and Future Directions. Cancer Discov 2024; 14:711-726. [PMID: 38597966 PMCID: PMC11131133 DOI: 10.1158/2159-8290.cd-23-1199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 01/29/2024] [Accepted: 02/28/2024] [Indexed: 04/11/2024]
Abstract
Artificial intelligence (AI) in oncology is advancing beyond algorithm development to integration into clinical practice. This review describes the current state of the field, with a specific focus on clinical integration. AI applications are structured according to cancer type and clinical domain, focusing on the four most common cancers and tasks of detection, diagnosis, and treatment. These applications encompass various data modalities, including imaging, genomics, and medical records. We conclude with a summary of existing challenges, evolving solutions, and potential future directions for the field.
SIGNIFICANCE: AI is increasingly being applied to all aspects of oncology, where several applications are maturing beyond research and development to direct clinical integration. This review summarizes the current state of the field through the lens of clinical translation along the clinical care continuum. Emerging areas are also highlighted, along with common challenges, evolving solutions, and potential future directions for the field.
Affiliation(s)
- William Lotter
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Pathology, Brigham and Women’s Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Michael J. Hassett
  - Harvard Medical School, Boston, MA, USA
  - Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Nikolaus Schultz
  - Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Kenneth L. Kehl
  - Harvard Medical School, Boston, MA, USA
  - Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Eliezer M. Van Allen
  - Harvard Medical School, Boston, MA, USA
  - Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ethan Cerami
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
8
Singh J, Khanna NN, Rout RK, Singh N, Laird JR, Singh IM, Kalra MK, Mantella LE, Johri AM, Isenovic ER, Fouda MM, Saba L, Fatemi M, Suri JS. GeneAI 3.0: powerful, novel, generalized hybrid and ensemble deep learning frameworks for miRNA species classification of stationary patterns from nucleotides. Sci Rep 2024; 14:7154. [PMID: 38531923 PMCID: PMC11344070 DOI: 10.1038/s41598-024-56786-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 03/11/2024] [Indexed: 03/28/2024] Open
Abstract
Due to the intricate relationship between the small non-coding ribonucleic acid (miRNA) sequences, the classification of miRNA species, namely Human, Gorilla, Rat, and Mouse, is challenging. Previous methods are not robust and accurate. In this study, we present AtheroPoint's GeneAI 3.0, a powerful, novel, and generalized method for extracting features from the fixed patterns of purines and pyrimidines in each miRNA sequence, for use in ensemble machine learning (EML) and convolutional neural network (CNN)-based ensemble deep learning (EDL) frameworks. GeneAI 3.0 utilized five conventional (Entropy, Dissimilarity, Energy, Homogeneity, and Contrast) and three contemporary (Shannon entropy, Hurst exponent, Fractal dimension) features to generate a composite feature set from given miRNA sequences, which was then passed into our ML and DL classification framework. A set of 11 new classifiers was designed, consisting of 5 EML and 6 EDL models for binary/multiclass classification. It was benchmarked against 9 solo ML (SML), 6 solo DL (SDL), and 12 hybrid DL (HDL) models, for a total of 11 + 27 = 38 models. Four hypotheses were formulated and validated using explainable AI (XAI) as well as reliability/statistical tests. The order of the mean performance using accuracy (ACC)/area-under-the-curve (AUC) of the 24 DL classifiers was: EDL > HDL > SDL. The mean performance of EDL models with CNN layers was superior to that without CNN layers by 0.73%/0.92%. Mean performance of EML models was superior to SML models with improvements of ACC/AUC by 6.24%/6.46%. EDL models performed significantly better than EML models, with a mean increase in ACC/AUC of 7.09%/6.96%. The GeneAI 3.0 tool produced expected XAI feature plots, and the statistical tests showed significant p-values. Ensemble models with composite features are highly effective and generalized models for classifying miRNA sequences.
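Two ingredients named in this abstract, the purine/pyrimidine pattern of a sequence and Shannon entropy, have simple textbook definitions that can be sketched as below. This is not the GeneAI 3.0 feature extractor: the function names `purine_pyrimidine` and `shannon_entropy`, and the choice to compute entropy over single symbols, are assumptions made for illustration. The letters R and Y are the standard IUPAC codes for purine and pyrimidine.

```python
import math
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U", "T"}


def purine_pyrimidine(sequence):
    """Map each nucleotide to 'R' (purine) or 'Y' (pyrimidine)."""
    encoded = []
    for base in sequence.upper():
        if base in PURINES:
            encoded.append("R")
        elif base in PYRIMIDINES:
            encoded.append("Y")
        else:
            raise ValueError(f"unexpected nucleotide: {base!r}")
    return "".join(encoded)


def shannon_entropy(symbols):
    """Shannon entropy in bits: H = sum over symbols of p * log2(1 / p)."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Illustrative sequence only; not taken from the study's data.
sequence = "UGAGGUAGUAGGUUGUAUAGUU"
pattern = purine_pyrimidine(sequence)
print(pattern)
print(f"nucleotide entropy: {shannon_entropy(sequence):.3f} bits")
print(f"R/Y pattern entropy: {shannon_entropy(pattern):.3f} bits")
```

Because the R/Y encoding has only two symbols, its single-symbol entropy is bounded by 1 bit, whereas the four-letter nucleotide alphabet allows up to 2 bits.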
Affiliation(s)
- Jaskaran Singh
  - Department of Computer Science, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India
- Narendra N Khanna
  - Department of Cardiology, Indraprastha APOLLO Hospitals, New Delhi, India
- Ranjeet K Rout
  - Department of Computer Science and Engineering, NIT Srinagar, Hazratbal, Srinagar, India
- Narpinder Singh
  - Department of Food Science, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India
- John R Laird
  - Heart and Vascular Institute, Adventist Health St. Helena, St Helena, CA, USA
- Inder M Singh
  - Advanced Cardiac and Vascular Institute, Sacramento, CA, USA
- Mannudeep K Kalra
  - Department of Radiology, Massachusetts General Hospital, Boston, MA, 02115, USA
- Laura E Mantella
  - Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, Canada
- Amer M Johri
  - Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, Canada
- Esma R Isenovic
  - Laboratory for Molecular Genetics and Radiobiology, University of Belgrade, Belgrade, Serbia
- Mostafa M Fouda
  - Department of Electrical and Computer Engineering, Idaho State University, Pocatello, ID, 83209, USA
- Luca Saba
  - Department of Neurology, University of Cagliari, Cagliari, Italy
- Mostafa Fatemi
  - Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, 55905, USA
- Jasjit S Suri
  - Stroke Monitoring and Diagnostic Division, AtheroPoint LLC, Roseville, CA, 95661, USA
9
Rydzewski NR, Shi Y, Li C, Chrostek MR, Bakhtiar H, Helzer KT, Bootsma ML, Berg TJ, Harari PM, Floberg JM, Blitzer GC, Kosoff D, Taylor AK, Sharifi MN, Yu M, Lang JM, Patel KR, Citrin DE, Sundling KE, Zhao SG. A platform-independent AI tumor lineage and site (ATLAS) classifier. Commun Biol 2024; 7:314. [PMID: 38480799 PMCID: PMC10937974 DOI: 10.1038/s42003-024-05981-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 02/27/2024] [Indexed: 03/17/2024] Open
Abstract
Histopathologic diagnosis and classification of cancer play a critical role in guiding treatment. Advances in next-generation sequencing have ushered in new complementary molecular frameworks. However, existing approaches do not independently assess both site-of-origin (e.g. prostate) and lineage (e.g. adenocarcinoma) and have minimal validation in metastatic disease, where classification is more difficult. Utilizing gradient-boosted machine learning, we developed ATLAS, a pair of separate AI Tumor Lineage and Site-of-origin models from RNA expression data on 8249 tumor samples. We assessed performance independently in 10,376 total tumor samples, including 1490 metastatic samples, achieving an accuracy of 91.4% for cancer site-of-origin and 97.1% for cancer lineage. High confidence predictions (encompassing the majority of cases) were accurate 98-99% of the time in both localized and, remarkably, metastatic samples. We also identified emergent properties of our lineage scores for tumor types on which the model was never trained (zero-shot learning). Adenocarcinoma/sarcoma lineage scores differentiated epithelioid from biphasic/sarcomatoid mesothelioma. Also, predicted lineage de-differentiation identified neuroendocrine/small cell tumors and was associated with poor outcomes across tumor types. Our platform-independent single-sample approach can be easily translated to existing RNA-seq platforms. ATLAS can complement and guide traditional histopathologic assessment in challenging situations and tumors of unknown primary.
Affiliation(s)
- Nicholas R Rydzewski
  - Radiation Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
  - Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Yue Shi
  - Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Chenxuan Li
  - Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Hamza Bakhtiar
  - Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Kyle T Helzer
  - Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Matthew L Bootsma
  - Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Tracy J Berg
  - Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Paul M Harari
  - Department of Human Oncology, University of Wisconsin, Madison, WI, USA
  - Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- John M Floberg
  - Department of Human Oncology, University of Wisconsin, Madison, WI, USA
  - Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- Grace C Blitzer
  - Department of Human Oncology, University of Wisconsin, Madison, WI, USA
  - Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- David Kosoff
  - Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
  - Department of Medicine, University of Wisconsin, Madison, WI, USA
- Amy K Taylor
  - Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
  - Department of Medicine, University of Wisconsin, Madison, WI, USA
- Marina N Sharifi
  - Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
  - Department of Medicine, University of Wisconsin, Madison, WI, USA
- Menggang Yu
  - Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
- Joshua M Lang
  - Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
  - Department of Medicine, University of Wisconsin, Madison, WI, USA
- Krishnan R Patel
  - Radiation Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Deborah E Citrin
  - Radiation Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Kaitlin E Sundling
  - Department of Pathology and Laboratory Medicine, University of Wisconsin, Madison, WI, USA
  - Wisconsin State Laboratory of Hygiene, University of Wisconsin, Madison, WI, USA
- Shuang G Zhao
  - Department of Human Oncology, University of Wisconsin, Madison, WI, USA
  - Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
  - William S. Middleton Veterans Hospital, Madison, WI, USA
10
Qin S, Sun S, Wang Y, Li C, Fu L, Wu M, Yan J, Li W, Lv J, Chen L. Immune, metabolic landscapes of prognostic signatures for lung adenocarcinoma based on a novel deep learning framework. Sci Rep 2024; 14:527. [PMID: 38177198 PMCID: PMC10767103 DOI: 10.1038/s41598-023-51108-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 12/30/2023] [Indexed: 01/06/2024] Open
Abstract
Lung adenocarcinoma (LUAD) is a malignant tumor with high lethality, and the aim of this study was to identify promising biomarkers for LUAD. Using the TCGA-LUAD dataset as a discovery cohort, a novel joint framework, VAEjMLP, based on a variational autoencoder (VAE) and a multilayer perceptron (MLP), was proposed. The Shapley Additive Explanations (SHAP) method was introduced to evaluate the contribution of feature genes to the classification decision, which helped us to develop a biologically meaningful biomarker potential scoring algorithm. Nineteen potential biomarkers for LUAD were identified, which were involved in the regulation of immune and metabolic functions in LUAD. A prognostic risk model for LUAD was constructed from the biomarkers HLA-DRB1, SCGB1A1, and HLA-DRB5, screened by Cox regression analysis, dividing the patients into high-risk and low-risk groups. The prognostic risk model was validated with external datasets. The low-risk group was characterized by enrichment of immune pathways and higher immune infiltration compared to the high-risk group, whereas the high-risk group showed increased metabolic pathway activity. There were significant differences between the high- and low-risk groups in metabolic reprogramming of aerobic glycolysis, amino acids, and lipids, as well as in angiogenic activity, epithelial-mesenchymal transition, tumorigenic cytokines, and inflammatory response. Furthermore, high-risk patients were more sensitive to Afatinib, Gefitinib, and Gemcitabine as predicted by the pRRophetic algorithm. This study provides prognostic signatures capable of revealing the immune and metabolic landscapes for LUAD, and may shed light on the identification of other cancer biomarkers.
Affiliation(s)
- Shimei Qin
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
| | - Shibin Sun
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
| | - Yahui Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
| | - Chao Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
| | - Lei Fu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
| | - Ming Wu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
| | - Jinxing Yan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
| | - Wan Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
| | - Junjie Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China.
| | - Lina Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China.
|
11
|
Štancl P, Karlić R. Machine learning for pan-cancer classification based on RNA sequencing data. Front Mol Biosci 2023; 10:1285795. [PMID: 38028533 PMCID: PMC10667476 DOI: 10.3389/fmolb.2023.1285795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 10/30/2023] [Indexed: 12/01/2023] Open
Abstract
Despite recent improvements in cancer diagnostics, 2%-5% of all malignancies are still cancers of unknown primary (CUP), for which the tissue-of-origin (TOO) cannot be determined at the time of presentation. Since the primary site of cancer leads to the choice of optimal treatment, CUP patients pose a significant clinical challenge with limited treatment options. Data produced by large-scale cancer genomics initiatives, which aim to determine the genomic, epigenomic, and transcriptomic characteristics of a large number of individual patients of multiple cancer types, have led to the introduction of various methods that use machine learning to predict the TOO of cancer patients. In this review, we assess the reproducibility, interpretability, and robustness of results obtained by 20 recent studies that utilize different machine learning methods for TOO prediction based on RNA sequencing data, including their reported performance on independent data sets and identification of important features. Our review investigates the strengths and weaknesses of different methods, checks the correspondence of their results, and identifies potential issues with datasets used for model training and testing, assessing their potential usefulness in a clinical setting and suggesting future improvements.
Affiliation(s)
| | - Rosa Karlić
- Bioinformatics Group, Division of Molecular Biology, Department of Biology, Faculty of Science, University of Zagreb, Zagreb, Croatia
|
12
|
Dos Santos GA, Chatsirisupachai K, Avelar RA, de Magalhães JP. Transcriptomic analysis reveals a tissue-specific loss of identity during ageing and cancer. BMC Genomics 2023; 24:644. [PMID: 37884865 PMCID: PMC10604446 DOI: 10.1186/s12864-023-09756-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Accepted: 10/20/2023] [Indexed: 10/28/2023] Open
Abstract
INTRODUCTION Understanding changes in cell identity in cancer and ageing is of great importance. In this work, we analyzed how gene expression changes in human tissues are associated with tissue specificity during cancer and ageing using transcriptome data from TCGA and GTEx. RESULTS We found significant downregulation of tissue-specific genes during ageing in 40% of the tissues analyzed, which suggests loss of tissue identity with age. For most cancer types, we have noted a consistent pattern of downregulation in genes that are specific to the tissue from which the tumor originated. Moreover, we observed in cancer an activation of genes not usually expressed in the tissue of origin as well as an upregulation of genes specific to other tissues. These patterns in cancer were associated with patient survival. The age of the patient, however, did not influence these patterns. CONCLUSION We identified loss of cellular identity in 40% of the tissues analysed during human ageing, and a clear pattern in cancer, where during tumorigenesis cells express genes specific to other organs while suppressing the expression of genes from their original tissue. The loss of cellular identity observed in cancer is associated with prognosis and is not influenced by age, suggesting that it is a crucial stage in carcinogenesis.
Affiliation(s)
- Gabriel Arantes Dos Santos
- Laboratory of Medical Investigation (LIM55), Urology Department, Faculdade de Medicina FMUSP, Universidade de Sao Paulo, Sao Paulo, Brazil
- Genomics of Ageing and Rejuvenation Lab, Institute of Inflammation and Ageing, University of Birmingham, Birmingham, B15 2WB, UK
| | - Kasit Chatsirisupachai
- Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool, L7 8TX, UK
| | - Roberto A Avelar
- Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool, L7 8TX, UK
| | - João Pedro de Magalhães
- Genomics of Ageing and Rejuvenation Lab, Institute of Inflammation and Ageing, University of Birmingham, Birmingham, B15 2WB, UK.
|
13
|
Darmofal M, Suman S, Atwal G, Chen JF, Chang JC, Toomey M, Vakiani E, Varghese AM, Rema AB, Syed A, Schultz N, Berger M, Morris Q. Deep Learning Model for Tumor Type Prediction using Targeted Clinical Genomic Sequencing Data. medRxiv 2023:2023.09.08.23295131. [PMID: 37732244 PMCID: PMC10508812 DOI: 10.1101/2023.09.08.23295131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023]
Abstract
Tumor type guides clinical treatment decisions in cancer, but histology-based diagnosis remains challenging. Genomic alterations are highly diagnostic of tumor type, and tumor type classifiers trained on genomic features have been explored, but the most accurate methods are not clinically feasible, relying on features derived from whole genome sequencing (WGS), or predicting across limited cancer types. We use genomic features from a dataset of 39,787 solid tumors sequenced using a clinical targeted cancer gene panel to develop Genome-Derived-Diagnosis Ensemble (GDD-ENS): a hyperparameter ensemble for classifying tumor type using deep neural networks. GDD-ENS achieves 93% accuracy for high-confidence predictions across 38 cancer types, rivalling performance of WGS-based methods. GDD-ENS can also guide diagnoses on rare type and cancers of unknown primary, and incorporate patient-specific clinical information for improved predictions. Overall, integrating GDD-ENS into prospective clinical sequencing workflows has enabled clinically-relevant tumor type predictions to guide treatment decisions in real time.
Affiliation(s)
- Madison Darmofal
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medicine; New York, NY 10065, USA
| | - Shalabh Suman
- Department of Pathology, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
| | - Gurnit Atwal
- Computational Biology Program, Ontario Institute for Cancer Research; Toronto, ON M5G 0A3, Canada
- Department of Molecular Genetics, University of Toronto; Toronto, ON M5S 1A8, Canada
- Vector Institute; Toronto, ON M5G 1M1, Canada
| | - Jie-Fu Chen
- Department of Pathology, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
| | - Jason C. Chang
- Department of Pathology, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
| | - Michael Toomey
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medicine; New York, NY 10065, USA
| | - Efsevia Vakiani
- Department of Pathology, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
| | - Anna M Varghese
- Department of Medicine, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
| | | | - Aijazuddin Syed
- Department of Pathology, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
| | - Nikolaus Schultz
- Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Michael Berger
- Department of Pathology, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
- Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
| | - Quaid Morris
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
|
14
|
MacDonald S, Foley H, Yap M, Johnston RL, Steven K, Koufariotis LT, Sharma S, Wood S, Addala V, Pearson JV, Roosta F, Waddell N, Kondrashova O, Trzaskowski M. Generalising uncertainty improves accuracy and safety of deep learning analytics applied to oncology. Sci Rep 2023; 13:7395. [PMID: 37149669 PMCID: PMC10164181 DOI: 10.1038/s41598-023-31126-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Accepted: 03/07/2023] [Indexed: 05/08/2023] Open
Abstract
Uncertainty estimation is crucial for understanding the reliability of deep learning (DL) predictions, and critical for deploying DL in the clinic. Differences between training and production datasets can lead to incorrect predictions with underestimated uncertainty. To investigate this pitfall, we benchmarked one pointwise and three approximate Bayesian DL models for predicting cancer of unknown primary, using three RNA-seq datasets with 10,968 samples across 57 cancer types. Our results highlight that simple and scalable Bayesian DL significantly improves the generalisation of uncertainty estimation. Moreover, we designed a prototypical metric, the area between development and production curve (ADP), which evaluates the accuracy loss when deploying models from development to production. Using ADP, we demonstrate that Bayesian DL improves accuracy under data distributional shifts when utilising 'uncertainty thresholding'. In summary, Bayesian DL is a promising approach for generalising uncertainty, improving performance, transparency, and safety of DL models for deployment in the real world.
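The 'uncertainty thresholding' idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the maximum predicted class probability as a stand-in confidence score (the paper benchmarks Bayesian uncertainty estimates, which are not reproduced here), and the function name `accuracy_with_thresholding` is hypothetical.

```python
from typing import Sequence, Tuple


def accuracy_with_thresholding(
    probabilities: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float,
) -> Tuple[float, float]:
    """Return (coverage, accuracy) when low-confidence predictions are withheld.

    Assumption: confidence is the largest class probability for a sample.
    Coverage is the fraction of samples whose confidence meets the threshold;
    accuracy is computed only on those retained samples.
    """
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")

    kept = 0
    correct = 0
    for probs, label in zip(probabilities, labels):
        # Index of the most probable class (argmax).
        predicted = max(range(len(probs)), key=lambda i: probs[i])
        if probs[predicted] >= threshold:
            kept += 1
            if predicted == label:
                correct += 1

    coverage = kept / len(labels) if len(labels) else 0.0
    # With no retained predictions, accuracy is undefined (NaN).
    accuracy = correct / kept if kept else float("nan")
    return coverage, accuracy
```

Raising the threshold trades coverage for accuracy on the retained predictions; comparing such curves between a development set and a shifted production set is the kind of analysis the abstract describes, although the ADP metric itself is not computed in this sketch.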
Affiliation(s)
- Samual MacDonald
- Max Kelsen, Brisbane, QLD, Australia
- ARC Training Centre for Information Resilience (CIRES), Brisbane, Australia
- The University of Queensland, Brisbane, Australia
| | | | | | | | | | | | - Sowmya Sharma
- QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
- ACL Pathology, Bella Vista, NSW, Australia
| | - Scott Wood
- QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
| | | | - John V Pearson
- QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
| | - Fred Roosta
- ARC Training Centre for Information Resilience (CIRES), Brisbane, Australia
- The University of Queensland, Brisbane, Australia
| | - Nicola Waddell
- QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
| | - Olga Kondrashova
- QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia.
| | - Maciej Trzaskowski
- Max Kelsen, Brisbane, QLD, Australia.
- ARC Training Centre for Information Resilience (CIRES), Brisbane, Australia.
- The University of Queensland, Brisbane, Australia.
- QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia.
|
15
|
Alharbi F, Vakanski A. Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review. Bioengineering (Basel) 2023; 10:173. [PMID: 36829667 PMCID: PMC9952758 DOI: 10.3390/bioengineering10020173] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 12/22/2022] [Revised: 01/24/2023] [Accepted: 01/26/2023] [Indexed: 01/31/2023] Open
Abstract
Cancer is a term that denotes a group of diseases caused by the abnormal growth of cells that can spread in different parts of the body. According to the World Health Organization (WHO), cancer is the second major cause of death after cardiovascular diseases. Gene expression can play a fundamental role in the early detection of cancer, as it is indicative of the biochemical processes in tissue and cells, as well as the genetic characteristics of an organism. Deoxyribonucleic acid (DNA) microarrays and ribonucleic acid (RNA)-sequencing methods for gene expression data allow quantifying the expression levels of genes and produce valuable data for computational analysis. This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods. Both conventional and deep learning-based approaches are reviewed, with an emphasis on the application of deep learning models due to their comparative advantages for identifying gene patterns that are distinctive for various types of cancers. Relevant works that employ the most commonly used deep neural network architectures are covered, including multi-layer perceptrons, as well as convolutional, recurrent, graph, and transformer networks. This survey also presents an overview of the data collection methods for gene expression analysis and lists important datasets that are commonly used for supervised machine learning for this task. Furthermore, we review pertinent techniques for feature engineering and data preprocessing that are typically used to handle the high dimensionality of gene expression data, caused by a large number of genes present in data samples. The paper concludes with a discussion of future research directions for machine learning-based gene expression analysis for cancer classification.
|
16
|
Lu J, Li J, Ren J, Ding S, Zeng Z, Huang T, Cai YD. Functional and embedding feature analysis for pan-cancer classification. Front Oncol 2022; 12:979336. [PMID: 36248961 PMCID: PMC9559388 DOI: 10.3389/fonc.2022.979336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2022] [Accepted: 09/14/2022] [Indexed: 11/13/2022] Open
Abstract
With the increasing number of people suffering from cancer, this illness has become a major health problem worldwide. Exploring the biological functions and signaling pathways of carcinogenesis is essential for cancer detection and research. In this study, a mutation dataset for eleven cancer types was first obtained from a web-based resource called cBioPortal for Cancer Genomics, followed by extracting 21,049 features from three aspects: relationship to GO and KEGG (enrichment features), mutated genes learned by word2vec (text features), and protein-protein interaction network analyzed by node2vec (network features). Irrelevant features were then excluded using the Boruta feature filtering method, and the retained relevant features were ranked by four feature selection methods (least absolute shrinkage and selection operator, minimum redundancy maximum relevance, Monte Carlo feature selection and light gradient boosting machine) to generate four feature-ranked lists. Incremental feature selection was used to determine the optimal number of features based on these feature lists to build the optimal classifiers and derive interpretable classification rules. The results of four feature-ranking methods were integrated to identify key functional pathways, such as olfactory transduction (hsa04740) and colorectal cancer (hsa05210), and the roles of these functional pathways in cancers were discussed in reference to literature. Overall, this machine learning-based study revealed the altered biological functions of cancers and provided a reference for the mechanisms of different cancers.
Affiliation(s)
- Jian Lu
- Department of Mathematics, School of Sciences, Shanghai University, Shanghai, China
- CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Science, Shanghai, China
| | - JiaRui Li
- Advanced Research Computing, University of British Columbia, Vancouver, BC, Canada
| | - Jingxin Ren
- School of Life Sciences, Shanghai University, Shanghai, China
| | - Shijian Ding
- School of Life Sciences, Shanghai University, Shanghai, China
| | - Zhenbing Zeng
- Department of Mathematics, School of Sciences, Shanghai University, Shanghai, China
| | - Tao Huang
- CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Science, Shanghai, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
|
17
|
Shen Y, Cao Y, Zhou L, Wu J, Mao M. Construction of an endoplasmic reticulum stress-related gene model for predicting prognosis and immune features in kidney renal clear cell carcinoma. Front Mol Biosci 2022; 9:928006. [PMID: 36120545 PMCID: PMC9478755 DOI: 10.3389/fmolb.2022.928006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/25/2022] [Accepted: 08/12/2022] [Indexed: 11/13/2022] Open
Abstract
Background: Kidney renal clear cell carcinoma (KIRC) is one of the most lethal malignant tumors, with a propensity for poor prognosis and difficult treatment. Endoplasmic reticulum (ER) stress plays a pivotal role in the progression of the tumor. However, the implications of ER stress for the clinical outcome and immune features of KIRC patients still need elucidation. Methods: We identified differentially expressed ER stress-related genes between KIRC specimens and normal specimens using the TCGA dataset. Then, we explored the biological function and genetic mutation of ER stress-related differentially expressed genes (DEGs) by multiple bioinformatics analyses. Subsequently, LASSO analysis and univariate Cox regression analysis were applied to construct a novel prognostic model based on ER stress-related DEGs. Next, we confirmed the predictive performance of this model with the GEO dataset and explored the potential biological functions by functional enrichment analysis. Finally, KIRC patients stratified by the prognostic model were assessed for tumor microenvironment (TME), immune infiltration, and immune checkpoints through single-sample Gene Set Enrichment Analysis (ssGSEA) and ESTIMATE analysis. Results: We constructed a novel prognostic model, including eight ER stress-related DEGs, which could stratify KIRC patients into two risk groups. The prognostic model and a model-based nomogram could accurately predict the prognosis of KIRC patients. Functional enrichment analysis indicated several biological functions related to the progression of KIRC. The high-risk group showed higher levels of tumor infiltration by immune cells and higher immune scores. Conclusion: In this study, we constructed a novel prognostic model based on eight ER stress-related genes for KIRC patients, which would help predict the prognosis of KIRC and provide a new orientation for further research on personalized immunotherapy in KIRC.
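Gene-signature prognostic models of this kind are commonly scored as a weighted sum of expression values and then split into two risk groups. The sketch below illustrates that general pattern only. The abstract does not give the coefficients, the gene list, or the cutoff, so the median cutoff, the function names, and the placeholder gene names `GENE_A` and `GENE_B` are all assumptions made for illustration.

```python
from statistics import median
from typing import Dict, List


def risk_scores(
    samples: List[Dict[str, float]],
    coefficients: Dict[str, float],
) -> List[float]:
    """Linear risk score per sample: sum of coefficient * expression.

    Assumption: coefficients come from a fitted model (e.g. LASSO/Cox);
    the values used in any example here are hypothetical.
    """
    return [
        sum(coef * sample[gene] for gene, coef in coefficients.items())
        for sample in samples
    ]


def stratify_by_median(scores: List[float]) -> List[str]:
    """Label each sample 'high' if its score exceeds the cohort median,
    otherwise 'low'. The median cutoff is an assumed convention."""
    cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]
```

For example, with hypothetical coefficients `{"GENE_A": 0.5, "GENE_B": -0.25}`, a sample with expression `{"GENE_A": 4.0, "GENE_B": 0.0}` receives a score of 2.0; in a real analysis the resulting groups would then be compared with survival methods, which this sketch does not cover.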
Affiliation(s)
- Yuanhao Shen
- Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Yinghao Cao
- Department of Orthopedics, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Lei Zhou
- Department of Orthopedics, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Jianfeng Wu
- Department of Orthopedics, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Min Mao
- Department of Orthopedics, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- *Correspondence: Min Mao,
|
18
|
Ibrahim A, Mohamed HK, Maher A, Zhang B. A Survey on Human Cancer Categorization Based on Deep Learning. Front Artif Intell 2022; 5:884749. [PMID: 35832207 PMCID: PMC9271903 DOI: 10.3389/frai.2022.884749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2022] [Accepted: 05/09/2022] [Indexed: 11/13/2022] Open
Abstract
In recent years, we have witnessed the fast growth of deep learning, which involves deep neural networks, and the development of the computing capability of computer devices following the advance of graphics processing units (GPUs). Deep learning can successfully categorize histopathological images, which is an image classification task. Various research teams apply deep learning to medical diagnosis, especially of cancer. Convolutional neural networks (CNNs) detect the conventional visual features used in diagnosing diseases such as lung, skin, brain, prostate, and breast cancer, and are well suited to analyzing medical images. This study assesses the main deep learning concepts relevant to medical image analysis and surveys several contributions in the field. In addition, it covers the main categories of imaging procedures in medicine. The survey covers the use of deep learning for object detection, classification, and human cancer categorization. In addition, the most common cancer types are introduced. This article discusses the Vision-Based Deep Learning System among the different sorts of data mining techniques and networks. It then introduces the most extensively used DL network category, convolutional neural networks (CNNs), and investigates how CNN architectures have evolved, starting with AlexNet and progressing with the Google and VGG networks. Finally, the revealed challenges and trends for upcoming research are discussed.
Affiliation(s)
- Ahmad Ibrahim
- Department of Computer Science, October 6 University, Cairo, Egypt
| | - Hoda K. Mohamed
- Department of Computer Engineering, Ain Shams University, Cairo, Egypt
| | - Ali Maher
- Department of Computer Science, October 6 University, Cairo, Egypt
| | - Baochang Zhang
- School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
|