1
Dwivedi K, Rajpal A, Rajpal S, Kumar V, Agarwal M, Kumar N. XL1R-Net: Explainable AI-driven improved L1-regularized deep neural architecture for NSCLC biomarker identification. Comput Biol Chem 2024; 108:107990. [PMID: 38000327 DOI: 10.1016/j.compbiolchem.2023.107990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 10/29/2023] [Accepted: 11/21/2023] [Indexed: 11/26/2023]
Abstract
BACKGROUND AND OBJECTIVE Non-small cell lung cancer (NSCLC) exhibits intrinsic molecular heterogeneity, primarily driven by the mutation of specific biomarkers. Identification of these biomarkers would assist not only in classifying NSCLC into its major subtypes, adenocarcinoma and squamous cell carcinoma, but also in developing targeted therapy. Medical practitioners use one or more types of omic data to identify these biomarkers, copy number variation (CNV) being one such type. CNV provides a measure of genomic instability, which is considered a hallmark of carcinoma. However, CNV data have not received much attention for biomarker identification. This paper aims to identify biomarkers for NSCLC using CNV data. METHODS An eXplainable AI (XAI)-driven L1-regularized deep learning architecture, XL1R-Net, is proposed that introduces a novel modification of the standard L1-regularized gradient descent algorithm to arrive at an improved deep neural classifier for NSCLC subtyping. Further, XAI-based feature identification has been used to leverage the trained classifier to uncover a set of twenty NSCLC-relevant biomarkers. RESULTS The identified biomarkers are evaluated based on their classification performance and clinical relevance. Using a Multilayer Perceptron (MLP)-based model, a classification accuracy of 84.95% using 10-fold cross-validation is achieved. Moreover, the statistical significance test on the classification performance also revealed the superiority of the MLP model over the competitive machine learning models. Further, the publicly available Drug-Gene Interaction Database reveals twelve of the identified biomarkers as potentially druggable. The K-M Plotter tool was used to verify eighteen of the identified biomarkers with a high probability of predicting NSCLC patients' likelihood of survival. While nine of the identified biomarkers confirm the recent literature, five find mention in the OncoKB Gene List.
CONCLUSION A set of seven novel biomarkers that have not been reported in the literature could be investigated for their potential contribution towards NSCLC therapy. Given NSCLC's genetic diversity, using only one omics data type may not adequately capture the tumor's complexity. Multiomics data and its integration with other sources will be examined in the future to better understand NSCLC heterogeneity.
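As an illustration of the standard L1-regularized gradient descent that the METHODS paragraph builds on, the following is a minimal sketch of the proximal (soft-thresholding) update on a toy least-squares problem. It is not the modified algorithm proposed for XL1R-Net, which the abstract does not specify; the function names, toy data, and hyperparameters are assumptions made for illustration only.

```python
def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def l1_proximal_gradient(X, y, lam=0.1, lr=0.1, steps=500):
    """Minimise (1/2n) * ||Xw - y||^2 + lam * ||w||_1 by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        # Residuals of the current linear fit.
        resid = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        # Gradient of the smooth least-squares term only.
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(d)]
        # Gradient step, then soft-thresholding to account for the L1 penalty.
        w = [soft_threshold(w[j] - lr * grad[j], lr * lam) for j in range(d)]
    return w


# Toy data (made up): the target depends on the first feature only.
X = [[1.0, 0.5], [2.0, -0.3], [3.0, 0.1], [4.0, -0.2]]
y = [2.0, 4.0, 6.0, 8.0]
weights = l1_proximal_gradient(X, y)
```

The soft-thresholding step is what drives uninformative weights to exactly zero, which is why L1 penalties are commonly used for feature (here, biomarker) selection.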
Affiliation(s)
- Kountay Dwivedi
- Department of Computer Science, University of Delhi, Delhi, India.
- Ankit Rajpal
- Department of Computer Science, University of Delhi, Delhi, India.
- Sheetal Rajpal
- Department of Computer Science, Dyal Singh College, Delhi, India.
- Virendra Kumar
- Department of Nuclear Magnetic Resonance, All India Institute of Medical Sciences, New Delhi, India.
- Manoj Agarwal
- Department of Computer Science, Hans Raj College, University of Delhi, Delhi, India.
- Naveen Kumar
- Department of Computer Science, University of Delhi, Delhi, India.
2
Pan J, Ma B, Hou X, Li C, Xiong T, Gong Y, Song F. The construction of transcriptional risk scores for breast cancer based on lightGBM and multiple omics data. MATHEMATICAL BIOSCIENCES AND ENGINEERING: MBE 2022; 19:12353-12370. [PMID: 36654001 DOI: 10.3934/mbe.2022576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
BACKGROUND The polygenic risk score (PRS) can evaluate the individual-level genetic risk of breast cancer. However, standalone single nucleotide polymorphism (SNP) data used for PRS may not provide satisfactory prediction accuracy. Additionally, current PRS models based on linear regression have insufficient power to leverage non-linear effects from thousands of associated SNPs. Here, we proposed a transcriptional risk score (TRS) based on multiple omics data to estimate the risk of breast cancer. METHODS The multiple omics data and clinical data of breast invasive carcinoma (BRCA) were collected from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). First, we developed a novel TRS model for BRCA utilizing single omic data and the LightGBM algorithm. Subsequently, we built a combination model of TRS derived from each omic data type to further improve the prediction accuracy. Finally, we performed association analysis and prognosis prediction to evaluate the utility of the TRS generated by our method. RESULTS The proposed TRS model achieved better predictive performance than the linear models and other ML methods on single-omic datasets. An independent validation dataset also verified the effectiveness of our model. Moreover, the combination of the TRS can efficiently strengthen prediction accuracy. The analysis of prevalence and the associations of the TRS with phenotypes including case-control and cancer stage indicated that the risk of breast cancer increases as the TRS increases. The survival analysis also suggested that TRS for the cancer stage is an effective prognostic metric for breast cancer patients. CONCLUSIONS Our proposed TRS model expanded the current definition of PRS from standalone SNP data to multiple omics data and outperformed the linear models, which may provide a powerful tool for diagnostic and prognostic prediction of breast cancer.
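For context on the linear baseline that the abstract contrasts with its LightGBM-based TRS, a conventional PRS is usually computed as a weighted sum of risk-allele counts. The sketch below shows only that generic additive score; the effect sizes and genotype values are made-up placeholders, and nothing here reproduces the authors' TRS model.

```python
def linear_risk_score(dosages, effect_sizes):
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP)."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size per SNP is required")
    return sum(g * b for g, b in zip(dosages, effect_sizes))


# Hypothetical per-SNP effect sizes (e.g. log odds ratios) and one person's genotype.
effect_sizes = [0.25, -0.5, 0.125]
score = linear_risk_score([2, 0, 1], effect_sizes)
```

Because every SNP contributes independently and additively, a score of this form cannot express interactions between variants, which is the limitation the abstract attributes to linear PRS models.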
Affiliation(s)
- Jianqiao Pan
- Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology, Tianjin, National Clinical Research Center of Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, China
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Baoshan Ma
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Xiaoyu Hou
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Chongyang Li
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Tong Xiong
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Yi Gong
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Fengju Song
- Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology, Tianjin, National Clinical Research Center of Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, China
3
Panja S, Rahem S, Chu CJ, Mitrofanova A. Big Data to Knowledge: Application of Machine Learning to Predictive Modeling of Therapeutic Response in Cancer. Curr Genomics 2021; 22:244-266. [PMID: 35273457 PMCID: PMC8822229 DOI: 10.2174/1389202921999201224110101] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/02/2020] [Revised: 09/16/2020] [Accepted: 09/30/2020] [Indexed: 11/22/2022] [Open Access]
Abstract
Background In recent years, the availability of high throughput technologies, establishment of large molecular patient data repositories, and advancement in computing power and storage have allowed elucidation of complex mechanisms implicated in therapeutic response in cancer patients. The breadth and depth of such data, alongside experimental noise and missing values, requires a sophisticated human-machine interaction that would allow effective learning from complex data and accurate forecasting of future outcomes, ideally embedded in the core of machine learning design. Objective In this review, we will discuss machine learning techniques utilized for modeling of treatment response in cancer, including Random Forests, support vector machines, neural networks, and linear and logistic regression. We will overview their mathematical foundations and discuss their limitations and alternative approaches in light of their application to therapeutic response modeling in cancer. Conclusion We hypothesize that the increase in the number of patient profiles and potential temporal monitoring of patient data will define even more complex techniques, such as deep learning and causal analysis, as central players in therapeutic response modeling.
Affiliation(s)
- Antonina Mitrofanova (corresponding author)
- Department of Health Informatics, Rutgers School of Health Professions, Rutgers Biomedical and Health Sciences, Newark, NJ 07107, USA
4
Prats L, Izquierdo JL. [Respiratory Disease in the Era of Big Data]. OPEN RESPIRATORY ARCHIVES 2020; 2:284-288. [PMID: 38620700 PMCID: PMC7481841 DOI: 10.1016/j.opresp.2020.07.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2020] [Accepted: 07/03/2020] [Indexed: 11/29/2022] [Open Access]
Abstract
One of the key elements of medicine in the second decade of the 21st century is the exponential growth of patient-produced information, due not only to the transition to the digitization of medical records, but also to the emergence of new sources of information and the capacity for analysis and interpretation of existing ones. The amount of medical information is expected to double every 2 years, which means that there will be 50 times more information available in 2020 than in 2011. In this setting, these large amounts of data or «big data» must be properly managed to implement new initiatives that improve the diagnosis, treatment, and prognosis of patients on the path to personalized medicine. The concept of personalization or precision medicine is of special interest in chronic respiratory disease. In recent years, research in entities such as asthma, COPD, cancer, or SAHS has focused on the identification of genomic, molecular, metabolic, and protein changes (biomarkers). Big data analysis tools can be used to move on from models based on the mean response to treatment, which are suboptimal for most patients, to focus on the individualized response. Part of this journey involves systems medicine, which also integrates clinical and population data to provide a multidimensional view of the disease and help identify causal associations that are usually only evident on big data analysis.
Affiliation(s)
- Lourdes Prats
- Departamento de Medicina y Especialidades, Universidad de Alcalá, Alcalá de Henares, España
- José Luis Izquierdo
- Departamento de Medicina y Especialidades, Universidad de Alcalá, Alcalá de Henares, España
- Neumología, Hospital Universitario de Guadalajara, Guadalajara, España
5
Gene selection of non-small cell lung cancer data for adjuvant chemotherapy decision using cell separation algorithm. APPL INTELL 2020. [DOI: 10.1007/s10489-020-01740-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
6
A feature-fusion framework of clinical, genomics, and histopathological data for METABRIC breast cancer subtype classification. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106238] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/24/2022]
7
Chen Q, Gao P, Song Y, Huang X, Xiao Q, Chen X, Lv X, Wang Z. Predicting the effect of 5-fluorouracil-based adjuvant chemotherapy on colorectal cancer recurrence: A model using gene expression profiles. Cancer Med 2020; 9:3043-3056. [PMID: 32150672 PMCID: PMC7196071 DOI: 10.1002/cam4.2952] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/22/2019] [Revised: 02/08/2020] [Accepted: 02/16/2020] [Indexed: 12/21/2022] [Open Access]
Abstract
It is critical to identify patients with stage II and III colorectal cancer (CRC) who will benefit from adjuvant chemotherapy (ACT) after curative surgery, but clinical factors alone are insufficient to predict this benefit. In this study, we used a genetic algorithm (GA) to select ACT candidate genes, and built a support vector machine (SVM) predictive model using gene expression profiles from the Gene Expression Omnibus database. The model contained four ACT candidate genes (EDEM1, MVD, SEMA5B, and WWP2) and TNM stage (stage II or III). After using the Subpopulation Treatment Effect Pattern Plot to determine the optimal cutoff value of predictive scores, the validation patients from The Cancer Genome Atlas database could be divided into predictive ACT-benefit and ACT-futile groups. Patients in the predictive ACT-benefit group with 5-fluorouracil (5-Fu)-based ACT had significantly longer relapse-free survival (RFS) compared to those without ACT (P = .015); however, the difference in RFS in the predictive ACT-futile group was insignificant (P = .596). The multivariable analysis found that the predictive groups were significantly associated with the effect of ACT (P for interaction = .011). Consequently, we developed a predictive model based on the SVM and GA algorithms, which was further validated to define patients who benefit from ACT in terms of recurrence.
Affiliation(s)
- Quan Chen
- Department of Surgical Oncology and General Surgery, Key Laboratory of Precision Diagnosis and Treatment of Gastrointestinal Tumors, Ministry of Education, The First Affiliated Hospital of China Medical University, Shenyang City, China
- Peng Gao
- Department of Surgical Oncology and General Surgery, Key Laboratory of Precision Diagnosis and Treatment of Gastrointestinal Tumors, Ministry of Education, The First Affiliated Hospital of China Medical University, Shenyang City, China
- Yongxi Song
- Department of Surgical Oncology and General Surgery, Key Laboratory of Precision Diagnosis and Treatment of Gastrointestinal Tumors, Ministry of Education, The First Affiliated Hospital of China Medical University, Shenyang City, China
- Xuanzhang Huang
- Department of Surgical Oncology and General Surgery, Key Laboratory of Precision Diagnosis and Treatment of Gastrointestinal Tumors, Ministry of Education, The First Affiliated Hospital of China Medical University, Shenyang City, China
- Qiong Xiao
- Department of Surgical Oncology and General Surgery, Key Laboratory of Precision Diagnosis and Treatment of Gastrointestinal Tumors, Ministry of Education, The First Affiliated Hospital of China Medical University, Shenyang City, China
- Xiaowan Chen
- Department of Surgical Oncology and General Surgery, Key Laboratory of Precision Diagnosis and Treatment of Gastrointestinal Tumors, Ministry of Education, The First Affiliated Hospital of China Medical University, Shenyang City, China
- Xinger Lv
- Department of Surgical Oncology and General Surgery, Key Laboratory of Precision Diagnosis and Treatment of Gastrointestinal Tumors, Ministry of Education, The First Affiliated Hospital of China Medical University, Shenyang City, China
- Zhenning Wang
- Department of Surgical Oncology and General Surgery, Key Laboratory of Precision Diagnosis and Treatment of Gastrointestinal Tumors, Ministry of Education, The First Affiliated Hospital of China Medical University, Shenyang City, China
8
Tong D, Tian Y, Zhou T, Ye Q, Li J, Ding K, Li J. Improving prediction performance of colon cancer prognosis based on the integration of clinical and multi-omics data. BMC Med Inform Decis Mak 2020; 20:22. [PMID: 32033604 PMCID: PMC7006213 DOI: 10.1186/s12911-020-1043-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/06/2019] [Accepted: 01/31/2020] [Indexed: 12/16/2022] [Open Access]
Abstract
BACKGROUND Colon cancer is common worldwide and is the leading cause of cancer-related death. Multiple levels of omics data are available due to the development of sequencing technologies. In this study, we proposed an integrative prognostic model for colon cancer based on the integration of clinical and multi-omics data. METHODS In total, 344 patients were included in this study. Clinical, gene expression, DNA methylation and miRNA expression data were retrieved from The Cancer Genome Atlas (TCGA). To accommodate the high dimensionality of omics data, unsupervised clustering was used as a dimension reduction method. The bias-corrected Harrell's concordance index was used to verify which clustering result provided the best prognostic performance. Finally, we proposed a prognostic prediction model based on the integration of clinical data and multi-omics data. Uno's concordance index with cross-validation was used to compare the discriminative performance of the prognostic model constructed with different covariates. RESULTS Combinations of clinical and multi-omics data can improve prognostic performance, as shown by the increase in the bias-corrected Harrell's concordance of the prognostic model from 0.7424 (clinical features only) to 0.7604 (clinical features and three types of omics features). Additionally, 2-year, 3-year and 5-year Uno's concordance statistics increased from 0.7329, 0.7043, and 0.7002 (clinical features only) to 0.7639, 0.7474 and 0.7597 (clinical features and three types of omics features), respectively. CONCLUSION In conclusion, this study successfully combined clinical and multi-omics data for better prediction of colon cancer prognosis.
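To make the concordance figures above easier to interpret, here is a minimal sketch of the plain (uncorrected) Harrell's concordance index for right-censored survival data. It is not the bias-corrected estimator nor Uno's statistic used in the study; the function name and the four-patient toy data are assumptions for illustration.

```python
def harrell_c_index(times, events, risk_scores):
    """Share of comparable patient pairs whose risk ordering matches their survival ordering."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # higher predicted risk failed earlier
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort (made up): follow-up time, event flag (1 = event, 0 = censored), predicted risk.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c = harrell_c_index(times, events, risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so an increase from 0.7424 to 0.7604 reflects a modest gain in the share of correctly ordered pairs.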
Affiliation(s)
- Danyang Tong
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, Zhejiang Province, China
- Yu Tian
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, Zhejiang Province, China
- Tianshu Zhou
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, Zhejiang Province, China
- Qiancheng Ye
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, Zhejiang Province, China
- Jun Li
- Department of Surgical Oncology, Second Affiliated Hospital, Zhejiang University School of Medicine, No. 88 Jiefang Road, Hangzhou, 31009, Zhejiang Province, China
- Kefeng Ding
- Department of Surgical Oncology, Second Affiliated Hospital, Zhejiang University School of Medicine, No. 88 Jiefang Road, Hangzhou, 31009, Zhejiang Province, China
- Jingsong Li
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, Zhejiang Province, China.
- Research Center for Healthcare Data Science, Zhejiang Lab, Hangzhou, China.
9
Zhu X, Chen N, Liu L, Pu Q. [An Overview of the Application of Artificial Neural Networks in Lung Cancer Research]. ZHONGGUO FEI AI ZA ZHI = CHINESE JOURNAL OF LUNG CANCER 2019; 22:245-249. [PMID: 31014444 PMCID: PMC6500498 DOI: 10.3779/j.issn.1009-3419.2019.04.08] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
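Lung cancer currently has the highest incidence and mortality of any tumor worldwide, and current diagnostic and therapeutic approaches have limited efficacy. The broad rollout of precision medicine offers a new opportunity to improve lung cancer diagnosis and treatment. However, clinicians find it difficult to effectively integrate and use the multidimensional, multi-perspective data that precision medicine requires (omics data, clinical test indicators, non-biological environmental background data, and so on), which makes it hard to choose the optimal diagnosis and treatment plan for each patient. With advances in computer technology, artificial intelligence, represented by artificial neural networks (ANNs), offers high fault tolerance, intelligence, and the capacity for self-learning. Its powerful information-integration capability can greatly aid the development and application of precision medicine and can play a major role in basic research and clinical practice in lung cancer. This article reviews the current state of ANN applications in the lung cancer field.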
Affiliation(s)
- Xingyu Zhu
- West China School of Medicine, Sichuan University, Chengdu 610041, China
- Nan Chen
- West China School of Medicine, Sichuan University, Chengdu 610041, China; Department of Thoracic Surgery, West China Hospital, Sichuan University, Chengdu 610041, China
- Lunxu Liu
- Department of Thoracic Surgery, West China Hospital, Sichuan University, Chengdu 610041, China
- Qiang Pu
- Department of Thoracic Surgery, West China Hospital, Sichuan University, Chengdu 610041, China
10
A survey of neural network-based cancer prediction models from microarray data. Artif Intell Med 2019; 97:204-214. [PMID: 30797633 DOI: 10.1016/j.artmed.2019.01.006] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 10/01/2017] [Revised: 10/22/2018] [Accepted: 01/27/2019] [Indexed: 12/17/2022]
Abstract
Neural networks are powerful tools used widely for building cancer prediction models from microarray data. We review the most recently proposed models to highlight the roles of neural networks in predicting cancer from gene expression data. We identified articles published between 2013 and 2018 in scientific databases using keywords such as cancer classification, cancer analysis, cancer prediction, cancer clustering and microarray data. Analyzing the studies reveals that neural network methods have been used either for filtering (data engineering) the gene expressions in a step prior to prediction; for predicting the existence of cancer, cancer type or the survivability risk; or for clustering unlabeled samples. This paper also discusses some practical issues that can be considered when building a neural network-based cancer prediction model. Results indicate that the functionality of the neural network determines its general architecture. However, the decision on the number of hidden layers, neurons, hyperparameters and learning algorithm is made using trial-and-error techniques.
11
Ghosh A, Yan H. Hydrogen bond analysis of the EGFR-ErbB3 heterodimer related to non-small cell lung cancer and drug resistance. J Theor Biol 2018; 464:63-71. [PMID: 30593826 DOI: 10.1016/j.jtbi.2018.12.035] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/28/2018] [Revised: 12/19/2018] [Accepted: 12/24/2018] [Indexed: 01/25/2023]
Abstract
Lung cancer is the predominant cause of cancer deaths on a worldwide scale. A mutation in the epidermal growth factor receptor (EGFR) can cause non-small cell lung cancer (NSCLC). The L858R point mutation in exon 21 of EGFR is the most prevalent in NSCLC. For over 60% of EGFR-mutated NSCLC, another mutation, T790M, can cause drug resistance. In this paper, we consider EGFR and ErbB3 heterodimers involving three structures of EGFR: wild-type, with the L858R mutation, and with the L858R and T790M mutations. We perform molecular dynamics (MD) simulations to analyze hydrogen bonds in all three instances. The hydrogen bonds contribute to the conformational stability of the protein and molecular recognition. Several other parameters are also investigated in the present study, which reveals significant changes in the dimer at different levels of mutation. The knowledge and results obtained from this study lead to useful insight into the mechanism of NSCLC drug resistance.
Affiliation(s)
- Avirup Ghosh
- Department of Electronics Engineering, City University of Hong Kong, Kowloon, Hong Kong.
- Hong Yan
- Department of Electronics Engineering, City University of Hong Kong, Kowloon, Hong Kong
12
Wang JH, Chen YH. Overlapping group screening for detection of gene-gene interactions: application to gene expression profiles with survival trait. BMC Bioinformatics 2018; 19:335. [PMID: 30241463 PMCID: PMC6150983 DOI: 10.1186/s12859-018-2372-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/29/2018] [Accepted: 09/12/2018] [Indexed: 01/29/2023] [Open Access]
Abstract
Background The development of a disease is a complex process that may result from joint effects of multiple genes. In this article, we propose the overlapping group screening (OGS) approach to determining active genes and gene-gene interactions incorporating prior pathway information. The OGS method is developed to overcome two challenges in genome-wide data analysis: the number of genes and gene-gene interactions is far greater than the sample size, and the pathways generally overlap with one another. The OGS method is further proposed for patients’ survival prediction based on gene expression data. Results Simulation studies demonstrate that the OGS approach performs well in identifying the true main and interaction effects, and that the survival prediction accuracy of OGS with the Lasso penalty is better than that of the ordinary Lasso method. In real data analysis, we identify several significant genes and/or epistasis interactions that are associated with clinical survival outcomes of diffuse large B-cell lymphoma (DLBCL) and non-small-cell lung cancer (NSCLC) by utilizing prior pathway information from the KEGG pathway and the GO biological process databases, respectively. Conclusions The OGS approach is useful for selecting important genes and epistasis interactions in the ultra-high dimensional feature space. The prediction ability of OGS with the Lasso penalty is better than existing methods. The OGS approach is generally applicable to various types of outcome data (quantitative, qualitative, censored event time data) and regression models (e.g. linear, logistic, and Cox’s regression models).
Affiliation(s)
- Jie-Huei Wang
- Institute of Statistical Science, Academia Sinica, Nankang, Taipei, Taiwan
- Yi-Hau Chen
- Institute of Statistical Science, Academia Sinica, Nankang, Taipei, Taiwan.
13
Rabbani M, Kanevsky J, Kafi K, Chandelier F, Giles FJ. Role of artificial intelligence in the care of patients with nonsmall cell lung cancer. Eur J Clin Invest 2018; 48. [PMID: 29405289 DOI: 10.1111/eci.12901] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 12/31/2017] [Accepted: 01/28/2018] [Indexed: 12/27/2022]
Abstract
BACKGROUND Lung cancer is the leading cause of cancer death worldwide. In up to 57% of patients, it is diagnosed at an advanced stage and the 5-year survival rate ranges between 10% and 16%. There has been a significant amount of research using machine learning to generate tools using patient data to improve outcomes. METHODS This narrative review is based on research material obtained from PubMed up to Nov 2017. The search terms include "artificial intelligence," "machine learning," "lung cancer," "Nonsmall Cell Lung Cancer (NSCLC)," "diagnosis" and "treatment." RESULTS Recent studies support the use of computer-aided systems and the use of radiomic features to help diagnose lung cancer earlier. Other studies have looked at machine learning (ML) methods that offer prognostic tools to doctors and help them in choosing personalized treatment options for their patients based on molecular, genetic and histological features. Combining artificial intelligence approaches into health care may serve as a beneficial tool for patients with NSCLC, and this review outlines these benefits and current shortcomings throughout the continuum of care. CONCLUSION We present a review of the various applications of ML methods in NSCLC as they relate to improving diagnosis, treatment and outcomes.
Affiliation(s)
- Mohamad Rabbani
- McGill University Health Centre, McGill University, Montreal, QC, Canada
- Jonathan Kanevsky
- McGill University Health Centre, McGill University, Montreal, QC, Canada
- Kamran Kafi
- McGill University Health Centre, McGill University, Montreal, QC, Canada
14
Excellent Diagnostic Characteristics for Ultrafast Gene Profiling of DEFA1-IL1B-LTF in Detection of Prosthetic Joint Infections. J Clin Microbiol 2017. [PMID: 28637910 PMCID: PMC5648706 DOI: 10.1128/jcm.00558-17] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/15/2023] [Open Access]
Abstract
The timely and exact diagnosis of prosthetic joint infection (PJI) is crucial for surgical decision-making. Intraoperatively, delivery of the result within an hour is required. Alpha-defensin lateral immunoassay of joint fluid (JF) is precise for the intraoperative exclusion of PJI; however, for patients with a limited amount of JF and/or in cases where the JF is bloody, this test is unhelpful. Important information is hidden in periprosthetic tissues that may much better reflect the current status of implant pathology. We therefore investigated the utility of the gene expression patterns of 12 candidate genes (TLR1, -2, -4, -6, and -10, DEFA1, LTF, IL1B, BPI, CRP, IFNG, and DEFB4A) previously associated with infection for detection of PJI in periprosthetic tissues of patients with total joint arthroplasty (TJA) (n = 76) reoperated for PJI (n = 38) or aseptic failure (n = 38), using the ultrafast quantitative reverse transcription-PCR (RT-PCR) Xxpress system (BJS Biotechnologies Ltd.). Advanced data-mining algorithms were applied for data analysis. For PJI, we detected elevated mRNA expression levels of DEFA1 (P < 0.0001), IL1B (P < 0.0001), LTF (P < 0.0001), TLR1 (P = 0.02), and BPI (P = 0.01) in comparison to those in tissues from aseptic cases. A feature selection algorithm revealed that the DEFA1-IL1B-LTF pattern was the most appropriate for detection/exclusion of PJI, achieving 94.5% sensitivity and 95.7% specificity, with likelihood ratios (LRs) for positive and negative results of 16.3 and 0.06, respectively. Taken together, the results show that DEFA1-IL1B-LTF gene expression detection by use of ultrafast qRT-PCR linked to an electronic calculator allows detection of patients with a high probability of PJI within 45 min after sampling. Further testing on a larger cohort of patients is needed.
15
Naftchali RE, Abadeh MS. A multi-layered incremental feature selection algorithm for adjuvant chemotherapy effectiveness/futileness assessment in non-small cell lung cancer. Biocybern Biomed Eng 2017. [DOI: 10.1016/j.bbe.2017.05.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 01/10/2023]
16
Pivovarov R, Perotte AJ, Grave E, Angiolillo J, Wiggins CH, Elhadad N. Learning probabilistic phenotypes from heterogeneous EHR data. J Biomed Inform 2015; 58:156-165. [PMID: 26464024 DOI: 10.1016/j.jbi.2015.10.001] [Citation(s) in RCA: 89] [Impact Index Per Article: 9.9] [Received: 05/14/2015] [Revised: 09/05/2015] [Accepted: 10/04/2015] [Indexed: 12/25/2022]
Abstract
We present the Unsupervised Phenome Model (UPhenome), a probabilistic graphical model for large-scale discovery of computational models of disease, or phenotypes. We tackle this challenge through the joint modeling of a large set of diseases and a large set of clinical observations. The observations are drawn directly from heterogeneous patient record data (notes, laboratory tests, medications, and diagnosis codes), and the diseases are modeled in an unsupervised fashion. We apply UPhenome to two qualitatively different mixtures of patients and diseases: records of extremely sick patients in the intensive care unit with constant monitoring, and records of outpatients regularly followed by care providers over multiple years. We demonstrate that the UPhenome model can learn from these different care settings, without any additional adaptation. Our experiments show that (i) the learned phenotypes combine the heterogeneous data types more coherently than baseline LDA-based phenotypes; (ii) they each represent single diseases rather than a mix of diseases more often than the baseline ones; and (iii) when applied to unseen patient records, they are correlated with the patients' ground-truth disorders. Code for training, inference, and quantitative evaluation is made available to the research community.
Affiliation(s)
- Rimma Pivovarov
- Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, New York, NY, USA.
- Adler J Perotte
- Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, New York, NY, USA.
- Edouard Grave
- Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, New York, NY, USA.
- John Angiolillo
- College of Physicians and Surgeons, Columbia University, New York, NY, USA.
- Chris H Wiggins
- Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY, USA.
- Noémie Elhadad
- Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, New York, NY, USA.