1
Caruana A, Bandara M, Musial K, Catchpoole D, Kennedy PJ. Machine learning for administrative health records: A systematic review of techniques and applications. Artif Intell Med 2023; 144:102642. [PMID: 37783537 DOI: 10.1016/j.artmed.2023.102642] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/26/2022] [Revised: 08/21/2023] [Accepted: 08/25/2023] [Indexed: 10/04/2023]
Abstract
Machine learning provides many powerful and effective techniques for analysing heterogeneous electronic health records (EHR). Administrative Health Records (AHR) are a subset of EHR collected for administrative purposes, and the use of machine learning on AHRs is a growing subfield of EHR analytics. Existing reviews of EHR analytics emphasise that the data modality of the EHR limits the breadth of suitable machine learning techniques and of pursuable healthcare applications. Despite emphasising the importance of data modality, the literature fails to analyse which techniques and applications are relevant to AHRs. AHRs contain uniquely well-structured, categorically encoded records that are distinct from the other data modalities captured by EHRs, and they can provide valuable information about how patients interact with the healthcare system. This paper systematically reviews AHR-based research, analysing 70 relevant studies drawn from multiple databases. We identify and analyse which machine learning techniques are applied to AHRs and which health informatics applications are pursued in AHR-based research. We also analyse how these techniques are applied in pursuit of each application, and identify the limitations of these approaches. We find that although AHR-based studies are disconnected from each other, the use of AHRs in health informatics research is substantial and accelerating. Our synthesis of these studies highlights the utility of AHRs for pursuing increasingly complex and diverse research objectives despite a number of pervasive data- and technique-based limitations. Finally, based on our findings, we propose a set of future research directions that can enhance the utility of AHR data and machine learning techniques for health informatics research.
Affiliation(s)
- Adrian Caruana
- Australian Artificial Intelligence Institute, Faculty of Engineering and IT, University of Technology Sydney, Australia.
- Madhushi Bandara
- Australian Artificial Intelligence Institute, Faculty of Engineering and IT, University of Technology Sydney, Australia
- Katarzyna Musial
- Complex Adaptive Systems Lab, Data Science Institute, Faculty of Engineering and IT, University of Technology Sydney, Australia
- Daniel Catchpoole
- Australian Artificial Intelligence Institute, Faculty of Engineering and IT, University of Technology Sydney, Australia; Biospecimen Research Services, The Children's Cancer Research Unit, The Children's Hospital at Westmead, Australia
- Paul J Kennedy
- Australian Artificial Intelligence Institute, Faculty of Engineering and IT, University of Technology Sydney, Australia; Joint Research Centre in AI for Health and Wellness, University of Technology Sydney, Australia, and Ontario Tech University, Canada
2
Shakir H, Aijaz B, Khan TMR, Hussain M. A deep learning-based cancer survival time classifier for small datasets. Comput Biol Med 2023; 160:106896. [PMID: 37150085 DOI: 10.1016/j.compbiomed.2023.106896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Revised: 03/07/2023] [Accepted: 04/09/2023] [Indexed: 05/09/2023]
Abstract
Cancer survival time prediction using Deep Learning (DL) is an emerging area of research. However, the lack of large annotated medical imaging databases limits the training performance of DL models, which makes their use in many clinical applications debatable. In this work, a neural network model is customized for a small sample space to avoid over-fitting during DL training. A set of prognostic radiomic features is selected through an iterative process that averages over multiple dropouts; this yields back-propagated gradients with low variance, which improves the network's learning capability, the reliability of feature selection, and training on a small database. The proposed classifier is further compared with an erasing feature selection method from the literature for improved network training, and with other well-known classifiers on small sample sizes. The statistically validated results show efficient and improved classification of cancer survival time into three intervals: up to 6 months, 6 months to 2 years, and above 2 years. The classifier has the potential to aid health care professionals in lung tumor evaluation for timely treatment and patient care.
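The abstract attributes the low-variance gradients to averaging over multiple dropouts. The sketch below illustrates that idea for a single logistic unit, assuming it resembles multi-sample dropout (averaging the loss gradient over several independent dropout masks). The model, the function names, and the toy inputs are hypothetical; this is not the authors' network or their feature-selection loop.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def dropout_gradient(w: Sequence[float], x: Sequence[float], y: float,
                     p: float, rng: random.Random) -> List[float]:
    """Logistic-loss gradient for one sample under a single inverted-dropout mask on the inputs."""
    keep = 1.0 - p
    # Each input is dropped with probability p; surviving inputs are rescaled by 1/keep.
    mask = [1.0 / keep if rng.random() >= p else 0.0 for _ in x]
    xd = [xi * mi for xi, mi in zip(x, mask)]
    err = sigmoid(sum(wi * xi for wi, xi in zip(w, xd))) - y
    return [err * xi for xi in xd]


def averaged_dropout_gradient(w: Sequence[float], x: Sequence[float], y: float,
                              p: float, n_masks: int, rng: random.Random) -> List[float]:
    """Average the gradient over n_masks independent dropout masks (hypothetical multi-sample dropout)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("dropout probability must satisfy 0 <= p < 1")
    grads = [dropout_gradient(w, x, y, p, rng) for _ in range(n_masks)]
    return [sum(g[j] for g in grads) / n_masks for j in range(len(w))]


def empirical_variance(values: Sequence[float]) -> float:
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


if __name__ == "__main__":
    rng = random.Random(0)
    w, x, y = [0.2, -0.1, 0.05], [1.0, 2.0, -0.5], 1.0  # toy, hypothetical values
    for k in (1, 4, 16):
        first = [averaged_dropout_gradient(w, x, y, 0.5, k, rng)[0] for _ in range(500)]
        print(f"masks={k:2d}  variance of dw[0] = {empirical_variance(first):.4f}")
```

Running the demo should show the variance of the first gradient component shrinking as more masks are averaged, which is the property the abstract relies on for stable training on small datasets.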
Affiliation(s)
- Hina Shakir
- Department of Software Engineering, Bahria University, 13-National Stadium Road Karachi, 75620, Pakistan.
- Bushra Aijaz
- Department of Electrical Engineering, Bahria University, 13-National Stadium Road Karachi, 75620, Pakistan.
- Tariq Mairaj Rasool Khan
- Department of Electrical and Power Engineering, Pakistan Navy Engineering College, National University of Science and Technology, Karachi, Pakistan.
- Muhammad Hussain
- Department of Electrical Engineering, Bahria University, 13-National Stadium Road Karachi, 75620, Pakistan.
3
Li R, Zhang C, Du K, Dan H, Ding R, Cai Z, Duan L, Xie Z, Zheng G, Wu H, Ren G, Dou X, Feng F, Zheng J. Analysis of Prognostic Factors of Rectal Cancer and Construction of a Prognostic Prediction Model Based on Bayesian Network. Front Public Health 2022; 10:842970. [PMID: 35784233 PMCID: PMC9247333 DOI: 10.3389/fpubh.2022.842970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2021] [Accepted: 05/20/2022] [Indexed: 11/13/2022]
Abstract
Background: The existing prognostic models of rectal cancer after radical resection ignore the relationships among prognostic factors and their mutual effects on prognosis. A new modeling method is therefore required to remedy this defect. The present study aimed to construct a new prognostic prediction model based on the Bayesian network (BN), a machine learning tool for data mining, clinical decision-making, and prognostic prediction. Methods: From January 2015 to December 2017, the clinical data of 705 patients with rectal cancer who underwent radical resection were analyzed. The entire cohort was divided into training and testing datasets. A new prognostic prediction model based on BN was constructed and compared with a nomogram. Results: A univariate analysis showed that age, carcinoembryonic antigen (CEA), carbohydrate antigen 19-9 (CA19-9), carbohydrate antigen 125 (CA125), preoperative chemotherapy, macropathology type, tumor size, differentiation status, T stage, N stage, vascular invasion, KRAS mutation, and postoperative chemotherapy were associated with overall survival (OS) in the training dataset. Based on these variables, a 3-year OS prognostic prediction BN model was constructed on the training dataset using the Tree Augmented Naïve Bayes method. In addition, age, CEA, CA19-9, CA125, differentiation status, T stage, N stage, KRAS mutation, and postoperative chemotherapy were identified as independent prognostic factors in the training dataset through multivariate Cox regression and were used to construct a nomogram. The two models were then evaluated on the testing dataset using the receiver operating characteristic (ROC) curve. The area under the ROC curve (AUC) was 80.11% for the BN model and 74.23% for the nomogram. Conclusion: The present study established, for the first time, a BN model for prognostic prediction of rectal cancer, which was shown to be more accurate than a nomogram.
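The abstract names the Tree Augmented Naïve Bayes (TAN) method but does not describe how the network structure was learned. As a hedged sketch of the standard TAN structure-learning step, the code weights every feature pair by its conditional mutual information given the class, keeps a maximum-weight spanning tree oriented away from a chosen root, and leaves the class as an extra parent of every feature. It assumes all variables are already discretised; the variable names and toy data are hypothetical, and the study may well have used dedicated BN software instead.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, Sequence


def conditional_mutual_information(xi: Sequence[Hashable], xj: Sequence[Hashable],
                                   c: Sequence[Hashable]) -> float:
    """Empirical I(Xi; Xj | C) in nats for discrete variables."""
    n = len(c)
    n_ijc = Counter(zip(xi, xj, c))
    n_ic = Counter(zip(xi, c))
    n_jc = Counter(zip(xj, c))
    n_c = Counter(c)
    total = 0.0
    for (a, b, k), count in n_ijc.items():
        # p(a,b,k) * log[p(a,b|k) / (p(a|k) p(b|k))], expressed with raw counts
        total += (count / n) * math.log(count * n_c[k] / (n_ic[(a, k)] * n_jc[(b, k)]))
    return total


def tan_parents(features: Dict[str, Sequence[Hashable]], target: Sequence[Hashable],
                root: str) -> Dict[str, str]:
    """Feature-to-feature parents of a TAN model; the class is an additional parent of every feature."""
    if root not in features:
        raise KeyError(f"root {root!r} is not one of the features")
    names = list(features)
    weight = {}
    for a, b in combinations(names, 2):
        weight[(a, b)] = weight[(b, a)] = conditional_mutual_information(
            features[a], features[b], target)
    # Prim's algorithm for a maximum-weight spanning tree grown outward from the root,
    # so every added edge points from a node already in the tree to the new node.
    in_tree = {root}
    parent: Dict[str, str] = {}
    while len(in_tree) < len(names):
        best = None
        for u in names:  # iterate in a fixed order so ties are broken deterministically
            if u not in in_tree:
                continue
            for v in names:
                if v not in in_tree and (best is None or weight[(u, v)] > best[0]):
                    best = (weight[(u, v)], u, v)
        _, src, dst = best
        parent[dst] = src
        in_tree.add(dst)
    return parent


if __name__ == "__main__":
    # Hypothetical toy data: discretised variables and a binary 3-year OS label.
    os_3y = [0, 0, 0, 0, 1, 1, 1, 1]
    data = {
        "t_stage": [0, 1, 0, 1, 0, 1, 0, 1],
        "n_stage": [0, 1, 0, 1, 0, 1, 0, 1],
        "age_group": [0, 0, 1, 1, 0, 0, 1, 1],
    }
    print(tan_parents(data, os_3y, root="t_stage"))
    # {'n_stage': 't_stage', 'age_group': 't_stage'}
```

The conditional parameters (each feature given its tree parent and the class) would then be estimated from counts; that step is omitted here to keep the sketch short.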
Affiliation(s)
- Ruikai Li
- Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Chi Zhang
- Department of Industrial Engineering, School of Mechatronics, Northwestern Polytechnical University, Xi'an, China
- Kunli Du
- Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Hanjun Dan
- Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Ruxin Ding
- Department of Cell Biology and Genetics, Medical College of Yan'an University, Yan'an, China
- Zhiqiang Cai
- Department of Industrial Engineering, School of Mechatronics, Northwestern Polytechnical University, Xi'an, China
- Lili Duan
- Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Zhenyu Xie
- Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Gaozan Zheng
- Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Hongze Wu
- Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Guangming Ren
- Graduate Work Department, Xi'an Medical University, Xi'an, China
- Xinyu Dou
- Graduate Work Department, Xi'an Medical University, Xi'an, China
- Fan Feng
- Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Jianyong Zheng
- Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- *Correspondence: Jianyong Zheng
5
An Integrated Approach for Cancer Survival Prediction Using Data Mining Techniques. Comput Intell Neurosci 2022; 2021:6342226. [PMID: 34992648 PMCID: PMC8727098 DOI: 10.1155/2021/6342226] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/04/2021] [Accepted: 11/27/2021] [Indexed: 12/31/2022]
Abstract
Ovarian cancer is the third most common gynecologic cancer worldwide. Patients with advanced ovarian cancer have a high mortality rate. Survival estimation is essential for clinicians and patients to better understand and cope with future outcomes. The present study investigates different survival predictors available for cancer prognosis using data mining techniques. A dataset of 140 patients with advanced ovarian cancer, containing data from different profiles (clinical, treatment, and overall quality of life), was collected and used to predict patient survival. Attributes from each data profile were processed accordingly. Clinical data were preprocessed to handle missing values and outliers. Treatment data spanning varying time periods were processed with sequence mining techniques to identify the treatments given to the patients. Finally, different comorbidities were combined into a single factor by computing the Charlson Comorbidity Index for each patient. After appropriate preprocessing, the integrated dataset was classified using machine learning algorithms. The proposed integrated approach achieved the highest accuracy, 76.4%, using an ensemble technique with sequential pattern mining that included time intervals of 2 months between treatments. Thus, treatment sequences and, most importantly, quality-of-life attributes contribute significantly to the survival prediction of cancer patients.
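The abstract states that comorbidities were collapsed into one factor through the Charlson Comorbidity Index but gives no scoring details. A minimal sketch, assuming the commonly tabulated unadjusted Charlson weights, hypothetical condition labels (no ICD-code mapping), and the usual convention of counting only the more severe form when both forms of a condition are recorded; the age-adjusted variant is not shown.

```python
from typing import Dict, Iterable

# Commonly tabulated (unadjusted) Charlson weights; the condition keys are hypothetical labels.
CHARLSON_WEIGHTS: Dict[str, int] = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "peripheral_vascular_disease": 1,
    "cerebrovascular_disease": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "connective_tissue_disease": 1,
    "peptic_ulcer_disease": 1,
    "mild_liver_disease": 1,
    "diabetes_uncomplicated": 1,
    "hemiplegia": 2,
    "moderate_severe_renal_disease": 2,
    "diabetes_end_organ_damage": 2,
    "any_tumor": 2,
    "leukemia": 2,
    "lymphoma": 2,
    "moderate_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
    "aids": 6,
}

# Assumed hierarchy rule: the severe form replaces the milder form of the same condition.
SUPERSEDES: Dict[str, str] = {
    "moderate_severe_liver_disease": "mild_liver_disease",
    "diabetes_end_organ_damage": "diabetes_uncomplicated",
    "metastatic_solid_tumor": "any_tumor",
}


def charlson_index(conditions: Iterable[str]) -> int:
    """Sum the Charlson weights of one patient's recorded conditions."""
    present = set(conditions)
    unknown = sorted(c for c in present if c not in CHARLSON_WEIGHTS)
    if unknown:
        raise ValueError(f"unrecognised conditions: {unknown}")
    for severe, mild in SUPERSEDES.items():
        if severe in present:
            present.discard(mild)  # avoid double-counting the milder form
    return sum(CHARLSON_WEIGHTS[c] for c in present)


if __name__ == "__main__":
    patient = ["metastatic_solid_tumor", "any_tumor", "diabetes_uncomplicated"]
    print(charlson_index(patient))  # 7
```

The resulting integer can then stand in for the individual comorbidity flags as a single feature, which is the role the abstract describes.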
6
Bohannan ZS, Coffman F, Mitrofanova A. Random survival forest model identifies novel biomarkers of event-free survival in high-risk pediatric acute lymphoblastic leukemia. Comput Struct Biotechnol J 2022; 20:583-597. [PMID: 35116134 PMCID: PMC8777142 DOI: 10.1016/j.csbj.2022.01.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/17/2021] [Revised: 12/30/2021] [Accepted: 01/01/2022] [Indexed: 12/16/2022]
Abstract
High-risk pediatric B-ALL patients experience 5-year negative event rates of up to 25%. Although some biomarkers of relapse are used in the clinic, their ability to predict outcomes in high-risk patients is limited. Here, we propose a random survival forest (RSF) machine learning model that uses interpretable genomic inputs to predict relapse/death in high-risk pediatric B-ALL patients. We used whole exome sequencing profiles from 156 patients in the TARGET-ALL study (with samples collected at presentation), further divided into training and test cohorts (109 and 47 patients, respectively). To avoid overfitting and facilitate the interpretation of machine learning results, input genomic variables were engineered using a stepwise approach: univariable Cox models to select variables directly associated with outcomes, genomic coordinate-based analysis to select mutational hotspots, and correlation analysis to eliminate feature collinearity. Model training identified 7 genomic regions most predictive of relapse/death-free survival. The test cohort error rate was 12.47%, and a polygenic score based on the sum of the top 7 variables effectively stratified patients into two groups, with significant differences in time to relapse/death (log-rank P = 0.001, hazard ratio = 5.41). Our model outperformed other event-free survival (EFS) modeling approaches, including an RSF using gold-standard prognostic variables (error rate = 24.35%). Validation in 174 standard-risk patients and 3 patients who failed to respond to induction therapy confirmed that our RSF model and polygenic score were specific to high-risk disease. We propose that our feature selection/engineering approach can increase the clinical interpretability of RSF, and that our polygenic score could be used to enhance clinical decision-making in high-risk B-ALL.
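The abstract reports that summing the top 7 variables into a polygenic score split patients into two groups that differed in time to relapse/death (log-rank P = 0.001). As a hedged illustration of those two steps, the sketch below sums hypothetical 0/1 region indicators, splits patients at a hypothetical cut-off (the abstract does not state the rule), and runs a textbook two-group log-rank test. It does not reproduce the RSF model, the selected regions, or the reported statistics.

```python
import math
from typing import List, Sequence, Tuple


def polygenic_score(row: Sequence[int], selected: Sequence[int]) -> int:
    """Sum of 0/1 indicators over the selected genomic regions (hypothetical encoding)."""
    return sum(row[j] for j in selected)


def stratify(scores: Sequence[int], cutoff: int) -> List[int]:
    """Assign group 1 to scores at or above a hypothetical cut-off, else group 0."""
    return [1 if s >= cutoff else 0 for s in scores]


def logrank_test(times: Sequence[float], events: Sequence[int],
                 groups: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test; events: 1 = relapse/death observed, 0 = censored.

    Returns the chi-square statistic (1 degree of freedom) and its p-value.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk at time t, paired with their event indicator at exactly t.
        at_risk = [(g, e if ti == t else 0) for ti, e, g in zip(times, events, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _ in at_risk if g == 1)
        d = sum(di for _, di in at_risk)
        d1 = sum(di for g, di in at_risk if g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            # Hypergeometric variance of the group-1 event count at time t.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; the groups cannot be compared")
    stat = o_minus_e ** 2 / variance
    # Chi-square (1 df) upper tail equals P(|Z| > sqrt(stat)) for a standard normal Z.
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical toy cohort: 0/1 indicators for 4 regions, follow-up in months.
    genotypes = [[1, 1, 0, 1], [1, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 0], [1, 1, 1, 0], [0, 1, 0, 0]]
    months = [8, 12, 30, 40, 6, 36]
    relapse_or_death = [1, 1, 0, 0, 1, 1]
    scores = [polygenic_score(r, selected=[0, 1, 2, 3]) for r in genotypes]
    groups = stratify(scores, cutoff=2)
    stat, p = logrank_test(months, relapse_or_death, groups)
    print(scores, groups, f"chi2={stat:.3f}", f"p={p:.3f}")
```

Because the statistic squares the observed-minus-expected difference, swapping the group labels leaves it unchanged; a hazard ratio such as the reported 5.41 would require a separate Cox model and is not computed here.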
Affiliation(s)
- Zachary S. Bohannan
- Rutgers, The State University of New Jersey, School of Health Professions, Department of Health Informatics, 65 Bergen Street, Suite 120, Newark, NJ 07107-1709, United States
- Frederick Coffman
- Rutgers, The State University of New Jersey, School of Health Professions, Department of Health Informatics, 65 Bergen Street, Suite 120, Newark, NJ 07107-1709, United States
- Antonina Mitrofanova
- Rutgers, The State University of New Jersey, School of Health Professions, Department of Health Informatics, 65 Bergen Street, Suite 120, Newark, NJ 07107-1709, United States