1
Wang C, Yang X, Sun M, Gu Y, Niu J, Zhang W. Multimodal fusion network for ICU patient outcome prediction. Neural Netw 2024; 180:106672. [PMID: 39236409 DOI: 10.1016/j.neunet.2024.106672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2024] [Revised: 06/20/2024] [Accepted: 08/27/2024] [Indexed: 09/07/2024]
Abstract
Over the past decades, massive Electronic Health Records (EHRs) have been accumulated in Intensive Care Units (ICUs) and many other healthcare settings. The rich and comprehensive information they record presents an exceptional opportunity for patient outcome prediction. However, because the data span diverse modalities, EHRs are heterogeneous, which makes it difficult to leverage information from the different modalities jointly. Capturing the underlying correlations among modalities is therefore an urgent need. In this paper, we propose a novel framework named Multimodal Fusion Network (MFNet) for ICU patient outcome prediction. First, we incorporate multiple modality-specific encoders to learn different modality representations. Notably, a graph-guided encoder is designed to capture underlying global relationships among medical codes, and a text encoder with a pre-fine-tuning strategy is adopted to extract appropriate text representations. Second, we propose to merge multimodal representations pairwise with a tailored hierarchical fusion mechanism. Experiments conducted on the eICU-CRD dataset show that MFNet achieves superior performance on mortality prediction and Length of Stay (LoS) prediction compared with various representative and state-of-the-art baselines. Moreover, a comprehensive ablation study demonstrates the effectiveness of each component of MFNet.
Affiliation(s)
- Chutong Wang
  - State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Xuebing Yang
  - State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Mengxuan Sun
  - State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Yifan Gu
  - State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Jinghao Niu
  - State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Wensheng Zhang
  - State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China; Guangzhou University, Guangzhou, 510006, China
2
Diaz FJ. Measuring the individualization potential of treatment individualization rules: Application to rules built with a new parametric interaction model for parallel-group clinical trials. Stat Methods Med Res 2024; 33:1355-1375. [PMID: 39105416 DOI: 10.1177/09622802241259172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/07/2024]
Abstract
For personalized medicine, we propose a general method of evaluating the potential performance of an individualized treatment rule in future clinical applications with new patients. We focus on rules that choose the most beneficial treatment for the patient out of two active (nonplacebo) treatments, which the clinician will prescribe regularly to the patient after the decision. We develop a measure of the individualization potential (IP) of a rule. The IP compares the expected effectiveness of the rule in a future clinical individualization setting versus the effectiveness of not trying individualization. We illustrate our evaluation method by explaining how to measure the IP of a useful type of individualized rules calculated through a new parametric interaction model of data from parallel-group clinical trials with continuous responses. Our interaction model implies a structural equation model we use to estimate the rule and its IP. We examine the IP both theoretically and with simulations when the estimated individualized rule is put into practice in new patients. Our individualization approach was superior to outcome-weighted machine learning according to simulations. We also show connections with crossover and N-of-1 trials. As a real data application, we estimate a rule for the individualization of treatments for diabetic macular edema and evaluate its IP.
Affiliation(s)
- Francisco J Diaz
- Department of Biostatistics & Data Science, The University of Kansas Medical Center, Kansas City, KS, USA
3
Boldingh JWHL, Arbous MS, Biemond BJ, Blijlevens NMA, van Bommel J, Hilkens MGEC, Kusadasi N, Muller MCA, de Vries VA, Steyerberg EW, van den Bergh WM. Development and Validation of a Prediction Model for 1-Year Mortality in Patients With a Hematologic Malignancy Admitted to the ICU. Crit Care Explor 2024; 6:e1093. [PMID: 38813435 PMCID: PMC11132307 DOI: 10.1097/cce.0000000000001093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2024] Open
Abstract
OBJECTIVES To develop and validate a prediction model for 1-year mortality in patients with a hematologic malignancy acutely admitted to the ICU. DESIGN A retrospective cohort study. SETTING Five university hospitals in the Netherlands between 2002 and 2015. PATIENTS A total of 1097 consecutive patients with a hematologic malignancy were acutely admitted to the ICU for at least 24 h. INTERVENTIONS None. MEASUREMENTS AND MAIN RESULTS We created a 13-variable model from 22 potential predictors. Key predictors included active disease, age, previous hematopoietic stem cell transplantation, mechanical ventilation, lowest platelet count, acute kidney injury, maximum heart rate, and type of malignancy. A bootstrap procedure reduced overfitting and improved the model's generalizability. This involved estimating the optimism in the initial model and shrinking the regression coefficients accordingly in the final model. We assessed performance using internal-external cross-validation by center and compared it with the Acute Physiology and Chronic Health Evaluation II model. Additionally, we evaluated clinical usefulness through decision curve analysis. The overall 1-year mortality rate observed in the study was 62% (95% CI, 59-65). Our 13-variable prediction model demonstrated acceptable calibration and discrimination at internal-external validation across centers (C-statistic 0.70; 95% CI, 0.63-0.77), outperforming the Acute Physiology and Chronic Health Evaluation II model (C-statistic 0.61; 95% CI, 0.57-0.65). Decision curve analysis indicated overall net benefit within a clinically relevant threshold probability range of 60-100% predicted 1-year mortality. CONCLUSIONS Our newly developed 13-variable prediction model predicts 1-year mortality in hematologic malignancy patients admitted to the ICU more accurately than the Acute Physiology and Chronic Health Evaluation II model. 
This model may aid in shared decision-making regarding the continuation of ICU care and end-of-life considerations.
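The decision curve analysis mentioned in this abstract compares the net benefit of acting on a model's predictions against simple default strategies across threshold probabilities. The following is a minimal Python sketch of the standard net-benefit formula, not the authors' code; the function names and the toy outcomes and predicted risks are illustrative assumptions.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on every patient whose predicted risk >= threshold.

    Standard decision-curve formula: TP/n - FP/n * pt / (1 - pt).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: classify every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (made up): 1 = died within one year, 0 = survived.
y_true = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1]
y_prob = [0.90, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.40, 0.30, 0.85]

for pt in (0.6, 0.7, 0.8):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt:.1f} model={model_nb:.3f} treat_all={all_nb:.3f}")
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero by definition.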
Affiliation(s)
- Jan-Willem H L Boldingh
  - Department of Critical Care, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
  - Department of Anaesthesiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- M Sesmu Arbous
  - Department of Critical Care, Leiden University Medical Center, Leiden, The Netherlands
- Bart J Biemond
  - Department of Hematology, Amsterdam University Medical Center (location AMC), University of Amsterdam, Amsterdam, The Netherlands
- Nicole M A Blijlevens
  - Department of Hematology, Radboud University Medical Center Nijmegen, Nijmegen, The Netherlands
- Jasper van Bommel
  - Department of Critical Care, Erasmus Medical Center, Rotterdam, The Netherlands
- Murielle G E C Hilkens
  - Department of Critical Care, Radboud University Medical Center Nijmegen, Nijmegen, The Netherlands
- Nuray Kusadasi
  - Department of Critical Care, Erasmus Medical Center, Rotterdam, The Netherlands
  - University Medical Center Utrecht, Utrecht, The Netherlands
- Marcella C A Muller
  - Department of Critical Care, Amsterdam University Medical Center (location AMC), University of Amsterdam, Amsterdam, The Netherlands
- Vera A de Vries
  - Department of Critical Care, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Ewout W Steyerberg
  - Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Walter M van den Bergh
  - Department of Critical Care, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
4
Nordin NI, Mustafa WA, Lola MS, Madi EN, Kamil AA, Nasution MD, K. Abdul Hamid AA, Zainuddin NH, Aruchunan E, Abdullah MT. Enhancing COVID-19 Classification Accuracy with a Hybrid SVM-LR Model. Bioengineering (Basel) 2023; 10:1318. [PMID: 38002441 PMCID: PMC10669812 DOI: 10.3390/bioengineering10111318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 10/03/2023] [Accepted: 10/09/2023] [Indexed: 11/26/2023] Open
Abstract
Support vector machine (SVM) is a newer machine learning algorithm for classification, while logistic regression (LR) is an older statistical classification method. Despite the numerous studies contrasting SVM and LR, new improvements such as bagging and ensemble have been applied to them since these comparisons were made. This study proposes a new hybrid model based on SVM and LR for prediction with small events per variable (EPV). The performance of the hybrid, SVM, and LR models with different EPV values was evaluated using COVID-19 data from December 2019 to May 2020 provided by the WHO. The study found that the hybrid model had better classification performance than SVM and LR in terms of accuracy, mean squared error (MSE), and root mean squared error (RMSE) for different EPV values. This hybrid model is particularly important for medical authorities and practitioners working in the face of future pandemics.
Affiliation(s)
- Noor Ilanie Nordin
  - Faculty of Ocean Engineering Technology and Informatics, Universiti Malaysia Terengganu, Kuala Nerus 21030, Terengganu, Malaysia
  - Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA Kelantan, Bukit Ilmu, Machang 18500, Kelantan, Malaysia
- Wan Azani Mustafa
  - Faculty of Electrical Engineering & Technology, Pauh Putra Campus, Universiti Malaysia Perlis (UniMAP), Arau 02600, Perlis, Malaysia
  - Centre of Excellence for Advanced Computing, Pauh Putra Campus, Universiti Malaysia Perlis (UniMAP), Arau 02600, Perlis, Malaysia
- Muhamad Safiih Lola
  - Faculty of Ocean Engineering Technology and Informatics, Universiti Malaysia Terengganu, Kuala Nerus 21030, Terengganu, Malaysia
  - Special Interest Group on Modeling and Data Analytics (SIGMDA), Universiti Malaysia Terengganu, Kuala Nerus 21030, Terengganu, Malaysia
- Elissa Nadia Madi
  - Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin (UniSZA), Besut Campus, Besut 22200, Terengganu, Malaysia
- Anton Abdulbasah Kamil
  - Faculty of Economics, Administrative and Social Sciences, Istanbul Gelisim University, Cihangir Mah. Şehit Jandarma Komando Er Hakan Öner Sk. No:1 Avcılar, İstanbul 34310, Turkey
- Marah Doly Nasution
  - Faculty of Teacher and Education, University Muhammadiyah Sumatera Utara, Jl. Kapten Muchtar Basri No.3, Glugur Darat II, Kec. Medan Tim., Kota Medan 20238, Sumatera Utara, Indonesia
- Abdul Aziz K. Abdul Hamid
  - Faculty of Ocean Engineering Technology and Informatics, Universiti Malaysia Terengganu, Kuala Nerus 21030, Terengganu, Malaysia
  - Special Interest Group on Applied Informatics and Intelligent Applications (AINIA), Universiti Malaysia Terengganu, Kuala Nerus 21030, Terengganu, Malaysia
- Nurul Hila Zainuddin
  - Mathematics Department, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, Tanjong Malim 53900, Perak Darul Ridzuan, Malaysia
- Elayaraja Aruchunan
  - Department of Decision Science, Faculty of Business and Economics, University Malaya, Kuala Lumpur 50603, Malaysia
- Mohd Tajuddin Abdullah
  - Fellow Academy of Sciences Malaysia, Level 20, West Wing Tingkat 20, Menara MATRADE, Jalan Sultan Haji Ahmad Shah, Kuala Lumpur 50480, Malaysia
5
Liu Y, Yin P, Cui J, Sun C, Chen L, Hong N. Postoperative Relapse Prediction in Patients With Ewing Sarcoma Using Computed Tomography-Based Radiomics Models Covering Tumor Per Se and Peritumoral Signatures. J Comput Assist Tomogr 2023; 47:766-773. [PMID: 37707407 PMCID: PMC10510843 DOI: 10.1097/rct.0000000000001475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Accepted: 01/27/2023] [Indexed: 06/03/2023]
Abstract
OBJECTIVE We aimed to develop and validate a computed tomography (CT)-based radiomics model for early relapse prediction in patients with Ewing sarcoma (ES). METHODS We recruited 104 patients in this study. Tumor areas and areas with a tumor expansion of 3 mm were used as regions of interest for radiomics analysis. Six different models were constructed: Pre-CT, CT enhancement (CTE), Pre-CT+3 mm, CTE+3 mm, Pre-CT and CTE combined (ComB), and Pre-CT+3 mm and CTE+3 mm combined (ComB+3 mm). All 3 classifiers used a grid search with 5-fold cross-validation to identify their optimal parameters, followed by repeated 5-fold cross-validation to evaluate the model performance based on these parameters. The average performance of the 5-fold cross-validation and the best one-fold performance of each model were evaluated. The AUC (area under the receiver operating characteristic curve) and accuracy were calculated to evaluate the models. RESULTS The 6 radiomics models performed well in predicting relapse in patients with ES using the 3 classifiers; the ComB and ComB+3 mm models performed better than the other models (AUC-best: 0.820-0.922/0.823-0.833 and 0.799-0.873/0.759-0.880 in the training and validation cohorts, respectively). Although the Pre-CT+3 mm, CTE+3 mm, and ComB+3 mm models, which cover both the tumor per se and peritumoral CT features, forecasted ES relapse preoperatively, adding the peritumoral region did not significantly improve model performance. CONCLUSIONS The radiomics model performed well for early recurrence prediction in patients with ES, and the ComB and ComB+3 mm models may be superior to the other models.
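The evaluation described in this abstract reports both the average AUC over 5 folds and the best single-fold AUC. The sketch below illustrates only that evaluation step in plain Python, under stated assumptions: the labels and scores are invented, the scores are treated as precomputed (a real cross-validation would refit the classifier on each training split), and `auc` and `k_fold_indices` are hypothetical helper names, not functions from the study.

```python
from statistics import mean


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative; tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Toy labels (1 = relapse) and precomputed classifier scores, made up.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.6, 0.3, 0.5, 0.2, 0.1, 0.7,
          0.8, 0.6, 0.9, 0.3, 0.7, 0.4, 0.2, 0.6, 0.5, 0.1]

# Score each held-out fold; the training indices are unused in this sketch.
fold_aucs = [
    auc([y_true[j] for j in test], [scores[j] for j in test])
    for _, test in k_fold_indices(len(y_true), k=5)
]
print("mean AUC over folds:", round(mean(fold_aucs), 3))
print("best single-fold AUC:", round(max(fold_aucs), 3))
```

Reporting the best fold alongside the mean, as the abstract does, shows the spread between folds; the mean is the less optimistic of the two summaries.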
Affiliation(s)
- Ying Liu
  - Department of Radiology, Peking University People's Hospital
- Ping Yin
  - Department of Radiology, Peking University People's Hospital
- Jingjing Cui
  - United Imaging Intelligence (Beijing) Co, Ltd., Beijing, People's Republic of China
- Chao Sun
  - Department of Radiology, Peking University People's Hospital
- Lei Chen
  - Department of Radiology, Peking University People's Hospital
- Nan Hong
  - Department of Radiology, Peking University People's Hospital
6
Álvarez VE, Quiroga MP, Centrón D. Identification of a Specific Biomarker of Acinetobacter baumannii Global Clone 1 by Machine Learning and PCR Related to Metabolic Fitness of ESKAPE Pathogens. mSystems 2023:e0073422. [PMID: 37184409 DOI: 10.1128/msystems.00734-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2023] Open
Abstract
Since the emergence of high-risk clones worldwide, constant investigations have been undertaken to comprehend the molecular basis that led to their prevalent dissemination in nosocomial settings over time. So far, the complex and multifactorial genetic traits of this type of epidemic clones have allowed only the identification of biomarkers with low specificity. A machine learning algorithm was able to recognize unequivocally a biomarker for early and accurate detection of Acinetobacter baumannii global clone 1 (GC1), one of the most disseminated high-risk clones. A support vector machine model identified the U1 sequence with a length of 367 nucleotides that matched a fragment of the moaCB gene, which encodes the molybdenum cofactor biosynthesis C and B proteins. U1 differentiates specifically between A. baumannii GC1 and non-GC1 strains, becoming a suitable biomarker capable of being translated into clinical settings as a molecular typing method for early diagnosis based on PCR as shown here. Since the metabolic pathways of Mo enzymes have been recognized as putative therapeutic targets for ESKAPE (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species) pathogens, our findings highlight that machine learning can also be useful in knowledge gaps of high-risk clones and provides noteworthy support to the literature to identify relevant nosocomial biomarkers for other multidrug-resistant high-risk clones. IMPORTANCE A. baumannii GC1 is an important high-risk clone that rapidly develops extreme drug resistance in the nosocomial niche. Furthermore, several strains have been identified worldwide in environmental samples, exacerbating the risk of human interactions. Early diagnosis is mandatory to limit its dissemination and to outline appropriate antibiotic stewardship schedules. 
A region with a length of 367 bp (U1) within the moaCB gene that is not subjected to lateral genetic transfer or to antibiotic pressures was successfully found by a support vector machine model that predicts A. baumannii GC1 strains. At the same time, research on the group of Mo enzymes proposed this metabolic pathway related to the superbug's metabolism as a potential future drug target site for ESKAPE pathogens due to its central role in bacterial fitness during infection. These findings confirm that machine learning used for the identification of biomarkers of high-risk lineages can also serve to identify putative novel therapeutic target sites.
Affiliation(s)
- Verónica Elizabeth Álvarez
  - Laboratorio de Investigaciones en Mecanismos de Resistencia a Antibióticos (LIMRA), Instituto de Investigaciones en Microbiología y Parasitología Médica, Facultad de Medicina, Universidad de Buenos Aires-Consejo Nacional de Investigaciones Científicas y Tecnológicas (IMPaM, UBA-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- María Paula Quiroga
  - Laboratorio de Investigaciones en Mecanismos de Resistencia a Antibióticos (LIMRA), Instituto de Investigaciones en Microbiología y Parasitología Médica, Facultad de Medicina, Universidad de Buenos Aires-Consejo Nacional de Investigaciones Científicas y Tecnológicas (IMPaM, UBA-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
  - Nodo de Bioinformática. Instituto de Investigaciones en Microbiología y Parasitología Médica, Facultad de Medicina, Universidad de Buenos Aires-Consejo Nacional de Investigaciones Científicas y Técnicas (IMPaM, UBA-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- Daniela Centrón
  - Laboratorio de Investigaciones en Mecanismos de Resistencia a Antibióticos (LIMRA), Instituto de Investigaciones en Microbiología y Parasitología Médica, Facultad de Medicina, Universidad de Buenos Aires-Consejo Nacional de Investigaciones Científicas y Tecnológicas (IMPaM, UBA-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
7
Wu W, Wang Y, Tang J, Yu M, Yuan J, Zhang G. Developing and evaluating a machine-learning-based algorithm to predict the incidence and severity of ARDS with continuous non-invasive parameters from ordinary monitors and ventilators. Comput Methods Programs Biomed 2023; 230:107328. [PMID: 36640602 DOI: 10.1016/j.cmpb.2022.107328] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/07/2022] [Revised: 12/11/2022] [Accepted: 12/27/2022] [Indexed: 06/17/2023]
Abstract
OBJECTIVES Major observational studies report that the mortality rate of acute respiratory distress syndrome (ARDS) is close to 40%. Different treatment strategies are required for each patient, according to the degree of ARDS. Early prediction of ARDS is helpful to implement targeted drug therapy and mechanical ventilation strategies for patients with different degrees of potential ARDS. In this paper, a new machine learning model for dynamic prediction of ARDS incidence and severity is established and evaluated based on 28 parameters from ordinary monitors and ventilators. This new method is expected to meet the clinical practice requirements of user-friendliness and timeliness for wider application. METHODS A total of 4738 hospitalized patients who required ICU care from 159 hospitals are included in this study. The models are trained on standardized data from electronic medical records. There are 28 structured, continuous non-invasive parameters that are recorded every hour. Seven machine learning models using only continuous, non-invasive parameters are developed for dynamic prediction and compared with methods trained on the complete parameter set and with the traditional risk adjustment method (i.e., oxygenation saturation index method). RESULTS The optimal prediction performance (area under the curve, AUC) of the ARDS incidence and severity prediction models built using continuous non-invasive parameters reached 0.8691 and 0.7765, respectively. In terms of mild and severe ARDS prediction, the AUC values are both above 0.85. The model using only continuous non-invasive parameters has an AUC 0.0133 lower than that of the model employing the complete feature set, which includes continuous non-invasive parameters, demographic information, laboratory parameters and clinical natural language text.
CONCLUSIONS A machine learning method was developed in this study using only continuous non-invasive parameters for ARDS incidence and severity prediction. Because the continuous non-invasive parameters can be easily obtained from ordinary monitors and ventilators, the method presented in this study is friendly and convenient to use. It is expected to be applied in pre-hospital setting for early ARDS warning.
Affiliation(s)
- Wenzhu Wu
  - Chongqing Medical and Pharmaceutical College, Chongqing, China
- Yalin Wang
  - Department of Medical Engineering, Medical Supplies Center of PLA General Hospital, Beijing, China
- Junquan Tang
  - Chongqing Medical and Pharmaceutical College, Chongqing, China
- Ming Yu
  - Institute of Medical Support Technology, Tianjin, China
- Jing Yuan
  - Institute of Medical Support Technology, Tianjin, China
- Guang Zhang
  - Institute of Medical Support Technology, Tianjin, China
8
Machine Learning-Based Mortality Prediction Model for Critically Ill Cancer Patients Admitted to the Intensive Care Unit (CanICU). Cancers (Basel) 2023; 15:569. [PMID: 36765528 PMCID: PMC9913129 DOI: 10.3390/cancers15030569] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/09/2022] [Revised: 12/30/2022] [Accepted: 01/13/2023] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND Although cancer patients are increasingly admitted to the intensive care unit (ICU) for cancer- or treatment-related complications, improved mortality prediction remains a big challenge. This study describes a new ML-based mortality prediction model for critically ill cancer patients admitted to ICU. PATIENTS AND METHODS We developed CanICU, a machine learning-based 28-day mortality prediction model for adult cancer patients admitted to ICU from Medical Information Mart for Intensive Care (MIMIC) database in the USA (n = 766), Yonsei Cancer Center (YCC, n = 3571), and Samsung Medical Center in Korea (SMC, n = 2563) from 2 January 2008 to 31 December 2017. The accuracy of CanICU was measured using sensitivity, specificity, and area under the receiver operating curve (AUROC). RESULTS A total of 6900 patients were included, with a 28-day mortality of 10.2%/12.7%/36.6% and a 1-year mortality of 30.0%/36.6%/58.5% in the YCC, SMC, and MIMIC-III cohort. Nine clinical and laboratory factors were used to construct the classifier using a random forest machine-learning algorithm. CanICU had 96% sensitivity/73% specificity with the area under the receiver operating characteristic (AUROC) of 0.94 for 28-day, showing better performance than current prognostic models, including the Acute Physiology and Chronic Health Evaluation (APACHE) or Sequential Organ Failure Assessment (SOFA) score. Application of CanICU in two external data sets across the countries yielded 79-89% sensitivity, 58-59% specificity, and 0.75-0.78 AUROC for 28-day mortality. The CanICU score was also correlated with one-year mortality with 88-93% specificity. CONCLUSION CanICU offers improved performance for predicting mortality in critically ill cancer patients admitted to ICU. A user-friendly online implementation is available and should be valuable for better mortality risk stratification to allocate ICU care for cancer patients.
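The sensitivity and specificity figures quoted in this abstract depend on the cutoff applied to the predicted risk. As a minimal illustration in Python, assuming invented outcomes and risks and hypothetical function names (this is not the CanICU implementation), the two metrics can be computed from confusion-matrix counts as follows.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_prob, cutoff=0.5):
    """Dichotomize predicted risks at `cutoff`, then compute both metrics."""
    y_pred = [1 if p >= cutoff else 0 for p in y_prob]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (made up): 1 = died within 28 days, 0 = survived.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.2, 0.1, 0.4, 0.7]

# Lowering the cutoff raises sensitivity at the cost of specificity.
for cutoff in (0.25, 0.5, 0.75):
    sens, spec = sensitivity_specificity(y_true, y_prob, cutoff)
    print(f"cutoff={cutoff:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

A high-sensitivity, lower-specificity operating point such as the one reported above corresponds to a relatively low cutoff, which misses few deaths but flags more survivors.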
9
Xing M, Zhang Y, Yu H, Yang Z, Li X, Li Q, Zhao Y, Zhao Z, Luo Y. Predict DLBCL patients' recurrence within two years with Gaussian mixture model cluster oversampling and multi-kernel learning. Comput Methods Programs Biomed 2022; 226:107103. [PMID: 36088813 DOI: 10.1016/j.cmpb.2022.107103] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/14/2021] [Revised: 08/05/2022] [Accepted: 08/30/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND AND OBJECTIVE Diffuse large B-cell lymphoma (DLBCL) is a common form of non-Hodgkin's lymphoma in adults. Relapse mainly occurs within two years after diagnosis and has a poor prognosis. Relapse after two years is less frequent and has a better prognosis. In this work, we constructed a model to predict relapse within two years in DLBCL patients, aiming to provide a reference for clinicians implementing individualized treatment. METHOD We propose a secondary-level class imbalance method based on Gaussian mixture model (GMM) clustering resampling to balance the data. We then use a multi-kernel support vector machine (SVM) to characterize heterogeneous clinical data. Finally, we merge the two to identify patients who relapse within two years. RESULTS Among all the class imbalance methods in this work, Inverse Weighted-GMM+SMOTEENN has the best performance. Compared with NO-GMM (directly using SMOTEENN without the GMM clustering process), its Area Under the ROC Curve (AUC) increases by 8.75%, and its expected calibration error (ECE) and Brier score decrease by 2.07% and 3.09%, respectively. Among the four classification algorithms in this work, multiple kernel learning (MKL) has the lowest Brier score and ECE and the highest AUC, accuracy, recall, precision and F1, giving the best discrimination and calibration. CONCLUSION Our inverse weighted-GMM+SMOTEENN+MKL (GMM-SENN-MKL) method handles class imbalance and heterogeneous clinical data well and can be used to predict recurrence in DLBCL patients.
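The Brier score and expected calibration error (ECE) used in this abstract both measure how well predicted probabilities agree with observed outcomes. The Python sketch below shows one common formulation of each; it is an illustration under assumptions rather than the authors' code. In particular, the ECE variant shown compares the observed event rate with the mean predicted probability in equal-width bins, and the bin count and toy data are arbitrary choices.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted average gap between observed event rate and mean predicted
    probability, computed over equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        event_rate = sum(y for y, _ in members) / len(members)
        mean_prob = sum(p for _, p in members) / len(members)
        ece += len(members) / n * abs(event_rate - mean_prob)
    return ece


# Toy data (made up): 1 = relapse within two years, 0 = no relapse.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.85, 0.15, 0.70, 0.65, 0.30, 0.45, 0.90, 0.10]

print("Brier score:", round(brier_score(y_true, y_prob), 4))
print("ECE:", round(expected_calibration_error(y_true, y_prob, n_bins=5), 4))
```

Lower values are better for both metrics, which is why the abstract reports decreases in ECE and Brier score as improvements.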
Affiliation(s)
- Meng Xing
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
- Yanbo Zhang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
- Hongmei Yu
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
- Zhenhuan Yang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
- Xueling Li
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
- Qiong Li
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
- Yanlin Zhao
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
- Zhiqiang Zhao
  - Department of Hematology, Shanxi Cancer Hospital, Taiyuan, China
- Yanhong Luo
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
10
Yang F, Zhang J, Chen W, Lai Y, Wang Y, Zou Q. DeepMPM: a mortality risk prediction model using longitudinal EHR data. BMC Bioinformatics 2022; 23:423. [PMID: 36241976 PMCID: PMC9561325 DOI: 10.1186/s12859-022-04975-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2021] [Accepted: 09/28/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Accurate precision approaches have so far not been developed for modeling mortality risk in intensive care unit (ICU) patients. Conventional mortality risk prediction methods can hardly extract the information in longitudinal electronic health records (EHRs) effectively, since they simply aggregate the heterogeneous variables in EHRs, ignoring the complex relationships and interactions between variables and the time dependence in longitudinal records. Recently, deep learning approaches have been widely used in modeling longitudinal EHR data. However, most existing deep learning-based risk prediction approaches only use the information of a single disease, neglecting the interactions between multiple diseases and different conditions. RESULTS In this paper, we address this unmet need by leveraging disease and treatment information in EHRs to develop a mortality risk prediction model based on deep learning (DeepMPM). DeepMPM utilizes a two-level attention mechanism, i.e. visit-level and variable-level attention, to derive the representation of patient risk status from a patient's multiple longitudinal medical records. Benefiting from using EHRs of patients with multiple diseases and different conditions, DeepMPM can achieve state-of-the-art performance in mortality risk prediction. CONCLUSIONS Experimental results on the MIMIC-III database demonstrate that, with the disease and treatment information, DeepMPM can achieve good performance in terms of Area Under the ROC Curve (0.85). Moreover, DeepMPM can successfully model the complex interactions between diseases to achieve better representation learning of disease and treatment than other deep learning approaches, so as to improve the accuracy of mortality prediction. A case study also shows that DeepMPM offers the potential to provide users with insights into feature correlation in data as well as model behavior for each prediction.
Affiliation(s)
- Fan Yang
- Shenzhen Research Institute of Xiamen University, Shenzhen, China; Department of Automation, Xiamen University, Xiamen, China
- Jian Zhang
- Department of Automation, Xiamen University, Xiamen, China
- Wanyi Chen
- Department of Automation, Xiamen University, Xiamen, China
- Yongxuan Lai
- School of Informatics/Shenzhen Research Institute, Xiamen University, Xiamen/Shenzhen, China
- Ying Wang
- Department of Automation, Xiamen University, Xiamen, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
11
Mehrpour O, Saeedi F, Hoyte C, Goss F, Shirazi FM. Utility of support vector machine and decision tree to identify the prognosis of metformin poisoning in the United States: analysis of National Poisoning Data System. BMC Pharmacol Toxicol 2022; 23:49. [PMID: 35831909 PMCID: PMC9281002 DOI: 10.1186/s40360-022-00588-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/05/2022] [Accepted: 06/27/2022] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND With diabetes incidence growing globally and metformin still being the first-line treatment, metformin toxicity and overdose have been increasing, and with them the associated mortality rate. For the first time, we aimed to study the efficacy of machine learning algorithms in predicting the outcome of metformin poisoning using two well-known classification methods: support vector machine (SVM) and decision tree (DT). METHODS This study is a retrospective cohort study of National Poison Data System (NPDS) data, the largest data repository of poisoning cases in the United States. The SVM and DT algorithms were developed using training and test datasets. We also used precision-recall and ROC curves and the area under the curve (AUC) for model evaluation. RESULTS Our model showed that acidosis, hypoglycemia, electrolyte abnormality, hypotension, elevated anion gap, elevated creatinine, tachycardia, and renal failure are the most important determinants in terms of outcome prediction of metformin poisoning. The average negative predictive value for the decision tree and SVM models was 92.30 and 93.30, respectively. The AUC of the ROC curve of the decision tree for major, minor, and moderate outcomes was 0.92, 0.92, and 0.89, respectively. The corresponding AUCs of the SVM model for major, minor, and moderate outcomes were 0.98, 0.90, and 0.82, respectively. CONCLUSIONS In order to predict the prognosis of metformin poisoning, machine learning algorithms might help clinicians in the management and follow-up of metformin poisoning cases.
Affiliation(s)
- Omid Mehrpour
- Data Science Institute, Southern Methodist University, Dallas, TX, USA; Rocky Mountain Poison & Drug Safety, Denver Health and Hospital Authority, Denver, CO, USA
- Farhad Saeedi
- Medical Toxicology and Drug Abuse Research Center (MTDRC), Birjand University of Medical Sciences (BUMS), Birjand, Iran; Student Research Committee, Birjand University of Medical Sciences, Birjand, Iran
- Christopher Hoyte
- Student Research Committee, Birjand University of Medical Sciences, Birjand, Iran; University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Foster Goss
- University of Colorado Hospital, Aurora, CO, USA; Department of Emergency Medicine, University of Colorado Hospital, Aurora, CO, USA
- Farshad M Shirazi
- Arizona Poison & Drug Information Center, the University of Arizona, College of Pharmacy and University of Arizona, College of Medicine, Tucson, AZ, USA
12
Predicting Characteristics Associated with Breast Cancer Survival Using Multiple Machine Learning Approaches. Computational and Mathematical Methods in Medicine 2022; 2022:1249692. [PMID: 35509861 PMCID: PMC9060999 DOI: 10.1155/2022/1249692] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/28/2022] [Accepted: 03/29/2022] [Indexed: 11/23/2022]
Abstract
Breast cancer is one of the most commonly diagnosed female disorders globally. Numerous studies have been conducted to predict survival markers, although the majority of these analyses were conducted using simple statistical techniques. Instead, this research employed machine learning approaches to develop models for identifying and visualizing relevant prognostic indications of breast cancer survival rates. A comprehensive hospital-based breast cancer dataset was collected from the National Cancer Institute's SEER Program's November 2017 update, which offers population-based cancer statistics. The dataset included female patients diagnosed between 2006 and 2010 with infiltrating duct and lobular carcinoma breast cancer (SEER primary site recode NOS histology codes 8522/3). The dataset included nine predictor variables and one outcome variable, the patients' survival status (alive or dead). To identify important prognostic markers associated with breast cancer survival rates, prediction models were constructed using K-nearest neighbor (K-NN), decision tree (DT), gradient boosting (GB), random forest (RF), AdaBoost, logistic regression (LR), voting classifier, and support vector machine (SVM). All methods yielded close results in terms of model accuracy and calibration measures, with the lowest accuracy obtained from logistic regression (80.57 percent) and the highest from the random forest (94.64 percent). Notably, the multiple machine learning algorithms utilized in this research achieved high accuracy, suggesting that these approaches might be used as alternative prognostic tools in breast cancer survival studies, especially in the Asian area.
13
Wang P, Xu J, Wang C, Zhang G, Wang H. Method of non-invasive parameters for predicting the probability of early in-hospital death of patients in intensive care unit. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2021.103405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
14
Lu Z, Chen H, Jiao X, Zhou W, Han W, Li S, Liu C, Gong J, Li J, Zhang X, Wang X, Peng Z, Qi C, Wang Z, Li Y, Li J, Li Y, Brock M, Zhang H, Shen L. Prediction of immune checkpoint inhibition with immune oncology-related gene expression in gastrointestinal cancer using a machine learning classifier. J Immunother Cancer 2021; 8:jitc-2020-000631. [PMID: 32792359 PMCID: PMC7430448 DOI: 10.1136/jitc-2020-000631] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Accepted: 07/22/2020] [Indexed: 12/20/2022] Open
Abstract
Immune checkpoint inhibitors (ICIs) have revolutionized the therapeutic landscape of gastrointestinal cancer. However, biomarkers correlated with the efficacy of ICIs in gastrointestinal cancer are still lacking. In this study, we performed 395-plex immune oncology (IO)-related gene target sequencing in tumor samples from 96 patients with metastatic gastrointestinal cancer treated with ICIs, and a linear support vector machine learning strategy was applied to construct a predictive model. Results: All 96 patients were randomly assigned to the discovery (n=72) and validation (n=24) cohorts. A 24-gene RNA signature (termed the IO-score) was constructed from 395 immune-related gene expression profiles using a machine learning strategy to identify patients who might benefit from ICIs. The durable clinical benefit rate was higher in patients with a high IO-score than in patients with a low IO-score (discovery cohort: 92.0% vs 4.3%, p<0.001; validation cohort: 85.7% vs 17.6%, p=0.004). The IO-score may exhibit a higher predictive value in the discovery (area under the receiver operating characteristic curve (AUC)=0.97) and validation (AUC=0.74) cohorts compared with programmed death ligand 1 positivity (AUC=0.52), tumor mutational burden (AUC=0.69) and microsatellite instability status (AUC=0.59) in the combined cohort. Moreover, patients with a high IO-score also exhibited prolonged overall survival compared with patients with a low IO-score (discovery cohort: HR, 0.29; 95% CI 0.15 to 0.56; p=0.003; validation cohort: HR, 0.32; 95% CI 0.10 to 1.05; p=0.04). Taken together, our results indicated the potential of the IO-score as a biomarker for immunotherapy in patients with gastrointestinal cancers.
Affiliation(s)
- Zhihao Lu
- Department of Gastrointestinal Oncology, Key laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China
- Huan Chen
- Genecast Precision Medicine Technology Institute, Beijing, China
- Xi Jiao
- Department of Gastrointestinal Oncology, Key laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China
- Wei Zhou
- Genecast Precision Medicine Technology Institute, Beijing, China
- Wenbo Han
- Genecast Precision Medicine Technology Institute, Beijing, China
- Shuang Li
- Department of Gastrointestinal Oncology, Key laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China
- Chang Liu
- Department of Gastrointestinal Oncology, Key laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China
- Jifang Gong
- Department of Gastrointestinal Oncology, Key laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China
- Jian Li
- Department of Gastrointestinal Oncology, Key laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China
- Xiaotian Zhang
- Department of Gastrointestinal Oncology, Key laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China
- Xicheng Wang
- Department of Gastrointestinal Oncology, Key laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China
- Zhi Peng
- Department of Gastrointestinal Oncology, Key laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China
- Changsong Qi
- Department of Gastrointestinal Oncology, Key laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China
- Zhenghang Wang
- Department of Gastrointestinal Oncology, Key laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China
- Yanyan Li
- Department of Gastrointestinal Oncology, Key laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China
- Jie Li
- Department of Gastrointestinal Oncology, Key laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China
- Yan Li
- Department of Gastrointestinal Oncology, Key laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China
- Malcolm Brock
- Department of Surgery, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Henghui Zhang
- Institute of Infectious Diseases, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Lin Shen
- Department of Gastrointestinal Oncology, Key laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China
15
Adhikari S, Normand SL, Bloom J, Shahian D, Rose S. Revisiting performance metrics for prediction with rare outcomes. Stat Methods Med Res 2021; 30:2352-2366. [PMID: 34468239 DOI: 10.1177/09622802211038754] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/17/2022]
Abstract
Machine learning algorithms are increasingly used in the clinical literature, claiming advantages over logistic regression. However, they are generally designed to maximize the area under the receiver operating characteristic curve. While area under the receiver operating characteristic curve and other measures of accuracy are commonly reported for evaluating binary prediction problems, these metrics can be misleading. We aim to give clinical and machine learning researchers a realistic medical example of the dangers of relying on a single measure of discriminatory performance to evaluate binary prediction questions. Prediction of medical complications after surgery is a frequent but challenging task because many post-surgery outcomes are rare. We predicted post-surgery mortality among patients in a clinical registry who received at least one aortic valve replacement. Estimation incorporated multiple evaluation metrics and algorithms typically regarded as performing well with rare outcomes, as well as an ensemble and a new extension of the lasso for multiple unordered treatments. Results demonstrated high accuracy for all algorithms with moderate measures of cross-validated area under the receiver operating characteristic curve. False positive rates were <1%; however, true positive rates were <7%, even when paired with a 100% positive predictive value, and graphical representations of calibration were poor. Similar results were seen in simulations, with the addition of high area under the receiver operating characteristic curve (>90%) accompanying low true positive rates. Clinical studies should not primarily report only area under the receiver operating characteristic curve or accuracy.
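The pattern reported in this abstract (high accuracy and perfect positive predictive value alongside a very low true positive rate) follows directly from the confusion-matrix definitions when the outcome is rare. The sketch below uses a hypothetical cohort, not the study's registry data; the counts are invented for illustration.

```python
def confusion_metrics(tp, fp, tn, fn):
    # Standard confusion-matrix metrics for a binary classifier.
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "tpr": tp / (tp + fn) if tp + fn else float("nan"),  # sensitivity / recall
        "fpr": fp / (fp + tn) if fp + tn else float("nan"),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),  # precision
    }

# Hypothetical cohort: 10,000 patients with a 2% event rate (200 deaths).
# The classifier flags only 10 patients, and all 10 die.
m = confusion_metrics(tp=10, fp=0, tn=9800, fn=190)
print(m)
```

Here accuracy is 98.1% and the positive predictive value is 100%, yet only 5% of the deaths are detected, which is why the authors caution against reporting accuracy or a single discrimination measure alone.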
Affiliation(s)
- Samrachana Adhikari
- Department of Population Health, New York University School of Medicine, USA
- Jordan Bloom
- Department of Surgery, Massachusetts General Hospital, USA
- David Shahian
- Department of Surgery, Massachusetts General Hospital, USA
- Sherri Rose
- Center for Health Policy, Stanford University, USA
16
Liu Z, Maiti T, Bender AR. A Role for Prior Knowledge in Statistical Classification of the Transition from Mild Cognitive Impairment to Alzheimer's Disease. J Alzheimers Dis 2021; 83:1859-1875. [PMID: 34459391 DOI: 10.3233/jad-201398] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/14/2023]
Abstract
BACKGROUND The transition from mild cognitive impairment (MCI) to dementia is of great interest to clinical research on Alzheimer's disease and related dementias. This phenomenon also serves as a valuable data source for quantitative methodological researchers developing new approaches for classification. However, the growth of machine learning (ML) approaches for classification may falsely lead many clinical researchers to underestimate the value of logistic regression (LR), which often demonstrates classification accuracy equivalent or superior to other ML methods. Further, when faced with many potential features that could be used for classifying the transition, clinical researchers are often unaware of the relative value of different approaches for variable selection. OBJECTIVE The present study sought to compare different methods for statistical classification and for automated and theoretically guided feature selection techniques in the context of predicting conversion from MCI to dementia. METHODS We used data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) to evaluate different influences of automated feature preselection on LR and support vector machine (SVM) classification methods, in classifying conversion from MCI to dementia. RESULTS The present findings demonstrate how similar performance can be achieved using user-guided, clinically informed pre-selection versus algorithmic feature selection techniques. CONCLUSION These results show that although SVM and other ML techniques are capable of relatively accurate classification, similar or higher accuracy can often be achieved by LR, mitigating SVM's necessity or value for many clinical researchers.
Affiliation(s)
- Zihuan Liu
- Department of Statistics, Michigan State University, East Lansing, MI, USA
- Tapabrata Maiti
- Department of Statistics, Michigan State University, East Lansing, MI, USA
- Andrew R Bender
- Department of Epidemiology and Biostatistics, College of Human Medicine, Michigan State University, East Lansing, MI, USA
17
Wu WT, Li YJ, Feng AZ, Li L, Huang T, Xu AD, Lyu J. Data mining in clinical big data: the frequently used databases, steps, and methodological models. Mil Med Res 2021; 8:44. [PMID: 34380547 PMCID: PMC8356424 DOI: 10.1186/s40779-021-00338-z] [Citation(s) in RCA: 198] [Impact Index Per Article: 49.5] [Received: 01/24/2020] [Accepted: 08/03/2021] [Indexed: 02/07/2023] Open
Abstract
Many high-quality studies have emerged from public databases, such as Surveillance, Epidemiology, and End Results (SEER), National Health and Nutrition Examination Survey (NHANES), The Cancer Genome Atlas (TCGA), and Medical Information Mart for Intensive Care (MIMIC); however, these data are often characterized by a high degree of dimensional heterogeneity, timeliness, scarcity, irregularity, and other characteristics, with the result that their value is not fully utilized. Data-mining technology has been a frontier field in medical research, as it demonstrates excellent performance in evaluating patient risks and assisting clinical decision-making in building disease-prediction models. Therefore, data mining has unique advantages in clinical big-data research, especially in large-scale medical public databases. This article introduces the main medical public databases and describes the steps, tasks, and models of data mining in simple language. Additionally, we describe data-mining methods along with their practical applications. The goal of this work is to aid clinical researchers in gaining a clear and intuitive understanding of the application of data-mining technology to clinical big data in order to promote the production of research results that are beneficial to doctors and patients.
Affiliation(s)
- Wen-Tao Wu
- Department of Clinical Research, The First Affiliated Hospital of Jinan University, Tianhe District, 613 W. Huangpu Avenue, Guangzhou, 510632, Guangdong, China; School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, 710061, Shaanxi, China
- Yuan-Jie Li
- Department of Human Anatomy, Histology and Embryology, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an, 710061, Shaanxi, China
- Ao-Zi Feng
- Department of Clinical Research, The First Affiliated Hospital of Jinan University, Tianhe District, 613 W. Huangpu Avenue, Guangzhou, 510632, Guangdong, China
- Li Li
- Department of Clinical Research, The First Affiliated Hospital of Jinan University, Tianhe District, 613 W. Huangpu Avenue, Guangzhou, 510632, Guangdong, China
- Tao Huang
- Department of Clinical Research, The First Affiliated Hospital of Jinan University, Tianhe District, 613 W. Huangpu Avenue, Guangzhou, 510632, Guangdong, China
- An-Ding Xu
- Department of Neurology, The First Affiliated Hospital of Jinan University, Tianhe District, 613 W. Huangpu Avenue, Guangzhou, 510632, Guangdong, China
- Jun Lyu
- Department of Clinical Research, The First Affiliated Hospital of Jinan University, Tianhe District, 613 W. Huangpu Avenue, Guangzhou, 510632, Guangdong, China
18
Liu LJ, Ortiz-Soriano V, Neyra JA, Chen J. KGDAL: Knowledge Graph Guided Double Attention LSTM for Rolling Mortality Prediction for AKI-D Patients. ACM-BCB ... ... : THE ... ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE. ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE 2021; 2021:53. [PMID: 34541583 PMCID: PMC8445228 DOI: 10.1145/3459930.3469513] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 05/20/2023]
Abstract
With the rapid accumulation of electronic health record (EHR) data, deep learning (DL) models have exhibited promising performance on patient risk prediction. Recent advances have also demonstrated the effectiveness of knowledge graphs (KG) in providing valuable prior knowledge for further improving DL model performance. However, it is still unclear how KG can be utilized to encode high-order relations among clinical concepts and how DL models can make full use of the encoded concept relations to solve real-world healthcare problems and to interpret the outcomes. We propose a novel knowledge graph guided double attention LSTM model named KGDAL for rolling mortality prediction for critically ill patients with acute kidney injury requiring dialysis (AKI-D). KGDAL constructs a KG-based two-dimension attention in both time and feature spaces. In the experiment with two large healthcare datasets, we compared KGDAL with a variety of rolling mortality prediction models and conducted an ablation study to test the effectiveness, efficacy, and contribution of different attention mechanisms. The results showed that KGDAL clearly outperformed all the compared models. Also, KGDAL-derived patient risk trajectories may assist healthcare providers to make timely decisions and actions. The source code, sample data, and manual of KGDAL are available at https://github.com/lucasliu0928/KGDAL.
Affiliation(s)
- Lucas Jing Liu
- Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA
- Victor Ortiz-Soriano
- Division of Nephrology, Bone and Mineral Metabolism, University of Kentucky Medical Center, Lexington, Kentucky, USA
- Javier A Neyra
- Division of Nephrology, Bone and Mineral Metabolism, University of Kentucky Medical Center, Lexington, Kentucky, USA
- Jin Chen
- Department of Internal Medicine, Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA
19
A Machine Learning Classifier Improves Mortality Prediction Compared With Pediatric Logistic Organ Dysfunction-2 Score: Model Development and Validation. Crit Care Explor 2021; 3:e0426. [PMID: 34036277 PMCID: PMC8133049 DOI: 10.1097/cce.0000000000000426] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/11/2022] Open
Abstract
Objectives: To determine whether machine learning algorithms can better predict PICU mortality than the Pediatric Logistic Organ Dysfunction-2 score. Design: Retrospective study. Setting: Quaternary care medical-surgical PICU. Patients: All patients admitted to the PICU from 2013 to 2019. Interventions: None. Measurements and Main Results: We investigated the performance of various machine learning algorithms using the same variables used to calculate the Pediatric Logistic Organ Dysfunction-2 score to predict PICU mortality. We used 10,194 patient records from 2013 to 2017 for training and 4,043 patient records from 2018 to 2019 as a holdout validation cohort. Mortality rate was 3.0% in the training cohort and 3.4% in the validation cohort. The best performing algorithm was a random forest model (area under the receiver operating characteristic curve, 0.867 [95% CI, 0.863–0.895]; area under the precision-recall curve, 0.327 [95% CI, 0.246–0.414]; F1, 0.396 [95% CI, 0.321–0.468]), which significantly outperformed the Pediatric Logistic Organ Dysfunction-2 score (area under the receiver operating characteristic curve, 0.761 [95% CI, 0.713–0.810]; area under the precision-recall curve, 0.239 [95% CI, 0.165–0.316]; F1, 0.284 [95% CI, 0.209–0.360]), although this difference was reduced after retraining the Pediatric Logistic Organ Dysfunction-2 logistic regression model at the study institution. The random forest model also showed better calibration than the Pediatric Logistic Organ Dysfunction-2 score, and calibration of the random forest model remained superior to the retrained Pediatric Logistic Organ Dysfunction-2 model. Conclusions: A machine learning model achieved better performance than a logistic regression-based score for predicting ICU mortality. Better estimation of mortality risk can improve our ability to adjust for severity of illness in future studies, although external validation is required before this method can be widely deployed.
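Two of the metrics reported in this abstract, the area under the ROC curve and F1, can be computed from first principles. The sketch below is a minimal illustration on invented scores and labels, not the study's data or code. It uses the pairwise (Mann-Whitney) formulation of AUROC and an assumed decision threshold of 0.5 for F1.

```python
def auroc(labels, scores):
    # Probability that a randomly chosen positive case is scored higher
    # than a randomly chosen negative case (ties count as one half).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_score(labels, scores, threshold):
    # F1 = 2*TP / (2*TP + FP + FN), predicting positive when score >= threshold.
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical predicted risks for eight patients, two of whom died (label 1).
labels = [0, 0, 0, 0, 1, 0, 1, 0]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.8, 0.05]

print(auroc(labels, scores))          # 11 of 12 positive/negative pairs ranked correctly
print(f1_score(labels, scores, 0.5))  # one TP, one FP, one FN
```

In this toy example the AUROC is high while F1 is only 0.5, mirroring how a model with few positive cases can rank well yet still miss or mislabel events at a fixed threshold.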
20
Guo C, Liu M, Lu M. A Dynamic Ensemble Learning Algorithm based on K-means for ICU mortality prediction. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107166] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]
21
Yun K, Oh J, Hong TH, Kim EY. Prediction of Mortality in Surgical Intensive Care Unit Patients Using Machine Learning Algorithms. Front Med (Lausanne) 2021; 8:621861. [PMID: 33869245 PMCID: PMC8044535 DOI: 10.3389/fmed.2021.621861] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/27/2020] [Accepted: 03/12/2021] [Indexed: 12/03/2022] Open
Abstract
Objective: Predicting the prognosis of in-hospital patients is critical. However, it is challenging to accurately predict the survival or death of a given patient within a given period. We aimed to determine whether machine learning algorithms could predict in-hospital death of critically ill patients with considerable accuracy and to identify factors contributing to the predictive power. Materials and Methods: Using medical data of 1,384 patients admitted to the Surgical Intensive Care Unit (SICU) of our institution, we investigated whether machine learning algorithms could predict in-hospital death using demographic, laboratory, and other disease-related variables, and compared predictions using three different algorithmic methods. The outcome measurement was the incidence of unexpected postoperative mortality, which was defined as mortality without a pre-existing not-for-resuscitation order that occurred within 30 days of the surgery or within the same hospital stay as the surgery. Results: Machine learning algorithms trained with 43 variables successfully classified dead and live patients with very high accuracy. Most notably, the decision tree showed better classification results (Area Under the Receiver Operating Curve, AUC = 0.96) than the neural network classifier (AUC = 0.80). Further analysis provided the insight that serum albumin concentration, total prenatal nutritional intake, and peak dose of dopamine drug played an important role in predicting the mortality of SICU patients. Conclusion: Our results suggest that machine learning algorithms, especially the decision tree method, can provide information on structured and explainable decision flow and accurately predict hospital mortality in SICU hospitalized patients.
Affiliation(s)
- Kyongsik Yun
- Computation and Neural Systems, California Institute of Technology, Pasadena, CA, United States
- Jihoon Oh
- Department of Psychiatry, College of Medicine, Seoul St. Mary's Hospital, The Catholic University of Korea, Seoul, South Korea
- Tae Ho Hong
- Division of Hepato-Biliary and Pancreas Surgery, Department of Surgery, College of Medicine, Seoul St. Mary's Hospital, The Catholic University of Korea, Seoul, South Korea
- Eun Young Kim
- Division of Trauma and Surgical Critical Care, Department of Surgery, College of Medicine, Seoul St. Mary's Hospital, The Catholic University of Korea, Seoul, South Korea
22
Adeyinka DA, Muhajarine N. Time series prediction of under-five mortality rates for Nigeria: comparative analysis of artificial neural networks, Holt-Winters exponential smoothing and autoregressive integrated moving average models. BMC Med Res Methodol 2020; 20:292. [PMID: 33267817 PMCID: PMC7712624 DOI: 10.1186/s12874-020-01159-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/17/2020] [Accepted: 11/09/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND An accurate forecasting model for under-five mortality rate (U5MR) is essential for policy actions and planning. While studies have used traditional time series modeling techniques (e.g., autoregressive integrated moving average (ARIMA) and Holt-Winters exponential smoothing methods), their appropriateness for predicting noisy and non-linear data (such as childhood mortality) has been debated. The objective of this study was to model long-term U5MR with a group method of data handling (GMDH)-type artificial neural network (ANN), and compare the forecasts with the commonly used conventional statistical methods: ARIMA regression and Holt-Winters exponential smoothing models. METHODS The historical dataset of annual U5MR in Nigeria from 1964 to 2017 was obtained from the official website of the World Bank. The optimal model for each forecasting method was used for forecasting mortality rates to 2030 (the end of the Sustainable Development Goal era). The predictive performances of the three methods were evaluated based on root mean squared error (RMSE), root mean absolute error (RMAE) and the modified Nash-Sutcliffe efficiency (NSE) coefficient. Statistically significant differences in loss function between forecasts of the GMDH-type ANN model and each of the ARIMA and Holt-Winters models were assessed with the Diebold-Mariano (DM) test and Deming regression. RESULTS The modified NSE coefficient was slightly lower for the Holt-Winters method (96.7%), compared to GMDH-type ANN (99.8%) and ARIMA (99.6%). The RMSE of GMDH-type ANN (0.09) was lower than that of ARIMA (0.23) and Holt-Winters (2.87). Similarly, RMAE was lowest for GMDH-type ANN (0.25), compared with ARIMA (0.41) and Holt-Winters (1.20). From the DM test, the mean absolute error (MAE) was significantly lower for GMDH-type ANN, compared with ARIMA (difference = 0.11, p-value = 0.0003), and the Holt-Winters model (difference = 0.62, p-value < 0.001). Based on the intercepts from Deming regression, the predictions from GMDH-type ANN were more accurate (β0 = 0.004 ± standard error: 0.06; 95% confidence interval: -0.113 to 0.122). CONCLUSIONS The GMDH-type neural network performed better in predicting and forecasting under-five mortality rates for Nigeria, compared to the ARIMA and Holt-Winters models. Therefore, GMDH-type ANN might be more suitable for data with non-linear or unknown distribution, such as childhood mortality. GMDH-type ANN increases forecasting accuracy of childhood mortalities in order to inform policy actions in Nigeria.
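The forecast-error measures named in this abstract can be written out in a few lines. The sketch below is illustrative only: the observed and predicted values are invented, and `nse` implements the standard Nash-Sutcliffe efficiency (squared errors), which is an assumption here since the study reports a modified variant whose exact form the abstract does not give.

```python
import math

def rmse(observed, predicted):
    # Root mean squared error.
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    # Mean absolute error.
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def nse(observed, predicted):
    # Standard Nash-Sutcliffe efficiency: 1 minus the ratio of the model's
    # squared error to the squared error of always predicting the mean.
    mean_obs = sum(observed) / len(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den

# Hypothetical annual mortality rates (per 1,000 live births) and forecasts.
observed = [100.0, 90.0, 80.0, 70.0]
predicted = [98.0, 91.0, 80.0, 73.0]

print(rmse(observed, predicted), mae(observed, predicted), nse(observed, predicted))
```

An NSE of 1 means a perfect fit and 0 means the model is no better than the observed mean, so values such as 99.8% versus 96.7% in the abstract indicate small but measurable differences between forecasting methods.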
Affiliation(s)
- Daniel Adedayo Adeyinka
- Department of Community Health and Epidemiology, College of Medicine, University of Saskatchewan, Saskatoon, SK, S7N 5E5, Canada; Department of Public Health, Federal Ministry of Health, Abuja, Nigeria
| | - Nazeem Muhajarine
- Department of Community Health and Epidemiology, College of Medicine, University of Saskatchewan, Saskatoon, SK, S7N 5E5, Canada; Saskatchewan Population Health and Evaluation Research Unit, Saskatoon, Saskatchewan, Canada
| |
|
23
|
Calderón JM, Álvarez-Pitti J, Cuenca I, Ponce F, Redon P. Development of a Minimally Invasive Screening Tool to Identify Obese Pediatric Population at Risk of Obstructive Sleep Apnea/Hypopnea Syndrome. Bioengineering (Basel) 2020; 7:E131. [PMID: 33086521 PMCID: PMC7712243 DOI: 10.3390/bioengineering7040131] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/15/2020] [Revised: 10/14/2020] [Accepted: 10/17/2020] [Indexed: 01/20/2023] Open
Abstract
Obstructive sleep apnea syndrome is a reduction of the airflow during sleep which not only produces a reduction in sleep quality but also has major health consequences. The prevalence in the obese pediatric population can surpass 50%, and polysomnography is the current gold standard method for its diagnosis. Unfortunately, it is expensive, disturbing and time-consuming for experienced professionals. The objective is to develop a patient-friendly screening tool for the obese pediatric population to identify those children at higher risk of suffering from this syndrome. Three supervised learning classifier algorithms (i.e., logistic regression, support vector machine and AdaBoost) common in the field of machine learning were trained and tested on two very different datasets in which the raw oxygen saturation signal was recorded. The first dataset was the Childhood Adenotonsillectomy Trial (CHAT), consisting of 453 individuals with ages between 5 and 9 years old, one-third of whom were obese. Cross-validation was performed on the second dataset, from an obesity assessment consult at the Pediatric Department of the Hospital General Universitario of Valencia. A total of 27 patients between 5 and 17 years old were recruited; 42% were girls and 63% were obese. The performance of each algorithm was evaluated based on key performance indicators (e.g., area under the curve, accuracy, recall, specificity and positive predictive value). The logistic regression algorithm outperformed (accuracy = 0.79, specificity = 0.96, area under the curve = 0.9, recall = 0.62 and positive predictive value = 0.94) the support vector machine and the AdaBoost algorithm when trained with the CHAT dataset. Cross-validation tests, using the Hospital General de Valencia (HG) dataset, confirmed the higher performance of the logistic regression algorithm in comparison with the others.
In addition, only a minor loss of performance (accuracy = 0.75, specificity = 0.88, area under the curve = 0.85, recall = 0.62 and positive predictive value = 0.83) was observed despite the differences between the datasets. The proposed minimally invasive screening tool has shown promising performance when it comes to identifying children at risk of suffering obstructive sleep apnea syndrome. Moreover, it is ideal to be implemented in an outpatient consult in primary and secondary care.
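The key performance indicators listed in this abstract all derive from a confusion matrix. The sketch below shows the arithmetic; the function name and the counts are hypothetical, chosen only so that the ratios land near the figures reported for the CHAT-trained logistic regression, and they are not the study's actual confusion matrix.

```python
def screening_metrics(tp, fp, tn, fn):
    """Accuracy, recall, specificity and positive predictive value from counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),        # sensitivity: share of true cases found
        "specificity": tn / (tn + fp),   # share of healthy children correctly cleared
        "ppv": tp / (tp + fp),           # share of positive calls that are correct
    }

# Hypothetical counts for 50 affected and 50 unaffected children.
metrics = screening_metrics(tp=31, fp=2, tn=48, fn=19)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative counts, high specificity and positive predictive value coexist with moderate recall, the same profile the abstract describes.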
Affiliation(s)
- José Miguel Calderón
- Fundación Investigación Hospital Clínico (INCLIVA), Avda. Menedez Pelayo 4, 46010 Valencia, Spain; (J.M.C.); (I.C.)
| | - Julio Álvarez-Pitti
- Pediatric Department, Consorcio Hospital General Universitario de Valencia, Avda. Tres Cruces s/n, 46014 Valencia, Spain; (J.Á.-P.); (F.P.)
| | - Irene Cuenca
- Fundación Investigación Hospital Clínico (INCLIVA), Avda. Menedez Pelayo 4, 46010 Valencia, Spain; (J.M.C.); (I.C.)
| | - Francisco Ponce
- Pediatric Department, Consorcio Hospital General Universitario de Valencia, Avda. Tres Cruces s/n, 46014 Valencia, Spain; (J.Á.-P.); (F.P.)
- CIBEROBN, Health Institute Carlos III, Av. Monforte de Lemos, 3-5. Pavilion 11, 28029 Madrid, Spain
| | - Pau Redon
- Pediatric Department, Consorcio Hospital General Universitario de Valencia, Avda. Tres Cruces s/n, 46014 Valencia, Spain; (J.Á.-P.); (F.P.)
- CIBEROBN, Health Institute Carlos III, Av. Monforte de Lemos, 3-5. Pavilion 11, 28029 Madrid, Spain
| |
|
24
|
Suda EY, Watari R, Matias AB, Sacco ICN. Recognition of Foot-Ankle Movement Patterns in Long-Distance Runners With Different Experience Levels Using Support Vector Machines. Front Bioeng Biotechnol 2020; 8:576. [PMID: 32596226 PMCID: PMC7300177 DOI: 10.3389/fbioe.2020.00576] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/17/2020] [Accepted: 05/12/2020] [Indexed: 01/09/2023] Open
Abstract
Running practice could generate musculoskeletal adaptations that modify the body mechanics and generate different biomechanical patterns for individuals with distinct levels of experience. Therefore, the aim of this study was to investigate whether foot-ankle kinetic and kinematic patterns can be used to discriminate different levels of experience in running practice of recreational runners using a machine learning approach. Seventy-eight long-distance runners (40.7 ± 7.0 years) were classified into less experienced (n = 24), moderately experienced (n = 23), or experienced (n = 31) runners using a fuzzy classification system, based on training frequency, volume, competitions and practice time. Three-dimensional kinematics of the foot-ankle and ground reaction forces (GRF) were acquired while the subjects ran on an instrumented treadmill at a self-selected speed (9.5–10.5 km/h). The foot-ankle kinematic and kinetic time series underwent a principal component analysis for data reduction, and were combined with the discrete GRF variables to serve as inputs to a support vector machine (SVM), to determine whether the groups could be distinguished from one another in a one-vs.-all approach. The SVM models successfully classified all experience groups with significant cross-validated accuracy rates and strong to very strong Matthews correlation coefficients, based on features from the input data. Overall, foot mechanics differed according to running experience level. The main distinguishing kinematic factors for the less experienced group were a greater dorsiflexion of the first metatarsophalangeal joint and larger plantarflexion angles between the calcaneus and metatarsals, whereas the experienced runners displayed the opposite pattern for the same joints.
As for the moderately experienced runners, although they were successfully classified, they did not present a visually identifiable running pattern, and seem to be an intermediate group between the less and more experienced runners. The results of this study have the potential to assist the development of training programs targeting improvement in performance and rehabilitation protocols for preventing injuries.
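The Matthews correlation coefficient used to judge the one-vs.-all classifiers can be computed directly from true and predicted labels. The following is a minimal sketch; the helper name and the six labels are hypothetical and unrelated to the study's data.

```python
import math

def one_vs_rest_mcc(y_true, y_pred, positive):
    """Matthews correlation coefficient treating `positive` as class 1, all else as 0."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == positive and guess == positive:
            tp += 1
        elif truth != positive and guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator

# Hypothetical labels for six runners.
y_true = ["less", "less", "moderate", "moderate", "experienced", "experienced"]
y_pred = ["less", "moderate", "moderate", "moderate", "experienced", "experienced"]

for group in ("less", "moderate", "experienced"):
    print(group, round(one_vs_rest_mcc(y_true, y_pred, group), 3))
```

The coefficient ranges from -1 to 1 and, unlike accuracy, stays informative when the positive group is much smaller than the rest, which is the usual situation in a one-vs.-all split.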
Affiliation(s)
- Eneida Yuri Suda
- Physical Therapy, Speech and Occupational Therapy Department, School of Medicine, University of São Paulo, São Paulo, Brazil
| | - Ricky Watari
- Physical Therapy, Speech and Occupational Therapy Department, School of Medicine, University of São Paulo, São Paulo, Brazil
| | - Alessandra Bento Matias
- Physical Therapy, Speech and Occupational Therapy Department, School of Medicine, University of São Paulo, São Paulo, Brazil
| | - Isabel C N Sacco
- Physical Therapy, Speech and Occupational Therapy Department, School of Medicine, University of São Paulo, São Paulo, Brazil
| |
|
25
|
Ha CW, Kim SH, Lee DH, Kim H, Park YB. Predictive validity of radiographic signs of complete discoid lateral meniscus in children using machine learning techniques. J Orthop Res 2020; 38:1279-1288. [PMID: 31883134 DOI: 10.1002/jor.24578] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/12/2019] [Accepted: 12/04/2019] [Indexed: 02/04/2023]
Abstract
The diagnostic utility of radiographic signs of complete discoid lateral meniscus remains controversial. This study aimed to investigate the diagnostic accuracy and determine which sign most reliably detects the presence of a complete discoid lateral meniscus in children. A total of 141 knees (age 7-16) with complete discoid lateral meniscus and 141 age- and sex-matched knees with normal meniscus were included. The following radiographic signs were evaluated: lateral joint (LJ) space, fibular head (FH) height, lateral tibial spine (LTS) height, lateral tibial plateau (LTP) obliquity, lateral femoral condyle (LFC) squaring, LTP cupping, LFC notching, and prominence ratio of the femoral condyle. Prediction models were constructed using logistic regressions, decision trees, and random forest analyses. Receiver operating characteristic curves and area under the curve (AUC) were estimated to compare the diagnostic accuracy of the radiographic signs and model fit. The random forest model yielded the best diagnostic accuracy (AUC: 0.909), with 86.5% sensitivity and 82.2% specificity. LJ space height, FH height, and prominence ratio showed statistically larger AUCs than LTS height and LTP obliquity (P < .05 in all). The cut-off values for diagnosing discoid meniscus were <12.55 mm for FH height, <0.804 for prominence ratio, and >6.6 mm for LJ space height when using the random forest model. On the basis of the results of this study, in clinical practice, LJ space height, FH height and prominence ratio could be easily used as supplementary tools for diagnosing complete discoid lateral meniscus in children.
Affiliation(s)
- Chul-Won Ha
- Department of Orthopedic Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
| | - Seong Hwan Kim
- Department of Orthopedic Surgery, Hyundae General Hospital, Chung-Ang University College of Medicine, Namyangju-si, Gyeonggi-do, South Korea
| | - Dong-Hoon Lee
- Department of Orthopedic Surgery, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, South Korea
| | - Hyojoon Kim
- Department of Computer Science, Princeton University, Princeton, New Jersey
| | - Yong-Beom Park
- Department of Orthopedic Surgery, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, South Korea
| |
|
26
|
Guo C, Lu M, Chen J. An evaluation of time series summary statistics as features for clinical prediction tasks. BMC Med Inform Decis Mak 2020; 20:48. [PMID: 32138733 PMCID: PMC7059727 DOI: 10.1186/s12911-020-1063-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 06/21/2019] [Accepted: 02/23/2020] [Indexed: 11/23/2022] Open
Abstract
Background Clinical prediction tasks such as patient mortality, length of hospital stay, and disease diagnosis are highly important in critical care research. The existing studies for clinical prediction mainly used simple summary statistics to summarize information from physiological time series. However, relying on so few statistics loses information. In addition, using only maximum and minimum statistics to indicate patient features fails to provide an adequate explanation. Few studies have evaluated which summary statistics best represent physiological time series. Methods In this paper, we summarize 14 statistics describing the characteristics of physiological time series, including the central tendency, dispersion tendency, and distribution shape. Then, we evaluate the use of summary statistics of physiological time series as features for three clinical prediction tasks. To find the combinations of statistics that yield the best performances under different tasks, we use a cross-validation-based genetic algorithm to approximate the optimal statistical combination. Results In experiments using the EHRs of 6,927 patients, we obtained prediction results based on both single statistics and commonly used combinations of statistics under three clinical prediction tasks. Based on the results of an embedded cross-validation genetic algorithm, we obtained 25 optimal sets of statistical combinations and then tested their prediction results. By comparing the performances of prediction with single statistics and commonly used combinations of statistics with quantitative analyses of the optimal statistical combinations, we found that some statistics play central roles in patient representation and different prediction tasks have certain commonalities. Conclusion Through an in-depth analysis of the results, we found many practical reference points that can provide guidance for subsequent related research.
Statistics that indicate dispersion tendency, such as min, max, and range, are more suitable for length of stay prediction tasks, and they also provide information for short-term mortality prediction. Mean and quantiles that reflect the central tendency of physiological time series are more suitable for mortality and disease prediction. Skewness and kurtosis perform poorly when used separately for prediction but can be used as supplementary statistics to improve the overall prediction effect.
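To make the idea of summary-statistic features concrete, the sketch below turns one physiological time series into a small feature dictionary. It covers only a subset of the statistics named in the abstract; the function name, the choice of population moments, and the heart-rate values are assumptions for illustration.

```python
import statistics

def summarize(series):
    """Summary statistics of one time series, using population moments."""
    n = len(series)
    mean = statistics.fmean(series)
    m2 = sum((x - mean) ** 2 for x in series) / n  # variance
    m3 = sum((x - mean) ** 3 for x in series) / n
    m4 = sum((x - mean) ** 4 for x in series) / n
    return {
        "min": min(series),
        "max": max(series),
        "range": max(series) - min(series),
        "mean": mean,
        "median": statistics.median(series),
        "std": m2 ** 0.5,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,  # non-excess kurtosis
    }

# Hypothetical hourly heart-rate readings (beats per minute).
heart_rate = [60.0, 62.0, 64.0, 66.0, 68.0]
features = summarize(heart_rate)
print(features)
```

Each patient then contributes one such dictionary per vital sign, and the resulting fixed-length vector can be fed to any standard classifier regardless of how long the original recording was.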
Affiliation(s)
- Chonghui Guo
- Institute of Systems Engineering, Dalian University of Technology, No. 2 Linggong Road, Ganjingzi District, Dalian, 116024, People's Republic of China.
| | - Menglin Lu
- Institute of Systems Engineering, Dalian University of Technology, No. 2 Linggong Road, Ganjingzi District, Dalian, 116024, People's Republic of China
| | - Jingfeng Chen
- Institute of Systems Engineering, Dalian University of Technology, No. 2 Linggong Road, Ganjingzi District, Dalian, 116024, People's Republic of China; Health Management Center, The First Affiliated Hospital of Zhengzhou University, No. 1 Longhu central ring road, Zhengzhou, 450052, People's Republic of China
| |
|
27
|
Wang Z, Wang B, Zhou Y, Li D, Yin Y. Weight-based multiple empirical kernel learning with neighbor discriminant constraint for heart failure mortality prediction. J Biomed Inform 2019; 101:103340. [PMID: 31756495 DOI: 10.1016/j.jbi.2019.103340] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/21/2018] [Revised: 06/14/2019] [Accepted: 11/10/2019] [Indexed: 11/16/2022]
Abstract
Heart Failure (HF) is one of the most common causes of hospitalization and is burdened by short-term (in-hospital) and long-term (6-12 month) mortality. Accurate prediction of HF mortality plays a critical role in evaluating early treatment effects. However, due to the lack of a simple and effective prediction model, mortality prediction of HF is difficult, resulting in a low rate of control. To handle this issue, we propose a Weight-based Multiple Empirical Kernel Learning with Neighbor Discriminant Constraint (WMEKL-NDC) method for HF mortality prediction. In our method, feature selection by calculating the F-value of each feature is first performed to identify the crucial clinical features. Then, different weights are assigned to each empirical kernel space according to the centered kernel alignment criterion. To make use of the discriminant information of samples, a neighbor discriminant constraint is finally integrated into the multiple empirical kernel learning framework. Extensive experiments were performed on a real clinical dataset containing 10,198 inpatient records collected from Shanghai Shuguang Hospital in March 2009 and April 2016. Experimental results demonstrate that our proposed WMEKL-NDC method achieves a highly competitive performance for in-hospital, 30-day and 1-year HF mortality prediction. Compared with the state-of-the-art multiple kernel learning and baseline algorithms, our proposed WMEKL-NDC is more accurate on mortality prediction. Moreover, the top 10 crucial clinical features are identified together with their meanings, which are very useful to assist clinicians in the treatment of HF disease.
Affiliation(s)
- Zhe Wang
- Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China; Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.
| | - Bolu Wang
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Yangming Zhou
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.
| | - Dongdong Li
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Yichao Yin
- Shanghai Shuguang Hospital, Shanghai 200021, China
| |
|
28
|
Feature Engineering for ICU Mortality Prediction Based on Hourly to Bi-Hourly Measurements. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9173525] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/17/2022]
Abstract
Mortality prediction for intensive care unit (ICU) patients is a challenging problem that requires extracting discriminative and informative features. This study presents a proof of concept for exploring features that can provide clinical insight. Through a feature engineering approach, it is attempted to improve ICU mortality prediction in field conditions with infrequently measured data (i.e., hourly to bi-hourly). Features are explored by investigating the vital signs measurements of ICU patients, labelled with mortality or survival at discharge. The vital signs of interest in this study are heart and respiration rate, oxygen saturation and blood pressure. The latter comprises systolic, diastolic and mean arterial pressure. In the feature exploration process, it is aimed to extract simple and interpretable features that can provide clinical insight. For this purpose, a classifier is required that maximises the margin between the two classes (i.e., survival and mortality) with minimum tolerance to misclassification errors. Moreover, it preferably has to provide a linear decision surface in the original feature space without mapping to an unlimited dimensionality feature space. Therefore, a linear hard margin support vector machine (SVM) classifier is suggested. The extracted features are grouped in three categories: statistical, dynamic and physiological. Each category plays an important role in enhancing classification error performance. After extracting several features within the three categories, a manual feature fine-tuning is applied to consider only the most efficient features. The final classification, considering mortality as the positive class, resulted in an accuracy of 91.56%, sensitivity of 90.59%, precision of 86.52% and F1-score of 88.50%. The obtained results show that the proposed feature engineering approach and the extracted features are valid to be considered and further enhanced for the mortality prediction purpose.
Moreover, the proposed feature engineering approach moved the modelling methodology from black-box modelling to grey-box modelling in combination with the powerful classifier of SVMs.
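The reported F1-score can be cross-checked from the reported precision and sensitivity, since F1 is their harmonic mean. The function name below is illustrative; the two inputs are the figures quoted in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)

# Precision 86.52% and sensitivity 90.59%, as reported in the abstract.
f1 = f1_score(0.8652, 0.9059)
print(f"F1 = {f1:.4f}")
```

The result is about 0.885, consistent with the stated 88.50% up to rounding of the inputs.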
|
29
|
Yamashita K, Hatae R, Hiwatashi A, Togao O, Kikuchi K, Momosaka D, Yamashita Y, Kuga D, Hata N, Yoshimoto K, Suzuki S, Iwaki T, Iihara K, Honda H. Predicting TERT promoter mutation using MR images in patients with wild-type IDH1 glioblastoma. Diagn Interv Imaging 2019; 100:411-419. [DOI: 10.1016/j.diii.2019.02.010] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 11/12/2018] [Revised: 02/19/2019] [Accepted: 02/21/2019] [Indexed: 01/04/2023]
|
30
|
Souza J, Santos JV, Canedo VB, Betanzos A, Alves D, Freitas A. Importance of coding co-morbidities for APR-DRG assignment: Focus on cardiovascular and respiratory diseases. Health Inf Manag 2019; 49:47-57. [PMID: 31043088 DOI: 10.1177/1833358319840575] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 01/02/2023]
Abstract
BACKGROUND The All Patient-Refined Diagnosis-Related Groups (APR-DRGs) system has adjusted the basic DRG structure by incorporating four severity of illness (SOI) levels, which are used for determining hospital payment. A comprehensive report of all relevant diagnoses, namely the patient's underlying co-morbidities, is a key factor for ensuring that SOI determination will be adequate. OBJECTIVE In this study, we aimed to characterise the individual impact of co-morbidities on APR-DRG classification and hospital funding in the context of respiratory and cardiovascular diseases. METHODS Using 6 years of coded clinical data from a nationwide Portuguese inpatient database and support vector machine (SVM) models, we simulated and explored the APR-DRG classification to understand its response to individual removal of Charlson and Elixhauser co-morbidities. We also estimated the amount of hospital payments that could have been lost when co-morbidities are under-reported. RESULTS In our scenario, most Charlson and Elixhauser co-morbidities did considerably influence SOI determination but had little impact on base APR-DRG assignment. The degree of influence of each co-morbidity on SOI was, however, quite specific to the base APR-DRG. Under-coding of all studied co-morbidities led to losses in hospital payments. Furthermore, our results based on the SVM models were consistent with overall APR-DRG grouping logics. CONCLUSION AND IMPLICATIONS Comprehensive reporting of pre-existing or newly acquired co-morbidities should be encouraged in hospitals as they have an important influence on SOI assignment and thus on hospital funding. Furthermore, we recommend that future guidelines to be used by medical coders should include specific rules concerning coding of co-morbidities.
Affiliation(s)
- Julio Souza
- Faculty of Medicine of the University of Porto, Portugal; CINTESIS - Center for Health Technology and Services Research, Portugal
| | - João Vasco Santos
- Faculty of Medicine of the University of Porto, Portugal; CINTESIS - Center for Health Technology and Services Research, Portugal; Public Health Unit, ACES Grande Porto VIII - Espinho/Gaia, Portugal
| | | | | | - Domingos Alves
- CINTESIS - Center for Health Technology and Services Research, Portugal; Ribeirão Preto Medical School of the University of São Paulo, Brazil
| | - Alberto Freitas
- Faculty of Medicine of the University of Porto, Portugal; CINTESIS - Center for Health Technology and Services Research, Portugal
| |
|
31
|
Ganggayah MD, Taib NA, Har YC, Lio P, Dhillon SK. Predicting factors for survival of breast cancer patients using machine learning techniques. BMC Med Inform Decis Mak 2019; 19:48. [PMID: 30902088 PMCID: PMC6431077 DOI: 10.1186/s12911-019-0801-4] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Received: 09/07/2018] [Accepted: 03/18/2019] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Breast cancer is one of the most common diseases in women worldwide. Many studies have been conducted to predict the survival indicators; however, most of these analyses were predominantly performed using basic statistical methods. As an alternative, this study used machine learning techniques to build models for detecting and visualising significant prognostic indicators of breast cancer survival rate. METHODS A large hospital-based breast cancer dataset retrieved from the University Malaya Medical Centre, Kuala Lumpur, Malaysia (n = 8066) with diagnosis information between 1993 and 2016 was used in this study. The dataset contained 23 predictor variables and one dependent variable, which referred to the survival status of the patients (alive or dead). In determining the significant prognostic factors of breast cancer survival rate, prediction models were built using decision tree, random forest, neural networks, extreme boost, logistic regression, and support vector machine. Next, the dataset was clustered based on the receptor status of breast cancer patients identified via immunohistochemistry to perform advanced modelling using random forest. Subsequently, the important variables were ranked via variable selection methods in random forest. Finally, decision trees were built and validation was performed using survival analysis. RESULTS In terms of both model accuracy and calibration measure, all algorithms produced close outcomes, with the lowest obtained from decision tree (accuracy = 79.8%) and the highest from random forest (accuracy = 82.7%). The important variables identified in this study were cancer stage classification, tumour size, number of total axillary lymph nodes removed, number of positive lymph nodes, types of primary treatment, and methods of diagnosis.
CONCLUSION Interestingly, the various machine learning algorithms used in this study yielded close accuracy; hence, these methods could be used as alternative predictive tools in breast cancer survival studies, particularly in the Asian region. The important prognostic factors influencing survival rate of breast cancer identified in this study, which were validated by survival curves, are useful and could be translated into decision support tools in the medical domain.
Affiliation(s)
- Mogana Darshini Ganggayah
- Data Science and Bioinformatics Laboratory, Institute of Biological Sciences, Faculty of Science, University of Malaya, 50603, Kuala Lumpur, Malaysia
| | - Nur Aishah Taib
- Department of Surgery, Faculty of Medicine, University of Malaya, 50603, Kuala Lumpur, Malaysia
| | - Yip Cheng Har
- Department of Surgery, Faculty of Medicine, University of Malaya, 50603, Kuala Lumpur, Malaysia
| | - Pietro Lio
- Department of Computer Science and Technology, University of Cambridge, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, England
| | - Sarinder Kaur Dhillon
- Data Science and Bioinformatics Laboratory, Institute of Biological Sciences, Faculty of Science, University of Malaya, 50603, Kuala Lumpur, Malaysia.
| |
|
32
|
Owczarek AJ, Smertka M, Jędrusik P, Gębska-Kuczerowska A, Chudek J, Wojnicz R. Computerized Systems Supporting Clinical Decision in Medicine. STUDIES IN LOGIC, GRAMMAR AND RHETORIC 2018; 56:107-120. [DOI: 10.2478/slgr-2018-0044] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/13/2023]
Abstract
Statistics is the science of collection, summarizing, presentation and interpretation of data. Moreover, it yields methods used in the verification of research hypotheses. The presence of a statistician in a research group remarkably improves both the quality of design and research and the optimization of financial resources. Moreover, the involvement of a statistician in a research team helps the physician to effectively utilize the time and energy spent on diagnosing, which is an important aspect in view of limited healthcare resources. Precise, properly designed and implemented Computerized Clinical Decision Support Systems certainly lead to the improvement of healthcare and the quality of medical services, which increases patient satisfaction and reduces financial burdens on healthcare systems.
Affiliation(s)
- Aleksander J. Owczarek
- Department of Statistics, Department of Instrumental Analysis, School of Pharmacy with the Division of Laboratory Medicine in Sosnowiec, Medical University of Silesia in Katowice, Poland
| | - Mike Smertka
- Pathophysiology Unit, Department of Pathophysiology, School of Medicine in Katowice, Medical University of Silesia in Katowice, Poland
| | - Przemysław Jędrusik
- Department of Computer Biomedical Systems, Institute of Computer Science, University of Silesia, Poland
| | | | - Jerzy Chudek
- Department of Internal Medicine and Oncological Chemotherapy, Medical Faculty in Katowice, Medical University of Silesia in Katowice, Poland
| | - Romuald Wojnicz
- Department of Histology and Embryology, School of Medicine with the Division of Dentistry in Zabrze, Medical University of Silesia in Katowice, Poland
| |
|
33
|
Support Vector Machines and logistic regression to predict temporal artery biopsy outcomes. Can J Ophthalmol 2018; 54:116-118. [PMID: 30851764 DOI: 10.1016/j.jcjo.2018.05.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 03/02/2018] [Revised: 04/26/2018] [Accepted: 05/02/2018] [Indexed: 11/23/2022]
Abstract
OBJECTIVE The support vector machine (SVM) is a newer statistical method that has been reported to have advantages over traditional logistic regression for clinical classification. We determine whether SVM can better predict the results of temporal artery biopsy (TABx) for giant cell arteritis compared to logistic regression. METHOD A database of 530 TABx patients with 10 covariates was used and randomly split into training and test sets. The area under the receiver operating characteristic curve (AUC), misclassification rate (MCR), and false negative rate (FN) were compared for SVM and logistic regression. AUC and MCR were used to tune the SVM. RESULTS The SVM model with optimal AUC had gamma = 0.01267 and cost = 26.466, with 133 support vectors. The AUC/MCR/FN for logistic regression and SVM respectively were 0.827/0.184/0.524 and 0.825/0.168/0.571. CONCLUSION In our dataset of 530 TABx subjects, SVM did not offer any distinct advantage over the logistic regression prediction model.
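The three comparison measures in this abstract can be computed from predicted scores without any library support. The sketch below is an illustration under stated assumptions: the function names, the 0.5 decision threshold, and the four scored cases are all made up, and the AUC is computed as the share of positive/negative pairs ranked correctly.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def misclassification_rate(labels, scores, threshold=0.5):
    """Share of cases whose thresholded prediction differs from the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)

def false_negative_rate(labels, scores, threshold=0.5):
    """Share of true positives that the thresholded prediction misses."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    misses = sum(1 for p, y in zip(predictions, labels) if y == 1 and p == 0)
    return misses / sum(labels)

# Hypothetical biopsy outcomes (1 = positive) and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(auc(labels, scores))
print(misclassification_rate(labels, scores))
print(false_negative_rate(labels, scores))
```

AUC does not depend on the threshold, whereas the misclassification and false negative rates do, which is one reason two models can have nearly equal AUC yet differ on the other two measures.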
|
34
|
Sivasankaran A, Williams E, Albrecht M, Switzer GE, Cherkassky V, Maiers M. Machine Learning Approach to Predicting Stem Cell Donor Availability. Biol Blood Marrow Transplant 2018; 24:2425-2432. [PMID: 30071322 DOI: 10.1016/j.bbmt.2018.07.035] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 05/11/2018] [Accepted: 07/22/2018] [Indexed: 12/21/2022]
Abstract
The success of unrelated donor stem cell transplants depends on not only finding genetically matched donors, but also donor availability. On average 50% of potential donors in the National Marrow Donor Program database are unavailable for a variety of reasons, after initially matching a patient, with significant variations in availability among subgroups (eg, by race or age). Several studies have established univariate donor characteristics associated with availability. Individual consideration of each applicable characteristic is laborious. Extrapolating group averages to the individual-donor level tends to be highly inaccurate. In the current environment with enhanced donor data collection, we can make better estimates of individual donor availability. We propose a machine learning based approach to predict availability of every registered donor, and evaluate the predictive power on a test cohort of 44,544 requests to be .77 based on the area under the receiver-operating characteristic curve. We propose that this predictor should be used during donor selection to reduce the time to transplant.
Affiliation(s)
- Adarsh Sivasankaran
- Bioinformatics and Computational Biology, University of Minnesota, Minneapolis, Minnesota; Center for International Blood and Marrow Transplant Research, Minneapolis, Minnesota
- Eric Williams
- Center for International Blood and Marrow Transplant Research, Minneapolis, Minnesota
- Mark Albrecht
- Center for International Blood and Marrow Transplant Research, Minneapolis, Minnesota
- Galen E Switzer
- Department of Medicine, Psychiatry and Clinical and Translational Science, University of Pittsburgh, Pittsburgh, Pennsylvania; Department of Psychiatry, University of Pittsburgh, Pittsburgh, Pennsylvania; Department of Clinical and Translational Science, University of Pittsburgh, Pittsburgh, Pennsylvania; Center for Health Equity Research and Promotion, Veterans Affairs Pittsburgh Healthcare System, Pittsburgh, Pennsylvania
- Vladimir Cherkassky
- Bioinformatics and Computational Biology, University of Minnesota, Minneapolis, Minnesota; Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, Minnesota
- Martin Maiers
- Center for International Blood and Marrow Transplant Research, Minneapolis, Minnesota
|
35
|
Manoochehri Z, Salari N, Rezaei M, Khazaie H, Manoochehri S, Pavah BK. Comparison of support vector machine based on genetic algorithm with logistic regression to diagnose obstructive sleep apnea. JOURNAL OF RESEARCH IN MEDICAL SCIENCES 2018; 23:65. [PMID: 30181747 PMCID: PMC6091128 DOI: 10.4103/jrms.jrms_357_17] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 05/19/2017] [Revised: 01/01/2018] [Accepted: 04/29/2018] [Indexed: 11/04/2022]
Abstract
Background Diagnosing obstructive sleep apnea (OSA) is an important subject in medicine. This study aimed to compare the performance of two data mining techniques, support vector machine (SVM) and logistic regression (LR), in diagnosing OSA. The best-fit model was used as a substitute for polysomnography (PSG), which is the gold standard for diagnosing this disease. Materials and Methods A total of 250 patients with sleep complaints whose disease had been diagnosed by PSG and who were referred to the Sleep Disorders Research Center of Farabi Hospital, Kermanshah, between 2012 and 2015 were recruited in this study. To fit the best LR model, a model was first fitted with all variables and then compared with a model made from the significant variables using Akaike's information criterion (AIC). The SVM model with a radial basis function (RBF) kernel, whose parameters had been optimized by genetic algorithm, was used to diagnose OSA. Results Based on AIC, the best LR model obtained from this study was a model fitted with all variables. The performance of the final LR model was compared with the SVM model, revealing accuracy of 0.797 versus 0.729, sensitivity of 0.714 versus 0.777, and specificity of 0.847 versus 0.702, respectively. Conclusion Both models were found to have an appropriate performance. However, considering accuracy as an important criterion for comparing the performance of models in this domain, it can be argued that SVM could have a better efficiency than LR in diagnosing OSA in patients.
Affiliation(s)
- Zohreh Manoochehri
- Student Research Committee, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Nader Salari
- Department of Biostatistics and Epidemiology, School of Public Health, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Mansour Rezaei
- Department of Biostatistics and Epidemiology, School of Public Health, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Habibolah Khazaie
- Sleep Disorders Research Center, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Sara Manoochehri
- Student Research Committee, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Behnam Khaledi Pavah
- Sleep Disorders Research Center, Kermanshah University of Medical Sciences, Kermanshah, Iran
|
36
|
Lee J, Kim HR. Prediction of Return-to-original-work after an Industrial Accident Using Machine Learning and Comparison of Techniques. J Korean Med Sci 2018; 33:e144. [PMID: 29736160 PMCID: PMC5934520 DOI: 10.3346/jkms.2018.33.e144] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/09/2017] [Accepted: 03/26/2018] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND Many studies have tried to develop predictors for return-to-work (RTW). However, since complex factors have been demonstrated to predict RTW, it is difficult to use them practically. This study investigated whether factors used in previous studies could predict whether an individual had returned to his/her original work by four years after termination of the worker's recovery period. METHODS An initial logistic regression analysis of 1,567 participants of the fourth Panel Study of Worker's Compensation Insurance yielded odds ratios. The participants were divided into two subsets, a training dataset and a test dataset. Using the training dataset, logistic regression, decision tree, random forest, and support vector machine models were established, and important variables of each model were identified. The predictive abilities of the different models were compared. RESULTS The analysis showed that only earned income and company-related factors significantly affected return-to-original-work (RTOW). The random forest model showed the best accuracy among the tested machine learning models; however, the difference was not prominent. CONCLUSION It is possible to predict a worker's probability of RTOW using machine learning techniques with moderate accuracy.
Affiliation(s)
- Jongin Lee
- Cheongsong Health Center and County Hospital, Cheongsong, Korea
- Department of Medicine, Graduate School, The Catholic University of Korea, Seoul, Korea
- Hyoung-Ryoul Kim
- Department of Occupational and Environmental Medicine, College of Medicine, The Catholic University of Korea, Seoul, Korea
|
37
|
Jeon JP, Kim C, Oh BD, Kim SJ, Kim YS. Prediction of persistent hemodynamic depression after carotid angioplasty and stenting using artificial neural network model. Clin Neurol Neurosurg 2018; 164:127-131. [DOI: 10.1016/j.clineuro.2017.12.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 10/11/2017] [Revised: 11/13/2017] [Accepted: 12/03/2017] [Indexed: 10/18/2022]
|
38
|
Wang X, Bi J. Bi-convex Optimization to Learn Classifiers from Multiple Biomedical Annotations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:564-575. [PMID: 27295686 PMCID: PMC5159326 DOI: 10.1109/tcbb.2016.2576457] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/06/2023]
Abstract
The problem of constructing classifiers from multiple annotators who provide inconsistent training labels is important and occurs in many application domains. Many existing methods focus on understanding and learning crowd behaviors. Several probabilistic algorithms consider the construction of classifiers for specific tasks using the consensus of multiple labelers' annotations. These methods impose a prior on the consensus and develop an expectation-maximization algorithm based on logistic regression loss. We extend the discussion to the hinge loss commonly used by support vector machines. Our formulations form bi-convex programs that construct classifiers and estimate the reliability of each labeler simultaneously. Each labeler is associated with a reliability parameter, which can be constant, class-dependent, or varying across examples. The hinge loss is modified by replacing the true labels with the weighted combination of labelers' labels, with reliabilities as weights. Statistical justification is discussed to motivate the use of a linear combination of labels. In parallel to the expectation-maximization algorithm for logistic-based methods, efficient alternating algorithms are developed to solve the proposed bi-convex programs. Experimental results on benchmark datasets and three real-world biomedical problems demonstrate that the proposed methods either outperform or are competitive with the state of the art.
|
39
|
Gayle AA, Shimaoka M. Evaluating the lexico-grammatical differences in the writing of native and non-native speakers of English in peer-reviewed medical journals in the field of pediatric oncology: Creation of the genuine index scoring system. PLoS One 2017; 12:e0172338. [PMID: 28212419 PMCID: PMC5315297 DOI: 10.1371/journal.pone.0172338] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/09/2016] [Accepted: 02/03/2017] [Indexed: 11/19/2022] Open
Abstract
INTRODUCTION The predominance of English in scientific research has created hurdles for "non-native speakers" of English. Here we present a novel application of native language identification (NLI) for the assessment of medical-scientific writing. For this purpose, we created a novel classification system whereby scoring would be based solely on text features found to be distinctive among native English speakers (NS) within a given context. We dubbed this the "Genuine Index" (GI). METHODOLOGY This methodology was validated using a small set of journals in the field of pediatric oncology. Our dataset consisted of 5,907 abstracts, representing work from 77 countries. A support vector machine (SVM) was used to generate our model and for scoring. RESULTS Accuracy, precision, and recall of the classification model were 93.3%, 93.7%, and 99.4%, respectively. Class specific F-scores were 96.5% for NS and 39.8% for our benchmark class, Japan. Overall kappa was calculated to be 37.2%. We found significant differences between countries with respect to the GI score. Significant correlation was found between GI scores and two validated objective measures of writing proficiency and readability. Two sets of key terms and phrases differentiating NS and non-native writing were identified. CONCLUSIONS Our GI model was able to detect, with a high degree of reliability, subtle differences between the terms and phrasing used by native and non-native speakers in peer reviewed journals, in the field of pediatric oncology. In addition, L1 language transfer was found to be very likely to survive revision, especially in non-Western countries such as Japan. These findings show that even when the language used is technically correct, there may still be some phrasing or usage that impact quality.
Affiliation(s)
- Alberto Alexander Gayle
- Center for Medical and Nursing Education, Mie University School of Medicine, Mie, Japan
- Department of Immunology, Mie University Graduate School of Medicine, Mie, Japan
- Motomu Shimaoka
- Department of Molecular Pathobiology and Cell Adhesion Biology, Mie University Graduate School of Medicine, Mie, Japan
- Center for Disaster Medicine Research and Education, Mie University Graduate School of Medicine, Mie, Japan
|
40
|
Cheng Q, Tang Y, Yang Q, Wang E, Liu J, Li X. The prognostic factors for patients with hematological malignancies admitted to the intensive care unit. SPRINGERPLUS 2016; 5:2038. [PMID: 27995015 PMCID: PMC5127914 DOI: 10.1186/s40064-016-3714-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 06/05/2016] [Accepted: 11/21/2016] [Indexed: 12/15/2022]
Abstract
Owing to the nature of acute illness and adverse effects derived from intensive chemotherapy, patients with hematological malignancies (HM) who are admitted to the Intensive Care Unit (ICU) often present with poor prognosis. However, with advances in life-sustaining therapies and close collaborations between hematologists and intensive care specialists, the prognosis for these patients has improved substantially. Many studies from different countries have examined the prognostic factors of these critically ill HM patients. However, there has not been an up-to-date review on this subject, and very few studies have focused on the prognosis of patients with HM admitted to the ICU in Asian countries. Herein, we aim to explore the current situation and prognostic factors in patients with HM admitted to ICU, mainly focusing on studies published in the last 10 years.
Affiliation(s)
- Qian Cheng
- Department of Hematology, The Third Xiangya Hospital, Central South University, Changsha, 410013 Hunan China
- Yishu Tang
- Department of Hematology, The Third Xiangya Hospital, Central South University, Changsha, 410013 Hunan China
- Qing Yang
- Department of Medicine, Yale New Haven Hospital, New Haven, CT USA
- Erhua Wang
- Department of Hematology, The Third Xiangya Hospital, Central South University, Changsha, 410013 Hunan China
- Jing Liu
- Department of Hematology, The Third Xiangya Hospital, Central South University, Changsha, 410013 Hunan China
- Xin Li
- Department of Hematology, The Third Xiangya Hospital, Central South University, Changsha, 410013 Hunan China
|
41
|
AMINI P, AHMADINIA H, POOROLAJAL J, MOQADDASI AMIRI M. Evaluating the High Risk Groups for Suicide: A Comparison of Logistic Regression, Support Vector Machine, Decision Tree and Artificial Neural Network. IRANIAN JOURNAL OF PUBLIC HEALTH 2016; 45:1179-1187. [PMID: 27957463 PMCID: PMC5149472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
BACKGROUND We aimed to assess the high-risk group for suicide using different classification methods including logistic regression (LR), decision tree (DT), artificial neural network (ANN), and support vector machine (SVM). METHODS We used the dataset of a study conducted to predict risk factors of completed suicide in Hamadan Province, the west of Iran, in 2010. To evaluate the high-risk groups for suicide, LR, SVM, DT and ANN were performed. The applied methods were compared using sensitivity, specificity, positive predictive value, negative predictive value, accuracy and the area under the curve. The Cochran Q test was applied to check differences in proportion among methods. To assess the association between the observed and predicted values, the Ø coefficient, contingency coefficient, and Kendall tau-b were calculated. RESULTS Gender, age, and job were the most important risk factors for fatal suicide attempts in common for the four methods. The SVM method showed the highest accuracy, 0.68 and 0.67 for the training and testing samples, respectively. However, this method resulted in the highest specificity (0.67 for training and 0.68 for testing sample) and the highest sensitivity for the training sample (0.85), but the lowest sensitivity for the testing sample (0.53). The Cochran Q test indicated differences between proportions across methods (P<0.001). For the association between SVM predictions and observed values, the Ø coefficient, contingency coefficient, and Kendall tau-b were 0.239, 0.232 and 0.239, respectively. CONCLUSION SVM had the best performance in classifying fatal suicide attempts compared with DT, LR and ANN.
Affiliation(s)
- Payam AMINI
- Dept. of Epidemiology & Reproductive Health, Reproductive Epidemiology Research Center, Royan Institute for Reproductive Biomedicine, ACECR, Tehran, Iran
- Hasan AHMADINIA
- Dept. of Biostatistics & Epidemiology, Hamadan University of Medical Sciences, Hamadan, Iran
- Jalal POOROLAJAL
- Research Center for Health Sciences and Dept. of Biostatistics & Epidemiology, Hamadan University of Medical Sciences, Hamadan, Iran
- Mohammad MOQADDASI AMIRI
- Dept. of Biostatistics & Epidemiology, Hamadan University of Medical Sciences, Hamadan, Iran (corresponding author)
|
42
|
Wang HY, Hsieh CH, Wen CN, Wen YH, Chen CH, Lu JJ. Cancers Screening in an Asymptomatic Population by Using Multiple Tumour Markers. PLoS One 2016; 11:e0158285. [PMID: 27355357 PMCID: PMC4927114 DOI: 10.1371/journal.pone.0158285] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 12/23/2015] [Accepted: 06/13/2016] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Analytic measurement of serum tumour markers is one of the commonly used methods for cancer risk management in certain areas of the world (e.g. Taiwan). Recently, cancer screening based on multiple serum tumour markers has been frequently discussed. However, the risk-benefit outcomes appear to be unfavourable for patients because of the low sensitivity and specificity. In this study, cancer screening models based on multiple serum tumour markers were designed using machine learning methods, namely support vector machine (SVM), k-nearest neighbour (KNN), and logistic regression, to improve the screening performance for multiple cancers in a large asymptomatic population. METHODS AFP, CEA, CA19-9, CYFRA21-1, and SCC were determined for 20 696 eligible individuals. PSA was measured in men and CA15-3 and CA125 in women. A variable selection process was applied to select robust variables from these serum tumour markers to design cancer detection models. The sensitivity, specificity, positive predictive value (PPV), negative predictive value, area under the curve, and Youden index of the models based on single tumour markers, combined test, and machine learning methods were compared. Moreover, relative risk reduction, absolute risk reduction (ARR), and absolute risk increase (ARI) were evaluated. RESULTS To design cancer detection models using machine learning methods, CYFRA21-1 and SCC were selected for women, and all tumour markers were selected for men. SVM and KNN models significantly outperformed the single tumour markers and the combined test for men. All 3 studied machine learning methods outperformed single tumour markers and the combined test for women. For either men or women, the ARRs were between 0.003 and 0.008; the ARIs were between 0.119 and 0.306. CONCLUSION Machine learning methods outperformed the combined test in analysing multiple tumour markers for cancer detection. However, cancer screening based solely on the application of multiple tumour markers remains unfavourable because of the inadequate PPV, ARR, and ARI, even when machine learning methods were incorporated into the analysis.
Affiliation(s)
- Hsin-Yao Wang
- Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan City, Taiwan
- Chia-Hsun Hsieh
- Division of Hematology-Oncology, Department of Internal Medicine, Chang Gung Memorial Hospital at Linkou and Chang Gung University, Taoyuan City, Taiwan
- Chiao-Ni Wen
- Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan City, Taiwan
- Ying-Hao Wen
- Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan City, Taiwan
- Chun-Hsien Chen
- Department of Information Management, Chang Gung University, Taoyuan City, Taiwan
- Jang-Jih Lu
- Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan City, Taiwan
- Department of Medical Biotechnology and Laboratory Science, Chang Gung University, Taoyuan City, Taiwan
|
43
|
Thottakkara P, Ozrazgat-Baslanti T, Hupf BB, Rashidi P, Pardalos P, Momcilovic P, Bihorac A. Application of Machine Learning Techniques to High-Dimensional Clinical Data to Forecast Postoperative Complications. PLoS One 2016; 11:e0155705. [PMID: 27232332 PMCID: PMC4883761 DOI: 10.1371/journal.pone.0155705] [Citation(s) in RCA: 115] [Impact Index Per Article: 12.8] [Received: 12/07/2015] [Accepted: 04/05/2016] [Indexed: 11/18/2022] Open
Abstract
Objective To compare performance of risk prediction models for forecasting postoperative sepsis and acute kidney injury. Design Retrospective single center cohort study of adult surgical patients admitted between 2000 and 2010. Patients 50,318 adult patients undergoing major surgery. Measurements We evaluated the performance of logistic regression, generalized additive models, naïve Bayes and support vector machines for forecasting postoperative sepsis and acute kidney injury. We assessed the impact of feature reduction techniques on predictive performance. Model performance was determined using the area under the receiver operating characteristic curve, accuracy, and positive predicted value. The results were reported based on a 70/30 cross validation procedure where the data were randomly split into 70% used for training the model and the 30% for validation. Main Results The areas under the receiver operating characteristic curve for different models ranged between 0.797 and 0.858 for acute kidney injury and between 0.757 and 0.909 for severe sepsis. Logistic regression, generalized additive model, and support vector machines had better performance compared to Naïve Bayes model. Generalized additive models additionally accounted for non-linearity of continuous clinical variables as depicted in their risk patterns plots. Reducing the input feature space with LASSO had minimal effect on prediction performance, while feature extraction using principal component analysis improved performance of the models. Conclusions Generalized additive models and support vector machines had good performance as risk prediction model for postoperative sepsis and AKI. Feature extraction using principal component analysis improved the predictive performance of all models.
Affiliation(s)
- Paul Thottakkara
- Department of Anesthesiology, College of Medicine, University of Florida, Gainesville, Florida, United States of America
- Industrial and Systems Engineering, University of Florida, Gainesville, Florida, United States of America
- Tezcan Ozrazgat-Baslanti
- Department of Anesthesiology, College of Medicine, University of Florida, Gainesville, Florida, United States of America
- Bradley B. Hupf
- Department of Anesthesiology, College of Medicine, University of Florida, Gainesville, Florida, United States of America
- Parisa Rashidi
- Biomedical Engineering Department, University of Florida, Gainesville, Florida, United States of America
- Panos Pardalos
- Industrial and Systems Engineering, University of Florida, Gainesville, Florida, United States of America
- Petar Momcilovic
- Industrial and Systems Engineering, University of Florida, Gainesville, Florida, United States of America
- Azra Bihorac
- Department of Anesthesiology, College of Medicine, University of Florida, Gainesville, Florida, United States of America
|
44
|
Predicting early post-chemotherapy adverse events in patients with hematological malignancies: a retrospective study. Support Care Cancer 2016; 24:2727-33. [PMID: 26803835 DOI: 10.1007/s00520-016-3085-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2015] [Accepted: 01/14/2016] [Indexed: 10/22/2022]
Abstract
PURPOSE The purpose of this study was to develop a mathematical model that predicts definite adverse events following chemotherapy in patients with hematological malignancies (HMs). METHODS This is a retrospective cohort study including 1157 cases with HMs. First, we screened and verified the independent risk factors associated with post-chemotherapy adverse events by both univariate and multivariate logistic regression analysis using 70% of randomly selected cases (training set). Second, we proposed a mathematical model based on those selected factors. The calibration and discrimination of the model were assessed by the Hosmer-Lemeshow (H-L) test and the area under the receiver operating characteristic (ROC) curve, respectively. Last, the predictive power of this model was further tested in the remaining 30% of cases (validation set). RESULTS Our statistical analysis indicated that liver dysfunction (OR = 2.164), active infection (OR = 3.619), coagulation abnormalities (OR = 4.614), intensity of chemotherapy (OR = 10.001), acute leukemia (OR = 2.185), and obesity (OR = 1.604) were independent risk factors for post-chemotherapy adverse events in HM patients (all P < 0.05). Based on the verified risk factors, a predictive model was proposed. This model had good discrimination and calibration. When 0.648 was selected as the cutoff point, the sensitivity and specificity of this predictive model in the validation set were 72.7% and 87.4%, respectively. Furthermore, the model's positive predictive value, negative predictive value, and consistency rate were 87.3%, 73.0% and 80.0%, respectively. CONCLUSIONS Our study indicated that this six-risk-factor mathematical model is accurate enough to predict definite post-chemotherapy adverse events in an HM patient, and it may help clinicians optimize treatment for an HM patient.
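As an illustration of how the cutoff-based measures in this abstract relate to one another, the following minimal Python sketch derives sensitivity, specificity, positive and negative predictive values, and an overall agreement rate from predicted probabilities and a cutoff. The function name `threshold_metrics` and the toy data are hypothetical and are not taken from the study; treating the "consistency rate" as overall accuracy is an assumption, since the abstract does not define it.

```python
from typing import Dict, Sequence


def threshold_metrics(probs: Sequence[float], labels: Sequence[int],
                      cutoff: float) -> Dict[str, float]:
    """Call a case positive when its predicted probability is >= cutoff,
    then summarise the resulting 2x2 confusion matrix.

    Assumes every denominator is non-zero (at least one case in each
    row and column of the confusion matrix).
    """
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn),   # true positives among actual events
        "specificity": tn / (tn + fp),   # true negatives among non-events
        "ppv": tp / (tp + fp),           # events among predicted positives
        "npv": tn / (tn + fn),           # non-events among predicted negatives
        # Assumption: "consistency rate" is read here as overall accuracy.
        "consistency": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical toy data for illustration only; not the study's data.
probs = [0.9, 0.8, 0.7, 0.4, 0.65, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
metrics = threshold_metrics(probs, labels, cutoff=0.648)
```

The cutoff 0.648 is reused from the abstract only to show where it enters the calculation; with real validation data, the probabilities would come from the fitted logistic model.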
|
45
|
Ho KC, Speier W, El-Saden S, Liebeskind DS, Saver JL, Bui AAT, Arnold CW. Predicting discharge mortality after acute ischemic stroke using balanced data. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2014; 2014:1787-96. [PMID: 25954451 PMCID: PMC4419881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Several models have been developed to predict stroke outcomes (e.g., stroke mortality, patient dependence, etc.) in recent decades. However, there is little discussion regarding the problem of between-class imbalance in stroke datasets, which leads to prediction bias and decreased performance. In this paper, we demonstrate the use of the Synthetic Minority Over-sampling Technique to overcome such problems. We also compare state of the art machine learning methods and construct a six-variable support vector machine (SVM) model to predict stroke mortality at discharge. Finally, we discuss how the identification of a reduced feature set allowed us to identify additional cases in our research database for validation testing. Our classifier achieved a c-statistic of 0.865 on the cross-validated dataset, demonstrating good classification performance using a reduced set of variables.
Affiliation(s)
- King Chung Ho
- Department of Bioengineering, University of California, Los Angeles, CA; Medical Imaging Informatics, Department of Radiological Sciences, University of California, Los Angeles, CA
- William Speier
- Department of Bioengineering, University of California, Los Angeles, CA; Medical Imaging Informatics, Department of Radiological Sciences, University of California, Los Angeles, CA
- Suzie El-Saden
- Medical Imaging Informatics, Department of Radiological Sciences, University of California, Los Angeles, CA
- David S Liebeskind
- UCLA Stroke Center, Department of Neurology, University of California, Los Angeles, CA
- Jeffery L Saver
- UCLA Stroke Center, Department of Neurology, University of California, Los Angeles, CA
- Alex A T Bui
- Department of Bioengineering, University of California, Los Angeles, CA; Medical Imaging Informatics, Department of Radiological Sciences, University of California, Los Angeles, CA
- Corey W Arnold
- Department of Bioengineering, University of California, Los Angeles, CA; Medical Imaging Informatics, Department of Radiological Sciences, University of California, Los Angeles, CA
|
46
|
Harrison D, Muskett H, Harvey S, Grieve R, Shahin J, Patel K, Sadique Z, Allen E, Dybowski R, Jit M, Edgeworth J, Kibbler C, Barnes R, Soni N, Rowan K. Development and validation of a risk model for identification of non-neutropenic, critically ill adult patients at high risk of invasive Candida infection: the Fungal Infection Risk Evaluation (FIRE) Study. Health Technol Assess 2014; 17:1-156. [PMID: 23369845 DOI: 10.3310/hta17030] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND There is increasing evidence that invasive fungal disease (IFD) is more likely to occur in non-neutropenic patients in critical care units. A number of randomised controlled trials (RCTs) have evaluated antifungal prophylaxis in non-neutropenic, critically ill patients, demonstrating a reduction in the risk of proven IFD and suggesting a reduction in mortality. It is necessary to establish a method to identify and target antifungal prophylaxis at those patients at highest risk of IFD, who stand to benefit most from any antifungal prophylaxis strategy. OBJECTIVES To develop and validate risk models to identify non-neutropenic, critically ill adult patients at high risk of invasive Candida infection, who would benefit from antifungal prophylaxis, and to assess the cost-effectiveness of targeting antifungal prophylaxis to high-risk patients based on these models. DESIGN Systematic review, prospective data collection, statistical modelling, economic decision modelling and value of information analysis. SETTING Ninety-six UK adult general critical care units. PARTICIPANTS Consecutive admissions to participating critical care units. INTERVENTIONS None. MAIN OUTCOME MEASURES Invasive fungal disease, defined as a blood culture or sample from a normally sterile site showing yeast/mould cells in a microbiological or histopathological report. For statistical and economic modelling, the primary outcome was invasive Candida infection, defined as IFD-positive for Candida species. RESULTS Systematic review: Thirteen articles exploring risk factors, risk models or clinical decision rules for IFD in critically ill adult patients were identified. Risk factors reported to be significantly associated with IFD were included in the final data set for the prospective data collection. DATA COLLECTION Data were collected on 60,778 admissions between July 2009 and March 2011. Overall, 383 patients (0.6%) were admitted with or developed IFD. 
The majority of IFD patients (94%) were positive for Candida species. The most common site of infection was blood (55%). The incidence of IFD identified in unit was 4.7 cases per 1000 admissions, and for unit-acquired IFD was 3.2 cases per 1000 admissions. Statistical modelling: Risk models were developed at admission to the critical care unit, 24 hours and the end of calendar day 3. The risk model at admission had fair discrimination (c-index 0.705). Discrimination improved at 24 hours (c-index 0.823) and this was maintained at the end of calendar day 3 (c-index 0.835). There was a drop in model performance in the validation sample. Economic decision model: Irrespective of risk threshold, incremental quality-adjusted life-years of prophylaxis strategies compared with current practice were positive but small compared with the incremental costs. Incremental net benefits of each prophylaxis strategy compared with current practice were all negative. Cost-effectiveness acceptability curves showed that current practice was the strategy most likely to be cost-effective. Across all parameters in the decision model, results indicated that the value of further research for the whole population of interest might be high relative to the research costs. CONCLUSIONS The results of the Fungal Infection Risk Evaluation (FIRE) Study, derived from a highly representative sample of adult general critical care units across the UK, indicated a low incidence of IFD among non-neutropenic, critically ill adult patients. IFD was associated with substantially higher mortality, more intensive organ support and longer length of stay. Risk modelling produced simple risk models that provided acceptable discrimination for identifying patients at 'high risk' of invasive Candida infection. 
Results of the economic model suggested that the current most cost-effective treatment strategy for prophylactic use of systemic antifungal agents among non-neutropenic, critically ill adult patients admitted to NHS adult general critical care units is a strategy of no risk assessment and no antifungal prophylaxis. FUNDING Funding for this study was provided by the Health Technology Assessment programme of the National Institute for Health Research.
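The c-index values quoted in this abstract (0.705, 0.823, 0.835) measure discrimination: the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it. A minimal sketch of that pairwise calculation follows. The function name `c_index` and the toy risks and outcomes are hypothetical illustrations, not data or code from the study.

```python
from itertools import product


def c_index(risks, outcomes):
    """Concordance index for a binary outcome.

    Compares every (event, non-event) pair and counts the pair as
    concordant when the event has the higher predicted risk.
    Tied predictions contribute half a point.
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    score = 0.0
    for event_risk, non_event_risk in product(events, non_events):
        if event_risk > non_event_risk:
            score += 1.0
        elif event_risk == non_event_risk:
            score += 0.5
    return score / (len(events) * len(non_events))


# Hypothetical predicted risks and observed outcomes (1 = event).
toy_risks = [0.01, 0.02, 0.05, 0.10, 0.20, 0.03]
toy_outcomes = [0, 0, 1, 0, 1, 0]
print(c_index(toy_risks, toy_outcomes))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The index says nothing about calibration, so it is usually read alongside other measures.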
Affiliation(s)
- D Harrison
- Intensive Care National Audit and Research Centre, London, UK
|
47
|
Sadique Z, Grieve R, Harrison DA, Jit M, Allen E, Rowan KM. An integrated approach to evaluating alternative risk prediction strategies: a case study comparing alternative approaches for preventing invasive fungal disease. Value in Health: The Journal of the International Society for Pharmacoeconomics and Outcomes Research 2013; 16:1111-1122. [PMID: 24326164 DOI: 10.1016/j.jval.2013.09.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2012] [Revised: 07/15/2013] [Accepted: 09/22/2013] [Indexed: 06/03/2023]
Abstract
OBJECTIVES This article proposes an integrated approach to the development, validation, and evaluation of new risk prediction models illustrated with the Fungal Infection Risk Evaluation study, which developed risk models to identify non-neutropenic, critically ill adult patients at high risk of invasive fungal disease (IFD). METHODS Our decision-analytical model compared alternative strategies for preventing IFD at up to three clinical decision time points (critical care admission, after 24 hours, and end of day 3), followed with antifungal prophylaxis for those judged "high" risk versus "no formal risk assessment." We developed prognostic models to predict the risk of IFD before critical care unit discharge, with data from 35,455 admissions to 70 UK adult, critical care units, and validated the models externally. The decision model was populated with positive predictive values and negative predictive values from the best-fitting risk models. We projected lifetime cost-effectiveness and expected value of partial perfect information for groups of parameters. RESULTS The risk prediction models performed well in internal and external validation. Risk assessment and prophylaxis at the end of day 3 was the most cost-effective strategy at the 2% and 1% risk threshold. Risk assessment at each time point was the most cost-effective strategy at a 0.5% risk threshold. Expected values of partial perfect information were high for positive predictive values or negative predictive values (£11 million-£13 million) and quality-adjusted life-years (£11 million). CONCLUSIONS It is cost-effective to formally assess the risk of IFD for non-neutropenic, critically ill adult patients. This integrated approach to developing and evaluating risk models is useful for informing clinical practice and future research investment.
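The decision model described here is populated with positive and negative predictive values, which depend on how common the outcome is as well as on the risk model. A minimal sketch of that dependence, using Bayes' rule, is given below. The function name `predictive_values` and the sensitivity, specificity, and prevalence figures are hypothetical and are not taken from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test applied at a given prevalence.

    All inputs are proportions in [0, 1]. The four terms below are the
    expected fractions of the population in each cell of the 2x2 table.
    Assumes the inputs are not so extreme that a denominator is zero.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical inputs: a reasonably accurate rule applied to a rare outcome.
ppv, npv = predictive_values(sensitivity=0.80, specificity=0.90, prevalence=0.006)
print(f"PPV = {ppv:.3f}, NPV = {npv:.4f}")
```

With a rare outcome, even a fairly accurate rule yields a low positive predictive value, because most patients flagged as "high" risk will not go on to develop the outcome. This helps explain why the choice of risk threshold matters for cost-effectiveness.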
Affiliation(s)
- Z Sadique
- Department of Health Services Research & Policy, London School of Hygiene and Tropical Medicine, London, UK.
|
48
|
Kastorini CM, Papadakis G, Milionis HJ, Kalantzi K, Puddu PE, Nikolaou V, Vemmos KN, Goudevenos JA, Panagiotakos DB. Comparative analysis of a-priori and a-posteriori dietary patterns using state-of-the-art classification algorithms: A case/case-control study. Artif Intell Med 2013; 59:175-83. [DOI: 10.1016/j.artmed.2013.08.005] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Received: 10/28/2012] [Revised: 08/18/2013] [Accepted: 08/31/2013] [Indexed: 12/22/2022]
|
49
|
Chia CC, Karam Z, Lee G, Rubinfeld I, Syed Z. Improving surgical models through one/two class learning. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2013; 2012:5098-101. [PMID: 23367075 DOI: 10.1109/embc.2012.6347140] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Abstract
Only a minority of patients undergoing in-patient surgical procedures experience complications. However, the large number of in-patient surgeries (over 48 million procedures each year in the U.S.) results in substantial overall mortality and morbidity due to these complications. This burden can be decreased through improvements in the ability to evaluate patients by the bedside, and to assess surgical quality and outcomes across hospitals. Unfortunately, the process of developing clinical models for surgical complications is made challenging by the availability of generally small datasets for model training, and by class imbalance due to the diminished prevalence of many important complications. In this paper, we address this issue and explore the idea of jointly leveraging the benefits of both supervised and unsupervised learning to model surgical complications that occur infrequently. In particular, we study an approach where the problems of supervised and unsupervised model development are treated as tasks that can be transferred. Focussing this work on support vector machine (SVM) classification, we describe a transfer learning algorithm that improves performance relative to both supervised (i.e., binary or 2-class SVM) and unsupervised (i.e., 1-class SVM) methods, as well as the use of cost-sensitive weighting techniques, for predicting different surgical complications within the American College of Surgeons National Surgical Quality Improvement Program registry.
|
50
|
Lee G, Gurm HS, Syed Z. Predicting complications of percutaneous coronary intervention using a novel support vector method. J Am Med Inform Assoc 2013; 20:778-86. [PMID: 23599229 DOI: 10.1136/amiajnl-2012-001588] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/04/2022]
Abstract
OBJECTIVE To explore the feasibility of a novel approach using an augmented one-class learning algorithm to model in-laboratory complications of percutaneous coronary intervention (PCI). MATERIALS AND METHODS Data from the Blue Cross Blue Shield of Michigan Cardiovascular Consortium (BMC2) multicenter registry for the years 2007 and 2008 (n=41 016) were used to train models to predict 13 different in-laboratory PCI complications using a novel one-plus-class support vector machine (OP-SVM) algorithm. The performance of these models in terms of discrimination and calibration was compared to the performance of models trained using the following classification algorithms on BMC2 data from 2009 (n=20 289): logistic regression (LR), one-class support vector machine classification (OC-SVM), and two-class support vector machine classification (TC-SVM). For the OP-SVM and TC-SVM approaches, variants of the algorithms with cost-sensitive weighting were also considered. RESULTS The OP-SVM algorithm and its cost-sensitive variant achieved the highest area under the receiver operating characteristic curve for the majority of the PCI complications studied (eight cases). Similar improvements were observed for the Hosmer-Lemeshow χ² value (seven cases) and the mean cross-entropy error (eight cases). CONCLUSIONS The OP-SVM algorithm based on an augmented one-class learning problem improved discrimination and calibration across different PCI complications relative to LR and traditional support vector machine classification. Such an approach may have value in a broader range of clinical domains.
Affiliation(s)
- Gyemin Lee
- Department of Electronic and IT Media Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea.
|