1
Eddie D, Prindle J, Somodi P, Gerstmann I, Dilkina B, Saba SK, DiGuiseppi G, Dennis M, Davis JP. Exploring predictors of substance use disorder treatment engagement with machine learning: The impact of social determinants of health in the therapeutic landscape. J Subst Use Addict Treat 2024; 164:209435. [PMID: 38852819 PMCID: PMC11300147 DOI: 10.1016/j.josat.2024.209435] [Received: 04/26/2023] [Revised: 03/15/2024] [Accepted: 05/21/2024] [Indexed: 06/11/2024]
Abstract
BACKGROUND Improved knowledge of factors that influence treatment engagement could help treatment providers and systems better engage patients. The present study used machine learning to explore associations between individual- and neighborhood-level factors and substance use disorder (SUD) treatment engagement. METHODS This was a secondary analysis of the Global Appraisal of Individual Needs (GAIN) dataset and United States Census Bureau data using random forest machine learning and generalized linear mixed modelling. Our sample (N = 15,873) included all people entering SUD treatment at GAIN sites from 2006 to 2012. Predictors included an array of demographic, psychosocial, treatment-specific, and clinical measures, as well as environment-level measures for the neighborhood in which patients received treatment. RESULTS Greater odds of treatment engagement were predicted by adolescent age and psychiatric comorbidity and, at the neighborhood level, by low unemployment and high population density. Lower odds of treatment engagement were predicted by Black/African American race and, at the neighborhood level, by a high rate of public assistance and high income inequality. Regardless of the degree of treatment engagement, individuals receiving treatment in areas with high unemployment, alcohol sales outlet concentration, and poverty had greater substance use and related problems at baseline. Although these differences diminished with treatment and over time, disparities remained. CONCLUSIONS Neighborhood-level factors appear to play an important role in SUD treatment engagement. Regardless of whether individuals engage with treatment, greater loading on social determinants of health such as unemployment, alcohol sales outlet density, and poverty in the therapeutic landscape is associated with worse SUD treatment outcomes.
Affiliation(s)
- David Eddie
- Recovery Research Institute, Center for Addiction Medicine, Massachusetts General Hospital, USA; Department of Psychiatry, Harvard Medical School, USA.
- John Prindle
- Suzanne Dworak-Peck School of Social Work, University of Southern California, USA
- Paul Somodi
- Viterbi School of Engineering, Computer Science, University of Southern California, USA
- Isaac Gerstmann
- Viterbi School of Engineering, Computer Science, University of Southern California, USA
- Bistra Dilkina
- Viterbi School of Engineering, Computer Science, University of Southern California, USA
- Shaddy K Saba
- Suzanne Dworak-Peck School of Social Work, University of Southern California, USA
- Graham DiGuiseppi
- Suzanne Dworak-Peck School of Social Work, University of Southern California, USA
- Michael Dennis
- Lighthouse Institute, Chestnut Health Systems, Normal, IL, USA
2
Huang Y, Ma SF, Oldham JM, Adegunsoye A, Zhu D, Murray S, Kim JS, Bonham C, Strickland E, Linderholm AL, Lee CT, Paul T, Mannem H, Maher TM, Molyneaux PL, Strek ME, Martinez FJ, Noth I. Machine Learning of Plasma Proteomics Classifies Diagnosis of Interstitial Lung Disease. Am J Respir Crit Care Med 2024; 210:444-454. [PMID: 38422478 PMCID: PMC11351805 DOI: 10.1164/rccm.202309-1692oc] [Received: 09/27/2023] [Accepted: 02/29/2024] [Indexed: 03/02/2024]
Abstract
Rationale: Distinguishing connective tissue disease-associated interstitial lung disease (CTD-ILD) from idiopathic pulmonary fibrosis (IPF) can be clinically challenging. Objectives: To identify proteins that separate and classify patients with CTD-ILD and those with IPF. Methods: Four registries with 1,247 patients with IPF and 352 patients with CTD-ILD were included in analyses. Plasma samples were subjected to high-throughput proteomics assays. Protein features were prioritized using recursive feature elimination to construct a proteomic classifier. Multiple machine learning models were trained and tested in independent cohorts: support vector machine, LASSO (least absolute shrinkage and selection operator) regression, random forest, and imbalanced random forest. The validated models were then used to classify each case iteratively in external datasets. Measurements and Main Results: A classifier with 37 proteins (proteomic classifier 37 [PC37]) was enriched in the biological processes of bronchiole development, smooth muscle proliferation, and immune responses. Four machine learning models used PC37 together with sex and age to generate continuous classification scores. Receiver operating characteristic curve analyses of these scores demonstrated consistent areas under the curve of 0.85-0.90 in the test cohort and 0.94-0.96 in the single-sample dataset. Binary classification demonstrated 78.6-80.4% sensitivity and 76.0-84.4% specificity in the test cohort and 93.5-96.1% sensitivity and 69.5-77.6% specificity in the single-sample classification dataset. Composite analysis of all machine learning models confirmed 78.2% (194 of 248) accuracy in the test cohort and 82.9% (208 of 251) in the single-sample classification dataset. Conclusions: Multiple machine learning models trained with large cohort proteomic datasets consistently distinguished CTD-ILD from IPF. Many of the identified proteins are involved in immune pathways.
We further developed a novel approach for single-sample classification, which could facilitate honing the differential diagnosis of ILD in challenging cases and improve clinical decision making.
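The sensitivity, specificity, and composite-accuracy figures above are simple ratios over a confusion matrix. The minimal Python sketch below spells out those standard definitions. The class and function names are illustrative. Treating CTD-ILD as the positive class is an assumption, and the only numbers taken from the abstract are the composite accuracy counts (194 of 248 and 208 of 251).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts for a binary classifier (positive class = CTD-ILD, by assumption)."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def sensitivity(self) -> float:
        # True-positive rate: TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # True-negative rate: TN / (TN + FP)
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Share of all cases classified correctly
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage, rounded to one decimal place as in the abstract."""
    return round(100 * correct / total, 1)


if __name__ == "__main__":
    print(accuracy_pct(194, 248))  # test cohort
    print(accuracy_pct(208, 251))  # single-sample classification dataset
```

Reporting the raw counts next to each percentage, as the abstract does, lets readers recompute the figures with a function like `accuracy_pct`.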
Affiliation(s)
- Yong Huang
- Division of Pulmonary and Critical Care Medicine, University of Virginia, Charlottesville, Virginia
- Shwu-Fan Ma
- Division of Pulmonary and Critical Care Medicine, University of Virginia, Charlottesville, Virginia
- Justin M. Oldham
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
- Ayodeji Adegunsoye
- Section of Pulmonary and Critical Care Medicine, University of Chicago, Chicago, Illinois
- Daisy Zhu
- Division of Pulmonary and Critical Care Medicine, University of Virginia, Charlottesville, Virginia
- Susan Murray
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
- John S. Kim
- Division of Pulmonary and Critical Care Medicine, University of Virginia, Charlottesville, Virginia
- Catherine Bonham
- Division of Pulmonary and Critical Care Medicine, University of Virginia, Charlottesville, Virginia
- Emma Strickland
- Division of Pulmonary and Critical Care Medicine, University of Virginia, Charlottesville, Virginia
- Angela L. Linderholm
- Division of Pulmonary, Critical Care and Sleep Medicine, University of California, Davis, Davis, California
- Cathryn T. Lee
- Section of Pulmonary and Critical Care Medicine, University of Chicago, Chicago, Illinois
- Tessy Paul
- Division of Pulmonary and Critical Care Medicine, University of Virginia, Charlottesville, Virginia
- Hannah Mannem
- Division of Pulmonary and Critical Care Medicine, University of Virginia, Charlottesville, Virginia
- Toby M. Maher
- National Heart and Lung Institute, Imperial College, London, United Kingdom
- Keck Medicine of the University of Southern California, Los Angeles, California
- Mary E. Strek
- Section of Pulmonary and Critical Care Medicine, University of Chicago, Chicago, Illinois
- Imre Noth
- Division of Pulmonary and Critical Care Medicine, University of Virginia, Charlottesville, Virginia
3
Svensson LG, Blackstone EH, DiPaola L, Kramer BP, Ishwaran H. American Association for Thoracic Surgery Quality Gateway: A surgeon case study of its application in adult cardiac surgery for quality assurance. J Thorac Cardiovasc Surg 2024:S0022-5223(24)00678-0. [PMID: 39111691 DOI: 10.1016/j.jtcvs.2024.07.056] [Received: 05/31/2024] [Revised: 07/15/2024] [Accepted: 07/21/2024] [Indexed: 10/26/2024]
Abstract
OBJECTIVE To demonstrate the application of American Association for Thoracic Surgery Quality Gateway (AQG) outcomes models to a Surgeon Case Study of quality assurance in adult cardiac surgery. METHODS The case study includes 6989 cardiac and thoracic aorta operations performed in adults at Cleveland Clinic by a single surgeon between 2001 and 2023. AQG models were used to predict expected probabilities for operative mortality and major morbidity and to compare hospital outcomes, surgery type, risk profile, and individual risk factor levels using virtual (digital) twin causal inference. These models were based on postoperative procedural outcomes after 52,792 cardiac operations performed in 19 hospitals of 3 high-performing hospital systems with overall hospital mortality of 2.0%, analyzed by advanced machine learning for rare events. RESULTS For individual surgeons, their patients, hospitals, and hospital systems, the Surgeon Case Study demonstrated that AQG provides expected outcomes across the entire spectrum of cardiac surgery, from single-component primary operations to complex multicomponent reoperations. Actionable opportunities for quality improvement based on virtual twins are illustrated for patients, surgeons, hospitals, risk profile groups, operations, and risk factors vis-à-vis other hospitals. CONCLUSIONS Using minimal data collection and models developed with advanced machine learning, this case study shows that probabilities can be generated for operative mortality and major morbidity after virtually all adult cardiac operations. It demonstrates the utility of 21st-century causal inference (virtual [digital] twin) tools for assessing quality for surgeons asking "how am I doing?," their patients asking "what are my chances?," and the profession asking "how can we get better?"
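One way to read the virtual (digital) twin comparison is as counterfactual prediction. Each patient is scored once with their actual hospital and once as if treated at a comparison hospital, with every other risk factor held fixed, and the differences are averaged. The sketch below also includes an observed-to-expected (O/E) ratio, a common way to set observed events against model-expected risk. It assumes a fitted risk model is available as a plain function. `toy_model`, its coefficients, and the field names are hypothetical placeholders, not the AQG models.

```python
import math
from typing import Callable, Dict, List

Patient = Dict[str, float]
RiskModel = Callable[[Patient, str], float]  # (patient, hospital) -> probability


def virtual_twin_effect(patients: List[Patient], model: RiskModel,
                        actual: str, comparison: str) -> float:
    """Mean predicted risk difference (actual minus comparison hospital).

    Each patient acts as their own 'twin': only the hospital label changes,
    so the difference reflects the modelled hospital effect for this case mix.
    """
    diffs = [model(p, actual) - model(p, comparison) for p in patients]
    return sum(diffs) / len(diffs)


def observed_to_expected(observed_events: int, expected_probs: List[float]) -> float:
    """O/E ratio: observed events divided by the sum of model-expected risks."""
    return observed_events / sum(expected_probs)


def toy_model(p: Patient, hospital: str) -> float:
    # Hypothetical logistic model using age and a per-hospital offset only.
    offset = {"A": 0.0, "B": -0.3}[hospital]
    logit = -5.0 + 0.04 * p["age"] + offset
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    cohort = [{"age": 60.0}, {"age": 75.0}]
    print(virtual_twin_effect(cohort, toy_model, "A", "B"))
```

Holding the patient mix fixed is what separates this comparison from a crude comparison of mortality rates. Differences in case mix between hospitals cannot drive the result.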
Affiliation(s)
- Lars G Svensson
- Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
- Eugene H Blackstone
- Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio; Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio.
- Linda DiPaola
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio
- Benjamin P Kramer
- Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
4
Ishwaran H, Blackstone EH. Development of American Association for Thoracic Surgery Quality Gateway outcome models, analytics, and visualizations for quality assurance. J Thorac Cardiovasc Surg 2024:S0022-5223(24)00647-0. [PMID: 39069119 DOI: 10.1016/j.jtcvs.2024.07.033] [Received: 04/24/2024] [Revised: 07/09/2024] [Accepted: 07/14/2024] [Indexed: 07/30/2024]
Abstract
OBJECTIVE The study objective was to develop comprehensive quality assurance models for procedural outcomes after adult cardiac surgery. METHODS Based on 52,792 cardiac operations in adults performed in 19 hospitals of 3 high-performing hospital systems, models were developed for operative mortality (n = 1271), stroke (n = 895), deep sternal wound infection (n = 122), prolonged intubation (n = 6182), renal failure (n = 1265), prolonged postoperative stay (n = 5418), and reoperations (n = 1693). Random forest quantile classification, a method tailored to the challenges of rare events, and model-free variable priority screening were used to identify predictors of events. RESULTS A small set of preoperative variables was sufficient to model procedural outcomes for virtually all cardiac operations. These variables included older age; advanced symptoms; left ventricular, pulmonary, renal, and hepatic dysfunction; lower albumin; higher acuity; and greater complexity of the planned operation. Geometric mean performance ranged from 0.63 to 0.76. Calibration covered large areas of probability. Continuous risk factors provided high information content, and their association with outcomes was visualized with partial plots. These risk factors differed in strength and configuration among hospitals. Risk-adjusted outcomes, assessed across levels of patient risk by counterfactual causal inference within a framework of virtual (digital) twins, also differed among hospitals. CONCLUSIONS By using a small set of variables and contemporary machine-learning methods, comprehensive models for procedural operative mortality and major morbidity after adult cardiac surgery were developed based on data from 3 exemplary hospital systems. They provide surgeons, their patients, and hospital systems with 21st-century tools for assessing their risks compared with these advanced hospital systems and for improving cardiac surgery quality.
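Geometric mean (G-mean) performance is usually defined as the square root of sensitivity times specificity. It stays low unless both the rare-event class and the common class are classified well. Random forest quantile classification, in its simplest form as we understand it, replaces the default 0.5 voting threshold with a threshold set at the observed event rate. The sketch below assumes those two definitions and takes already-estimated event probabilities as input. It does not build a forest, and the toy labels and probabilities are made up.

```python
import math
from typing import List, Sequence


def quantile_classify(probs: Sequence[float], threshold: float) -> List[int]:
    """Label an event (1) when the estimated probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    # Toy data: 2 events among 10 operations (event rate 0.2).
    y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    p = [0.35, 0.15, 0.25, 0.10, 0.05, 0.08, 0.12, 0.02, 0.30, 0.04]
    event_rate = sum(y) / len(y)
    print(g_mean(y, quantile_classify(p, 0.5)))         # default threshold
    print(g_mean(y, quantile_classify(p, event_rate)))  # threshold at event rate
```

In this toy case the 0.5 threshold misses every event, so the G-mean is 0. Thresholding at the event rate trades some specificity for sensitivity and raises the G-mean to about 0.61. This is the kind of trade-off a rare-event method targets.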
Affiliation(s)
- Eugene H Blackstone
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio; Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio.
5
Movahedi F, Antaki JF. Improving the Prediction of 1-Year Right Ventricular Failure After Left Ventricular Assist Device Implantation. ASAIO J 2024; 70:495-501. [PMID: 38346283 PMCID: PMC11147739 DOI: 10.1097/mat.0000000000002152] [Indexed: 06/04/2024]
Abstract
Previous predictive models for postimplant right heart failure (RHF) following left ventricular assist device (LVAD) implantation have shown limited performance on validation datasets and are susceptible to overfitting. The objective of this study was therefore to develop a predictive model with less overfitting and greater accuracy in predicting RHF in LVAD recipients. The study involved 11,967 patients who underwent continuous-flow LVAD implantation between 2008 and 2016, with an RHF incidence of 9% at 1 year. An eXtreme Gradient Boosting (XGBoost) model trained to predict RHF at 1 year postimplantation achieved a promising area under the receiver operating characteristic curve (AUC-ROC) of 0.8 and an area under the precision-recall curve (AUC-PRC) of 0.24. The calibration plot showed that predicted risk closely corresponded with observed risk. However, the model, based on data collected 48 hours before LVAD implantation, exhibited high sensitivity but low precision, making it an excellent screening tool but not a diagnostic tool.
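An AUC-ROC of 0.8 next to an AUC-PRC of 0.24 is expected when only about 9% of patients have the outcome. A random ranking scores about 0.5 on ROC AUC but only about the event rate (here roughly 0.09) on the precision-recall curve. The pure-Python sketch below computes both metrics from scratch. It uses the pairwise (Mann-Whitney) form of ROC AUC and average precision as a step-wise PR-AUC estimate. It is illustrative only, not the authors' evaluation code.

```python
from typing import List, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean precision at the rank of each positive, scores sorted high to low.

    Assumes at least one positive. Tied scores keep their input order
    (Python's sort is stable), which real libraries handle more carefully.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)  # precision at this recall step
    return sum(precisions) / len(precisions)


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]
    print(roc_auc(labels, scores), average_precision(labels, scores))
```

Because the precision-recall baseline moves with prevalence, an AUC-PRC should be judged against the event rate rather than against 0.5.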
Affiliation(s)
- Faezeh Movahedi
- Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA
- James F Antaki
- Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY
6
Ishwaran K, Abadie BQ, Chen PH, Bolen M, Karamlou T, Grimm R, Tang WHW, Nguyen C, Kwon D, Chen D. Pre-test Prediction of Non-ischemic Cardiomyopathies using Time-Series EHR Data. AMIA Jt Summits Transl Sci Proc 2024; 2024:239-248. [PMID: 38827049 PMCID: PMC11141858] [Indexed: 06/04/2024]
Abstract
Clinical imaging is an important test for diagnosing non-ischemic cardiomyopathies (NICM). However, accurate interpretation of imaging studies often requires readers to review patient histories, a time-consuming and tedious task. We propose using time-series analysis of longitudinal electronic health records (EHR), as a pseudo-summary of the record, to predict the most likely NICMs. Time-series formatted EHR data can provide temporal information that is important for accurate disease prediction. Specifically, we leverage ICD-10 codes and various recurrent neural network architectures for predictive modeling. We trained our models on a large cohort of NICM patients who underwent cardiac magnetic resonance imaging (CMR) and a smaller cohort who underwent transthoracic echocardiography (TTE). Across all models, the proposed technique achieved good performance for CMR, with a micro-averaged area under the curve of 0.8357, an F1 score of 0.5708, and a precision at 3 of 0.8078. Performance for TTE was only moderate, with corresponding values of 0.6938, 0.4399, and 0.5864. We show that our model has the potential to provide accurate pre-test differential diagnoses, thereby potentially reducing clerical burden on physicians.
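Precision at 3 scores the model's three highest-ranked candidate diagnoses against the diagnoses a patient actually has. The sketch below assumes the common definition: hits among the top k divided by k, averaged over patients. Some papers divide by min(k, number of true labels) instead, and the abstract does not say which variant was used. The label names (`dx_a`, `dx_b`, and so on) are placeholders.

```python
from typing import Dict, List, Set


def precision_at_k(scores: Dict[str, float], true_labels: Set[str], k: int = 3) -> float:
    """Fraction of the k highest-scoring labels that are true labels."""
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(1 for label in top_k if label in true_labels) / k


def mean_precision_at_k(batch: List[Dict[str, float]],
                        truths: List[Set[str]], k: int = 3) -> float:
    """Average precision@k across patients."""
    values = [precision_at_k(s, t, k) for s, t in zip(batch, truths)]
    return sum(values) / len(values)


if __name__ == "__main__":
    patient_scores = {"dx_a": 0.9, "dx_b": 0.5, "dx_c": 0.3, "dx_d": 0.1}
    print(precision_at_k(patient_scores, {"dx_a", "dx_c"}))
```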
Affiliation(s)
- Michael Bolen
- Heart Vascular and Thoracic Institute
- Imaging Institute
- W H Wilson Tang
- Heart Vascular and Thoracic Institute
- Cardiovascular Innovations Research Center, Cleveland Clinic, Cleveland, OH, USA
- Christopher Nguyen
- Heart Vascular and Thoracic Institute
- Imaging Institute
- Cardiovascular Innovations Research Center, Cleveland Clinic, Cleveland, OH, USA
- Deborah Kwon
- Heart Vascular and Thoracic Institute
- Imaging Institute
- Cardiovascular Innovations Research Center, Cleveland Clinic, Cleveland, OH, USA
- David Chen
- Heart Vascular and Thoracic Institute
- Cardiovascular Innovations Research Center, Cleveland Clinic, Cleveland, OH, USA
7
Deng F, Zhao L, Yu N, Lin Y, Zhang L. Union With Recursive Feature Elimination: A Feature Selection Framework to Improve the Classification Performance of Multicategory Causes of Death in Colorectal Cancer. Lab Invest 2024; 104:100320. [PMID: 38158124 DOI: 10.1016/j.labinv.2023.100320] [Received: 03/26/2023] [Revised: 12/05/2023] [Accepted: 12/20/2023] [Indexed: 01/03/2024]
Abstract
Despite the use of machine learning tools, it is challenging to properly model cause-specific deaths in colorectal cancer (CRC) patients and to choose appropriate treatments. Here, we propose a feature selection framework, union with recursive feature elimination (U-RFE), to select the union feature sets that are crucial to CRC progression-specific mortality using The Cancer Genome Atlas (TCGA) dataset. Based on the union feature sets, we compared the performance of 5 classification algorithms to identify the best model for classifying 4-category deaths: logistic regression (LR), support vector machines (SVM), random forest (RF), eXtreme gradient boosting (XGBoost), and Stacking. In the first stage of U-RFE, LR, SVM, and RF were used as base estimators to obtain subsets containing the same number of features but not exactly the same specific features. Union analysis of the subsets was then performed to determine the final union feature set, effectively combining the advantages of the different algorithms. We found that the U-RFE framework could improve the performance of various models. Stacking outperformed LR, SVM, RF, and XGBoost in most scenarios. When the target feature number of the RFE was set to 50 and the union feature set contained 298 deterministic features, the Stacking model achieved F1_weighted, Recall_weighted, Precision_weighted, Accuracy, and Matthews correlation coefficient values of 0.851, 0.864, 0.854, 0.864, and 0.717, respectively. Performance on the minority categories also improved significantly. Therefore, this recursive feature elimination-based approach to feature selection improves the classification of CRC deaths from clinical and omics data. It may also help with other data that have high feature redundancy and class imbalance.
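The two-stage U-RFE idea can be expressed generically. Recursive feature elimination runs separately with each base estimator, and the selected subsets are then merged by union. In the minimal sketch below, each estimator is represented by a hypothetical `importance_fn` that refits on the current feature subset and returns an importance score per feature. The framework itself used LR, SVM, and RF, whose importances (for example, coefficient magnitudes or impurity-based scores) are not implemented here. The feature names and weights in the usage lines are made up.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical: refits an estimator on the given features and returns an
# importance score per feature (higher = more important).
ImportanceFn = Callable[[List[str]], Dict[str, float]]


def rfe(features: Iterable[str], importance_fn: ImportanceFn,
        n_target: int, step: int = 1) -> Set[str]:
    """Recursively drop the least important features until n_target remain."""
    remaining = list(features)
    while len(remaining) > n_target:
        scores = importance_fn(remaining)  # re-scored on the current subset
        n_drop = min(step, len(remaining) - n_target)
        weakest = set(sorted(remaining, key=lambda f: scores[f])[:n_drop])
        remaining = [f for f in remaining if f not in weakest]
    return set(remaining)


def union_rfe(features: List[str], importance_fns: Dict[str, ImportanceFn],
              n_target: int, step: int = 1) -> Set[str]:
    """Stage 1: RFE per base estimator. Stage 2: union of the selected subsets."""
    union: Set[str] = set()
    for importance_fn in importance_fns.values():
        union |= rfe(features, importance_fn, n_target, step)
    return union


if __name__ == "__main__":
    feats = ["a", "b", "c", "d", "e", "f"]
    w_lr = {"a": 6, "b": 5, "c": 4, "d": 3, "e": 2, "f": 1}  # made-up weights
    w_rf = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
    fns = {"lr": lambda fs: {f: w_lr[f] for f in fs},
           "rf": lambda fs: {f: w_rf[f] for f in fs}}
    print(sorted(union_rfe(feats, fns, n_target=2)))
```

Every stage-1 subset has the same size, but the union can be larger than any single subset because the estimators disagree on which features matter.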
Affiliation(s)
- Fei Deng
- School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China.
- Lin Zhao
- School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Ning Yu
- School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Yuxiang Lin
- School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Lanjing Zhang
- Department of Biological Sciences, Rutgers University, Newark, New Jersey; Department of Pathology, Princeton Medical Center, Plainsboro, New Jersey; Rutgers Cancer Institute of New Jersey, New Brunswick, New Jersey; Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, New Jersey.
8
Ocagli H, Bottigliengo D, Lorenzoni G, Fontana F, Negri C, Moise GM, Gregori D, Clemente L. Identifying Predictors of Anal HPV Status in HPV-Vaccinated MSM: A Machine Learning Approach. J Homosex 2024; 71:741-757. [PMID: 36332152 DOI: 10.1080/00918369.2022.2132574] [Indexed: 06/16/2023]
Abstract
Anal human papillomavirus (HPV) infection has a high prevalence in men who have sex with men (MSM), resulting in an increased risk of anal cancer. The present work aimed to identify factors associated with HPV in a prospective cohort of HPV-vaccinated MSM using a random forest (RF) approach. This observational study enrolled MSM patients admitted to an Italian sexually transmitted infection (STI)-AIDS Unit. For each patient, rectal swabs were collected to detect 28 different HPV genotypes. Two RF algorithms were applied to identify the predictors most associated with HPV. The cohort included 135 MSM with a median age of 39 years, 49% of whom were HIV-positive. In model 1 (baseline information), age, age at sexual debut, HIV status, number of lifetime sex partners, and STIs were most associated with HPV. In model 2 (follow-up information), age, age at sexual debut, HIV status, STI class, and follow-up were most associated with HPV. The RF algorithm exhibited good performance, with 61% and 83% accuracy for models 1 and 2, respectively. Traditional risk factors for anal HPV infection, such as drug use, receptive anal intercourse, and multiple sexual partners, were found to have low importance in predicting HPV status. The present results suggest the need to focus on HPV prevention campaigns.
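The abstract does not specify which random-forest importance measure was used to rank predictors. One common, model-agnostic option is permutation importance: shuffle one predictor column at a time and record how much accuracy drops. The sketch below assumes a fitted classifier exposed as a `predict` function. `toy_predict` and the two-column layout in the usage lines are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Predict = Callable[[Row], int]


def accuracy(predict: Predict, X: Sequence[Row], y: Sequence[int]) -> float:
    """Share of rows whose predicted label matches the true label."""
    return sum(1 for row, label in zip(X, y) if predict(row) == label) / len(y)


def permutation_importance(predict: Predict, X: Sequence[Row], y: Sequence[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between column j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    X = [[0, 5], [1, 3], [0, 7], [1, 1], [0, 2], [1, 9]]
    y = [0, 1, 0, 1, 0, 1]

    def toy_predict(row: Row) -> int:
        return int(row[0])  # hypothetical model that uses only column 0

    print(permutation_importance(toy_predict, X, y))
```

A predictor the model ignores gets an importance of exactly zero. This is one reason such rankings can place traditional risk factors low when other variables carry the same signal.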
Affiliation(s)
- Honoria Ocagli
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences, and Public Health, University of Padova, Padova, Italy
- Daniele Bottigliengo
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences, and Public Health, University of Padova, Padova, Italy
- Giulia Lorenzoni
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences, and Public Health, University of Padova, Padova, Italy
- Francesco Fontana
- Division of Laboratory Medicine, University Hospital Giuliano Isontina (ASU GI), Trieste, Italy
- Camilla Negri
- STI-AIDS Unit, University Hospital Giuliano Isontina (ASU GI), Trieste, Italy
- Gian Michele Moise
- STI-AIDS Unit, University Hospital Giuliano Isontina (ASU GI), Trieste, Italy
- Dario Gregori
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences, and Public Health, University of Padova, Padova, Italy
- Libera Clemente
- Division of Laboratory Medicine, University Hospital Giuliano Isontina (ASU GI), Trieste, Italy
9
Patel SY, Baum A, Basu S. Prediction of non emergent acute care utilization and cost among patients receiving Medicaid. Sci Rep 2024; 14:824. [PMID: 38263373 PMCID: PMC10805799 DOI: 10.1038/s41598-023-51114-z] [Received: 10/03/2023] [Accepted: 12/30/2023] [Indexed: 01/25/2024]
Abstract
Patients receiving Medicaid often experience social risk factors for poor health and limited access to primary care, leading to high utilization of emergency departments and hospitals (acute care) for non-emergent conditions. As programs proactively reach out to Medicaid patients to offer primary care, they rely on risk models historically limited by poor-quality data. Following initiatives to improve data quality and collect data on social risk, we tested alternative, widely debated strategies to improve Medicaid risk models. Among a sample of 10 million patients receiving Medicaid from 26 states and Washington DC, the best-performing model tripled the probability of prospectively identifying at-risk patients compared with a standard model (sensitivity 11.3% [95% CI 10.5, 12.1%] vs 3.4% [95% CI 3.0, 4.0%]). It did so without increasing "false positives" that reduce the efficiency of outreach (specificity 99.8% [95% CI 99.6, 99.9%] vs 99.5% [95% CI 99.4, 99.7%]). It also improved the coefficient of determination for predicting costs approximately tenfold (R²: 0.195-0.412 among population subgroups vs 0.022-0.050). Our best-performing model also reversed the lower sensitivity of risk prediction for Black versus White patients, a bias present in the standard cost-based model. Our results demonstrate a modeling approach that substantially improves risk prediction performance and equity for patients receiving Medicaid.
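The equity comparison rests on computing sensitivity separately within each subgroup: among patients who went on to have non-emergent acute care use, what fraction did the model flag? The cost comparison rests on the coefficient of determination, R² = 1 - SS_res / SS_tot. The minimal sketch below computes both. The record layout and group labels are assumptions, and the toy records in the usage lines are not study data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

# Hypothetical record: (group, had_outcome, flagged_by_model)
Record = Tuple[str, bool, bool]


def sensitivity_by_group(records: Iterable[Record]) -> Dict[str, float]:
    """Per-group sensitivity: flagged true cases / all true cases."""
    positives: Dict[str, int] = defaultdict(int)
    flagged: Dict[str, int] = defaultdict(int)
    for group, outcome, predicted in records:
        if outcome:
            positives[group] += 1
            if predicted:
                flagged[group] += 1
    return {g: flagged[g] / positives[g] for g in positives}


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    toy = [("group_1", True, True), ("group_1", True, False),
           ("group_2", True, True), ("group_2", False, True)]
    print(sensitivity_by_group(toy))
```

Reporting sensitivity per group, rather than only overall, is what makes a disparity like the one described in the abstract visible in the first place.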
Affiliation(s)
- Sadiq Y Patel
- Clinical Product Development, Waymark, San Francisco, CA, USA.
- School of Social Policy and Practice, University of Pennsylvania, 3701 Locust Walk, Philadelphia, PA, 19104, USA.
- Aaron Baum
- Clinical Product Development, Waymark, San Francisco, CA, USA
- Icahn School of Medicine at Mt Sinai, New York, NY, USA
- Sanjay Basu
- Clinical Product Development, Waymark, San Francisco, CA, USA
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada
- Center for Vulnerable Populations, San Francisco General Hospital/University of California San Francisco, San Francisco, CA, USA
10
Esber AL, Dear NF, King D, Francisco LV, Sing'oei V, Owuoth J, Maswai J, Iroezindu M, Bahemana E, Kibuuka H, Shah N, Polyak CS, Ake JA, Crowell TA. Achieving the third 95 in sub-Saharan Africa: application of machine learning approaches to predict viral failure. AIDS 2023; 37:1861-1870. [PMID: 37418549 DOI: 10.1097/qad.0000000000003646] [Indexed: 07/09/2023]
Abstract
OBJECTIVE Viral failure in people with HIV (PWH) may be influenced by multiple sociobehavioral, clinical, and context-specific factors, and supervised learning approaches may identify novel predictors. We compared the performance of two supervised learning algorithms in predicting viral failure in four African countries. DESIGN Cohort study. METHODS The African Cohort Study is an ongoing longitudinal cohort enrolling PWH at 12 sites in Uganda, Kenya, Tanzania, and Nigeria. Participants underwent physical examination, medical history-taking, medical record extraction, sociobehavioral interviews, and laboratory testing. In cross-sectional analyses of enrollment data, viral failure was defined as a viral load of at least 1000 copies/ml among participants on antiretroviral therapy (ART) for at least 6 months. We compared the performance of lasso-type regularized regression and random forests by calculating the area under the curve (AUC) and used each to identify factors associated with viral failure; 94 explanatory variables were considered. RESULTS Between January 2013 and December 2020, 2941 PWH were enrolled, 1602 had been on ART for at least 6 months, and 1571 participants with complete case data were included. At enrollment, 190 (12.0%) had viral failure. The lasso regression model was slightly superior to the random forest in its ability to identify PWH with viral failure (AUC: 0.82 vs. 0.75). Both models identified CD4+ count, ART regimen, age, self-reported ART adherence, and duration on ART as important factors associated with viral failure. CONCLUSION These findings corroborate existing literature based primarily on hypothesis-testing statistical approaches and help to generate questions for future investigations that may impact viral failure.
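The outcome definition in the Methods is a simple rule applied to enrollment data. Only participants on ART for at least 6 months are eligible, and among them, viral failure means a viral load of at least 1000 copies/ml. The sketch below encodes that rule. The field names are hypothetical, and the study's complete-case filtering over 94 variables is reduced here to skipping records with a missing value in these two fields.

```python
from dataclasses import dataclass
from typing import List, Optional

VL_THRESHOLD = 1000     # copies/ml, from the study definition
MIN_MONTHS_ON_ART = 6   # eligibility window, from the study definition


@dataclass
class Participant:
    viral_load: Optional[float]     # copies/ml; None if missing (hypothetical field)
    months_on_art: Optional[float]  # None if missing (hypothetical field)


def viral_failure(p: Participant) -> Optional[bool]:
    """True/False for eligible complete cases; None if excluded."""
    if p.viral_load is None or p.months_on_art is None:
        return None  # not a complete case (simplified)
    if p.months_on_art < MIN_MONTHS_ON_ART:
        return None  # on ART for less than 6 months: not eligible
    return p.viral_load >= VL_THRESHOLD


def failure_prevalence(cohort: List[Participant]) -> float:
    """Share of eligible complete cases meeting the viral-failure definition."""
    labels = [v for v in (viral_failure(p) for p in cohort) if v is not None]
    return sum(labels) / len(labels)


if __name__ == "__main__":
    toy_cohort = [Participant(1500, 12), Participant(200, 24),
                  Participant(5000, 3), Participant(None, 10)]
    print(failure_prevalence(toy_cohort))
```

Returning `None` for excluded participants keeps exclusions separate from negative outcomes, so they cannot silently lower the prevalence estimate.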
Affiliation(s)
- Allahna L Esber
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, Maryland, USA
- Nicole F Dear
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, Maryland, USA
- David King
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, Maryland, USA
- Leilani V Francisco
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, Maryland, USA
- Valentine Sing'oei
- U.S. Army Medical Research Directorate - Africa
- HJF Medical Research International, Kisumu
- John Owuoth
- U.S. Army Medical Research Directorate - Africa
- HJF Medical Research International, Kisumu
- Jonah Maswai
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
- U.S. Army Medical Research Directorate - Africa, Kericho, Kenya
- Michael Iroezindu
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
- HJF Medical Research International, Abuja, Nigeria
- Emmanuel Bahemana
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
- HJF Medical Research International, Mbeya, Tanzania
- Hannah Kibuuka
- Makerere University-Walter Reed Project, Kampala, Uganda
- Neha Shah
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
- Christina S Polyak
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, Maryland, USA
- Julie A Ake
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
- Trevor A Crowell
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, Maryland, USA
11
Bhutada S, Tran-Lundmark K, Kramer B, Conner P, Lowry AM, Blackstone E, Frenckner B, Mesas-Burgos C, Apte SS. Identification of protein biomarkers associated with congenital diaphragmatic hernia in human amniotic fluid. Sci Rep 2023; 13:15483. [PMID: 37726509 PMCID: PMC10509251 DOI: 10.1038/s41598-023-42576-2] [Received: 05/15/2023] [Accepted: 09/12/2023] [Indexed: 09/21/2023]
Abstract
Congenital diaphragmatic hernia (CDH) is a severe birth defect frequently associated with pulmonary hypoplasia, pulmonary hypertension, and heart failure. Since amniotic fluid comprises proteins of both fetal and maternal origin, its analysis could provide insights into mechanisms underlying CDH and provide biomarkers for early diagnosis, severity of pulmonary changes and treatment response. The study objective was to identify proteomic changes in amniotic fluid consistently associated with CDH. Amniotic fluid was obtained at term (37-39 weeks) from women with normal pregnancies (n = 5) or carrying fetuses with CDH (n = 5). After immuno-depletion of the highest-abundance proteins, off-line fractionation and high-resolution tandem mass spectrometry were performed and quantitative differences between the proteomes of the groups were determined. Of 1036 proteins identified, 218 were differentially abundant. Bioinformatics analysis showed significant changes in GP6 signaling, in the MSP-RON signaling in macrophages pathway and in networks associated with cardiovascular system development and function, connective tissue disorders and dermatological conditions. Differences in selected proteins, namely pulmonary surfactant protein B, osteopontin, kallikrein 5 and galectin-3, were validated by orthogonal testing using ELISA in larger cohorts and showed statistically significant differences aiding in the diagnosis and prediction of CDH. The findings provide potential tools for clinical management of CDH.
Affiliation(s)
- Sumit Bhutada
- Department of Biomedical Engineering-ND20, Cleveland Clinic Lerner Research Institute, 9500 Euclid Avenue, Cleveland, OH, 44195, USA
- Karin Tran-Lundmark
- Department of Experimental Medical Science and Wallenberg Center for Molecular Medicine, Lund University, Lund, Sweden
- The Pediatric Heart Center, Skane University Hospital, Lund, Sweden
- Benjamin Kramer
- Department of Thoracic and Cardiovascular Surgery, Heart, Vascular, and Thoracic Institute, Cleveland Clinic, Cleveland, OH, USA
- Peter Conner
- Department of Women's and Children's Health, Karolinska Institute, Stockholm, Sweden
- Ashley M Lowry
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH, USA
- Eugene Blackstone
- Department of Thoracic and Cardiovascular Surgery, Heart, Vascular, and Thoracic Institute, Cleveland Clinic, Cleveland, OH, USA
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH, USA
- Bjorn Frenckner
- Department of Women's and Children's Health, Karolinska Institute, Stockholm, Sweden
- Carmen Mesas-Burgos
- Department of Women's and Children's Health, Karolinska Institute, Stockholm, Sweden
- Suneel S Apte
- Department of Biomedical Engineering-ND20, Cleveland Clinic Lerner Research Institute, 9500 Euclid Avenue, Cleveland, OH, 44195, USA
12
Raja S, Rice TW, Lu M, Semple ME, Blackstone EH, Murthy SC, Ahmad U, McNamara M, Toth AJ, Ishwaran H. Adjuvant Therapy After Neoadjuvant Therapy for Esophageal Cancer: Who Needs It? Ann Surg 2023; 278:e240-e249. [PMID: 35997269 PMCID: PMC10955553 DOI: 10.1097/sla.0000000000005679] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 11/26/2022]
Abstract
OBJECTIVE We hypothesized that, on average, patients do not benefit from additional adjuvant therapy after neoadjuvant therapy for locally advanced esophageal cancer, although subsets of patients might. Therefore, we sought to identify profiles of patients predicted to receive the most survival benefit or greatest detriment from adding adjuvant therapy. BACKGROUND Although neoadjuvant therapy has become the treatment of choice for locally advanced esophageal cancer, the value of adding adjuvant therapy is unknown. METHODS From 1970 to 2014, 22,123 patients were treated for esophageal cancer at 33 centers on 6 continents (Worldwide Esophageal Cancer Collaboration), of whom 7731 with adenocarcinoma or squamous cell carcinoma received neoadjuvant therapy; 1348 received additional adjuvant therapy. Random forests for survival and virtual-twin analyses were performed for all-cause mortality. RESULTS Patients received a small survival benefit from adjuvant therapy (3.2±10 months over the subsequent 10 years for adenocarcinoma, 1.8±11 for squamous cell carcinoma). Consistent benefit occurred in ypT3-4 patients without nodal involvement and those with ypN2-3 disease. The small subset of patients receiving most benefit had high nodal burden, ypT4, and positive margins. Patients with ypT1-2N0 cancers had either no benefit or a detriment in survival. CONCLUSIONS Adjuvant therapy after neoadjuvant therapy has value primarily for patients with more advanced esophageal cancer. Because the benefit is often small, patients considering adjuvant therapy should be counseled on benefits versus morbidity. In addition, given that the overall benefit was meaningful in a small number of patients, emerging modalities such as immunotherapy may hold more promise in the adjuvant setting.
Affiliation(s)
- Siva Raja
- Heart, Vascular, and Thoracic Institute, Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
- Thomas W. Rice
- Heart, Vascular, and Thoracic Institute, Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
- Min Lu
- Department of Public Health Sciences, Division of Biostatistics, University of Miami, Miami, Florida
- Marie E. Semple
- Lerner Research Institute, Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio
- Eugene H. Blackstone
- Heart, Vascular, and Thoracic Institute, Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
- Lerner Research Institute, Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio
- Sudish C. Murthy
- Heart, Vascular, and Thoracic Institute, Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
- Usman Ahmad
- Heart, Vascular, and Thoracic Institute, Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
- Michael McNamara
- Taussig Cancer Institute, Department of Hematology and Medical Oncology, Cleveland Clinic, Cleveland, Ohio
- Andrew J. Toth
- Lerner Research Institute, Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio
- Hemant Ishwaran
- Department of Public Health Sciences, Division of Biostatistics, University of Miami, Miami, Florida
13
Heltø ALK, Rosager EV, Aasbrenn M, Maule CF, Petersen J, Nielsen FE, Suetta C, Gregersen R. Predicting Short-Term Mortality in Older Patients Discharged from Acute Hospitalizations Lasting Less Than 24 Hours. Clin Epidemiol 2023; 15:707-719. [PMID: 37324726 PMCID: PMC10264096 DOI: 10.2147/clep.s405485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Accepted: 04/03/2023] [Indexed: 06/17/2023]
Abstract
Purpose Over the coming decades, the number of short, acute hospitalizations of older people is expected to rise. To help physicians identify high-risk patients prior to discharge, we aimed to develop a model capable of predicting the risk of 30-day mortality for older patients discharged from short, acute hospitalizations and to examine how model performance changed with an increasing amount of information. Methods This registry-based study included acute hospitalizations in Denmark for 2016-2018 lasting ≤24 hours where patients were permanent residents, ≥65 years old, and discharged alive. Utilizing many different predictor variables, we developed random forest models with an increasing amount of information, compared their performance, and examined important variables. Results We included 107,132 patients with a median age of 75 years. Of these, 3.3% (n=3575) died within 30 days of discharge. Model performance improved especially with the addition of laboratory results and information on prior acute admissions (AUROC 0.835), and again with comorbidities and number of prescription drugs (AUROC 0.860). Model performance did not improve with the addition of sociodemographic variables (AUROC 0.861), apart from age and sex. Important variables included age, dementia, number of prescription drugs, C-reactive protein, and eGFR. Conclusion The best model accurately estimated the risk of short-term mortality for older patients following short, acute hospitalizations. Trained on a large and heterogeneous dataset, the model is applicable to most acute clinical settings and could be a useful tool for physicians prior to discharge.
Affiliation(s)
- Amalia Lærke Kjær Heltø
- Department of Emergency Medicine, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark
- Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Emilie Vangsgaard Rosager
- Department of Emergency Medicine, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark
- Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Martin Aasbrenn
- Department of Geriatrics and Palliative Medicine, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark
- Cathrine Fox Maule
- Center of Clinical Research and Prevention, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark
- Janne Petersen
- Center of Clinical Research and Prevention, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark
- Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- Finn Erland Nielsen
- Department of Emergency Medicine, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark
- Charlotte Suetta
- Department of Geriatrics and Palliative Medicine, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark
- Rasmus Gregersen
- Department of Emergency Medicine, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark
- Center of Clinical Research and Prevention, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark
- Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
14
Skinner EB, Glidden CK, MacDonald AJ, Mordecai EA. Human footprint is associated with shifts in the assemblages of major vector-borne diseases. NATURE SUSTAINABILITY 2023; 6:652-661. [PMID: 37538395 PMCID: PMC10399301 DOI: 10.1038/s41893-023-01080-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 02/01/2023] [Indexed: 08/05/2023]
Abstract
Predicting how increasing intensity of human-environment interactions affects pathogen transmission is essential to anticipate changing disease risks and identify appropriate mitigation strategies. Vector-borne diseases (VBDs) are highly responsive to environmental changes, but such responses are notoriously difficult to isolate because pathogen transmission depends on a suite of ecological and social responses in vectors and hosts that may differ across species. Here we use the emerging tools of cumulative pressure mapping and machine learning to better understand how the occurrence of six medically important VBDs, differing in ecology from sylvatic to urban, responds to multidimensional effects of human pressure. We find not only that human footprint (an index of human pressure incorporating built environments, energy and transportation infrastructure, agricultural lands and human population density) is an important predictor of VBD occurrence, but also that there are clear thresholds governing the occurrence of different VBDs. Across a spectrum of human pressure, diseases associated with lower human pressure, including malaria, cutaneous leishmaniasis and visceral leishmaniasis, give way to diseases associated with high human pressure, such as dengue, chikungunya and Zika. These heterogeneous responses of VBDs to human pressure highlight thresholds of land-use transitions that may lead to abrupt shifts in infectious disease burdens and public health needs.
Affiliation(s)
- Eloise B. Skinner
- Department of Biology, Stanford University, Stanford, CA, USA
- Centre for Planetary Health and Food Security, Griffith University, Southport, Queensland, Australia
- Andrew J. MacDonald
- Bren School of Environmental Science and Management, University of California, Santa Barbara, Santa Barbara, CA, USA
- Earth Research Institute, University of California, Santa Barbara, Santa Barbara, CA, USA
15
Zbinden ZD, Douglas MR, Chafin TK, Douglas ME. A community genomics approach to natural hybridization. Proc Biol Sci 2023; 290:20230768. [PMID: 37192670 PMCID: PMC10188237 DOI: 10.1098/rspb.2023.0768] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/31/2023] [Accepted: 04/26/2023] [Indexed: 05/18/2023]
Abstract
Hybridization is a complicated, oft-misunderstood process. Once deemed unnatural and uncommon, hybridization is now recognized as ubiquitous among species. But hybridization rates within and among communities are poorly understood despite their relevance to ecology, evolution and conservation. To clarify, we examined hybridization across 75 freshwater fish communities within the Ozarks of the North American Interior Highlands (USA) by single nucleotide polymorphism (SNP) genotyping 33 species (N = 2865 individuals; double-digest restriction site-associated DNA sequencing (ddRAD)). We found evidence of hybridization (70 putative hybrids; 2.4% of individuals) among 18 species-pairs involving 73% (24/33) of study species, with the majority being concentrated within one family (Leuciscidae/minnows; 15 species; 66 hybrids). Interspecific genetic exchange, or introgression, was evident from 24 backcrossed individuals (10/18 species-pairs). Hybrids occurred within 42 of 75 communities (56%). Four selected environmental variables (species richness, protected area extent, precipitation (May and annually)) exhibited 73-78% accuracy in predicting hybrid occurrence via random forest classification. Our community-level assessment identified hybridization as spatially widespread and environmentally dependent (albeit predominantly within one diverse, omnipresent family). Our approach provides a more holistic survey of natural hybridization by testing a wide range of species-pairs, thus contrasting with more conventional evaluations.
Affiliation(s)
- Zachery D. Zbinden
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
- Marlis R. Douglas
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
- Tyler K. Chafin
- Biomathematics and Statistics Scotland, Edinburgh, Scotland, UK
- Michael E. Douglas
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
16
Leha A, Huber C, Friede T, Bauer T, Beckmann A, Bekeredjian R, Bleiziffer S, Herrmann E, Möllmann H, Walther T, Beyersdorf F, Hamm C, Künzi A, Windecker S, Stortecky S, Kutschka I, Hasenfuß G, Ensminger S, Frerker C, Seidler T. Development and validation of explainable machine learning models for risk of mortality in transcatheter aortic valve implantation: TAVI risk machine scores. EUROPEAN HEART JOURNAL. DIGITAL HEALTH 2023; 4:225-235. [PMID: 37265865 PMCID: PMC10232286 DOI: 10.1093/ehjdh/ztad021] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/27/2022] [Revised: 02/28/2023] [Accepted: 03/16/2023] [Indexed: 06/03/2023]
Abstract
Aims Identification of high-risk patients and individualized decision support based on objective criteria for rapid discharge after transcatheter aortic valve implantation (TAVI) are key requirements in the context of contemporary TAVI treatment. This study aimed to predict 30-day mortality following TAVI based on machine learning (ML) using data from the German Aortic Valve Registry. Methods and results Mortality risk was determined using a random forest ML model that was condensed into the newly developed TAVI Risk Machine (TRIM) scores, designed to represent clinically meaningful risk modelling before (TRIMpre) and in particular after (TRIMpost) TAVI. The algorithm was trained and cross-validated on data from 22 283 patients (729 died within 30 days post-TAVI), and generalisation was examined on data from 5864 patients (146 died). TRIMpost demonstrated significantly better performance than traditional scores, with a C-statistic of 0.79 (95% confidence interval (CI) [0.74; 0.83]) compared with 0.69 (95% CI [0.65; 0.74]) for the Society of Thoracic Surgeons (STS) score. An abridged score (aTRIMpost) comprising 25 features (calculated using a web interface) exhibited significantly higher performance than traditional scores (C-statistic 0.74; 95% CI [0.70; 0.78]). Validation on external data from 6693 patients (205 died within 30 days post-TAVI) of the Swiss TAVI Registry confirmed significantly better performance for TRIMpost (C-statistic 0.75; 95% CI [0.72; 0.79]) compared with STS (C-statistic 0.67; 95% CI [0.63; 0.70]). Conclusion TRIM scores demonstrate good performance for risk estimation before and after TAVI. Together with clinical judgement, they may support standardised and objective decision-making before and after TAVI.
Affiliation(s)
- Andreas Leha
- Department of Medical Statistics, University Medical Center Göttingen, Humboldtallee 32, 37073 Göttingen, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Göttingen, Robert-Koch str. 40, 37075 Göttingen, Germany
- Cynthia Huber
- Department of Medical Statistics, University Medical Center Göttingen, Humboldtallee 32, 37073 Göttingen, Germany
- Tim Friede
- Department of Medical Statistics, University Medical Center Göttingen, Humboldtallee 32, 37073 Göttingen, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Göttingen, Robert-Koch str. 40, 37075 Göttingen, Germany
- Timm Bauer
- Department of Cardiology, Sana Klinikum Offenbach, Starkenburgring 66, 63069 Offenbach am Main, Germany
- Andreas Beckmann
- German Society for Thoracic and Cardiovascular Surgery, Langenbeck-Virchow-Haus, Luisenstraße 58/59, 10117 Berlin, Germany
- Department for cardiac and pediatric cardiac surgery, Heart Center Duisburg, EVKLN, Gerrickstr. 21, 47137 Duisburg, Germany
- Raffi Bekeredjian
- Department of Cardiology, Robert-Bosch-Krankenhaus, Auerbachstraße 110, 70376 Stuttgart, Germany
- Sabine Bleiziffer
- Clinic for Thoracic and Cardiovascular Surgery, Heart and Diabetes Center Northrhine-Westphalia, Georgstr 11, 32545 Bad Oeynhausen, Germany
- Eva Herrmann
- Goethe University Frankfurt, Department of Medicine, Institute of Biostatistics and Mathematical Modelling, Theodor-Stern-Kai 7, 60590 Frankfurt Main, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Rhine/Main, Theodor-Stern-Kai 7, 60590 Frankfurt Main, Germany
- Helge Möllmann
- Department of Cardiology, St.-Johannes-Hospital Dortmund, Johannesstrasse 9-17, 44137 Dortmund, Germany
- Thomas Walther
- Department of Cardiothoracic Surgery, University Hospital Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt, Germany
- Friedhelm Beyersdorf
- Medical Faculty of the Albert-Ludwigs-University Freiburg, University Hospital Freiburg, Hugstetterstr. 55, 79106 Freiburg, Germany
- Department of Cardiovascular Surgery, Heart Centre Freiburg University, Freiburg, Germany
- Christian Hamm
- Department of Cardiology and Angiology, University Hospital Gießen, Klinikstr. 33, 35392 Gießen, Germany
- Department of Cardiology, Kerckhoff Heart and Thorax Center, Benekestraße 2-8, D-61231 Bad Nauheim, Germany
- Arnaud Künzi
- CTU Bern, University of Bern, Mittelstrasse 43, 3012 Bern, Switzerland
- Stephan Windecker
- Department of Cardiology, Inselspital, Bern University Hospital, University of Bern, 3010 Bern, Switzerland
- Stefan Stortecky
- Department of Cardiology, Inselspital, Bern University Hospital, University of Bern, 3010 Bern, Switzerland
- Ingo Kutschka
- Clinic for Cardiothoracic and Vascular Surgery/Heart Center, University Medical Center Göttingen, Robert-Koch Str. 40, 37075 Göttingen, Germany
- Gerd Hasenfuß
- DZHK (German Center for Cardiovascular Research), Partner Site Göttingen, Robert-Koch str. 40, 37075 Göttingen, Germany
- Clinic for Cardiology and Pulmonology, Heart Center, University Medical Center Göttingen, Robert-Koch Str. 40, 37075 Göttingen, Germany
- Stephan Ensminger
- Department of Cardiac and Thoracic Vascular Surgery, University Heart Center Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany
- DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Kiel/Lübeck, Lübeck, Germany
- Christian Frerker
- Department of Cardiology, University Heart Center Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany
- DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Kiel/Lübeck, Lübeck, Germany
- Tim Seidler
- Corresponding author. Tel: +49 (0) 551/39-63907, Fax: +49 (0) 551/39-63906
17
Ong CS, Reinertsen E, Sun H, Moonsamy P, Mohan N, Funamoto M, Kaneko T, Shekar PS, Schena S, Lawton JS, D'Alessandro DA, Westover MB, Aguirre AD, Sundt TM. Prediction of operative mortality for patients undergoing cardiac surgical procedures without established risk scores. J Thorac Cardiovasc Surg 2023; 165:1449-1459.e15. [PMID: 34607725 PMCID: PMC8918430 DOI: 10.1016/j.jtcvs.2021.09.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/09/2021] [Revised: 08/11/2021] [Accepted: 09/03/2021] [Indexed: 10/20/2022]
Abstract
OBJECTIVE Current cardiac surgery risk models do not address a substantial fraction of procedures. We sought to create models to predict the risk of operative mortality for an expanded set of cases. METHODS Four supervised machine learning models were trained using preoperative variables present in the Society of Thoracic Surgeons (STS) data set of the Massachusetts General Hospital to predict and classify operative mortality in procedures without STS risk scores. A total of 424 (5.5%) mortality events occurred out of 7745 cases. Models included logistic regression with elastic net regularization (LogReg), support vector machine, random forest (RF), and extreme gradient boosted trees (XGBoost). Model discrimination was assessed via area under the receiver operating characteristic curve (AUC), and calibration was assessed via calibration slope and expected-to-observed event ratio. External validation was performed using STS data sets from Brigham and Women's Hospital (BWH) and the Johns Hopkins Hospital (JHH). RESULTS Models performed comparably, with the highest mean AUC of 0.83 (RF) and an expected-to-observed event ratio of 1.00. On external validation, the AUC was 0.81 in BWH (RF) and 0.79 in JHH (LogReg/RF). Models trained and applied on the same institution's data achieved AUCs of 0.81 (BWH: LogReg/RF/XGBoost) and 0.82 (JHH: LogReg/RF/XGBoost). CONCLUSIONS Machine learning models trained on preoperative patient data can predict operative mortality at a high level of accuracy for cardiac surgical procedures without established risk scores. Such procedures comprise 23% of all cardiac surgical procedures nationwide. This work also highlights the value of using local institutional data to train new prediction models that account for institution-specific practices.
Affiliation(s)
- Chin Siang Ong
- Division of Cardiac Surgery, Massachusetts General Hospital and Corrigan Minehan Heart Center, Boston, Mass
- Erik Reinertsen
- Division of Cardiology, Massachusetts General Hospital and Corrigan Minehan Heart Center, Boston, Mass; Center for Systems Biology, Massachusetts General Hospital, Boston, Mass; Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Mass
- Haoqi Sun
- Division of Clinical Neurophysiology, Department of Neurology, Massachusetts General Hospital, Boston, Mass
- Philicia Moonsamy
- Division of Cardiac Surgery, Massachusetts General Hospital and Corrigan Minehan Heart Center, Boston, Mass
- Navyatha Mohan
- Division of Cardiac Surgery, Massachusetts General Hospital and Corrigan Minehan Heart Center, Boston, Mass
- Masaki Funamoto
- Division of Cardiac Surgery, Massachusetts General Hospital and Corrigan Minehan Heart Center, Boston, Mass
- Tsuyoshi Kaneko
- Division of Cardiac Surgery, Brigham and Women's Hospital, Boston, Mass
- Prem S Shekar
- Division of Cardiac Surgery, Brigham and Women's Hospital, Boston, Mass
- Stefano Schena
- Division of Cardiac Surgery, Johns Hopkins Hospital, Baltimore, Md
- David A D'Alessandro
- Division of Cardiac Surgery, Massachusetts General Hospital and Corrigan Minehan Heart Center, Boston, Mass
- M Brandon Westover
- Division of Clinical Neurophysiology, Department of Neurology, Massachusetts General Hospital, Boston, Mass; Clinical Data AI Center, Massachusetts General Hospital, Boston, Mass
- Aaron D Aguirre
- Division of Cardiology, Massachusetts General Hospital and Corrigan Minehan Heart Center, Boston, Mass; Center for Systems Biology, Massachusetts General Hospital, Boston, Mass; Wellman Center for Photomedicine, Massachusetts General Hospital and Harvard Medical School, Boston, Mass; Healthcare Transformation Lab, Massachusetts General Hospital, Boston, Mass
- Thoralf M Sundt
- Division of Cardiac Surgery, Massachusetts General Hospital and Corrigan Minehan Heart Center, Boston, Mass
18
Movahedi F, Padman R, Antaki JF. Limitations of receiver operating characteristic curve on imbalanced data: Assist device mortality risk scores. J Thorac Cardiovasc Surg 2023; 165:1433-1442.e2. [PMID: 34446286 PMCID: PMC8800945 DOI: 10.1016/j.jtcvs.2021.07.041] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 02/02/2021] [Revised: 07/20/2021] [Accepted: 07/23/2021] [Indexed: 02/01/2023]
Abstract
OBJECTIVE In the left ventricular assist device domain, the receiver operating characteristic is a commonly applied metric of performance of classifiers. However, the receiver operating characteristic can provide a distorted view of classifiers' ability to predict short-term mortality due to the overwhelmingly greater proportion of patients who survive, that is, imbalanced data. This study illustrates the ambiguity of the receiver operating characteristic in evaluating 2 classifiers of 90-day left ventricular assist device mortality and introduces the precision-recall curve as a supplemental metric that is more representative of left ventricular assist device classifiers in predicting the minority class. METHODS This study compared the receiver operating characteristic and precision-recall curve for 2 classifiers of 90-day left ventricular assist device mortality, HeartMate Risk Score and Random Forest, for 800 patients (test group) recorded in the Interagency Registry for Mechanically Assisted Circulatory Support who received a continuous-flow left ventricular assist device between 2006 and 2016 (mean age, 59 years; 146 female vs 654 male patients), in whom the 90-day mortality rate was only 8%. RESULTS The receiver operating characteristic indicates similar performance of the Random Forest and HeartMate Risk Score classifiers, with areas under the curve of 0.77 and 0.63, respectively. This contrasts with their precision-recall curves, with areas under the curve of 0.43 versus 0.16 for Random Forest and HeartMate Risk Score, respectively. The precision-recall curve for HeartMate Risk Score showed that precision rapidly decreased to only 10% as sensitivity increased slightly. CONCLUSIONS The receiver operating characteristic can portray an overly optimistic performance of a classifier or risk score when applied to imbalanced data. The precision-recall curve provides better insight into the performance of a classifier by focusing on the minority class.
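The gap between the two curves on imbalanced data can be reproduced on a toy example. The sketch below is not the authors' analysis: the labels and scores are made up (5 events among 105 cases), the helpers `roc_auc` and `average_precision` are hypothetical names written for illustration, average precision stands in as a step-wise summary of the area under the precision-recall curve, and scores are assumed to be distinct. Placing the 5 events just below 9 non-events gives a high ROC AUC but a low precision-recall summary.

```python
def roc_auc(labels, scores):
    """ROC AUC computed as the probability that a randomly chosen event
    outscores a randomly chosen non-event (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve: the mean of the
    precision values reached each time another event is recovered.
    Assumes distinct scores (no special handling of ties)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# Hypothetical imbalanced data, listed from highest to lowest score:
# 9 non-events, then 5 events, then 91 non-events (prevalence 5/105, about 4.8%).
labels = [0] * 9 + [1] * 5 + [0] * 91
scores = [1.0 - i / 200 for i in range(len(labels))]  # strictly decreasing

print(f"ROC AUC: {roc_auc(labels, scores):.3f}")                # 0.910
print(f"PR summary (AP): {average_precision(labels, scores):.3f}")  # 0.239
```

Each event outranks 91 of the 100 non-events, so the ROC view looks strong, yet at every recall step at least 9 of the flagged cases are non-events, which the precision-recall view exposes. Note also that the no-skill baseline differs: about 0.5 for ROC AUC, but roughly the event rate for the precision-recall curve.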
Affiliation(s)
- Faezeh Movahedi
- Swanson School of Engineering, University of Pittsburgh, Pittsburgh, Pa
- Rema Padman
- Heinz College, Carnegie Mellon University, Pittsburgh, Pa
- James F Antaki
- Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY
19
Lyu J, Ishwaran H. Commentary: To classify means to choose a threshold. J Thorac Cardiovasc Surg 2023; 165:1443-1445. [PMID: 34426008 PMCID: PMC8821726 DOI: 10.1016/j.jtcvs.2021.08.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2021] [Revised: 07/30/2021] [Accepted: 08/03/2021] [Indexed: 11/29/2022]
Abstract
Classification requires a threshold; however, metrics such as the C-statistic and AUC obscure this. Luckily, there is a sensible strategy for thresholding imbalanced data. The prevalence threshold yields accurate classification without dangerous data snooping.
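A minimal sketch of thresholding at the prevalence, assuming (as we read the commentary) that the cut-off is the observed event rate in the training labels and that a case is called an event when its predicted probability is at least that rate. The helper names, the 8% training event rate, and the five predicted probabilities are all hypothetical. The point of the illustration is that the cut-off is fixed from the training class balance rather than tuned on test data, and that on imbalanced data a default 0.5 cut-off can miss every event.

```python
def prevalence_threshold(train_labels):
    """Event rate in the training labels, used as the classification cut-off."""
    return sum(train_labels) / len(train_labels)


def classify(probabilities, threshold):
    """Label a case as an event (1) when its predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def sensitivity_specificity(truth, predicted):
    """Fraction of true events flagged, and fraction of true non-events left unflagged."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    return tp / n_pos, tn / n_neg


# Hypothetical training set with an 8% event rate.
train_labels = [1] * 8 + [0] * 92
cutoff = prevalence_threshold(train_labels)  # 0.08

# Hypothetical predicted probabilities for five new cases and their true outcomes.
probs = [0.30, 0.12, 0.09, 0.05, 0.03]
truth = [1, 1, 0, 0, 0]

for name, t in [("0.5 cut-off", 0.5), ("prevalence cut-off", cutoff)]:
    sens, spec = sensitivity_specificity(truth, classify(probs, t))
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
# 0.5 cut-off: sensitivity=0.00, specificity=1.00
# prevalence cut-off: sensitivity=1.00, specificity=0.67
```

Here the 0.5 cut-off flags no one, while the prevalence cut-off recovers both events at the cost of one false positive.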
Affiliation(s)
- Jiangnan Lyu
- Division of Biostatistics, Miller School of Medicine, University of Miami, Miami, Fla
- Hemant Ishwaran
- Division of Biostatistics, Miller School of Medicine, University of Miami, Miami, Fla
20
Ortega Vázquez C, vanden Broucke S, De Weerdt J. Hellinger distance decision trees for PU learning in imbalanced data sets. Mach Learn 2023. [DOI: 10.1007/s10994-023-06323-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023]
21
Moon J, Lee JH, Roh J, Lee DH, Ha EJ. Contrast-enhanced CT-based Radiomics for the Differentiation of Anaplastic or Poorly Differentiated Thyroid Carcinoma from Differentiated Thyroid Carcinoma: A Pilot Study. Sci Rep 2023; 13:4562. [PMID: 36941287 PMCID: PMC10027684 DOI: 10.1038/s41598-023-31212-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/11/2022] [Accepted: 03/08/2023] [Indexed: 03/23/2023]
Abstract
Differential diagnosis of anaplastic thyroid carcinoma/poorly differentiated thyroid carcinoma (ATC/PDTC) from differentiated thyroid carcinoma (DTC) is crucial in patients with large thyroid malignancies. This study created a predictive model using radiomics feature analysis to differentiate ATC/PDTC from DTC. We compared the clinicoradiological characteristics and radiomics features extracted from a volume of interest on contrast-enhanced computed tomography (CT) between the groups. Variable importance was estimated by modeling with the random forest quantile classifier. The model with radiomics features alone achieved an area under the receiver operating characteristic curve (AUROC) of 0.883. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were 81.7%, 93.3%, 97.7%, 64.5%, and 84.6%, respectively, for the differential diagnosis of ATC/PDTC and DTC. The model with both radiomics and clinicoradiological information showed an AUROC of 0.908, with sensitivity, specificity, PPV, NPV, and accuracy of 82.9%, 97.6%, 99.2%, 67.1%, and 86.5%, respectively. Distant metastasis, moment, shape, age, and gray-level size zone matrix features were the most useful factors for differential diagnosis. Therefore, we concluded that a radiomics approach based on contrast-enhanced CT features can potentially differentiate ATC/PDTC from DTC in patients with large thyroid malignancies.
Affiliation(s)
- Jayoung Moon
- Department of Radiology, Ajou University School of Medicine, Wonchon-dong, Yeongtong-gu, Suwon, 16499, Korea
- Jeong Hoon Lee
- Department of Radiology, Ajou University School of Medicine, Wonchon-dong, Yeongtong-gu, Suwon, 16499, Korea
- Jin Roh
- Department of Pathology, Ajou University School of Medicine, Wonchon-dong, Yeongtong-gu, Suwon, 16499, Korea
- Da Hyun Lee
- Department of Radiology, Ajou University School of Medicine, Wonchon-dong, Yeongtong-gu, Suwon, 16499, Korea
- Eun Ju Ha
- Department of Radiology, Ajou University School of Medicine, Wonchon-dong, Yeongtong-gu, Suwon, 16499, Korea.

22
Chen Z, Li J, Sun Y, Wang C, Yang W, Ma M, Luo Z, Yang K, Chen L. A novel predictive model for poor in-hospital outcomes in patients with acute kidney injury after cardiac surgery. J Thorac Cardiovasc Surg 2023; 165:1180-1191.e7. [PMID: 34112503 DOI: 10.1016/j.jtcvs.2021.04.085] [Received: 08/14/2020] [Revised: 04/13/2021] [Accepted: 04/20/2021] [Indexed: 12/31/2022]
Abstract
OBJECTIVE Patients with cardiac surgery-associated acute kidney injury are at risk of renal replacement therapy and in-hospital death. We aimed to develop and validate a novel predictive model for poor in-hospital outcomes among patients with cardiac surgery-associated acute kidney injury. METHODS A total of 196 patients diagnosed with cardiac surgery-associated acute kidney injury were enrolled as the training cohort, and 32 blood cytokines were measured. Least absolute shrinkage and selection operator (LASSO) regression and a random forest quantile classifier were used to identify the key blood predictors of the in-hospital composite outcome (requirement for renal replacement therapy or in-hospital death). The logistic regression model incorporating the selected predictors was validated internally using bootstrapping and externally in an independent cohort (n = 52). RESULTS The change in serum creatinine (delta serum creatinine), interleukin 16, and interleukin 8 were selected as key predictors of the composite outcome. The logistic regression model incorporating interleukin 16, interleukin 8, and delta serum creatinine performed best, with high discrimination (area under the receiver operating characteristic curve: 0.947; area under the precision-recall curve: 0.809) and excellent calibration (Brier score: 0.056; Hosmer-Lemeshow test P = .651). Applying the model to the validation cohort yielded good discrimination. A nomogram was generated for clinical use, and decision curve analysis showed that the new model provides greater net benefit than delta serum creatinine alone. CONCLUSIONS We developed and validated a promising predictive model for in-hospital composite outcomes among patients with cardiac surgery-associated acute kidney injury and showed that interleukin 16 and interleukin 8 are useful predictors for improving risk stratification for poor in-hospital outcomes in this population.
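The Brier score cited above is the mean squared difference between each predicted probability and the observed 0/1 outcome, so lower values indicate more accurate probabilistic predictions. A minimal sketch with made-up predictions (not the study's data); the function name is our own:

```python
def brier_score(probabilities: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted event probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


# Hypothetical predicted risks of the composite outcome (RRT or in-hospital death).
predicted = [0.9, 0.1, 0.2, 0.7]
observed = [1, 0, 0, 1]
print(round(brier_score(predicted, observed), 4))  # 0.0375
```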
Affiliation(s)
- Zhongli Chen
- State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China; Department of Vascular & Cardiology, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China
- Jiawei Li
- Department of Intensive Care Medicine, Zhongshan Hospital, Fudan University, Shanghai, China
- Yiping Sun
- Department of Cardiac Surgery, Beijing Institute of Heart, Lung and Blood Vessel Diseases, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Chuangshi Wang
- Medical Research and Biometrics Center, Fuwai Hospital, National Center for Cardiovascular Diseases, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
- Wenbo Yang
- Department of Vascular & Cardiology, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China
- Mingyang Ma
- National Computer System Engineering Research Institute of China, Beijing, China
- Zhe Luo
- Department of Intensive Care Medicine, Zhongshan Hospital, Fudan University, Shanghai, China
- Ke Yang
- Department of Vascular & Cardiology, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China
- Liang Chen
- State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.

23
Gholamzadeh M, Abtahi H, Safdari R. Comparison of different machine learning algorithms to classify patients suspected of having sepsis infection in the intensive care unit. INFORMATICS IN MEDICINE UNLOCKED 2023. [DOI: 10.1016/j.imu.2023.101236] [Indexed: 03/31/2023]

24
Langenbucher A, Szentmáry N, Cayless A, Wendelstein J, Hoffmann P. Preconditioning of clinical data for intraocular lens formula constant optimisation using Random Forest Quantile Regression Trees. Z Med Phys 2023:S0939-3889(22)00129-5. [PMID: 36813595 DOI: 10.1016/j.zemedi.2022.11.009] [Received: 07/17/2022] [Revised: 10/31/2022] [Accepted: 11/21/2022] [Indexed: 02/22/2023]
Abstract
PURPOSE To implement a fully data-driven strategy for identifying outliers in clinical datasets used for formula constant optimisation, so that formula-predicted refraction after cataract surgery is accurate, and to assess the capabilities of this outlier detection method. METHODS Two clinical datasets (DS1/DS2: N = 888/403) of eyes treated with a monofocal aspherical intraocular lens (Hoya XY1/Johnson&Johnson Vision Z9003), containing preoperative biometric data, the power of the implanted lens, and the postoperative spherical equivalent (SEQ), were transferred to us for formula constant optimisation. The original datasets were used to generate baseline formula constants. A random forest quantile regression algorithm was set up using bootstrap resampling with replacement. Quantile regression trees were grown, and the 25% and 75% quantiles and the interquartile range (IQR) were extracted from SEQ and the formula-predicted refraction (REF) for the SRKT, Haigis and Castrop formulae. Fences were defined from the quantiles, and data points outside the fences were marked and removed as outliers before the formula constants were recalculated. RESULTS NB = 1000 bootstrap samples were derived from both datasets, and random forest quantile regression trees were grown to model SEQ versus REF and to estimate the median and the 25% and 75% quantiles. The fences were defined as running from the 25% quantile - 1.5·IQR to the 75% quantile + 1.5·IQR, and data points outside the fences were marked as outliers. In total, 25/27/32 (DS1) and 4/5/4 (DS2) data points were identified as outliers for the SRKT/Haigis/Castrop formulae, respectively. The root mean squared formula prediction errors (DS1/DS2) were slightly reduced from 0.4370/0.4449 dpt (SRKT), 0.3625/0.4056 dpt (Haigis), and 0.3376/0.3532 dpt (Castrop) to 0.4271/0.4348 dpt, 0.3528/0.3952 dpt, and 0.3277/0.3432 dpt, respectively.
CONCLUSION We showed that a fully data-driven outlier identification strategy acting in the response space is achievable with random forest quantile regression trees. In a real-life scenario, this strategy has to be complemented by an outlier identification method acting in the parameter space to properly qualify datasets prior to formula constant optimisation.
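The fence rule described above is the familiar 1.5·IQR rule applied to the 25% and 75% quantiles. In the study those quantiles come from random forest quantile regression over bootstrap samples; the sketch below deliberately substitutes plain empirical quantiles (Python's `statistics.quantiles`) on a single vector of values, so it illustrates only the fencing step, not the quantile-forest step. The function names and the data (hypothetical differences between achieved and predicted refraction, in dioptres) are assumptions for illustration.

```python
import statistics


def iqr_fences(values: list[float], k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences: Q25 - k*IQR and Q75 + k*IQR."""
    q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    return q25 - k * iqr, q75 + k * iqr


def flag_outliers(values: list[float], k: float = 1.5) -> list[int]:
    """Indices of values lying outside the fences (candidates for removal)."""
    lower, upper = iqr_fences(values, k)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical refraction differences in dioptres (not from DS1 or DS2).
diffs = [0.1, -0.3, 0.25, 1.8, 0.0, 0.2, -0.1, 0.3]
print(flag_outliers(diffs))  # [3]
```

The flagged points would then be dropped before the formula constants are re-optimised on the remaining data.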
Affiliation(s)
- Achim Langenbucher
- Department of Experimental Ophthalmology, Saarland University, Homburg/Saar, Germany.
- Nóra Szentmáry
- Dr. Rolf M. Schwiete Center for Limbal Stem Cell and Aniridia Research, Saarland University, Homburg/Saar, Germany; Department of Ophthalmology, Semmelweis-University, Budapest, Hungary
- Alan Cayless
- School of Physical Sciences, The Open University, Milton Keynes, United Kingdom
- Jascha Wendelstein
- Department of Experimental Ophthalmology, Saarland University, Homburg/Saar, Germany; Department of Ophthalmology, Johannes Kepler University Linz, Austria
- Peter Hoffmann
- Augen- und Laserklinik Castrop-Rauxel, Castrop-Rauxel, Germany

25
Ha EJ, Lee JH, Lee DH, Na DG, Kim JH. Development of a machine learning-based fine-grained risk stratification system for thyroid nodules using predefined clinicoradiological features. Eur Radiol 2023; 33:3211-3221. [PMID: 36600122 DOI: 10.1007/s00330-022-09376-0] [Received: 05/20/2022] [Revised: 09/07/2022] [Accepted: 12/11/2022] [Indexed: 01/05/2023]
Abstract
OBJECTIVE We constructed and validated a machine learning-based malignancy risk estimation model using predefined clinicoradiological features, and evaluated its clinical utility for the management of thyroid nodules. METHODS In total, 5708 thyroid nodules (4597 benign, 1111 malignant) were collected from 5081 consecutive patients treated at 26 institutions. Seventeen experienced radiologists evaluated nodule characteristics on ultrasonographic images. Eight predictive models were used to stratify the thyroid nodules according to malignancy risk; model performance was assessed via nested 10-fold cross-validation. The best-performing algorithm was externally validated using data for 454 thyroid nodules from a tertiary hospital, then compared with radiologists' Thyroid Imaging Reporting and Data System (TIRADS)-based interpretations (American College of Radiology, European and Korean TIRADS, and AACE/ACE/AME guidelines). RESULTS The areas under the receiver operating characteristic curve (AUROC) of the algorithms ranged from 0.773 to 0.862. The sensitivities, specificities, positive predictive values, and negative predictive values of the best-performing models were 74.1-76.6%, 80.9-83.4%, 49.2-51.9%, and 93.0-93.5%, respectively. For the external validation set, the corresponding ElasticNet values were 83.2%, 89.2%, 81.8%, and 90.1%, and the TIRADS values were 66.5-85.0%, 61.3-80.8%, 45.9-72.1%, and 81.5-90.3%, respectively. The new model exhibited a significantly higher AUROC and specificity than the TIRADS risk stratification, with similar sensitivity. CONCLUSION We developed a reliable machine learning-based predictive model that demonstrated enhanced specificity when stratifying thyroid nodules according to malignancy risk. This system will contribute to improved personalized management of thyroid nodules.
KEY POINTS • The area under the receiver operating characteristic (AUROC) curve, sensitivity, and specificity of our model were 0.914, 83.2%, and 89.2%, respectively (derived using the validation dataset). • Compared to the TIRADS values, the AUROC and specificity are significantly higher, while the sensitivity is similar. • An interactive version of our AI algorithm is at http://tirads.cdss.co.kr .
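Nested cross-validation, mentioned in the Methods above, keeps hyperparameter tuning inside each outer training split, so the outer test folds give a less optimistic performance estimate. The sketch below shows only the index bookkeeping under simplifying assumptions: folds are formed by deterministic interleaving rather than shuffled or stratified sampling, and `score` is a hypothetical caller-supplied function standing in for fitting and evaluating one of the study's models.

```python
from typing import Callable, Iterator, Sequence


def k_fold(indices: Sequence[int], k: int) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) splits; fold i holds every k-th index starting at position i."""
    folds = [list(indices[i::k]) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def nested_cv(
    n: int,
    candidates: Sequence[float],
    score: Callable[[float, list[int], list[int]], float],
    k_outer: int = 10,
    k_inner: int = 10,
) -> list[float]:
    """Return one score per outer fold.

    The inner loop sees only the outer-training indices, so the outer test
    fold never influences which candidate hyperparameter is chosen.
    """
    outer_scores = []
    for outer_train, outer_test in k_fold(list(range(n)), k_outer):
        inner_splits = list(k_fold(outer_train, k_inner))

        def inner_mean(candidate: float) -> float:
            return sum(score(candidate, tr, te) for tr, te in inner_splits) / len(inner_splits)

        best = max(candidates, key=inner_mean)
        outer_scores.append(score(best, outer_train, outer_test))
    return outer_scores


# Hypothetical scorer: checks for train/test leakage and prefers candidate 0.5.
def toy_score(candidate: float, train: list[int], test: list[int]) -> float:
    assert not set(train) & set(test)
    return 1.0 - abs(candidate - 0.5)


print(nested_cv(100, [0.1, 0.5, 1.0], toy_score))  # ten outer scores, each 1.0
```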
Affiliation(s)
- Eun Ju Ha
- Department of Radiology, Ajou University School of Medicine, Wonchon-Dong, Yeongtong-Gu, Suwon, 16499, South Korea
- Jeong Hoon Lee
- Department of Radiology, Ajou University School of Medicine, Wonchon-Dong, Yeongtong-Gu, Suwon, 16499, South Korea
- Da Hyun Lee
- Department of Radiology, Ajou University School of Medicine, Wonchon-Dong, Yeongtong-Gu, Suwon, 16499, South Korea
- Dong Gyu Na
- Department of Radiology, GangNeung Asan Hospital, University of Ulsan College of Medicine, Gangneung-si, Gangwon-do, 25440, South Korea
- Ji-Hoon Kim
- Department of Radiology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, South Korea.

26
Dai Q, Liu J, Yang J. Multi-armed bandit heterogeneous ensemble learning for imbalanced data. Comput Intell 2022. [DOI: 10.1111/coin.12566] [Indexed: 12/23/2022]
Affiliation(s)
- Qi Dai
- Department of Automation, College of Information Science and Engineering, Beijing, China
- Jian‐wei Liu
- Department of Automation, College of Information Science and Engineering, Beijing, China
- Jiapeng Yang
- College of Science, North China University of Science and Technology, Tangshan, China

27
A statistical learning framework for predicting left ventricular ejection fraction based on glutathione peroxidase-3 level in ischemic heart disease. Comput Biol Med 2022; 149:105929. [DOI: 10.1016/j.compbiomed.2022.105929] [Received: 02/10/2022] [Revised: 07/10/2022] [Accepted: 07/30/2022] [Indexed: 11/18/2022]

28
A Hybrid Algorithm-Level Ensemble Model for Imbalanced Credit Default Prediction in the Energy Industry. ENERGIES 2022. [DOI: 10.3390/en15145206] [Indexed: 02/06/2023]
Abstract
Credit default prediction for the energy industry is essential to promoting the healthy development of the energy industry in China. While previous studies have constructed various credit default prediction models with strong performance, the class-imbalance problem in credit default datasets cannot be ignored: the number of credit default cases is usually much smaller than the number of non-default cases. To address this problem, we proposed a novel CT-XGBoost model, which augments XGBoost with two algorithm-level methods for handling class imbalance: a cost-sensitive strategy and a threshold method. On a class-imbalanced credit default dataset of energy corporations in western China, the CT-XGBoost model achieves better performance than conventional models. The results indicate that the proposed model can efficiently alleviate the inherent class-imbalance problem in credit default data. Moreover, we analyze how prediction performance is influenced by different parameter settings in the cost-sensitive strategy and threshold method. This study can help market investors and regulators precisely assess credit risk in the energy industry and provides theoretical guidance for solving the class-imbalance problem in credit default prediction.
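The abstract does not specify how CT-XGBoost implements its two methods, so the sketch below shows only the generic ideas the names refer to: a cost-sensitive loss that up-weights errors on the minority (default) class, and a threshold method that picks the decision cut-off minimising misclassification cost instead of using 0.5. All names, costs, and probabilities are hypothetical; this is not the authors' implementation. (In XGBoost itself, comparable per-class weighting is exposed through the `scale_pos_weight` parameter.)

```python
import math


def weighted_log_loss(probs: list[float], labels: list[int], pos_weight: float) -> float:
    """Mean binary cross-entropy where each default (label 1) counts pos_weight times."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip so log() stays finite
        if y == 1:
            total -= pos_weight * math.log(p)
        else:
            total -= math.log(1 - p)
    return total / len(labels)


def pick_threshold(probs: list[float], labels: list[int],
                   cost_fn: float, cost_fp: float) -> float:
    """Return the grid cut-off (0.01 to 0.99) with the lowest total misclassification cost."""
    def total_cost(t: float) -> float:
        missed = sum(1 for p, y in zip(probs, labels) if y == 1 and p < t)
        false_alarms = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= t)
        return cost_fn * missed + cost_fp * false_alarms

    grid = [i / 100 for i in range(1, 100)]
    return min(grid, key=total_cost)  # first (lowest) cut-off among ties


probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60]  # hypothetical model outputs
labels = [0, 0, 0, 1, 0, 1]                    # 1 = default (minority class)
print(pick_threshold(probs, labels, cost_fn=5.0, cost_fp=1.0))  # 0.21
```

With a missed default costing five times as much as a false alarm, the chosen cut-off falls well below 0.5, accepting one false alarm in exchange for catching both defaults.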

29
Lee JH, Ha EJ, Lee DH, Han M, Park JH, Kim JH. Clinicoradiological Characteristics in the Differential Diagnosis of Follicular-Patterned Lesions of the Thyroid: A Multicenter Cohort Study. Korean J Radiol 2022; 23:763-772. [PMID: 35695317 PMCID: PMC9240300 DOI: 10.3348/kjr.2022.0079] [Received: 02/10/2022] [Revised: 04/20/2022] [Accepted: 04/26/2022] [Indexed: 11/15/2022]
Abstract
OBJECTIVE Preoperative differential diagnosis of follicular-patterned lesions is challenging. This multicenter cohort study investigated the clinicoradiological characteristics relevant to the differential diagnosis of such lesions. MATERIALS AND METHODS From June to September 2015, 4787 thyroid nodules (≥ 1.0 cm) with a final diagnosis of benign follicular nodule (BN, n = 4461), follicular adenoma (FA, n = 136), follicular carcinoma (FC, n = 62), or follicular variant of papillary thyroid carcinoma (FVPTC, n = 128) collected from 26 institutions were analyzed. The clinicoradiological characteristics of the lesions were compared among the different histological types using multivariable logistic regression analyses. The relative importance of the characteristics that distinguished histological types was determined using a random forest algorithm. RESULTS Compared to BN (as the control group), the distinguishing features of follicular-patterned neoplasms (FA, FC, and FVPTC) were patient's age (odds ratio [OR], 0.969 per 1-year increase), lesion diameter (OR, 1.054 per 1-mm increase), presence of solid composition (OR, 2.255), presence of hypoechogenicity (OR, 2.181), and presence of halo (OR, 1.761) (all p < 0.05). Compared to FA (as the control), FC differed with respect to lesion diameter (OR, 1.040 per 1-mm increase) and rim calcifications (OR, 17.054), while FVPTC differed with respect to patient age (OR, 0.966 per 1-year increase), lesion diameter (OR, 0.975 per 1-mm increase), macrocalcifications (OR, 3.647), and non-smooth margins (OR, 2.538) (all p < 0.05). The five most important features for the differential diagnosis of follicular-patterned neoplasms (FA, FC, and FVPTC) from BN were maximal lesion diameter, composition, echogenicity, orientation, and patient age. The most important features distinguishing FC and FVPTC from FA were rim calcifications and macrocalcifications, respectively.
CONCLUSION Although follicular-patterned lesions have overlapping clinical and radiological features, the distinguishing features identified in our large clinical cohort may provide valuable information for preoperative distinction between them and decision-making regarding their management.
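Because the odds ratios above are reported per 1-mm or per 1-year increase from a logistic model, they compound multiplicatively over larger increments, assuming the log-odds are linear in the predictor and the other covariates are held fixed. A worked example for the neoplasm-versus-BN comparison (our arithmetic, not figures reported by the authors):

```latex
\begin{aligned}
\mathrm{OR}_{10\,\text{mm}} &= 1.054^{10} = e^{10\ln 1.054} \approx e^{0.526} \approx 1.69,\\
\mathrm{OR}_{10\,\text{yr}} &= 0.969^{10} = e^{10\ln 0.969} \approx e^{-0.315} \approx 0.73.
\end{aligned}
```

Under that assumption, a nodule 10 mm larger has roughly 1.7 times the odds of being a follicular-patterned neoplasm rather than BN, and each additional decade of patient age multiplies those odds by about 0.73.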
Affiliation(s)
- Jeong Hoon Lee
- Department of Radiology, Ajou University School of Medicine, Suwon, Korea
- Eun Ju Ha
- Department of Radiology, Ajou University School of Medicine, Suwon, Korea.
- Da Hyun Lee
- Department of Radiology, Ajou University School of Medicine, Suwon, Korea
- Miran Han
- Department of Radiology, Ajou University School of Medicine, Suwon, Korea
- Jung Hyun Park
- Department of Radiology, Ajou University School of Medicine, Suwon, Korea
- Ji-Hoon Kim
- Department of Radiology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea

30
Zhou F, Gao S, Ni L, Pavlovski M, Dong Q, Obradovic Z, Qian W. Dynamic self-paced sampling ensemble for highly imbalanced and class-overlapped data classification. Data Min Knowl Discov 2022. [DOI: 10.1007/s10618-022-00838-z] [Indexed: 11/03/2022]

31
Hayer SS, Casanova-Higes A, Paladino E, Elnekave E, Nault A, Johnson T, Bender J, Perez A, Alvarez J. Global Distribution of Extended Spectrum Cephalosporin and Carbapenem Resistance and Associated Resistance Markers in Escherichia coli of Swine Origin - A Systematic Review and Meta-Analysis. Front Microbiol 2022; 13:853810. [PMID: 35620091 PMCID: PMC9127762 DOI: 10.3389/fmicb.2022.853810] [Received: 01/13/2022] [Accepted: 04/20/2022] [Indexed: 11/13/2022]
Abstract
Third-generation cephalosporins and carbapenems are considered critically important antimicrobials in human medicine. Food animals such as swine can act as reservoirs of antimicrobial resistance (AMR) genes and of bacteria resistant to these antimicrobial classes, and potential dissemination of AMR genes or resistant bacteria from pigs to humans is an ongoing public health threat. The objectives of this systematic review and meta-analysis were to: (1) estimate the global proportion and animal-level prevalence of swine E. coli phenotypically resistant to third-generation cephalosporins (3GCs) and carbapenems at the country level; and (2) measure the abundance and global distribution of the genetic mechanisms that confer resistance to these antimicrobial classes in these E. coli isolates. Articles from four databases (CAB Abstracts, PubMed/MEDLINE, PubAg, and Web of Science) were screened to extract relevant data. Overall, the proportion of E. coli resistant to 3GCs was lower in Australia, Europe, and North America than in Asian countries. Globally, <5% of all E. coli were carbapenem-resistant. Fecal carriage rates (animal-level prevalence) were consistently many-fold higher than the pooled proportion of resistance in E. coli isolates. blaCTX–M genes were the most common 3GC resistance genes globally, except in North America, where blaCMY genes predominated. No single blaCTX–M subtype dominated globally; the dominant subtypes varied by continent. A wide variety of carbapenem-resistance genes (blaNDM–, VIM–, IMP–, OXA–48, and KPC–) were identified circulating in pig populations globally, albeit at very low frequencies. However, considerable statistical heterogeneity and a critical lack of metadata hinder accurate estimation of the prevalence of phenotypic and genotypic resistance to these antimicrobials.
The comparatively frequent occurrence of 3GC resistance and the emergence of carbapenem resistance in certain countries underline the urgent need for improved AMR surveillance in swine production systems in these countries.
Affiliation(s)
- Shivdeep Singh Hayer
- Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota-Twin Cities, St. Paul, MN, United States; Department of Biology, College of Arts and Sciences, University of Nebraska Omaha, Omaha, NE, United States
- Alejandro Casanova-Higes
- Departamento de Patología Animal, Facultad de Veterinaria, Universidad de Zaragoza, Zaragoza, Spain
- Eliana Paladino
- Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota-Twin Cities, St. Paul, MN, United States
- Ehud Elnekave
- Koret School of Veterinary Medicine, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Jerusalem, Israel
- Andre Nault
- Health Sciences Library, University of Minnesota-Twin Cities, Minneapolis, MN, United States
- Timothy Johnson
- Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota-Twin Cities, St. Paul, MN, United States
- Jeff Bender
- School of Public Health, University of Minnesota-Twin Cities, Minneapolis, MN, United States
- Andres Perez
- Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota-Twin Cities, St. Paul, MN, United States
- Julio Alvarez
- Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota-Twin Cities, St. Paul, MN, United States; VISAVET Health Surveillance Centre, Universidad Complutense Madrid, Madrid, Spain; Department of Animal Health, Facultad de Veterinaria, Universidad Complutense Madrid, Madrid, Spain

32
Balbuena LD, Baetz M, Sexton JA, Harder D, Feng CX, Boctor K, LaPointe C, Letwiniuk E, Shamloo A, Ishwaran H, John A, Brantsæter AL. Identifying long-term and imminent suicide predictors in a general population and a clinical sample with machine learning. BMC Psychiatry 2022; 22:120. [PMID: 35168594 PMCID: PMC8848909 DOI: 10.1186/s12888-022-03702-y] [Received: 04/03/2021] [Accepted: 01/12/2022] [Indexed: 12/03/2022]
Abstract
BACKGROUND Machine learning (ML) is increasingly used to predict suicide deaths, but its value for suicide prevention has not been established. Our first objective was to identify risk and protective factors in a general population. Our second objective was to identify factors indicating imminent suicide risk. METHODS We used survival and ML models to identify lifetime predictors using the Cohort of Norway (n=173,275) and hospital diagnoses in a Saskatoon clinical sample (n=12,614). The mean follow-up times were 17 years and 3 years for the Cohort of Norway and Saskatoon, respectively. People in the clinical sample had a longitudinal record of hospital visits grouped into six-month intervals. Models were developed in a training set and then used to predict survival probabilities in held-out test data. RESULTS In the general population, a higher proportion of low-income residents in a county, mood symptoms, and daily smoking increased the risk of dying by suicide in both genders. In the clinical sample, the only predictors identified were male gender and older age. CONCLUSION Suicide prevention probably requires individual actions with governmental incentives. The prediction of imminent suicide remains highly challenging, but machine learning can identify early prevention targets.
Affiliation(s)
- Lloyd D Balbuena
- Department of Psychiatry, University of Saskatchewan, Saskatoon, Canada.
- Marilyn Baetz
- College of Medicine, University of Saskatchewan, Saskatoon, Canada
- Douglas Harder
- Mental Health & Addictions Services, Saskatchewan Health Authority, Saskatoon, Canada
- Cindy Xin Feng
- Department of Community Health and Epidemiology, Dalhousie University, Halifax, Canada
- Kerstina Boctor
- Department of Psychiatry, University of Saskatchewan, Saskatoon, Canada
- Candace LaPointe
- Mental Health & Addictions Services, Saskatchewan Health Authority, Saskatoon, Canada
- Elizabeth Letwiniuk
- Mental Health & Addictions Services, Saskatchewan Health Authority, Saskatoon, Canada
- Arash Shamloo
- Department of Psychiatry, University of Saskatchewan, Saskatoon, Canada
- Ann John
- Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Anne Lise Brantsæter
- Department of Environmental Health, Norwegian Institute of Public Health, Oslo, Norway

33
Data Preprocessing Combination to Improve the Performance of Quality Classification in the Manufacturing Process. ELECTRONICS 2022. [DOI: 10.3390/electronics11030477] [Indexed: 02/04/2023]
Abstract
The recent introduction of smart manufacturing, also called the ‘smart factory’, has made it possible to collect large volumes of multivariate data from Internet of Things devices and sensors. Quality control using these data can play a major role in preventing unexpected time and economic losses in the manufacturing process. However, the information that can be extracted about the manufacturing process is limited when the data contain missing values and are class-imbalanced. In this study, we improve quality classification performance by addressing the missing values and data imbalance that can occur in the manufacturing process. The study proceeds through data cleansing, data substitution, data scaling, data balancing, and evaluation. Five data balancing methods and a Generative Adversarial Network (GAN) were used for data imbalance processing. The proposed schemes achieved an F1 score 0.5 higher than that of previous studies using the same data. The data preprocessing combination proposed in this study is intended to address the missing values and imbalances that occur in the manufacturing process.
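The abstract names the preprocessing stages (substitution, scaling, balancing) but not the specific techniques, so the sketch below uses three simple stand-ins: column-mean imputation, min-max scaling, and random oversampling of the minority class. It does not reproduce the study's five balancing methods or its GAN; the function names and sensor readings are hypothetical.

```python
import random
from typing import Optional


def impute_column_means(rows: list[list[Optional[float]]]) -> list[list[float]]:
    """Replace each missing entry (None) with the mean of that column's observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if row[j] is None else row[j] for j in range(n_cols)] for row in rows]


def min_max_scale(rows: list[list[float]]) -> list[list[float]]:
    """Rescale every column to [0, 1]; a constant column maps to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        [(v - lo) / span if span else 0.0 for v, lo, span in zip(row, lows, spans)]
        for row in rows
    ]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = {y: labels.count(y) for y in set(labels)}
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for y, count in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == y]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(y)
    return out_rows, out_labels


# Hypothetical sensor readings; label 1 = defective product (minority class).
raw = [[1.0, None], [2.0, 10.0], [None, 20.0], [4.0, 30.0]]
labels = [0, 0, 0, 1]
clean = min_max_scale(impute_column_means(raw))
balanced_rows, balanced_labels = random_oversample(clean, labels)
print(sorted(balanced_labels))  # [0, 0, 0, 1, 1, 1]
```

In a real evaluation, the imputation means, scaling ranges, and oversampling would be derived from the training split only, so that no information from the test data leaks into preprocessing.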

34

35
Rout S, Mallick PK, Mishra D. DRBF-DS: Double RBF Kernel-Based Deep Sampling with CNNs to Handle Complex Imbalanced Datasets. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2022. [DOI: 10.1007/s13369-021-06480-z] [Indexed: 11/02/2022]

36
Polineni S, Shastri O, Bagchi A, Gnanakumar G, Rasamsetti S, Sundaravadivel P. MOSQUITO EDGE: An Edge-Intelligent Real-Time Mosquito Threat Prediction Using an IoT-Enabled Hardware System. SENSORS 2022; 22:s22020695. [PMID: 35062653 PMCID: PMC8780188 DOI: 10.3390/s22020695] [Received: 11/22/2021] [Revised: 01/06/2022] [Accepted: 01/07/2022] [Indexed: 11/22/2022]
Abstract
Species distribution models (SDMs) that use climate variables to make binary predictions are effective tools for niche prediction in current and future climate scenarios. In this study, a Hutchinson hypervolume is defined with temperature, humidity, air pressure, precipitation, and cloud cover climate vectors collected from the National Oceanic and Atmospheric Administration (NOAA), which were matched to mosquito presence and absence points extracted from NASA’s citizen science platform GLOBE Observer and from the National Ecological Observatory Network. A binary-classification Random Forest model with 86% accuracy was created to predict mosquito threat. Given a location and date as input, the model produces a threat level based on the number of decision trees that vote for a presence label. The feature importance chart and regression show a positive, linear correlation between humidity and mosquito threat, and between temperature and mosquito threat below a threshold of 28 °C. Consistent with the statistical analysis and established ecological knowledge, high-threat clusters were found in warm, humid regions and low-threat clusters in cold, dry regions. With the model running in the cloud and within ArcGIS Dashboard, accurate and granular real-time threat level predictions can be made at any latitude and longitude. A device that leverages Global Positioning System (GPS) smartphone technology and the Internet of Things (IoT) to collect and analyze data at the edge was developed. The data collected by the edge device, together with the corresponding date and location, are automatically fed into the Random Forest model to provide users with a real-time threat level prediction. This inexpensive hardware can be used in developing countries threatened by vector-borne diseases or in remote areas without cloud connectivity. Such devices can be linked with citizen science mosquito data platforms to build training datasets for machine learning-based SDMs.
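The vote-counting step described above can be sketched as follows, assuming each tree casts a hard 0/1 presence vote. The three level names and the one-third/two-thirds cut-offs are hypothetical, since the abstract does not state how vote counts map to threat levels. For comparison, scikit-learn's `RandomForestClassifier.predict_proba` averages the trees' class-probability estimates rather than counting hard votes.

```python
def threat_level(
    tree_votes: list[int], cutoffs: tuple[float, float] = (1 / 3, 2 / 3)
) -> tuple[float, str]:
    """Map per-tree presence votes (1 = presence, 0 = absence) to a vote share and a label."""
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    share = sum(tree_votes) / len(tree_votes)
    low_cut, high_cut = cutoffs
    if share < low_cut:
        label = "low"
    elif share < high_cut:
        label = "moderate"
    else:
        label = "high"
    return share, label


votes = [1] * 70 + [0] * 30  # hypothetical 100-tree forest for one location and date
print(threat_level(votes))  # (0.7, 'high')
```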
Collapse
Affiliation(s)
- Shyam Polineni
- STEM Enhancement in Earth Sciences, NASA Center for Space Research, Austin, TX 78723, USA
- Om Shastri
- STEM Enhancement in Earth Sciences, NASA Center for Space Research, Austin, TX 78723, USA
- Avi Bagchi
- STEM Enhancement in Earth Sciences, NASA Center for Space Research, Austin, TX 78723, USA
- Govind Gnanakumar
- STEM Enhancement in Earth Sciences, NASA Center for Space Research, Austin, TX 78723, USA
- Sujay Rasamsetti
- STEM Enhancement in Earth Sciences, NASA Center for Space Research, Austin, TX 78723, USA
- Prabha Sundaravadivel
- Department of Electrical Engineering, The University of Texas at Tyler, Tyler, TX 75702, USA
- Correspondence: Tel.: +1-903-566-6118
37
Alnajar A, Ibrahim W, Mendoza CE. Are the outcomes of TAVR significantly riskier for solid organ transplant recipients than for the general population? J Card Surg 2022; 37:608-609. [PMID: 35000216 DOI: 10.1111/jocs.16207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2021] [Accepted: 12/21/2021] [Indexed: 11/30/2022]
Affiliation(s)
- Ahmed Alnajar
- Department of Cardiothoracic Surgery, Jackson Memorial Hospital/University of Miami, Miami, Florida, USA
- Walid Ibrahim
- Department of Cardiology, Jackson Memorial Hospital/University of Miami, Miami, Florida, USA
- Cesar E Mendoza
- Department of Cardiology, Jackson Memorial Hospital/University of Miami, Miami, Florida, USA
38
Bohannan ZS, Coffman F, Mitrofanova A. Random survival forest model identifies novel biomarkers of event-free survival in high-risk pediatric acute lymphoblastic leukemia. Comput Struct Biotechnol J 2022; 20:583-597. [PMID: 35116134 PMCID: PMC8777142 DOI: 10.1016/j.csbj.2022.01.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/17/2021] [Revised: 12/30/2021] [Accepted: 01/01/2022] [Indexed: 12/16/2022]
Abstract
High-risk pediatric B-ALL patients experience 5-year negative event rates of up to 25%. Although some biomarkers of relapse are used in the clinic, their ability to predict outcomes in high-risk patients is limited. Here, we propose a random survival forest (RSF) machine learning model that uses interpretable genomic inputs to predict relapse/death in high-risk pediatric B-ALL patients. We used whole exome sequencing profiles from 156 patients in the TARGET-ALL study (with samples collected at presentation), split into training and test cohorts (109 and 47 patients, respectively). To avoid overfitting and facilitate interpretation of the machine learning results, input genomic variables were engineered using a stepwise approach: univariable Cox models to select variables directly associated with outcomes, genomic coordinate-based analysis to select mutational hotspots, and correlation analysis to eliminate feature collinearity. Model training identified 7 genomic regions most predictive of relapse/death-free survival. The test cohort error rate was 12.47%, and a polygenic score based on the sum of the top 7 variables effectively stratified patients into two groups with significantly different times to relapse/death (log-rank P = 0.001, hazard ratio = 5.41). Our model outperformed other event-free survival (EFS) modeling approaches, including an RSF using gold-standard prognostic variables (error rate = 24.35%). Validation in 174 standard-risk patients and 3 patients who failed to respond to induction therapy confirmed that our RSF model and polygenic score were specific to high-risk disease. We propose that our feature selection/engineering approach can increase the clinical interpretability of RSF, and that our polygenic score could be used to enhance clinical decision-making in high-risk B-ALL.
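A polygenic score that sums the top-ranked binary genomic variables, followed by a two-group log-rank comparison, could be sketched as below. This is an illustrative sketch in plain Python under explicit assumptions: the 0/1 encoding of each genomic region, the score threshold used to form the two groups, and any toy survival data are hypothetical, and the code does not reproduce the authors' RSF or their reported statistics.

```python
import math
from typing import List, Sequence, Tuple


def polygenic_score(indicators: Sequence[int]) -> int:
    """Sum of 0/1 indicators, one per selected genomic region (assumed encoding)."""
    return sum(indicators)


def split_by_score(scores: Sequence[int], threshold: int) -> Tuple[List[int], List[int]]:
    """Indices of patients below vs. at-or-above a hypothetical score threshold."""
    low = [i for i, s in enumerate(scores) if s < threshold]
    high = [i for i, s in enumerate(scores) if s >= threshold]
    return low, high


def log_rank(times_a: Sequence[float], events_a: Sequence[int],
             times_b: Sequence[float], events_b: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test. Returns (chi-square statistic, p-value with 1 df)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e and g == 0)
        # Observed minus expected events in group A at this event time.
        observed_minus_expected += d_a - d * n_a / n
        # Hypergeometric variance of the group-A event count.
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Summing binary indicators keeps the score interpretable (each point is one flagged region), which matches the interpretability goal stated above; the log-rank test then checks whether the two resulting groups differ in time to event.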
Affiliation(s)
- Zachary S. Bohannan
- Rutgers, The State University of New Jersey, School of Health Professions, Department of Health Informatics, 65 Bergen Street, Suite 120, Newark, NJ 07107-1709, United States
- Frederick Coffman
- Rutgers, The State University of New Jersey, School of Health Professions, Department of Health Informatics, 65 Bergen Street, Suite 120, Newark, NJ 07107-1709, United States
- Antonina Mitrofanova
- Rutgers, The State University of New Jersey, School of Health Professions, Department of Health Informatics, 65 Bergen Street, Suite 120, Newark, NJ 07107-1709, United States
39
Aranda-Michel E, Sultan I, Kilic A, Bianco V, Brown JA, Serna-Gallegos D. A machine learning approach to model for end-stage liver disease score in cardiac surgery. J Card Surg 2021; 37:29-38. [PMID: 34796544 DOI: 10.1111/jocs.16076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/01/2021] [Revised: 09/10/2021] [Accepted: 10/05/2021] [Indexed: 01/08/2023]
Abstract
OBJECTIVE The Model for End-Stage Liver Disease (MELD) score likely has nonlinear effects on operative outcomes. We used machine learning to evaluate the nonlinear relationship (i.e., a given increase in MELD may not correspond to a proportional increase in outcome risk) between MELD and outcomes of cardiac surgery. METHODS Society of Thoracic Surgeons (STS)-indexed elective cardiac operations performed between 2011 and 2018 were included. MELD was calculated retrospectively. Logistic regression models and an imbalanced random forest classifier were built for operative mortality. Cox regression models and random survival forest models were used to evaluate survival. Variable importance (VIMP) analysis ranked variables by predictive power. Linear and machine-learned models were compared using the receiver operating characteristic (ROC) curve and the Brier score. RESULTS We included 3872 patients. Operative mortality was 1.7% and 5-year survival was 82.1%. On VIMP analysis, MELD was the fourth largest positive predictor for long-term survival and the strongest negative predictor for operative mortality. MELD was not a significant predictor of operative mortality or long-term survival in the logistic or Cox regressions. The ROC area of the logistic model was 0.762, compared with 0.674 for the random forest classifier. The Brier score of the random forest survival model was larger than that of the Cox regression from 2 years onward, throughout the study period. Bootstrap estimation on linear regression demonstrated that the machine-learned models were superior. CONCLUSIONS The relationship between MELD and mortality is nonlinear. MELD was not significant in the multivariable Cox regression but was strongly important in the random survival forest model, and bootstrapping demonstrated the superior utility of the machine-learned models.
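The two comparison metrics named above can be computed directly. The sketch below assumes binary outcomes and predicted probabilities: it computes the ROC area as the probability that a randomly chosen positive case is scored above a randomly chosen negative case (ties count one half), and the Brier score as the mean squared error of the predicted probabilities, where lower is better. The survival-model comparison in the study would need a time-dependent, censoring-adjusted Brier score, which this simplified version does not implement; the function names and toy data are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC area via pairwise ranking of positives against negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and equal length")
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print("AUC:", roc_auc(y, p), "Brier:", round(brier_score(y, p), 4))
```

The two metrics answer different questions: AUC measures discrimination (ranking) only, while the Brier score also penalizes poorly calibrated probabilities, which is why a model can lead on one and trail on the other.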
Affiliation(s)
- Edgar Aranda-Michel
- Division of Cardiac Surgery, Department of Cardiothoracic Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Ibrahim Sultan
- Division of Cardiac Surgery, Department of Cardiothoracic Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; Division of Cardiac Surgery, Heart and Vascular Institute, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, USA
- Arman Kilic
- Division of Cardiac Surgery, Department of Cardiothoracic Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; Division of Cardiac Surgery, Heart and Vascular Institute, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, USA
- Valentino Bianco
- Division of Cardiac Surgery, Department of Cardiothoracic Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- James A Brown
- Division of Cardiac Surgery, Department of Cardiothoracic Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Derek Serna-Gallegos
- Division of Cardiac Surgery, Department of Cardiothoracic Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; Division of Cardiac Surgery, Heart and Vascular Institute, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, USA
40
Selecting the Suitable Resampling Strategy for Imbalanced Data Classification Regarding Dataset Properties. An Approach Based on Association Models. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11188546] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/20/2022]
Abstract
In many application domains, such as medicine, information retrieval, cybersecurity, and social media, the datasets used to induce classification models often have an unequal distribution of instances across classes. This situation, known as imbalanced data classification, leads to low predictive performance on minority class examples. As a result, the prediction model can be unreliable even when its overall accuracy is acceptable. Oversampling and undersampling are well-known strategies for dealing with this problem by balancing the number of examples in each class. However, their effectiveness depends on several factors, mainly related to intrinsic characteristics of the data, such as the imbalance ratio, dataset size and dimensionality, overlap between classes, and borderline examples. In this work, the impact of these factors is analyzed through a comprehensive comparative study involving 40 datasets from different application areas. The objective is to obtain models that automatically select the best resampling strategy for any dataset based on its characteristics. Because these models are induced from very varied datasets covering a broad spectrum of conditions, they allow several factors to be examined simultaneously across a wide range of values. This differs from most studies, which analyze characteristics individually or cover a small range of values. In addition, the study encompasses both basic and advanced resampling strategies, evaluated using eight performance metrics, including new measures specifically designed for imbalanced data classification. The general nature of the proposal allows the most appropriate method to be chosen regardless of the domain, avoiding the need to search for special-purpose techniques suited to the target data.
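The imbalance ratio and the two basic resampling strategies mentioned above can be illustrated with standard-library Python. This sketch assumes list-based data and uses plain random undersampling (dropping majority rows) and random oversampling (duplicating minority rows); it stands in for the basic strategies only, not for the advanced methods the study evaluates, and the function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Sequence[float]


def imbalance_ratio(labels: Sequence[Hashable]) -> float:
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def _group_by_class(X: Sequence[Row], y: Sequence[Hashable]) -> Dict[Hashable, List[Row]]:
    groups: Dict[Hashable, List[Row]] = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(X: Sequence[Row], y: Sequence[Hashable],
                       seed: int = 0) -> Tuple[List[Row], List[Hashable]]:
    """Randomly drop rows so every class matches the smallest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out: List[Row] = []
    y_out: List[Hashable] = []
    for label, rows in groups.items():
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X: Sequence[Row], y: Sequence[Hashable],
                      seed: int = 0) -> Tuple[List[Row], List[Hashable]]:
    """Randomly duplicate rows so every class matches the largest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out: List[Row] = []
    y_out: List[Hashable] = []
    for label, rows in groups.items():
        extra = rng.choices(rows, k=target - len(rows))  # sampled with replacement
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out
```

Undersampling discards information from the majority class, while oversampling by duplication can encourage overfitting to repeated minority rows; that trade-off is one reason the best choice depends on dataset properties such as size and imbalance ratio.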
41
Juez-Gil M, Arnaiz-González Á, Rodríguez JJ, García-Osorio C. Experimental evaluation of ensemble classifiers for imbalance in Big Data. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107447] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 10/21/2022]
42
Applying random forest in a health administrative data context: a conceptual guide. HEALTH SERVICES AND OUTCOMES RESEARCH METHODOLOGY 2021. [DOI: 10.1007/s10742-021-00255-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
43
O'Brien RC, Ishwaran H, Szczotka-Flynn LB, Lass JH. Random Survival Forests Analysis of Intraoperative Complications as Predictors of Descemet Stripping Automated Endothelial Keratoplasty Graft Failure in the Cornea Preservation Time Study. JAMA Ophthalmol 2021; 139:191-197. [PMID: 33355637 DOI: 10.1001/jamaophthalmol.2020.5743] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/28/2022]
Abstract
Importance A new analytic method can evaluate factors of interest associated with graft failure after Descemet stripping automated endothelial keratoplasty (DSAEK), or more generally in any ophthalmic surgical setting with a time-to-event outcome. Objective To reanalyze the types of intraoperative complications associated with DSAEK graft failure in the Cornea Preservation Time Study using random survival forests. Design, Setting, and Participants This cohort study, initially conceived in April 2019, used a prediction model to conduct a post hoc secondary analysis of data collected in a multicenter, double-masked, randomized clinical trial. Forty US clinical sites with 70 surgeons participated, with donor corneas provided by 23 US eye banks. The study included 1090 participants, representing 1330 eyes, undergoing DSAEK for Fuchs dystrophy (1255 eyes [94.4%]) or pseudophakic or aphakic corneal edema (75 eyes [5.6%]). Enrollment occurred between April 16, 2012, and February 20, 2014, and follow-up ended June 5, 2017. Statistical analysis was performed from July 10, 2019, to May 29, 2020. Intervention DSAEK with random assignment of a donor cornea with a preservation time of 7 days or less or 8 to 14 days. Main Outcomes and Measures The ranked variable importance of intraoperative complications among 50 donor, recipient, and eye bank variables, and the restricted mean survival time through 47 months (1434 days) after DSAEK, were examined. Random survival forests, a nonparametric method with less restrictive model assumptions that is far more flexible in modeling nonlinear effects and interactions, were used to analyze the data. Results This study included 1090 participants (663 women [60.8%]; median age, 70 years [range, 42-90 years]), representing 1330 eyes. Random survival forests ranked a DSAEK intraoperative complication as the third most predictive factor of graft failure, after surgeon and eye bank, in the final model with 5 predictors. In the first 47 months after DSAEK, the estimated mean difference in restricted mean survival time for grafts that experienced a DSAEK intraoperative complication vs those that did not was -227 days (99% CI, -352 to -70 days), based on the final RSF model. Conclusions and Relevance These findings, while post hoc, support the hypothesis that random survival forests allow an improved analytic approach for identifying factors predictive of graft failure and for obtaining adjusted graft survival estimates. Random survival forests offer the opportunity to guide the development of future population-based cohort ophthalmic surgical studies, establishing definitive factors for procedural success.
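Restricted mean survival time (RMST), the outcome summarized above, is the area under the survival curve from 0 up to a horizon τ (1434 days in the study). A minimal sketch follows, assuming an unadjusted Kaplan-Meier estimate rather than the adjusted random survival forest estimate the study used. The difference between two groups' RMST values is the kind of mean difference reported above, but the function names and any example data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (time, survival) steps at each failure time.

    events[i] is 1 if the graft failed at times[i], 0 if the observation was censored.
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    survival = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        failures = 0
        leaving = 0
        # Group all observations tied at time t.
        while i < len(pairs) and pairs[i][0] == t:
            failures += pairs[i][1]
            leaving += 1
            i += 1
        if failures:
            survival *= 1.0 - failures / n_at_risk
            steps.append((t, survival))
        n_at_risk -= leaving
    return steps


def rmst(times: Sequence[float], events: Sequence[int], tau: float) -> float:
    """Restricted mean survival time: area under the KM step curve on [0, tau]."""
    area = 0.0
    prev_time, prev_surv = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_surv * (t - prev_time)
        prev_time, prev_surv = t, s
    area += prev_surv * (tau - prev_time)
    return area
```

A group comparison would then be, for example, `rmst(times_complication, events_complication, 1434) - rmst(times_none, events_none, 1434)`, where the argument names are placeholders; a negative value means grafts in the first group survived fewer days on average within the horizon.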
Affiliation(s)
- Robert C O'Brien
- Department of Data Science, University of Mississippi Medical Center, Jackson
- Hemant Ishwaran
- Division of Biostatistics, Department of Public Health Sciences, University of Miami, Miami, Florida
- Loretta B Szczotka-Flynn
- Department of Ophthalmology and Visual Sciences, Case Western Reserve University, Cleveland, Ohio; University Hospitals Cleveland Medical Center, Cleveland, Ohio
- Jonathan H Lass
- Department of Ophthalmology and Visual Sciences, Case Western Reserve University, Cleveland, Ohio; University Hospitals Cleveland Medical Center, Cleveland, Ohio
44
Byeon H. Comparing Ensemble-Based Machine Learning Classifiers Developed for Distinguishing Hypokinetic Dysarthria from Presbyphonia. APPLIED SCIENCES 2021; 11:2235. [DOI: 10.3390/app11052235] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 08/29/2023]
Abstract
Understanding voice characteristics in the normal aging process is essential for accurately distinguishing presbyphonia from neurological voice disorders. This study developed ensemble-based machine learning classifiers to distinguish hypokinetic dysarthria from presbyphonia using classification and regression trees (CART), random forest, a gradient boosting algorithm (GBM), and XGBoost, and compared the models' prediction performance to identify the best one. The subjects were 76 elderly patients diagnosed with hypokinetic dysarthria and 174 patients with presbyphonia. The accuracy, sensitivity, and specificity of the developed models were compared to assess their prediction performance. Random forest showed the best prediction performance on the test dataset (accuracy = 0.83, sensitivity = 0.90, specificity = 0.80, and area under the curve (AUC) = 0.85). The main predictors for detecting hypokinetic dysarthria, in order of importance, were cepstral peak prominence (CPP), jitter, shimmer, L/H ratio, L/H ratio_SD, CPP max (dB), CPP min (dB), and CPPF0. Among them, CPP was the most important predictor for identifying hypokinetic dysarthria.
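Accuracy, sensitivity, and specificity, as reported above, all derive from the test-set confusion matrix. The short sketch below assumes hypokinetic dysarthria is coded as the positive class (1) and presbyphonia as the negative class (0); the coding and the function name are illustrative assumptions.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (recall of class 1), and specificity (recall of class 0)."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("no predictions to score")
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # NaN signals that the class is absent, so the rate is undefined.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


if __name__ == "__main__":
    print(confusion_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

With an uneven class split such as 76 versus 174 patients, accuracy alone can look acceptable while one class is poorly recognized, which is why sensitivity and specificity are reported alongside it.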
45
SMOTE-ENC: A Novel SMOTE-Based Method to Generate Synthetic Data for Nominal and Continuous Features. APPLIED SYSTEM INNOVATION 2021. [DOI: 10.3390/asi4010018] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/17/2022]
Abstract
Real-world datasets are often heavily skewed, with some classes significantly outnumbered by others. In these situations, machine learning algorithms fail to achieve substantial efficacy when predicting the underrepresented instances. To address this problem, many variants of the synthetic minority oversampling technique (SMOTE) have been proposed to balance datasets with continuous features. However, for datasets with both nominal and continuous features, SMOTE-NC is the only SMOTE-based oversampling technique for balancing the data. In this paper, we present a novel minority oversampling method, SMOTE-ENC (SMOTE-Encoded Nominal and Continuous), in which nominal features are encoded as numeric values such that the difference between two encoded values reflects the amount of change in association with the minority class. Our experiments show that classification models using SMOTE-ENC predict better than models using SMOTE-NC when the dataset has a substantial number of nominal features and when there is some association between the categorical features and the target class. Additionally, the proposed method addresses a major limitation of SMOTE-NC, which can be applied only to mixed datasets containing both continuous and nominal features and cannot function if all features are nominal. Our method has been generalized to apply to both mixed and nominal-only datasets.
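The core SMOTE step that these variants build on creates a synthetic minority example on the line segment between a minority sample and one of its nearest minority neighbors: x_new = x + λ(x_nn − x), with λ drawn uniformly from [0, 1). The sketch below shows only this classic continuous-feature step; it does not implement the nominal-feature encoding of SMOTE-ENC or the nominal handling of SMOTE-NC, and the function name, default k, and seed are assumptions.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 2,
          seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic points from continuous minority-class samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first (Euclidean distance).
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(base, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        gap = rng.random()  # lambda in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic
```

Because `math.dist` and the interpolation need numeric coordinates, nominal features cannot go through this step as-is; that gap is exactly what encodings such as the one described above are meant to fill.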
46
Classification and Prediction of Natural Streamflow Regimes in Arid Regions of the USA. WATER 2021. [DOI: 10.3390/w13030380] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
Understanding how natural variation in flow regimes influences stream ecosystem structure and function is critical to developing effective stream management policies. Spatial variation in flow regimes is reasonably well understood for streams in mesic regions, but a more robust characterization of flow regimes in arid regions is needed, especially to support biological monitoring and assessment programs. In this paper, we used long-term (41-year) records of mean daily streamflow from 287 stream reaches in the arid and semi-arid western USA to develop and compare several alternative flow-regime classifications. We also evaluated how accurately we could predict the flow-regime classes of ungauged reaches. Over the 41-year record examined (water years 1972–2013), the gauged reaches varied continuously from always having nonzero flow to seldom having flow. We predicted ephemeral and perennial reaches with less error than reaches with an intermediate number of zero-flow days or years. We illustrate the application of our approach by predicting the flow-regime classes of ungauged reaches in Arizona, USA. Maps based on these predictions were generally consistent with qualitative expectations of how flow regimes vary spatially across Arizona. These results represent a promising step toward more effective assessment and management of streams in arid regions.
47
A novel progressively undersampling method based on the density peaks sequence for imbalanced data. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106689] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 11/19/2022]
48
Huang KY, Hung FY, Kao HJ, Lau HH, Weng SL. iDPGK: characterization and identification of lysine phosphoglycerylation sites based on sequence-based features. BMC Bioinformatics 2020; 21:568. [PMID: 33297954 PMCID: PMC7727188 DOI: 10.1186/s12859-020-03916-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/10/2020] [Accepted: 11/30/2020] [Indexed: 11/24/2022]
Abstract
Background Protein phosphoglycerylation, the addition of 1,3-bisphosphoglyceric acid (1,3-BPG) to a lysine residue of a protein to form a 3-phosphoglyceryl-lysine, is a reversible, non-enzymatic post-translational modification (PTM) that plays a regulatory role in glucose metabolism and the glycolytic process. As the number of experimentally verified phosphoglycerylation sites has increased significantly, statistical or machine learning methods have become essential for investigating the characteristics of these sites. Currently, research into phosphoglycerylation is very limited, and only a few resources are available for the computational identification of phosphoglycerylation sites.
Result We present a bioinformatics investigation of phosphoglycerylation sites based on sequence-based features. TwoSampleLogo analysis reveals that the regions surrounding phosphoglycerylation sites contain a relatively high proportion of positively charged amino acids, especially in the upstream flanking region. Additionally, PTM-Logo results show that non-polar and aliphatic amino acids are more abundant around phosphoglycerylated lysines, which may play a functional role in discriminating between phosphoglycerylation and non-phosphoglycerylation sites. Many types of features were used to build the prediction model on the training dataset, including amino acid composition, amino acid pair composition, position weight matrix, and position-specific scoring matrix. To improve predictive power, the top features ranked by F-score were combined for classification, and predictive models were trained using decision tree (DT), random forest (RF), and support vector machine (SVM) classifiers. Five-fold cross-validation showed that the selected features were the most effective in discriminating between phosphoglycerylated and non-phosphoglycerylated sites. Conclusion The SVM model trained with the selected sequence-based features performed well, with a sensitivity of 77.5%, a specificity of 73.6%, an accuracy of 74.9%, and a Matthews correlation coefficient (MCC) of 0.49. The model also performed consistently on an independent test set, yielding a sensitivity of 75.7% and a specificity of 64.9%. Finally, the model has been implemented as a freely available web-based system, iDPGK, at http://mer.hc.mmh.org.tw/iDPGK/.
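Ranking features by F-score and summarizing a classifier with the Matthews correlation coefficient can both be sketched in a few lines. The F-score below follows a commonly used definition (squared distances of the two class means from the overall mean, divided by the sum of the two within-class sample variances); the abstract does not state the exact variant, so this definition, the function names, and any toy feature values are assumptions.

```python
import math
from statistics import mean, variance
from typing import List, Sequence


def f_score(pos: Sequence[float], neg: Sequence[float]) -> float:
    """F-score of one feature; larger values mean better class separation."""
    overall = mean(list(pos) + list(neg))
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    if within == 0:
        return math.inf if between > 0 else 0.0
    return between / within


def rank_features(pos_rows: Sequence[Sequence[float]],
                  neg_rows: Sequence[Sequence[float]]) -> List[int]:
    """Feature indices sorted from highest to lowest F-score."""
    n_features = len(pos_rows[0])
    scores = [f_score([r[i] for r in pos_rows], [r[i] for r in neg_rows])
              for i in range(n_features)]
    return sorted(range(n_features), key=lambda i: scores[i], reverse=True)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Keeping only the top-ranked features before training an SVM is a simple filter-style selection; MCC is a useful companion metric because it uses all four confusion-matrix cells and stays informative when the classes are unbalanced.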
Affiliation(s)
- Kai-Yao Huang
- Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu City 300, Taiwan; Department of Medicine, Mackay Medical College, New Taipei City 252, Taiwan
- Fang-Yu Hung
- Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsinchu City 300, Taiwan
- Hui-Ju Kao
- Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu City 300, Taiwan
- Hui-Hsuan Lau
- Department of Medicine, Mackay Medical College, New Taipei City 252, Taiwan; Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsinchu City 300, Taiwan; Department of Obstetrics and Gynecology, Mackay Memorial Hospital, Taipei City 104, Taiwan
- Shun-Long Weng
- Department of Medicine, Mackay Medical College, New Taipei City 252, Taiwan; Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsinchu City 300, Taiwan; Mackay Junior College of Medicine, Medicine, Nursing and Management College, Taipei City 112, Taiwan
49
Looking beyond the eyeball test: A novel vitality index to predict recovery after esophagectomy. J Thorac Cardiovasc Surg 2020; 161:822-832.e6. [PMID: 33451846 DOI: 10.1016/j.jtcvs.2020.10.122] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/10/2020] [Revised: 10/02/2020] [Accepted: 10/18/2020] [Indexed: 12/21/2022]
Abstract
OBJECTIVES To (1) measure 4 physiologic metrics before esophagectomy, (2) use these in an index to predict a composite postoperative outcome after esophagectomy, and (3) compare the predictive accuracy of this index with that of the Fried Frailty Index and the Modified Frailty Index. METHODS Grip strength (kg), number of 30-second chair sit-stands, 6-minute walk distance (m), and normalized psoas muscle area (cm²/m) were measured in 77 consenting patients from January 1, 2018, to April 1, 2019. Imbalanced random forest classification estimated the probability of a composite postoperative outcome, which included mortality, respiratory complications, anastomotic leak, delirium, length of stay ≥14 days, discharge to a nursing facility, and readmission. G-mean error was used to compare predictive accuracy among the indexes. RESULTS Median grip strength was 38 kg (25th-75th percentiles, 31-44), number of sit-stands 11 (10-14), psoas muscle area to height ratio 6.9 cm²/m (6.0-8.2), and 6-minute walk distance 407 m (368-451). Correlation between these metrics was generally weak, the highest being between 30-second sit-stands and 6-minute walk distance (r = 0.57). Age, degree of patient-reported exhaustion, and the 4 objective metrics comprised the Esophageal Vitality Index, which had a lower G-mean error, 32% (31-33), than the Fried Frailty Index, 37% (37-38), and the Modified Frailty Index, 48% (47-48). CONCLUSIONS The Esophageal Vitality Index, an objective, simple assessment consisting of grip strength, 30-second chair sit-stands, 6-minute walk, and psoas muscle area to height ratio, outperformed commonly used frailty indexes in predicting post-esophagectomy mortality and morbidity. The index provides a robust picture of patients' fitness for surgery beyond the qualitative "eyeball" test.
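G-mean is the geometric mean of sensitivity and specificity, so it stays low whenever either class is predicted poorly, which suits an imbalanced composite outcome. The sketch below assumes the reported G-mean error is defined as 1 minus the G-mean; that definition is an assumption rather than something stated in the abstract, and the function names and example counts are hypothetical.

```python
import math


def g_mean(tp: int, tn: int, fp: int, fn: int) -> float:
    """Geometric mean of sensitivity and specificity."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def g_mean_error(tp: int, tn: int, fp: int, fn: int) -> float:
    """Assumed definition: 1 - G-mean, so lower is better."""
    return 1.0 - g_mean(tp, tn, fp, fn)


if __name__ == "__main__":
    # A model that never predicts the outcome: 90% accuracy, but G-mean error 1.0.
    print(g_mean_error(tp=0, tn=90, fp=0, fn=10))
```

The example shows why G-mean-based error is preferred here over plain accuracy: a model that labels every patient as free of the composite outcome can reach high accuracy while its sensitivity, and therefore its G-mean, is 0.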
50
A ternary bitwise calculator based genetic algorithm for improving error correcting output codes. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.05.088] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/20/2022]