1
Yang L, Leynes C, Pawelka A, Lorenzo I, Chou A, Lee B, Heaney JD. Machine learning in time-lapse imaging to differentiate embryos from young vs old mice. Biol Reprod 2024:ioae056. [PMID: 38685607 DOI: 10.1093/biolre/ioae056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 02/15/2024] [Indexed: 05/02/2024] Open
Abstract
Time-lapse microscopy is a non-invasive technology used to characterize early embryo development. This study employs time-lapse microscopy and machine learning to elucidate changes in embryonic growth kinetics with maternal aging. We analyzed morphokinetic parameters of embryos from young and aged C57BL6/NJ mice via continuous imaging. Our findings show that embryos from aged mice accelerated through cleavage stages (from 5-cells) to morula compared with those from young mice, with no significant differences observed in later stages of blastulation. Unsupervised machine learning identified two distinct clusters comprising embryos from aged or young donors. Moreover, in supervised learning, the XGBoost (extreme gradient boosting) algorithm successfully predicted the age-related phenotype with 0.78 accuracy, 0.81 precision, and 0.83 recall following hyperparameter tuning. These results highlight two main scientific insights: maternal aging affects the pace of embryonic development, and AI can differentiate between embryos from aged and young maternal mice using a non-invasive approach. Thus, machine learning can be used to identify morphokinetic phenotypes for further studies. This study has potential for future applications in selecting human embryos for embryo transfer, without or as a complement to preimplantation genetic testing.
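Accuracy, precision, and recall of the kind quoted in this abstract are all derived from a binary confusion matrix. The following is a minimal sketch in Python, assuming hypothetical counts: the abstract does not report the underlying confusion matrix, and the function name `classification_metrics` is introduced here only for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives for whichever class is treated as positive.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one sample is required")
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts, NOT taken from the study.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print(metrics)
```

Precision answers "of the embryos predicted positive, how many were", while recall answers "of the truly positive embryos, how many were found", which is why the two can differ even when accuracy is fixed.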
Affiliation(s)
- Liubin Yang
  - Division of Reproductive Endocrinology and Infertility, Department of Obstetrics and Gynecology, Baylor College of Medicine, Houston, Texas
  - Division of Reproductive Endocrinology and Infertility, Department of Obstetrics, Gynecology, and Reproductive Sciences, Yale School of Medicine, New Haven, Connecticut
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Carolina Leynes
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Ashley Pawelka
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Isabel Lorenzo
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Andrew Chou
  - Pain Research, Informatics, Multi-morbidities, and Education (PRIME) Center, VA Connecticut Healthcare System, West Haven, Connecticut
  - Section of Infectious Diseases, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut
- Brendan Lee
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Jason D Heaney
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
2
Johnson K, Kuhn M. What they forgot to tell you about machine learning with an application to pharmaceutical manufacturing. Pharm Stat 2024. [PMID: 38415497 DOI: 10.1002/pst.2366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Accepted: 12/31/2023] [Indexed: 02/29/2024]
Abstract
Predictive models (a.k.a. machine learning models) are ubiquitous in all stages of drug research, safety, development, manufacturing, and marketing. The results of these models are used inside and outside of pharmaceutical companies for the purpose of understanding scientific processes and for predicting characteristics of new samples or patients. While there are many resources that describe such models, there are few that explain how to develop a robust model that extracts the highest possible performance from the available data, especially in support of pharmaceutical applications. This tutorial will describe pitfalls and best practices for developing and validating predictive models, with a specific application to monitoring a pharmaceutical manufacturing process. The pitfalls and best practices will be highlighted to call attention to specific points that are not generally discussed in other resources.
Affiliation(s)
- Max Kuhn
  - Posit PBC, Boston, Massachusetts, USA
3
Gupta V, Li Y, Peltekian A, Kilic MNT, Liao WK, Choudhary A, Agrawal A. Simultaneously improving accuracy and computational cost under parametric constraints in materials property prediction tasks. J Cheminform 2024; 16:17. [PMID: 38365691 PMCID: PMC10870658 DOI: 10.1186/s13321-024-00811-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Accepted: 02/08/2024] [Indexed: 02/18/2024] Open
Abstract
Modern data mining techniques using machine learning (ML) and deep learning (DL) algorithms have been shown to excel in the regression-based task of materials property prediction using various materials representations. In an attempt to improve the predictive performance of the deep neural network model, researchers have tried to add more layers as well as develop new architectural components to create sophisticated and deep neural network models that can aid in the training process and improve the predictive ability of the final model. However, these modifications usually require substantial computational resources, further increasing the already long model training time, which is often infeasible and limits usage for most researchers. In this paper, we study and propose a deep neural network framework for regression-based problems comprising fully connected layers that can work with any numerical vector-based materials representations as model input. We present a novel deep regression neural network, iBRNet, with branched skip connections and multiple schedulers, which can reduce the number of parameters used to construct the model, improve the accuracy, and decrease the training time of the predictive model. We perform the model training using composition-based numerical vectors representing the elemental fractions of the respective materials and compare their performance against other traditional ML and several known DL architectures. Using multiple datasets with varying data sizes for training and testing, we show that the proposed iBRNet models outperform the state-of-the-art ML and DL models for all data sizes. We also show that the branched structure and usage of multiple schedulers lead to fewer parameters and faster model training time with better convergence than other neural networks.
Scientific contribution: The combination of multiple callback functions in deep neural networks minimizes training time and maximizes accuracy in a controlled computational environment with parametric constraints for the task of materials property prediction.
Affiliation(s)
- Vishu Gupta
  - Department of Electrical and Computer Engineering, Northwestern University, Evanston, USA
- Youjia Li
  - Department of Electrical and Computer Engineering, Northwestern University, Evanston, USA
- Alec Peltekian
  - Department of Computer Science, Northwestern University, Evanston, USA
- Wei-Keng Liao
  - Department of Electrical and Computer Engineering, Northwestern University, Evanston, USA
- Alok Choudhary
  - Department of Electrical and Computer Engineering, Northwestern University, Evanston, USA
- Ankit Agrawal
  - Department of Electrical and Computer Engineering, Northwestern University, Evanston, USA
4
Rezaeitaleshmahalleh M, Mu N, Lyu Z, Zhou W, Zhang X, Rasmussen TE, McBane RD, Jiang J. Radiomic-based Textural Analysis of Intraluminal Thrombus in Aortic Abdominal Aneurysms: A Demonstration of Automated Workflow. J Cardiovasc Transl Res 2023; 16:1123-1134. [PMID: 37407866 DOI: 10.1007/s12265-023-10404-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Accepted: 06/09/2023] [Indexed: 07/07/2023]
Abstract
Our main objective is to investigate how the structural information of intraluminal thrombus (ILT) can be used to predict abdominal aortic aneurysm (AAA) growth status through an automated workflow. Fifty-four human subjects with ILT in their AAAs were identified from our database; those AAAs were categorized as slow-growing (< 5 mm/year) or fast-growing (≥ 5 mm/year). In-house deep-learning image segmentation models were used to generate 3D geometrical AAA models, followed by automated analysis. All features were fed into a support vector machine classifier to predict the AAA's growth status. The most accurate prediction model was achieved through four geometrical parameters measuring the extent of ILT, two parameters quantifying the constitution of ILT, antihypertensive medication, and the presence of co-existing coronary artery disease. The predictive model achieved an AUROC of 0.89 and a total accuracy of 83%. When ILT was not considered, our prediction's AUROC decreased to 0.75 (P-value < 0.001).
Affiliation(s)
- Mostafa Rezaeitaleshmahalleh
  - Department of Biomedical Engineering, Michigan Technological University, 1400 Townsend Drive, Houghton, MI, USA
  - Joint Center for Biocomputing and Digital Health, Health Research Institute and Institute of Computing and Cybernetics, Michigan Technological University, Houghton, MI, USA
- Nan Mu
  - Department of Biomedical Engineering, Michigan Technological University, 1400 Townsend Drive, Houghton, MI, USA
  - Joint Center for Biocomputing and Digital Health, Health Research Institute and Institute of Computing and Cybernetics, Michigan Technological University, Houghton, MI, USA
- Zonghan Lyu
  - Department of Biomedical Engineering, Michigan Technological University, 1400 Townsend Drive, Houghton, MI, USA
  - Joint Center for Biocomputing and Digital Health, Health Research Institute and Institute of Computing and Cybernetics, Michigan Technological University, Houghton, MI, USA
- Weihua Zhou
  - Department of Applied Computing, Michigan Technological University, Houghton, MI, USA
- Xiaoming Zhang
  - Department of Radiology, Mayo Clinic, Rochester, MN, USA
- Todd E Rasmussen
  - Division of Vascular and Endovascular Surgery, Mayo Clinic, Rochester, MN, USA
- Robert D McBane
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Jingfeng Jiang
  - Department of Biomedical Engineering, Michigan Technological University, 1400 Townsend Drive, Houghton, MI, USA
  - Joint Center for Biocomputing and Digital Health, Health Research Institute and Institute of Computing and Cybernetics, Michigan Technological University, Houghton, MI, USA
  - Department of Radiology, Mayo Clinic, Rochester, MN, USA
5
Stalter N, Ma S, Simon G, Pruinelli L. Psychosocial problems and high amount of opioid administration are associated with opioid dependence and abuse after first exposure for chronic pain patients. Addict Behav 2023; 141:107657. [PMID: 36796176 DOI: 10.1016/j.addbeh.2023.107657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Revised: 12/29/2022] [Accepted: 02/06/2023] [Indexed: 02/11/2023]
Abstract
Controversy surrounding the use of opioids for treatment, together with the unique characteristics of chronic pain, heightens the risks of abuse and dependence; however, it is unclear whether higher doses of opioids and first exposure are associated with dependence and abuse. This study aimed to identify patients who developed opioid dependence or abuse after being exposed to opioids for the first time and the risk factors associated with that outcome. A retrospective observational cohort study analyzed 2,411 patients between 2011 and 2017 who had a diagnosis of chronic pain and received opioids for the first time. A logistic regression model was used to estimate the likelihood of opioid dependence/abuse after the first exposure based on patients' mental health conditions, prior substance abuse disorders, demographics, and the amount of morphine milligram equivalents (MME) per day they received. Of the 2,411 patients, 5.5% had a diagnosis of dependence or abuse after the first exposure. Patients who were depressed (OR = 2.09), had previous non-opioid substance dependence or abuse (OR = 1.59), or received greater than 50 MME per day (OR = 1.03) showed a statistically significant relationship with developing opioid dependence or abuse, while age (OR = -1.03) appeared to be a protective factor. Further studies should stratify chronic pain patients into groups at higher risk of developing opioid dependence or abuse and develop alternative strategies for pain management and treatments beyond opioids. This study reinforces psychosocial problems as determinants of, and risk factors for, opioid dependence or abuse, and the need for safer opioid prescribing practices.
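In a logistic regression model, each odds ratio (OR) is the exponential of the corresponding fitted coefficient, so an OR is always positive, with values below 1 indicating a protective association. The sketch below illustrates that relationship in Python. The coefficient values are back-calculated or hypothetical, since the abstract reports only odds ratios, and the function names are introduced here for illustration.

```python
import math


def odds_ratio(beta):
    """Convert a logistic regression coefficient (log-odds) to an odds ratio."""
    return math.exp(beta)


def predicted_probability(intercept, betas, features):
    """Logistic model: probability = 1 / (1 + exp(-(intercept + sum(beta * x))))."""
    z = intercept + sum(b * x for b, x in zip(betas, features))
    return 1.0 / (1.0 + math.exp(-z))


# A coefficient of ln(2.09) corresponds to the OR of 2.09 quoted for depression.
beta_depression = math.log(2.09)
print(round(odds_ratio(beta_depression), 2))

# A negative coefficient (hypothetical value) yields an OR between 0 and 1.
print(odds_ratio(-0.03) < 1.0)

# Hypothetical intercept; a single binary feature switched on vs. off.
p_off = predicted_probability(-3.0, [beta_depression], [0])
p_on = predicted_probability(-3.0, [beta_depression], [1])
print(p_on > p_off)
```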
Affiliation(s)
- Nicholas Stalter
  - Department of Computer Sciences and Engineering, University of Minnesota, Minneapolis, MN, United States
- Sisi Ma
  - School of Medicine and Institute for Health Informatics, University of Minnesota, Minneapolis, MN, United States
- Gyorgy Simon
  - School of Medicine and Institute for Health Informatics, University of Minnesota, Minneapolis, MN, United States
- Lisiane Pruinelli
  - College of Nursing, University of Florida, Gainesville, FL, United States
6
Chen VL, Oliveri A, Miller MJ, Wijarnpreecha K, Du X, Chen Y, Cushing KC, Lok AS, Speliotes EK. PNPLA3 Genotype and Diabetes Identify Patients With Nonalcoholic Fatty Liver Disease at High Risk of Incident Cirrhosis. Gastroenterology 2023; 164:966-977.e17. [PMID: 36758837 PMCID: PMC10550206 DOI: 10.1053/j.gastro.2023.01.040] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 08/03/2022] [Revised: 01/08/2023] [Accepted: 01/29/2023] [Indexed: 02/11/2023]
Abstract
BACKGROUND & AIMS Non-alcoholic fatty liver disease (NAFLD) can progress to cirrhosis and hepatic decompensation, but whether genetic variants influence the rate of progression to cirrhosis or are useful in risk stratification among patients with NAFLD is uncertain. METHODS We included participants from 2 independent cohorts, the Michigan Genomics Initiative (MGI) and UK Biobank (UKBB), who had NAFLD defined by elevated alanine aminotransferase (ALT) levels in the absence of alternative chronic liver disease. The primary predictors were genetic variants and metabolic comorbidities associated with cirrhosis. We conducted time-to-event analyses using Fine-Gray competing risk models. RESULTS We included 7893 and 46,880 participants from MGI and UKBB, respectively. In univariable analysis, PNPLA3-rs738409-GG genotype, diabetes, obesity, and ALT of ≥2× upper limit of normal were associated with higher incidence rate of cirrhosis in both MGI and UKBB. PNPLA3-rs738409-GG had additive effects with clinical risk factors including diabetes, obesity, and ALT elevations. Among patients with indeterminate fibrosis-4 (FIB4) scores (1.3-2.67), those with diabetes and PNPLA3-rs738409-GG genotype had an incidence rate of cirrhosis comparable to that of patients with high-risk FIB4 scores (>2.67) and 2.9-4.8 times that of patients with diabetes but CC/CG genotypes. In contrast, FIB4 <1.3 was associated with an incidence rate of cirrhosis significantly lower than that of FIB4 of >2.67, even in the presence of clinical risk factors and high-risk PNPLA3 genotype. CONCLUSIONS PNPLA3-rs738409 genotype and diabetes identified patients with NAFLD currently considered indeterminate risk (FIB4 1.3-2.67) who had a similar risk of cirrhosis as those considered high-risk (FIB4 >2.67). PNPLA3 genotyping may improve prognostication and allow for prioritization of intensive intervention.
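The FIB4 bands used in this abstract can be made concrete with a short sketch. It assumes the standard FIB-4 definition, which the abstract does not restate: age (years) × AST (U/L) divided by platelet count (10^9/L) × the square root of ALT (U/L). The cut-offs 1.3 and 2.67 come from the abstract; the function names and the example patient values are hypothetical.

```python
import math


def fib4_score(age_years, ast_u_per_l, alt_u_per_l, platelets_10e9_per_l):
    """FIB-4 index: (age * AST) / (platelets * sqrt(ALT))."""
    if alt_u_per_l <= 0 or platelets_10e9_per_l <= 0:
        raise ValueError("ALT and platelet count must be positive")
    return (age_years * ast_u_per_l) / (platelets_10e9_per_l * math.sqrt(alt_u_per_l))


def fib4_category(score):
    """Map a FIB-4 score to the risk bands quoted in the abstract."""
    if score < 1.3:
        return "low"
    if score <= 2.67:
        return "indeterminate"
    return "high"


# Hypothetical patient: age 55, AST 40 U/L, ALT 36 U/L, platelets 200 x 10^9/L.
score = fib4_score(55, 40, 36, 200)
print(round(score, 2), fib4_category(score))
```

In the study's framing, it is the "indeterminate" band where PNPLA3 genotype and diabetes status added prognostic information beyond the score itself.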
Affiliation(s)
- Vincent L Chen
  - Division of Gastroenterology and Hepatology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
- Antonino Oliveri
  - Division of Gastroenterology and Hepatology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
- Matthew J Miller
  - Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
- Karn Wijarnpreecha
  - Division of Gastroenterology and Hepatology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
- Xiaomeng Du
  - Division of Gastroenterology and Hepatology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
- Yanhua Chen
  - Division of Gastroenterology and Hepatology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
- Kelly C Cushing
  - Division of Gastroenterology and Hepatology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
- Anna S Lok
  - Division of Gastroenterology and Hepatology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
- Elizabeth K Speliotes
  - Division of Gastroenterology and Hepatology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
7
Price BS, Saldanha JP, Drake D, Kopp K. Lessons from West Virginia's Pandemic Response. J Comput Graph Stat 2022; 32:763-764. [PMID: 37790240 PMCID: PMC10545325 DOI: 10.1080/10618600.2022.2126481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2021] [Accepted: 05/31/2022] [Indexed: 10/14/2022]
Abstract
In this editorial discussion we describe our experience developing and implementing predictive models during the pandemic response in the state of West Virginia. We provide insights on the importance of communication and on the dynamic environment that affects predictive modeling in situations such as those we faced. It is our hope that this work brings insight to those who may experience similar challenges while working in public health policy.
Affiliation(s)
- Bradley S Price
  - John Chambers College of Business and Economics, West Virginia University
- John P Saldanha
  - John Chambers College of Business and Economics, West Virginia University
- Dariane Drake
  - John Chambers College of Business and Economics, West Virginia University
- Katherine Kopp
  - John Chambers College of Business and Economics, West Virginia University
8
Fujarski M, Porschen C, Plagwitz L, Brenner A, Ghoreishi N, Thoral P, de Grooth HJ, Elbers P, Weiss R, Meersch M, Zarbock A, von Groote TC, Varghese J. Prediction of Acute Kidney Injury in the Intensive Care Unit: Preliminary Findings in a European Open Access Database. Stud Health Technol Inform 2022; 294:139-140. [PMID: 35612039 DOI: 10.3233/shti220419] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Acute kidney injury (AKI) is a common complication in critically ill patients and is associated with long-term complications and increased mortality. This work presents preliminary findings from the first freely available European intensive care database released by Amsterdam UMC. A machine learning (ML) model was developed to predict AKI in the intensive care unit 12 hours before the actual event. Main features of the model included medications and hemodynamic parameters. Our models perform with an accuracy of 81.8% on moderate to severe AKI and 79.8% on all AKI patients. These results can compete with models reported in the literature and introduce an ML model for AKI based on European patient data.
Affiliation(s)
- Michael Fujarski
  - Institute of Medical Informatics, University of Münster, Germany
- Christian Porschen
  - Department of Anesthesiology, Intensive Care and Pain Medicine, University Hospital Münster, Münster, Germany
- Lucas Plagwitz
  - Institute of Medical Informatics, University of Münster, Germany
- Patrick Thoral
  - Department of Intensive Care Medicine, Laboratory for Critical Care Computational Intelligence, Vrije Universiteit, Amsterdam, The Netherlands
- Harm-Jan de Grooth
  - Department of Intensive Care Medicine, Laboratory for Critical Care Computational Intelligence, Vrije Universiteit, Amsterdam, The Netherlands
- Paul Elbers
  - Department of Intensive Care Medicine, Laboratory for Critical Care Computational Intelligence, Vrije Universiteit, Amsterdam, The Netherlands
- Raphael Weiss
  - Department of Anesthesiology, Intensive Care and Pain Medicine, University Hospital Münster, Münster, Germany
- Melanie Meersch
  - Department of Anesthesiology, Intensive Care and Pain Medicine, University Hospital Münster, Münster, Germany
- Alexander Zarbock
  - Department of Anesthesiology, Intensive Care and Pain Medicine, University Hospital Münster, Münster, Germany
- Thilo Caspar von Groote
  - Department of Anesthesiology, Intensive Care and Pain Medicine, University Hospital Münster, Münster, Germany
- Julian Varghese
  - Institute of Medical Informatics, University of Münster, Germany
9
Kuo TT, Pham A. Detecting model misconducts in decentralized healthcare federated learning. Int J Med Inform 2021; 158:104658. [PMID: 34923447 PMCID: PMC10017272 DOI: 10.1016/j.ijmedinf.2021.104658] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/24/2021] [Revised: 11/23/2021] [Accepted: 12/05/2021] [Indexed: 10/19/2022]
Abstract
BACKGROUND To accelerate healthcare/genomic medicine research and facilitate quality improvement, researchers have started cross-institutional collaborations to use artificial intelligence on clinical/genomic data. However, there are real-world risks of incorrect models being submitted to the learning process, due to either unforeseen accidents or malicious intent. This may reduce the incentives for institutions to participate in the federated modeling consortium. Existing methods to deal with this "model misconduct" issue mainly focus on modifying the learning methods, and therefore are more specifically tied with the algorithm. BASIC PROCEDURES In this paper, we aim at solving the problem in an algorithm-agnostic way by (1) designing a simulator to generate various types of model misconduct, (2) developing a framework to detect the model misconducts, and (3) providing a generalizable approach to identify model misconducts for federated learning. We considered the following three categories: Plagiarism, Fabrication, and Falsification, and then developed a detection framework with three components: Auditing, Coefficient, and Performance detectors, with greedy parameter tuning. MAIN FINDINGS We generated 10 types of misconducts from models learned on three datasets to evaluate our detection method. Our experiments showed high recall with low added computational cost. Our proposed detection method can best identify the misconduct on specific sites from any learning iteration, whereas it is more challenging to precisely detect misconducts for a specific site and at a specific iteration. PRINCIPAL CONCLUSIONS We anticipate our study can support the enhancement of the integrity and reliability of federated machine learning on genomic/healthcare data.
Affiliation(s)
- Tsung-Ting Kuo
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Anh Pham
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
10
Usher MG, Tourani R, Simon G, Tignanelli C, Jarabek B, Strauss CE, Waring SC, Klyn NAM, Kealey BT, Tambyraja R, Pandita D, Baum KD. Overcoming gaps: regional collaborative to optimize capacity management and predict length of stay of patients admitted with COVID-19. JAMIA Open 2021; 4:ooab055. [PMID: 34350391 PMCID: PMC8327377 DOI: 10.1093/jamiaopen/ooab055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2021] [Revised: 05/12/2021] [Accepted: 07/06/2021] [Indexed: 11/21/2022] Open
Abstract
Objective Ensuring an efficient response to COVID-19 requires a degree of inter-system coordination and capacity management coupled with an accurate assessment of hospital utilization including length of stay (LOS). We aimed to establish optimal practices in inter-system data sharing and LOS modeling to support patient care and regional hospital operations. Materials and Methods We completed a retrospective observational study of patients admitted with COVID-19 followed by 12-week prospective validation, involving 36 hospitals covering the upper Midwest. We developed a method for sharing de-identified patient data across systems for analysis. From this, we compared 3 approaches to identify features associated with LOS: a generalized linear model (GLM), a random forest (RF), and aggregated system-level averages. We compared model performance by area under the ROC curve (AUROC). Results A total of 2068 patients were included and used for model derivation and 597 patients for validation. LOS overall had a median of 5.0 days and mean of 8.2 days. Consistent predictors of LOS included age, critical illness, oxygen requirement, weight loss, and nursing home admission. In the validation cohort, the RF model (AUROC 0.890) and GLM model (AUROC 0.864) achieved good to excellent prediction of LOS, but only marginally better than system averages in practice. Conclusion Regional sharing of patient data allowed for effective prediction of LOS across systems; however, this only provided marginal improvement over hospital averages at the aggregate level. A federated approach of sharing aggregated system capacity and average LOS will likely allow for effective capacity management at the regional level.
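AUROC, the metric used here to compare the models, can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition in Python. It assumes a binary outcome (for example, LOS above some cut-off; the abstract does not state how LOS was dichotomized), and the labels and scores are hypothetical toy data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 outcomes; scores: model scores, higher = more likely 1.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = prolonged stay, 0 = short stay.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a cohort of a few thousand patients; rank-based formulations give the same value more efficiently.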
Affiliation(s)
- Michael G Usher
  - Division of General Internal Medicine, Department of Medicine, University of Minnesota Medical School, Minneapolis, Minnesota, USA
- Roshan Tourani
  - Department of Medicine, Institute for Health Informatics, University of Minnesota Medical School, Minneapolis, Minnesota, USA
- Gyorgy Simon
  - Department of Medicine, Institute for Health Informatics, University of Minnesota Medical School, Minneapolis, Minnesota, USA
- Christopher Tignanelli
  - Department of Medicine, Institute for Health Informatics, University of Minnesota Medical School, Minneapolis, Minnesota, USA
  - Division of Acute Care Surgery, Department of Surgery, University of Minnesota Medical School, Minneapolis, Minnesota, USA
- Bryan Jarabek
  - Department of Informatics, M Health Fairview, Minneapolis, Minnesota, USA
- Craig E Strauss
  - Minneapolis Heart Institute Center for Healthcare Delivery Innovation, Minneapolis Heart Institute, Allina Health, Minneapolis, Minnesota, USA
- Stephen C Waring
  - Essentia Institute of Rural Health, Essentia Health, Duluth, Minnesota, USA
- Niall A M Klyn
  - Information Services, Essentia Health, Duluth, Minnesota, USA
- Burke T Kealey
  - Internal Medicine, HealthPartners, St. Paul, Minnesota, USA
- Rabindra Tambyraja
  - Children's Hospitals and Clinics of Minnesota, Minneapolis, Minnesota, USA
- Deepti Pandita
  - Department of Medicine, Hennepin Healthcare, Minneapolis, Minnesota, USA
- Karyn D Baum
  - Division of General Internal Medicine, Department of Medicine, University of Minnesota Medical School, Minneapolis, Minnesota, USA
11
Abstract
In recent years, mass spectrometry (MS)-based metabolomics has been extensively applied to characterize biochemical mechanisms, and study physiological processes and phenotypic changes associated with disease. Metabolomics has also been important for identifying biomarkers of interest suitable for clinical diagnosis. For the purpose of predictive modeling, in this chapter, we will review various supervised learning algorithms such as random forest (RF), support vector machine (SVM), and partial least squares-discriminant analysis (PLS-DA). In addition, we will also review feature selection methods for identifying the best combination of metabolites for an accurate predictive model. We conclude with best practices for reproducibility by including internal and external replication, reporting metrics to assess performance, and providing guidelines to avoid overfitting and to deal with imbalanced classes. An analysis of an example dataset will illustrate the use of different machine learning methods and performance metrics.
Affiliation(s)
- Tusharkanti Ghosh
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Weiming Zhang
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Debashis Ghosh
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Katerina Kechris
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
12
Derevitskii IV, Kovalchuk SV. Machine Learning-Based Factor Analysis of Carbohydrate Metabolism Compensation for TDM2 Patients. Stud Health Technol Inform 2020; 273:123-128. [PMID: 33087601 DOI: 10.3233/shti200626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Type 2 diabetes is one of the most common chronic diseases in the world. World Diabetes Federation experts predict that the number of diabetes patients will increase by 205 million to reach 592 million by 2035. For health care, this diabetes type is one of the highest priority problems. This disease is associated with many concomitant diseases leading to early disability and high cardiovascular risk. One indicator of disease severity is the degree of carbohydrate metabolism compensation. Patients with decompensated or subcompensated carbohydrate metabolism have increased cardiovascular risks. Therefore, it is important to be able to select the right therapy to control carbohydrate metabolism. In this study, we propose a new method for selecting the optimal therapy automatically. The method includes creating personal optimal therapies. This kind of therapy has the highest probability of compensating carbohydrate metabolism for a patient within six months. The method includes models for predicting the results of different therapies. It is based on data from the previous medical history and current medical indicators of patients. This method provides high-quality predictions and medical recommendations. Therefore, medical professionals can use this method as part of the Support and Decision-Making Systems for working with T2DM patients.
|
13
|
Coleman BC, Fodeh S, Lisi AJ, Goulet JL, Corcoran KL, Bathulapalli H, Brandt CA. Exploring supervised machine learning approaches to predicting Veterans Health Administration chiropractic service utilization. Chiropr Man Therap 2020; 28:47. [PMID: 32680545 PMCID: PMC7368704 DOI: 10.1186/s12998-020-00335-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/18/2020] [Accepted: 07/02/2020] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Chronic spinal pain conditions affect millions of US adults and carry a high healthcare cost burden, both direct and indirect. Conservative interventions for spinal pain conditions, including chiropractic care, have been associated with lower healthcare costs and improvements in pain status in different clinical populations, including veterans. Little is currently known about predicting healthcare service utilization in the domain of conservative interventions for spinal pain conditions, including the frequency of use of chiropractic services. The purpose of this retrospective cohort study was to explore the use of supervised machine learning approaches to predicting one-year chiropractic service utilization by veterans receiving VA chiropractic care. METHODS We included 19,946 veterans who entered the Musculoskeletal Diagnosis Cohort between October 1, 2003 and September 30, 2013 and utilized VA chiropractic services within one year of cohort entry. The primary outcome was one-year chiropractic service utilization following index chiropractic visit, split into quartiles represented by the following classes: 1 visit, 2 to 3 visits, 4 to 6 visits, and 7 or greater visits. We compared the performance of four multiclass classification algorithms (gradient boosted classifier, stochastic gradient descent classifier, support vector classifier, and artificial neural network) in predicting visit quartile using 158 sociodemographic and clinical features. RESULTS The selected algorithms demonstrated poor prediction capabilities. Subset accuracy was 42.1% for the gradient boosted classifier, 38.6% for the stochastic gradient descent classifier, 41.4% for the support vector classifier, and 40.3% for the artificial neural network. The micro-averaged area under the precision-recall curve for each one-versus-rest classifier was 0.43 for the gradient boosted classifier, 0.38 for the stochastic gradient descent classifier, 0.43 for the support vector classifier, and 0.42 for the artificial neural network. Performance of each model yielded only a small positive shift in prediction probability (approximately 15%) compared to naïve classification. CONCLUSIONS Using supervised machine learning to predict chiropractic service utilization remains challenging, with only a small shift in predictive probability over naïve classification and limited clinical utility. Future work should examine mechanisms to improve model performance.
Affiliation(s)
- Brian C Coleman
- Pain Research, Informatics, Multimorbidities, and Education (PRIME) Center, VA Connecticut Healthcare System, 11-ACSL-G, 950 Campbell Avenue, West Haven, CT, 06516, USA.
- Yale School of Medicine, Yale University, New Haven, CT, USA.
- Samah Fodeh
- Pain Research, Informatics, Multimorbidities, and Education (PRIME) Center, VA Connecticut Healthcare System, 11-ACSL-G, 950 Campbell Avenue, West Haven, CT, 06516, USA
- Yale School of Medicine, Yale University, New Haven, CT, USA
- Anthony J Lisi
- Pain Research, Informatics, Multimorbidities, and Education (PRIME) Center, VA Connecticut Healthcare System, 11-ACSL-G, 950 Campbell Avenue, West Haven, CT, 06516, USA
- Yale School of Medicine, Yale University, New Haven, CT, USA
- Joseph L Goulet
- Pain Research, Informatics, Multimorbidities, and Education (PRIME) Center, VA Connecticut Healthcare System, 11-ACSL-G, 950 Campbell Avenue, West Haven, CT, 06516, USA
- Yale School of Medicine, Yale University, New Haven, CT, USA
- Kelsey L Corcoran
- Pain Research, Informatics, Multimorbidities, and Education (PRIME) Center, VA Connecticut Healthcare System, 11-ACSL-G, 950 Campbell Avenue, West Haven, CT, 06516, USA
- Yale School of Medicine, Yale University, New Haven, CT, USA
- Harini Bathulapalli
- Pain Research, Informatics, Multimorbidities, and Education (PRIME) Center, VA Connecticut Healthcare System, 11-ACSL-G, 950 Campbell Avenue, West Haven, CT, 06516, USA
- Yale School of Medicine, Yale University, New Haven, CT, USA
- Cynthia A Brandt
- Pain Research, Informatics, Multimorbidities, and Education (PRIME) Center, VA Connecticut Healthcare System, 11-ACSL-G, 950 Campbell Avenue, West Haven, CT, 06516, USA
- Yale School of Medicine, Yale University, New Haven, CT, USA
|
14
|
Zimmet AM, Sullivan BA, Moorman JR, Lake DE, Ratcliffe SJ. Trajectories of the heart rate characteristics index, a physiomarker of sepsis in premature infants, predict Neonatal ICU mortality. JRSM Cardiovasc Dis 2020; 9:2048004020945142. [PMID: 33240492 PMCID: PMC7675854 DOI: 10.1177/2048004020945142] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/25/2019] [Revised: 06/25/2020] [Accepted: 07/02/2020] [Indexed: 11/16/2022] Open
Abstract
OBJECTIVE Trajectories of physiomarkers over time can be useful to define phenotypes of disease progression and as predictors of clinical outcomes. The aim of this study was to identify phenotypes of the time course of late-onset sepsis in premature infants in Neonatal Intensive Care Units (NICUs). METHODS We examined the trajectories of a validated continuous physiomarker, abnormal heart rate characteristics, using functional data analysis and clustering techniques. PARTICIPANTS We analyzed continuous heart rate characteristics data from 2989 very low birth weight infants (<1500 grams) from nine NICUs from 2004 to 2010. RESULT Despite the relative homogeneity of the patients, we found extreme variability in the physiomarker trajectories. We identified phenotypes that were indicative of 7-day and 30-day mortality beyond that predicted by individual heart rate characteristics values or baseline demographic information. CONCLUSION Time courses of a heart rate characteristics physiomarker reveal snapshots of illness patterns, some of which were more deadly than others.
|
15
|
Tayob N, Christie I, Richardson P, Feng Z, White DL, Davila J, Corley DA, Kanwal F, El-Serag HB. Validation of the Hepatocellular Carcinoma Early Detection Screening (HES) Algorithm in a Cohort of Veterans With Cirrhosis. Clin Gastroenterol Hepatol 2019; 17:1886-1893.e5. [PMID: 30557738 DOI: 10.1016/j.cgh.2018.12.005] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 08/13/2018] [Revised: 11/06/2018] [Accepted: 12/10/2018] [Indexed: 02/07/2023]
Abstract
BACKGROUND & AIMS Early detection of hepatocellular carcinoma (HCC) through surveillance reduces mortality associated with this cancer. Guidelines recommend HCC surveillance every 6 months for patients with cirrhosis, via ultrasonography, with or without measurement of serum level of alpha fetoprotein (AFP). METHODS We previously developed and internally validated an HCC early detection screening (HES) algorithm that included the patient's current level of AFP, rate of AFP change, age, level of alanine aminotransferase, and platelet count in a Department of Veterans Affairs (VA) cohort with active hepatitis C virus-related cirrhosis. The HES score was associated with a 3.84% absolute improvement in sensitivity of detection of HCC compared with AFP alone, at 90% specificity, within 6 months prior to diagnosis of this cancer. We externally validated the HES algorithm in a cohort of 38,431 patients with cirrhosis of any etiology evaluated at a VA medical center from 2010 through 2015. RESULTS A total of 4804 cases of HCC developed during a median follow-up time of 3.12 years. At 90% specificity, the HES algorithm identified patients with HCC with 52.56% sensitivity, compared to 48.13% sensitivity for the AFP assay alone, within 6 months prior to diagnosis; this was an absolute improvement of 4.43% (P < .0005). In HCC screening, a positive result leads to follow-up evaluation by computed tomography or magnetic resonance imaging. We estimated that the number of HCC cases detected per 1000 imaging analyses was 198.57 for the HES algorithm vs 185.52 for the AFP assay alone, or detection of 13 additional cases of HCC (P < .0005). CONCLUSION We validated the HES algorithm in detection of HCC in patients with cirrhosis of any etiology evaluated at VA medical centers. The algorithm offers a modest but useful advantage over AFP alone in HCC surveillance.
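The reported gains in the abstract above are simple differences, and the arithmetic can be checked directly. The sketch below uses only the figures quoted in the abstract; the variable names are illustrative, and the relative gain is a derived quantity that the abstract itself does not report.

```python
# Figures quoted in the abstract (all at 90% specificity).
hes_sensitivity = 52.56  # percent, HES algorithm
afp_sensitivity = 48.13  # percent, AFP assay alone

# Absolute improvement in sensitivity, in percentage points.
absolute_gain = round(hes_sensitivity - afp_sensitivity, 2)

# HCC cases detected per 1000 follow-up imaging analyses.
hes_per_1000 = 198.57
afp_per_1000 = 185.52
extra_cases_per_1000 = round(hes_per_1000 - afp_per_1000, 2)

# Relative gain over AFP alone: NOT stated in the abstract, derived here
# only to put the absolute difference in proportion.
relative_gain_pct = round(100 * absolute_gain / afp_sensitivity, 1)

print(absolute_gain, extra_cases_per_1000, relative_gain_pct)
```

The difference in cases per 1000 imaging analyses is what the abstract rounds to "13 additional cases".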
|
16
|
Song X, Waitman LR, Hu Y, Yu ASL, Robbins D, Liu M. An exploration of ontology-based EMR data abstraction for diabetic kidney disease prediction. AMIA Jt Summits Transl Sci Proc 2019; 2019:704-713. [PMID: 31259027 PMCID: PMC6568123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Diabetic Kidney Disease (DKD) is a critical and morbid complication of diabetes and the leading cause of chronic kidney disease in the developed world. Electronic medical records (EMRs) hold promise for supporting clinical decision-making, given their nationwide adoption and the rich information they contain about patients' health care experience. However, few retrospective studies have fully utilized EMR data to model DKD risk. This study examines the effectiveness of an unbiased, data-driven approach that uses a broader spectrum of EMR data to identify potential DKD patients in the 6 months prior to onset. We also evaluate how different levels of data granularity for Medications and Diagnoses observations affect prediction performance and knowledge discovery. The experimental results suggest that different data granularity may not necessarily influence prediction accuracy, but it can dramatically change the internal structure of the predictive models.
Affiliation(s)
- Xing Song
- University of Kansas Medical Center, Department of Internal Medicine, Division of Medical Informatics, Kansas City, KS, USA
- Lemuel R Waitman
- University of Kansas Medical Center, Department of Internal Medicine, Division of Medical Informatics, Kansas City, KS, USA
- Yong Hu
- Jinan University, Big Data Decision Institute, Guangzhou, PRC
- Alan S L Yu
- University of Kansas Medical Center, Division of Nephrology and Hypertension and the Kidney Institute, Kansas City, KS, USA
- David Robbins
- University of Kansas Medical Center, Diabetes Institute, Kansas City, KS, USA
- Mei Liu
- University of Kansas Medical Center, Department of Internal Medicine, Division of Medical Informatics, Kansas City, KS, USA
|
17
|
Cronin PR, Greenwald JL, Crevensten GC, Chueh HC, Zai AH. Development and implementation of a real-time 30-day readmission predictive model. AMIA Annu Symp Proc 2014; 2014:424-431. [PMID: 25954346 PMCID: PMC4419988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Hospitals are under great pressure to reduce readmissions of patients. Being able to reliably predict patients at increased risk for rehospitalization would allow for tailored interventions to be offered to them. This requires the creation of a functional predictive model specifically designed to support real-time clinical operations. A predictive model for readmissions within 30 days of discharge was developed using retrospective data from 45,924 MGH admissions between 2/1/2012 and 1/31/2013 only including factors that would be available by the day after admission. It was then validated prospectively in a real-time implementation for 3,074 MGH admissions between 10/1/2013 and 10/31/2013. The model developed retrospectively had an AUC of 0.705 with good calibration. The real-time implementation had an AUC of 0.671 although the model was overestimating readmission risk. A moderately discriminative real-time 30-day readmission predictive model can be developed and implemented in a large academic hospital.
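The abstract above reports discrimination (AUC) and calibration separately, and a model can be acceptable on one while off on the other. The sketch below illustrates that distinction on made-up toy data; the function names, labels, and scores are hypothetical and are not taken from the study.

```python
def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def predicted_to_observed(labels, scores):
    """Mean predicted risk divided by observed event rate; above 1 means risk is overestimated."""
    return (sum(scores) / len(scores)) / (sum(labels) / len(labels))


# Toy data (hypothetical): 1 = readmitted within 30 days, 0 = not readmitted.
labels = [0, 0, 1, 0, 1, 0, 1, 0]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.55, 0.65]

# Inflate every predicted risk by 50%: the ranking of patients is unchanged,
# so AUC is unchanged, but the model now overestimates risk overall.
inflated = [1.5 * s for s in scores]

print(auc(labels, scores), predicted_to_observed(labels, scores))
print(auc(labels, inflated), predicted_to_observed(labels, inflated))
```

Because AUC depends only on how patients are ranked, a uniform inflation of predicted risk leaves it untouched while the predicted-to-observed ratio rises above 1, which is the pattern the abstract describes for the real-time implementation.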
Affiliation(s)
- Patrick R Cronin
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA
- Henry C Chueh
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA
- Adrian H Zai
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA
|