1
Beam K, Wang C, Beam A, Clark R, Tolia V, Ahmad K. National Needs Assessment of Utilization of Common Newborn Clinical Decision Support Tools. Am J Perinatol 2024; 41:e1982-e1988. [PMID: 37207674 DOI: 10.1055/a-2096-2168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
OBJECTIVE Clinical decision support tools (CDSTs) are common in neonatology, but utilization is rarely examined. We examined the utilization of four CDSTs in newborn care. STUDY DESIGN A 72-field needs assessment was developed. It was distributed to listservs encompassing trainees, nurse practitioners, hospitalists, and attendings. At the conclusion of data collection, responses were downloaded and analyzed. RESULTS We received 339 fully completed questionnaires. BiliTool and the Early-Onset Sepsis (EOS) tool were used by > 90% of respondents, the Bronchopulmonary Dysplasia tool by 39%, and the Extremely Preterm Birth tool by 72%. Common reasons CDSTs did not impact clinical care included lack of electronic health record integration, lack of confidence in prediction accuracy, and unhelpful predictions. CONCLUSION From a national sample of neonatal care providers, there is frequent but variable use of four CDSTs. Understanding the factors that contribute to tool utility is vital prior to development and implementation. KEY POINTS · Clinical decision support tools are common in medicine. · There is a varied use of neonatal CDST. · Understanding the use of CDST is vital for future development.
Affiliation(s)
- Kristyn Beam
  - Department of Neonatology, Beth Israel Deaconess Medical Center, Boston, Massachusetts
- Cindy Wang
  - Department of Statistics, Harvard University, Cambridge, Massachusetts
- Andrew Beam
  - Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Reese Clark
  - The Pediatrix Center for Research, Education, Quality and Safety, Sunrise, Florida
- Veeral Tolia
  - The Pediatrix Center for Research, Education, Quality and Safety, Sunrise, Florida
  - Department of Pediatrics, Baylor University Medical Center, Dallas, Texas
- Kaashif Ahmad
  - The Pediatrix Center for Research, Education, Quality and Safety, Sunrise, Florida
  - Department of Pediatrics, The Woman's Hospital of Texas, Houston, Texas
2
Imrie F, Cebere B, McKinney EF, van der Schaar M. AutoPrognosis 2.0: Democratizing diagnostic and prognostic modeling in healthcare with automated machine learning. PLOS Digital Health 2023; 2:e0000276. [PMID: 37347752 DOI: 10.1371/journal.pdig.0000276] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/21/2022] [Accepted: 05/17/2023] [Indexed: 06/24/2023]
Abstract
Diagnostic and prognostic models are increasingly important in medicine and inform many clinical decisions. Recently, machine learning approaches have shown improvement over conventional modeling techniques by better capturing complex interactions between patient covariates in a data-driven manner. However, the use of machine learning introduces technical and practical challenges that have thus far restricted widespread adoption of such techniques in clinical settings. To address these challenges and empower healthcare professionals, we present an open-source machine learning framework, AutoPrognosis 2.0, to facilitate the development of diagnostic and prognostic models. AutoPrognosis leverages state-of-the-art advances in automated machine learning to develop optimized machine learning pipelines, incorporates model explainability tools, and enables deployment of clinical demonstrators, without requiring significant technical expertise. To demonstrate AutoPrognosis 2.0, we provide an illustrative application where we construct a prognostic risk score for diabetes using the UK Biobank, a prospective study of 502,467 individuals. The models produced by our automated framework achieve greater discrimination for diabetes than expert clinical risk scores. We have implemented our risk score as a web-based decision support tool, which can be publicly accessed by patients and clinicians. By open-sourcing our framework as a tool for the community, we aim to provide clinicians and other medical practitioners with an accessible resource to develop new risk scores, personalized diagnostics, and prognostics using machine learning techniques. Software: https://github.com/vanderschaarlab/AutoPrognosis.
Affiliation(s)
- Fergus Imrie
  - Department of Electrical and Computer Engineering, University of California, Los Angeles, California, United States of America
- Bogdan Cebere
  - Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, United Kingdom
- Eoin F McKinney
  - Department of Medicine, University of Cambridge, Cambridge, United Kingdom
- Mihaela van der Schaar
  - Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, United Kingdom
  - The Alan Turing Institute, London, United Kingdom
3
A machine learning analysis of correlates of mortality among patients hospitalized with COVID-19. Sci Rep 2023; 13:4080. [PMID: 36906638 PMCID: PMC10007654 DOI: 10.1038/s41598-023-31251-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 03/08/2023] [Indexed: 03/13/2023]
Abstract
It is vital to determine how patient characteristics that precede COVID-19 illness relate to COVID-19 mortality. This is a retrospective cohort study of patients hospitalized with COVID-19 across 21 healthcare systems in the US. All patients (N = 145,944) had COVID-19 diagnoses and/or positive PCR tests and completed their hospital stays from February 1, 2020 through January 31, 2022. Machine learning analyses revealed that age, hypertension, insurance status, and healthcare system (hospital site) were especially predictive of mortality across the full sample. However, multiple variables were especially predictive in subgroups of patients. The nested effects of risk factors such as age, hypertension, vaccination, site, and race accounted for large differences in mortality likelihood with rates ranging from about 2-30%. Subgroups of patients are at heightened risk of COVID-19 mortality due to combinations of preadmission risk factors; a finding of potential relevance to outreach and preventive actions.
4
Tschoellitsch T, Krummenacker S, Dünser MW, Stöger R, Meier J. The Value of the First Clinical Impression as Assessed by 18 Observations in Patients Presenting to the Emergency Department. J Clin Med 2023; 12:jcm12020724. [PMID: 36675651 PMCID: PMC9862625 DOI: 10.3390/jcm12020724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Revised: 01/11/2023] [Accepted: 01/13/2023] [Indexed: 01/18/2023]
Abstract
The first clinical impression of emergency patients conveys a myriad of information that has been incompletely elucidated. In this prospective, observational study, the value of the first clinical impression, assessed by 18 observations, to predict the need for timely medical attention, the need for hospital admission, and in-hospital mortality in 1506 adult patients presenting to the triage desk of an emergency department was determined. Machine learning models were used for statistical analysis. The first clinical impression could predict the need for timely medical attention [area under the receiver operating characteristic curve (AUC ROC), 0.73; p = 0.01] and hospital admission (AUC ROC, 0.8; p = 0.004), but not in-hospital mortality (AUC ROC, 0.72; p = 0.13). The five most important features informing the prediction models were age, ability to walk, admission by emergency medical services, lying on a stretcher, breathing pattern, and bringing a suitcase. The inability to walk at triage presentation was highly predictive of both the need for timely medical attention (p < 0.001) and the need for hospital admission (p < 0.001). In conclusion, the first clinical impression of emergency patients presenting to the triage desk can predict the need for timely medical attention and hospital admission. Important components of the first clinical impression were identified.
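As a rough illustration of the AUC ROC figures reported above, the sketch below computes the statistic from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy labels and scores are hypothetical and are not taken from the study.

```python
def auc_roc(labels, scores):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = admitted, 0 = not admitted.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(f"AUC ROC = {auc_roc(labels, scores):.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; a sort-based implementation would be preferable for large cohorts.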
Affiliation(s)
- Thomas Tschoellitsch
  - Department of Anesthesiology and Critical Care Medicine, Kepler University Hospital, Johannes Kepler University Linz, 4020 Linz, Austria
- Stefan Krummenacker
  - Kepler University Hospital, Johannes Kepler University Linz, 4020 Linz, Austria
- Martin W. Dünser
  - Department of Anesthesiology and Critical Care Medicine, Kepler University Hospital, Johannes Kepler University Linz, 4020 Linz, Austria
- Roland Stöger
  - Praxis für Allgemein- und Familienmedizin, 4262 Leopoldschlag, Austria
- Jens Meier
  - Department of Anesthesiology and Critical Care Medicine, Kepler University Hospital, Johannes Kepler University Linz, 4020 Linz, Austria
5
Randall JR, DuPai CD, Cole TJ, Davidson G, Groover KE, Slater SL, Mavridou DA, Wilke CO, Davies BW. Designing and identifying β-hairpin peptide macrocycles with antibiotic potential. Science Advances 2023; 9:eade0008. [PMID: 36630516 PMCID: PMC9833666 DOI: 10.1126/sciadv.ade0008] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/18/2022] [Accepted: 12/09/2022] [Indexed: 06/17/2023]
Abstract
Peptide macrocycles are a rapidly emerging class of therapeutic, yet the design of their structure and activity remains challenging. This is especially true for those with β-hairpin structure due to weak folding properties and a propensity for aggregation. Here, we use proteomic analysis and common antimicrobial features to design a large peptide library with macrocyclic β-hairpin structure. Using an activity-driven high-throughput screen, we identify dozens of peptides killing bacteria through selective membrane disruption and analyze their biochemical features via machine learning. Active peptides contain a unique constrained structure and are highly enriched for cationic charge with arginine in their turn region. Our results provide a synthetic strategy for structured macrocyclic peptide design and discovery while also elucidating characteristics important for β-hairpin antimicrobial peptide activity.
Affiliation(s)
- Justin R. Randall
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
- Cory D. DuPai
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
  - Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA
- T. Jeffrey Cole
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
  - Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA
- Gillian Davidson
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
- Kyra E. Groover
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
- Sabrina L. Slater
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
- Claus O. Wilke
  - Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA
- Bryan W. Davies
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
6
Vagliano I, Schut MC, Abu-Hanna A, Dongelmans DA, de Lange DW, Gommers D, Cremer OL, Bosman RJ, Rigter S, Wils EJ, Frenzel T, de Jong R, Peters MAA, Kamps MJA, Ramnarain D, Nowitzky R, Nooteboom FGCA, de Ruijter W, Urlings-Strop LC, Smit EGM, Mehagnoul-Schipper DJ, Dormans T, de Jager CPC, Hendriks SHA, Achterberg S, Oostdijk E, Reidinga AC, Festen-Spanjer B, Brunnekreef GB, Cornet AD, van den Tempel W, Boelens AD, Koetsier P, Lens J, Faber HJ, Karakus A, Entjes R, de Jong P, Rettig TCD, Reuland MC, Arbous S, Fleuren LM, Dam TA, Thoral PJ, Lalisang RCA, Tonutti M, de Bruin DP, Elbers PWG, de Keizer NF. Assess and validate predictive performance of models for in-hospital mortality in COVID-19 patients: A retrospective cohort study in the Netherlands comparing the value of registry data with high-granular electronic health records. Int J Med Inform 2022; 167:104863. [PMID: 36162166 PMCID: PMC9492397 DOI: 10.1016/j.ijmedinf.2022.104863] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/12/2022] [Revised: 08/19/2022] [Accepted: 09/03/2022] [Indexed: 11/17/2022]
Abstract
PURPOSE To assess, validate and compare the predictive performance of models for in-hospital mortality of COVID-19 patients admitted to the intensive care unit (ICU) over two different waves of infections. Our models were built with high-granular Electronic Health Records (EHR) data versus less-granular registry data. METHODS Observational study of all COVID-19 patients admitted to 19 Dutch ICUs participating in both the national quality registry National Intensive Care Evaluation (NICE) and the EHR-based Dutch Data Warehouse (hereafter EHR). Multiple models were developed on data from the first 24 h of ICU admissions from February to June 2020 (first COVID-19 wave) and validated on prospective patients admitted to the same ICUs between July and December 2020 (second COVID-19 wave). We assessed model discrimination, calibration, and the degree of relatedness between development and validation population. Coefficients were used to identify relevant risk factors. RESULTS A total of 1533 patients from the EHR and 1563 from the registry were included. With high granular EHR data, the average AUROC was 0.69 (standard deviation of 0.05) for the internal validation, and the AUROC was 0.75 for the temporal validation. The registry model achieved an average AUROC of 0.76 (standard deviation of 0.05) in the internal validation and 0.77 in the temporal validation. In the EHR data, age, and respiratory-system related variables were the most important risk factors identified. In the NICE registry data, age and chronic respiratory insufficiency were the most important risk factors. CONCLUSION In our study, prognostic models built on less-granular but readily-available registry data had similar performance to models built on high-granular EHR data and showed similar transportability to a prospective COVID-19 population. Future research is needed to verify whether this finding can be confirmed for upcoming waves.
Affiliation(s)
- Iacopo Vagliano
  - Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Martijn C Schut
  - Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Ameen Abu-Hanna
  - Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Dave A Dongelmans
  - National Intensive Care Evaluation (NICE) foundation, Amsterdam, The Netherlands; Department of Intensive Care Medicine, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Dylan W de Lange
  - National Intensive Care Evaluation (NICE) foundation, Amsterdam, The Netherlands; Department of Intensive Care Medicine, University Medical Center Utrecht, University Utrecht, Utrecht, The Netherlands
- Diederik Gommers
  - Department of Intensive Care, Erasmus Medical Center, Rotterdam, The Netherlands
- Olaf L Cremer
  - Intensive Care, UMC Utrecht, Utrecht, The Netherlands
- Sander Rigter
  - Department of Anesthesiology and Intensive Care, St. Antonius Hospital, Nieuwegein, The Netherlands
- Evert-Jan Wils
  - Department of Intensive Care, Franciscus Gasthuis & Vlietland, Rotterdam, The Netherlands
- Tim Frenzel
  - Department of Intensive Care Medicine, Radboud University Medical Center, Nijmegen, The Netherlands
- Remko de Jong
  - Intensive Care, Bovenij Ziekenhuis, Amsterdam, The Netherlands
- Marco A A Peters
  - Intensive Care, Canisius Wilhelmina Ziekenhuis, Nijmegen, The Netherlands
- Marlijn J A Kamps
  - Intensive Care, Catharina Ziekenhuis Eindhoven, Eindhoven, The Netherlands
- Ralph Nowitzky
  - Intensive Care, Haga Ziekenhuis, Den Haag, The Netherlands
- Wouter de Ruijter
  - Department of Intensive Care Medicine, Northwest Clinics, Alkmaar, The Netherlands
- Ellen G M Smit
  - Intensive Care, Spaarne Gasthuis, Haarlem en Hoofddorp, The Netherlands
- Tom Dormans
  - Intensive Care, Zuyderland MC, Heerlen, The Netherlands
- Auke C Reidinga
  - ICU, SEH, BWC, Martiniziekenhuis, Groningen, The Netherlands
- Gert B Brunnekreef
  - Department of Intensive Care, Ziekenhuisgroep Twente, Almelo, The Netherlands
- Alexander D Cornet
  - Department of Intensive Care, Medisch Spectrum Twente, Enschede, The Netherlands
- Walter van den Tempel
  - Department of Intensive Care, Ikazia Ziekenhuis Rotterdam, Rotterdam, The Netherlands
- Age D Boelens
  - Anesthesiology, Antonius Ziekenhuis Sneek, Sneek, The Netherlands
- Peter Koetsier
  - Intensive Care, Medisch Centrum Leeuwarden, Leeuwarden, The Netherlands
- Judith Lens
  - ICU, IJsselland Ziekenhuis, Capelle aan den IJssel, The Netherlands
- A Karakus
  - Department of Intensive Care, Diakonessenhuis Hospital, Utrecht, The Netherlands
- Robert Entjes
  - Department of Intensive Care, Adrz, Goes, The Netherlands
- Paul de Jong
  - Department of Anesthesia and Intensive Care, Slingeland Ziekenhuis, Doetinchem, The Netherlands
- Thijs C D Rettig
  - Department of Anesthesiology, Intensive Care and Pain Medicine, Amphia Ziekenhuis, Breda, The Netherlands
- M C Reuland
  - Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Lucas M Fleuren
  - Department of Intensive Care Medicine, Laboratory for Critical Care Computational Intelligence, Amsterdam Medical Data Science, Amsterdam UMC, Vrije Universiteit, Amsterdam, The Netherlands
- Tariq A Dam
  - Department of Intensive Care Medicine, Laboratory for Critical Care Computational Intelligence, Amsterdam Medical Data Science, Amsterdam UMC, Vrije Universiteit, Amsterdam, The Netherlands
- Patrick J Thoral
  - Department of Intensive Care Medicine, Laboratory for Critical Care Computational Intelligence, Amsterdam Medical Data Science, Amsterdam UMC, Vrije Universiteit, Amsterdam, The Netherlands
- Paul W G Elbers
  - Department of Intensive Care Medicine, Laboratory for Critical Care Computational Intelligence, Amsterdam Medical Data Science, Amsterdam UMC, Vrije Universiteit, Amsterdam, The Netherlands
- Nicolette F de Keizer
  - Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands; National Intensive Care Evaluation (NICE) foundation, Amsterdam, The Netherlands
7
A Romero RA, Y Deypalan MN, Mehrotra S, Jungao JT, Sheils NE, Manduchi E, Moore JH. Benchmarking AutoML frameworks for disease prediction using medical claims. BioData Min 2022; 15:15. [PMID: 35883154 PMCID: PMC9327416 DOI: 10.1186/s13040-022-00300-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/07/2021] [Accepted: 06/27/2022] [Indexed: 11/10/2022]
Abstract
Objectives Ascertain and compare the performances of Automated Machine Learning (AutoML) tools on large, highly imbalanced healthcare datasets. Materials and Methods We generated a large dataset using historical de-identified administrative claims including demographic information and flags for disease codes in four different time windows prior to 2019. We then trained three AutoML tools on this dataset to predict six different disease outcomes in 2019 and evaluated model performances on several metrics. Results The AutoML tools showed improvement from the baseline random forest model but did not differ significantly from each other. All models recorded low area under the precision-recall curve and failed to predict true positives while keeping the true negative rate high. Model performance was not directly related to prevalence. We provide a specific use-case to illustrate how to select a threshold that gives the best balance between true and false positive rates, as this is an important consideration in medical applications. Discussion Healthcare datasets present several challenges for AutoML tools, including large sample size, high imbalance, and limitations in the available features. Improvements in scalability, combinations of imbalance-learning resampling and ensemble approaches, and curated feature selection are possible next steps to achieve better performance. Conclusion Among the three explored, no AutoML tool consistently outperforms the rest in terms of predictive performance. The performances of the models in this study suggest that there may be room for improvement in handling medical claims data. Finally, selection of the optimal prediction threshold should be guided by the specific practical application. Supplementary Information The online version contains supplementary material available at (10.1186/s13040-022-00300-2).
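The abstract above does not say which criterion the authors used to balance true and false positive rates. One common choice is Youden's J statistic (J = TPR - FPR), and the sketch below scans candidate thresholds for the one that maximizes it. The function name and the toy labels and scores are hypothetical; other criteria, such as cost-weighted ones, may suit a given clinical application better.

```python
def best_threshold(labels, scores):
    """Return (threshold, tpr, fpr) maximizing Youden's J = TPR - FPR.

    A case is predicted positive when its score is >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        tpr, fpr = tp / n_pos, fp / n_neg
        # Keep the first threshold that strictly improves J.
        if best is None or (tpr - fpr) > (best[1] - best[2]):
            best = (t, tpr, fpr)
    return best


# Hypothetical imbalanced toy data: 3 positives among 10 cases.
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.80]
threshold, tpr, fpr = best_threshold(labels, scores)
print(f"threshold={threshold:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```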
Affiliation(s)
- Elisabetta Manduchi
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center Suite G540, West Hollywood, 90069, CA, USA
- Jason H Moore
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center Suite G540, West Hollywood, 90069, CA, USA
8
No-Code Platform-Based Deep-Learning Models for Prediction of Colorectal Polyp Histology from White-Light Endoscopy Images: Development and Performance Verification. J Pers Med 2022; 12:jpm12060963. [PMID: 35743748 PMCID: PMC9225479 DOI: 10.3390/jpm12060963] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/17/2022] [Revised: 05/27/2022] [Accepted: 06/10/2022] [Indexed: 12/17/2022]
Abstract
Background: The authors previously developed deep-learning models for the prediction of colorectal polyp histology (advanced colorectal cancer, early cancer/high-grade dysplasia, tubular adenoma with or without low-grade dysplasia, or non-neoplasm) from endoscopic images. While the model achieved 67.3% internal-test accuracy and 79.2% external-test accuracy, model development was labour-intensive and required specialised programming expertise. Moreover, the 240-image external-test dataset included only three advanced and eight early cancers, so it was difficult to generalise model performance. These limitations may be mitigated by deep-learning models developed using no-code platforms. Objective: To establish no-code platform-based deep-learning models for the prediction of colorectal polyp histology from white-light endoscopy images and compare their diagnostic performance with traditional models. Methods: The same 3828 endoscopic images used to establish previous models were used to establish new models based on no-code platforms Neuro-T, VLAD, and Create ML-Image Classifier. A prospective multicentre validation study was then conducted using 3818 novel images. The primary outcome was the accuracy of four-category prediction. Results: The model established using Neuro-T achieved the highest internal-test accuracy (75.3%, 95% confidence interval: 71.0–79.6%) and external-test accuracy (80.2%, 76.9–83.5%) but required the longest training time. In contrast, the model established using Create ML-Image Classifier required only 3 min for training and still achieved 72.7% (70.8–74.6%) external-test accuracy. Attention map analysis revealed that the imaging features used by the no-code deep-learning models were similar to those used by endoscopists during visual inspection. Conclusion: No-code deep-learning tools allow for the rapid development of models with high accuracy for predicting colorectal polyp histology.
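The abstract reports accuracies with 95% confidence intervals but does not state how the intervals were computed. As one possible reading, the sketch below uses the normal-approximation (Wald) interval for a proportion. The function name and the counts in the usage lines are hypothetical and are not the study's data.

```python
import math


def accuracy_ci(correct, total, z=1.96):
    """Wald (normal-approximation) confidence interval for an accuracy.

    z=1.96 corresponds to a two-sided 95% interval.
    """
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("require 0 <= correct <= total and total > 0")
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clip to [0, 1] because the approximation can overshoot near the edges.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts: 300 of 400 test images classified correctly.
low, high = accuracy_ci(300, 400)
print(f"accuracy {300 / 400:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The Wald interval is known to behave poorly for small samples or accuracies near 0 or 1; a Wilson or exact interval would be a reasonable alternative in those cases.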
9
Dong Q, Zhang X, Luo G. Improving the Accuracy of Progress Indication for Constructing Deep Learning Models. IEEE Access: Practical Innovations, Open Solutions 2022; 10:63754-63781. [PMID: 35873900 PMCID: PMC9302923 DOI: 10.1109/access.2022.3181493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
For many machine learning tasks, deep learning greatly outperforms all other existing learning algorithms. However, constructing a deep learning model on a big data set often takes days or months. During this long process, it is preferable to provide a progress indicator that keeps predicting the model construction time left and the percentage of model construction work done. Recently, we developed the first method to do this that permits early stopping. That method revises its predicted model construction cost using information gathered at the validation points, where the model's error rate is computed on the validation set. Due to the sparsity of validation points, the resulting progress indicators often have a long delay in gathering information from enough validation points and obtaining relatively accurate progress estimates. In this paper, we propose a new progress indication method to overcome this shortcoming by judiciously inserting extra validation points between the original validation points. We implemented this new method in TensorFlow. Our experiments show that compared with using our prior method, using this new method reduces the progress indicator's prediction error of the model construction time left by 57.5% on average. Also, with a low overhead, this new method enables us to obtain relatively accurate progress estimates faster.
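To make the two quantities a progress indicator reports concrete (percentage of work done and time left), the sketch below shows a naive baseline that assumes work proceeds at a constant rate. This is not the authors' method, which revises its cost estimate at validation points; the function name and the numbers are hypothetical.

```python
def naive_progress(elapsed_seconds, work_done, work_total):
    """Return (percent_done, seconds_left) assuming a constant work rate.

    work_done and work_total can be in any consistent unit,
    for example training batches processed.
    """
    if work_total <= 0 or not 0 < work_done <= work_total:
        raise ValueError("require 0 < work_done <= work_total")
    fraction = work_done / work_total
    seconds_left = elapsed_seconds * (1 - fraction) / fraction
    return fraction * 100.0, seconds_left


# Hypothetical run: 25 of 100 units finished after 600 seconds.
percent, left = naive_progress(600, 25, 100)
print(f"{percent:.0f}% done, about {left:.0f} s left")
```

With early stopping the total amount of work is not known in advance, which is why a constant-rate baseline like this can be badly off and why the estimate needs revising as validation results arrive.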
Affiliation(s)
- Qifei Dong
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195, USA
- Xiaoyi Zhang
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195, USA
- Gang Luo
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195, USA
10
Khashu M, Dame C, Lavoie PM, De Plaen IG, Garg PM, Sampath V, Malhotra A, Caplan MD, Kumar P, Agrawal PB, Buonocore G, Christensen RD, Maheshwari A. Current Understanding of Transfusion-associated Necrotizing Enterocolitis: Review of Clinical and Experimental Studies and a Call for More Definitive Evidence. Newborn 2022; 1:201-208. [PMID: 35746957 PMCID: PMC9217573 DOI: 10.5005/jp-journals-11002-0005] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/23/2022]
Affiliation(s)
- Pascal M Lavoie
  - University of British Columbia, Vancouver, British Columbia, Canada
- Parvesh M Garg
  - University of Mississippi, Jackson, Mississippi, United States of America
- Venkatesh Sampath
  - University of Missouri–Kansas City, Kansas, United States of America
- Michael D Caplan
  - University of Chicago, Chicago, Illinois, United States of America
- Praveen Kumar
  - Postgraduate Institute of Medical Education and Research, Chandigarh, Punjab, India
- Pankaj B Agrawal
  - Boston Children’s Hospital, Harvard University, Boston, Massachusetts, United States of America
- Akhil Maheshwari
  - Global Newborn Society, Baltimore, Maryland, United States of America
11
Luo G, Stone BL, Sheng X, He S, Koebnick C, Nkoy FL. Using Computational Methods to Improve Integrated Disease Management for Asthma and Chronic Obstructive Pulmonary Disease: Protocol for a Secondary Analysis. JMIR Res Protoc 2021; 10:e27065. [PMID: 34003134 PMCID: PMC8170556 DOI: 10.2196/27065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2021] [Revised: 04/12/2021] [Accepted: 04/19/2021] [Indexed: 12/05/2022]
Abstract
Background Asthma and chronic obstructive pulmonary disease (COPD) impose a heavy burden on health care. Approximately one-fourth of patients with asthma and patients with COPD are prone to exacerbations, which can be greatly reduced by preventive care via integrated disease management that has a limited service capacity. To do this well, a predictive model for proneness to exacerbation is required, but no such model exists. It would be suboptimal to build such models using the current model building approach for asthma and COPD, which has 2 gaps due to rarely factoring in temporal features showing early health changes and general directions. First, existing models for other asthma and COPD outcomes rarely use more advanced temporal features, such as the slope of the number of days to albuterol refill, and are inaccurate. Second, existing models seldom show the reason a patient is deemed high risk and the potential interventions to reduce the risk, making already occupied clinicians expend more time on chart review and overlook suitable interventions. Regular automatic explanation methods cannot deal with temporal data and address this issue well. Objective To enable more patients with asthma and patients with COPD to obtain suitable and timely care to avoid exacerbations, we aim to implement comprehensible computational methods to accurately predict proneness to exacerbation and recommend customized interventions. Methods We will use temporal features to accurately predict proneness to exacerbation, automatically find modifiable temporal risk factors for every high-risk patient, and assess the impact of actionable warnings on clinicians’ decisions to use integrated disease management to prevent proneness to exacerbation. Results We have obtained most of the clinical and administrative data of patients with asthma from 3 prominent American health care systems. We are retrieving other clinical and administrative data, mostly of patients with COPD, needed for the study. We intend to complete the study in 6 years. Conclusions Our results will help make asthma and COPD care more proactive, effective, and efficient, improving outcomes and saving resources. International Registered Report Identifier (IRRID) PRR1-10.2196/27065
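The abstract names "the slope of the number of days to albuterol refill" as an example of a temporal feature. A minimal sketch of how such a feature might be derived is shown below, using an ordinary least-squares slope of the refill interval against the refill index. The function name, the use of the refill index as the time axis, and the interval values are all assumptions made for illustration and are not details from the protocol.

```python
def refill_interval_slope(days_between_refills):
    """Least-squares slope of refill interval (days) versus refill index.

    A negative slope means the intervals are shrinking, that is,
    refills are becoming more frequent over time.
    """
    n = len(days_between_refills)
    if n < 2:
        raise ValueError("need at least two refill intervals")
    mean_x = (n - 1) / 2
    mean_y = sum(days_between_refills) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(days_between_refills))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx


# Hypothetical patient whose refill intervals shorten over four refills.
intervals = [60, 50, 40, 30]
slope = refill_interval_slope(intervals)
print(f"slope = {slope:.1f} days per refill")
```

A shrinking interval might indicate increasing reliance on rescue medication, which is the kind of early change a temporal feature is meant to capture; whether it is predictive in practice is an empirical question for the study.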
Collapse
Affiliation(s)
- Gang Luo
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
| | - Bryan L Stone
- Department of Pediatrics, University of Utah, Salt Lake City, UT, United States
| | - Xiaoming Sheng
- College of Nursing, University of Utah, Salt Lake City, UT, United States
| | - Shan He
- Care Transformation and Information Systems, Intermountain Healthcare, West Valley City, UT, United States
| | - Corinna Koebnick
- Department of Research & Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States
| | - Flory L Nkoy
- Department of Pediatrics, University of Utah, Salt Lake City, UT, United States
| |
|
12
|
Bang CS, Lim H, Jeong HM, Hwang SH. Use of Endoscopic Images in the Prediction of Submucosal Invasion of Gastric Neoplasms: Automated Deep Learning Model Development and Usability Study. J Med Internet Res 2021; 23:e25167. [PMID: 33856356 PMCID: PMC8085753 DOI: 10.2196/25167] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 10/20/2020] [Revised: 12/09/2020] [Accepted: 03/16/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND In a previous study, we examined the use of deep learning models to classify the invasion depth (mucosa-confined versus submucosa-invaded) of gastric neoplasms using endoscopic images. The external test accuracy reached 77.3%. However, model establishment is labor intensive, requiring high performance. Automated deep learning (AutoDL) models, which enable fast searching of optimal neural architectures and hyperparameters without complex coding, have been developed. OBJECTIVE The objective of this study was to establish AutoDL models to classify the invasion depth of gastric neoplasms. Additionally, endoscopist-artificial intelligence interactions were explored. METHODS The same 2899 endoscopic images that were employed to establish the previous model were used. A prospective multicenter validation using 206 and 1597 novel images was conducted. The primary outcome was external test accuracy. Neuro-T, Create ML Image Classifier, and AutoML Vision were used in establishing the models. Three doctors with different levels of endoscopy expertise were asked to classify the invasion depth of gastric neoplasms for each image without AutoDL support, with faulty AutoDL support, and with best-performing AutoDL support, in sequence. RESULTS The Neuro-T-based model reached 89.3% (95% CI 85.1%-93.5%) external test accuracy. For the model establishment time, Create ML Image Classifier showed the fastest time of 13 minutes while reaching 82.0% (95% CI 76.8%-87.2%) external test accuracy. While the expert endoscopist's decisions were not influenced by AutoDL, the faulty AutoDL misled the endoscopy trainee and the general physician. However, this was corrected by the support of the best-performing AutoDL model. The trainee gained the most benefit from the AutoDL support. CONCLUSIONS AutoDL is deemed useful for the on-site establishment of customized deep learning models. An inexperienced endoscopist with at least a certain level of expertise can benefit from AutoDL support.
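The reported accuracies come with 95% confidence intervals. As a rough sketch of how such an interval can be computed, the snippet below uses the normal-approximation (Wald) interval for a proportion. Two things are assumptions here: that the 206-image set is the external test set behind both accuracies, and that a Wald interval was used; the abstract states neither.

```python
import math


def wald_ci(accuracy, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    half_width = z * math.sqrt(accuracy * (1.0 - accuracy) / n)
    return accuracy - half_width, accuracy + half_width


# Assumption: n = 206 external test images for both models.
neuro_t_low, neuro_t_high = wald_ci(0.893, 206)
create_ml_low, create_ml_high = wald_ci(0.820, 206)
print(f"Neuro-T:   {neuro_t_low:.1%} to {neuro_t_high:.1%}")
print(f"Create ML: {create_ml_low:.1%} to {create_ml_high:.1%}")
```

Under those assumptions the sketch gives intervals that agree with the reported ones to one decimal place, which suggests, but does not confirm, that the authors used this kind of interval.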
Affiliation(s)
- Chang Seok Bang
- Department of Internal Medicine, Hallym University College of Medicine, Chuncheon, Republic of Korea; Institute for Liver and Digestive Diseases, Hallym University, Chuncheon, Republic of Korea; Institute of New Frontier Research, Hallym University College of Medicine, Chuncheon, Republic of Korea; Division of Big Data and Artificial Intelligence, Chuncheon Sacred Heart Hospital, Chuncheon, Republic of Korea
| | - Hyun Lim
- Department of Internal Medicine, Hallym University College of Medicine, Chuncheon, Republic of Korea
| | - Hae Min Jeong
- Department of Internal Medicine, Hallym University College of Medicine, Chuncheon, Republic of Korea
| | - Sung Hyeon Hwang
- Department of Internal Medicine, Hallym University College of Medicine, Chuncheon, Republic of Korea
| |
|
13
|
Abstract
Machine learning (ML) has been slowly entering every aspect of our lives and its positive impact has been astonishing. To accelerate embedding ML in more applications and incorporating it in real-world scenarios, automated machine learning (AutoML) is emerging. The main purpose of AutoML is to provide seamless integration of ML in various industries, which will facilitate better outcomes in everyday tasks. In healthcare, AutoML has been already applied to easier settings with structured data such as tabular lab data. However, there is still a need for applying AutoML for interpreting medical text, which is being generated at a tremendous rate. For this to happen, a promising method is AutoML for clinical notes analysis, which is an unexplored research area representing a gap in ML research. The main objective of this paper is to fill this gap and provide a comprehensive survey and analytical study towards AutoML for clinical notes. To that end, we first introduce the AutoML technology and review its various tools and techniques. We then survey the literature of AutoML in the healthcare industry and discuss the developments specific to clinical settings, as well as those using general AutoML tools for healthcare applications. With this background, we then discuss challenges of working with clinical notes and highlight the benefits of developing AutoML for medical notes processing. Next, we survey relevant ML research for clinical notes and analyze the literature and the field of AutoML in the healthcare industry. Furthermore, we propose future research directions and shed light on the challenges and opportunities this emerging field holds. With this, we aim to assist the community with the implementation of an AutoML platform for medical notes, which if realized can revolutionize patient outcomes.
|
14
|
Yang F, Elmer J, Zadorozhny VI. SmartPrognosis: Automatic ensemble classification for quantitative EEG analysis in patients resuscitated from cardiac arrest. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106579] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/15/2023]
|
15
|
A Unified Framework for Automatic Detection of Wound Infection with Artificial Intelligence. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10155353] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/16/2022]
Abstract
Background: The surgical wound is a unique problem requiring continuous postoperative care, and mobile health technology is implemented to bridge the care gap. Our study aim was to design an integrated framework to support the diagnosis of wound infection. Methods: We used a computer-vision approach based on supervised learning techniques and machine learning algorithms, to help detect the wound region of interest (ROI) and classify wound infection features. The intersection-union test (IUT) was used to evaluate the accuracy of the detection of color card and wound ROI. The area under the receiver operating characteristic curve (AUC) of our model was adopted in comparison with different machine learning approaches. Results: 480 wound photographs were taken from 100 patients for analysis. The average value of IUT on the validation set with fivefold stratification to detect wound ROI was 0.775. For prediction of wound infection, our model achieved a significantly higher AUC score (83.3%) than the other three methods (kernel support vector machines, 44.4%; random forest, 67.1%; gradient boosting classifier, 66.9%). Conclusions: Our evaluation of a prospectively collected wound database demonstrates the effectiveness and reliability of the proposed system, which has been developed for automatic detection of wound infections in patients undergoing surgical procedures.
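The abstract evaluates region detection with an "intersection-union test (IUT)". Assuming this refers to the widely used intersection-over-union overlap measure (the abstract does not define it), a minimal sketch for axis-aligned boxes looks like this. The box coordinates are made up, and the study's regions of interest may not be rectangles.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Overlap width and height are clamped at zero for disjoint boxes.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


# Made-up predicted and annotated wound regions, in pixels.
predicted = (10, 10, 110, 110)
annotated = (20, 20, 120, 120)
score = box_iou(predicted, annotated)
print(round(score, 3))
```

A score of 1.0 means the two regions coincide and 0.0 means they do not overlap, so an average of 0.775 indicates substantial but imperfect agreement.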
|
16
|
Dong Q, Luo G. Progress Indication for Deep Learning Model Training: A Feasibility Demonstration. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2020; 8:79811-79843. [PMID: 32483518 PMCID: PMC7263346 DOI: 10.1109/access.2020.2989684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Deep learning is the state-of-the-art learning algorithm for many machine learning tasks. Yet, training a deep learning model on a large data set is often time-consuming, taking several days or even months. During model training, it is desirable to offer a non-trivial progress indicator that can continuously project the remaining model training time and the fraction of model training work completed. This makes the training process more user-friendly. In addition, we can use the information given by the progress indicator to assist in workload management. In this paper, we present the first set of techniques to support non-trivial progress indicators for deep learning model training when early stopping is allowed. We report an implementation of these techniques in TensorFlow and our evaluation results for both convolutional and recurrent neural networks. Our experiments show that our progress indicator can offer useful information even if the run-time system load varies over time. In addition, the progress indicator can self-correct its initial estimation errors, if any, over time.
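To make the two quantities a progress indicator reports concrete, here is a deliberately simple baseline sketch. It assumes the total amount of work is known in advance and that work proceeds at the average speed seen so far. That is not the paper's technique: with early stopping the total work is unknown, which is the problem the paper addresses. The function name and units are hypothetical.

```python
def progress_estimate(work_done, total_work, elapsed_seconds):
    """Return (fraction_complete, estimated_seconds_remaining).

    Baseline assumption: the remaining work runs at the average speed
    observed so far, and total_work is known up front.
    """
    if work_done <= 0 or total_work <= 0 or elapsed_seconds <= 0:
        raise ValueError("all arguments must be positive")
    fraction = min(work_done / total_work, 1.0)
    speed = work_done / elapsed_seconds  # work units per second
    remaining = max(total_work - work_done, 0) / speed
    return fraction, remaining


# Made-up example: 250 of 1000 mini-batches finished after 2 minutes.
fraction, remaining = progress_estimate(work_done=250, total_work=1000, elapsed_seconds=120.0)
print(f"{fraction:.0%} complete, about {remaining:.0f} s remaining")
```

Re-running the estimate as training proceeds lets the figures adjust when the system load changes, which is the self-correcting behavior the abstract describes in more sophisticated form.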
Affiliation(s)
- Qifei Dong
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195, USA
| | - Gang Luo
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195, USA
| |
|
17
|
Nelson CR, Ekberg J, Fridell K. Prostate Cancer Detection in Screening Using Magnetic Resonance Imaging and Artificial Intelligence. ACTA ACUST UNITED AC 2020. [DOI: 10.2174/1874061802006010001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]
Abstract
Background:
Prostate cancer is a leading cause of death among men who do not participate in a screening programme. MRI is a possible alternative for prostate analysis, with a higher level of sensitivity than the PSA test or biopsy. Magnetic resonance is a non-invasive method, and magnetic resonance tomography produces a large amount of data. If a screening programme were implemented, a dramatic increase in radiologist workload and patient waiting time would follow. Computer-aided diagnosis (CAD) could help radiologists decrease reading times and cost and increase diagnostic effectiveness. CAD mimics radiologists and imaging guidelines to detect prostate cancer.
Aim:
The purpose of this study was to analyse and describe current research in MRI prostate examination with the aid of CAD. The aim was to determine if CAD systems form a reliable method for use in prostate screening.
Methods:
This study was conducted as a systematic literature review of current scientific articles. Articles were selected using the “Preferred Reporting Items for Systematic Reviews and Meta-Analyses” (PRISMA). Summaries were created from the reviewed articles and were then categorised into relevant data for the results.
Results:
CAD has been shown to achieve higher sensitivity or specificity than a radiologist. A CAD system can reach a peak sensitivity of 100%, and two CAD systems showed a specificity of 100%. CAD systems are highly specialised and chiefly focus on the peripheral zone, which could mean missing cancer in the transition zone. CAD systems can segment the prostate with the same effectiveness as a radiologist.
Conclusion:
When CAD analysed clinically-significant tumours with a Gleason score greater than 6, CAD outperformed radiologists. However, their focus on the peripheral zone would require the use of more than one CAD system to analyse the entire prostate.
|
18
|
Wang HL, Hsu WY, Lee MH, Weng HH, Chang SW, Yang JT, Tsai YH. Automatic Machine-Learning-Based Outcome Prediction in Patients With Primary Intracerebral Hemorrhage. Front Neurol 2019; 10:910. [PMID: 31496988 PMCID: PMC6713018 DOI: 10.3389/fneur.2019.00910] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 06/01/2019] [Accepted: 08/06/2019] [Indexed: 12/27/2022] Open
Abstract
Background: A predictive model can provide physicians, relatives, and patients with accurate information regarding the severity of disease and its predicted outcome. In this study, we used an automated machine-learning-based approach to construct a prognostic model to predict the functional outcome in patients with primary intracerebral hemorrhage (ICH). Methods: We retrospectively collected data on demographic characteristics, laboratory studies and imaging findings of 333 patients with primary ICH. The functional outcomes at the 1st and 6th months after ICH were defined by the modified Rankin scale. All of the attributes were used for preprocessing and for automatic model selection with Automatic Waikato Environment for Knowledge Analysis. A confusion matrix and the areas under the receiver operating characteristic curves (AUC) were used to test the predictive performance. Results: Among the models tested, the random forest provided the best predictive performance for functional outcome. The overall accuracy for predicting the 1st month outcome was 83.1%, with 77.4% sensitivity and 86.9% specificity, and the AUC was 0.899. The overall accuracy for predicting the 6th month outcome was 83.9%, with 72.5% sensitivity and 90.6% specificity, and the AUC was 0.917. Conclusions: Using an automatic machine learning technique to predict functional outcome after ICH is feasible, and the random forest model provides the best predictive performance across all tested models. This prediction model may provide information regarding functional outcome for clinicians that will help provide appropriate medical care for patients and information for their caregivers.
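The accuracy, sensitivity, and specificity figures above are all derived from a confusion matrix. The sketch below shows the standard definitions for a binary outcome. The counts are made up for illustration and are not the study's confusion matrix, which the abstract does not report.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # all correct predictions
        "sensitivity": tp / (tp + fn),    # true positive rate
        "specificity": tn / (tn + fp),    # true negative rate
    }


# Made-up counts, not taken from the study.
metrics = binary_metrics(tp=40, fp=10, tn=90, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because the classes are usually imbalanced, accuracy alone can look good while sensitivity lags, which is why the abstract reports all three alongside the AUC.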
Affiliation(s)
- Hsueh-Lin Wang
- Department of Diagnostic Radiology, Chang Gung Memorial Hospital, Chiayi, Taiwan
| | - Wei-Yen Hsu
- Department of Information Management, National Chung Cheng University, Chiayi, Taiwan
| | - Ming-Hsueh Lee
- Department of Neurosurgery, Chang Gung Memorial Hospital, Chiayi, Taiwan; Chang Gung University College of Medicine, Taoyuan, Taiwan
| | - Hsu-Huei Weng
- Department of Diagnostic Radiology, Chang Gung Memorial Hospital, Chiayi, Taiwan
| | - Sheng-Wei Chang
- Department of Diagnostic Radiology, Chang Gung Memorial Hospital, Chiayi, Taiwan
| | - Jen-Tsung Yang
- Department of Neurosurgery, Chang Gung Memorial Hospital, Chiayi, Taiwan; Chang Gung University College of Medicine, Taoyuan, Taiwan
| | - Yuan-Hsiung Tsai
- Department of Diagnostic Radiology, Chang Gung Memorial Hospital, Chiayi, Taiwan
| |
|
19
|
Luo G, Stone BL, Koebnick C, He S, Au DH, Sheng X, Murtaugh MA, Sward KA, Schatz M, Zeiger RS, Davidson GH, Nkoy FL. Using Temporal Features to Provide Data-Driven Clinical Early Warnings for Chronic Obstructive Pulmonary Disease and Asthma Care Management: Protocol for a Secondary Analysis. JMIR Res Protoc 2019; 8:e13783. [PMID: 31199308 PMCID: PMC6592592 DOI: 10.2196/13783] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/23/2019] [Revised: 05/13/2019] [Accepted: 05/14/2019] [Indexed: 01/19/2023] Open
Abstract
Background Both chronic obstructive pulmonary disease (COPD) and asthma incur heavy health care burdens. To support tailored preventive care for these 2 diseases, predictive modeling is widely used to give warnings and to identify patients for care management. However, 3 gaps exist in current modeling methods owing to rarely factoring in temporal aspects showing trends and early health change: (1) existing models seldom use temporal features and often give late warnings, making care reactive. A health risk is often found at a relatively late stage of declining health, when the risk of a poor outcome is high and resolving the issue is difficult and costly. A typical model predicts patient outcomes in the next 12 months. This often does not warn early enough. If a patient will actually be hospitalized for COPD next week, intervening now could be too late to avoid the hospitalization. If temporal features were used, this patient could potentially be identified a few weeks earlier to institute preventive therapy; (2) existing models often miss many temporal features with high predictive power and have low accuracy. This makes care management enroll many patients not needing it and overlook over half of the patients needing it the most; (3) existing models often give no information on why a patient is at high risk nor about possible interventions to mitigate risk, causing busy care managers to spend more time reviewing charts and to miss suited interventions. Typical automatic explanation methods cannot handle longitudinal attributes and fully address these issues. Objective To fill these gaps so that more COPD and asthma patients will receive more appropriate and timely care, we will develop comprehensible data-driven methods to provide accurate early warnings of poor outcomes and to suggest tailored interventions, making care more proactive, efficient, and effective. 
Methods By conducting a secondary data analysis and surveys, the study will: (1) use temporal features to provide accurate early warnings of poor outcomes and assess the potential impact on prediction accuracy, risk warning timeliness, and outcomes; (2) automatically identify actionable temporal risk factors for each patient at high risk for future hospital use and assess the impact on prediction accuracy and outcomes; and (3) assess the impact of actionable information on clinicians’ acceptance of early warnings and on perceived care plan quality. Results We are obtaining clinical and administrative datasets from 3 leading health care systems’ enterprise data warehouses. We plan to start data analysis in 2020 and finish our study in 2025. Conclusions Techniques to be developed in this study can boost risk warning timeliness, model accuracy, and generalizability; improve patient finding for preventive care; help form tailored care plans; advance machine learning for many clinical applications; and be generalized for many other chronic diseases. International Registered Report Identifier (IRRID) PRR1-10.2196/13783
Affiliation(s)
- Gang Luo
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
| | - Bryan L Stone
- Department of Pediatrics, University of Utah, Salt Lake City, UT, United States
| | - Corinna Koebnick
- Department of Research & Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States
| | - Shan He
- Care Transformation, Intermountain Healthcare, Salt Lake City, UT, United States
| | - David H Au
- Center of Innovation for Veteran-Centered & Value-Driven Care, VA Puget Sound Health Care System, Seattle, WA, United States; Division of Pulmonary and Critical Care Medicine, Department of Medicine, University of Washington, Seattle, WA, United States
| | - Xiaoming Sheng
- College of Nursing, University of Utah, Salt Lake City, UT, United States
| | - Maureen A Murtaugh
- Department of Family and Preventive Medicine, University of Utah, Salt Lake City, UT, United States
| | - Katherine A Sward
- College of Nursing, University of Utah, Salt Lake City, UT, United States
| | - Michael Schatz
- Department of Research & Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States; Department of Allergy, Kaiser Permanente Southern California, San Diego, CA, United States
| | - Robert S Zeiger
- Department of Research & Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States; Department of Allergy, Kaiser Permanente Southern California, San Diego, CA, United States
| | - Giana H Davidson
- Department of Surgery, University of Washington, Seattle, WA, United States
| | - Flory L Nkoy
- Department of Pediatrics, University of Utah, Salt Lake City, UT, United States
| |
|
20
|
Luo G. A roadmap for semi-automatically extracting predictive and clinically meaningful temporal features from medical data for predictive modeling. GLOBAL TRANSITIONS 2019; 1:61-82. [PMID: 31032483 PMCID: PMC6482973 DOI: 10.1016/j.glt.2018.11.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 05/27/2023]
Abstract
Predictive modeling based on machine learning with medical data has great potential to improve healthcare and reduce costs. However, two hurdles, among others, impede its widespread adoption in healthcare. First, medical data are by nature longitudinal. Pre-processing them, particularly for feature engineering, is labor intensive and often takes 50-80% of the model building effort. Predictive temporal features are the basis of building accurate models, but are difficult to identify. This is problematic. Healthcare systems have limited resources for model building, while inaccurate models produce sub-optimal outcomes and are often useless. Second, most machine learning models provide no explanation of their prediction results. However, offering such explanations is essential for a model to be used in usual clinical practice. To address these two hurdles, this paper outlines: 1) a data-driven method for semi-automatically extracting predictive and clinically meaningful temporal features from medical data for predictive modeling; and 2) a method of using these features to automatically explain machine learning prediction results and suggest tailored interventions. This provides a roadmap for future research.
Affiliation(s)
- Gang Luo
- Department of Biomedical Informatics and Medical Education, University of Washington, UW Medicine South Lake Union, 850 Republican Street, Building C, Box 358047, Seattle, WA, 98109, USA
| |
|
21
|
Luo G. Progress Indication for Machine Learning Model Building: A Feasibility Demonstration. SIGKDD EXPLORATIONS : NEWSLETTER OF THE SPECIAL INTEREST GROUP (SIG) ON KNOWLEDGE DISCOVERY & DATA MINING 2018; 20:1-12. [PMID: 30854154 DOI: 10.1145/3299986.3299988] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/25/2022]
Abstract
Progress indicators are desirable for machine learning model building that often takes a long time, by continuously estimating the remaining model building time and the portion of model building work that has been finished. Recently, we proposed a high-level framework using system approaches to support non-trivial progress indicators for machine learning model building, but offered no detailed implementation technique. It remains to be seen whether it is feasible to provide such progress indicators. In this paper, we fill this gap and give the first demonstration that offering such progress indicators is viable. We describe detailed progress indicator implementation techniques for three major, supervised machine learning algorithms. We report an implementation of these techniques in Weka.
Affiliation(s)
- Gang Luo
- Department of Biomedical Informatics and Medical Education, University of Washington, UW Medicine South Lake Union, 850 Republican Street, Building C, Box 358047, Seattle, WA 98195, USA
| |
|
22
|
Luo G, Tarczy-Hornoch P, Wilcox AB, Lee ES. Identifying Patients Who Are Likely to Receive Most of Their Care From a Specific Health Care System: Demonstration via Secondary Analysis. JMIR Med Inform 2018; 6:e12241. [PMID: 30401670 PMCID: PMC6246965 DOI: 10.2196/12241] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 09/17/2018] [Revised: 10/13/2018] [Accepted: 10/16/2018] [Indexed: 01/22/2023] Open
Abstract
Background In the United States, health care is fragmented in numerous distinct health care systems including private, public, and federal organizations like private physician groups and academic medical centers. Many patients have their complete medical data scattered across these several health care systems, with no particular system having complete data on any of them. Several major data analysis tasks such as predictive modeling using historical data are considered impractical on incomplete data. Objective Our objective was to find a way to enable these analysis tasks for a health care system with incomplete data on many of its patients. Methods This study presents, to the best of our knowledge, the first method to use a geographic constraint to identify a reasonably large subset of patients who tend to receive most of their care from a given health care system. A data analysis task needing relatively complete data can be conducted on this subset of patients. We demonstrated our method using data from the University of Washington Medicine (UWM) and PreManage data covering the use of all hospitals in Washington State. We compared 10 candidate constraints to optimize the solution. Results For UWM, the best constraint is that the patient has a UWM primary care physician and lives within 5 miles of at least one UWM hospital. About 16.01% (55,707/348,054) of UWM patients satisfied this constraint. Around 69.38% (10,501/15,135) of their inpatient stays and emergency department visits occurred within UWM in the following 6 months, more than double the corresponding percentage for all UWM patients. Conclusions Our method can identify a reasonably large subset of patients who tend to receive most of their care from UWM. This enables several major analysis tasks on incomplete medical data that were previously deemed infeasible.
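The best constraint reported above combines two conditions: the patient has a UWM primary care physician and lives within 5 miles of at least one UWM hospital. A minimal sketch of such a check follows. It assumes straight-line (great-circle) distance computed with the haversine formula, which the abstract does not specify, and all coordinates and names are made up.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def meets_constraint(has_system_pcp, home, hospitals, max_miles=5.0):
    """True if the patient has a system PCP and lives near any system hospital."""
    return has_system_pcp and any(
        haversine_miles(home[0], home[1], h[0], h[1]) <= max_miles for h in hospitals
    )


# Made-up coordinates: one hospital under a mile away, one about 69 miles away.
home = (47.65, -122.31)
hospitals = [(47.66, -122.30), (48.65, -122.31)]
print(meets_constraint(True, home, hospitals))
```

Patients who pass the check form the subset on which analyses needing relatively complete data can be run; the 5-mile threshold is exposed as a parameter because the study compared 10 candidate constraints.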
Affiliation(s)
- Gang Luo
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
| | - Peter Tarczy-Hornoch
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States; Division of Neonatology, Department of Pediatrics, University of Washington, Seattle, WA, United States; Department of Computer Science and Engineering, University of Washington, Seattle, WA, United States
| | - Adam B Wilcox
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
| | - E Sally Lee
- Population Health Analytics, University of Washington Medicine Finance, University of Washington, Seattle, WA, United States
| |
|
23
|
Alaa AM, van der Schaar M. Prognostication and Risk Factors for Cystic Fibrosis via Automated Machine Learning. Sci Rep 2018; 8:11242. [PMID: 30050169 PMCID: PMC6062529 DOI: 10.1038/s41598-018-29523-2] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 01/24/2018] [Accepted: 07/03/2018] [Indexed: 01/14/2023] Open
Abstract
Accurate prediction of survival for cystic fibrosis (CF) patients is instrumental in establishing the optimal timing for referring patients with terminal respiratory failure for lung transplantation (LT). Current practice considers referring patients for LT evaluation once the forced expiratory volume (FEV1) drops below 30% of its predicted nominal value. While FEV1 is indeed a strong predictor of CF-related mortality, we hypothesized that the survival behavior of CF patients exhibits a lot more heterogeneity. To this end, we developed an algorithmic framework, which we call AutoPrognosis, that leverages the power of machine learning to automate the process of constructing clinical prognostic models, and used it to build a prognostic model for CF using data from a contemporary cohort that involved 99% of the CF population in the UK. AutoPrognosis uses Bayesian optimization techniques to automate the process of configuring ensembles of machine learning pipelines, which involve imputation, feature processing, classification and calibration algorithms. Because it is automated, it can be used by clinical researchers to build prognostic models without the need for in-depth knowledge of machine learning. Our experiments revealed that the accuracy of the model learned by AutoPrognosis is superior to that of existing guidelines and other competing models.
Affiliation(s)
- Ahmed M Alaa
- Department of Electrical Engineering, University of California, Los Angeles, CA, 90095, USA.
| | - Mihaela van der Schaar
- Department of Electrical Engineering, University of California, Los Angeles, CA, 90095, USA.
- Alan Turing Institute, London, NW1 2DB, UK.
- Engineering Science Department, University of Oxford, Oxford, OX1 3PJ, UK.
| |
|
24
|
D'Argenio V. The High-Throughput Analyses Era: Are We Ready for the Data Struggle? High Throughput 2018; 7:E8. [PMID: 29498666 PMCID: PMC5876534 DOI: 10.3390/ht7010008] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 12/28/2017] [Revised: 02/16/2018] [Accepted: 02/27/2018] [Indexed: 12/23/2022] Open
Abstract
Recent and rapid technological advances in molecular sciences have dramatically increased the ability to carry out high-throughput studies characterized by big data production. This, in turn, has had the negative effect of exposing a gap between data yield and data analysis. Indeed, big data management is becoming an increasingly important aspect of many fields of molecular research, including the study of human diseases. Now, the challenge is to identify, within the huge amount of data obtained, that which is of clinical relevance. In this context, issues related to data interpretation, sharing and storage need to be assessed and standardized. Once this is achieved, the integration of data from different -omic approaches will improve the diagnosis, monitoring and therapy of diseases by allowing the identification of novel, potentially actionable biomarkers in view of personalized medicine.
Affiliation(s)
- Valeria D'Argenio
- CEINGE-Biotecnologie Avanzate, via G. Salvatore 486, 80145 Naples, Italy.
- Department of Molecular Medicine and Medical Biotechnologies, University of Naples Federico II, via Pansini 5, 80131 Naples, Italy.
| |
|
25
|
Zeng X, Luo G. Progressive sampling-based Bayesian optimization for efficient and automatic machine learning model selection. Health Inf Sci Syst 2017; 5:2. [PMID: 29038732 PMCID: PMC5617811 DOI: 10.1007/s13755-017-0023-z] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 04/02/2017] [Accepted: 09/20/2017] [Indexed: 12/11/2022] Open
Abstract
PURPOSE Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. METHODS To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. RESULTS We report an implementation of the method. We show that compared to a state of the art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization. CONCLUSIONS This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.
Affiliation(s)
- Xueqiang Zeng
- Computer Center, Nanchang University, 999 Xuefu Road, Nanchang, 330031 Jiangxi, People’s Republic of China
| | - Gang Luo
- Department of Biomedical Informatics and Medical Education, University of Washington, UW Medicine South Lake Union, 850 Republican Street, Building C, Box 358047, Seattle, WA 98109 USA
| |
|
26
|
Luo G. Toward a Progress Indicator for Machine Learning Model Building and Data Mining Algorithm Execution: A Position Paper. ACTA ACUST UNITED AC 2017; 19:13-24. [PMID: 29177022 DOI: 10.1145/3166054.3166057] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 10/18/2022]
Abstract
For user-friendliness, many software systems offer progress indicators for long-duration tasks. A typical progress indicator continuously estimates the remaining task execution time as well as the portion of the task that has been finished. Building a machine learning model often takes a long time, but no existing machine learning software supplies a non-trivial progress indicator. Similarly, running a data mining algorithm often takes a long time, but no existing data mining software provides a nontrivial progress indicator. In this article, we consider the problem of offering progress indicators for machine learning model building and data mining algorithm execution. We discuss the goals and challenges intrinsic to this problem. Then we describe an initial framework for implementing such progress indicators and two advanced, potential uses of them, with the goal of inspiring future research on this topic.
Affiliation(s)
- Gang Luo
- Department of Biomedical Informatics and Medical Education, University of Washington, UW Medicine South Lake Union, 850 Republican Street, Building C, Box 358047, Seattle, WA 98195, USA
| |
|