1
Chen W, Howard K, Gorham G, Abeyaratne A, Zhao Y, Adegboye O, Kangaharan N, Talukder MRR, Taylor S, Cass A. Costs and healthcare use of patients with chronic kidney disease in the Northern Territory, Australia. BMC Health Serv Res 2024; 24:791. [PMID: 38982437 PMCID: PMC11234693 DOI: 10.1186/s12913-024-11258-8] [Received: 11/13/2023] [Accepted: 06/27/2024] [Indexed: 07/11/2024]
Abstract
BACKGROUND The burden of chronic kidney disease (CKD) is high in the Northern Territory (NT), Australia. This study aims to describe the healthcare use and associated costs of people at risk of CKD (e.g. acute kidney injury, diabetes, hypertension, and cardiovascular disease) or living with CKD in the NT, from a healthcare funder perspective. METHODS We included a retrospective cohort of patients at risk of, or living with, CKD on 1 January 2017. Patients on kidney replacement therapy were excluded from the study. Data from the Territory Kidney Care database, encompassing patients from public hospitals and primary health care services across the NT, were used to conduct costing. Annual healthcare costs, including hospital, primary health care, medication, and investigation costs, were described over a one-year follow-up period. Factors associated with high total annual healthcare costs were identified with a cost prediction model. RESULTS Among 37,398 patients included in this study, 23,419 had a risk factor for CKD while 13,979 had CKD (stages 1 to 5, not on kidney replacement therapy). The overall mean (± SD) age was 45 years (± 17), and a large proportion of the study cohort were First Nations people (68%). Common comorbidities in the overall cohort included diabetes (36%), hypertension (32%), and coronary artery disease (11%). Annual healthcare cost was lowest in those at risk of CKD (AUD$7,958 per person) and highest in those with CKD stage 5 (AUD$67,117 per person). Inpatient care contributed the majority (76%) of all healthcare costs. Predictors of increased total annual healthcare cost included more advanced stages of CKD and the presence of comorbidities. In CKD stage 5, the additional cost per person per year was +$53,634 (95% CI 32,769 to 89,482, p < 0.001) compared to people in the at-risk group without CKD. CONCLUSION The total healthcare costs in advanced stages of CKD are high, even when patients are not on dialysis. There remains a need for effective primary prevention and early intervention strategies targeting CKD and related chronic conditions.
Affiliation(s)
- Winnie Chen
  - Menzies School of Health Research, Charles Darwin University, PO Box 41096, Casuarina, Darwin, NT, Australia
  - Menzies Centre for Health Policy and Economics, Faculty of Medicine and Health, University of Sydney, Sydney, Australia
- Kirsten Howard
  - Menzies Centre for Health Policy and Economics, Faculty of Medicine and Health, University of Sydney, Sydney, Australia
- Gillian Gorham
  - Menzies School of Health Research, Charles Darwin University, PO Box 41096, Casuarina, Darwin, NT, Australia
- Asanga Abeyaratne
  - Menzies School of Health Research, Charles Darwin University, PO Box 41096, Casuarina, Darwin, NT, Australia
  - NT Health, Darwin, Australia
- Oyelola Adegboye
  - Menzies School of Health Research, Charles Darwin University, PO Box 41096, Casuarina, Darwin, NT, Australia
- Sean Taylor
  - Menzies School of Health Research, Charles Darwin University, PO Box 41096, Casuarina, Darwin, NT, Australia
  - NT Health, Darwin, Australia
- Alan Cass
  - Menzies School of Health Research, Charles Darwin University, PO Box 41096, Casuarina, Darwin, NT, Australia
2
Cauchi M, Mills AR, Lawrie A, Kiely DG, Kadirkamanathan V. Individualized survival predictions using state space model with longitudinal and survival data. J R Soc Interface 2024; 21:20230682. [PMID: 39081111 PMCID: PMC11289657 DOI: 10.1098/rsif.2023.0682] [Received: 11/20/2023] [Accepted: 05/21/2024] [Indexed: 08/02/2024]
Abstract
Monitoring disease progression often involves tracking biomarker measurements over time. Joint models (JMs) for longitudinal and survival data provide a framework to explore the relationship between time-varying biomarkers and patients' event outcomes, offering the potential for personalized survival predictions. In this article, we introduce the linear state space dynamic survival model for handling longitudinal and survival data. This model enhances the traditional linear Gaussian state space model by including survival data. It differs from the conventional JMs by offering an alternative interpretation via differential or difference equations, eliminating the need for creating a design matrix. To showcase the model's effectiveness, we conduct a simulation case study, emphasizing its performance under conditions of limited observed measurements. We also apply the proposed model to a dataset of pulmonary arterial hypertension patients, demonstrating its potential for enhanced survival predictions when compared with conventional risk scores.
Affiliation(s)
- Mark Cauchi
  - Department of Automatic Control and Systems Engineering, The University of Sheffield, Mappin Street, Sheffield S1 3JD, UK
- Andrew R. Mills
  - Department of Automatic Control and Systems Engineering, The University of Sheffield, Mappin Street, Sheffield S1 3JD, UK
- Allan Lawrie
  - National Heart and Lung Institute, Imperial College London, Dovehouse Street, London SW3 6LY, UK
- David G. Kiely
  - Sheffield Pulmonary Vascular Disease Unit, Royal Hallamshire Hospital Sheffield, NIHR Biomedical Research Centre Sheffield and Department of Clinical Medicine, The University of Sheffield, Beech Hill Road, Sheffield S10 2RX, UK
- Visakan Kadirkamanathan
  - Department of Automatic Control and Systems Engineering, The University of Sheffield, Mappin Street, Sheffield S1 3JD, UK
3
Miao G, Yu L, Yang J, Bennett DA, Zhao J, Wu SS. Learning from vertically distributed data across multiple sites: An efficient privacy-preserving algorithm for Cox proportional hazards model with variable selection. J Biomed Inform 2024; 149:104581. [PMID: 38142903 PMCID: PMC10996392 DOI: 10.1016/j.jbi.2023.104581] [Received: 05/21/2023] [Revised: 08/24/2023] [Accepted: 12/19/2023] [Indexed: 12/26/2023]
Abstract
OBJECTIVE To develop a lossless distributed algorithm for regularized Cox proportional hazards model with variable selection to support federated learning for vertically distributed data. METHODS We propose a novel distributed algorithm for fitting regularized Cox proportional hazards model when data sharing among different data providers is restricted. Based on cyclical coordinate descent, the proposed algorithm computes intermediary statistics by each site and then exchanges them to update the model parameters in other sites without accessing individual patient-level data. We evaluate the performance of the proposed algorithm with (1) a simulation study and (2) a real-world data analysis predicting the risk of Alzheimer's dementia from the Religious Orders Study and Rush Memory and Aging Project (ROSMAP). Moreover, we compared the performance of our method with existing privacy-preserving models. RESULTS Our algorithm achieves privacy-preserving variable selection for time-to-event data in the vertically distributed setting, without degradation of accuracy compared with a centralized approach. Simulation demonstrates that our algorithm is highly efficient in analyzing high-dimensional datasets. Real-world data analysis reveals that our distributed Cox model yields higher accuracy in predicting the risk of Alzheimer's dementia than the conventional Cox model built by each data provider without data sharing. Moreover, our algorithm is computationally more efficient compared with existing privacy-preserving Cox models with or without regularization term. CONCLUSION The proposed algorithm is lossless, privacy-preserving and highly efficient to fit regularized Cox model for vertically distributed data. It provides a suitable and convenient approach for modeling time-to-event data in a distributed manner.
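To make the vertically distributed setting in the abstract above concrete: each site holds different covariates for the same patients, so the Cox linear predictor can be assembled from per-site sums. The following is a minimal sketch on synthetic data, not the authors' algorithm; the site names, coefficient values, and the `local_contribution` helper are all hypothetical, and the sketch does not address whether exchanging these vectors is sufficiently private.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients = 6

# Hypothetical split: site A holds 2 covariates and site B holds 3,
# with rows aligned to the same patients in the same order.
X_site_a = rng.normal(size=(n_patients, 2))
X_site_b = rng.normal(size=(n_patients, 3))

# Illustrative coefficient blocks, one per site (assumed values).
beta_a = np.array([0.5, -1.0])
beta_b = np.array([0.2, 0.0, 0.7])


def local_contribution(X_local, beta_local):
    """Per-site piece of the linear predictor; only this length-n vector leaves the site."""
    return X_local @ beta_local


# Summing the per-site vectors gives the full linear predictor.
eta_distributed = local_contribution(X_site_a, beta_a) + local_contribution(X_site_b, beta_b)

# Reference computation on the pooled matrix, for comparison only.
eta_central = np.hstack([X_site_a, X_site_b]) @ np.concatenate([beta_a, beta_b])
```

The two vectors agree up to floating-point rounding, which is the sense in which a column-partitioned computation can match a centralized one.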
Affiliation(s)
- Guanhong Miao
  - Department of Epidemiology, College of Public Health & Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA; Center for Genetic Epidemiology and Bioinformatics, University of Florida, Gainesville, FL, USA; Department of Biostatistics, College of Public Health & Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA
- Lei Yu
  - Rush Alzheimer's Disease Center & Department of Neurological Sciences, Rush University Medical Center, Chicago, IL, USA
- Jingyun Yang
  - Rush Alzheimer's Disease Center & Department of Neurological Sciences, Rush University Medical Center, Chicago, IL, USA
- David A Bennett
  - Rush Alzheimer's Disease Center & Department of Neurological Sciences, Rush University Medical Center, Chicago, IL, USA
- Jinying Zhao
  - Department of Epidemiology, College of Public Health & Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA; Center for Genetic Epidemiology and Bioinformatics, University of Florida, Gainesville, FL, USA
- Samuel S Wu
  - Department of Biostatistics, College of Public Health & Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA
4
Arivazhagan N, Van Vleck TT. Natural Language Processing Basics. Clin J Am Soc Nephrol 2023; 18:400-401. [PMID: 36763809 PMCID: PMC10103357 DOI: 10.2215/cjn.0000000000000081] [Indexed: 02/05/2023]
Affiliation(s)
- Tielman T. Van Vleck
  - Icahn School of Medicine at Mount Sinai, Institute of Personalized Medicine, New York, New York
5
Ötleş E, Seymour J, Wang H, Denton BT. Dynamic prediction of work status for workers with occupational injuries: assessing the value of longitudinal observations. J Am Med Inform Assoc 2022; 29:1931-1940. [PMID: 36036358 PMCID: PMC9552285 DOI: 10.1093/jamia/ocac130] [Received: 08/27/2021] [Revised: 06/22/2022] [Accepted: 07/20/2022] [Indexed: 11/13/2022]
Abstract
OBJECTIVE Occupational injuries (OIs) cause an immense burden on the US population. Prediction models help focus resources on those at greatest risk of a delayed return to work (RTW). RTW depends on factors that develop over time; however, existing methods only utilize information collected at the time of injury. We investigate the performance benefits of dynamically estimating RTW, using longitudinal observations of diagnoses and treatments collected beyond the time of initial injury. MATERIALS AND METHODS We characterize the difference in predictive performance between an approach that uses information collected at the time of initial injury (baseline model) and a proposed approach that uses longitudinal information collected over the course of the patient's recovery period (proposed model). To control the comparison, both models use the same deep learning architecture and differ only in the information used. We utilize a large longitudinal observation dataset of OI claims and compare the performance of the two approaches in terms of daily prediction of future work state (working vs not working). The performance of these two approaches was assessed in terms of the area under the receiver operating characteristic curve (AUROC) and expected calibration error (ECE). RESULTS After subsampling and applying inclusion criteria, our final dataset covered 294 103 OIs, which were split evenly between train, development, and test datasets (1/3, 1/3, 1/3). In terms of discriminative performance on the test dataset, the proposed model had an AUROC of 0.728 (90% confidence interval: 0.723, 0.734) versus the baseline's 0.591 (0.585, 0.598). The proposed model had an ECE of 0.004 (0.003, 0.005) versus the baseline's 0.016 (0.009, 0.018). CONCLUSION The longitudinal approach outperforms current practice and shows potential for leveraging observational data to dynamically update predictions of RTW in the setting of OI. This approach may enable physicians and workers' compensation programs to manage large populations of injured workers more effectively.
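For readers unfamiliar with the calibration metric reported above, here is a minimal sketch of one common binned definition of expected calibration error for a binary outcome. The abstract does not state the binning scheme the authors used, so the equal-width bins and the default of 10 bins are assumptions, and the function name is illustrative.

```python
import numpy as np


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binned ECE: weighted mean gap between observed rate and mean predicted probability."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Equal-width bins on [0, 1]; clip so that a probability of exactly 1.0 lands in the last bin.
    bin_ids = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if not in_bin.any():
            continue  # empty bins contribute nothing
        gap = abs(y_true[in_bin].mean() - y_prob[in_bin].mean())
        ece += in_bin.mean() * gap  # weight by the share of predictions in this bin
    return ece


# Toy usage: labels are 1 for "working" and 0 for "not working" (synthetic values).
toy_labels = [0, 1, 1, 1]
toy_probs = [0.2, 0.2, 0.8, 0.8]
toy_ece = expected_calibration_error(toy_labels, toy_probs)
```

A lower value means predicted probabilities track observed frequencies more closely; the metric says nothing about discrimination, which is why the abstract reports AUROC alongside it.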
Affiliation(s)
- Erkin Ötleş
  - Department of Industrial & Operations Engineering, University of Michigan, Ann Arbor, Michigan, USA
  - Medical Scientist Training Program, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Haozhu Wang
  - Department of Electrical Engineering & Computer Science, University of Michigan, Ann Arbor, Michigan, USA
- Brian T Denton
  - Department of Industrial & Operations Engineering, University of Michigan, Ann Arbor, Michigan, USA
6
Wang X, Zhang HG, Xiong X, Hong C, Weber GM, Brat GA, Bonzel CL, Luo Y, Duan R, Palmer NP, Hutch MR, Gutiérrez-Sacristán A, Bellazzi R, Chiovato L, Cho K, Dagliati A, Estiri H, García-Barrio N, Griffier R, Hanauer DA, Ho YL, Holmes JH, Keller MS, Klann MEng JG, L'Yi S, Lozano-Zahonero S, Maidlow SE, Makoudjou A, Malovini A, Moal B, Moore JH, Morris M, Mowery DL, Murphy SN, Neuraz A, Yuan Ngiam K, Omenn GS, Patel LP, Pedrera-Jiménez M, Prunotto A, Jebathilagam Samayamuthu M, Sanz Vidorreta FJ, Schriver ER, Schubert P, Serrano-Balazote P, South AM, Tan ALM, Tan BWL, Tibollo V, Tippmann P, Visweswaran S, Xia Z, Yuan W, Zöller D, Kohane IS, Avillach P, Guo Z, Cai T. SurvMaximin: Robust federated approach to transporting survival risk prediction models. J Biomed Inform 2022; 134:104176. [PMID: 36007785 PMCID: PMC9707637 DOI: 10.1016/j.jbi.2022.104176] [Received: 02/01/2022] [Revised: 07/18/2022] [Accepted: 08/15/2022] [Indexed: 10/15/2022]
Abstract
OBJECTIVE For multi-center heterogeneous Real-World Data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information. MATERIALS AND METHODS For each of the centers from which we want to borrow information to improve the prediction performance for the target population, a penalized Cox model is fitted to estimate feature coefficients for the center. Using estimated feature coefficients and the covariance matrix of the target population, we then obtain a SurvMaximin estimated set of feature coefficients for the target population. The target population can be an entire cohort comprised of all centers, corresponding to federated learning, or a single center, corresponding to transfer learning. RESULTS Simulation studies and a real-world international electronic health records application study, with 15 participating health care centers across three countries (France, Germany, and the U.S.), show that the proposed SurvMaximin algorithm achieves comparable or higher accuracy compared with the estimator using only the information of the target site and other existing methods. The SurvMaximin estimator is robust to variations in sample sizes and estimated feature coefficients between centers, which amounts to significantly improved estimates for target sites with fewer observations. CONCLUSIONS The SurvMaximin method is well suited for both federated and transfer learning in the high-dimensional survival analysis setting. SurvMaximin only requires a one-time summary information exchange from participating centers. Estimated regression vectors can be very heterogeneous. SurvMaximin provides robust Cox feature coefficient estimates without outcome information in the target population and is privacy-preserving.
Affiliation(s)
- Xuan Wang
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Harrison G Zhang
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Xin Xiong
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Chuan Hong
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Griffin M Weber
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Gabriel A Brat
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Clara-Lea Bonzel
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Yuan Luo
  - Department of Preventive Medicine Northwestern University, Chicago, IL, USA
- Rui Duan
  - Department of Biostatistics, Harvard University, Boston, MA, USA
- Nathan P Palmer
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Meghan R Hutch
  - Department of Preventive Medicine Northwestern University, Chicago, IL, USA
- Riccardo Bellazzi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Luca Chiovato
  - Unit of Internal Medicine and Endocrinology, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Italy
- Kelly Cho
  - Population Health and Data Science, VA Boston Healthcare System, Boston, MA, USA; Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
- Arianna Dagliati
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Hossein Estiri
  - Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Romain Griffier
  - IAM unit, Bordeaux University Hospital, Bordeaux, France; INSERM Bordeaux Population Health ERIAS TEAM, ERIAS - Inserm U1219 BPH, Bordeaux, France
- David A Hanauer
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, USA
- Yuk-Lam Ho
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
- John H Holmes
  - Department of Biostatistics, Epidemiology, and Informatics University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Mark S Keller
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Sehi L'Yi
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Sara Lozano-Zahonero
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Sarah E Maidlow
  - Michigan Institute for Clinical and Health Research (MICHR) Informatics, University of Michigan, Ann Arbor, MI, USA
- Adeline Makoudjou
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Alberto Malovini
  - Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Italy
- Bertrand Moal
  - IAM unit, Bordeaux University Hospital, Bordeaux, France
- Jason H Moore
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Michele Morris
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Danielle L Mowery
  - Department of Biostatistics, Epidemiology, and Informatics University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Shawn N Murphy
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
- Antoine Neuraz
  - Department of biomedical informatics, Hôpital Necker-Enfants Malade, Assistance Publique Hôpitaux de Paris (APHP), University of Paris, Paris, France
- Kee Yuan Ngiam
  - Department of Biomedical informatics, WiSDM, National University Health Systems, Singapore
- Gilbert S Omenn
  - Depts of Computational Medicine & Bioinformatics, Internal Medicine, Human Genetics, Public Health University of Michigan, Ann Arbor, MI, USA
- Lav P Patel
  - Department of Internal Medicine, Division of Medical Informatics, University Of Kansas Medical Center
- Andrea Prunotto
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Emily R Schriver
  - Data Analytics Center, University of Pennsylvania Health System, Philadelphia, PA, USA
- Petra Schubert
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
- Andrew M South
  - Department of Pediatrics-Section of Nephrology, Brenner Children's, Wake Forest School of Medicine, Winston Salem, NC, USA
- Amelia L M Tan
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Byorn W L Tan
  - Department of Medicine, National University Hospital, Singapore
- Valentina Tibollo
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Patric Tippmann
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Shyam Visweswaran
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Zongqi Xia
  - Department of Neurology, University of Pittsburgh, Pittsburgh, PA, USA
- William Yuan
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Daniela Zöller
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Isaac S Kohane
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Paul Avillach
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Zijian Guo
  - Department of Statistics, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Tianxi Cai
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
7
Lim D, Randall S, Robinson S, Thomas E, Williamson J, Chakera A, Napier K, Schwan C, Manuel J, Betts K, Kane C, Boyd J. Unlocking Potential within Health Systems Using Privacy-Preserving Record Linkage: Exploring Chronic Kidney Disease Outcomes through Linked Data Modelling. Appl Clin Inform 2022; 13:901-909. [PMID: 36170880 PMCID: PMC9519263 DOI: 10.1055/s-0042-1757174] [Received: 04/29/2022] [Accepted: 07/28/2022] [Indexed: 11/02/2022]
Abstract
BACKGROUND Chronic kidney disease (CKD) is a major global health problem that affects approximately one in 10 adults. Up to 90% of individuals with CKD go undetected until its progression to advanced stages, invariably leading to death in the absence of treatment. The project aims to fill information gaps around the burden of CKD in the Western Australian (WA) population, including incidence, prevalence, rate of progression, and economic cost to the health system. METHODS Given the sensitivity of the information involved, the project employed a privacy preserving record linkage methodology to link data from four major pathology providers in WA to hospital records, to establish a CKD registry with continuous medical record for individuals with biochemical specification for CKD. This method uses encrypted personal identifying information in a probability-based linkage framework (Bloom filters) to help mitigate risk while maximizing linkage quality. RESULTS The project developed interoperable technology to create a transparent CKD data catalogue which is linkable to other datasets. This technology has been designed to support the aspirations of the research program to provide linked de-identified pathology, morbidity, and mortality data that can be used to derive insights to enable better CKD patient outcomes. The cohort includes over 1 million individuals with creatinine results over the period 2002 to 2021. CONCLUSION Using linked data from across the care continuum, researchers are able to evaluate the effectiveness of service delivery and provide evidence for policy and program development. The CKD registry will enable an innovative review of the epidemiology of CKD in WA. Linking pathology records can identify cases of CKD that are missed in the early stages due to disaggregation of results, enabling identification of at-risk populations that represent targets for early intervention and management.
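The Bloom filter linkage mentioned in the methods above can be illustrated with a short sketch: each identifier is broken into character bigrams, the bigrams are hashed into bit positions with a shared secret key, and two encodings are compared with the Dice coefficient. This is a generic illustration, not the project's implementation; the filter length, number of hash functions, key, and function names are all assumptions, and production linkage systems add further hardening that is omitted here.

```python
import hashlib
import hmac


def bigrams(value):
    """Character bigrams of a normalized, padded string."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, secret_key, n_bits=1024, n_hashes=4):
    """Return the set of bit positions switched on for this value (hypothetical parameters)."""
    positions = set()
    for gram in bigrams(value):
        for seed in range(n_hashes):
            # Keyed hash so that parties without the key cannot rebuild the encoding.
            digest = hmac.new(secret_key, f"{seed}:{gram}".encode(), hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % n_bits)
    return positions


def dice_similarity(bits_a, bits_b):
    """Dice coefficient between two sets of bit positions."""
    if not bits_a and not bits_b:
        return 1.0
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))


# Toy usage with made-up names and a made-up key.
key = b"example-shared-secret"
enc_katherine = bloom_encode("Katherine", key)
enc_catherine = bloom_encode("Catherine", key)
enc_zhang = bloom_encode("Zhang", key)
close_score = dice_similarity(enc_katherine, enc_catherine)
far_score = dice_similarity(enc_katherine, enc_zhang)
```

Because similar spellings share most bigrams, their encodings overlap heavily, which lets a probabilistic linkage framework tolerate typographical variation without seeing the underlying names.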
Affiliation(s)
- David Lim
  - Curtin School of Population Health, Curtin University, Perth, Western Australia, Australia
- Sean Randall
  - Curtin School of Population Health, Curtin University, Perth, Western Australia, Australia
- Suzanne Robinson
  - Curtin School of Population Health, Curtin University, Perth, Western Australia, Australia
  - Deakin Health Economics, Deakin University, Burwood, Victoria, Australia
- Elizabeth Thomas
  - Curtin School of Population Health, Curtin University, Perth, Western Australia, Australia
  - Medical School, The University of Western Australia, Perth, Western Australia, Australia
- Aron Chakera
  - Medical School, The University of Western Australia, Perth, Western Australia, Australia
  - Renal Unit, Sir Charles Gairdner Hospital, Perth, Western Australia, Australia
- Kathryn Napier
  - Curtin Institute for Computation, Curtin University, Perth, Western Australia, Australia
- Carola Schwan
  - WA Country Health Service, Perth, Western Australia, Australia
- Justin Manuel
  - WA Country Health Service, Perth, Western Australia, Australia
- Kim Betts
  - Curtin School of Population Health, Curtin University, Perth, Western Australia, Australia
- Chris Kane
  - WA Primary Health Alliance, Perth, Western Australia, Australia
- James Boyd
  - Curtin School of Population Health, Curtin University, Perth, Western Australia, Australia
  - La Trobe University, Melbourne, Bundoora, Victoria, Australia
8
Lim DKE, Boyd JH, Thomas E, Chakera A, Tippaya S, Irish A, Manuel J, Betts K, Robinson S. Prediction models used in the progression of chronic kidney disease: A scoping review. PLoS One 2022; 17:e0271619. [PMID: 35881639 PMCID: PMC9321365 DOI: 10.1371/journal.pone.0271619] [Received: 04/18/2022] [Accepted: 07/04/2022] [Indexed: 11/19/2022]
Abstract
Objective
To provide a review of prediction models that have been used to measure clinical or pathological progression of chronic kidney disease (CKD).
Design
Scoping review.
Data sources
Medline, EMBASE, CINAHL and Scopus from the year 2011 to 17th February 2022.
Study selection
All English written studies that are published in peer-reviewed journals in any country, that developed at least a statistical or computational model that predicted the risk of CKD progression.
Data extraction
Eligible studies for full text review were assessed on the methods that were used to predict the progression of CKD. The type of information extracted included: the author(s), title of article, year of publication, study dates, study location, number of participants, study design, predicted outcomes, type of prediction model, prediction variables used, validation assessment, limitations and implications.
Results
From 516 studies, 33 were included for full-text review. The included articles were compared qualitatively using the extracted information. The study populations across the studies were heterogeneous, and the data were sourced from different levels and locations of healthcare systems. 31 studies implemented supervised models, and 2 studies included unsupervised models. Regardless of the model used, the predicted outcome included measurement of risk of progression towards end-stage kidney disease (ESKD) or related definitions, over given time intervals. However, the studies lacked consistency in reporting how their prediction models were developed.
Conclusions
Researchers are working towards producing an effective model to provide key insights into the progression of CKD. This review found that Cox regression modelling was predominantly used among the small number of studies in the review. This made it difficult to compare machine learning (ML) algorithms, more so when different validation methods were used in different cohort types. There needs to be increased investment in a more consistent and reproducible approach for future studies looking to develop risk prediction models for CKD progression.
Affiliation(s)
- David K. E. Lim
  - Curtin School of Population Health, Curtin University, Perth, WA, Australia
- James H. Boyd
  - Curtin School of Population Health, Curtin University, Perth, WA, Australia
  - La Trobe University, Melbourne, Bundoora, VIC, Australia
- Elizabeth Thomas
  - Curtin School of Population Health, Curtin University, Perth, WA, Australia
  - Medical School, The University of Western Australia, Perth, WA, Australia
- Aron Chakera
  - Medical School, The University of Western Australia, Perth, WA, Australia
  - Renal Unit, Sir Charles Gairdner Hospital, Perth, WA, Australia
- Sawitchaya Tippaya
  - Curtin Institute for Computation, Curtin University, Perth, WA, Australia
- Kim Betts
  - Curtin School of Population Health, Curtin University, Perth, WA, Australia
- Suzanne Robinson
  - Curtin School of Population Health, Curtin University, Perth, WA, Australia
  - Deakin Health Economics, Deakin University, Burwood, VIC, Australia
9
Dai W, Jiang X, Bonomi L, Li Y, Xiong H, Ohno-Machado L. VERTICOX: Vertically Distributed Cox Proportional Hazards Model Using the Alternating Direction Method of Multipliers. IEEE Transactions on Knowledge and Data Engineering 2022; 34:996-1010. [PMID: 36158636 PMCID: PMC9491599 DOI: 10.1109/tkde.2020.2989301] [Indexed: 06/16/2023]
Abstract
The Cox proportional hazards model is a popular semi-parametric model for survival analysis. In this paper, we aim at developing a federated algorithm for the Cox proportional hazards model over vertically partitioned data (i.e., data from the same patient are stored at different institutions). We propose a novel algorithm, namely VERTICOX, to obtain the global model parameters in a distributed fashion based on the Alternating Direction Method of Multipliers (ADMM) framework. The proposed model computes intermediary statistics and exchanges them to calculate the global model without collecting individual patient-level data. We demonstrate that our algorithm achieves equivalent accuracy for the estimation of model parameters and statistics to that of its centralized realization. The proposed algorithm converges linearly under the ADMM framework. Its computational complexity and communication costs are polynomially and linearly associated with the number of subjects, respectively. Experimental results show that VERTICOX can achieve accurate model parameter estimation to support federated survival analysis over vertically distributed data by saving bandwidth and avoiding exchange of information about individual patients. The source code for VERTICOX is available at: https://github.com/daiwenrui/VERTICOX.
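As background for the abstract above, the objective that a centralized Cox fit minimizes is the negative log partial likelihood. The sketch below computes it in its Breslow form on pooled data; it does not implement ADMM or any part of VERTICOX, and the function name and toy inputs are illustrative assumptions.

```python
import numpy as np


def cox_neg_log_partial_likelihood(beta, X, time, event):
    """Negative log partial likelihood (Breslow form) for a Cox model on pooled data."""
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    eta = X @ np.asarray(beta, dtype=float)  # linear predictor per subject
    total = 0.0
    for i in np.flatnonzero(event):
        # Risk set: everyone still under observation at subject i's event time.
        at_risk = time >= time[i]
        total += eta[i] - np.log(np.exp(eta[at_risk]).sum())
    return -total


# Toy usage: three subjects, one covariate, the second subject censored (synthetic values).
toy_X = np.array([[0.5], [-1.0], [0.3]])
toy_time = np.array([1.0, 2.0, 3.0])
toy_event = np.array([1, 0, 1])
toy_loss = cox_neg_log_partial_likelihood([0.0], toy_X, toy_time, toy_event)
```

The loop is quadratic in the number of subjects and is written for readability; a distributed method has to reproduce the same risk-set sums without pooling the rows of `X`.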
Affiliation(s)
- Wenrui Dai
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Xiaoqian Jiang
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Luca Bonomi
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093, USA
- Yong Li
  - Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Hongkai Xiong
  - Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Lucila Ohno-Machado
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093, USA
10
Rossetti SC, Knaplund C, Albers D, Dykes PC, Kang MJ, Korach TZ, Zhou L, Schnock K, Garcia J, Schwartz J, Fu LH, Klann JG, Lowenthal G, Cato K. Healthcare Process Modeling to Phenotype Clinician Behaviors for Exploiting the Signal Gain of Clinical Expertise (HPM-ExpertSignals): Development and evaluation of a conceptual framework. J Am Med Inform Assoc 2021; 28:1242-1251. [PMID: 33624765 PMCID: PMC8200261 DOI: 10.1093/jamia/ocab006] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 10/08/2020] [Revised: 12/28/2020] [Accepted: 01/12/2021] [Indexed: 12/23/2022] Open
Abstract
OBJECTIVE There are signals of clinicians' expert and knowledge-driven behaviors within clinical information systems (CIS) that can be exploited to support clinical prediction. We describe the development of the Healthcare Process Modeling Framework to Phenotype Clinician Behaviors for Exploiting the Signal Gain of Clinical Expertise (HPM-ExpertSignals). MATERIALS AND METHODS We employed an iterative framework development approach that combined data-driven modeling and simulation testing to define and refine a process for phenotyping clinician behaviors. Our framework was developed and evaluated based on the Communicating Narrative Concerns Entered by Registered Nurses (CONCERN) predictive model to detect and leverage signals of clinician expertise for prediction of patient trajectories. RESULTS Seven themes, identified during development and simulation testing of the CONCERN model, informed framework development. The HPM-ExpertSignals conceptual framework includes a 3-step modeling technique: (1) identify patterns of clinical behaviors from user interaction with CIS; (2) interpret patterns as proxies of an individual's decisions, knowledge, and expertise; and (3) use patterns in predictive models for associations with outcomes. The CONCERN model differentiated at-risk patients earlier than other early warning scores, lending confidence to the HPM-ExpertSignals framework. DISCUSSION The HPM-ExpertSignals framework moves beyond transactional data analytics to model clinical knowledge, decision making, and CIS interactions, which can support predictive modeling with a focus on the rapid and frequent patient surveillance cycle. CONCLUSIONS We propose this framework as an approach to embed clinicians' knowledge-driven behaviors in predictions and inferences to facilitate capture of healthcare processes that are activated independently of, and sometimes well before, apparent physiological changes.
Affiliation(s)
- Sarah Collins Rossetti
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
  - School of Nursing, Columbia University, New York, New York, USA
- Chris Knaplund
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Dave Albers
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
  - Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Patricia C Dykes
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Min Jeoung Kang
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Tom Z Korach
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Li Zhou
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Kumiko Schnock
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Jose Garcia
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, USA
- Li-Heng Fu
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Jeffrey G Klann
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Graham Lowenthal
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, USA
- Kenrick Cato
  - School of Nursing, Columbia University, New York, New York, USA
11
Alsunaidi SJ, Almuhaideb AM, Ibrahim NM, Shaikh FS, Alqudaihi KS, Alhaidari FA, Khan IU, Aslam N, Alshahrani MS. Applications of Big Data Analytics to Control COVID-19 Pandemic. Sensors (Basel, Switzerland) 2021; 21:2282. [PMID: 33805218 PMCID: PMC8037067 DOI: 10.3390/s21072282] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 03/07/2021] [Revised: 03/20/2021] [Accepted: 03/22/2021] [Indexed: 12/29/2022]
Abstract
The COVID-19 epidemic has caused a large loss of human life and havoc in economic, social, and health systems around the world. Controlling such an epidemic requires understanding its characteristics and behavior, which can be identified by collecting and analyzing the related big data. Big data analytics tools play a vital role in building the knowledge required for making decisions and taking precautionary measures. However, due to the vast amount of data available on COVID-19 from various sources, there is a need to review the roles of big data analysis in controlling the spread of COVID-19, to present the main challenges and directions of COVID-19 data analysis, and to provide a framework of the related existing applications and studies to facilitate future research on COVID-19 analysis. Therefore, in this paper, we conduct a literature review to highlight the contributions of several studies in the domain of COVID-19-based big data analysis. The study presents a taxonomy of several applications used to manage and control the pandemic. Moreover, this study discusses several challenges encountered when analyzing COVID-19 data. The findings of this paper suggest valuable future directions to be considered for further research and applications.
Affiliation(s)
- Shikah J. Alsunaidi
  - Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Abdullah M. Almuhaideb
  - Department of Networks and Communications, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Nehad M. Ibrahim
  - Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Fatema S. Shaikh
  - Department of Computer Information Systems, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Kawther S. Alqudaihi
  - Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Fahd A. Alhaidari
  - Department of Networks and Communications, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Irfan Ullah Khan
  - Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Nida Aslam
  - Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Mohammed S. Alshahrani
  - Department of Emergency Medicine, College of Medicine, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
12
Majewski D, Ball S, Bailey P, Bray J, Finn J. Long-term survival among OHCA patients who survive to 30 days: Does initial arrest rhythm remain a prognostic determinant? Resuscitation 2021; 162:128-134. [PMID: 33640430 DOI: 10.1016/j.resuscitation.2021.02.030] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/14/2020] [Revised: 01/20/2021] [Accepted: 02/16/2021] [Indexed: 10/22/2022]
Abstract
OBJECTIVE To determine whether initial cardiac arrest rhythm remains a prognostic determinant in longer-term OHCA survival. METHODS The St John Western Australian OHCA database was used to identify adults who survived for at least 30 days after an OHCA of presumed medical aetiology in the Perth metropolitan area between 1998 and 2017. Associations between 8-year OHCA survival and variables of interest were analysed using a Multi-Resolution Hazard (MRH) estimator model with 1-year intervals. RESULTS Of the 871 OHCA patients who survived 30 days, 718 (82%) presented with a shockable initial arrest rhythm and 153 (18%) presented with a non-shockable rhythm. Compared to patients with initial shockable arrests, patients with non-shockable arrests experienced increased mortality in the first (HR 3.33, 95% CI 2.12-5.32), second (HR 2.58, 95% CI 1.22-5.15), third (HR 2.21, 95% CI 1.02-4.42) and fourth (HR 2.21, 95% CI 1.02-4.42) year post arrest; however, in subsequent years the initial arrest rhythm ceased to be significantly associated with survival. The overall 8-year survival estimates after adjustment for peri-arrest factors (as potential confounders) were 87% (95% CI 77-93%) for shockable arrests and 73% (95% CI 55-86%) for non-shockable arrests. CONCLUSIONS Patients with non-shockable (as opposed to shockable) initial arrest rhythms experienced higher mortality in the first 4 years following their OHCA; however, after four years the initial arrest rhythm ceased to be associated with survival.
Affiliation(s)
- David Majewski
  - Prehospital, Resuscitation and Emergency Care Research Unit (PRECRU), School of Nursing, Curtin University, Bentley, WA, Australia
- Stephen Ball
  - Prehospital, Resuscitation and Emergency Care Research Unit (PRECRU), School of Nursing, Curtin University, Bentley, WA, Australia; St John WA, Belmont, WA, Australia
- Paul Bailey
  - Prehospital, Resuscitation and Emergency Care Research Unit (PRECRU), School of Nursing, Curtin University, Bentley, WA, Australia; St John WA, Belmont, WA, Australia
- Janet Bray
  - Prehospital, Resuscitation and Emergency Care Research Unit (PRECRU), School of Nursing, Curtin University, Bentley, WA, Australia; School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
- Judith Finn
  - Prehospital, Resuscitation and Emergency Care Research Unit (PRECRU), School of Nursing, Curtin University, Bentley, WA, Australia; Medical School (Emergency Medicine), The University of Western Australia, Crawley, WA, Australia; St John WA, Belmont, WA, Australia; School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
13
Duan R, Luo C, Schuemie MJ, Tong J, Liang CJ, Chang HH, Boland MR, Bian J, Xu H, Holmes JH, Forrest CB, Morton SC, Berlin JA, Moore JH, Mahoney KB, Chen Y. Learning from local to global: An efficient distributed algorithm for modeling time-to-event data. J Am Med Inform Assoc 2020; 27:1028-1036. [PMID: 32626900 PMCID: PMC7647322 DOI: 10.1093/jamia/ocaa044] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 10/23/2019] [Revised: 02/27/2020] [Accepted: 03/28/2020] [Indexed: 11/13/2022] Open
Abstract
OBJECTIVE We developed and evaluated a privacy-preserving One-shot Distributed Algorithm to fit a multicenter Cox proportional hazards model (ODAC) without sharing patient-level information across sites. MATERIALS AND METHODS Using patient-level data from a single site combined with only aggregated information from other sites, we constructed a surrogate likelihood function, approximating the Cox partial likelihood function obtained using patient-level data from all sites. By maximizing the surrogate likelihood function, each site obtained a local estimate of the model parameter, and the ODAC estimator was constructed as a weighted average of all the local estimates. We evaluated the performance of ODAC with (1) a simulation study and (2) a real-world use case study using 4 datasets from the Observational Health Data Sciences and Informatics network. RESULTS On the one hand, our simulation study showed that ODAC provided estimates nearly the same as the estimator obtained by analyzing, in a single dataset, the combined patient-level data from all sites (ie, the pooled estimator). The relative bias was <0.1% across all scenarios. The accuracy of ODAC remained high across different sample sizes and event rates. On the other hand, the meta-analysis estimator, which was obtained by the inverse variance weighted average of the site-specific estimates, had substantial bias when the event rate is <5%, with the relative bias reaching 20% when the event rate is 1%. In the Observational Health Data Sciences and Informatics network application, the ODAC estimates have a relative bias <5% for 15 out of 16 log hazard ratios, whereas the meta-analysis estimates had substantially higher bias than ODAC. CONCLUSIONS ODAC is a privacy-preserving and noniterative method for implementing time-to-event analyses across multiple sites. 
It provides estimates on par with the pooled estimator and substantially outperforms the meta-analysis estimator when the event is uncommon, making it extremely suitable for studying rare events and diseases in a distributed manner.
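The meta-analysis comparator and the relative-bias metric mentioned in this abstract can be sketched as follows. This is a minimal illustration of a fixed-effect inverse-variance weighted average and of relative bias against a reference (pooled) estimate; it does not implement ODAC's surrogate likelihood, and the function names and example numbers are hypothetical.

```python
def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect meta-analysis: inverse-variance weighted average of site estimates.

    Returns the pooled estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return pooled, pooled_se


def relative_bias(estimate, reference):
    """Relative bias of an estimate with respect to a nonzero reference value."""
    return (estimate - reference) / reference


# Hypothetical log hazard ratios and standard errors from two sites.
pooled, pooled_se = inverse_variance_pool([0.5, 0.7], [0.1, 0.2])
```

Sites with small standard errors dominate the weighted average, which is one reason this estimator can behave poorly when individual sites observe very few events and their local estimates are unstable.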
Affiliation(s)
- Rui Duan
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Chongliang Luo
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jiayi Tong
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- C Jason Liang
  - Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, Bethesda, Maryland, USA
- Howard H Chang
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, USA
- Mary Regina Boland
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
  - Cancer Informatics and eHealth Core, University of Florida Health Cancer Center, Gainesville, Florida, USA
- Hua Xu
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- John H Holmes
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Christopher B Forrest
  - Applied Clinical Research Center, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Sally C Morton
  - Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
- Jason H Moore
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Kevin B Mahoney
  - University of Pennsylvania Health System, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Yong Chen
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
14
Fu LH, Schwartz J, Moy A, Knaplund C, Kang MJ, Schnock KO, Garcia JP, Jia H, Dykes PC, Cato K, Albers D, Rossetti SC. Development and validation of early warning score system: A systematic literature review. J Biomed Inform 2020; 105:103410. [PMID: 32278089 PMCID: PMC7295317 DOI: 10.1016/j.jbi.2020.103410] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 09/12/2019] [Revised: 03/19/2020] [Accepted: 03/21/2020] [Indexed: 12/23/2022]
Abstract
OBJECTIVES This review aims to: 1) evaluate the quality of model reporting, 2) provide an overview of methodology for developing and validating Early Warning Score Systems (EWSs) for adult patients in acute care settings, and 3) highlight the strengths and limitations of the methodologies, as well as identify future directions for EWS derivation and validation studies. METHODOLOGY A systematic search was conducted in PubMed, Cochrane Library, and CINAHL. Only peer reviewed articles and clinical guidelines regarding developing and validating EWSs for adult patients in acute care settings were included. 615 articles were extracted and reviewed by five of the authors. Selected studies were evaluated based on the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) checklist. The studies were analyzed according to their study design, predictor selection, outcome measurement, methodology of modeling, and validation strategy. RESULTS A total of 29 articles were included in the final analysis. Twenty-six articles reported on the development and validation of a new EWS, while three reported on validation and model modification. Only eight studies met more than 75% of the items in the TRIPOD checklist. Three major techniques were utilized among the studies to inform their predictive algorithms: 1) clinical-consensus models (n = 6), 2) regression models (n = 15), and 3) tree models (n = 5). The number of predictors included in the EWSs varied from 3 to 72 with a median of seven. Twenty-eight models included vital signs, while 11 included lab data. Pulse oximetry, mental status, and other variables extracted from electronic health records (EHRs) were among other frequently used predictors. In-hospital mortality, unplanned transfer to the intensive care unit (ICU), and cardiac arrest were commonly used clinical outcomes. 
Twenty-eight studies conducted a form of model validation either within the study or against other widely-used EWSs. Only three studies validated their model using an external database separate from the derived database. CONCLUSION This literature review demonstrates that the characteristics of the cohort, predictors, and outcome selection, as well as the metrics for model validation, vary greatly across EWS studies. There is no consensus on the optimal strategy for developing such algorithms since data-driven models with acceptable predictive accuracy are often site-specific. A standardized checklist for clinical prediction model reporting exists, but few studies have included reporting aligned with it in their publications. Data-driven models are subjected to biases in the use of EHR data, thus it is particularly important to provide detailed study protocols and acknowledge, leverage, or reduce potential biases of the data used for EWS development to improve transparency and generalizability.
Affiliation(s)
- Li-Heng Fu
  - Department of Biomedical Informatics, Columbia University, New York, NY, United States
- Jessica Schwartz
  - School of Nursing, Columbia University, New York, NY, United States
- Amanda Moy
  - Department of Biomedical Informatics, Columbia University, New York, NY, United States
- Chris Knaplund
  - Department of Biomedical Informatics, Columbia University, New York, NY, United States
- Min-Jeoung Kang
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
- Kumiko O Schnock
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
- Jose P Garcia
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, United States
- Haomiao Jia
  - School of Nursing, Columbia University, New York, NY, United States; Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, United States
- Patricia C Dykes
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
- Kenrick Cato
  - School of Nursing, Columbia University, New York, NY, United States
- David Albers
  - Department of Biomedical Informatics, Columbia University, New York, NY, United States; Department of Pediatrics, Section of Informatics and Data Science, University of Colorado, Aurora, CO, United States
- Sarah Collins Rossetti
  - Department of Biomedical Informatics, Columbia University, New York, NY, United States; School of Nursing, Columbia University, New York, NY, United States
15
Song X, Waitman LR, Yu AS, Robbins DC, Hu Y, Liu M. Longitudinal Risk Prediction of Chronic Kidney Disease in Diabetic Patients Using a Temporal-Enhanced Gradient Boosting Machine: Retrospective Cohort Study. JMIR Med Inform 2020; 8:e15510. [PMID: 32012067 PMCID: PMC7055762 DOI: 10.2196/15510] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/16/2019] [Revised: 10/31/2019] [Accepted: 10/31/2019] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Artificial intelligence-enabled electronic health record (EHR) analysis can revolutionize medical practice from the diagnosis and prediction of complex diseases to making recommendations in patient care, especially for chronic conditions such as chronic kidney disease (CKD), which is one of the most frequent complications in patients with diabetes and is associated with substantial morbidity and mortality. OBJECTIVE The longitudinal prediction of health outcomes requires effective representation of temporal data in the EHR. In this study, we proposed a novel temporal-enhanced gradient boosting machine (GBM) model that dynamically updates and ensembles learners based on new events in patient timelines to improve the prediction accuracy of CKD among patients with diabetes. METHODS Using a broad spectrum of deidentified EHR data on a retrospective cohort of 14,039 adult patients with type 2 diabetes and GBM as the base learner, we validated our proposed Landmark-Boosting model against three state-of-the-art temporal models for rolling predictions of 1-year CKD risk. RESULTS The proposed model uniformly outperformed other models, achieving an area under receiver operating curve of 0.83 (95% CI 0.76-0.85), 0.78 (95% CI 0.75-0.82), and 0.82 (95% CI 0.78-0.86) in predicting CKD risk with automatic accumulation of new data in later years (years 2, 3, and 4 since diabetes mellitus onset, respectively). The Landmark-Boosting model also maintained the best calibration across moderate- and high-risk groups and over time. The experimental results demonstrated that the proposed temporal model can not only accurately predict 1-year CKD risk but also improve performance over time with additionally accumulated data, which is essential for clinical use to improve renal management of patients with diabetes. 
CONCLUSIONS Incorporation of temporal information in EHR data can significantly improve predictive model performance and will particularly benefit patients who follow-up with their physicians as recommended.
Affiliation(s)
- Xing Song
  - University of Kansas Medical Center, Department of Internal Medicine, Division of Medical Informatics, Kansas City, KS, United States
- Lemuel R Waitman
  - University of Kansas Medical Center, Department of Internal Medicine, Division of Medical Informatics, Kansas City, KS, United States
- Alan Sl Yu
  - University of Kansas Medical Center, Division of Nephrology and Hypertension and the Kidney Institute, Kansas City, KS, United States
- David C Robbins
  - University of Kansas Medical Center, Diabetes Institute, Kansas City, KS, United States
- Yong Hu
  - Jinan University, Big Data Decision Institute, Guangzhou, China
- Mei Liu
  - University of Kansas Medical Center, Department of Internal Medicine, Division of Medical Informatics, Kansas City, KS, United States
16
Alizadehsani R, Roshanzamir M, Abdar M, Beykikhoshk A, Khosravi A, Panahiazar M, Koohestani A, Khozeimeh F, Nahavandi S, Sarrafzadegan N. A database for using machine learning and data mining techniques for coronary artery disease diagnosis. Sci Data 2019; 6:227. [PMID: 31645559 PMCID: PMC6811630 DOI: 10.1038/s41597-019-0206-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 03/21/2019] [Accepted: 08/16/2019] [Indexed: 12/28/2022] Open
Abstract
We present the coronary artery disease (CAD) database, a comprehensive resource comprising 126 papers and 68 datasets relevant to CAD diagnosis, extracted from the scientific literature from 1992 to 2018. These data were collected to help advance research on CAD-related machine learning and data mining algorithms, and ultimately to advance clinical diagnosis and early treatment. To aid users, we have also built a web application that presents the database through various reports.
Affiliation(s)
- R Alizadehsani
  - Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC 3216, Australia
- M Roshanzamir
  - Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, 84156-83111, Iran
- M Abdar
  - Département d'informatique, Université du Québec à Montréal, Montréal, Québec, Canada
- A Beykikhoshk
  - Applied Artificial Intelligence Institute, Deakin University, Geelong, Australia
- A Khosravi
  - Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC 3216, Australia
- M Panahiazar
  - University of California San Francisco, San Francisco, CA, USA
- A Koohestani
  - Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC 3216, Australia
- F Khozeimeh
  - Mashhad University of Medical Science, Mashhad, Iran
- S Nahavandi
  - Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC 3216, Australia
- N Sarrafzadegan
  - Isfahan Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran
  - School of Population and Public Health, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
17
Hripcsak G, Albers DJ. High-fidelity phenotyping: richness and freedom from bias. J Am Med Inform Assoc 2018; 25:289-294. [PMID: 29040596 PMCID: PMC7282504 DOI: 10.1093/jamia/ocx110] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 06/16/2017] [Revised: 08/07/2017] [Accepted: 09/06/2017] [Indexed: 01/14/2023] Open
Abstract
Electronic health record phenotyping is the use of raw electronic health record data to assert characterizations about patients. Researchers have been doing it since the beginning of biomedical informatics, under different names. Phenotyping will benefit from an increasing focus on fidelity, both in the sense of increasing richness, such as measured levels, degree or severity, timing, probability, or conceptual relationships, and in the sense of reducing bias. Research agendas should shift from merely improving binary assignment to studying and improving richer representations. The field is actively researching new temporal directions and abstract representations, including deep learning. The field would benefit from research in nonlinear dynamics, in combining mechanistic models with empirical data, including data assimilation, and in topology. The health care process produces substantial bias, and studying that bias explicitly rather than treating it as merely another source of noise would facilitate addressing it.
Affiliation(s)
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, NY, USA
- David J Albers
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, NY, USA
18
Vassy JL, Ho YL, Honerlaw J, Cho K, Gaziano JM, Wilson PWF, Gagnon DR. Yield and bias in defining a cohort study baseline from electronic health record data. J Biomed Inform 2018; 78:54-59. [PMID: 29305952 PMCID: PMC5846098 DOI: 10.1016/j.jbi.2017.12.017] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/30/2017] [Revised: 11/07/2017] [Accepted: 12/31/2017] [Indexed: 01/24/2023]
Abstract
AIMS Despite growing interest in using electronic health records (EHR) to create longitudinal cohort studies, the distribution and missingness of EHR data might introduce selection bias and information bias to such analyses. We aimed to examine the yield and potential for these healthcare process biases in defining a study baseline using EHR data, using the example of cholesterol and blood pressure (BP) measurements. METHODS We created a virtual cohort study of cardiovascular disease (CVD) from patients with eligible cholesterol profiles in the New England (NE) and Southeast (SE) networks of the Veterans Health Administration in the United States. Using clinical data from the EHR, we plotted the yield of patients with BP measurements within an expanding timeframe around an index date of cholesterol testing. We compared three groups: (1) patients with BP from the exact index date; (2) patients with BP not on the index date but within the network-specific 90th percentile around the index date; and (3) patients with no BP within the network-specific 90th percentile. RESULTS Among 589,361 total patients in the two networks, 146,636 (61.0%) of 240,479 patients from NE and 289,906 (83.1%) of 348,882 patients from SE had BP measurements on the index date. Ninety percent had BP measured within 11 days of the index date in NE and within 5 days of the index date in SE. Group 3 in both networks had fewer available race data, fewer comorbidities and CVD medications, and fewer health system encounters. CONCLUSIONS Requiring same-day risk factor measurement in the creation of a virtual CVD cohort study from EHR data might exclude 40% of eligible patients, but including patients with infrequent visits might introduce bias. Data visualization can inform study-specific strategies to address these challenges for the research use of EHR data.
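The "yield within an expanding timeframe" idea in this abstract can be sketched as below: for each window half-width in days, compute the fraction of patients with at least one measurement that close to their index date. This is an illustrative sketch only, not the authors' code; the function name, the dictionary layout, and the toy patients are assumptions.

```python
from datetime import date


def yield_by_window(index_dates, measurement_dates, max_days):
    """Fraction of patients with a measurement within +/- d days of the index date.

    index_dates:       dict of patient id -> index date
    measurement_dates: dict of patient id -> list of measurement dates
    Returns a list whose entry d is the yield for a window of +/- d days.
    """
    # Smallest absolute gap (in days) between the index date and any measurement;
    # None when the patient has no measurement at all.
    gaps = []
    for patient_id, index_date in index_dates.items():
        dates = measurement_dates.get(patient_id, [])
        gaps.append(min((abs((d - index_date).days) for d in dates), default=None))

    n = len(gaps)
    if n == 0:
        return []
    return [
        sum(1 for g in gaps if g is not None and g <= d) / n
        for d in range(max_days + 1)
    ]


# Toy data: four hypothetical patients sharing one index date.
index_dates = {p: date(2020, 1, 10) for p in ("a", "b", "c", "d")}
measurement_dates = {
    "a": [date(2020, 1, 10)],                    # same-day measurement
    "b": [date(2020, 1, 13)],                    # 3 days after
    "d": [date(2020, 1, 5), date(2020, 1, 20)],  # nearest is 5 days before
}
curve = yield_by_window(index_dates, measurement_dates, 5)
```

Plotting such a curve per network shows how quickly the cohort grows as the window widens, and the patients still missing at the chosen cutoff are the group whose characteristics need checking for bias.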
Affiliation(s)
- Jason L Vassy: VA Boston Healthcare System, Boston, MA, USA; Division of General Internal Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
- Yuk-Lam Ho: VA Boston Healthcare System, Boston, MA, USA
- Kelly Cho: VA Boston Healthcare System, Boston, MA, USA; Division of Aging, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
- J Michael Gaziano: VA Boston Healthcare System, Boston, MA, USA; Division of Aging, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
- Peter W F Wilson: Atlanta VA Medical Center, Atlanta, GA, USA; Emory University Schools of Medicine and Public Health, Atlanta, GA, USA
- David R Gagnon: VA Boston Healthcare System, Boston, MA, USA; Boston University School of Public Health, Boston, MA, USA
|
19
|
Albers DJ, Elhadad N, Claassen J, Perotte R, Goldstein A, Hripcsak G. Estimating summary statistics for electronic health record laboratory data for use in high-throughput phenotyping algorithms. J Biomed Inform 2018; 78:87-101. [PMID: 29369797 PMCID: PMC5856130 DOI: 10.1016/j.jbi.2018.01.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 08/14/2017] [Revised: 12/05/2017] [Accepted: 01/14/2018] [Indexed: 01/12/2023]
Abstract
We study the question of how to represent or summarize raw laboratory data taken from an electronic health record (EHR) using parametric model selection to reduce or cope with biases induced through clinical care. It has been previously demonstrated that the health care process (Hripcsak and Albers, 2012, 2013), as defined by measurement context (Hripcsak and Albers, 2013; Albers et al., 2012) and measurement patterns (Albers and Hripcsak, 2010, 2012), can influence how EHR data are distributed statistically (Kohane and Weber, 2013; Pivovarov et al., 2014). We construct an algorithm, PopKLD, which is based on information criterion model selection (Burnham and Anderson, 2002; Claeskens and Hjort, 2008) and is intended to reduce and cope with health care process biases and to produce an intuitively understandable continuous summary. The PopKLD algorithm can be automated and is designed to be applicable in high-throughput settings; for example, the output of the PopKLD algorithm can be used as input for phenotyping algorithms. Moreover, we develop the PopKLD-CAT algorithm, which transforms the continuous PopKLD summary into a categorical summary useful for applications that require categorical data, such as topic modeling. We evaluate our methodology in two ways. First, we apply the method to laboratory data collected in two different health care contexts, primary versus intensive care. We show that the PopKLD preserves known physiologic features in the data that are lost when summarizing the data using more common laboratory data summaries such as mean and standard deviation. Second, for three disease-laboratory measurement pairs, we perform a phenotyping task: we use the PopKLD and PopKLD-CAT algorithms to define high and low values of the laboratory variable that are used for defining a disease state.
We then compare the relationship between the PopKLD-CAT summary disease predictions and the same predictions using empirically estimated mean and standard deviation to a gold standard generated by clinical review of patient records. We find that the PopKLD laboratory data summary is substantially better at predicting disease state. The PopKLD or PopKLD-CAT algorithms are not meant to be used as phenotyping algorithms, but we use the phenotyping task to show what information can be gained when using a more informative laboratory data summary. In the process of evaluating our method, we show that the different clinical contexts and laboratory measurements necessitate different statistical summaries. Similarly, leveraging the principle of maximum entropy, we argue that while some laboratory data only have sufficient information to estimate a mean and standard deviation, other laboratory data captured in an EHR contain substantially more information than can be captured in higher-parameter models.
Affiliation(s)
- D J Albers: Department of Biomedical Informatics, Columbia University, 622 West 168th Street, New York, NY, USA
- N Elhadad: Department of Biomedical Informatics, Columbia University, 622 West 168th Street, New York, NY, USA
- J Claassen: Department of Neurology, Columbia University, 710 West 168th Street, New York, NY 10032, USA
- R Perotte: Value Institute, New York Presbyterian Hospital, 601 West 168th Street, New York, NY 10032, USA
- A Goldstein: Department of Biomedical Informatics, Columbia University, 622 West 168th Street, New York, NY, USA
- G Hripcsak: Department of Biomedical Informatics, Columbia University, 622 West 168th Street, New York, NY, USA
|
20
|
Hagar Y, Dignam JJ, Dukic V. Flexible modeling of the hazard rate and treatment effects in long-term survival studies. Stat Methods Med Res 2017; 26:2455-2480. [PMID: 28150523 PMCID: PMC5651995 DOI: 10.1177/0962280216688034] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/28/2022]
Abstract
The effects of predictors on time to failure may be difficult to assess in cancer studies with longer follow-up, as the commonly used assumption of proportionality of hazards holding over an extended period is often questionable. Motivated by a long-term prostate cancer clinical trial, we contrast and compare four powerful methods for estimation of the hazard rate. These four methods allow for varying degrees of smoothness as well as covariates with effects that vary over time. We pay particular attention to an extended multiresolution hazard estimator, which is a flexible, semi-parametric, Bayesian method for joint estimation of predictor effects and the hazard rate. We compare the results of the extended multiresolution hazard model to three other commonly used, comparable models: Aalen's additive model, Kooperberg's hazard regression model, and an extended Cox model. Through simulations and the analysis of a large-scale randomized prostate cancer clinical trial, we use the different methods to examine patterns of biochemical failure and to estimate the time-varying effects of androgen deprivation therapy treatment and other covariates.
Affiliation(s)
- Yolanda Hagar: Department of Applied Mathematics, University of Colorado, CO, USA
- James J Dignam: Department of Public Health Sciences, University of Chicago, IL, USA
- Vanja Dukic: Department of Applied Mathematics, University of Colorado, CO, USA
|
21
|
Andreu-Perez J, Leff DR, Ip HMD, Yang GZ. From Wearable Sensors to Smart Implants–Toward Pervasive and Personalized Healthcare. IEEE Trans Biomed Eng 2015; 62:2750-62. [DOI: 10.1109/tbme.2015.2422751] [Citation(s) in RCA: 221] [Impact Index Per Article: 24.6] [Indexed: 11/08/2022]
|
22
|
Lu CL, Wang S, Ji Z, Wu Y, Xiong L, Jiang X, Ohno-Machado L. WebDISCO: a web service for distributed cox model learning without patient-level data sharing. J Am Med Inform Assoc 2015; 22:1212-9. [PMID: 26159465 PMCID: PMC5009917 DOI: 10.1093/jamia/ocv083] [Citation(s) in RCA: 77] [Impact Index Per Article: 8.6] [Received: 11/24/2014] [Revised: 05/16/2015] [Accepted: 05/26/2015] [Indexed: 11/14/2022] Open
Abstract
OBJECTIVE The Cox proportional hazards model is a widely used method for analyzing survival data. Achieving sufficient statistical power in a survival analysis usually requires a large amount of data. Data sharing across institutions could be a potential workaround for providing this added power. METHODS AND MATERIALS The authors develop a web service for distributed Cox model learning (WebDISCO), which focuses on the proof-of-concept and algorithm development for federated survival analysis. The sensitive patient-level data can be processed locally and only the less-sensitive intermediate statistics are exchanged to build a global Cox model. Mathematical derivation shows that the proposed distributed algorithm is identical to the centralized Cox model. RESULTS The authors evaluated the proposed framework at the University of California, San Diego (UCSD), Emory, and Duke. The experimental results show that both distributed and centralized models result in near-identical model coefficients with differences in the range [Formula: see text] to [Formula: see text]. The results confirm the mathematical derivation and show that the implementation of the distributed model can achieve the same results as the centralized implementation. LIMITATION The proposed method serves as a proof of concept, in which a publicly available dataset was used to evaluate the performance. The authors do not intend to suggest that this method can resolve policy and engineering issues related to the federated use of institutional data, but it should serve as evidence of the technical feasibility of the proposed approach. CONCLUSIONS WebDISCO (Web-based Distributed Cox Regression Model; https://webdisco.ucsd-dbmi.org:8443/cox/) provides a proof-of-concept web service that implements a distributed algorithm to conduct distributed survival analysis without sharing patient level data.
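Why exchanging only intermediate statistics can reproduce a centralized Cox fit may be easier to see in a small Python sketch. This is an illustration under stated assumptions, not the WebDISCO protocol or its API. It uses a single covariate, the Breslow form of the partial likelihood, and assumes the sites agree on the pooled set of distinct event times. All function names and the toy records are hypothetical. Each site returns only per-event-time sums, and the coordinator adds them to obtain the gradient of the log partial likelihood.

```python
import math
from typing import Dict, List, Sequence, Tuple

Record = Tuple[float, int, float]             # (time, event indicator, covariate x)
Stats = Dict[float, Tuple[int, float, float, float]]


def local_stats(records: Sequence[Record], event_times: Sequence[float], beta: float) -> Stats:
    """Per event time t: (#events, sum of x over events, S0, S1) computed on one site.

    S0 and S1 are sums of exp(beta*x) and x*exp(beta*x) over the local risk set
    (records with time >= t). No patient-level rows leave the site.
    """
    stats: Stats = {}
    for t in event_times:
        events = [x for time, ev, x in records if ev == 1 and time == t]
        at_risk = [x for time, _, x in records if time >= t]
        s0 = sum(math.exp(beta * x) for x in at_risk)
        s1 = sum(x * math.exp(beta * x) for x in at_risk)
        stats[t] = (len(events), sum(events), s0, s1)
    return stats


def distributed_gradient(sites: Sequence[Sequence[Record]], beta: float) -> float:
    """Gradient of the Breslow log partial likelihood from summed site statistics."""
    event_times = sorted({time for site in sites for time, ev, _ in site if ev == 1})
    per_site: List[Stats] = [local_stats(site, event_times, beta) for site in sites]
    grad = 0.0
    for t in event_times:
        d = sum(s[t][0] for s in per_site)
        sx = sum(s[t][1] for s in per_site)
        s0 = sum(s[t][2] for s in per_site)
        s1 = sum(s[t][3] for s in per_site)
        grad += sx - d * s1 / s0   # events' covariates minus risk-set weighted mean
    return grad


# Hypothetical toy data held by two sites.
site_a: List[Record] = [(2.0, 1, 0.5), (3.0, 0, 1.0), (5.0, 1, -0.2)]
site_b: List[Record] = [(1.0, 1, 1.5), (3.0, 1, 0.0), (4.0, 0, 0.7)]

split = distributed_gradient([site_a, site_b], beta=0.3)
pooled = distributed_gradient([site_a + site_b], beta=0.3)  # centralized reference
```

Because the risk-set sums are additive across sites, `split` and `pooled` agree up to floating-point rounding. A full fit would iterate such gradients (and the analogous second-derivative sums) in a Newton-type update.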
Affiliation(s)
- Chia-Lun Lu: Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Shuang Wang: Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Zhanglong Ji: Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Yuan Wu: Department of Biostatistics & Bioinformatics, Duke University, Durham, NC, 27708, USA
- Li Xiong: Department of Mathematics & Computer Science, Emory University, Atlanta, GA 30322, USA; Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Xiaoqian Jiang: Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Lucila Ohno-Machado: Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
|
23
|
Abstract
This paper provides an overview of recent developments in big data in the context of biomedical and health informatics. It outlines the key characteristics of big data and how medical and health informatics, translational bioinformatics, sensor informatics, and imaging informatics will benefit from an integrated approach of piecing together different aspects of personalized information from a diverse range of data sources, both structured and unstructured, covering genomics, proteomics, metabolomics, as well as imaging, clinical diagnosis, and long-term continuous physiological sensing of an individual. It is expected that recent advances in big data will expand our knowledge for testing new hypotheses about disease management from diagnosis to prevention to personalized treatment. The rise of big data, however, also raises challenges in terms of privacy, security, data ownership, data stewardship, and governance. This paper discusses some of the existing activities and future opportunities related to big data for health, outlining some of the key underlying issues that need to be tackled.
|