1
Pinho X, Meijer W, de Graaf A. Deriving Treatment Decision Support From Dutch Electronic Health Records by Exploring the Applicability of a Precision Cohort-Based Procedure for Patients With Type 2 Diabetes Mellitus: Precision Cohort Study. Online J Public Health Inform 2024; 16:e51092. [PMID: 38691393 PMCID: PMC11097050 DOI: 10.2196/51092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 02/28/2024] [Accepted: 03/15/2024] [Indexed: 05/03/2024] Open
Abstract
BACKGROUND The rapidly increasing availability of medical data in electronic health records (EHRs) may contribute to the concept of learning health systems, allowing for better personalized care. Type 2 diabetes mellitus was chosen as the use case in this study. OBJECTIVE This study aims to explore the applicability of a recently developed patient similarity-based analytics approach based on EHRs as a candidate data analytical decision support tool. METHODS A previously published precision cohort analytics workflow was adapted for the Dutch primary care setting using EHR data from the Nivel Primary Care Database. The workflow consisted of extracting patient data from the Nivel Primary Care Database to retrospectively generate decision points for treatment change, training a similarity model, generating a precision cohort of the most similar patients, and analyzing treatment options. This analysis showed the treatment options that led to a better outcome for the precision cohort in terms of clinical readouts for glycemic control. RESULTS Data from 11,490 registered patients diagnosed with type 2 diabetes mellitus were extracted from the database. Treatment-specific filter cohorts of patient groups were generated, and the effect of past treatment choices in these cohorts was assessed separately for glycated hemoglobin and fasting glucose as clinical outcome variables. Precision cohorts were generated for several individual patients from the filter cohorts. Treatment options and outcome analyses were technically well feasible but in general had a lack of statistical power to demonstrate statistical significance for treatment options with better outcomes. CONCLUSIONS The precision cohort analytics workflow was successfully adapted for the Dutch primary care setting, proving its potential for use as a learning health system component. 
Although the approach proved technically well feasible, data size limitations need to be overcome before application for clinical decision support becomes realistically possible.
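The precision cohort workflow described in this abstract (find the most similar past patients, then compare outcomes by treatment) can be illustrated with a minimal sketch. The study trains a similarity model; plain Euclidean distance stands in for it here, and the field names (`age`, `hba1c`, `treatment`, `hba1c_change`) and all values are hypothetical.

```python
import math
from statistics import mean

def precision_cohort(index, candidates, features, k):
    """Return the k candidates closest to the index patient.

    Euclidean distance on raw features is an assumption; the study
    uses a trained similarity model instead.
    """
    def dist(p):
        return math.sqrt(sum((p[f] - index[f]) ** 2 for f in features))
    return sorted(candidates, key=dist)[:k]

def outcome_by_treatment(cohort, outcome="hba1c_change"):
    """Average the outcome within each treatment group of the cohort."""
    groups = {}
    for p in cohort:
        groups.setdefault(p["treatment"], []).append(p[outcome])
    return {t: mean(v) for t, v in groups.items()}

# Hypothetical toy data, not taken from the Nivel database.
index_patient = {"age": 60, "hba1c": 8.0}
past_patients = [
    {"age": 61, "hba1c": 8.1, "treatment": "A", "hba1c_change": -0.5},
    {"age": 59, "hba1c": 7.9, "treatment": "B", "hba1c_change": -0.2},
    {"age": 62, "hba1c": 8.2, "treatment": "A", "hba1c_change": -0.7},
    {"age": 30, "hba1c": 6.0, "treatment": "B", "hba1c_change": 0.4},
]
cohort = precision_cohort(index_patient, past_patients, ["age", "hba1c"], k=3)
summary = outcome_by_treatment(cohort)
```

With cohorts this small, differences between treatment groups cannot be tested meaningfully, which mirrors the statistical power limitation the abstract reports.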
Affiliation(s)
- Xavier Pinho
  - Netherlands Organisation for Applied Scientific Research (TNO), Utrecht, Netherlands
- Willemijn Meijer
  - Nivel, Nederlands Instituut voor Onderzoek van de Gezondheidszorg, Utrecht, Netherlands
- Albert de Graaf
  - Netherlands Organisation for Applied Scientific Research (TNO), Utrecht, Netherlands
2
Ahmed MS, Hasan T, Islam S, Ahmed N. Investigating Rhythmicity in App Usage to Predict Depressive Symptoms: Protocol for Personalized Framework Development and Validation Through a Countrywide Study. JMIR Res Protoc 2024; 13:e51540. [PMID: 38657238 PMCID: PMC11079771 DOI: 10.2196/51540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Revised: 12/27/2023] [Accepted: 01/11/2024] [Indexed: 04/26/2024] Open
Abstract
BACKGROUND Understanding a student's depressive symptoms could facilitate significantly more precise diagnosis and treatment. However, few studies have focused on depressive symptom prediction through unobtrusive systems, and these studies are limited by small sample sizes, low performance, and the requirement for higher resources. In addition, research has not explored whether statistically significant rhythms based on different app usage behavioral markers (eg, app usage sessions) exist that could be useful in finding subtle differences to predict with higher accuracy like the models based on rhythms of physiological data. OBJECTIVE The main objective of this study is to explore whether there exist statistically significant rhythms in resource-insensitive app usage behavioral markers and predict depressive symptoms through these marker-based rhythmic features. Another objective of this study is to understand whether there is a potential link between rhythmic features and depressive symptoms. METHODS Through a countrywide study, we collected 2952 students' raw app usage behavioral data and responses to the 9 depressive symptoms in the 9-item Patient Health Questionnaire (PHQ-9). The behavioral data were retrieved through our developed app, which was previously used in our pilot studies in Bangladesh on different research problems. To explore whether there is a rhythm based on app usage data, we will conduct a zero-amplitude test. In addition, we will develop a cosinor model for each participant to extract rhythmic parameters (eg, acrophase). In addition, to obtain a comprehensive picture of the rhythms, we will explore nonparametric rhythmic features (eg, interdaily stability). Furthermore, we will conduct regression analysis to understand the association of rhythmic features with depressive symptoms. Finally, we will develop a personalized multitask learning (MTL) framework to predict symptoms through rhythmic features. 
RESULTS After applying inclusion criteria (eg, having app usage data of at least 2 days to explore rhythmicity), we kept the data of 2902 (98.31%) students for analysis, with 24.48 million app usage events, and 7 days' app usage of 2849 (98.17%) students. The students are from all 8 divisions of Bangladesh, both public and private universities (19 different universities and 52 different departments). We are analyzing the data and will publish the findings in a peer-reviewed publication. CONCLUSIONS Having an in-depth understanding of app usage rhythms and their connection with depressive symptoms through a countrywide study can significantly help health care professionals and researchers better understand depressed students and may create possibilities for using app usage-based rhythms for intervention. In addition, the MTL framework based on app usage rhythmic features may more accurately predict depressive symptoms due to the rhythms' capability to find subtle differences. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/51540.
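The cosinor model mentioned in the methods fits a cosine of known period to a time series and reads off the rhythm parameters. The sketch below is a simplified single-component version under stated assumptions: samples are equally spaced (one per hour), they cover a whole number of periods, and the function name `cosinor_fit` is illustrative. Under those assumptions the least-squares fit has a closed form. The zero-amplitude test itself is not implemented here.

```python
import math

def cosinor_fit(values, period=24):
    """Fit y(t) = M + A*cos(2*pi*t/period + phi) for t = 0, 1, ..., n-1.

    Assumes n is a whole multiple of the period, so the cosine and sine
    regressors are orthogonal and least squares reduces to simple sums.
    Returns (mesor M, amplitude A, acrophase phi in radians).
    """
    n = len(values)
    if n == 0 or n % period != 0:
        raise ValueError("need a whole number of periods")
    w = 2 * math.pi / period
    mesor = sum(values) / n
    beta = 2 / n * sum(y * math.cos(w * t) for t, y in enumerate(values))
    gamma = 2 / n * sum(y * math.sin(w * t) for t, y in enumerate(values))
    amplitude = math.hypot(beta, gamma)
    # beta = A*cos(phi) and gamma = -A*sin(phi), hence the sign below.
    acrophase = math.atan2(-gamma, beta)
    return mesor, amplitude, acrophase

# Synthetic hourly app-usage counts over two days, peaking at hour 6.
usage = [10 + 4 * math.cos(2 * math.pi * t / 24 - math.pi / 2) for t in range(48)]
mesor, amplitude, acrophase = cosinor_fit(usage)
peak_hour = (-acrophase) * 24 / (2 * math.pi)
```

Real app usage data are irregular and noisy, so a general least-squares solver would be needed in practice; the closed form is used only to keep the example dependency-free.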
Affiliation(s)
- Md Sabbir Ahmed
  - Design Inclusion and Access Lab, North South University, Dhaka, Bangladesh
- Tanvir Hasan
  - Design Inclusion and Access Lab, North South University, Dhaka, Bangladesh
- Salekul Islam
  - Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
- Nova Ahmed
  - Design Inclusion and Access Lab, North South University, Dhaka, Bangladesh
3
Seki T, Kawazoe Y, Ohe K. Clinical Feature Vector Generation using Unsupervised Graph Representation Learning from Heterogeneous Medical Records. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2024; 2023:618-623. [PMID: 38222342 PMCID: PMC10785854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
The diversity of patient information recorded in electronic medical records generally presents a challenge for converting it into fixed-length vectors that align with clinical characteristics. To address this issue, this study aimed to utilize an unsupervised graph representation learning method to transform the unstructured inpatient information from electronic medical records into a fixed-length vector. Infograph, one of the unsupervised graph representation learning algorithms, was applied to the graphed inpatient information, resulting in embedded vectors of fixed length. The embedded vectors were then evaluated for whether the clinical information was preserved in them. The results indicated that the embedded representation contained information that could predict readmission within 30 days, demonstrating the feasibility of using unsupervised graph representation learning to transform patient information into fixed-length vectors that retain clinical characteristics.
Affiliation(s)
- Tomohisa Seki
  - Department of Healthcare Information Management, The University of Tokyo Hospital, Tokyo, Japan
- Yoshimasa Kawazoe
  - Artificial Intelligence in Healthcare, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Kazuhiko Ohe
  - Department of Biomedical Informatics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
4
Rigdon J, Ostasiewski B, Woelfel K, Wiseman KD, Hetherington T, Downs S, Kowalkowski M. Automated generation of comparator patients in the electronic medical record. Learn Health Syst 2024; 8:e10362. [PMID: 38249842 PMCID: PMC10797581 DOI: 10.1002/lrh2.10362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 02/17/2023] [Accepted: 02/18/2023] [Indexed: 03/30/2023] Open
Abstract
Background Well-designed randomized trials provide high-quality clinical evidence but are not always feasible or ethical. In their absence, the electronic medical record (EMR) presents a platform to conduct comparative effectiveness research, central to the emerging academic learning health system (aLHS) model. A barrier to realizing this vision is the lack of a process to efficiently generate a reference comparison group for each patient. Objective To test a multi-step process for the selection of comparators in the EMR. Materials and Methods We conducted a mixed-methods study within a large aLHS in North Carolina. We (1) created a list of 35 candidate variables; (2) surveyed 270 researchers to assess the importance of candidate variables; and (3) built consensus rankings around survey-identified variables (ie, importance scores >7) across two panels of 7-8 clinical research experts. Prioritized algorithm inputs were collected from the EMR and applied using a greedy matching technique. Feasibility was measured as the percentage of patients with 100 matched comparators and performance was measured via computational time and Euclidean distance. Results Nine variables were selected: age, sex, race, ethnicity, body mass index, insurance status, smoking status, Charlson Comorbidity Index, and neighborhood percentage in poverty. The final process successfully generated 100 matched comparators for each of 1.8 million candidate patients, executed in less than 100 min for the majority of strata, and had average Euclidean distance 0.043. Conclusion EMR-derived matching is feasible to implement across a diverse patient population and can provide a reproducible, efficient source of comparator data for observational studies, with additional testing in clinical research applications needed.
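A greedy matching step of the kind named in the methods can be sketched as follows: each index patient takes its nearest still-available comparators by Euclidean distance. The abstract does not say whether comparators are reused across patients, so matching without replacement is an assumption here, as are the function name `greedy_match`, the identifiers, and the two-dimensional feature vectors (which would need to be scaled in practice).

```python
import math

def greedy_match(index_patients, pool, n_matches):
    """Give each index patient its n_matches nearest unused comparators.

    index_patients and pool map an id to a numeric feature vector.
    Comparators are removed once used (assumption: no replacement).
    """
    available = dict(pool)
    matches = {}
    for pid, vec in index_patients.items():
        ranked = sorted(available, key=lambda cid: math.dist(vec, available[cid]))
        chosen = ranked[:n_matches]
        matches[pid] = chosen
        for cid in chosen:
            del available[cid]
    return matches

# Hypothetical, already-scaled feature vectors.
index_patients = {"p1": (0.0, 0.0), "p2": (1.0, 1.0)}
pool = {"c1": (0.1, 0.0), "c2": (0.9, 1.0), "c3": (0.2, 0.1), "c4": (5.0, 5.0)}
matches = greedy_match(index_patients, pool, n_matches=2)
```

Greedy matching depends on the order in which index patients are processed: `p2` ends up with the distant `c4` because `p1` was served first. That trade-off buys speed compared with optimal matching.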
Affiliation(s)
- Joseph Rigdon
  - Department of Biostatistics and Data Science, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
  - Center for Biomedical Informatics, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
  - Clinical and Translational Science Institute, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
- Brian Ostasiewski
  - Center for Biomedical Informatics, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
  - Clinical and Translational Science Institute, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
- Kamah Woelfel
  - Clinical and Translational Science Institute, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
- Kimberly D. Wiseman
  - Department of Social Sciences and Health Policy, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
- Tim Hetherington
  - Clinical and Translational Science Institute, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
- Stephen Downs
  - Center for Biomedical Informatics, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
- Marc Kowalkowski
  - Clinical and Translational Science Institute, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
5
Ma M, Sun P, Li Y, Huo W. Predicting the risk of mortality in ICU patients based on dynamic graph attention network of patient similarity. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:15326-15344. [PMID: 37679182 DOI: 10.3934/mbe.2023685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
Predicting the risk of mortality of hospitalized patients in the ICU is essential for the timely identification of high-risk patients and for the formulation and adjustment of treatment strategies while patients are hospitalized. Traditional machine learning methods usually ignore the similarity between patients, which makes it difficult to uncover the hidden relationships between patients and results in poor accuracy of prediction models. In this paper, we propose a new model named PS-DGAT to solve this problem. First, we construct a patient-weighted similarity network by calculating the similarity of patient clinical data to represent the similarity relationship between patients; second, we fill in the missing features and reconstruct the patient similarity network based on the data of neighboring patients in the network; finally, from the reconstructed patient similarity network after feature completion, we use the dynamic attention mechanism to extract and learn the structural features of the nodes to obtain a vector representation of each patient node in the low-dimensional embedding space. The vector representation of each patient node in the low-dimensional embedding space is then used to achieve patient mortality risk prediction. The experimental results show that the accuracy is improved by about 1.8% compared with the basic GAT and about 8% compared with the traditional machine learning methods.
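The first two steps of the pipeline described above (build a weighted patient similarity network, then fill missing features from network neighbours) can be sketched in a few lines. This is not the PS-DGAT implementation: the similarity function, the threshold, and the weighted-average imputation are all assumptions made for illustration, and the graph attention step is omitted entirely.

```python
import math

def similarity(a, b):
    """Similarity in (0, 1] from Euclidean distance over shared features.

    Missing values are encoded as None and skipped (assumption).
    """
    shared = [i for i in range(len(a)) if a[i] is not None and b[i] is not None]
    if not shared:
        return 0.0
    d = math.sqrt(sum((a[i] - b[i]) ** 2 for i in shared))
    return 1.0 / (1.0 + d)

def build_network(patients, threshold):
    """Return weighted edges {(u, v): weight} for pairs above the threshold."""
    ids = list(patients)
    edges = {}
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            s = similarity(patients[u], patients[v])
            if s >= threshold:
                edges[(u, v)] = s
    return edges

def impute_from_neighbours(patients, edges):
    """Fill each missing feature with the weighted mean over neighbours."""
    filled = {pid: list(vec) for pid, vec in patients.items()}
    for pid, vec in patients.items():
        for i, val in enumerate(vec):
            if val is not None:
                continue
            num = den = 0.0
            for (u, v), w in edges.items():
                if pid not in (u, v):
                    continue
                other = v if u == pid else u
                if patients[other][i] is not None:
                    num += w * patients[other][i]
                    den += w
            if den > 0:
                filled[pid][i] = num / den
    return filled

# Hypothetical patients with two features; "b" is missing the second one.
patients = {"a": [1.0, 2.0], "b": [1.0, None], "c": [9.0, 8.0]}
edges = build_network(patients, threshold=0.5)
filled = impute_from_neighbours(patients, edges)
```

After imputation, the network would be rebuilt from the completed features, which is the "reconstruct" step the abstract mentions.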
Affiliation(s)
- Manfu Ma
  - College of Computer Science and Engineering, Northwest Normal University, 967 Anning East Road, Lanzhou 730070, China
- Penghui Sun
  - College of Computer Science and Engineering, Northwest Normal University, 967 Anning East Road, Lanzhou 730070, China
- Yong Li
  - College of Computer Science and Engineering, Northwest Normal University, 967 Anning East Road, Lanzhou 730070, China
- Weilong Huo
  - College of Traffic and Transportation, Lanzhou Jiaotong University, 88 Anning West Road, Lanzhou 730070, China
6
Pikoula M, Kallis C, Madjiheurem S, Quint JK, Bafadhel M, Denaxas S. Evaluation of data processing pipelines on real-world electronic health records data for the purpose of measuring patient similarity. PLoS One 2023; 18:e0287264. [PMID: 37319288 PMCID: PMC10270623 DOI: 10.1371/journal.pone.0287264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 06/01/2023] [Indexed: 06/17/2023] Open
Abstract
BACKGROUND The ever-growing size, breadth, and availability of patient data allows for a wide variety of clinical features to serve as inputs for phenotype discovery using cluster analysis. Data of mixed types in particular are not straightforward to combine into a single feature vector, and techniques used to address this can be biased towards certain data types in ways that are not immediately obvious or intended. In this context, the process of constructing clinically meaningful patient representations from complex datasets has not been systematically evaluated. AIMS Our aim was to a) outline and b) implement an analytical framework to evaluate distinct methods of constructing patient representations from routine electronic health record data for the purpose of measuring patient similarity. We applied the analysis on a patient cohort diagnosed with chronic obstructive pulmonary disease. METHODS Using data from the CALIBER data resource, we extracted clinically relevant features for a cohort of patients diagnosed with chronic obstructive pulmonary disease. We used four different data processing pipelines to construct lower dimensional patient representations from which we calculated patient similarity scores. We described the resulting representations, ranked the influence of each individual feature on patient similarity and evaluated the effect of different pipelines on clustering outcomes. Experts evaluated the resulting representations by rating the clinical relevance of similar patient suggestions with regard to a reference patient. RESULTS Each of the four pipelines resulted in similarity scores primarily driven by a unique set of features. It was demonstrated that data transformations according to each pipeline prior to clustering can result in a variation of clustering results of over 40%. The most appropriate pipeline was selected on the basis of feature ranking and clinical expertise. 
There was moderate agreement between clinicians as measured by Cohen's kappa coefficient. CONCLUSIONS Data transformation has downstream and unforeseen consequences in cluster analysis. Rather than viewing this process as a black box, we have shown ways to quantitatively and qualitatively evaluate and select the appropriate preprocessing pipeline.
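Cohen's kappa, the agreement measure cited in the results, corrects the observed agreement between two raters for the agreement expected by chance. A minimal sketch, with hypothetical ratings (the labels `rel` and `not` for "relevant" and "not relevant" are illustrative):

```python
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e the agreement expected from each rater's label frequencies.
    Undefined (division by zero) when p_e equals 1.
    """
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[label] * c2[label] for label in c1) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical relevance ratings from two clinicians for eight suggestions.
clinician1 = ["rel", "rel", "rel", "not", "not", "not", "rel", "not"]
clinician2 = ["rel", "rel", "not", "not", "not", "rel", "rel", "not"]
kappa = cohen_kappa(clinician1, clinician2)
```

Here the raters agree on 6 of 8 items (0.75) while chance agreement is 0.5, giving a kappa of 0.5, a value usually described as moderate agreement.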
Affiliation(s)
- Maria Pikoula
  - Institute of Health Informatics, University College London, London, United Kingdom
- Constantinos Kallis
  - National Heart and Lung Institute, Imperial College London, London, United Kingdom
- Sephora Madjiheurem
  - Department of Electronic and Electrical Engineering, University College London, London, United Kingdom
- Jennifer K. Quint
  - National Heart and Lung Institute, Imperial College London, London, United Kingdom
- Mona Bafadhel
  - School of Immunology and Microbial Sciences, King’s College London, London, United Kingdom
- Spiros Denaxas
  - Institute of Health Informatics, University College London, London, United Kingdom
7
Liu Q, Ostinelli EG, De Crescenzo F, Li Z, Tomlinson A, Salanti G, Cipriani A, Efthimiou O. Predicting outcomes at the individual patient level: what is the best method? BMJ MENTAL HEALTH 2023; 26:e300701. [PMID: 37316257 PMCID: PMC10277128 DOI: 10.1136/bmjment-2023-300701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Accepted: 04/26/2023] [Indexed: 06/16/2023]
Abstract
OBJECTIVE When developing prediction models, researchers commonly employ a single model which uses all the available data (end-to-end approach). Alternatively, a similarity-based approach has been previously proposed, in which patients with similar clinical characteristics are first grouped into clusters, then prediction models are developed within each cluster. The potential advantage of the similarity-based approach is that it may better address heterogeneity in patient characteristics. However, it remains unclear whether it improves the overall predictive performance. We illustrate the similarity-based approach using data from people with depression and empirically compare its performance with the end-to-end approach. METHODS We used primary care data collected in general practices in the UK. Using 31 predefined baseline variables, we aimed to predict the severity of depressive symptoms, measured by Patient Health Questionnaire-9, 60 days after initiation of antidepressant treatment. Following the similarity-based approach, we used k-means to cluster patients based on their baseline characteristics. We derived the optimal number of clusters using the Silhouette coefficient. We used ridge regression to build prediction models in both approaches. To compare the models' performance, we calculated the mean absolute error (MAE) and the coefficient of determination (R2) using bootstrapping. RESULTS We analysed data from 16 384 patients. The end-to-end approach resulted in an MAE of 4.64 and R2 of 0.20. The best-performing similarity-based model was for four clusters, with MAE of 4.65 and R2 of 0.19. CONCLUSIONS The end-to-end and the similarity-based model yielded comparable performance. Due to its simplicity, the end-to-end approach can be favoured when using demographic and clinical data to build prediction models on pharmacological treatments for depression.
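The two performance measures compared in this abstract, mean absolute error and the coefficient of determination, are simple to compute, and a bootstrap resamples patients with replacement to estimate their variability. The sketch below uses hypothetical PHQ-9 scores; the function names, the number of resamples, and the seed are illustrative choices, not the authors' settings.

```python
import random

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def bootstrap_mae(y_true, y_pred, n_boot=200, seed=0):
    """MAE on n_boot resamples of the patients, drawn with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(mean_absolute_error([y_true[i] for i in idx],
                                             [y_pred[i] for i in idx]))
    return estimates

# Hypothetical observed and predicted PHQ-9 scores for four patients.
observed = [10, 12, 8, 15]
predicted = [11, 10, 9, 13]
mae = mean_absolute_error(observed, predicted)
r2 = r_squared(observed, predicted)
boot = bootstrap_mae(observed, predicted)
```

The same two functions can be applied to an end-to-end model and to per-cluster models, which is how the two approaches in the abstract become directly comparable.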
Affiliation(s)
- Qiang Liu
  - Department of Psychiatry, University of Oxford, Oxford, UK
  - Oxford Precision Psychiatry Lab, NIHR Oxford Health Biomedical Research Centre, Oxford, UK
  - Department of Engineering Mathematics, University of Bristol, Bristol, UK
- Edoardo Giuseppe Ostinelli
  - Department of Psychiatry, University of Oxford, Oxford, UK
  - Oxford Precision Psychiatry Lab, NIHR Oxford Health Biomedical Research Centre, Oxford, UK
  - Oxford Health NHS Foundation Trust, Warneford Hospital, Oxford, UK
- Franco De Crescenzo
  - Department of Psychiatry, University of Oxford, Oxford, UK
  - Oxford Precision Psychiatry Lab, NIHR Oxford Health Biomedical Research Centre, Oxford, UK
- Zhenpeng Li
  - Department of Psychiatry, University of Oxford, Oxford, UK
  - Oxford Precision Psychiatry Lab, NIHR Oxford Health Biomedical Research Centre, Oxford, UK
- Anneka Tomlinson
  - Department of Psychiatry, University of Oxford, Oxford, UK
  - Oxford Precision Psychiatry Lab, NIHR Oxford Health Biomedical Research Centre, Oxford, UK
- Georgia Salanti
  - Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland
- Andrea Cipriani
  - Department of Psychiatry, University of Oxford, Oxford, UK
  - Oxford Precision Psychiatry Lab, NIHR Oxford Health Biomedical Research Centre, Oxford, UK
  - Oxford Health NHS Foundation Trust, Warneford Hospital, Oxford, UK
- Orestis Efthimiou
  - Oxford Precision Psychiatry Lab, NIHR Oxford Health Biomedical Research Centre, Oxford, UK
  - Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland
  - Institute of Primary Health Care (BIHAM), University of Bern, Bern, Switzerland
8
Jo H, Jun CH. A personalized classification model using similarity learning via supervised autoencoder. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
9
Omar N, Nazirun NN, Vijayam B, Wahab AA, Bahuri HA. Diabetes subtypes classification for personalized health care: A review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10202-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
10
Gliozzo J, Mesiti M, Notaro M, Petrini A, Patak A, Puertas-Gallardo A, Paccanaro A, Valentini G, Casiraghi E. Heterogeneous data integration methods for patient similarity networks. Brief Bioinform 2022; 23:6604996. [PMID: 35679533 PMCID: PMC9294435 DOI: 10.1093/bib/bbac207] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/07/2021] [Revised: 04/14/2022] [Accepted: 05/04/2022] [Indexed: 12/29/2022] Open
Abstract
Patient similarity networks (PSNs), where patients are represented as nodes and their similarities as weighted edges, are being increasingly used in clinical research. These networks provide an insightful summary of the relationships among patients and can be exploited by inductive or transductive learning algorithms for the prediction of patient outcome, phenotype and disease risk. PSNs can also be easily visualized, thus offering a natural way to inspect complex heterogeneous patient data and providing some level of explainability of the predictions obtained by machine learning algorithms. The advent of high-throughput technologies, enabling us to acquire high-dimensional views of the same patients (e.g. omics data, laboratory data, imaging data), calls for the development of data fusion techniques for PSNs in order to leverage this rich heterogeneous information. In this article, we review existing methods for integrating multiple biomedical data views to construct PSNs, together with the different patient similarity measures that have been proposed. We also review methods that have appeared in the machine learning literature but have not yet been applied to PSNs, thus providing a resource to navigate the vast machine learning literature existing on this topic. In particular, we focus on methods that could be used to integrate very heterogeneous datasets, including multi-omics data as well as data derived from clinical information and medical imaging.
Affiliation(s)
- Jessica Gliozzo
  - AnacletoLab - Computer Science Department, Universitá degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - European Commission, Joint Research Centre (JRC), Ispra (VA), Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Marco Mesiti
  - AnacletoLab - Computer Science Department, Universitá degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Marco Notaro
  - AnacletoLab - Computer Science Department, Universitá degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Alessandro Petrini
  - AnacletoLab - Computer Science Department, Universitá degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Alex Patak
  - European Commission, Joint Research Centre (JRC), Ispra (VA), Italy
- Alberto Paccanaro
  - Department of Computer Science, Royal Holloway, University of London, Egham, TW20 0EX, UK
  - School of Applied Mathematics (EMAp), Fundação Getúlio Vargas, Rio de Janeiro, Brazil
- Giorgio Valentini
  - AnacletoLab - Computer Science Department, Universitá degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
  - DSRC UNIMI, Data Science Research Center, Milano, 20135, Italy
  - ELLIS, European Laboratory for Learning and Intelligent Systems, Berlin, Germany
- Elena Casiraghi
  - AnacletoLab - Computer Science Department, Universitá degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
11
Gim JA. A Genomic Information Management System for Maintaining Healthy Genomic States and Application of Genomic Big Data in Clinical Research. Int J Mol Sci 2022; 23:5963. [PMID: 35682641 PMCID: PMC9180925 DOI: 10.3390/ijms23115963] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/04/2022] [Revised: 05/22/2022] [Accepted: 05/25/2022] [Indexed: 01/19/2023] Open
Abstract
Improvements in next-generation sequencing (NGS) technology and computer systems have enabled personalized therapies based on genomic information. Recently, health management strategies using genomics and big data have been developed for application in medicine and public health science. In this review, I first discuss the development of a genomic information management system (GIMS) to maintain a highly detailed health record and detect diseases by collecting the genomic information of one individual over time. Maintaining a health record and detecting abnormal genomic states are important; thus, the development of a GIMS is necessary. Based on the current research status, open public data, and databases, I discuss the possibility of a GIMS for clinical use. I also discuss how the analysis of genomic information as big data can be applied for clinical and research purposes. Tremendous volumes of genomic information are being generated, and the development of methods for the collection, cleansing, storing, indexing, and serving must progress under legal regulation. Genetic information is a type of personal information and is covered under privacy protection; here, I examine the regulations on the use of genetic information in different countries. This review provides useful insights for scientists and clinicians who wish to use genomic information for healthy aging and personalized medicine.
Affiliation(s)
- Jeong-An Gim
  - Medical Science Research Center, College of Medicine, Korea University Guro Hospital, Seoul 08308, Korea
12
Personalised Outcomes Forecasts of Supervised Exercise Therapy in Intermittent Claudication: An Application of Neighbours Based Prediction Methods with Routinely Collected Clinical Data. Eur J Vasc Endovasc Surg 2022; 63:594-601. [PMID: 35210160 DOI: 10.1016/j.ejvs.2021.12.040] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/24/2021] [Revised: 12/08/2021] [Accepted: 12/29/2021] [Indexed: 11/21/2022]
Abstract
OBJECTIVE Insights regarding individual patient prognosis may improve exercise therapy by informing patient expectations, promoting exercise adherence, and facilitating tailored care. Therefore, the aim was to develop and evaluate personalised outcomes forecasts for functional claudication distance over six months of supervised exercise therapy for patients with intermittent claudication. METHODS Data of 5 940 patients were eligible for analysis. Neighbours based predictions were generated via an adaptation of predictive mean matching. Data from the nearest 223 matches (a.k.a. neighbours) for an index patient were modelled via Generalised Additive Model for Location Scale and Shape (GAMLSS). The realised outcome measures were then evaluated against the GAMLSS model, and the average bias, coverage, and precision were calculated. Model calibration was analysed via within sample and out of sample analyses. RESULTS Neighbours based predictions demonstrated small average bias (-0.04 standard deviations; ideal = 0) and accurate average coverage (48.7% of realised data within 50% prediction interval; ideal = 50%). Moreover, neighbours based predictions improved prediction precision by 24%, compared with estimates derived from the whole sample. Both within sample and out of sample testing showed predictions to be well calibrated. CONCLUSION Neighbours based prediction is a method for generating accurate personalised outcomes forecasts for patients with intermittent claudication undertaking supervised exercise therapy. Future work should examine the influence of personalised outcomes forecasts on clinical decisions and patient outcomes.
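The idea behind neighbours based prediction can be sketched as follows: pick the reference patients whose model-predicted outcome is closest to that of the index patient, then summarise their observed outcomes as a forecast interval. The study models the neighbours with GAMLSS; empirical quartiles are used here as a much simpler stand-in, and the function names, the value of `k`, and all numbers are hypothetical.

```python
from statistics import quantiles

def neighbours_forecast(index_pred, reference, k):
    """Return (q1, median, q3) of observed outcomes among the k neighbours.

    reference is a list of (predicted, observed) pairs; neighbours are the
    k patients whose predicted value is closest to index_pred.
    """
    nearest = sorted(reference, key=lambda r: abs(r[0] - index_pred))[:k]
    observed = [obs for _, obs in nearest]
    q1, q2, q3 = quantiles(observed, n=4)
    return q1, q2, q3

def coverage(intervals, realised):
    """Share of realised outcomes that fall inside their forecast interval."""
    inside = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, realised))
    return inside / len(realised)

# Hypothetical walking distances in metres: (model prediction, observed).
reference = [(100 + 10 * i, 100 + 10 * i + (5 if i % 2 else -5)) for i in range(20)]
q1, med, q3 = neighbours_forecast(150, reference, k=5)
share_inside = coverage([(0, 10), (5, 6)], [5, 7])
```

For a well-calibrated 50% interval such as (q1, q3), `coverage` over many patients should be close to 0.5, which is the check the abstract reports as 48.7%.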
13
Wang N, Wang M, Zhou Y, Liu H, Wei L, Fei X, Chen H. Sequential Data-Based Patient Similarity Framework for Patient Outcome Prediction: Algorithm Development. J Med Internet Res 2022; 24:e30720. [PMID: 34989682 PMCID: PMC8778569 DOI: 10.2196/30720] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/26/2021] [Revised: 10/08/2021] [Accepted: 11/08/2021] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Sequential information in electronic medical records is valuable and helpful for patient outcome prediction but is rarely used for patient similarity measurement because of its unevenness, irregularity, and heterogeneity. OBJECTIVE We aimed to develop a patient similarity framework for patient outcome prediction that makes use of sequential and cross-sectional information in electronic medical record systems. METHODS Sequence similarity was calculated from timestamped event sequences using edit distance, and trend similarity was calculated from time series using dynamic time warping and Haar decomposition. We also extracted cross-sectional information, namely, demographic, laboratory test, and radiological report data, for additional similarity calculations. We validated the effectiveness of the framework by constructing k-nearest neighbors classifiers to predict mortality and readmission for acute myocardial infarction patients, using data from (1) a public data set and (2) a private data set, at 3 time points (at admission, on Day 7, and at discharge) to provide early warning of patient outcomes. We also constructed state-of-the-art Euclidean-distance k-nearest neighbor, logistic regression, random forest, long short-term memory network, and recurrent neural network models, which were used for comparison. RESULTS With all available information during a hospitalization episode, predictive models using the similarity model outperformed baseline models based on both public and private data sets. For mortality predictions, all models except for the logistic regression model showed improved performances over time. There were no such increasing trends in predictive performances for readmission predictions. The random forest and logistic regression models performed best for mortality and readmission predictions, respectively, when using information from the first week after admission.
CONCLUSIONS For patient outcome predictions, the patient similarity framework facilitated sequential similarity calculations for uneven electronic medical record data and helped improve predictive performance.
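The two sequence measures named in this abstract, edit distance on event sequences and dynamic time warping on time series, can be sketched in a few lines. This is a textbook illustration under stated assumptions rather than the framework's actual implementation: unit edit costs, absolute difference as the local DTW cost, and invented event names and values.

```python
def edit_distance(a, b):
    """Levenshtein distance between two event sequences (unit costs assumed)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def dtw_distance(s, t):
    """Dynamic time warping distance between two numeric series.

    Assumption: the local cost is the absolute difference of the values.
    """
    inf = float("inf")
    n, m = len(s), len(t)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# Hypothetical event sequences and laboratory trends for two patients.
events_a = ["admit", "ecg", "pci"]
events_b = ["admit", "pci"]
seq_dist = edit_distance(events_a, events_b)
trend_dist = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

Smaller distances mean more similar patients; a k-nearest neighbors classifier would then rank candidates by a combination of such distances.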
Affiliation(s)
- Ni Wang
- School of Biomedical Engineering, Capital Medical University, Beijing, China; Beijing Advanced Innovation Center for Big Data-based Precision Medicine, Capital Medical University, Beijing, China
- Muyu Wang
- School of Biomedical Engineering, Capital Medical University, Beijing, China; Beijing Advanced Innovation Center for Big Data-based Precision Medicine, Capital Medical University, Beijing, China
- Yang Zhou
- Department of Epidemiology and Biostatistics, Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing, China
- Honglei Liu
- School of Biomedical Engineering, Capital Medical University, Beijing, China; Beijing Advanced Innovation Center for Big Data-based Precision Medicine, Capital Medical University, Beijing, China
- Lan Wei
- Information Center, Xuanwu Hospital, Capital Medical University, Beijing, China
- Xiaolu Fei
- Information Center, Xuanwu Hospital, Capital Medical University, Beijing, China
- Hui Chen
- School of Biomedical Engineering, Capital Medical University, Beijing, China; Beijing Advanced Innovation Center for Big Data-based Precision Medicine, Capital Medical University, Beijing, China
|
14
|
Oh SH, Back S, Park J. Measuring Patient Similarity on Multiple Diseases by Joint Learning via a Convolutional Neural Network. SENSORS (BASEL, SWITZERLAND) 2021; 22:131. [PMID: 35009673 PMCID: PMC8749530 DOI: 10.3390/s22010131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2021] [Revised: 12/22/2021] [Accepted: 12/23/2021] [Indexed: 06/14/2023]
Abstract
Patient similarity research is one of the most fundamental tasks in healthcare, helping to make decisions without incurring additional time and costs in clinical practices. Patient similarity can also apply to various medical fields, such as cohort analysis and personalized treatment recommendations. Because of this importance, patient similarity measurement studies are actively being conducted. However, medical data have complex, irregular, and sequential characteristics, making it challenging to measure similarity. Therefore, measuring accurate similarity is a significant problem. Existing similarity measurement studies use supervised learning to calculate the similarity between patients, with similarity measurement studies conducted only on one specific disease. However, it is not realistic to consider only one kind of disease, because other conditions usually accompany it; a study to measure similarity with multiple diseases is needed. This research proposes a convolution neural network-based model that jointly combines feature learning and similarity learning to define similarity in patients with multiple diseases. We used the cohort data from the National Health Insurance Sharing Service of Korea for the experiment. Experimental results verify that the proposed model has outstanding performance when compared to other existing models for measuring multiple-disease patient similarity.
Affiliation(s)
- Sang Ho Oh
- Research Center of Electrical and Information Technology, Seoul National University of Science and Technology, Seoul 01811, Korea
- Seunghwa Back
- Department of Industrial Engineering, Yonsei University, Seoul 03722, Korea
- Jongyoul Park
- Research Center of Electrical and Information Technology, Seoul National University of Science and Technology, Seoul 01811, Korea
- Department of Applied Artificial Intelligence, Seoul National University of Science and Technology, Seoul 01811, Korea
|
15
|
Huang HZ, Lu XD, Guo W, Jiang XB, Yan ZM, Wang SP. Heterogeneous Information Network-Based Patient Similarity Search. Front Cell Dev Biol 2021; 9:735687. [PMID: 34568345 PMCID: PMC8456037 DOI: 10.3389/fcell.2021.735687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2021] [Accepted: 07/30/2021] [Indexed: 11/13/2022] Open
Abstract
Patient similarity search is a fundamental and important task in artificial intelligence-assisted medicine service, which is beneficial to medical diagnosis, such as making accurate predictions for similar diseases and recommending personalized treatment plans. Existing patient similarity search methods retrieve medical events associated with patients from Electronic Health Record (EHR) data and map them to vectors. The similarity between patients is expressed by calculating the similarity or dissimilarity between the corresponding vectors of medical events, thereby completing the patient similarity measurement. However, the obtained vectors tend to be high dimensional and sparse, which makes it hard to calculate patient similarity accurately. In addition, most existing methods cannot capture the time information in the EHR, which is not conducive to analyzing the influence of time factors on patient similarity search. To solve these problems, we propose a patient similarity search method based on a heterogeneous information network. On the one hand, the proposed method uses a heterogeneous information network to connect patients, diseases, and drugs, which solves the problem of vector representation of mixed information related to patients, diseases, and drugs. Meanwhile, our method measures the similarity between patients by calculating the similarity between nodes in the heterogeneous information network. In this way, the challenges caused by high-dimensional and sparse vectors can be addressed. On the other hand, the proposed method encodes time information into an annotated heterogeneous information network, which solves the problem of inaccurate patient similarity search caused by ignoring time information in the patient similarity measurement process. Experiments show that our method is better than the compared baseline methods.
Affiliation(s)
- Hao-Zhe Huang
- School of Software, Shandong University, Jinan, China
- Xu-Dong Lu
- School of Software, Shandong University, Jinan, China
- Wei Guo
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, China
- Xin-Bo Jiang
- School of Software, Shandong University, Jinan, China; Shandong Provincial Key Laboratory of Software Engineering, Shandong University, Jinan, China
- Zhong-Min Yan
- School of Software, Shandong University, Jinan, China
- Shi-Peng Wang
- School of Software, Shandong University, Jinan, China
|
16
|
Wang N, Huang Y, Liu H, Zhang Z, Wei L, Fei X, Chen H. Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records. BMC Med Inform Decis Mak 2021; 21:58. [PMID: 34330261 PMCID: PMC8323210 DOI: 10.1186/s12911-021-01432-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/21/2021] [Accepted: 02/09/2021] [Indexed: 12/24/2022] Open
Abstract
Background A new learning-based patient similarity measurement was proposed to measure patients' similarity for heterogeneous electronic medical records (EMRs) data. Methods We first calculated feature-level similarities according to the features' attributes. A domain expert provided patient similarity scores of 30 randomly selected patients. These similarity scores and feature-level similarities for 30 patients comprised the labeled sample set, which was used for the semi-supervised learning algorithm to learn the patient-level similarities for all patients. Then we used the k-nearest neighbor (kNN) classifier to predict four liver conditions. The predictive performances were compared in four different situations. We also compared the performances between personalized kNN models and other machine learning models. We assessed the predictive performances by the area under the receiver operating characteristic curve (AUC), F1-score, and cross-entropy (CE) loss. Results As the size of the random training samples increased, the kNN models using the learned patient similarity to select near neighbors consistently outperformed those using the Euclidean distance to select near neighbors (all P values < 0.001). The kNN models using the learned patient similarity to identify the top k nearest neighbors from the random training samples also had a higher best performance (AUC: 0.95 vs. 0.89, F1-score: 0.84 vs. 0.67, and CE loss: 1.22 vs. 1.82) than those using the Euclidean distance. As the size of the similar training samples (the most similar samples as determined by the learned patient similarity) increased, the performance of kNN models using the simple Euclidean distance to select the near neighbors degraded gradually. When the roles of the Euclidean distance and the learned patient similarity in selecting the near neighbors and similar training samples were exchanged, the performance of the kNN models gradually increased. These two kinds of kNN models had the same best performance of AUC 0.95, F1-score 0.84, and CE loss 1.22. Among the four reference models, the highest AUC and F1-score were 0.94 and 0.80, respectively, which were both lower than those for the simple and similarity-based kNN models. Conclusions This learning-based method opened an opportunity for similarity measurement based on heterogeneous EMR data and supported the secondary use of EMR data.
Affiliation(s)
- Ni Wang
- School of Biomedical Engineering, Capital Medical University, No.10, Xitoutiao, You An Men, Fengtai District, Beijing, 100069, People's Republic of China; Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, 100069, People's Republic of China
- Yanqun Huang
- School of Biomedical Engineering, Capital Medical University, No.10, Xitoutiao, You An Men, Fengtai District, Beijing, 100069, People's Republic of China; Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, 100069, People's Republic of China
- Honglei Liu
- School of Biomedical Engineering, Capital Medical University, No.10, Xitoutiao, You An Men, Fengtai District, Beijing, 100069, People's Republic of China; Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, 100069, People's Republic of China
- Zhiqiang Zhang
- School of Biomedical Engineering, Capital Medical University, No.10, Xitoutiao, You An Men, Fengtai District, Beijing, 100069, People's Republic of China; Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, 100069, People's Republic of China
- Lan Wei
- Information Center, Xuanwu Hospital, Capital Medical University, Beijing, 100053, People's Republic of China
- Xiaolu Fei
- Information Center, Xuanwu Hospital, Capital Medical University, Beijing, 100053, People's Republic of China
- Hui Chen
- School of Biomedical Engineering, Capital Medical University, No.10, Xitoutiao, You An Men, Fengtai District, Beijing, 100069, People's Republic of China; Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, 100069, People's Republic of China
|
17
|
Oei RW, Fang HSA, Tan WY, Hsu W, Lee ML, Tan NC. Using Domain Knowledge and Data-Driven Insights for Patient Similarity Analytics. J Pers Med 2021; 11:jpm11080699. [PMID: 34442343 PMCID: PMC8398126 DOI: 10.3390/jpm11080699] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/15/2021] [Revised: 07/15/2021] [Accepted: 07/21/2021] [Indexed: 12/23/2022] Open
Abstract
Patient similarity analytics has emerged as an essential tool to identify cohorts of patients who have similar clinical characteristics to some specific patient of interest. In this study, we propose a patient similarity measure called D3K that incorporates domain knowledge and data-driven insights. Using the electronic health records (EHRs) of 169,434 patients with either diabetes, hypertension or dyslipidaemia (DHL), we construct patient feature vectors containing demographics, vital signs, laboratory test results, and prescribed medications. We discretize the variables of interest into various bins based on domain knowledge and align the patient similarity computation with clinical guidelines. Key findings from this study are: (1) D3K outperforms baseline approaches in all seven sub-cohorts; (2) our domain knowledge-based binning strategy outperforms the traditional percentile-based binning in all seven sub-cohorts; (3) there is substantial agreement between D3K and physicians (κ = 0.746), indicating that D3K can be applied to facilitate shared decision making. This is the first study to use patient similarity analytics on a cardiometabolic syndrome-related dataset sourced from medical institutions in Singapore. We consider patient similarity among patient cohorts with the same medical conditions to develop localized models for personalized decision support to improve the outcomes of a target patient.
Affiliation(s)
- Ronald Wihal Oei
- Institute of Data Science, National University of Singapore, Singapore 117602, Singapore
- Hao Sen Andrew Fang
- SingHealth Polyclinics, SingHealth, Singapore 150167, Singapore
- Wei-Ying Tan
- Institute of Data Science, National University of Singapore, Singapore 117602, Singapore
- Wynne Hsu
- Institute of Data Science, National University of Singapore, Singapore 117602, Singapore
- School of Computing, National University of Singapore, Singapore 117417, Singapore
- Mong-Li Lee
- Institute of Data Science, National University of Singapore, Singapore 117602, Singapore
- School of Computing, National University of Singapore, Singapore 117417, Singapore
- Ngiap-Chuan Tan
- SingHealth Polyclinics, SingHealth, Singapore 150167, Singapore
|
18
|
Xu D, Sheng JQ, Hu PJH, Huang TS, Hsu CC. A Deep Learning-Based Unsupervised Method to Impute Missing Values in Patient Records for Improved Management of Cardiovascular Patients. IEEE J Biomed Health Inform 2021; 25:2260-2272. [PMID: 33095720 DOI: 10.1109/jbhi.2020.3033323] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/07/2022]
Abstract
Physicians increasingly depend on electronic health records (EHRs) to manage their patients. However, many patient records have substantial missing values that pose a fundamental challenge to their clinical use. To address this prevailing challenge, we propose an unsupervised deep learning-based method that can facilitate physicians' use of EHRs to improve their management of cardiovascular patients. By building on the deep autoencoder framework, we develop a novel method to impute missing values in patient records. To demonstrate its clinical applicability and values, we use data from cardiovascular patients and evaluate the proposed method's imputation effectiveness and predictive efficacy, in comparison with six prevalent benchmark techniques. The proposed method can impute missing values and predict important patient outcomes more effectively than all the benchmark techniques. This study reinforces the importance of adequately addressing missing values in patient records. It further illustrates how effective imputations can enable greater predictive efficacy with regard to important patient outcomes, which are crucial to the use of EHRs and health analytics for improved patient management. Supported by the complete data imputed by the proposed method, physicians can make timely patient outcome estimations (predictions) and therapeutic treatment assessments.
|
19
|
Yong Z, Luo L, Gu Y, Li C. Implication of excessive length of stay of asthma patient with heterogenous status attributed to air pollution. JOURNAL OF ENVIRONMENTAL HEALTH SCIENCE & ENGINEERING 2021; 19:95-106. [PMID: 34150221 PMCID: PMC8172679 DOI: 10.1007/s40201-020-00584-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/03/2020] [Accepted: 11/05/2020] [Indexed: 02/08/2023]
Abstract
OBJECTIVE Air pollution poses a potential risk to asthma patients and may further prolong their length of stay. However, the impact of air pollution on the excessive length of stay (ELoS) of heterogeneous asthma patients is unclear. In this study, we proposed a K-Nearest Neighbor (KNN) embedded approach incorporating patient status to analyze the impact of short-term air pollution on the ELoS of asthma patients. METHODS The KNN embedded approach includes two stages. First, the KNN algorithm was employed to search for the most similar patient community and approximate kernel proxy of each index patient by Euclidean distance. Then, we built the differential fixed-effect linear model to estimate the risk of air pollution to the ELoS. RESULTS We analyzed 6563 asthma patients' medical insurance records in a large city of China from January to December in 2014. It was found that when the duration of exposure to air pollution (i.e., PM2.5, PM10, SO2, NO2, and CO) reaches around 4-5 days, the risk of increasing the ELoS becomes the largest. Only O3 shows the opposite effect. Moreover, CO is the dominant risk factor for increasing the ELoS. With a 1 mg/m3 increment of CO average concentration in 5 days, the ELoS will go up by 0.8157 day (95% CI: 0.72, 0.9114). Based on the kernel proxy in the top 1% similar patient community, the additional financial burden posed on each patient increases by RMB 488.6002 (95% CI: 430.1962, 547.0043) due to the ELoS. CONCLUSIONS The KNN embedded approach is an innovative method that takes into account the heterogeneous patient status, and effectively estimates the impact of air pollution on the ELoS. It is concluded that air pollution poses adverse effects and additional financial burdens on asthma patients. Heterogeneous patients should adopt different strategies in health management to reduce the risk of increasing the ELoS due to air pollution, and improve the efficiency of medical resource utilization. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1007/s40201-020-00584-8.
Affiliation(s)
- Zhilin Yong
- Business School, Sichuan University, Chengdu, Sichuan 610065 People’s Republic of China
- Li Luo
- Business School, Sichuan University, Chengdu, Sichuan 610065 People’s Republic of China
- Yonghong Gu
- West China Hospital, Sichuan University, Guo Xue Xiang No. 37, Chengdu, Sichuan 610041 People’s Republic of China
- Chunyang Li
- West China Hospital, Sichuan University, Guo Xue Xiang No. 37, Chengdu, Sichuan 610041 People’s Republic of China
|
20
|
Seligson ND, Warner JL, Dalton WS, Martin D, Miller RS, Patt D, Kehl KL, Palchuk MB, Alterovitz G, Wiley LK, Huang M, Shen F, Wang Y, Nguyen KA, Wong AF, Meric-Bernstam F, Bernstam EV, Chen JL. Recommendations for patient similarity classes: results of the AMIA 2019 workshop on defining patient similarity. J Am Med Inform Assoc 2021; 27:1808-1812. [PMID: 32885823 PMCID: PMC7671612 DOI: 10.1093/jamia/ocaa159] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/15/2020] [Revised: 06/19/2020] [Accepted: 07/24/2020] [Indexed: 12/14/2022] Open
Abstract
Defining patient-to-patient similarity is essential for the development of precision medicine in clinical care and research. Conceptually, the identification of similar patient cohorts appears straightforward; however, universally accepted definitions remain elusive. Simultaneously, an explosion of vendors and published algorithms have emerged and all provide varied levels of functionality in identifying patient similarity categories. To provide clarity and a common framework for patient similarity, a workshop at the American Medical Informatics Association 2019 Annual Meeting was convened. This workshop included invited discussants from academics, the biotechnology industry, the FDA, and private practice oncology groups. Drawing from a broad range of backgrounds, workshop participants were able to coalesce around 4 major patient similarity classes: (1) feature, (2) outcome, (3) exposure, and (4) mixed-class. This perspective expands into these 4 subtypes more critically and offers the medical informatics community a means of communicating their work on this important topic.
Affiliation(s)
- Nathan D Seligson
- University of Florida, Jacksonville, Florida, USA; Nemours Children's Specialty Care, Jacksonville, Florida, USA
- William S Dalton
- M2Gen, Tampa, Florida, USA; H. Lee Moffitt Cancer Center, Tampa, Florida, USA
- David Martin
- United States Food and Drug Administration, Silver Spring, Maryland, USA
- Robert S Miller
- American Society of Clinical Oncology, Alexandria, Virginia, USA
- Kenneth L Kehl
- Dana-Farber Cancer Institute, Boston, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA
- Matvey B Palchuk
- Harvard Medical School, Boston, Massachusetts, USA; TriNetX, Cambridge, Massachusetts, USA
- Gil Alterovitz
- Harvard Medical School, Boston, Massachusetts, USA; Boston Children's Hospital, Boston, Massachusetts, USA
- Laura K Wiley
- University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Anthony F Wong
- Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, Illinois, USA
- Elmer V Bernstam
- The University of Texas Health Science Center at Houston, Texas, USA
|
21
|
Sisk R, Lin L, Sperrin M, Barrett JK, Tom B, Diaz-Ordaz K, Peek N, Martin GP. Informative presence and observation in routine health data: A review of methodology for clinical risk prediction. J Am Med Inform Assoc 2021; 28:155-166. [PMID: 33164082 PMCID: PMC7810439 DOI: 10.1093/jamia/ocaa242] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/26/2020] [Accepted: 09/17/2020] [Indexed: 12/20/2022] Open
Abstract
Objective Informative presence (IP) is the phenomenon whereby the presence or absence of patient data is potentially informative with respect to their health condition, with informative observation (IO) being the longitudinal equivalent. These phenomena predominantly exist within routinely collected healthcare data, in which data collection is driven by the clinical requirements of patients and clinicians. The extent to which IP and IO are considered when using such data to develop clinical prediction models (CPMs) is unknown, as is the existing methodology aiming at handling these issues. This review aims to synthesize such existing methodology, thereby helping identify an agenda for future methodological work. Materials and Methods A systematic literature search was conducted by 2 independent reviewers using prespecified keywords. Results Thirty-six articles were included. We categorized the methods presented within as derived predictors (including some representation of the measurement process as a predictor in the model), modeling under IP, and latent structures. Including missing indicators or summary measures as predictors is the most commonly presented approach amongst the included studies (24 of 36 articles). Discussion This is the first review to collate the literature in this area under a prediction framework. A considerable body of relevant literature exists, and we present ways in which the described methods could be developed further. Guidance is required for specifying the conditions under which each method should be used to enable applied prediction modelers to use these methods. Conclusions A growing recognition of IP and IO exists within the literature, and methodology is increasingly becoming available to leverage these phenomena for prediction purposes. IP and IO should be approached differently in a prediction context than when the primary goal is explanation. The work included in this review has demonstrated theoretical and empirical benefits of incorporating IP and IO, and therefore we recommend that applied health researchers consider incorporating these methods in their work.
Affiliation(s)
- Rose Sisk
- Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom
- Lijing Lin
- Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom
- Matthew Sperrin
- Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom
- Jessica K Barrett
- MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom; Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Brian Tom
- MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
- Karla Diaz-Ordaz
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Niels Peek
- Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom; NIHR Biomedical Research Centre, Manchester Academic Health Science Centre, University of Manchester, Manchester, United Kingdom; Alan Turing Institute, University of Manchester, London, United Kingdom
- Glen P Martin
- Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom
|
22
|
Personalized treatment options for chronic diseases using precision cohort analytics. Sci Rep 2021; 11:1139. [PMID: 33441956 PMCID: PMC7806725 DOI: 10.1038/s41598-021-80967-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/21/2020] [Accepted: 12/31/2020] [Indexed: 12/15/2022] Open
Abstract
To support point-of-care decision making by presenting outcomes of past treatment choices for cohorts of similar patients based on observational data from electronic health records (EHRs), a machine-learning precision cohort treatment option (PCTO) workflow consisting of (1) data extraction, (2) similarity model training, (3) precision cohort identification, and (4) treatment options analysis was developed. The similarity model is used to dynamically create a cohort of similar patients, to inform clinical decisions about an individual patient. The workflow was implemented using EHR data from a large health care provider for three different highly prevalent chronic diseases: hypertension (HTN), type 2 diabetes mellitus (T2DM), and hyperlipidemia (HL). A retrospective analysis demonstrated that treatment options with better outcomes were available for a majority of cases (75%, 74%, 85% for HTN, T2DM, HL, respectively). The models for HTN and T2DM were deployed in a pilot study with primary care physicians using it during clinic visits. A novel data-analytic workflow was developed to create patient-similarity models that dynamically generate personalized treatment insights at the point-of-care. By leveraging both knowledge-driven treatment guidelines and data-driven EHR data, physicians can incorporate real-world evidence in their medical decision-making process when considering treatment options for individual patients.
|
23
|
Feng Y, Wang Y, Zeng C, Mao H. Artificial Intelligence and Machine Learning in Chronic Airway Diseases: Focus on Asthma and Chronic Obstructive Pulmonary Disease. Int J Med Sci 2021; 18:2871-2889. [PMID: 34220314 PMCID: PMC8241767 DOI: 10.7150/ijms.58191] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/14/2021] [Accepted: 05/20/2021] [Indexed: 02/05/2023] Open
Abstract
Chronic airway diseases are characterized by airway inflammation, obstruction, and remodeling and show high prevalence, especially in developing countries. Among them, asthma and chronic obstructive pulmonary disease (COPD) show the highest morbidity and socioeconomic burden worldwide. Although there are extensive guidelines for the prevention, early diagnosis, and rational treatment of these lifelong diseases, their value in precision medicine is very limited. Artificial intelligence (AI) and machine learning (ML) techniques have emerged as effective methods for mining and integrating large-scale, heterogeneous medical data for clinical practice, and several AI and ML methods have recently been applied to asthma and COPD. However, very few methods have significantly contributed to clinical practice. Here, we review four aspects of AI and ML implementation in asthma and COPD to summarize existing knowledge and indicate future steps required for the safe and effective application of AI and ML tools by clinicians.
Affiliation(s)
- Yinhe Feng
- Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China; Department of Respiratory and Critical Care Medicine, People's Hospital of Deyang City, Affiliated Hospital of Chengdu College of Medicine, Deyang, Sichuan Province, China
- Yubin Wang
- Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China
- Chunfang Zeng
- Department of Respiratory and Critical Care Medicine, People's Hospital of Deyang City, Affiliated Hospital of Chengdu College of Medicine, Deyang, Sichuan Province, China
- Hui Mao
- Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China
24
Lopez Pineda A, Pourshafeie A, Ioannidis A, Leibold CM, Chan AL, Bustamante CD, Frankovich J, Wojcik GL. Discovering prescription patterns in pediatric acute-onset neuropsychiatric syndrome patients. J Biomed Inform 2020; 113:103664. [PMID: 33359113 DOI: 10.1016/j.jbi.2020.103664] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/29/2020] [Revised: 10/28/2020] [Accepted: 12/10/2020] [Indexed: 11/28/2022]
Abstract
OBJECTIVE Pediatric acute-onset neuropsychiatric syndrome (PANS) is a complex neuropsychiatric syndrome characterized by an abrupt onset of obsessive-compulsive symptoms and/or severe eating restrictions, along with at least two concomitant debilitating cognitive, behavioral, or neurological symptoms. A wide range of pharmacological interventions, along with behavioral and environmental modifications and psychotherapies, have been adopted to treat symptoms and underlying etiologies. Our goal was to develop a data-driven approach to identify treatment patterns in this cohort. MATERIALS AND METHODS In this cohort study, we extracted medical prescription histories from electronic health records. We developed a modified dynamic programming approach to perform global alignment of those medication histories. Our approach is unique since it considers time gaps in prescription patterns as part of the similarity strategy. RESULTS This study included 43 consecutive new-onset pre-pubertal patients who had at least 3 clinic visits. Our algorithm identified six clusters with distinct medication usage histories, which may represent clinicians' practice of treating PANS of different severities and etiologies, i.e., two most severe groups requiring high-dose intravenous steroids; two arthritic or inflammatory groups requiring prolonged nonsteroidal anti-inflammatory drug (NSAID) treatment; and two mild relapsing/remitting groups treated with a short course of NSAID. The psychometric scores as outcomes in each cluster generally improved within the first two years. DISCUSSION AND CONCLUSION Our algorithm shows potential to improve our knowledge of treatment patterns in the PANS cohort, while helping clinicians understand how patients respond to a combination of drugs.
Affiliation(s)
- Arturo Lopez Pineda
- Department of Biomedical Data Science, Stanford University, CA, USA; Department of Data Science, Amphora Health, Morelia, Mexico
- Armin Pourshafeie
- Department of Biomedical Data Science, Stanford University, CA, USA; Department of Physics, Stanford University, CA, USA
- Collin McCloskey Leibold
- Department of Pediatrics, Division of Allergy, Immunology, and Rheumatology, Stanford University, CA, USA; Department of Medicine, University of Massachusetts Medical School, Worcester, MA, USA
- Avis L Chan
- Department of Pediatrics, Division of Allergy, Immunology, and Rheumatology, Stanford University, CA, USA
- Carlos D Bustamante
- Department of Biomedical Data Science, Stanford University, CA, USA; Department of Genetics, Stanford University, CA, USA; Chan Zuckerberg Biohub, San Francisco, CA, USA.
- Jennifer Frankovich
- Department of Pediatrics, Division of Allergy, Immunology, and Rheumatology, Stanford University, CA, USA.
- Genevieve L Wojcik
- Department of Biomedical Data Science, Stanford University, CA, USA; Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA.
25
Sharafoddini A, Dubin JA, Lee J. Identifying subpopulations of septic patients: A temporal data-driven approach. Comput Biol Med 2020; 130:104182. [PMID: 33370712 DOI: 10.1016/j.compbiomed.2020.104182] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/26/2020] [Revised: 12/14/2020] [Accepted: 12/14/2020] [Indexed: 01/31/2023]
Abstract
Sepsis is one of the deadliest diseases in North America and in spite of the vast amount of research on this topic there is still uncertainty in the outcome of sepsis treatments. This study aimed at investigating the informativeness of temporal electronic health records (EHR) in stratifying septic patients and identifying subpopulations of septic patients with similar trajectories and clinical needs. We performed hierarchical clustering and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) analyses using data from septic patients in the MIMIC III intensive care unit database. The t-Distributed Stochastic Neighbor Embedding (t-SNE) method was utilized to map patients to a two-dimensional space. We utilized silhouette index and cluster-wise stability assessment by resampling to investigate the validity of the clusters. The hierarchical clustering with Euclidean metric identified twelve clinically recognizable subgroups that demonstrated different characteristics in spite of sharing common conditions. Our results demonstrated that data-driven approaches can help in customizing care platforms for septic patients by identifying similar clinically relevant groups.
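The silhouette index used above for cluster validation can be illustrated with a short, dependency-free sketch. It assumes Euclidean distance and at least two clusters; the function name and the toy points are hypothetical and unrelated to the MIMIC III data.

```python
from math import dist


def silhouette_scores(points, labels):
    """Silhouette value s(i) = (b - a) / max(a, b) for each point.

    Assumes at least two distinct cluster labels.
    """
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself does not change the sum)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores


# Two well-separated toy clusters.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
scores = silhouette_scores(points, [0, 0, 1, 1])
```

Values near 1 indicate points that sit well inside their own cluster, while negative values suggest a point is closer to another cluster than to its own.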
Affiliation(s)
- Anis Sharafoddini
- School of Public Health and Health Systems, University of Waterloo, 200 University Ave. West, Waterloo, ON, N2L 3G1, Canada.
- Joel A Dubin
- School of Public Health and Health Systems, University of Waterloo, 200 University Ave. West, Waterloo, ON, N2L 3G1, Canada; Department of Statistics and Actuarial Science, University of Waterloo, 200 University Ave. West, Waterloo, ON, N2L 3G1, Canada.
- Joon Lee
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, 3280 Hospital Dr NW, Calgary, AB, T2N 4Z6, Canada; Department of Cardiac Sciences, Cumming School of Medicine, University of Calgary, 3280 Hospital Dr NW, Calgary, AB, T2N 4Z6, Canada.
26
Saad M, Lee IH. Leveraging hybrid biomarkers in clinical endpoint prediction. BMC Med Inform Decis Mak 2020; 20:255. [PMID: 33028301 PMCID: PMC7538849 DOI: 10.1186/s12911-020-01262-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/19/2019] [Accepted: 09/15/2020] [Indexed: 11/20/2022] Open
Abstract
Background Clinical endpoint prediction remains challenging for health providers. Although predictors such as age, gender, and disease staging are of considerable predictive value, the accuracy often ranges between 60 and 80%. An accurate prognosis assessment is required for making effective clinical decisions. Methods We proposed an extended prognostic model based on clinical covariates with adjustment for additional variables that were radiographically induced, termed imaging biomarkers. Eight imaging biomarkers were introduced and investigated in a cohort of 68 non-small cell lung cancer subjects with tumor internal characteristics. The subjects comprised 40 males and 28 females with a mean age of 68.7 years. The imaging biomarkers were used to quantify the solid and non-solid components of a tumor. The extended model comprises additional frameworks that correlate these markers to the survival endpoints through uni- and multi-variable analysis to determine the most informative predictors, before combining them with existing clinical predictors. Performance was compared between traditional and extended approaches using Receiver Operating Characteristic (ROC) curves, Area under the ROC curves (AUC), Kaplan-Meier (KM) curves, Cox Proportional Hazard, and log-rank tests (p-value). Results The proposed hybrid model exhibited an impressive boosting pattern over the traditional approach of prognostic modelling in the survival prediction (AUC ranging from 77 to 97%). Four developed imaging markers were found to be significant in distinguishing between subjects having more and less dense components (P = 0.002–0.006). The correlation to survival analysis revealed that patients with a denser composition of tumor (solid dominant) lived 1.6–2.2 years longer (mean survival) and 0.5–2.0 years longer (median survival) than those with a less dense composition (non-solid dominant). 
Conclusion The present study provides crucial evidence that there is an added value for incorporating additional image-based predictors while predicting clinical endpoints. Though the hypotheses were confirmed in a customized case study, we believe the proposed model is easily adapted to various clinical cases, such as predictions of complications, treatment response, and disease evolution.
Affiliation(s)
- Maliazurina Saad
- University of Illinois at Urbana-Champaign, 1406 W. Green St, Urbana, IL, 61801, USA; Korea Polytechnic University, 237 Sangidaehak-ro, Siheung-si, Gyeonggi-do, 15073, South Korea
- Ik Hyun Lee
- Korea Polytechnic University, 237 Sangidaehak-ro, Siheung-si, Gyeonggi-do, 15073, South Korea.
27
Hendrickx JO, van Gastel J, Leysen H, Martin B, Maudsley S. High-dimensionality Data Analysis of Pharmacological Systems Associated with Complex Diseases. Pharmacol Rev 2020; 72:191-217. [PMID: 31843941 DOI: 10.1124/pr.119.017921] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/11/2022] Open
Abstract
It is widely accepted that molecular reductionist views of highly complex human physiologic activity, e.g., the aging process, as well as therapeutic drug efficacy are largely oversimplifications. Currently some of the most effective appreciation of biologic disease and drug response complexity is achieved using high-dimensionality (H-D) data streams from transcriptomic, proteomic, metabolomics, or epigenomic pipelines. Multiple H-D data sets are now common and freely accessible for complex diseases such as metabolic syndrome, cardiovascular disease, and neurodegenerative conditions such as Alzheimer's disease. Over the last decade our ability to interrogate these high-dimensionality data streams has been profoundly enhanced through the development and implementation of highly effective bioinformatic platforms. Employing these computational approaches to understand the complexity of age-related diseases provides a facile mechanism to then synergize this pathologic appreciation with a similar level of understanding of therapeutic-mediated signaling. For informative pathology and drug-based analytics that are able to generate meaningful therapeutic insight across diverse data streams, novel informatics processes such as latent semantic indexing and topological data analyses will likely be important. Elucidation of H-D molecular disease signatures from diverse data streams will likely generate and refine new therapeutic strategies that will be designed with a cognizance of a realistic appreciation of the complexity of human age-related disease and drug effects. We contend that informatic platforms should be synergistic with more advanced chemical/drug and phenotypic cellular/tissue-based analytical predictive models to assist in either de novo drug prioritization or effective repurposing for the intervention of aging-related diseases. SIGNIFICANCE STATEMENT: All diseases, as well as pharmacological mechanisms, are far more complex than previously thought a decade ago. 
With the advent of commonplace access to technologies that produce large volumes of high-dimensionality data (e.g., transcriptomics, proteomics, metabolomics), it is now imperative that effective tools to appreciate this highly nuanced data are developed. Being able to appreciate the subtleties of high-dimensionality data will allow molecular pharmacologists to develop the most effective multidimensional therapeutics with effectively engineered efficacy profiles.
Affiliation(s)
- Jhana O Hendrickx
- Receptor Biology Laboratory, Department of Biomedical Research (J.O.H., J.v.G., H.L., S.M.) and Faculty of Pharmacy, Biomedical and Veterinary Sciences (J.O.H., J.v.G., H.L., B.M., S.M.), University of Antwerp, Antwerp, Belgium
- Jaana van Gastel
- Receptor Biology Laboratory, Department of Biomedical Research (J.O.H., J.v.G., H.L., S.M.) and Faculty of Pharmacy, Biomedical and Veterinary Sciences (J.O.H., J.v.G., H.L., B.M., S.M.), University of Antwerp, Antwerp, Belgium
- Hanne Leysen
- Receptor Biology Laboratory, Department of Biomedical Research (J.O.H., J.v.G., H.L., S.M.) and Faculty of Pharmacy, Biomedical and Veterinary Sciences (J.O.H., J.v.G., H.L., B.M., S.M.), University of Antwerp, Antwerp, Belgium
- Bronwen Martin
- Receptor Biology Laboratory, Department of Biomedical Research (J.O.H., J.v.G., H.L., S.M.) and Faculty of Pharmacy, Biomedical and Veterinary Sciences (J.O.H., J.v.G., H.L., B.M., S.M.), University of Antwerp, Antwerp, Belgium
- Stuart Maudsley
- Receptor Biology Laboratory, Department of Biomedical Research (J.O.H., J.v.G., H.L., S.M.) and Faculty of Pharmacy, Biomedical and Veterinary Sciences (J.O.H., J.v.G., H.L., B.M., S.M.), University of Antwerp, Antwerp, Belgium
28
Hier DB, Kopel J, Brint SU, Wunsch DC, Olbricht GR, Azizi S, Allen B. Evaluation of standard and semantically-augmented distance metrics for neurology patients. BMC Med Inform Decis Mak 2020; 20:203. [PMID: 32843023 PMCID: PMC7448345 DOI: 10.1186/s12911-020-01217-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/27/2020] [Accepted: 08/12/2020] [Indexed: 12/23/2022] Open
Abstract
Background Patient distances can be calculated based on signs and symptoms derived from an ontological hierarchy. There is controversy as to whether patient distance metrics that consider the semantic similarity between concepts can outperform standard patient distance metrics that are agnostic to concept similarity. The choice of distance metric can dominate the performance of classification or clustering algorithms. Our objective was to determine if semantically augmented distance metrics would outperform standard metrics on machine learning tasks. Methods We converted the neurological findings from 382 published neurology cases into sets of concepts with corresponding machine-readable codes. We calculated patient distances by four different metrics (cosine distance, a semantically augmented cosine distance, Jaccard distance, and a semantically augmented bipartite distance). Semantic augmentation for two of the metrics depended on concept similarities from a hierarchical neuro-ontology. For machine learning algorithms, we used the patient diagnosis as the ground truth label and patient findings as machine learning features. We assessed classification accuracy for four classifiers and cluster quality for two clustering algorithms for each of the distance metrics. Results Inter-patient distances were smaller when the distance metric was semantically augmented. Classification accuracy and cluster quality were not significantly different by distance metric. Conclusion Although semantic augmentation reduced inter-patient distances, we did not find improved classification accuracy or improved cluster quality with semantically augmented patient distance metrics when applied to a dataset of neurology patients. Further work is needed to assess the utility of semantically augmented patient distances.
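The two standard (non-augmented) metrics named above can be sketched for patients represented as sets of finding codes. This is an illustrative sketch only: the semantically augmented variants depend on the neuro-ontology and are not reproduced here, and the function names and finding labels are made up.

```python
from math import sqrt


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of findings."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty patients are treated as identical
    return 1.0 - len(a & b) / len(union)


def cosine_distance(a, b):
    """Cosine distance with each finding set viewed as a binary vector."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    if not a or not b:
        return 1.0  # no shared direction with an empty vector
    return 1.0 - len(a & b) / sqrt(len(a) * len(b))


# Hypothetical finding sets for two patients.
patient_1 = {"ataxia", "tremor", "diplopia"}
patient_2 = {"ataxia", "tremor", "aphasia"}
d_jaccard = jaccard_distance(patient_1, patient_2)
d_cosine = cosine_distance(patient_1, patient_2)
```

Both metrics treat "diplopia" and "aphasia" as simply different; a semantically augmented metric would instead give partial credit when two distinct concepts are close in the ontology.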
Affiliation(s)
- Daniel B Hier
- Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, 60612, USA.
- Jonathan Kopel
- Department of Internal Medicine, Texas Tech University Health Sciences Center, Lubbock, TX, USA
- Steven U Brint
- Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, 60612, USA
- Donald C Wunsch
- Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, 65401, USA
- Gayla R Olbricht
- Department of Mathematics and Statistics, Missouri University of Science and Technology, Rolla, MO, 65401, USA
- Sima Azizi
- Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, 65401, USA
- Blaine Allen
- Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, 65401, USA
29
Tokodi M, Shrestha S, Bianco C, Kagiyama N, Casaclang-Verzosa G, Narula J, Sengupta PP. Interpatient Similarities in Cardiac Function: A Platform for Personalized Cardiovascular Medicine. JACC Cardiovasc Imaging 2020; 13:1119-1132. [PMID: 32199835 PMCID: PMC7556337 DOI: 10.1016/j.jcmg.2019.12.018] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 08/02/2019] [Revised: 10/31/2019] [Accepted: 12/19/2019] [Indexed: 12/20/2022]
Abstract
OBJECTIVES The authors applied unsupervised machine-learning techniques for integrating echocardiographic features of left ventricular (LV) structure and function into a patient similarity network that predicted major adverse cardiac event(s) (MACE) in an individual patient. BACKGROUND Patient similarity analysis is an evolving paradigm for precision medicine in which patients are clustered or classified based on their similarities in several clinical features. METHODS A retrospective cohort of 866 patients was used to develop a network architecture using 9 echocardiographic features of LV structure and function. The data for 468 patients from 2 prospective cohort registries were then added to test the model's generalizability. RESULTS The map of cross-sectional data in the retrospective cohort resulted in a looped patient network that persisted even after the addition of data from the prospective cohort registries. After subdividing the loop into 4 regions, patients in each region showed unique differences in LV function, with Kaplan-Meier curves demonstrating significant differences in MACE-related rehospitalization and death (both p < 0.001). Addition of network information to clinical risk predictors resulted in significant improvements in net reclassification, integrated discrimination, and median risk scores for predicting MACE (p < 0.05 for all). Furthermore, the network predicted the cardiac disease cycle in each of the 96 patients who had second echocardiographic evaluations. An improvement or remaining in low-risk regions was associated with lower MACE-related rehospitalization rates than worsening or remaining in high-risk regions (3% vs. 37%; p < 0.001). CONCLUSIONS Patient similarity analysis integrates multiple features of cardiac function to develop a phenotypic network in which patients can be mapped to specific locations associated with specific disease stage and clinical outcomes. 
The use of patient similarity analysis may have relevance for automated staging of cardiac disease severity, personalized prediction of prognosis, and monitoring progression or response to therapies.
Affiliation(s)
- Márton Tokodi
- Division of Cardiology, West Virginia University Heart & Vascular Institute, Morgantown, West Virginia; Heart and Vascular Center, Semmelweis University, Budapest, Hungary
- Sirish Shrestha
- Division of Cardiology, West Virginia University Heart & Vascular Institute, Morgantown, West Virginia
- Christopher Bianco
- Division of Cardiology, West Virginia University Heart & Vascular Institute, Morgantown, West Virginia
- Nobuyuki Kagiyama
- Division of Cardiology, West Virginia University Heart & Vascular Institute, Morgantown, West Virginia
- Grace Casaclang-Verzosa
- Division of Cardiology, West Virginia University Heart & Vascular Institute, Morgantown, West Virginia
- Jagat Narula
- Division of Cardiology, Icahn School of Medicine at Mount Sinai, New York, New York
- Partho P Sengupta
- Division of Cardiology, West Virginia University Heart & Vascular Institute, Morgantown, West Virginia.
30
Cho JS, Shrestha S, Kagiyama N, Hu L, Ghaffar YA, Casaclang-Verzosa G, Zeb I, Sengupta PP. A Network-Based "Phenomics" Approach for Discovering Patient Subtypes From High-Throughput Cardiac Imaging Data. JACC Cardiovasc Imaging 2020; 13:1655-1670. [PMID: 32762883 DOI: 10.1016/j.jcmg.2020.02.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/04/2019] [Revised: 02/19/2020] [Accepted: 02/20/2020] [Indexed: 12/16/2022]
Abstract
OBJECTIVES The authors present a method that focuses on cohort matching algorithms for performing patient-to-patient comparisons along multiple echocardiographic parameters for predicting meaningful patient subgroups. BACKGROUND Recent efforts in collecting multiomics data open numerous opportunities for comprehensive integration of highly heterogeneous data to classify a patient's cardiovascular state, eventually leading to tailored therapies. METHODS A total of 42 echocardiography features, including 2-dimensional and Doppler measurements, left ventricular (LV) and atrial speckle-tracking, and vector flow mapping data, were obtained in 297 patients. A similarity network was developed to delineate distinct patient phenotypes, and then neural network models were trained for discriminating the phenotypic presentations. RESULTS The patient similarity model identified 4 clusters (I to IV), with patients in each cluster showing distinctive clinical presentations based on American College of Cardiology/American Heart Association heart failure stage and the occurrence of short-term major adverse cardiac and cerebrovascular events. Compared with other clusters, cluster IV had a higher prevalence of stage C or D heart failure (78%; p < 0.001), New York Heart Association functional classes III or IV (61%; p < 0.001), and a higher incidence of major adverse cardiac and cerebrovascular events (p < 0.001). The neural network model showed robust prediction of patient clusters, with area under the receiver-operating characteristic curve ranging from 0.82 to 0.99 for the independent hold-out validation set. CONCLUSIONS Automated computational methods for phenotyping can be an effective strategy to fuse multidimensional parameters of LV structure and function. They can identify distinct cardiac phenogroups in terms of clinical characteristics, cardiac structure and function, hemodynamics, and outcomes.
Affiliation(s)
- Jung Sun Cho
- West Virginia University Heart & Vascular Institute, Morgantown, West Virginia; Division of Cardiology, Daejeon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Sirish Shrestha
- West Virginia University Heart & Vascular Institute, Morgantown, West Virginia
- Nobuyuki Kagiyama
- West Virginia University Heart & Vascular Institute, Morgantown, West Virginia
- Lan Hu
- West Virginia University Heart & Vascular Institute, Morgantown, West Virginia
- Yasir Abdul Ghaffar
- West Virginia University Heart & Vascular Institute, Morgantown, West Virginia
- Irfan Zeb
- West Virginia University Heart & Vascular Institute, Morgantown, West Virginia
- Partho P Sengupta
- West Virginia University Heart & Vascular Institute, Morgantown, West Virginia.
31
Wentzel A, Hanula P, Luciani T, Elgohari B, Elhalawani H, Canahuate G, Vock D, Fuller CD, Marai GE. Cohort-based T-SSIM Visual Computing for Radiation Therapy Prediction and Exploration. IEEE Transactions on Visualization and Computer Graphics 2020; 26:949-959. [PMID: 31442988 PMCID: PMC7253296 DOI: 10.1109/tvcg.2019.2934546] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 06/10/2023]
Abstract
We describe a visual computing approach to radiation therapy (RT) planning, based on spatial similarity within a patient cohort. In radiotherapy for head and neck cancer treatment, dosage to organs at risk surrounding a tumor is a major cause of treatment toxicity. Along with the availability of patient repositories, this situation has led to clinician interest in understanding and predicting RT outcomes based on previously treated similar patients. To enable this type of analysis, we introduce a novel topology-based spatial similarity measure, T-SSIM, and a predictive algorithm based on this similarity measure. We couple the algorithm with a visual steering interface that intertwines visual encodings for the spatial data and statistical results, including a novel parallel-marker encoding that is spatially aware. We report quantitative results on a cohort of 165 patients, as well as a qualitative evaluation with domain experts in radiation oncology, data management, biostatistics, and medical imaging, who are collaborating remotely.
32
Huang M, Shah ND, Yao L. Evaluating global and local sequence alignment methods for comparing patient medical records. BMC Med Inform Decis Mak 2019; 19:263. [PMID: 31856819 PMCID: PMC6921442 DOI: 10.1186/s12911-019-0965-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/21/2022] Open
Abstract
Background Sequence alignment is a way of arranging sequences (e.g., DNA, RNA, protein, natural language, financial data, or medical events) to identify the relatedness between two or more sequences and regions of similarity. For Electronic Health Records (EHR) data, sequence alignment helps to identify patients with similar disease trajectories for more relevant and precise prognosis, diagnosis, and treatment of patients. Methods We tested two cutting-edge global sequence alignment methods, namely dynamic time warping (DTW) and the Needleman-Wunsch algorithm (NWA), together with their local modifications, DTW for Local alignment (DTWL) and the Smith-Waterman algorithm (SWA), for aligning patient medical records. We also used 4 sets of synthetic patient medical records generated from a large real-world EHR database as gold standard data to objectively evaluate these sequence alignment algorithms. Results For global sequence alignments, 47 out of 80 DTW alignments and 11 out of 80 NWA alignments had higher similarity scores than the reference alignments, while the remaining 33 DTW alignments and 69 NWA alignments had the same similarity scores as the reference alignments. Forty-six out of 80 DTW alignments had better similarity scores than NWA alignments, with the remaining 34 cases having equal similarity scores from both algorithms. For local sequence alignments, 70 out of 80 DTWL alignments and 68 out of 80 SWA alignments had larger coverage and higher similarity scores than the reference alignments, while the remaining DTWL and SWA alignments received the same coverage and similarity scores as the reference alignments. Six out of 80 DTWL alignments showed larger coverage and higher similarity scores than SWA alignments. Thirty DTWL alignments had equal coverage but better similarity scores than SWA. DTWL and SWA received equal coverage and similarity scores for the remaining 44 cases. Conclusions DTW, NWA, DTWL and SWA outperformed the reference alignments. DTW (or DTWL) seems to align better than NWA (or SWA) by inserting new daily events and identifying more similarities between patient medical records. The evaluation results could provide valuable information on the strengths and weaknesses of these sequence alignment methods for future development of sequence alignment methods and patient similarity-based studies.
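The difference between the two global methods can be illustrated with textbook versions of each algorithm. This is a minimal sketch rather than the study's implementation: the scoring values (match 1, mismatch -1, gap -1), the 0/1 DTW cost, and the event names are assumptions chosen for illustration.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score (higher is more similar)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def dtw_distance(a, b, cost=lambda x, y: 0 if x == y else 1):
    """Dynamic time warping distance (lower is more similar)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf]
        for j, y in enumerate(b, start=1):
            # An element may be matched to several consecutive elements
            # of the other sequence, so repeats carry no gap penalty.
            curr.append(cost(x, y) + min(prev[j], curr[j - 1], prev[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical event sequences: the first patient has a repeated visit.
record_1 = ["visit", "visit", "rx"]
record_2 = ["visit", "rx"]
nw = needleman_wunsch_score(record_1, record_2)
dtw = dtw_distance(record_1, record_2)
```

In this toy case Needleman-Wunsch pays a gap penalty for the extra visit, whereas DTW stretches one event over the repeat and reports a distance of zero, which is one way to read the abstract's remark about DTW identifying more similarities.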
Affiliation(s)
- Ming Huang
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Nilay D Shah
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Lixia Yao
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.
33
Ruan T, Lei L, Zhou Y, Zhai J, Zhang L, He P, Gao J. Representation learning for clinical time series prediction tasks in electronic health records. BMC Med Inform Decis Mak 2019; 19:259. [PMID: 31842854 PMCID: PMC6916209 DOI: 10.1186/s12911-019-0985-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 11/10/2022] Open
Abstract
Background Electronic health records (EHRs) provide possibilities to improve patient care and facilitate clinical research. However, applications of EHRs face many challenges, such as temporality, high dimensionality, sparseness, noise, random error, and systematic bias. In particular, temporal information is difficult for traditional machine learning methods to use effectively, even though the sequential information in EHRs is very useful. Method In this paper, we propose a general-purpose patient representation learning approach to summarize sequential EHRs. Specifically, a recurrent neural network based denoising autoencoder (RNN-DAE) is employed to encode the in-hospital records of each patient into a low-dimensional dense vector. Results Based on EHR data collected from Shuguang Hospital affiliated to Shanghai University of Traditional Chinese Medicine, we experimentally evaluate our proposed RNN-DAE method on both a mortality prediction task and a comorbidity prediction task. Extensive experimental results show that our proposed RNN-DAE method outperforms existing methods. In addition, we apply the “Deep Feature” represented by our proposed RNN-DAE method to track similar patients with t-SNE, which also yields some interesting observations. Conclusion We propose an effective unsupervised RNN-DAE method to summarize patient sequential information in EHR data. Our proposed RNN-DAE method is useful for both the mortality prediction task and the comorbidity prediction task.
Affiliation(s)
- Tong Ruan
  - School of Information Science and Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Liqi Lei
  - School of Information Science and Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Yangming Zhou
  - School of Information Science and Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Jie Zhai
  - School of Information Science and Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Le Zhang
  - School of Information Science and Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Ping He
  - Shanghai Hospital Development Center, 2 Kangding Road, Shanghai, 200000, China
- Ju Gao
  - Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, 528 Zhangheng Road, Shanghai, 201203, China
|
34
|
Chen X, Garcelon N, Neuraz A, Billot K, Lelarge M, Bonald T, Garcia H, Martin Y, Benoit V, Vincent M, Faour H, Douillet M, Lyonnet S, Saunier S, Burgun A. Phenotypic similarity for rare disease: Ciliopathy diagnoses and subtyping. J Biomed Inform 2019; 100:103308. [DOI: 10.1016/j.jbi.2019.103308] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/01/2019] [Revised: 09/05/2019] [Accepted: 10/11/2019] [Indexed: 01/29/2023]
|
35
|
Wang N, Huang Y, Liu H, Fei X, Wei L, Zhao X, Chen H. Measurement and application of patient similarity in personalized predictive modeling based on electronic medical records. Biomed Eng Online 2019; 18:98. [PMID: 31601207 PMCID: PMC6788002 DOI: 10.1186/s12938-019-0718-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/22/2019] [Accepted: 10/01/2019] [Indexed: 12/24/2022] Open
Abstract
Background Conventional risk prediction techniques may not be the most suitable approach for personalized prediction for individual patients. Therefore, individualized predictive modeling based on similar patients has emerged. This study aimed to propose a comprehensive measurement of patient similarity using real-world electronic medical records data, and evaluate the effectiveness of the individualized prediction of a patient’s diabetes status based on the patient similarity. Results When using no more than 30% of the whole training sample, the personalized predictive models outperformed corresponding traditional models built on randomly selected training samples of the same size as the personalized models (P < 0.001 for all). With only the top 1000 (10%), 700 (7%) and 1400 (14%) similar samples, personalized random forest, k-nearest neighbor and logistic regression models reached the globally optimal performance with the area under the receiver-operating characteristic (ROC) curve of 0.90, 0.82 and 0.89, respectively. Conclusions The proposed patient similarity measurement was effective when developing personalized predictive models. The successful application of patient similarity in predicting a patient’s diabetes status provided useful references for diagnostic decision-making support by investigating the evidence on similar patients.
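The idea summarized in this abstract, building a prediction for one patient from only the most similar training samples, can be illustrated with a minimal sketch. The abstract does not specify the similarity measurement, so Euclidean distance over two made-up normalized features is used here purely as a stand-in, and the record layout (`features`, `label`) and function names are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Distance between two equally long numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k_similar(query, cohort, k):
    """Return the k cohort records closest to the query (smallest distance first)."""
    ranked = sorted(cohort, key=lambda rec: euclidean_distance(query, rec["features"]))
    return ranked[:k]

def personalized_risk(query, cohort, k):
    """Estimate risk as the share of positive labels among the k most similar records."""
    neighbours = top_k_similar(query, cohort, k)
    return sum(rec["label"] for rec in neighbours) / len(neighbours)

# Toy cohort: label 1 = diabetes, label 0 = no diabetes (illustrative values only).
cohort = [
    {"features": [0.9, 0.8], "label": 1},
    {"features": [0.8, 0.9], "label": 1},
    {"features": [0.7, 0.7], "label": 1},
    {"features": [0.2, 0.1], "label": 0},
    {"features": [0.1, 0.2], "label": 0},
    {"features": [0.3, 0.3], "label": 0},
]

risk_high = personalized_risk([0.85, 0.85], cohort, k=3)
risk_low = personalized_risk([0.15, 0.15], cohort, k=3)
```

In this sketch the prediction is a simple neighbour vote; the study itself also reports personalized random forest and logistic regression models trained on the selected similar samples, which would replace `personalized_risk` with a model fit on the output of `top_k_similar`.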
Affiliation(s)
- Ni Wang
  - School of Biomedical Engineering, Capital Medical University, No. 10, Xitoutiao, YouAnMen, Fengtai District, Beijing, 100069, China
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, No. 10, Xitoutiao, YouAnMen, Fengtai District, Beijing, 100069, China
- Yanqun Huang
  - School of Biomedical Engineering, Capital Medical University, No. 10, Xitoutiao, YouAnMen, Fengtai District, Beijing, 100069, China
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, No. 10, Xitoutiao, YouAnMen, Fengtai District, Beijing, 100069, China
- Honglei Liu
  - School of Biomedical Engineering, Capital Medical University, No. 10, Xitoutiao, YouAnMen, Fengtai District, Beijing, 100069, China
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, No. 10, Xitoutiao, YouAnMen, Fengtai District, Beijing, 100069, China
- Xiaolu Fei
  - Information Center, Xuanwu Hospital, Capital Medical University, No. 45 Changchun Street, Xicheng District, Beijing, 100053, China
- Lan Wei
  - Information Center, Xuanwu Hospital, Capital Medical University, No. 45 Changchun Street, Xicheng District, Beijing, 100053, China
- Xiangkun Zhao
  - School of Biomedical Engineering, Capital Medical University, No. 10, Xitoutiao, YouAnMen, Fengtai District, Beijing, 100069, China
- Hui Chen
  - School of Biomedical Engineering, Capital Medical University, No. 10, Xitoutiao, YouAnMen, Fengtai District, Beijing, 100069, China
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, No. 10, Xitoutiao, YouAnMen, Fengtai District, Beijing, 100069, China
|
36
|
Xu D, Sheng JQ, Hu PJH, Huang TS, Lee WC. Predicting hepatocellular carcinoma recurrences: A data-driven multiclass classification method incorporating latent variables. J Biomed Inform 2019; 96:103237. [PMID: 31238108 DOI: 10.1016/j.jbi.2019.103237] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/08/2018] [Revised: 03/30/2019] [Accepted: 06/18/2019] [Indexed: 12/12/2022]
Abstract
Hepatocellular carcinoma (HCC), a malignant form of cancer, is frequently treated with surgical resections, which have relatively high recurrence rates. Effective recurrence predictions enable physicians' timely detections and adequate therapeutic measures that can greatly improve patient care and outcomes. Toward that end, predictions of early versus late HCC recurrences should be considered separately to reflect their distinct onset time horizons, clinical causes, underlying clinical etiology, and pathogenesis. We propose a novel Bayesian network-based method to predict different HCC recurrence outcomes by considering the respective recurrence evolution paths. Typical patient information obtained in early stages is insufficiently informative to predict recurrence outcomes accurately, due to the lack of subsequent patient progression information. Our method alleviates such information deficiency constraints by incorporating an independent latent variable, dominant recurrence type, to regulate recurrence outcome predictions (early, late, or no recurrence). We use a real-world HCC data set to evaluate the proposed method, relative to three prevalent benchmark techniques. Overall, the results show that our method consistently and significantly outperforms all the benchmark techniques in terms of accuracy, precision, recall, and F-measures. For increased robustness, we use another data set to perform an out-of-sample evaluation and obtain similar results. This study thus contributes to HCC recurrence research and offers several implications for clinical practice.
Affiliation(s)
- Da Xu
  - Department of Operations and Information Systems, David Eccles School of Business, University of Utah, USA
- Jessica Qiuhua Sheng
  - Department of Operations and Information Systems, David Eccles School of Business, University of Utah, USA
- Paul Jen-Hwa Hu
  - Department of Operations and Information Systems, David Eccles School of Business, University of Utah, USA
- Ting Shuo Huang
  - Department of General Surgery, Community Medicine Research Center, Chang Gung Memorial Hospital, Keelung, Taiwan, ROC
  - Department of Chinese Medicine, College of Medicine, Chang Gung University, Kwei-Shan, Taoyuan, Taiwan, ROC
- Wei-Chen Lee
  - Department of Liver and Transplantation Surgery, Chang Gung Memorial Hospital, Linkou, Taiwan, ROC
  - Department of Medicine, College of Medicine, Chang Gung University, Kwei-Shan, Taoyuan, Taiwan, ROC
|
37
|
Haas K, Morton S, Gupta S, Mahoui M. Using Similarity Metrics on Real World Data and Patient Treatment Pathways to Recommend the Next Treatment. AMIA Jt Summits Transl Sci Proc 2019; 2019:398-406. [PMID: 31258993 PMCID: PMC6568112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Non-small-cell lung cancer (NSCLC) is one of the most prevalent types of lung cancer and continues to have an ominous five-year survival rate. Considerable work has been accomplished in analyzing the viability of the treatments offered to NSCLC patients; however, while many of these treatments have performed better over populations of diagnosed NSCLC patients, a specific treatment may not be the most effective therapy for a given patient. By coupling patient similarity, measured with the Gower similarity metric, with prior treatment knowledge, we were able to demonstrate how patient analytics can complement clinical efforts in recommending the next best treatment. Our retrospective and exploratory results indicate that a majority of patients are not recommended the best surviving therapy once they require a new therapy. This investigation lays the groundwork for treatment recommendation using analytics, but more investigation is required to analyze patient outcomes beyond survival.
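For readers unfamiliar with the Gower similarity metric named in this abstract, the following is a minimal sketch of its usual definition for mixed data: numeric features contribute 1 minus the absolute difference divided by the feature's range, categorical features contribute 1 on an exact match and 0 otherwise, and the per-feature scores are averaged. The feature names, the age range of 40 years, and the rule of skipping features missing in either record are assumptions made for illustration, not details taken from the study.

```python
def gower_similarity(a, b, numeric_ranges):
    """Average per-feature similarity between two records with mixed feature types.

    a, b: dicts mapping feature name to value (None marks a missing value).
    numeric_ranges: dict mapping each numeric feature to its range (max - min)
    in the reference population; all other features are treated as categorical.
    """
    scores = []
    for key in a:
        if a[key] is None or b.get(key) is None:
            continue  # assumption: features missing in either record are skipped
        if key in numeric_ranges:
            value_range = numeric_ranges[key]
            if value_range == 0:
                scores.append(1.0)  # a constant feature cannot distinguish records
            else:
                scores.append(1.0 - abs(a[key] - b[key]) / value_range)
        else:
            scores.append(1.0 if a[key] == b[key] else 0.0)
    if not scores:
        raise ValueError("no comparable features")
    return sum(scores) / len(scores)

# Hypothetical patient records, for illustration only.
patient_a = {"age": 60, "stage": "IIIA", "smoker": True}
patient_b = {"age": 70, "stage": "IIIA", "smoker": False}
ranges = {"age": 40}

similarity = gower_similarity(patient_a, patient_b, ranges)
```

Here the age term is 1 - 10/40 = 0.75, the stage term is 1, and the smoking term is 0, so the similarity is their mean, 1.75/3.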
Affiliation(s)
- Kyle Haas
  - Indiana University-Purdue University Indianapolis (IUPUI), Indianapolis, IN, USA
|
38
|
Sharafoddini A, Dubin JA, Maslove DM, Lee J. A New Insight Into Missing Data in Intensive Care Unit Patient Profiles: Observational Study. JMIR Med Inform 2019; 7:e11605. [PMID: 30622091 PMCID: PMC6329436 DOI: 10.2196/11605] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 07/17/2018] [Revised: 10/30/2018] [Accepted: 10/30/2018] [Indexed: 01/08/2023] Open
Abstract
Background The data missing from patient profiles in intensive care units (ICUs) are substantial and unavoidable. However, this incompleteness is not always random or because of imperfections in the data collection process. Objective This study aimed to investigate the potential hidden information in data missing from electronic health records (EHRs) in an ICU and examine whether the presence or missingness of a variable itself can convey information about the patient health status. Methods Daily retrieval of laboratory test (LT) measurements from the Medical Information Mart for Intensive Care III database was set as our reference for defining complete patient profiles. Missingness indicators were introduced as a way of representing presence or absence of the LTs in a patient profile. Thereafter, various feature selection methods (filter and embedded feature selection methods) were used to examine the predictive power of missingness indicators. Finally, a set of well-known prediction models (logistic regression [LR], decision tree, and random forest) were used to evaluate whether the absence status itself of a variable recording can provide predictive power. We also examined the utility of missingness indicators in improving predictive performance when used with observed laboratory measurements as model input. The outcomes of interest were in-hospital mortality and mortality at 30 days after ICU discharge. Results Regardless of mortality type or ICU day, more than 40% of the predictors selected by feature selection methods were missingness indicators. Notably, employing missingness indicators as the only predictors achieved reasonable mortality prediction on all days and for all mortality types (for instance, in 30-day mortality prediction with LR, we achieved area under the curve of the receiver operating characteristic [AUROC] of 0.6836±0.012). Including indicators with observed measurements in the prediction models also improved the AUROC; the maximum improvement was 0.0426. Indicators also improved the AUROC for the Simplified Acute Physiology Score II model (a well-known ICU severity of illness score), confirming the additive information of the indicators (AUROC of 0.8045±0.0109 for 30-day mortality prediction for LR). Conclusions Our study demonstrated that the presence or absence of LT measurements is informative and can be considered a potential predictor of in-hospital and 30-day mortality. The comparative analysis of prediction models also showed statistically significant prediction improvement when indicators were included. Moreover, missing data might reflect the opinions of examining clinicians. Therefore, the absence of measurements can be informative in ICUs and has predictive power beyond the measured data themselves. This initial case study shows promise for more in-depth analysis of missing data and its informativeness in ICUs. Future studies are needed to generalize these results.
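The missingness indicators described in this abstract can be sketched as binary features that record whether each laboratory test was present in a patient profile. The example below is only an illustration under stated assumptions: the lab test names, the `_missing` suffix, and the constant fill value used for absent measurements are hypothetical choices, not the study's actual preprocessing.

```python
def add_missingness_indicators(profile, lab_tests, fill_value=0.0):
    """Build model features from a daily lab profile.

    profile: dict mapping lab test name to a measured value; a test that was
    not ordered is either absent from the dict or stored as None.
    lab_tests: the full list of lab tests that defines a complete profile.
    For every test, emit an indicator (1 = missing, 0 = observed) and the
    value itself, with fill_value substituted where no measurement exists.
    """
    features = {}
    for test in lab_tests:
        value = profile.get(test)
        features[test + "_missing"] = 1 if value is None else 0
        features[test] = fill_value if value is None else float(value)
    return features

# Hypothetical one-day profile: lactate measured, creatinine recorded as
# missing, bilirubin never ordered.
lab_tests = ["lactate", "creatinine", "bilirubin"]
profile = {"lactate": 2.1, "creatinine": None}

features = add_missingness_indicators(profile, lab_tests)
indicator_only = {k: v for k, v in features.items() if k.endswith("_missing")}
```

Passing `indicator_only` to a classifier corresponds to the setting in which indicators are the only predictors, while passing all of `features` corresponds to combining indicators with observed measurements.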
Affiliation(s)
- Anis Sharafoddini
  - Health Data Science Lab, School of Public Health and Health Systems, University of Waterloo, Waterloo, ON, Canada
- Joel A Dubin
  - Health Data Science Lab, School of Public Health and Health Systems, University of Waterloo, Waterloo, ON, Canada
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
- David M Maslove
  - Department of Critical Care Medicine, Queen's University, Kingston, ON, Canada
- Joon Lee
  - Health Data Science Lab, School of Public Health and Health Systems, University of Waterloo, Waterloo, ON, Canada
  - Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Cardiac Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
|
39
|
Tsaneva-Atanasova K, Diaz-Zuccarini V. Editorial: Mathematics for Healthcare as Part of Computational Medicine. Front Physiol 2018; 9:985. [PMID: 30087624 PMCID: PMC6066689 DOI: 10.3389/fphys.2018.00985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2018] [Accepted: 07/04/2018] [Indexed: 11/13/2022] Open
Affiliation(s)
- Krasimira Tsaneva-Atanasova
  - Department of Mathematics and Living Systems Institute, University of Exeter, Exeter, United Kingdom
  - EPSRC Centre for Predictive Modelling in Healthcare, University of Exeter, Exeter, United Kingdom
  - *Correspondence: Krasimira Tsaneva-Atanasova
- Vanessa Diaz-Zuccarini
  - Multiscale Cardiovascular Engineering Group, Department of Mechanical Engineering, University College London, London, United Kingdom
|
40
|
Zhang H, Zhu F, Dodge HH, Higgins GA, Omenn GS, Guan Y. A similarity-based approach to leverage multi-cohort medical data on the diagnosis and prognosis of Alzheimer's disease. Gigascience 2018; 7:5052206. [PMID: 30010762 PMCID: PMC6054197 DOI: 10.1093/gigascience/giy085] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/20/2017] [Revised: 04/15/2018] [Accepted: 06/28/2018] [Indexed: 01/17/2023] Open
Abstract
Motivation Heterogeneous diseases such as Alzheimer's disease (AD) manifest a variety of phenotypes among populations. Early diagnosis and effective treatment offer cost benefits. Many studies on biochemical and imaging markers have shown potential promise in improving diagnosis, yet establishing quantitative diagnostic criteria for ancillary tests remains challenging. Results We have developed a similarity-based approach that matches individuals to subjects with similar conditions. We modeled the disease with a Gaussian process, and tested the method in the Alzheimer's Disease Big Data DREAM Challenge. Ranked the highest among submitted methods, our diagnostic model predicted cognitive impairment scores in an independent dataset test with a correlation score of 0.573. It differentiated AD patients from control subjects with an area under the receiver operating curve of 0.920. Without knowing longitudinal information about subjects, the model predicted patients who are vulnerable to conversion from mild-cognitive impairment to AD through the similarity network. This diagnostic framework can be applied to other diseases with clinical heterogeneity, such as Parkinson's disease.
Affiliation(s)
- Hongjiu Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, 2017G Palmer Commons, 100 Washtenaw Avenue, Ann Arbor, MI, USA 48109
- Fan Zhu
  - Department of Computational Medicine and Bioinformatics, University of Michigan, 2017G Palmer Commons, 100 Washtenaw Avenue, Ann Arbor, MI, USA 48109
  - Chongqing Key Laboratory of Big Data and Intelligent Computing, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, 266 Fangzheng Avenue, Shuitu Hi-tech Industrial Park, Shuitu Town, Beibei District, Chongqing, China 400714
- Hiroko H Dodge
  - Michigan Alzheimer's Disease Center, University of Michigan, 2101 Commonwealth Blvd, Ann Arbor, MI, USA 48105
  - Department of Neurology, University of Michigan, 1500 E. Medical Center Dr., 1914 Taubman Center SPC 5316, Ann Arbor, MI, USA 48109
  - Layton Aging and Alzheimer's Disease Center and Department of Neurology, Oregon Health & Science University, 3181 S.W. Sam Jackson Park Road, L226, Portland, OR, USA 97239
- Gerald A Higgins
  - Department of Computational Medicine and Bioinformatics, University of Michigan, 2017G Palmer Commons, 100 Washtenaw Avenue, Ann Arbor, MI, USA 48109
- Gilbert S Omenn
  - Department of Computational Medicine and Bioinformatics, University of Michigan, 2017G Palmer Commons, 100 Washtenaw Avenue, Ann Arbor, MI, USA 48109
  - Department of Internal Medicine, University of Michigan, 3110 Taubman Center, SPC 5368, 1500 East Medical Center Drive, Ann Arbor, MI, USA 48109
  - Department of Human Genetics, University of Michigan, 4909 Buhl Building, 1241 E. Catherine St., Ann Arbor, MI, USA 48109
  - School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI, USA 48109
- Yuanfang Guan
  - Department of Computational Medicine and Bioinformatics, University of Michigan, 2017G Palmer Commons, 100 Washtenaw Avenue, Ann Arbor, MI, USA 48109
  - Department of Internal Medicine, University of Michigan, 3110 Taubman Center, SPC 5368, 1500 East Medical Center Drive, Ann Arbor, MI, USA 48109
  - Department of Electronic Engineering and Computer Science, Bob and Betty Beyster Building, 2260 Hayward Street, University of Michigan, Ann Arbor, MI, USA 48109
|
41
|
Suo Q, Ma F, Yuan Y, Huai M, Zhong W, Gao J, Zhang A. Deep Patient Similarity Learning for Personalized Healthcare. IEEE Trans Nanobioscience 2018; 17:219-227. [DOI: 10.1109/tnb.2018.2837622] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Indexed: 11/06/2022]
|
42
|
Tényi Á, Vela E, Cano I, Cleries M, Monterde D, Gomez-Cabrero D, Roca J. Risk and temporal order of disease diagnosis of comorbidities in patients with COPD: a population health perspective. BMJ Open Respir Res 2018; 5:e000302. [PMID: 29955364 PMCID: PMC6018856 DOI: 10.1136/bmjresp-2018-000302] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/27/2018] [Revised: 05/22/2018] [Indexed: 02/06/2023] Open
Abstract
Introduction Comorbidities in patients with chronic obstructive pulmonary disease (COPD) generate a major burden on healthcare. Identification of cost-effective strategies aiming at preventing and enhancing management of comorbid conditions in patients with COPD requires deeper knowledge on epidemiological patterns and on shared biological pathways explaining co-occurrence of diseases. Methods The study assesses the co-occurrence of several chronic conditions in patients with COPD using two different datasets: Catalan Healthcare Surveillance System (CHSS) (ES, 1.4 million registries) and Medicare (USA, 13 million registries). Temporal order of disease diagnosis was analysed in the CHSS dataset. Results The results demonstrate higher prevalence of most of the diseases, as comorbid conditions, in elderly (>65) patients with COPD compared with non-COPD subjects, an effect observed in both CHSS and Medicare datasets. Analysis of temporal order of disease diagnosis showed that comorbid conditions in elderly patients with COPD tend to appear after the diagnosis of the obstructive disease, rather than before it. Conclusion The results provide a population health perspective of the comorbidity challenge in patients with COPD, indicating the increased risk of developing comorbid conditions in these patients. The research reinforces the need for novel approaches in the prevention and management of comorbidities in patients with COPD to effectively reduce the overall burden of the disease on these patients.
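The temporal-order analysis mentioned in this abstract can be pictured, in a highly simplified form, as counting whether a comorbid condition was first diagnosed before or after the index disease in each patient. The sketch below assumes a hypothetical record layout (one dict of first-diagnosis dates per patient) and made-up condition names and dates; the study's actual statistical method is not described in the abstract and is not reproduced here.

```python
from datetime import date

def temporal_order_counts(patients, index_condition, comorbidity):
    """Count patients whose comorbidity was first diagnosed before, after,
    or on the same day as the index condition.

    patients: list of dicts mapping condition name to first-diagnosis date.
    Patients lacking either diagnosis are ignored.
    """
    counts = {"before": 0, "after": 0, "same_day": 0}
    for diagnoses in patients:
        index_date = diagnoses.get(index_condition)
        other_date = diagnoses.get(comorbidity)
        if index_date is None or other_date is None:
            continue
        if other_date < index_date:
            counts["before"] += 1
        elif other_date > index_date:
            counts["after"] += 1
        else:
            counts["same_day"] += 1
    return counts

# Made-up first-diagnosis dates, for illustration only.
patients = [
    {"COPD": date(2010, 1, 1), "heart_failure": date(2012, 5, 1)},
    {"COPD": date(2011, 3, 1), "heart_failure": date(2009, 7, 1)},
    {"COPD": date(2013, 6, 1), "heart_failure": date(2014, 1, 1)},
    {"COPD": date(2015, 2, 1)},  # no comorbidity recorded, so not counted
]

counts = temporal_order_counts(patients, "COPD", "heart_failure")
```

A larger "after" count than "before" count would be in line with the abstract's finding that comorbid conditions tend to appear after the COPD diagnosis, although a real analysis would also need to account for follow-up time and chance.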
Affiliation(s)
- Ákos Tényi
  - Hospital Clinic de Barcelona, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Universitat de Barcelona, Barcelona, Spain
  - Center for Biomedical Network Research in Respiratory Diseases (CIBERES), Madrid, Spain
- Emili Vela
  - Unitat d'Informació i Coneixement, Servei Catala de la Salut de la Generalitat de Catalunya, Barcelona, Catalunya, Spain
- Isaac Cano
  - Hospital Clinic de Barcelona, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Universitat de Barcelona, Barcelona, Spain
  - Center for Biomedical Network Research in Respiratory Diseases (CIBERES), Madrid, Spain
- Montserrat Cleries
  - Unitat d'Informació i Coneixement, Servei Catala de la Salut de la Generalitat de Catalunya, Barcelona, Catalunya, Spain
- David Monterde
  - Serveis Centrals, Institut Català de la Salut, Barcelona, Spain
- David Gomez-Cabrero
  - Mucosal and Salivary Biology Division, King's College London Dental Institute, London, UK
  - Unit of Computational Medicine, Center for Molecular Medicine, Department of Medicine Solna, Karolinska Institutet, Karolinska University Hospital and Science for Life Laboratory, Stockholm, Sweden
  - Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Spain
- Josep Roca
  - Hospital Clinic de Barcelona, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Universitat de Barcelona, Barcelona, Spain
  - Center for Biomedical Network Research in Respiratory Diseases (CIBERES), Madrid, Spain
|
43
|
Parimbelli E, Marini S, Sacchi L, Bellazzi R. Patient similarity for precision medicine: A systematic review. J Biomed Inform 2018; 83:87-96. [PMID: 29864490 DOI: 10.1016/j.jbi.2018.06.001] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Received: 01/31/2018] [Revised: 05/16/2018] [Accepted: 06/01/2018] [Indexed: 12/19/2022]
Abstract
Evidence-based medicine is the most prevalent paradigm adopted by physicians. Clinical practice guidelines typically define a set of recommendations together with eligibility criteria that restrict their applicability to a specific group of patients. The ever-growing size and availability of health-related data is currently challenging the broad definitions of guideline-defined patient groups. Precision medicine leverages genetic, phenotypic, or psychosocial characteristics to provide precise identification of patient subsets for treatment targeting. Defining a patient similarity measure is thus an essential step to allow stratification of patients into clinically meaningful subgroups. The present review investigates the use of patient similarity as a tool to enable precision medicine. 279 articles were analyzed along four dimensions: data types considered, clinical domains of application, data analysis methods, and translational stage of findings. Cancer-related research employing molecular profiling and standard data analysis techniques such as clustering constitutes the majority of the retrieved studies. Chronic and psychiatric diseases follow as the second most represented clinical domains. Interestingly, almost one quarter of the studies analyzed presented a novel methodology, with the most advanced employing data integration strategies and being portable to different clinical domains. Integration of such techniques into decision support systems constitutes an interesting trend for future research.
Affiliation(s)
- E Parimbelli
  - Telfer School of Management, University of Ottawa, Ottawa, Canada
  - Interdepartmental Centre for Health Technologies, University of Pavia, Italy
- S Marini
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA
  - Interdepartmental Centre for Health Technologies, University of Pavia, Italy
- L Sacchi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy
  - Interdepartmental Centre for Health Technologies, University of Pavia, Italy
- R Bellazzi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy
  - Interdepartmental Centre for Health Technologies, University of Pavia, Italy
  - RCCS ICS Maugeri, Pavia, Italy
|
44
|
Balikuddembe MS, Tumwesigye NM, Wakholi PK, Tylleskär T. Computerized Childbirth Monitoring Tools for Health Care Providers Managing Labor: A Scoping Review. JMIR Med Inform 2017; 5:e14. [PMID: 28619702 PMCID: PMC5491898 DOI: 10.2196/medinform.6959] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 11/07/2016] [Revised: 02/11/2017] [Accepted: 04/11/2017] [Indexed: 11/24/2022] Open
Abstract
Background Proper monitoring of labor and childbirth prevents many pregnancy-related complications. However, monitoring is still poor in many places partly due to the usability concerns of support tools such as the partograph. In 2011, the World Health Organization (WHO) called for the development and evaluation of context-adaptable electronic health solutions to health challenges. Computerized tools have penetrated many areas of health care, but their influence in supporting health staff with childbirth seems limited. Objective The objective of this scoping review was to determine the scope and trends of research on computerized labor monitoring tools that could be used by health care providers in childbirth management. Methods We used key terms to search the Web for eligible peer-reviewed and gray literature. Eligibility criteria were a computerized labor monitoring tool for maternity service providers and dated 2006 to mid-2016. Retrieved papers were screened to eliminate ineligible papers, and consensus was reached on the papers included in the final analysis. Results We started with about 380,000 papers, of which 14 papers qualified for the final analysis. Most tools were at the design and implementation stages of development. Three papers addressed post-implementation evaluations of two tools. No documentation on clinical outcome studies was retrieved. The parameters targeted with the tools varied, but they included fetal heart (10 of 11 tools), labor progress (8 of 11), and maternal status (7 of 11). Most tools were designed for use in personal computers in low-resource settings and could be customized for different user needs. Conclusions Research on computerized labor monitoring tools is inadequate. Compared with other labor parameters, there was preponderance to fetal heart monitoring and hardly any summative evaluation of the available tools. More research, including clinical outcomes evaluation of computerized childbirth monitoring tools, is needed.
Affiliation(s)
- Michael S Balikuddembe
  - Center for International Health, University of Bergen, Bergen, Norway
  - Department of Epidemiology and Biostatistics, Makerere University, Kampala, Uganda
  - Department of Obstetrics & Gynaecology, Mulago National Referral and Teaching Hospital, Kampala, Uganda
- Nazarius M Tumwesigye
  - School of Public Health, Department of Epidemiology & Biostatistics, Makerere University, Kampala, Uganda
- Peter K Wakholi
  - School of Computing & Informatics Technology, Makerere University, Kampala, Uganda
|
45
|
Cano I, Tenyi A, Vela E, Miralles F, Roca J. Perspectives on Big Data applications of health information. Curr Opin Syst Biol 2017. [DOI: 10.1016/j.coisb.2017.04.012] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Indexed: 11/25/2022]
|
46
|
Kim BY, Lee J. Smart Devices for Older Adults Managing Chronic Disease: A Scoping Review. JMIR Mhealth Uhealth 2017; 5:e69. [PMID: 28536089 PMCID: PMC5461419 DOI: 10.2196/mhealth.7141] [Citation(s) in RCA: 132] [Impact Index Per Article: 18.9] [Received: 12/10/2016] [Revised: 03/30/2017] [Accepted: 04/18/2017] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND The emergence of smartphones and tablets featuring vastly advancing functionalities (eg, sensors, computing power, interactivity) has transformed the way mHealth interventions support chronic disease management for older adults. Baby boomers have begun to widely adopt smart devices and have expressed their desire to incorporate technologies into their chronic care. Although smart devices are actively used in research, little is known about the extent, characteristics, and range of smart device-based interventions. OBJECTIVE We conducted a scoping review to (1) understand the nature, extent, and range of smart device-based research activities, (2) identify the limitations of the current research and knowledge gap, and (3) recommend future research directions. METHODS We used the Arksey and O'Malley framework to conduct a scoping review. We identified relevant studies from MEDLINE, Embase, CINAHL, and Web of Science databases using search terms related to mobile health, chronic disease, and older adults. Selected studies used smart devices, sampled older adults, and were published in 2010 or after. The exclusion criteria were sole reliance on text messaging (short message service, SMS) or interactive voice response, validation of an electronic version of a questionnaire, postoperative monitoring, and evaluation of usability. We reviewed references. We charted quantitative data and analyzed qualitative studies using thematic synthesis. To collate and summarize the data, we used the chronic care model. RESULTS A total of 51 articles met the eligibility criteria. Research activity increased steeply in 2014 (17/51, 33%) and preexperimental design predominated (16/50, 32%). Diabetes (16/46, 35%) and heart failure management (9/46, 20%) were most frequently studied. We identified diversity and heterogeneity in the collection of biometrics and patient-reported outcome measures within and between chronic diseases. Across studies, we found 8 self-management supporting strategies and 4 distinct communication channels for supporting the decision-making process. In particular, self-monitoring (38/40, 95%), automated feedback (15/40, 38%), and patient education (13/40, 38%) were commonly used as self-management support strategies. Of the 23 studies that implemented decision support strategies, clinical decision making was delegated to patients in 10 studies (43%). The impact on patient outcomes was consistent with studies that used cellular phones. Patients with heart failure and asthma reported improved quality of life. Qualitative analysis yielded 2 themes of facilitating technology adoption for older adults and 3 themes of barriers. CONCLUSIONS Limitations of current research included a lack of gerontological focus, dominance of preexperimental design, narrow research scope, inadequate support for participants, and insufficient evidence for clinical outcome. Recommendations for future research include generating evidence for smart device-based programs, using patient-generated data for advanced data mining techniques, validating patient decision support systems, and expanding mHealth practice through innovative technologies.
Affiliation(s)
- Ben Yb Kim
  - Health Data Science Lab, School of Public Health and Health Systems, University of Waterloo, Waterloo, ON, Canada
- Joon Lee
  - Health Data Science Lab, School of Public Health and Health Systems, University of Waterloo, Waterloo, ON, Canada
|