1
Li M, Li X, Pan K, Geva A, Yang D, Sweet SM, Bonzel CL, Ayakulangara Panickan V, Xiong X, Mandl K, Cai T. Multisource representation learning for pediatric knowledge extraction from electronic health records. NPJ Digit Med 2024; 7:319. [PMID: 39533050 PMCID: PMC11558010 DOI: 10.1038/s41746-024-01320-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Accepted: 10/28/2024] [Indexed: 11/16/2024]
Abstract
Electronic Health Record (EHR) systems are particularly valuable in pediatrics due to high barriers in clinical studies, but pediatric EHR data often suffer from low content density. Existing EHR code embeddings tailored for the general patient population fail to address the unique needs of pediatric patients. To bridge this gap, we introduce a transfer learning approach, MUltisource Graph Synthesis (MUGS), aimed at accurate knowledge extraction and relation detection in pediatric contexts. MUGS integrates graphical data from both pediatric and general EHR systems, along with hierarchical medical ontologies, to create embeddings that adaptively capture both the homogeneity and heterogeneity between hospital systems. These embeddings enable refined EHR feature engineering and nuanced patient profiling, proving particularly effective in identifying pediatric patients similar to specific profiles, with a focus on pulmonary hypertension (PH). MUGS embeddings, resistant to negative transfer, outperform other benchmark methods in multiple applications, advancing evidence-based pediatric research.
Affiliation(s)
- Mengyan Li
  - Department of Mathematical Sciences, Bentley University, Waltham, MA, USA
- Xiaoou Li
  - School of Statistics, University of Minnesota, Minneapolis, MN, USA
- Kevin Pan
  - Mission San Jose High School, Fremont, CA, USA
- Alon Geva
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
  - Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, MA, USA
  - Department of Anaesthesia, Harvard Medical School, Boston, MA, USA
- Doris Yang
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Sara Morini Sweet
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Clara-Lea Bonzel
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Xin Xiong
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Kenneth Mandl
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Tianxi Cai
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
2
Gao J, Bonzel CL, Hong C, Varghese P, Zakir K, Gronsbell J. Semi-supervised ROC analysis for reliable and streamlined evaluation of phenotyping algorithms. J Am Med Inform Assoc 2024; 31:640-650. [PMID: 38128118 PMCID: PMC10873838 DOI: 10.1093/jamia/ocad226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 09/22/2023] [Accepted: 11/20/2023] [Indexed: 12/23/2023]
Abstract
OBJECTIVE High-throughput phenotyping will accelerate the use of electronic health records (EHRs) for translational research. A critical roadblock is the extensive medical supervision required for phenotyping algorithm (PA) estimation and evaluation. To address this challenge, numerous weakly-supervised learning methods have been proposed. However, there is a paucity of methods for reliably evaluating the predictive performance of PAs when a very small proportion of the data is labeled. To fill this gap, we introduce a semi-supervised approach (ssROC) for estimation of the receiver operating characteristic (ROC) parameters of PAs (eg, sensitivity, specificity). MATERIALS AND METHODS ssROC uses a small labeled dataset to nonparametrically impute missing labels. The imputations are then used for ROC parameter estimation to yield more precise estimates of PA performance relative to classical supervised ROC analysis (supROC) using only labeled data. We evaluated ssROC with synthetic, semi-synthetic, and EHR data from Mass General Brigham (MGB). RESULTS ssROC produced ROC parameter estimates with minimal bias and significantly lower variance than supROC in the simulated and semi-synthetic data. For the 5 PAs from MGB, the estimates from ssROC are 30% to 60% less variable than supROC on average. DISCUSSION ssROC enables precise evaluation of PA performance without demanding large volumes of labeled data. ssROC is also easily implementable in open-source R software. CONCLUSION When used in conjunction with weakly-supervised PAs, ssROC facilitates the reliable and streamlined phenotyping necessary for EHR-based research.
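The contrast between supervised and semi-supervised ROC estimation described in this abstract can be illustrated with a small sketch. This is not the authors' R implementation: the Gaussian-kernel (Nadaraya-Watson) smoother stands in, as an assumption, for the nonparametric imputation step, and the function names, bandwidth, and toy data are all hypothetical.

```python
import math

def kernel_impute(labeled, score, bandwidth=0.1):
    """Nadaraya-Watson estimate of P(Y=1 | S=score) from labeled (score, label) pairs."""
    weights = [math.exp(-0.5 * ((score - s) / bandwidth) ** 2) for s, _ in labeled]
    total = sum(weights)
    if total == 0.0:
        # All kernel weights underflowed: fall back to the labeled prevalence.
        return sum(y for _, y in labeled) / len(labeled)
    return sum(w * y for w, (_, y) in zip(weights, labeled)) / total

def sup_roc_point(labeled, cutoff):
    """Classical supervised (TPR, FPR) at a cutoff, using labeled data only."""
    pos = [s for s, y in labeled if y == 1]
    neg = [s for s, y in labeled if y == 0]
    tpr = sum(s >= cutoff for s in pos) / len(pos)
    fpr = sum(s >= cutoff for s in neg) / len(neg)
    return tpr, fpr

def ss_roc_point(labeled, unlabeled_scores, cutoff, bandwidth=0.1):
    """Semi-supervised (TPR, FPR): impute label probabilities for unlabeled scores,
    then plug the imputations into the usual ratio estimators."""
    imputed = [kernel_impute(labeled, s, bandwidth) for s in unlabeled_scores]
    flagged = [1.0 if s >= cutoff else 0.0 for s in unlabeled_scores]
    tpr = sum(m * f for m, f in zip(imputed, flagged)) / sum(imputed)
    fpr = sum((1 - m) * f for m, f in zip(imputed, flagged)) / sum(1 - m for m in imputed)
    return tpr, fpr

# Toy data (hypothetical): a few labeled (score, label) pairs and a larger unlabeled pool.
labeled = [(0.1, 0), (0.2, 0), (0.35, 0), (0.4, 1), (0.6, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
unlabeled = [i / 20 for i in range(1, 20)]

sup_tpr, sup_fpr = sup_roc_point(labeled, 0.5)
ss_tpr, ss_fpr = ss_roc_point(labeled, unlabeled, 0.5)
print(sup_tpr, sup_fpr, ss_tpr, ss_fpr)
```

The supervised estimate uses only the eight labeled scores, while the semi-supervised estimate averages imputed probabilities over every unlabeled score, which is the mechanism the abstract credits for the lower variance.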
Affiliation(s)
- Jianhui Gao
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Clara-Lea Bonzel
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Chuan Hong
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
- Paul Varghese
  - Health Informatics, Verily Life Sciences, Cambridge, MA, United States
- Karim Zakir
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Jessica Gronsbell
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
  - Department of Family and Community Medicine, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
3
Miller TA, McMurry AJ, Jones J, Gottlieb D, Mandl KD. The SMART Text2FHIR Pipeline. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2024; 2023:514-520. [PMID: 38222416 PMCID: PMC10785871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
Objective: To implement an open source, free, and easily deployable high throughput natural language processing module to extract concepts from clinician notes and map them to Fast Healthcare Interoperability Resources (FHIR). Materials and Methods: Using a popular open-source NLP tool (Apache cTAKES), we create FHIR resources that use modifier extensions to represent negation and NLP sourcing, and another extension to represent provenance of extracted concepts. Results: The SMART Text2FHIR Pipeline is an open-source tool, released through standard package managers, and publicly available container images that implement the mappings, enabling ready conversion of clinical text to FHIR. Discussion: With the increased data liquidity because of new interoperability regulations, NLP processes that can output FHIR can enable a common language for transporting structured and unstructured data. This framework can be valuable for critical public health or clinical research use cases. Conclusion: Future work should include mapping more categories of NLP-extracted information into FHIR resources and mappings from additional open-source NLP tools.
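To make the mapping described in this abstract concrete, the sketch below builds a FHIR-style Condition resource as a plain dictionary, with modifier extensions for negation and NLP sourcing and a regular extension for provenance. It is an illustration under stated assumptions, not the pipeline's actual output: the extension URLs, the function name, and the provenance value are hypothetical placeholders, and only the general `modifierExtension` / `extension` element shapes come from the FHIR specification.

```python
import json

# Hypothetical extension URLs, for illustration only (not the pipeline's real URLs).
NEGATION_EXT_URL = "http://example.org/fhir/StructureDefinition/nlp-polarity"
SOURCE_EXT_URL = "http://example.org/fhir/StructureDefinition/nlp-source"
PROVENANCE_EXT_URL = "http://example.org/fhir/StructureDefinition/derivation-reference"

def concept_to_condition(patient_id, system, code, display, negated, nlp_tool, note_ref):
    """Build a Condition-like dict for one NLP-extracted concept."""
    return {
        "resourceType": "Condition",
        "subject": {"reference": f"Patient/{patient_id}"},
        "code": {"coding": [{"system": system, "code": code, "display": display}]},
        # Modifier extensions change how the resource must be interpreted,
        # which is why negation and NLP sourcing are placed here.
        "modifierExtension": [
            {"url": NEGATION_EXT_URL, "valueBoolean": negated},
            {"url": SOURCE_EXT_URL, "valueString": nlp_tool},
        ],
        # A regular extension pointing back to the note the concept came from.
        "extension": [
            {"url": PROVENANCE_EXT_URL, "valueString": note_ref},
        ],
    }

condition = concept_to_condition(
    patient_id="example-123",
    system="http://snomed.info/sct",
    code="386661006",
    display="Fever",
    negated=True,  # e.g. the note said "denies fever"
    nlp_tool="Apache cTAKES",
    note_ref="DocumentReference/example-note-1",
)
print(json.dumps(condition, indent=2))
```

A consumer that ignores the negation modifier would misread "denies fever" as a fever finding, which is the reason a modifier extension rather than an ordinary one is the natural place for it.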
Affiliation(s)
- Timothy A Miller
  - Boston Children's Hospital, Boston, MA
  - Harvard Medical School, Boston, MA
- Andrew J McMurry
  - Boston Children's Hospital, Boston, MA
  - Harvard Medical School, Boston, MA
- James Jones
  - Boston Children's Hospital, Boston, MA
  - Harvard Medical School, Boston, MA
- Daniel Gottlieb
  - Boston Children's Hospital, Boston, MA
  - Harvard Medical School, Boston, MA
- Kenneth D Mandl
  - Boston Children's Hospital, Boston, MA
  - Harvard Medical School, Boston, MA
4
Wang L, Zipursky AR, Geva A, McMurry AJ, Mandl KD, Miller TA. A computable case definition for patients with SARS-CoV2 testing that occurred outside the hospital. JAMIA Open 2023; 6:ooad047. [PMID: 37425487 PMCID: PMC10322650 DOI: 10.1093/jamiaopen/ooad047] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/01/2023] [Revised: 06/13/2023] [Accepted: 06/30/2023] [Indexed: 07/11/2023]
Abstract
Objective To identify a cohort of COVID-19 cases, including when evidence of virus positivity was only mentioned in the clinical text, not in structured laboratory data in the electronic health record (EHR). Materials and Methods Statistical classifiers were trained on feature representations derived from unstructured text in patient EHRs. We used a proxy dataset of patients with COVID-19 polymerase chain reaction (PCR) tests for training. We selected a model based on performance on our proxy dataset and applied it to instances without COVID-19 PCR tests. A physician reviewed a sample of these instances to validate the classifier. Results On the test split of the proxy dataset, our best classifier obtained 0.56 F1, 0.6 precision, and 0.52 recall scores for SARS-CoV2 positive cases. In an expert validation, the classifier correctly identified 97.6% (81/84) as COVID-19 positive and 97.8% (91/93) as not SARS-CoV2 positive. The classifier labeled an additional 960 cases as not having SARS-CoV2 lab tests in hospital, and only 177 of those cases had the ICD-10 code for COVID-19. Discussion Proxy dataset performance may be worse because these instances sometimes include discussion of pending lab tests. The most predictive features are meaningful and interpretable. The type of external test that was performed is rarely mentioned. Conclusion COVID-19 cases that had testing done outside of the hospital can be reliably detected from the text in EHRs. Training on a proxy dataset was a suitable method for developing a highly performant classifier without labor-intensive labeling efforts.
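As a quick consistency check on the proxy-dataset scores reported above, F1 is the harmonic mean of precision and recall. The sketch below applies that standard definition to the abstract's rounded values; the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

reported_precision = 0.6   # from the abstract
reported_recall = 0.52     # from the abstract

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # 0.56, matching the reported F1
```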
Affiliation(s)
- Lijing Wang
  - Department of Data Science, New Jersey Institute of Technology, Newark, New Jersey, USA
- Amy R Zipursky
  - Computational Health Informatics Program and Department of Emergency Medicine, Boston Children's Hospital, Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Alon Geva
  - Computational Health Informatics Program and Division of Critical Care Medicine, Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Andrew J McMurry
  - Computational Health Informatics Program, Boston Children’s Hospital, Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Kenneth D Mandl
  - Computational Health Informatics Program, Boston Children’s Hospital, Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Timothy A Miller
  - Computational Health Informatics Program, Boston Children’s Hospital, Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
5
Emerson SD, McLinden T, Sereda P, Lima VD, Hogg RS, Kooij KW, Yonkman AM, Salters KA, Moore D, Toy J, Wong J, Consolacion T, Montaner JSG, Barrios R. Identification of people with low prevalence diseases in administrative healthcare records: A case study of HIV in British Columbia, Canada. PLoS One 2023; 18:e0290777. [PMID: 37651428 PMCID: PMC10470893 DOI: 10.1371/journal.pone.0290777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 08/16/2023] [Indexed: 09/02/2023]
Abstract
INTRODUCTION Case-finding algorithms can be applied to administrative healthcare records to identify people with diseases, including people with HIV (PWH). When supplementing an existing registry of a low prevalence disease, near-perfect specificity helps minimize impacts of adding in algorithm-identified false positive cases. We evaluated the performance of algorithms applied to healthcare records to supplement an HIV registry in British Columbia (BC), Canada. METHODS We applied algorithms based on HIV-related diagnostic codes to healthcare practitioner and hospitalization records. We evaluated 28 algorithms in a validation sub-sample of 7,124 persons with positive HIV tests (2,817 with a prior negative test) from the STOP HIV/AIDS data linkage, a linkage of healthcare, clinical, and HIV test records for PWH in BC, resembling a disease registry (1996-2020). Algorithms were primarily assessed based on their specificity (derived from this validation sub-sample) and their impact on the estimate of the total number of PWH in BC as of 2020. RESULTS In the validation sub-sample, median age at positive HIV test was 37 years (Q1: 30, Q3: 46), 80.1% were men, and 48.9% resided in the Vancouver Coastal Health Authority. For all algorithms, specificity exceeded 97% and sensitivity ranged from 81% to 95%. To supplement the HIV registry, we selected an algorithm with 99.89% (95% CI: 99.76% - 100.00%) specificity and 82.21% (95% CI: 81.26% - 83.16%) sensitivity, requiring five HIV-related healthcare practitioner encounters or two HIV-related hospitalizations within a 12-month window, or one hospitalization with HIV as the most responsible diagnosis. Upon adding PWH identified by this highly specific algorithm to the registry, 8,774 PWH were present in BC as of March 2020, of whom 333 (3.8%) were algorithm-identified.
DISCUSSION In the context of an existing low prevalence disease registry, the results of our validation study demonstrate the value of highly specific case-finding algorithms applied to administrative healthcare records to enhance our ability to estimate the number of PWH living in BC.
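The selected case-finding rule in this abstract can be sketched as a small predicate. This is a reading of the abstract's wording under explicit assumptions, not the study's code: the 12-month window is approximated as 365 days and is assumed to apply to both the five-encounter and the two-hospitalization criteria, and all function and parameter names are hypothetical.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=365)  # assumption: "12-month window" approximated as 365 days

def has_k_within_window(dates, k, window=WINDOW):
    """True if any k of the given dates fall within one window of each other."""
    ordered = sorted(dates)
    for i in range(len(ordered) - k + 1):
        if ordered[i + k - 1] - ordered[i] <= window:
            return True
    return False

def meets_case_definition(practitioner_dates, hospitalization_dates,
                          most_responsible_hiv_hospitalizations):
    """Rule from the abstract: five HIV-related practitioner encounters or two
    HIV-related hospitalizations within the window, or at least one hospitalization
    with HIV as the most responsible diagnosis."""
    return (
        has_k_within_window(practitioner_dates, 5)
        or has_k_within_window(hospitalization_dates, 2)
        or most_responsible_hiv_hospitalizations >= 1
    )

# Hypothetical records for illustration.
five_visits = [date(2019, m, 1) for m in (1, 3, 5, 7, 9)]
four_visits = five_visits[:4]
far_apart_stays = [date(2018, 1, 1), date(2019, 6, 1)]

print(meets_case_definition(five_visits, [], 0))
print(meets_case_definition(four_visits, far_apart_stays, 0))
```

Requiring several encounters inside one window is what pushes specificity toward 100% at the cost of sensitivity: a single miscoded visit cannot satisfy the rule on its own.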
Affiliation(s)
- Scott D. Emerson
  - British Columbia Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
- Taylor McLinden
  - British Columbia Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
  - Faculty of Health Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
- Paul Sereda
  - British Columbia Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
- Viviane D. Lima
  - British Columbia Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
  - Department of Medicine, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Robert S. Hogg
  - British Columbia Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
  - Faculty of Health Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
- Katherine W. Kooij
  - British Columbia Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
  - Faculty of Health Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
- Amanda M. Yonkman
  - British Columbia Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
- Kate A. Salters
  - British Columbia Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
  - Faculty of Health Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
- David Moore
  - British Columbia Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
  - Department of Medicine, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Junine Toy
  - British Columbia Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
- Jason Wong
  - British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada
- Theodora Consolacion
  - British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada
- Julio S. G. Montaner
  - British Columbia Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
  - Department of Medicine, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Rolando Barrios
  - British Columbia Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
  - School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada
6
Edgcomb JB, Tseng CH, Pan M, Klomhaus A, Zima BT. Assessing Detection of Children With Suicide-Related Emergencies: Evaluation and Development of Computable Phenotyping Approaches. JMIR Ment Health 2023; 10:e47084. [PMID: 37477974 PMCID: PMC10403798 DOI: 10.2196/47084] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/09/2023] [Revised: 05/11/2023] [Accepted: 05/29/2023] [Indexed: 07/22/2023]
Abstract
BACKGROUND Although suicide is a leading cause of death among children, the optimal approach for using health care data sets to detect suicide-related emergencies among children is not known. OBJECTIVE This study aimed to assess the performance of suicide-related International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) codes and suicide-related chief complaint in detecting self-injurious thoughts and behaviors (SITB) among children compared with clinician chart review. The study also aimed to examine variations in performance by child sociodemographics and type of self-injury, as well as develop machine learning models trained on codified health record data (features) and clinician chart review (gold standard) and test model detection performance. METHODS A gold standard classification of suicide-related emergencies was determined through clinician manual review of clinical notes from 600 emergency department visits between 2015 and 2019 by children aged 10 to 17 years. Visits classified with nonfatal suicide attempt or intentional self-harm using the Centers for Disease Control and Prevention surveillance case definition list of ICD-10-CM codes and suicide-related chief complaint were compared with the gold standard classification. Machine learning classifiers (least absolute shrinkage and selection operator-penalized logistic regression and random forest) were then trained and tested using codified health record data (eg, child sociodemographics, medications, disposition, and laboratory testing) and the gold standard classification. The accuracy, sensitivity, and specificity of each detection approach and relative importance of features were examined. RESULTS SITB accounted for 47.3% (284/600) of the visits. Suicide-related diagnostic codes missed nearly one-third (82/284, 28.9%) and suicide-related chief complaints missed more than half (153/284, 53.9%) of the children presenting to emergency departments with SITB. 
Sensitivity was significantly lower for male children than for female children (0.69, 95% CI 0.61-0.77 vs 0.84, 95% CI 0.78-0.90, respectively) and for preteens compared with adolescents (0.66, 95% CI 0.54-0.78 vs 0.86, 95% CI 0.80-0.92, respectively). Specificity was significantly lower for detecting preparatory acts (0.68, 95% CI 0.64-0.72) and attempts (0.67, 95% CI 0.63-0.71) than for detecting ideation (0.79, 95% CI 0.75-0.82). Machine learning-based models significantly improved the sensitivity of detection compared with suicide-related codes and chief complaint alone. Models considering all 84 features performed similarly to models considering only mental health-related ICD-10-CM codes and chief complaints (34 features) and models considering non-ICD-10-CM code indicators and mental health-related chief complaints (53 features). CONCLUSIONS The capacity to detect children with SITB may be strengthened by applying a machine learning-based approach to codified health record data. To improve integration between clinical research informatics and child mental health care, future research is needed to evaluate the potential benefits of implementing detection approaches at the point of care and identifying precise targets for suicide prevention interventions in children.
Affiliation(s)
- Juliet Beni Edgcomb
  - Mental Health Informatics and Data Science (MINDS) Hub, Center for Community Health, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, United States
  - Department of Psychiatry, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
- Chi-Hong Tseng
  - Department of Medicine Statistics Core, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
- Mengtong Pan
  - Department of Medicine Statistics Core, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
- Alexandra Klomhaus
  - Department of Medicine Statistics Core, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
- Bonnie T Zima
  - Mental Health Informatics and Data Science (MINDS) Hub, Center for Community Health, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, United States
  - Department of Psychiatry, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
7
Miller TA, McMurry AJ, Jones J, Gottlieb D, Mandl KD. The SMART Text2FHIR Pipeline. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.03.21.23287499. [PMID: 37034815 PMCID: PMC10081439 DOI: 10.1101/2023.03.21.23287499] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 04/22/2023]
Abstract
Objective To implement an open source, free, and easily deployable high throughput natural language processing module to extract concepts from clinician notes and map them to Fast Healthcare Interoperability Resources (FHIR). Materials and Methods Using a popular open-source NLP tool (Apache cTAKES), we create FHIR resources that use modifier extensions to represent negation and NLP sourcing, and another extension to represent provenance of extracted concepts. Results The SMART Text2FHIR Pipeline is an open-source tool, released through standard package managers, and publicly available container images that implement the mappings, enabling ready conversion of clinical text to FHIR. Discussion With the increased data liquidity because of new interoperability regulations, NLP processes that can output FHIR can enable a common language for transporting structured and unstructured data. This framework can be valuable for critical public health or clinical research use cases. Conclusion Future work should include mapping more categories of NLP-extracted information into FHIR resources and mappings from additional open-source NLP tools.
Affiliation(s)
- Timothy A Miller
  - Computational Health Informatics Program, Boston Children's Hospital, Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Andrew J McMurry
  - Computational Health Informatics Program, Boston Children's Hospital, Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- James Jones
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Daniel Gottlieb
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Kenneth D Mandl
  - Computational Health Informatics Program, Boston Children's Hospital, Department of Pediatrics, Department of Biomedical Informatics, Harvard Medical School, 401 Park Drive, Landmark Center, 5th Floor East, Boston, MA 02215, U.S.A
8
Wang L, Zipursky A, Geva A, McMurry AJ, Mandl KD, Miller TA. A computable phenotype for patients with SARS-CoV2 testing that occurred outside the hospital. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.01.19.23284738. [PMID: 36711461 PMCID: PMC9882620 DOI: 10.1101/2023.01.19.23284738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2023]
Abstract
Objective To identify a cohort of COVID-19 cases, including when evidence of virus positivity was only mentioned in the clinical text, not in structured laboratory data in the electronic health record (EHR). Materials and Methods Statistical classifiers were trained on feature representations derived from unstructured text in patient electronic health records (EHRs). We used a proxy dataset of patients with COVID-19 polymerase chain reaction (PCR) tests for training. We selected a model based on performance on our proxy dataset and applied it to instances without COVID-19 PCR tests. A physician reviewed a sample of these instances to validate the classifier. Results On the test split of the proxy dataset, our best classifier obtained 0.56 F1, 0.6 precision, and 0.52 recall scores for SARS-CoV2 positive cases. In an expert validation, the classifier correctly identified 90.8% (79/87) as COVID-19 positive and 97.8% (91/93) as not SARS-CoV2 positive. The classifier identified an additional 960 positive cases that did not have SARS-CoV2 lab tests in hospital, and only 177 of those cases had the ICD-10 code for COVID-19. Discussion Proxy dataset performance may be worse because these instances sometimes include discussion of pending lab tests. The most predictive features are meaningful and interpretable. The type of external test that was performed is rarely mentioned. Conclusion COVID-19 cases that had testing done outside of the hospital can be reliably detected from the text in EHRs. Training on a proxy dataset was a suitable method for developing a highly performant classifier without labor intensive labeling efforts.
9
Koscielniak N, Piatt G, Friedman C, Vinson A, Richesson R, Tucker C. Development of a standards-based phenotype model for gross motor function to support learning health systems in pediatric rehabilitation. Learn Health Syst 2022; 6:e10266. [PMID: 35036550 PMCID: PMC8753308 DOI: 10.1002/lrh2.10266] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/25/2020] [Revised: 02/19/2021] [Accepted: 03/29/2021] [Indexed: 11/17/2022]
Abstract
INTRODUCTION Research and continuous quality improvement in pediatric rehabilitation settings require standardized data and a systematic approach to use these data. METHODS We systematically examined pediatric data concepts from a pediatric learning network to determine capacity for capturing gross motor function (GMF) for children with Cerebral Palsy (CP) as a demonstration for enabling infrastructure for research and quality improvement activities of a learning health system (LHS). We used an iterative approach to construct phenotype models of GMF from standardized data element concepts based on case definitions from the Gross Motor Function Classification System (GMFCS). Data concepts were selected using a theory and expert-informed process and resulted in the construction of four phenotype models of GMF: an overall model and three classes corresponding to deviations in GMF for CP populations. RESULTS Sixty-five data element concepts were identified for the overall GMF phenotype model. The 65 data elements correspond to 20 variables and logic statements that instantiate membership into one of three clinically meaningful classes of GMF. Data element concepts and variables are organized into five domains relevant to modeling GMF: Neurologic Function, Mobility Performance, Activity Performance, Motor Performance, and Device Use. CONCLUSION Our experience provides an approach for organizations to leverage existing data for care improvement and research in other conditions. This is the first consensus-based and theory-driven specification of data elements and logic to support identification and labeling of GMF in patients for measuring improvements in care or the impact of new treatments. More research is needed to validate this phenotype model and the extent that these data differentiate between classes of GMF to support various LHS activities.
Affiliation(s)
- Nikolas Koscielniak
  - Clinical and Translational Science Institute, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
- Gretchen Piatt
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Charles Friedman
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Alexandra Vinson
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Rachel Richesson
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Carole Tucker
  - Department of Health and Rehabilitation Sciences, Temple University, Philadelphia, Pennsylvania, USA
10
Ross MK, Zheng H, Zhu B, Lao A, Hong H, Natesan A, Radparvar M, Bui AAT. Accuracy of Asthma Computable Phenotypes to Identify Pediatric Asthma at an Academic Institution. Methods Inf Med 2021; 59:219-226. [PMID: 34261147 DOI: 10.1055/s-0041-1729951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
OBJECTIVES Asthma is a heterogeneous condition with significant diagnostic complexity, including variations in symptoms and temporal criteria. The disease can be difficult for clinicians to diagnose accurately. Properly identifying asthma patients from the electronic health record is consequently challenging as current algorithms (computable phenotypes) rely on diagnostic codes (e.g., International Classification of Diseases, ICD) in addition to other criteria (e.g., inhaler medications) but presume an accurate diagnosis. As such, there is no universally accepted or rigorously tested computable phenotype for asthma. METHODS We compared two established asthma computable phenotypes: the Chicago Area Patient-Outcomes Research Network (CAPriCORN) and Phenotype KnowledgeBase (PheKB). We established a large-scale, consensus gold standard (n = 1,365) from the University of California, Los Angeles Health System's clinical data warehouse for patients 5 to 17 years old. Results were manually reviewed and predictive performance (positive predictive value [PPV], sensitivity/specificity, F1-score) determined. We then examined the classification errors to gain insight for future algorithm optimizations. RESULTS As applied to our final cohort of 1,365 expert-defined gold standard patients, the CAPriCORN algorithms performed with a balanced PPV = 95.8% (95% CI: 94.4-97.2%), sensitivity = 85.7% (95% CI: 83.9-87.5%), and harmonized F1 = 90.4% (95% CI: 89.2-91.7%). The PheKB algorithm performed with a balanced PPV = 83.1% (95% CI: 80.5-85.7%), sensitivity = 69.4% (95% CI: 66.3-72.5%), and F1 = 75.4% (95% CI: 73.1-77.8%). Four categories of errors were identified related to method limitations, disease definition, human error, and design implementation. CONCLUSION The performance of the CAPriCORN and PheKB algorithms was lower than previously reported as applied to pediatric data (PPV = 97.7 and 96%, respectively).
There is room to improve the performance of current methods, including targeted use of natural language processing and clinical feature engineering.
Affiliation(s)
- Mindy K Ross
- Department of Pediatrics, University of California Los Angeles, Los Angeles, California, United States
| | - Henry Zheng
- Department of Radiological Sciences, University of California Los Angeles, Los Angeles, California, United States
| | - Bing Zhu
- Department of Radiological Sciences, University of California Los Angeles, Los Angeles, California, United States
| | - Ailina Lao
- University of California Los Angeles, Los Angeles, California, United States
| | - Hyejin Hong
- University of California Los Angeles, Los Angeles, California, United States
| | - Alamelu Natesan
- Department of Pediatrics, University of California Los Angeles, Los Angeles, California, United States
| | - Melina Radparvar
- Department of Pediatrics, University of California Los Angeles, Los Angeles, California, United States
| | - Alex A T Bui
- Department of Radiological Sciences, University of California Los Angeles, Los Angeles, California, United States
| |
|
11
|
Levin JC, Beam AL, Fox KP, Mandl KD. Medication utilization in children born preterm in the first two years of life. J Perinatol 2021; 41:1732-1738. [PMID: 33547407 PMCID: PMC8277664 DOI: 10.1038/s41372-021-00930-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/02/2020] [Revised: 10/12/2020] [Accepted: 01/15/2021] [Indexed: 11/22/2022]
Abstract
OBJECTIVE To compare medications dispensed during the first 2 years in children born preterm and full-term. STUDY DESIGN Retrospective analysis of claims data from a commercial national managed care plan 2008-2019. 329,855 beneficiaries were enrolled from birth through 2 years, of which 25,408 (7.7%) were preterm (<37 weeks). Filled prescription claims and paid amount over 2 years were identified. RESULTS In preterm children, the number of filled prescriptions was 1.4 times and cost was 3.8 times that of full-term children. Number and cost of medications were inversely related to gestational age. Differences peak at 4-9 months and resolve by 19 months after discharge. Palivizumab, ranitidine, albuterol, lansoprazole, budesonide, and prednisolone had the greatest differences in utilization. CONCLUSION Prescription medication utilization among preterm children under 2 years is driven by palivizumab, anti-reflux, and respiratory medications, despite little evidence regarding efficacy for many medications and concern for harm with certain classes.
Affiliation(s)
- Jonathan C Levin
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA.
- Division of Newborn Medicine, Boston Children's Hospital, Boston, MA, USA.
- Division of Pulmonary Medicine, Boston Children's Hospital, Boston, MA, USA.
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA.
| | - Andrew L Beam
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Kathe P Fox
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Kenneth D Mandl
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| |
|
12
|
Geva A, Abman SH, Manzi SF, Ivy DD, Mullen MP, Griffin J, Lin C, Savova GK, Mandl KD. Adverse drug event rates in pediatric pulmonary hypertension: a comparison of real-world data sources. J Am Med Inform Assoc 2020; 27:294-300. [PMID: 31769835 PMCID: PMC7025334 DOI: 10.1093/jamia/ocz194] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/13/2019] [Revised: 10/08/2019] [Accepted: 10/21/2019] [Indexed: 11/14/2022]
Abstract
Objective Real-world data (RWD) are increasingly used for pharmacoepidemiology and regulatory innovation. Our objective was to compare adverse drug event (ADE) rates determined from two RWD sources, electronic health records and administrative claims data, among children treated with drugs for pulmonary hypertension. Materials and Methods Textual mentions of medications and signs/symptoms that may represent ADEs were identified in clinical notes using natural language processing. Diagnostic codes for the same signs/symptoms were identified in our electronic data warehouse for the patients with textual evidence of taking pulmonary hypertension-targeted drugs. We compared rates of ADEs identified in clinical notes to those identified from diagnostic code data. In addition, we compared putative ADE rates from clinical notes to those from a healthcare claims dataset from a large, national insurer. Results Analysis of clinical notes identified up to 7-fold higher ADE rates than those ascertained from diagnostic codes. However, certain ADEs (eg, hearing loss) were more often identified in diagnostic code data. Similar results were found when ADE rates ascertained from clinical notes and national claims data were compared. Discussion While administrative claims and clinical notes are both increasingly used for RWD-based pharmacovigilance, ADE rates substantially differ depending on data source. Conclusion Pharmacovigilance based on RWD may lead to discrepant results depending on the data source analyzed. Further work is needed to confirm the validity of identified ADEs, to distinguish them from disease effects, and to understand tradeoffs in sensitivity and specificity between data sources.
Affiliation(s)
- Alon Geva
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
- Division of Critical Care Medicine, Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, Massachusetts, USA
- Department of Anaesthesia, Harvard Medical School, Boston, Massachusetts, USA
| | - Steven H Abman
- Division of Pediatric Pulmonary Medicine, Children's Hospital Colorado, Aurora, Colorado, USA
- Department of Pediatrics, University of Colorado School of Medicine, Aurora, Colorado, USA
| | - Shannon F Manzi
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
- Division of Genetics & Genomics, Clinical Pharmacogenomics Service, Department of Pharmacy, Boston Children's Hospital, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
| | - Dunbar D Ivy
- Department of Pediatrics, University of Colorado School of Medicine, Aurora, Colorado, USA
- Division of Cardiology, Heart Institute, Children's Hospital Colorado, Aurora, Colorado, USA
| | - Mary P Mullen
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Department of Cardiology, Boston Children's Hospital, Boston, Massachusetts, USA
| | - John Griffin
- Division of Critical Care Medicine, Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, Massachusetts, USA
| | - Chen Lin
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
| | - Guergana K Savova
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
| | - Kenneth D Mandl
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| |
|
13
|
Geva A, Liu M, Panickan VA, Avillach P, Cai T, Mandl KD. A high-throughput phenotyping algorithm is portable from adult to pediatric populations. J Am Med Inform Assoc 2021; 28:1265-1269. [PMID: 33594412 DOI: 10.1093/jamia/ocaa343] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/12/2020] [Revised: 11/27/2020] [Accepted: 12/28/2020] [Indexed: 11/14/2022]
Abstract
OBJECTIVE Multimodal automated phenotyping (MAP) is a scalable, high-throughput phenotyping method, developed using electronic health record (EHR) data from an adult population. We tested transportability of MAP to a pediatric population. MATERIALS AND METHODS Without additional feature engineering or supervised training, we applied MAP to a pediatric population enrolled in a biobank and evaluated performance against physician-reviewed medical records. We also compared performance of MAP at the pediatric institution and the original adult institution where MAP was developed, including for 6 phenotypes validated at both institutions against physician-reviewed medical records. RESULTS MAP performed equally well in the pediatric setting (average AUC 0.98) as it did at the general adult hospital system (average AUC 0.96). MAP's performance in the pediatric sample was similar across the 6 specific phenotypes also validated against gold-standard labels in the adult biobank. CONCLUSIONS MAP is highly transportable across diverse populations and has potential for wide-scale use.
Affiliation(s)
- Alon Geva
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
- Division of Critical Care Medicine, Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, Massachusetts, USA
- Department of Anaesthesia, Harvard Medical School, Boston, Massachusetts, USA
| | - Molei Liu
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
| | - Vidul A Panickan
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Paul Avillach
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
| | - Tianxi Cai
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Kenneth D Mandl
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
| |
|
14
|
Rostam Niakan Kalhori S, Tanhapour M, Gholamzadeh M. Enhanced childhood diseases treatment using computational models: Systematic review of intelligent experiments heading to precision medicine. J Biomed Inform 2021; 115:103687. [PMID: 33497811 DOI: 10.1016/j.jbi.2021.103687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2020] [Revised: 12/05/2020] [Accepted: 01/18/2021] [Indexed: 10/22/2022]
Abstract
INTRODUCTION Precision or personalized medicine (PM) is used for the prevention and treatment of diseases by considering a huge amount of information about individuals' variables. Due to the high volume of information, AI-based computational models are required. A large set of studies has been conducted to examine the PM approach to improving childhood clinical outcomes. Thus, the main goal of this study was to review the application of health information technology, and especially artificial intelligence (AI) methods, for the treatment of childhood disease using PM. METHODS PubMed, Scopus, Web of Science, and EMBASE databases were searched up to December 18, 2019. Articles that focused on informatics applications for childhood disease PM were included in this study. Included papers were classified for qualitative analysis and interpretation of results. The results were analyzed using Microsoft Excel 2019. RESULTS From 341 citations, 62 papers met our inclusion criteria. The number of published papers that used AI methods to apply PM in childhood diseases increased from 2010 to 2019. Our results showed that most applied methods were related to the machine learning discipline. In terms of clinical scope, the largest number of clinical articles were devoted to oncology. In addition, the analysis showed that genomics was the most commonly used PM approach for childhood disease. CONCLUSION This systematic review examined papers that used AI methods for applying PM approaches in childhood diseases from a medical informatics perspective. Thus, it provided new insight to researchers who are interested in knowing research needs in this field.
Affiliation(s)
- Sharareh Rostam Niakan Kalhori
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
| | - Mozhgan Tanhapour
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
| | - Marsa Gholamzadeh
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran.
| |
|
15
|
Ong M, Klann JG, Lin KJ, Maron BA, Murphy SN, Natter MD, Mandl KD. Claims-Based Algorithms for Identifying Patients With Pulmonary Hypertension: A Comparison of Decision Rules and Machine-Learning Approaches. J Am Heart Assoc 2020; 9:e016648. [PMID: 32990147 PMCID: PMC7792386 DOI: 10.1161/jaha.120.016648] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 01/08/2023]
Abstract
Background Real‐world healthcare data are an important resource for epidemiologic research. However, accurate identification of patient cohorts—a crucial first step underpinning the validity of research results—remains a challenge. We developed and evaluated claims‐based case ascertainment algorithms for pulmonary hypertension (PH), comparing conventional decision rules with state‐of‐the‐art machine‐learning approaches. Methods and Results We analyzed an electronic health record‐Medicare linked database from two large academic tertiary care hospitals (years 2007–2013). Electronic health record charts were reviewed to form a gold standard cohort of patients with (n=386) and without PH (n=164). Using health encounter data captured in Medicare claims (including patients’ demographics, diagnoses, medications, and procedures), we developed and compared 2 approaches for identifying patients with PH: decision rules and machine‐learning algorithms using penalized lasso regression, random forest, and gradient boosting machine. The most optimal rule‐based algorithm—having ≥3 PH‐related healthcare encounters and having undergone right heart catheterization—attained an area under the receiver operating characteristic curve of 0.64 (sensitivity, 0.75; specificity, 0.48). All 3 machine‐learning algorithms outperformed the most optimal rule‐based algorithm (P<0.001). A model derived from the random forest algorithm achieved an area under the receiver operating characteristic curve of 0.88 (sensitivity, 0.87; specificity, 0.70), and gradient boosting machine achieved comparable results (area under the receiver operating characteristic curve, 0.85; sensitivity, 0.87; specificity, 0.70). Penalized lasso regression achieved an area under the receiver operating characteristic curve of 0.73 (sensitivity, 0.70; specificity, 0.68). Conclusions Research‐grade case identification algorithms for PH can be derived and rigorously validated using machine‐learning algorithms. 
Simple decision rules commonly applied in published literature performed poorly; more complex rule‐based algorithms may potentially address the limitation of this approach. PH research using claims data would be considerably strengthened through the use of validated algorithms for cohort ascertainment.
Affiliation(s)
- Mei‐Sing Ong
- Department of Population Medicine, Harvard Medical School & Harvard Pilgrim Health Care Institute, Boston, MA
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
| | - Jeffrey G. Klann
- Laboratory of Computer Science, Massachusetts General Hospital, Harvard Medical School, Boston, MA
| | - Kueiyu Joshua Lin
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
| | - Bradley A. Maron
- Cardiovascular Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
| | - Shawn N. Murphy
- Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA
| | - Marc D. Natter
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- Department of Pediatrics, Harvard Medical School, Boston, MA
| | - Kenneth D. Mandl
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- Department of Pediatrics, Harvard Medical School, Boston, MA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
| |
|
16
|
Geva A, Stedman JP, Manzi SF, Lin C, Savova GK, Avillach P, Mandl KD. Adverse drug event presentation and tracking (ADEPT): semiautomated, high throughput pharmacovigilance using real-world data. JAMIA Open 2020; 3:413-421. [PMID: 33215076 PMCID: PMC7660953 DOI: 10.1093/jamiaopen/ooaa031] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/17/2020] [Revised: 06/23/2020] [Accepted: 06/27/2020] [Indexed: 11/24/2022]
Abstract
Objective To advance use of real-world data (RWD) for pharmacovigilance, we sought to integrate a high-sensitivity natural language processing (NLP) pipeline for detecting potential adverse drug events (ADEs) with easily interpretable output for high-efficiency human review and adjudication of true ADEs. Materials and methods The adverse drug event presentation and tracking (ADEPT) system employs an open source NLP pipeline to identify in clinical notes mentions of medications and signs and symptoms potentially indicative of ADEs. ADEPT presents the output to human reviewers by highlighting these drug-event pairs within the context of the clinical note. To measure incidence of seizures associated with sildenafil, we applied ADEPT to 149 029 notes for 982 patients with pediatric pulmonary hypertension. Results Of 416 patients identified as taking sildenafil, NLP found 72 [17%, 95% confidence interval (CI) 14–21] with seizures as a potential ADE. Upon human review and adjudication, only 4 (0.96%, 95% CI 0.37–2.4) patients with seizures were determined to have true ADEs. Reviewers using ADEPT required a median of 89 s (interquartile range 57–142 s) per patient to review potential ADEs. Discussion ADEPT combines high throughput NLP, to increase sensitivity of ADE detection, with human review, to increase specificity by differentiating true ADEs from signs and symptoms related to comorbidities, effects of other medications, or other confounders. Conclusion ADEPT is a promising tool for creating gold standard, patient-level labels for advancing NLP-based pharmacovigilance. ADEPT is a potentially time-saving platform for computer-assisted pharmacovigilance based on RWD.
Affiliation(s)
- Alon Geva
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
- Division of Critical Care Medicine, Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, Massachusetts, USA
- Department of Anaesthesia, Harvard Medical School, Boston, Massachusetts, USA
| | - Jason P Stedman
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Shannon F Manzi
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
- Clinical Pharmacogenomics Service, Division of Genetics & Genomics and Department of Pharmacy, Boston Children's Hospital, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
| | - Chen Lin
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
| | - Guergana K Savova
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
| | - Paul Avillach
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Kenneth D Mandl
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
| |
|
17
|
Electronic health records for the diagnosis of rare diseases. Kidney Int 2020; 97:676-686. [DOI: 10.1016/j.kint.2019.11.037] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 04/15/2019] [Revised: 11/15/2019] [Accepted: 11/22/2019] [Indexed: 01/13/2023]
|
18
|
Brasil S, Pascoal C, Francisco R, dos Reis Ferreira V, Videira PA, Valadão G. Artificial Intelligence (AI) in Rare Diseases: Is the Future Brighter? Genes (Basel) 2019; 10:978. [PMID: 31783696 PMCID: PMC6947640 DOI: 10.3390/genes10120978] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 09/30/2019] [Revised: 11/19/2019] [Accepted: 11/20/2019] [Indexed: 02/06/2023]
Abstract
The amount of data collected and managed in (bio)medicine is ever-increasing. Thus, there is a need to rapidly and efficiently collect, analyze, and characterize all this information. Artificial intelligence (AI), with an emphasis on deep learning, holds great promise in this area and is already being successfully applied to basic research, diagnosis, drug discovery, and clinical trials. Rare diseases (RDs), which are severely underrepresented in basic and clinical research, can particularly benefit from AI technologies. Of the more than 7000 RDs described worldwide, only 5% have a treatment. The ability of AI technologies to integrate and analyze data from different sources (e.g., multi-omics, patient registries, and so on) can be used to overcome RDs’ challenges (e.g., low diagnostic rates, reduced number of patients, geographical dispersion, and so on). Ultimately, RDs’ AI-mediated knowledge could significantly boost therapy development. Presently, there are AI approaches being used in RDs and this review aims to collect and summarize these advances. A section dedicated to congenital disorders of glycosylation (CDG), a particular group of orphan RDs that can serve as a potential study model for other common diseases and RDs, has also been included.
Affiliation(s)
- Sandra Brasil
- Portuguese Association for CDG, 2820-381 Lisboa, Portugal; (S.B.); (C.P.); (R.F.); (P.A.V.)
- CDG & Allies—Professionals and Patient Associations International Network (CDG & Allies—PPAIN), Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Lisboa, Portugal
| | - Carlota Pascoal
- Portuguese Association for CDG, 2820-381 Lisboa, Portugal; (S.B.); (C.P.); (R.F.); (P.A.V.)
- CDG & Allies—Professionals and Patient Associations International Network (CDG & Allies—PPAIN), Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Lisboa, Portugal
- UCIBIO, Departamento Ciências da Vida, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Lisboa, Portugal
| | - Rita Francisco
- Portuguese Association for CDG, 2820-381 Lisboa, Portugal; (S.B.); (C.P.); (R.F.); (P.A.V.)
- CDG & Allies—Professionals and Patient Associations International Network (CDG & Allies—PPAIN), Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Lisboa, Portugal
- UCIBIO, Departamento Ciências da Vida, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Lisboa, Portugal
| | - Vanessa dos Reis Ferreira
- Portuguese Association for CDG, 2820-381 Lisboa, Portugal; (S.B.); (C.P.); (R.F.); (P.A.V.)
- CDG & Allies—Professionals and Patient Associations International Network (CDG & Allies—PPAIN), Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Lisboa, Portugal
| | - Paula A. Videira
- Portuguese Association for CDG, 2820-381 Lisboa, Portugal; (S.B.); (C.P.); (R.F.); (P.A.V.)
- CDG & Allies—Professionals and Patient Associations International Network (CDG & Allies—PPAIN), Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Lisboa, Portugal
- UCIBIO, Departamento Ciências da Vida, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Lisboa, Portugal
| | - Gonçalo Valadão
- Instituto de Telecomunicações, 1049-001 Lisboa, Portugal
- Departamento de Ciências e Tecnologias, Autónoma Techlab–Universidade Autónoma de Lisboa, 1169-023 Lisboa, Portugal
- Electronics, Telecommunications and Computers Engineering Department, Instituto Superior de Engenharia de Lisboa, 1959-007 Lisboa, Portugal
| |
|
19
|
High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP). Nat Protoc 2019; 14:3426-3444. [PMID: 31748751 DOI: 10.1038/s41596-019-0227-6] [Citation(s) in RCA: 91] [Impact Index Per Article: 15.2] [Received: 10/19/2018] [Accepted: 07/22/2019] [Indexed: 01/12/2023]
Abstract
Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1-2 d if all data are available; however, the timing is largely dependent on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no).
|
20
|
Denburg MR, Razzaghi H, Bailey LC, Soranno DE, Pollack AH, Dharnidharka VR, Mitsnefes MM, Smoyer WE, Somers MJG, Zaritsky JJ, Flynn JT, Claes DJ, Dixon BP, Benton M, Mariani LH, Forrest CB, Furth SL. Using Electronic Health Record Data to Rapidly Identify Children with Glomerular Disease for Clinical Research. J Am Soc Nephrol 2019; 30:2427-2435. [PMID: 31732612 DOI: 10.1681/asn.2019040365] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 04/10/2019] [Accepted: 08/27/2019] [Indexed: 11/03/2022]
Abstract
BACKGROUND The rarity of pediatric glomerular disease makes it difficult to identify sufficient numbers of participants for clinical trials. This leaves limited data to guide improvements in care for these patients. METHODS The authors developed and tested an electronic health record (EHR) algorithm to identify children with glomerular disease. We used EHR data from 231 patients with glomerular disorders at a single center to develop a computerized algorithm comprising diagnosis, kidney biopsy, and transplant procedure codes. The algorithm was tested using PEDSnet, a national network of eight children's hospitals with data on >6.5 million children. Patients with three or more nephrologist encounters (n=55,560) not meeting the computable phenotype definition of glomerular disease were defined as nonglomerular cases. A reviewer blinded to case status used a standardized form to review random samples of cases (n=800) and nonglomerular cases (n=798). RESULTS The final algorithm consisted of two or more diagnosis codes from a qualifying list or one diagnosis code and a pretransplant biopsy. Performance characteristics among the population with three or more nephrology encounters were sensitivity, 96% (95% CI, 94% to 97%); specificity, 93% (95% CI, 91% to 94%); positive predictive value (PPV), 89% (95% CI, 86% to 91%); negative predictive value, 97% (95% CI, 96% to 98%); and area under the receiver operating characteristics curve, 94% (95% CI, 93% to 95%). Requiring that the sum of nephrotic syndrome diagnosis codes exceed that of glomerulonephritis codes identified children with nephrotic syndrome or biopsy-based minimal change nephropathy, FSGS, or membranous nephropathy, with 94% sensitivity and 92% PPV. The algorithm identified 6657 children with glomerular disease across PEDSnet, ≥50% of whom were seen within 18 months. CONCLUSIONS The authors developed an EHR-based algorithm and demonstrated that it had excellent classification accuracy across PEDSnet. 
This tool may enable faster identification of cohorts of pediatric patients with glomerular disease for observational or prospective studies.
Affiliation(s)
- Michelle R Denburg
- Division of Nephrology, Department of Pediatrics, and Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania
- Center for Pediatric Clinical Effectiveness
| | | | - L Charles Bailey
- Department of Pediatrics, Applied Clinical Research Center, and Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
| | - Danielle E Soranno
- Renal Section, Department of Pediatrics, University of Colorado School of Medicine, Aurora, Colorado
| | - Ari H Pollack
- Division of Nephrology, Department of Pediatrics, Seattle Children's Hospital, University of Washington, Seattle, Washington
| | - Vikas R Dharnidharka
- Division of Nephrology, Department of Pediatrics, St. Louis Children's Hospital, Washington University in St. Louis, St. Louis, Missouri
| | - Mark M Mitsnefes
- Division of Nephrology, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, Ohio
| | - William E Smoyer
- Division of Nephrology, Department of Pediatrics, Nationwide Children's Hospital, The Ohio State University, Columbus, Ohio
| | - Michael J G Somers
- Division of Nephrology, Department of Medicine, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts
| | - Joshua J Zaritsky
- Division of Nephrology, Nemours/Alfred I. DuPont Hospital for Children, Wilmington, Delaware
| | - Joseph T Flynn
- Division of Nephrology, Department of Pediatrics, Seattle Children's Hospital, University of Washington, Seattle, Washington
| | - Donna J Claes
- Division of Nephrology, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, Ohio
| | - Bradley P Dixon
- Renal Section, Department of Pediatrics, University of Colorado School of Medicine, Aurora, Colorado
| | | | - Laura H Mariani
- Division of Nephrology, Department of Medicine, University of Michigan, Ann Arbor, Michigan
| | - Christopher B Forrest
- Department of Pediatrics; Applied Clinical Research Center; Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
| | - Susan L Furth
- Division of Nephrology; Department of Pediatrics; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania
| |
|
21
|
Hansmann G, Koestenberger M, Alastalo TP, Apitz C, Austin ED, Bonnet D, Budts W, D'Alto M, Gatzoulis MA, Hasan BS, Kozlik-Feldmann R, Kumar RK, Lammers AE, Latus H, Michel-Behnke I, Miera O, Morrell NW, Pieles G, Quandt D, Sallmon H, Schranz D, Tran-Lundmark K, Tulloh RMR, Warnecke G, Wåhlander H, Weber SC, Zartner P. 2019 updated consensus statement on the diagnosis and treatment of pediatric pulmonary hypertension: The European Pediatric Pulmonary Vascular Disease Network (EPPVDN), endorsed by AEPC, ESPR and ISHLT. J Heart Lung Transplant 2019; 38:879-901. [PMID: 31495407 DOI: 10.1016/j.healun.2019.06.022] [Citation(s) in RCA: 268] [Impact Index Per Article: 44.7] [Received: 05/21/2019] [Revised: 06/14/2019] [Accepted: 06/15/2019] [Indexed: 02/03/2023]
Abstract
The European Pediatric Pulmonary Vascular Disease Network is a registered, non-profit organization that strives to define and develop effective, innovative diagnostic methods and treatment options in all forms of pediatric pulmonary hypertensive vascular disease, including pulmonary hypertension (PH) associated with bronchopulmonary dysplasia, PH associated with congenital heart disease (CHD), persistent PH of the newborn, and related cardiac dysfunction. The executive writing group members conducted searches of the PubMed/MEDLINE bibliographic database (1990-2018) and held face-to-face and web-based meetings. Ten section task forces voted on the updated recommendations, based on the 2016 executive summary. Clinical trials, meta-analyses, guidelines, and other articles that include pediatric data were searched using the term "pulmonary hypertension" and other keywords. Class of recommendation (COR) and level of evidence (LOE) were assigned based on European Society of Cardiology/American Heart Association definitions and on pediatric data only, or on adult studies that included >10% children or studies that enrolled adults with CHD. New definitions by the World Symposium on Pulmonary Hypertension 2018 were included. We generated 10 tables with graded recommendations (COR/LOE). The topics include diagnosis/monitoring, genetics/biomarkers, cardiac catheterization, echocardiography, cardiac magnetic resonance/chest computed tomography, associated forms of PH, intensive care unit/lung transplantation, and treatment of pediatric PH. For the first time, a set of specific recommendations on the management of PH in middle- and low-income regions was developed. Taken together, these executive, up-to-date guidelines provide a specific, comprehensive, detailed but practical framework for the optimal clinical care of children and young adults with PH.
Affiliation(s)
- Georg Hansmann
- Department of Pediatric Cardiology and Critical Care, Hannover Medical School, Hannover, Germany
| | - Martin Koestenberger
- Division of Pediatric Cardiology, Department of Pediatrics, Medical University Graz, Graz, Austria
| | | | - Christian Apitz
- Division of Pediatric Cardiology, Children's University Hospital Ulm, Ulm, Germany
| | - Eric D Austin
- Department of Pediatrics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
| | - Damien Bonnet
- Unité Médico-Chirurgicale de Cardiologie Congénitale et Pédiatrique, Hôpital Necker Enfants Malades, Université Paris Descartes, Sorbonne, Paris, France
| | - Werner Budts
- Congenital and Structural Cardiology, University Hospitals Leuven, Leuven, Belgium
| | - Michele D'Alto
- Cardiology, University L. Vanvitelli - Monaldi Hospital, Naples, Italy
| | - Michael A Gatzoulis
- Adult Congenital Heart Centre and National Centre for Pulmonary Hypertension, Royal Brompton Hospital, London, United Kingdom
| | - Babar S Hasan
- Department of Pediatrics and Child Health, The Aga Khan University, Karachi, Pakistan
| | - Rainer Kozlik-Feldmann
- Department of Pediatric Cardiology, University Heart Center, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - R Krishna Kumar
- Department of Pediatric Cardiology, Amrita Institute of Medical Sciences, Amrita Vishwa Vidyapeetham, Kochi, Kerala, India
| | - Astrid E Lammers
- Department of Pediatric Cardiology, University of Münster, Münster, Germany
| | - Heiner Latus
- Department of Paediatric Cardiology and Congenital Heart Defects, German Heart Centre, Munich, Germany
| | - Ina Michel-Behnke
- Pediatric Heart Center, Division of Pediatric Cardiology, University Hospital for Children and Adolescents, Medical University Vienna, Vienna, Austria
| | - Oliver Miera
- Department of Congenital Heart Disease and Pediatric Cardiology, German Heart Institute Berlin (DHZB), Berlin, Germany
| | - Nicholas W Morrell
- Department of Medicine, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge, United Kingdom
| | - Guido Pieles
- National Institute for Health Research (NIHR) Cardiovascular Biomedical Research Centre, Congenital Heart Unit, Bristol Royal Hospital for Children and Bristol Heart Institute, Bristol, United Kingdom
| | - Daniel Quandt
- Pediatric Cardiology, Pediatric Heart Center, Department of Surgery, University Children's Hospital Zurich, Zurich, Switzerland
| | - Hannes Sallmon
- Department of Pediatric Cardiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
| | - Dietmar Schranz
- Hessen Pediatric Heart Center Giessen & Frankfurt, Goethe University Frankfurt, Frankfurt, Germany
| | - Karin Tran-Lundmark
- The Pediatric Heart Center and the Department of Experimental Medical Science, Lund University, Lund, Sweden
| | - Robert M R Tulloh
- Bristol Heart Institute, University Hospitals Bristol, Bristol, United Kingdom
| | - Gregor Warnecke
- Department of Cardiothoracic, Transplantation and Vascular Surgery, Hannover Medical School, Hannover, Germany
| | - Håkan Wåhlander
- The Queen Silvia Children's Hospital, Sahlgrenska University Hospital, Institution of Clinical Sciences, Gothenburg University, Gothenburg, Sweden
| | - Sven C Weber
- Department of Pediatric Cardiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
| | - Peter Zartner
- Department of Paediatric Cardiology, German Pediatric Heart Centre, Sankt Augustin, Germany
| |
|
22
|
Gleason KT, Dennison Himmelfarb CR, Ford DE, Lehmann H, Samuel L, Han HR, Jain SK, Naccarelli GV, Aggarwal V, Nazarian S. Association of sex, age and education level with patient reported outcomes in atrial fibrillation. BMC Cardiovasc Disord 2019; 19:85. [PMID: 30953478 PMCID: PMC6451250 DOI: 10.1186/s12872-019-1059-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/12/2018] [Accepted: 03/21/2019] [Indexed: 12/22/2022]
Abstract
BACKGROUND In atrial fibrillation (AF), there are known sex and sociodemographic disparities in clinical outcomes such as stroke. We investigate whether disparities also exist with respect to patient-reported outcomes. We explored the association of sex, age, and education level with patient-reported outcomes (AF-related quality of life, symptom severity, and emotional and functional status). METHODS The PaTH AF cohort study recruited participants (N = 953) with an AF diagnosis and age ≥ 18 years across 4 academic medical centers. We performed longitudinal multiple regression with random effects to determine if individual characteristics were associated with patient-reported outcomes. RESULTS Women reported poorer functional status (β -2.23, 95% CI: -3.52, -0.94) and AF-related quality of life (β -4.12, 95% CI: -8.10, -0.14), and higher symptoms of anxiety (β 2.08, 95% CI: 0.76, 3.40), depression (β 1.44, 95% CI: 0.25, 2.63), and AF (β 0.29, 95% CI: 0.08, 0.50). Individuals < 60 years were significantly (p < 0.05) more likely to report higher symptoms of depression, anxiety, and AF, and poorer AF-related quality of life. Lack of college education was associated with reporting higher symptoms of AF (β 0.42, 95% CI: 0.17, 0.68), anxiety (β 1.86, 95% CI: 0.26, 3.45), and depression (β 1.11, 95% CI: 0.15, 2.38), and lower AF-related quality of life (β -4.41, 95% CI: -8.25, -0.57) and functional status. CONCLUSION Women, younger adults, and individuals with lower levels of education reported comparatively poor patient-reported outcomes. These findings highlight the importance of understanding why individuals experience AF differently based on certain characteristics.
Affiliation(s)
- Kelly T. Gleason
- School of Nursing, Johns Hopkins University, 525 N Wolfe Street, Baltimore, MD 21205 USA
| | | | - Daniel E. Ford
- School of Medicine, Johns Hopkins University, Baltimore, MD USA
| | - Harold Lehmann
- School of Medicine, Johns Hopkins University, Baltimore, MD USA
| | - Laura Samuel
- School of Nursing, Johns Hopkins University, 525 N Wolfe Street, Baltimore, MD 21205 USA
| | - Hae Ra Han
- School of Nursing, Johns Hopkins University, 525 N Wolfe Street, Baltimore, MD 21205 USA
| | - Sandeep K. Jain
- School of Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA USA
| | | | - Vikas Aggarwal
- University of Michigan Health System/Frankel Cardiovascular Center, Ann Arbor, MI USA
| | - Saman Nazarian
- School of Medicine, Johns Hopkins University, Baltimore, MD USA
- School of Medicine, University of Pennsylvania, Philadelphia, PA USA
| |
|
23
|
Identification of patients with hemoglobin SS/Sβ0 thalassemia disease and pain crises within electronic health records. Blood Adv 2019; 2:1172-1179. [PMID: 29792312 DOI: 10.1182/bloodadvances.2018017541] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 03/09/2018] [Accepted: 04/16/2018] [Indexed: 11/20/2022]
Abstract
Electronic health records (EHRs) are a source of big data that provide opportunities for conducting population-based studies and creating learning health systems, especially for rare conditions such as sickle cell disease (SCD). The objective of our study is to validate algorithms for accurate identification of patients with hemoglobin (Hb) SS/Sβ0 thalassemia and acute care encounters for pain among SCD patients within an EHR warehouse. We used data for children receiving care at Children's Hospital of Wisconsin from 2013 to 2016 to test the accuracy of the 2 algorithms. The algorithm for genotype identification used composite information (blood test results, transcranial Doppler) along with diagnosis codes. Acute pain encounters were identified using diagnosis codes and further refined by using prescription of IV pain medications. Sensitivities and specificities were calculated for the algorithms. Predictive values for the algorithm to identify SCD genotype were calculated. For all assessments, the local SCD registry and patients' charts were considered gold standards. These included 360 children with SCD, of whom 51% were females. Our algorithm to identify patients with HbSS/Sβ0 thalassemia demonstrated sensitivity of 89.9% (confidence interval [CI], 85.1%-93.7%) and specificity of 97.1% (CI, 92.7%-99.2%). This algorithm had a positive and negative predictive value of 97.9% (CI, 94.8%-99.9%) and 88.7% (CI, 82.6%-93.3%), respectively. Acute pain crises encounters were identified with a sensitivity and specificity of 95.1% (CI, 86.3%-99.0%) and 96.1% (CI, 88.3%-99.6%). This study demonstrates the feasibility of accurately identifying patients with specific types of SCD and pain crises within an EHR.
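The sensitivity, specificity, and predictive values quoted above are standard functions of a 2x2 table comparing algorithm output against a gold standard (here, the registry and chart review). The sketch below shows those textbook definitions only; the function name `diagnostic_metrics` and the counts are hypothetical placeholders, not figures from the study, and confidence intervals are not computed.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard 2x2 accuracy measures; assumes every denominator is nonzero."""
    return {
        "sensitivity": tp / (tp + fn),  # true cases the algorithm flags
        "specificity": tn / (tn + fp),  # non-cases the algorithm leaves unflagged
        "ppv": tp / (tp + fp),          # flagged patients who are true cases
        "npv": tn / (tn + fn),          # unflagged patients who are true non-cases
    }


# Hypothetical counts chosen for illustration only (not study data).
metrics = diagnostic_metrics(tp=90, fp=5, fn=10, tn=95)
```

Unlike sensitivity and specificity, the two predictive values shift with the prevalence of the condition in the reviewed sample, so they should be read alongside the cohort description.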
|
24
|
Ning W, Chan S, Beam A, Yu M, Geva A, Liao K, Mullen M, Mandl KD, Kohane I, Cai T, Yu S. Feature extraction for phenotyping from semantic and knowledge resources. J Biomed Inform 2019; 91:103122. [PMID: 30738949 PMCID: PMC6424621 DOI: 10.1016/j.jbi.2019.103122] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 01/10/2023]
Abstract
OBJECTIVE Phenotyping algorithms can efficiently and accurately identify patients with a specific disease phenotype and construct electronic health records (EHR)-based cohorts for subsequent clinical or genomic studies. Previous studies have introduced unsupervised EHR-based feature selection methods that yielded algorithms with high accuracy. However, those selection methods still require expert intervention to tweak the parameter settings according to the EHR data distribution for each phenotype. To further accelerate the development of phenotyping algorithms, we propose a fully automated and robust unsupervised feature selection method that leverages only publicly available medical knowledge sources, instead of EHR data. METHODS SEmantics-Driven Feature Extraction (SEDFE) collects medical concepts from online knowledge sources as candidate features and gives them vector-form distributional semantic representations derived with neural word embedding and the Unified Medical Language System Metathesaurus. A number of features that are semantically closest and that sufficiently characterize the target phenotype are determined by a linear decomposition criterion and are selected for the final classification algorithm. RESULTS SEDFE was compared with the EHR-based SAFE algorithm and domain experts on feature selection for the classification of five phenotypes including coronary artery disease, rheumatoid arthritis, Crohn's disease, ulcerative colitis, and pediatric pulmonary arterial hypertension using both supervised and unsupervised approaches. Algorithms yielded by SEDFE achieved comparable accuracy to those yielded by SAFE and expert-curated features. SEDFE is also robust to the input semantic vectors. CONCLUSION SEDFE attains satisfying performance in unsupervised feature selection for EHR phenotyping. Both fully automated and EHR-independent, this method promises efficiency and accuracy in developing algorithms for high-throughput phenotyping.
Affiliation(s)
- Wenxin Ning
- Department of Industrial Engineering, Tsinghua University, Beijing, China
| | - Stephanie Chan
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Andrew Beam
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Ming Yu
- Department of Industrial Engineering, Tsinghua University, Beijing, China
| | - Alon Geva
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA; Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, MA, USA; Department of Anesthesia, Harvard Medical School, Boston, MA, USA
| | - Katherine Liao
- Department of Medicine, Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Mary Mullen
- Department of Cardiology, Boston Children's Hospital, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
| | - Kenneth D Mandl
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Isaac Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Tianxi Cai
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Sheng Yu
- Center for Statistical Science, Tsinghua University, Beijing, China; Department of Industrial Engineering, Tsinghua University, Beijing, China; Institute for Data Science, Tsinghua University, Beijing, China
| |
|
25
|
Papani R, Sharma G, Agarwal A, Callahan SJ, Chan WJ, Kuo YF, Shim YM, Mihalek AD, Duarte AG. Validation of claims-based algorithms for pulmonary arterial hypertension. Pulm Circ 2018; 8:2045894018759246. [PMID: 29480064 PMCID: PMC5833187 DOI: 10.1177/2045894018759246] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 12/13/2022]
Abstract
Administrative claims studies do not adequately distinguish pulmonary arterial hypertension (PAH) from other forms of pulmonary hypertension (PH). Our aim is to develop and validate a set of algorithms using International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes and electronic medical records (EMR), to identify patients with PAH. From January 2012 to August 2015, the EMRs of patients with ICD-9-CM codes for PH with an outpatient visit at the University of Texas Medical Branch were reviewed. Patients were divided into PAH or non-PAH groups according to EMR encounter diagnosis. Patient demographics, echocardiography, right heart catheterization (RHC) results, and PAH-specific therapies were assessed. RHC measurements were reviewed to categorize cases as hemodynamically determined PAH or not PAH. Weighted sensitivity, specificity, and positive and negative predictive values were calculated for the developed algorithms. A logistic regression analysis was conducted to determine how well the algorithms performed. External validation was performed at the University of Virginia Health System. The cohort for the development algorithms consisted of 683 patients with PH, PAH group (n = 191) and non-PAH group (n = 492). A hemodynamic diagnosis of PAH determined by RHC was recorded in the PAH (26%) and non-PAH (3%) groups. The positive predictive value for the algorithm that included ICD-9-CM and PAH-specific medications was 66.9% and sensitivity was 28.2% with a c-statistic of 0.66. The positive predictive value for the EMR-based algorithm that included ICD-9-CM, EMR encounter diagnosis, echocardiography, RHC, and PAH-specific medication was 69.4% with a c-statistic of 0.87. A validation cohort of 177 patients with PH examined from August 2015 to August 2016 using EMR-based algorithms yielded a similar positive predictive value of 62.5%. In conclusion, claims-based algorithms that included ICD-9-CM codes, EMR encounter diagnosis, echocardiography, RHC, and PAH-specific medications better identified patients with PAH than ICD-9-CM codes alone.
Affiliation(s)
- Ravikanth Papani
- Division of Pulmonary, Critical Care, and Sleep Medicine, University of Texas Medical Branch, Galveston, TX, USA
| | - Gulshan Sharma
- Division of Pulmonary, Critical Care, and Sleep Medicine, University of Texas Medical Branch, Galveston, TX, USA
| | - Amitesh Agarwal
- Division of Pulmonary, Critical Care, and Sleep Medicine, University of Florida College of Medicine, Jacksonville, FL, USA
| | - Sean J Callahan
- Division of Pulmonary and Critical Care Medicine, University of Virginia School of Medicine, Charlottesville, VA, USA
| | - Winston J Chan
- Office of Biostatistics, Department of Preventive Medicine and Community Health, University of Texas Medical Branch, Galveston, TX, USA
| | - Yong-Fang Kuo
- Office of Biostatistics, Department of Preventive Medicine and Community Health, University of Texas Medical Branch, Galveston, TX, USA
| | - Yun M Shim
- Division of Pulmonary and Critical Care Medicine, University of Virginia School of Medicine, Charlottesville, VA, USA
| | - Andrew D Mihalek
- Division of Pulmonary and Critical Care Medicine, University of Virginia School of Medicine, Charlottesville, VA, USA
| | - Alexander G Duarte
- Division of Pulmonary, Critical Care, and Sleep Medicine, University of Texas Medical Branch, Galveston, TX, USA
| |
|