1
Azhir A, Hügel J, Tian J, Cheng J, Bassett IV, Bell DS, Bernstam EV, Farhat MR, Henderson DW, Lau ES, Morris M, Semenov YR, Triant VA, Visweswaran S, Strasser ZH, Klann JG, Murphy SN, Estiri H. Precision Phenotyping for Curating Research Cohorts of Patients with Post-Acute Sequelae of COVID-19 (PASC) as a Diagnosis of Exclusion. medRxiv 2024:2024.04.13.24305771. [PMID: 38699316; PMCID: PMC11065031; DOI: 10.1101/2024.04.13.24305771] [Indexed: 05/05/2024]
Abstract
Scalable identification of patients with the post-acute sequelae of COVID-19 (PASC) is challenging due to a lack of reproducible precision phenotyping algorithms and the suboptimal accuracy, demographic biases, and underestimation of the PASC diagnosis code (ICD-10 U09.9). In a retrospective case-control study, we developed a precision phenotyping algorithm for identifying research cohorts of PASC patients, defined as a diagnosis of exclusion. We used longitudinal electronic health records (EHR) data from over 295 thousand patients from 14 hospitals and 20 community health centers in Massachusetts. The algorithm employs an attention mechanism to exclude sequelae that prior conditions can explain. We performed independent chart reviews to tune and validate our precision phenotyping algorithm. Our PASC phenotyping algorithm improves precision and prevalence estimation and reduces bias in identifying Long COVID patients compared to the U09.9 diagnosis code. Our algorithm identified a PASC research cohort of over 24 thousand patients (compared to about 6 thousand when using the U09.9 diagnosis code), with a 79.9 percent precision (compared to 77.8 percent from the U09.9 diagnosis code). Our estimated prevalence of PASC was 22.8 percent, which is close to the national estimates for the region. We also provide an in-depth analysis outlining the clinical attributes, encompassing identified lingering effects by organ, comorbidity profiles, and temporal differences in the risk of PASC. The PASC phenotyping method presented in this study boasts superior precision, accurately gauges the prevalence of PASC without underestimating it, and exhibits less bias in pinpointing Long COVID patients. The PASC cohort derived from our algorithm will serve as a springboard for delving into Long COVID's genetic, metabolomic, and clinical intricacies, surmounting the constraints of recent PASC cohort studies, which were hampered by their limited size and available outcome data.
2
Foer D, Strasser ZH, Cui J, Cahill KN, Boyce JA, Murphy SN, Karlson EW. Reply to Li et al. Am J Respir Crit Care Med 2023; 208:1346-1347. [PMID: 37855723; PMCID: PMC10765385; DOI: 10.1164/rccm.202310-1721le] [Received: 10/02/2023] [Accepted: 10/18/2023] [Indexed: 10/20/2023]
Affiliation(s)
- Dinah Foer
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
  - Harvard Medical School, Boston, Massachusetts
- Zachary H. Strasser
  - Harvard Medical School, Boston, Massachusetts
  - MGH Laboratory of Computer Science and Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts
- Jing Cui
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
  - Harvard Medical School, Boston, Massachusetts
- Katherine N. Cahill
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Joshua A. Boyce
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
  - Harvard Medical School, Boston, Massachusetts
- Shawn N. Murphy
  - Harvard Medical School, Boston, Massachusetts
  - MGH Laboratory of Computer Science and Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
- Elizabeth W. Karlson
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
  - Harvard Medical School, Boston, Massachusetts
3
Foer D, Strasser ZH, Cui J, Cahill KN, Boyce JA, Murphy SN, Karlson EW. Association of GLP-1 Receptor Agonists with Chronic Obstructive Pulmonary Disease Exacerbations among Patients with Type 2 Diabetes. Am J Respir Crit Care Med 2023; 208:1088-1100. [PMID: 37647574; PMCID: PMC10867930; DOI: 10.1164/rccm.202303-0491oc] [Received: 03/15/2023] [Accepted: 08/30/2023] [Indexed: 09/01/2023]
Abstract
Rationale: Patients with chronic obstructive pulmonary disease (COPD) and type 2 diabetes (T2D) have worse clinical outcomes compared with patients without metabolic dysregulation. GLP-1 (glucagon-like peptide 1) receptor agonists (GLP-1RAs) reduce asthma exacerbation risk and improve FVC in patients with COPD.
Objectives: To determine whether GLP-1RA use is associated with reduced COPD exacerbation rates, and severe and moderate exacerbation risk, compared with other T2D therapies.
Methods: A retrospective, observational, electronic health records-based study was conducted using an active comparator, new-user design of 1,642 patients with COPD in a U.S. health system from 2012 to 2022. The COPD cohort was identified using a previously validated machine learning algorithm that includes a natural language processing tool. Exposures were defined as prescriptions for GLP-1RAs (reference group), DPP-4 (dipeptidyl peptidase 4) inhibitors (DPP-4is), SGLT2 (sodium-glucose cotransporter 2) inhibitors, or sulfonylureas.
Measurements and Main Results: Unadjusted COPD exacerbation counts were lower in GLP-1RA users. Adjusted exacerbation rates were significantly higher in DPP-4i (incidence rate ratio, 1.48 [95% confidence interval, 1.08-2.04]; P = 0.02) and sulfonylurea (incidence rate ratio, 2.09 [95% confidence interval, 1.62-2.69]; P < 0.0001) users compared with GLP-1RA users. GLP-1RA use was also associated with significantly reduced risk of severe exacerbations compared with DPP-4i and sulfonylurea use, and of moderate exacerbations compared with sulfonylurea use. After adjustment for clinical covariates, moderate exacerbation risk was also lower in GLP-1RA users compared with DPP-4i users. No statistically significant difference in exacerbation outcomes was seen between GLP-1RA and SGLT2 inhibitor users.
Conclusions: Prospective studies of COPD exacerbations in patients with comorbid T2D are warranted. Additional research may elucidate the mechanisms underlying these observed associations with T2D medications.
Affiliation(s)
- Dinah Foer
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
  - Harvard Medical School, Boston, Massachusetts
- Zachary H. Strasser
  - Harvard Medical School, Boston, Massachusetts
  - MGH Laboratory of Computer Science and Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts
- Jing Cui
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
  - Harvard Medical School, Boston, Massachusetts
- Katherine N. Cahill
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Joshua A. Boyce
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
  - Harvard Medical School, Boston, Massachusetts
- Shawn N. Murphy
  - Harvard Medical School, Boston, Massachusetts
  - MGH Laboratory of Computer Science and Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
- Elizabeth W. Karlson
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
  - Harvard Medical School, Boston, Massachusetts
4
Dagliati A, Strasser ZH, Hossein Abad ZS, Klann JG, Wagholikar KB, Mesa R, Visweswaran S, Morris M, Luo Y, Henderson DW, Samayamuthu MJ, Tan BW, Verdy G, Omenn GS, Xia Z, Bellazzi R, Murphy SN, Holmes JH, Estiri H. Characterization of long COVID temporal sub-phenotypes by distributed representation learning from electronic health record data: a cohort study. EClinicalMedicine 2023; 64:102210. [PMID: 37745021; PMCID: PMC10511779; DOI: 10.1016/j.eclinm.2023.102210] [Received: 06/07/2023] [Revised: 08/29/2023] [Accepted: 08/29/2023] [Indexed: 09/26/2023]
Abstract
Background: Characterizing Post-Acute Sequelae of COVID (SARS-CoV-2 Infection), or PASC, has been challenging due to the multitude of sub-phenotypes, temporal attributes, and definitions. Scalable characterization of PASC sub-phenotypes can enhance screening capacities, disease management, and treatment planning.
Methods: We conducted a retrospective multi-centre observational cohort study, leveraging longitudinal electronic health record (EHR) data of 30,422 patients from three healthcare systems in the Consortium for the Clinical Characterization of COVID-19 by EHR (4CE). From the total cohort, we applied a deductive approach to 12,424 individuals with follow-up data and developed a distributed representation learning process to provide augmented definitions for PASC sub-phenotypes.
Findings: Our framework characterized seven PASC sub-phenotypes. We estimated that, on average, 15.7% of hospitalized COVID-19 patients were likely to suffer from at least one PASC symptom and 5.98% had multiple symptoms. Joint pain and dyspnea had the highest prevalence, averaging 5.45% and 4.53%, respectively.
Interpretation: We provided every participating healthcare system with a scalable framework for estimating the prevalence and temporal attributes of PASC sub-phenotypes, thus developing a unified model that characterizes augmented sub-phenotypes across the different systems.
Funding: Authors are supported by National Institute of Allergy and Infectious Diseases, National Institute on Aging, National Center for Advancing Translational Sciences, National Medical Research Council, National Institute of Neurological Disorders and Stroke, European Union, National Institutes of Health, National Center for Advancing Translational Sciences.
Affiliation(s)
- Arianna Dagliati
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Zachary H. Strasser
  - Department of Medicine, Massachusetts General Hospital, Boston, United States
- Jeffrey G. Klann
  - Department of Medicine, Massachusetts General Hospital, Boston, United States
- Rebecca Mesa
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Shyam Visweswaran
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, United States
- Michele Morris
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, United States
- Yuan Luo
  - Department of Preventive Medicine, Northwestern University, Chicago, United States
- Darren W. Henderson
  - Center for Clinical and Translational Science, University of Kentucky, Lexington, United States
- Bryce W.Q. Tan
  - Department of Medicine, National University Hospital, Singapore
- Guillaume Verdy
  - IAM Unit, Bordeaux University Hospital, Bordeaux, France
- Gilbert S. Omenn
  - Department of Computational Medicine and Bioinformatics, Internal Medicine, Human Genetics, and School of Public Health, University of Michigan, Ann Arbor, United States
- Zongqi Xia
  - Department of Neurology, University of Pittsburgh, Pittsburgh, United States
- Riccardo Bellazzi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Shawn N. Murphy
  - Department of Neurology, Massachusetts General Hospital, Boston, United States
- John H. Holmes
  - Department of Biostatistics, Epidemiology, and Informatics; Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, United States
- Hossein Estiri
  - Department of Medicine, Massachusetts General Hospital, Boston, United States
5
Strasser ZH, Dagliati A, Shakeri Hossein Abad Z, Klann JG, Wagholikar KB, Mesa R, Visweswaran S, Morris M, Luo Y, Henderson DW, Samayamuthu MJ, Omenn GS, Xia Z, Holmes JH, Estiri H, Murphy SN. A retrospective cohort analysis leveraging augmented intelligence to characterize long COVID in the electronic health record: A precision medicine framework. PLOS Digit Health 2023; 2:e0000301. [PMID: 37490472; PMCID: PMC10368277; DOI: 10.1371/journal.pdig.0000301] [Received: 12/01/2022] [Accepted: 06/16/2023] [Indexed: 07/27/2023]
Abstract
Physical and psychological symptoms lasting months following an acute COVID-19 infection are now recognized as post-acute sequelae of COVID-19 (PASC). Accurate tools for identifying such patients could enhance screening for clinical trial recruitment, improve the reliability of disease estimates, and allow for more accurate downstream cohort analysis. In this retrospective cohort study, we analyzed the EHR of hospitalized COVID-19 patients across three healthcare systems to develop a pipeline for better identifying patients with persistent PASC symptoms (dyspnea, fatigue, or joint pain) after their SARS-CoV-2 infection. We implemented distributed representation learning powered by the Machine Learning for modeling Health Outcomes (MLHO) framework to identify novel EHR features that could suggest PASC symptoms outside of typical diagnosis codes. MLHO applies entropy-based feature selection and boosting algorithms for representation mining. These improved definitions were then used to estimate PASC among hospitalized patients. A total of 30,422 hospitalized patients were diagnosed with COVID-19 across three healthcare systems between March 13, 2020 and February 28, 2021. The mean age of the population was 62.3 years (SD, 21.0 years) and 15,124 (49.7%) were female. We implemented the distributed representation learning technique to augment PASC definitions. These definitions were found to have positive predictive values of 0.73, 0.74, and 0.91 for dyspnea, fatigue, and joint pain, respectively. We estimated that 25 percent (95% CI: 6-48), 11 percent (95% CI: 6-15), and 13 percent (95% CI: 8-17) of hospitalized COVID-19 patients will have dyspnea, fatigue, and joint pain, respectively, 3 months or longer after a COVID-19 diagnosis. We present a validated framework for screening and identifying patients with PASC in the EHR and then use the tool to estimate PASC prevalence among hospitalized COVID-19 patients.
Affiliation(s)
- Zachary H. Strasser
  - Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Arianna Dagliati
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Zahra Shakeri Hossein Abad
  - Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
- Jeffrey G. Klann
  - Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Kavishwar B. Wagholikar
  - Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Rebecca Mesa
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Shyam Visweswaran
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States
- Michele Morris
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States
- Yuan Luo
  - Department of Preventive Medicine, Northwestern University, Chicago, Illinois, United States of America
- Darren W. Henderson
  - Center for Clinical and Translational Science, University of Kentucky, Lexington, Kentucky, United States of America
- Gilbert S. Omenn
  - Department of Computational Medicine & Bioinformatics, Internal Medicine, Human Genetics, and School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
- Zongqi Xia
  - Department of Neurology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- John H. Holmes
  - Department of Biostatistics, Epidemiology, and Informatics; Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- Hossein Estiri
  - Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Shawn N. Murphy
  - Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, United States of America
6
Azhir A, Strasser ZH, Murphy SN, Estiri H. Severity of COVID-19-Related Illness in Massachusetts, July 2021 to December 2022. JAMA Netw Open 2023; 6:e238203. [PMID: 37052921; PMCID: PMC10102873; DOI: 10.1001/jamanetworkopen.2023.8203] [Indexed: 04/14/2023]
Abstract
This cohort study uses hospitalization and 30-day mortality risks to create a temporal profile of the severity of COVID-19 in Massachusetts from July 2021 to December 2022.
Affiliation(s)
- Alaleh Azhir
  - Harvard-MIT (Massachusetts Institute of Technology) Program in Health Sciences and Technology, Harvard Medical School, Boston, Massachusetts
  - Clinical Augmented Intelligence Group, Laboratory of Computer Science, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts
- Zachary H Strasser
  - Clinical Augmented Intelligence Group, Laboratory of Computer Science, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts
  - Department of Medicine, Harvard Medical School, Boston, Massachusetts
- Shawn N Murphy
  - Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts
- Hossein Estiri
  - Clinical Augmented Intelligence Group, Laboratory of Computer Science, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts
  - Department of Medicine, Harvard Medical School, Boston, Massachusetts
7
Tan ALM, Getzen EJ, Hutch MR, Strasser ZH, Gutiérrez-Sacristán A, Le TT, Dagliati A, Morris M, Hanauer DA, Moal B, Bonzel CL, Yuan W, Chiudinelli L, Das P, Zhang HG, Aronow BJ, Avillach P, Brat GA, Cai T, Hong C, La Cava WG, Hooi Will Loh H, Luo Y, Murphy SN, Yuan Hgiam K, Omenn GS, Patel LP, Jebathilagam Samayamuthu M, Shriver ER, Shakeri Hossein Abad Z, Tan BWL, Visweswaran S, Wang X, Weber GM, Xia Z, Verdy B, Long Q, Mowery DL, Holmes JH. Informative missingness: What can we learn from patterns in missing laboratory data in the electronic health record? J Biomed Inform 2023; 139:104306. [PMID: 36738870; PMCID: PMC10849195; DOI: 10.1016/j.jbi.2023.104306] [Received: 05/17/2022] [Revised: 01/21/2023] [Accepted: 01/29/2023] [Indexed: 02/05/2023]
Abstract
BACKGROUND: In electronic health records, patterns of missing laboratory test results could capture patients' course of disease as well as reflect clinicians' concerns about possible conditions. These patterns are often understudied and overlooked. This study aims to identify informative patterns of missingness among laboratory data collected from COVID-19 inpatients across 15 healthcare system sites in three countries.
METHODS: We collected and analyzed demographic, diagnosis, and laboratory data for 69,939 patients with positive COVID-19 PCR tests across three countries from 1 January 2020 through 30 September 2021. We analyzed missing laboratory measurements across sites, missingness stratification by demographic variables, temporal trends of missingness, correlations between labs based on missingness indicators over time, and clustering of groups of labs based on their missingness/ordering pattern.
RESULTS: With these analyses, we identified mapping issues at seven of the 15 sites. We also identified nuances in data collection and variable definition across the sites. Temporal trend analyses may support the use of laboratory test result missingness patterns in identifying severe COVID-19 patients. Lastly, using missingness patterns, we determined relationships between various labs that reflect clinical behaviors.
CONCLUSION: In this work, we use computational approaches to relate missingness patterns to hospital treatment capacity and highlight the heterogeneity of studying COVID-19 over time and across multiple sites, which may differ in pandemic phase, policies, and other factors. Changes in missingness could suggest a change in a patient's condition, and patterns of missingness among laboratory measurements could potentially identify clinical outcomes. This allows sites to treat missing data as informative in analyses and helps researchers identify which sites are better poised to study particular questions.
Affiliation(s)
- Emily J Getzen
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Trang T Le
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Priam Das
  - Harvard Medical School, Cambridge, MA, USA
- Bruce J Aronow
  - Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
- Tianxi Cai
  - Harvard Medical School, Cambridge, MA, USA
- Chuan Hong
  - Harvard Medical School, Cambridge, MA, USA
  - Duke University, Durham, NC, USA
- William G La Cava
  - Harvard Medical School, Cambridge, MA, USA
  - Boston Children's Hospital, Boston, MA, USA
- Yuan Luo
  - Northwestern University, Chicago, IL, USA
- Lav P Patel
  - University of Kansas Medical Center, United States
- Emily R Shriver
  - University of Pennsylvania Health System, Philadelphia, PA, USA
- Xuan Wang
  - Harvard Medical School, Cambridge, MA, USA
- Zongqi Xia
  - University of Pittsburgh, Pittsburgh, PA, USA
- Qi Long
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Danielle L Mowery
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- John H Holmes
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
8
Abstract
IMPORTANCE: The SARS-CoV-2 Omicron subvariant, BA.2, may be less severe than previous variants; however, confounding factors make interpreting the intrinsic severity challenging.
OBJECTIVE: To compare the adjusted risks of mortality, hospitalization, intensive care unit admission, and invasive ventilation between the BA.2 subvariant and the Omicron and Delta variants, after accounting for multiple confounders.
DESIGN, SETTING, AND PARTICIPANTS: This was a retrospective cohort study that applied an entropy balancing approach. Patients with COVID-19 in a multicenter inpatient and outpatient system in New England between March 3, 2020, and June 20, 2022, were identified.
EXPOSURES: Cases were assigned as being exposed to the Delta (B.1.617.2) variant, the Omicron (B.1.1.529) variant, or the Omicron BA.2 lineage subvariants.
MAIN OUTCOMES AND MEASURES: The primary study outcome planned before analysis was risk of 30-day mortality. Secondary outcomes included the risks of hospitalization, invasive ventilation, and intensive care unit admissions.
RESULTS: Of 102 315 confirmed COVID-19 cases (mean [SD] age, 44.2 [21.6] years; 63 482 women [62.0%]), 20 770 were labeled as Delta variants, 52 605 were labeled as the Omicron B.1.1.529 variant, and 28 940 were labeled as Omicron BA.2 subvariants. Patient cases were excluded if they occurred outside the prespecified temporal windows associated with the variants or had minimal longitudinal data in the Mass General Brigham system before COVID-19. Mortality rates were 0.7% for Delta (B.1.617.2), 0.4% for Omicron (B.1.1.529), and 0.3% for Omicron (BA.2). The adjusted odds ratio of mortality from the Delta variant compared with the Omicron BA.2 subvariants was 2.07 (95% CI, 1.04-4.10) and that of the original Omicron variant compared with the Omicron BA.2 subvariant was 2.20 (95% CI, 1.56-3.11). For all outcomes, the Omicron BA.2 subvariants were significantly less severe than the Omicron and Delta variants.
CONCLUSIONS AND RELEVANCE: In this cohort study, after having accounted for a variety of confounding factors associated with SARS-CoV-2 outcomes, the Omicron BA.2 subvariant was found to be intrinsically less severe than both the Delta and Omicron variants. With respect to these variants, the severity profile of SARS-CoV-2 appears to be diminishing after taking into account various factors including therapeutics, vaccinations, and prior infections.
Affiliation(s)
- Zachary H. Strasser
  - MGH Laboratory of Computer Science, Massachusetts General Hospital, Boston
  - Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
- Noah Greifer
  - Institute for Quantitative Social Science, Harvard University, Cambridge, Massachusetts
- Aboozar Hadavand
  - College of Computational Science, Minerva University, San Francisco, California
- Shawn N. Murphy
  - MGH Laboratory of Computer Science, Massachusetts General Hospital, Boston
  - Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston
- Hossein Estiri
  - MGH Laboratory of Computer Science, Massachusetts General Hospital, Boston
  - Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
9
Zhang HG, Dagliati A, Shakeri Hossein Abad Z, Xiong X, Bonzel CL, Xia Z, Tan BWQ, Avillach P, Brat GA, Hong C, Morris M, Visweswaran S, Patel LP, Gutiérrez-Sacristán A, Hanauer DA, Holmes JH, Samayamuthu MJ, Bourgeois FT, L'Yi S, Maidlow SE, Moal B, Murphy SN, Strasser ZH, Neuraz A, Ngiam KY, Loh NHW, Omenn GS, Prunotto A, Dalvin LA, Klann JG, Schubert P, Vidorreta FJS, Benoit V, Verdy G, Kavuluru R, Estiri H, Luo Y, Malovini A, Tibollo V, Bellazzi R, Cho K, Ho YL, Tan ALM, Tan BWL, Gehlenborg N, Lozano-Zahonero S, Jouhet V, Chiovato L, Aronow BJ, Toh EMS, Wong WGS, Pizzimenti S, Wagholikar KB, Bucalo M, Cai T, South AM, Kohane IS, Weber GM. International electronic health record-derived post-acute sequelae profiles of COVID-19 patients. NPJ Digit Med 2022; 5:81. [PMID: 35768548; PMCID: PMC9242995; DOI: 10.1038/s41746-022-00623-8] [Received: 01/11/2022] [Accepted: 05/19/2022] [Indexed: 11/10/2022]
Abstract
The risk profiles of post-acute sequelae of COVID-19 (PASC) have not been well characterized in multi-national settings with appropriate controls. We leveraged electronic health record (EHR) data from 277 international hospitals representing 414,602 patients with COVID-19, 2.3 million control patients without COVID-19 in the inpatient and outpatient settings, and over 221 million diagnosis codes to systematically identify new-onset conditions enriched among patients with COVID-19 during the post-acute period. Compared to inpatient controls, inpatient COVID-19 cases were at significant risk for angina pectoris (RR 1.30, 95% CI 1.09–1.55), heart failure (RR 1.22, 95% CI 1.10–1.35), cognitive dysfunctions (RR 1.18, 95% CI 1.07–1.31), and fatigue (RR 1.18, 95% CI 1.07–1.30). Relative to outpatient controls, outpatient COVID-19 cases were at risk for pulmonary embolism (RR 2.10, 95% CI 1.58–2.76), venous embolism (RR 1.34, 95% CI 1.17–1.54), atrial fibrillation (RR 1.30, 95% CI 1.13–1.50), type 2 diabetes (RR 1.26, 95% CI 1.16–1.36) and vitamin D deficiency (RR 1.19, 95% CI 1.09–1.30). Outpatient COVID-19 cases were also at risk for loss of smell and taste (RR 2.42, 95% CI 1.90–3.06), inflammatory neuropathy (RR 1.66, 95% CI 1.21–2.27), and cognitive dysfunction (RR 1.18, 95% CI 1.04–1.33). The incidence of post-acute cardiovascular and pulmonary conditions decreased across time among inpatient cases while the incidence of cardiovascular, digestive, and metabolic conditions increased among outpatient cases. Our study, based on a federated international network, systematically identified robust conditions associated with PASC compared to control groups, underscoring the multifaceted cardiovascular and neurological phenotype profiles of PASC.
Affiliation(s)
- Harrison G Zhang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Arianna Dagliati
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Xin Xiong
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Clara-Lea Bonzel
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Zongqi Xia
- Department of Neurology, University of Pittsburgh, Pittsburgh, PA, USA
- Bryce W Q Tan
- Department of Medicine, National University Hospital, Singapore, Singapore
- Paul Avillach
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Gabriel A Brat
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Chuan Hong
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Michele Morris
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Lav P Patel
- Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, KS, USA
- David A Hanauer
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, USA
- John H Holmes
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA; Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Sehi L'Yi
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Sarah E Maidlow
- Michigan Institute for Clinical and Health Research (MICHR) Informatics, University of Michigan, Ann Arbor, MI, USA
- Bertrand Moal
- IAM unit, Bordeaux University Hospital, Bordeaux, France
- Shawn N Murphy
- Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
- Antoine Neuraz
- Department of Biomedical Informatics, Hôpital Necker-Enfants Malades, Assistance Publique Hôpitaux de Paris (APHP), University of Paris, Paris, France
- Kee Yuan Ngiam
- Department of Biomedical Informatics, WiSDM, National University Health Systems Singapore, Singapore, Singapore
- Ne Hooi Will Loh
- Department of Anaesthesia, National University Health Systems Singapore, Singapore, Singapore
- Gilbert S Omenn
- Department of Computational Medicine & Bioinformatics, Internal Medicine, Human Genetics, and School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Andrea Prunotto
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Lauren A Dalvin
- Department of Ophthalmology, Mayo Clinic, Rochester, MN, USA
- Jeffrey G Klann
- Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Petra Schubert
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
- Vincent Benoit
- IT Department, Innovation & Data, APHP Greater Paris University Hospital, Paris, France
- Ramakanth Kavuluru
- Division of Biomedical Informatics (Department of Internal Medicine), University of Kentucky, Lexington, KY, USA
- Hossein Estiri
- Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Yuan Luo
- Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
- Alberto Malovini
- Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Italy
- Valentina Tibollo
- Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Italy
- Riccardo Bellazzi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Kelly Cho
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA; Population Health and Data Science, VA Boston Healthcare System, Boston, MA, USA
- Yuk-Lam Ho
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
- Amelia L M Tan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Byorn W L Tan
- Department of Medicine, National University Hospital, Singapore, Singapore
- Nils Gehlenborg
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Sara Lozano-Zahonero
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Vianney Jouhet
- IAM unit, INSERM Bordeaux Population Health ERIAS TEAM, Bordeaux University Hospital / ERIAS - Inserm, U1219 BPH, Bordeaux, France
- Luca Chiovato
- Unit of Internal Medicine and Endocrinology, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Italy
- Bruce J Aronow
- Departments of Biomedical Informatics and Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
- Emma M S Toh
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Wei Gen Scott Wong
- Department of Medicine, National University Health Systems Singapore, Singapore, Singapore
- Sara Pizzimenti
- Scientific Direction, IRCCS Ca' Granda Ospedale Maggiore Policlinico di Milano, Milan, Italy
- Mauro Bucalo
- BIOMERIS (BIOMedical Research Informatics Solutions), Pavia, Italy
- Tianxi Cai
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Andrew M South
- Section of Nephrology, Department of Pediatrics, Brenner Children's, Wake Forest School of Medicine, Winston Salem, NC, USA
- Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Griffin M Weber
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
10
Klann JG, Strasser ZH, Hutch MR, Kennedy CJ, Marwaha JS, Morris M, Samayamuthu MJ, Pfaff AC, Estiri H, South AM, Weber GM, Yuan W, Avillach P, Wagholikar KB, Luo Y, Omenn GS, Visweswaran S, Holmes JH, Xia Z, Brat GA, Murphy SN. Distinguishing Admissions Specifically for COVID-19 From Incidental SARS-CoV-2 Admissions: National Retrospective Electronic Health Record Study. J Med Internet Res 2022; 24:e37931. [PMID: 35476727 PMCID: PMC9119395 DOI: 10.2196/37931] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/16/2022] [Revised: 04/22/2022] [Accepted: 04/22/2022] [Indexed: 01/16/2023]
Abstract
BACKGROUND Admissions are generally classified as COVID-19 hospitalizations if the patient has a positive SARS-CoV-2 polymerase chain reaction (PCR) test. However, because 35% of SARS-CoV-2 infections are asymptomatic, patients admitted for unrelated indications with an incidentally positive test could be misclassified as a COVID-19 hospitalization. Electronic health record (EHR)-based studies have been unable to distinguish between a hospitalization specifically for COVID-19 versus an incidental SARS-CoV-2 hospitalization. Although the need to improve classification of COVID-19 versus incidental SARS-CoV-2 is well understood, the magnitude of the problem has only been characterized in small, single-center studies. Furthermore, there have been no peer-reviewed studies evaluating methods for improving classification. OBJECTIVE The aims of this study are, first, to quantify the frequency of incidental hospitalizations over the first 15 months of the pandemic in multiple hospital systems in the United States and, second, to apply electronic phenotyping techniques to automatically improve COVID-19 hospitalization classification. METHODS From a retrospective EHR-based cohort in 4 US health care systems in Massachusetts, Pennsylvania, and Illinois, a random sample of 1123 SARS-CoV-2 PCR-positive patients hospitalized from March 2020 to August 2021 was manually chart-reviewed and classified as "admitted with COVID-19" (incidental) versus specifically admitted for COVID-19 ("for COVID-19"). EHR-based phenotyping was used to find feature sets to filter out incidental admissions. RESULTS EHR-based phenotyped feature sets filtered out incidental admissions, which occurred in an average of 26% of hospitalizations (although this varied widely over time, from 0% to 75%). The top site-specific feature sets had 79%-99% specificity with 62%-75% sensitivity, while the best-performing across-site feature sets had 71%-94% specificity with 69%-81% sensitivity.
CONCLUSIONS A large proportion of SARS-CoV-2 PCR-positive admissions were incidental. Straightforward EHR-based phenotypes differentiated admissions, which is important to assure accurate public health reporting and research.
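The abstract reports each feature set's performance as specificity and sensitivity against manual chart review. A minimal Python sketch of that evaluation follows. It assumes each chart-reviewed admission carries a boolean label (True = admitted specifically for COVID-19) and a set of EHR features, and it applies a hypothetical rule that flags an admission as "for COVID-19" when any feature in the set is present. The feature names are invented placeholders, not the feature sets from the study, and treating "for COVID-19" as the positive class is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    # Chart-review label: True if admitted specifically for COVID-19,
    # False if the positive PCR test was incidental (hypothetical field name).
    for_covid: bool
    # EHR features observed during the admission (hypothetical codes).
    features: frozenset


def flag_for_covid(adm: Admission, feature_set: frozenset) -> bool:
    """Hypothetical rule: flag as 'for COVID-19' if any feature in the set is present."""
    return bool(adm.features & feature_set)


def sensitivity_specificity(admissions, feature_set):
    """Compare rule-based flags with chart-review labels."""
    tp = fn = tn = fp = 0
    for adm in admissions:
        predicted = flag_for_covid(adm, feature_set)
        if adm.for_covid:
            if predicted:
                tp += 1
            else:
                fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Invented chart-review sample: three "for COVID-19" and two incidental admissions.
admissions = [
    Admission(True, frozenset({"remdesivir", "low_spo2"})),
    Admission(True, frozenset({"dexamethasone"})),
    Admission(True, frozenset()),
    Admission(False, frozenset({"hip_fracture"})),
    Admission(False, frozenset({"dexamethasone"})),
]
feature_set = frozenset({"remdesivir", "dexamethasone", "low_spo2"})
sens, spec = sensitivity_specificity(admissions, feature_set)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")  # sensitivity=0.67 specificity=0.50
```

Searching over candidate feature sets and keeping those with the best specificity/sensitivity trade-off on the chart-reviewed sample would be the natural next step; how the study selected its feature sets is not described in the abstract.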
Affiliation(s)
- Jeffrey G Klann
- Laboratory of Computer Science, Department of Medicine, Massachusetts General Hospital, Boston, MA, United States
- Zachary H Strasser
- Laboratory of Computer Science, Department of Medicine, Massachusetts General Hospital, Boston, MA, United States
- Meghan R Hutch
- Department of Preventive Medicine, Northwestern University, Chicago, IL, United States
- Chris J Kennedy
- Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, United States
- Jayson S Marwaha
- Department of Surgery, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, United States
- Michele Morris
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
- Ashley C Pfaff
- Department of Surgery, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, United States
- Hossein Estiri
- Laboratory of Computer Science, Department of Medicine, Massachusetts General Hospital, Boston, MA, United States
- Andrew M South
- Section of Nephrology, Department of Pediatrics, Brenner Children's, Wake Forest School of Medicine, Winston Salem, NC, United States
- Kavishwar B Wagholikar
- Laboratory of Computer Science, Department of Medicine, Massachusetts General Hospital, Boston, MA, United States
- Yuan Luo
- Department of Preventive Medicine, Northwestern University, Chicago, IL, United States
- Gilbert S Omenn
- Center for Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, United States
- Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
- John H Holmes
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Zongqi Xia
- Department of Neurology, University of Pittsburgh, Pittsburgh, PA, United States
- Shawn N Murphy
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
11
Estiri H, Strasser ZH, Rashidian S, Klann JG, Wagholikar KB, McCoy TH, Murphy SN. An Objective Framework for Evaluating Unrecognized Bias in Medical AI Models Predicting COVID-19 Outcomes. J Am Med Inform Assoc 2022; 29:1334-1341. [PMID: 35511151 PMCID: PMC9277645 DOI: 10.1093/jamia/ocac070] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/09/2021] [Revised: 04/04/2022] [Accepted: 04/27/2022] [Indexed: 12/15/2022]
Abstract
OBJECTIVE The increasing translation of artificial intelligence (AI)/machine learning (ML) models into clinical practice brings an increased risk of direct harm from modeling bias; however, bias remains incompletely measured in many medical AI applications. This article aims to provide a framework for objective evaluation of medical AI from multiple aspects, focusing on binary classification models. MATERIALS AND METHODS Using data from over 56 thousand Mass General Brigham (MGB) patients with confirmed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), we evaluate unrecognized bias in four AI models developed during the early months of the pandemic in Boston, Massachusetts that predict risks of hospital admission, ICU admission, mechanical ventilation, and death after a SARS-CoV-2 infection purely based on their pre-infection longitudinal medical records. Models were evaluated both retrospectively and prospectively using model-level metrics of discrimination, accuracy, and reliability, and a novel individual-level metric for error. RESULTS We found inconsistent instances of model-level bias in the prediction models. From an individual-level aspect, however, we found that almost all models performed with slightly higher error rates for older patients. DISCUSSION While a model can be biased against certain protected groups (i.e., perform worse) in certain tasks, it can at the same time be biased towards another protected group (i.e., perform better). As such, current bias evaluation studies may lack a full depiction of the variable effects of a model on its subpopulations. CONCLUSION Only a holistic evaluation, a diligent search for unrecognized bias, can provide enough information for an unbiased judgment of AI bias that can invigorate follow-up investigations into the underlying roots of bias and ultimately make a change.
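One model-level aspect of the framework is comparing discrimination across subgroups. The sketch below is an illustration under stated assumptions rather than the authors' framework: it computes the area under the ROC curve (in its pairwise Mann-Whitney form, ties counted as one half) separately for each subgroup so that gaps between groups become visible. The labels, scores, and age-group attribute are made up, and the paper's individual-level error metric is not reproduced because the abstract does not define it.

```python
from collections import defaultdict
from itertools import product


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")  # AUC is undefined unless both classes are present
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(labels, scores, groups):
    """Split predictions by a (hypothetical) protected attribute and compute AUC within each group."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: auc(ys, ss) for g, (ys, ss) in buckets.items()}


# Invented example: outcome labels, model risk scores, and an age-group attribute.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.3, 0.6, 0.7, 0.9, 0.4]
groups = ["<65"] * 4 + [">=65"] * 4

per_group = auc_by_group(labels, scores, groups)
gap = max(per_group.values()) - min(per_group.values())
print(per_group, gap)  # {'<65': 1.0, '>=65': 0.75} 0.25
```

The same per-group loop works for accuracy or calibration metrics; reporting all of them side by side is what lets a model look worse for one group on one metric and better on another, as the discussion above notes.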
Affiliation(s)
- Hossein Estiri
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA, 02144, USA; Department of Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Zachary H Strasser
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA, 02144, USA; Department of Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Jeffrey G Klann
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA, 02144, USA; Department of Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA; Research Information Science and Computing, Mass General Brigham, Somerville, MA, 02145, USA
- Kavishwar B Wagholikar
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA, 02144, USA; Department of Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Thomas H McCoy
- Center for Quantitative Health, Massachusetts General Hospital, Boston, MA, 02114, USA
- Shawn N Murphy
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA, 02144, USA; Research Information Science and Computing, Mass General Brigham, Somerville, MA, 02145, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA; Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA
12
Klann JG, Strasser ZH, Hutch MR, Kennedy CJ, Marwaha JS, Morris M, Samayamuthu MJ, Pfaff AC, Estiri H, South AM, Weber GM, Yuan W, Avillach P, Wagholikar KB, Luo Y, Omenn GS, Visweswaran S, Holmes JH, Xia Z, Brat GA, Murphy SN. Distinguishing Admissions Specifically for COVID-19 from Incidental SARS-CoV-2 Admissions: A National EHR Research Consortium Study. medRxiv 2022:2022.02.10.22270728. [PMID: 35350202 PMCID: PMC8963684 DOI: 10.1101/2022.02.10.22270728] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 02/01/2023]
Abstract
Admissions are generally classified as COVID-19 hospitalizations if the patient has a positive SARS-CoV-2 polymerase chain reaction (PCR) test. However, because 35% of SARS-CoV-2 infections are asymptomatic, patients admitted for unrelated indications with an incidentally positive test could be misclassified as a COVID-19 hospitalization. EHR-based studies have been unable to distinguish between a hospitalization specifically for COVID-19 versus an incidental SARS-CoV-2 hospitalization. From a retrospective EHR-based cohort in four US healthcare systems, a random sample of 1,123 SARS-CoV-2 PCR-positive patients hospitalized between March 2020 and August 2021 was manually chart-reviewed and classified as admitted-with-COVID-19 (incidental) vs. specifically admitted for COVID-19 (for-COVID-19). EHR-based phenotyped feature sets filtered out incidental admissions, which occurred in 26% of hospitalizations. The top site-specific feature sets had 79-99% specificity with 62-75% sensitivity, while the best-performing across-site feature set had 71-94% specificity with 69-81% sensitivity. A large proportion of SARS-CoV-2 PCR-positive admissions were incidental. Straightforward EHR-based phenotypes differentiated admissions, which is important to assure accurate public health reporting and research.
13
Estiri H, Strasser ZH, Brat GA, Semenov YR, Patel CJ, Murphy SN. Evolving phenotypes of non-hospitalized patients that indicate long COVID. BMC Med 2021; 19:249. [PMID: 34565368 PMCID: PMC8474909 DOI: 10.1186/s12916-021-02115-0] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Received: 04/28/2021] [Accepted: 09/01/2021] [Indexed: 01/28/2023]
Abstract
BACKGROUND For some SARS-CoV-2 survivors, recovery from the acute phase of the infection has been grueling with lingering effects. Many of the symptoms characterized as the post-acute sequelae of COVID-19 (PASC) could have multiple causes or are similarly seen in non-COVID patients. Accurate identification of PASC phenotypes will be important to guide future research and help the healthcare system focus its efforts and resources on adequately controlled age- and gender-specific sequelae of a COVID-19 infection. METHODS In this retrospective electronic health record (EHR) cohort study, we applied a computational framework for knowledge discovery from clinical data, MLHO, to identify phenotypes that positively associate with a past positive reverse transcription-polymerase chain reaction (RT-PCR) test for COVID-19. We evaluated the post-test phenotypes in two temporal windows at 3-6 and 6-9 months after the test and by age and gender. Data from longitudinal diagnosis records stored in EHRs from Mass General Brigham in the Boston metropolitan area were used for the analyses. Statistical analyses were performed on data from March 2020 to June 2021. Study participants included over 96 thousand patients who had tested positive or negative for COVID-19 and were not hospitalized. RESULTS We identified 33 phenotypes among different age/gender cohorts or time windows that were positively associated with past SARS-CoV-2 infection. All identified phenotypes were newly recorded in patients' medical records 2 months or longer after a COVID-19 RT-PCR test in non-hospitalized patients regardless of the test result.
Among these phenotypes, a new diagnosis record for anosmia and dysgeusia (OR 2.60, 95% CI [1.94-3.46]), alopecia (OR 3.09, 95% CI [2.53-3.76]), chest pain (OR 1.27, 95% CI [1.09-1.48]), chronic fatigue syndrome (OR 2.60, 95% CI [1.22-2.10]), shortness of breath (OR 1.41, 95% CI [1.22-1.64]), pneumonia (OR 1.66, 95% CI [1.28-2.16]), and type 2 diabetes mellitus (OR 1.41, 95% CI [1.22-1.64]) are among the most significant indicators of a past COVID-19 infection. Additionally, more new phenotypes were found with increased confidence among the cohorts who were younger than 65. CONCLUSIONS The findings of this study confirm many of the post-COVID-19 symptoms and suggest that a variety of new diagnoses, including new diabetes mellitus and neurological disorder diagnoses, are more common among those with a history of COVID-19 than those without the infection. Additionally, more than 63% of PASC phenotypes were observed in patients under 65 years of age, pointing out the importance of vaccination to minimize the risk of debilitating post-acute sequelae of COVID-19 among younger adults.
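The results above are reported as odds ratios with 95% confidence intervals. As a generic reminder of how such a figure is formed, here is a minimal Python sketch of an odds ratio with a Wald (Woolf) interval from a 2x2 table of test result versus a new diagnosis record. This is the textbook calculation, not necessarily how MLHO produced the reported estimates, and the counts are invented.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: positive test, new diagnosis recorded
    b: positive test, no new diagnosis
    c: negative test, new diagnosis recorded
    d: negative test, no new diagnosis
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: apply a continuity correction before using the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf/Wald approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Invented counts for illustration only; not data from the study.
or_, lower, upper = odds_ratio_wald(a=20, b=80, c=10, d=90)
print(f"OR {or_:.2f}, 95% CI [{lower:.2f}-{upper:.2f}]")
```

Because the interval is built symmetrically on the log scale, the point estimate always falls inside it, which is a quick sanity check when reading a reported OR and CI.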
Affiliation(s)
- Hossein Estiri
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA, 02114, USA; Department of Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Zachary H Strasser
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA, 02114, USA; Department of Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Gabriel A Brat
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Yevgeniy R Semenov
- Department of Dermatology, Massachusetts General Hospital, Boston, MA, 02114, USA
- Chirag J Patel
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Shawn N Murphy
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA, 02114, USA; Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Research Information Science and Computing, Mass General Brigham, Boston, MA, USA
14
Estiri H, Strasser ZH, Murphy SN. High-throughput phenotyping with temporal sequences. J Am Med Inform Assoc 2021; 28:772-781. [PMID: 33313899 DOI: 10.1093/jamia/ocaa288] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/01/2020] [Accepted: 11/04/2020] [Indexed: 12/15/2022]
Abstract
OBJECTIVE High-throughput electronic phenotyping algorithms can accelerate translational research using data from electronic health record (EHR) systems. The temporal information buried in EHRs is often underutilized in developing computational phenotypic definitions. This study aims to develop a high-throughput phenotyping method, leveraging temporal sequential patterns from EHRs. MATERIALS AND METHODS We develop a representation mining algorithm to extract 5 classes of representations from EHR diagnosis and medication records: the aggregated vector of the records (aggregated vector representation), the standard sequential patterns (sequential pattern mining), the transitive sequential patterns (transitive sequential pattern mining), and 2 hybrid classes. Using EHR data on 10 phenotypes from the Mass General Brigham Biobank, we train and validate phenotyping algorithms. RESULTS Phenotyping with temporal sequences resulted in a superior classification performance across all 10 phenotypes compared with the standard representations in electronic phenotyping. The high-throughput algorithm's classification performance was superior or similar to the performance of previously published electronic phenotyping algorithms. We characterize and evaluate the top transitive sequences of diagnosis records paired with the records of risk factors, symptoms, complications, medications, or vaccinations. DISCUSSION The proposed high-throughput phenotyping approach enables seamless discovery of sequential record combinations that may be difficult to assume from raw EHR data. Transitive sequences offer more accurate characterization of the phenotype, compared with its individual components, and reflect the actual lived experiences of the patients with that particular disease. CONCLUSION Sequential data representations provide a precise mechanism for incorporating raw EHR records into downstream machine learning. 
Our approach starts with user interpretability and works backward to the technology.
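The transitive sequencing idea behind these representations can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: records are (date, code) tuples, each record is paired with every later record of the same patient (not only the next one, which is what makes the sequences transitive), same-day records are not paired, and a sequence's support is the number of patients who exhibit it. The codes are placeholders, and the downstream feature selection and classifier training described in the abstract are omitted.

```python
from collections import Counter
from datetime import date
from itertools import combinations


def transitive_sequences(records):
    """Return the set of 'A -> B' sequences for one patient.

    records: list of (date, code) tuples. After sorting by date, every record
    is paired with every later record. Records sharing a date are not paired
    (an assumed tie rule).
    """
    ordered = sorted(records)
    sequences = set()
    for (d1, c1), (d2, c2) in combinations(ordered, 2):
        if d1 < d2:
            sequences.add(f"{c1} -> {c2}")
    return sequences


def sequence_support(patients):
    """Count how many patients exhibit each transitive sequence."""
    support = Counter()
    for records in patients.values():
        support.update(transitive_sequences(records))
    return support


# Invented example records using placeholder codes.
patients = {
    "p1": [(date(2020, 1, 5), "hypertension"),
           (date(2020, 3, 1), "chest_pain"),
           (date(2020, 6, 1), "heart_failure")],
    "p2": [(date(2021, 2, 1), "hypertension"),
           (date(2021, 5, 1), "heart_failure")],
}
support = sequence_support(patients)
print(support.most_common(1))  # [('hypertension -> heart_failure', 2)]
```

Note that patient p1 yields "hypertension -> heart_failure" even though chest pain was recorded in between; keeping such skip-over pairs is what lets the shared sequence surface across both patients.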
Affiliation(s)
- Hossein Estiri
- Harvard Medical School, Boston, Massachusetts, USA; Massachusetts General Hospital, Boston, Massachusetts, USA; Mass General Brigham, Boston, Massachusetts, USA
- Zachary H Strasser
- Harvard Medical School, Boston, Massachusetts, USA; Massachusetts General Hospital, Boston, Massachusetts, USA; Mass General Brigham, Boston, Massachusetts, USA
- Shawn N Murphy
- Harvard Medical School, Boston, Massachusetts, USA; Massachusetts General Hospital, Boston, Massachusetts, USA; Mass General Brigham, Boston, Massachusetts, USA
15
Estiri H, Strasser ZH, Brat GA, Semenov YR, Patel CJ, Murphy SN. Evolving Phenotypes of non-hospitalized Patients that Indicate Long Covid. medRxiv 2021. [PMID: 33948602 DOI: 10.1101/2021.04.25.21255923] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 11/24/2022]
Abstract
For some SARS-CoV-2 survivors, recovery from the acute phase of the infection has been grueling with lingering effects. Many of the symptoms characterized as the post-acute sequelae of COVID-19 (PASC) could have multiple causes or are similarly seen in non-COVID patients. Accurate identification of phenotypes will be important to guide future research and help the healthcare system focus its efforts and resources on adequately controlled age- and gender-specific sequelae of a COVID-19 infection. In this retrospective electronic health records (EHR) cohort study, we applied a computational framework for knowledge discovery from clinical data, MLHO, to identify phenotypes that positively associate with a past positive reverse transcription-polymerase chain reaction (RT-PCR) test for COVID-19. We evaluated the post-test phenotypes in two temporal windows at 3-6 and 6-9 months after the test and by age and gender. Data from longitudinal diagnosis records stored in EHRs from Mass General Brigham in the Boston metropolitan area were used for the analyses. Statistical analyses were performed on data from March 2020 to June 2021. Study participants included over 96 thousand patients who had tested positive or negative for COVID-19 and were not hospitalized. We identified 33 phenotypes among different age/gender cohorts or time windows that were positively associated with past SARS-CoV-2 infection. All identified phenotypes were newly recorded in patients’ medical records two months or longer after a COVID-19 RT-PCR test in non-hospitalized patients regardless of the test result.
Among these phenotypes, a new diagnosis record for anosmia and dysgeusia (OR: 2.60, 95% CI [1.94 - 3.46]), alopecia (OR: 3.09, 95% CI [2.53 - 3.76]), chest pain (OR: 1.27, 95% CI [1.09 - 1.48]), chronic fatigue syndrome (OR 2.60, 95% CI [1.22-2.10]), shortness of breath (OR 1.41, 95% CI [1.22 - 1.64]), pneumonia (OR 1.66, 95% CI [1.28 - 2.16]), and type 2 diabetes mellitus (OR 1.41, 95% CI [1.22 - 1.64]) are some of the most significant indicators of a past COVID-19 infection. Additionally, more new phenotypes were found with increased confidence among the cohorts who were younger than 65. Our approach avoids a flood of false positive discoveries while offering a more robust probabilistic approach compared to the standard linear phenome-wide association study (PheWAS). The findings of this study confirm many of the post-COVID symptoms and suggest that a variety of new diagnoses, including new diabetes mellitus and neurological disorder diagnoses, are more common among those with a history of COVID-19 than those without the infection. Additionally, more than 63 percent of PASC phenotypes were observed in patients under 65 years of age, pointing out the importance of vaccination to minimize the risk of debilitating post-acute sequelae of COVID-19 among younger adults.
16
Abstract
The COVID-19 pandemic has devastated the world with health and economic wreckage. Precise estimates of adverse outcomes from COVID-19 could have led to better allocation of healthcare resources and more efficient targeted preventive measures, including insight into prioritizing how to best distribute a vaccination. We developed MLHO (pronounced as melo), an end-to-end Machine Learning framework that leverages iterative feature and algorithm selection to predict Health Outcomes. MLHO implements iterative sequential representation mining, and feature and model selection, for predicting patient-level risk of hospitalization, ICU admission, need for mechanical ventilation, and death. It bases this prediction on data from patients' past medical records (before their COVID-19 infection). MLHO's architecture enables a parallel and outcome-oriented model calibration, in which different statistical learning algorithms and vectors of features are simultaneously tested to improve prediction of health outcomes. Using clinical and demographic data from a large cohort of over 13,000 COVID-19-positive patients, we modeled the four adverse outcomes utilizing about 600 features representing patients' pre-COVID health records and demographics. The mean AUC ROC for mortality prediction was 0.91, while the prediction performance ranged between 0.80 and 0.81 for ICU admission, hospitalization, and mechanical ventilation. We broadly describe the clusters of features that were utilized in modeling and their relative influence for predicting each outcome. Our results demonstrated that while demographic variables (namely age) are important predictors of adverse outcomes after a COVID-19 infection, the incorporation of past clinical records is vital for a reliable prediction model.
As the COVID-19 pandemic unfolds around the world, adaptable and interpretable machine learning frameworks (like MLHO) are crucial to improve our readiness for confronting the potential future waves of COVID-19, as well as other novel infectious diseases that may emerge.
Affiliation(s)
- Hossein Estiri
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA, 02144, USA
- Department of Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Harvard Medical School, Boston, MA, 02115, USA
- Zachary H Strasser
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA, 02144, USA
- Department of Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Harvard Medical School, Boston, MA, 02115, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Shawn N Murphy
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA, 02144, USA
- Department of Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Harvard Medical School, Boston, MA, 02115, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA
17
Estiri H, Strasser ZH, Klann JG, Naseri P, Wagholikar KB, Murphy SN. Predicting COVID-19 mortality with electronic medical records. NPJ Digit Med 2021; 4:15. [PMID: 33542473 PMCID: PMC7862405 DOI: 10.1038/s41746-021-00383-x] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Received: 06/18/2020] [Accepted: 12/24/2020] [Indexed: 01/31/2023]
Abstract
This study aims to predict death after COVID-19 using only the past medical information routinely collected in electronic health records (EHRs) and to understand the differences in risk factors across age groups. Combining computational methods and clinical expertise, we curated clusters that represent 46 clinical conditions as potential risk factors for death after a COVID-19 infection. We trained age-stratified generalized linear models (GLMs) with component-wise gradient boosting to predict the probability of death based on what we know about the patients before they contracted the virus. Despite only relying on previously documented demographics and comorbidities, our models demonstrated similar performance to other prognostic models that require an assortment of symptoms, laboratory values, and images at the time of diagnosis or during the course of the illness. In general, we found age to be the most important predictor of mortality in COVID-19 patients. A history of pneumonia, which is rarely asked about in typical epidemiology studies, was one of the most important risk factors for predicting COVID-19 mortality. A history of diabetes with complications and cancer (breast and prostate) were notable risk factors for patients between the ages of 45 and 65 years. In patients aged 65–85 years, diseases that affect the pulmonary system, including interstitial lung disease, chronic obstructive pulmonary disease, lung cancer, and a smoking history, were important for predicting mortality. The ability to compute precise individual-level risk scores exclusively based on the EHR is crucial for effectively allocating and distributing resources, such as prioritizing vaccination among the general population.
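The modeling step named here, generalized linear models fitted by component-wise gradient boosting, is the model family implemented for example by `glmboost` in the R package mboost. Below is a minimal NumPy sketch for a binary outcome, assuming a logistic link, centered features, one least-squares base learner per feature plus an intercept learner, and illustrative values for the step size `nu` and iteration count `n_iter`. The function name is hypothetical, the data are simulated, and this is not the authors' code; the age stratification in the abstract would amount to fitting one such model per age group.

```python
import numpy as np


def componentwise_logit_boost(X, y, n_iter=500, nu=0.1):
    """Component-wise gradient boosting of a logistic GLM (sketch).

    X: (n, p) feature matrix; y: (n,) array of 0/1 outcomes (both classes present).
    Each iteration fits every single-column least-squares learner (plus an
    intercept learner) to the negative gradient and updates only the best one.
    Returns (intercept, coefficients) on the original feature scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    mean = X.mean(axis=0)
    # Column 0 is the intercept learner; remaining columns are centered features.
    Z = np.column_stack([np.ones(n), X - mean])
    ss = (Z ** 2).sum(axis=0)
    ss = np.where(ss > 0, ss, np.inf)  # constant columns get a zero update
    theta = np.zeros(p + 1)
    theta[0] = np.log(y.mean() / (1 - y.mean()))  # start at the marginal log-odds
    for _ in range(n_iter):
        prob = 1 / (1 + np.exp(-(Z @ theta)))
        u = y - prob  # negative gradient of the log-loss w.r.t. the linear predictor
        coefs = (Z.T @ u) / ss  # least-squares slope of u on each column separately
        rss = ((u[:, None] - Z * coefs) ** 2).sum(axis=0)
        best = int(np.argmin(rss))
        theta[best] += nu * coefs[best]  # shrunken update of the selected component only
    beta = theta[1:]
    intercept = theta[0] - mean @ beta  # undo the centering
    return intercept, beta


# Simulated data: only the first of three features affects the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_logit = -0.5 + 2.0 * X[:, 0]
y = (rng.random(500) < 1 / (1 + np.exp(-true_logit))).astype(float)
intercept, beta = componentwise_logit_boost(X, y)
print(intercept, beta)
```

Stopping after a fixed number of iterations is what gives component-wise boosting its built-in variable selection: a feature that is never selected keeps a coefficient of exactly zero. In practice `n_iter` would be tuned, for example by cross-validation.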
Affiliation(s)
- Hossein Estiri
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA, 02144, USA. .,Department of Medicine, Massachusetts General Hospital, Boston, MA, 02144, USA. .,Harvard Medical School, Boston, MA, 02115, USA.
| | - Zachary H Strasser
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA, 02144, USA.,Department of Medicine, Massachusetts General Hospital, Boston, MA, 02144, USA.,Harvard Medical School, Boston, MA, 02115, USA.,Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Jeffrey G. Klann
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
- Department of Medicine, Massachusetts General Hospital, Boston, MA 02144, USA
- Harvard Medical School, Boston, MA 02115, USA
- Kavishwar B. Wagholikar
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
- Department of Medicine, Massachusetts General Hospital, Boston, MA 02144, USA
- Harvard Medical School, Boston, MA 02115, USA
- Shawn N. Murphy
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
- Department of Medicine, Massachusetts General Hospital, Boston, MA 02144, USA
- Harvard Medical School, Boston, MA 02115, USA
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
18
Estiri H, Strasser ZH, Klann JG, McCoy TH, Wagholikar KB, Vasey S, Castro VM, Murphy ME, Murphy SN. Transitive Sequencing Medical Records for Mining Predictive and Interpretable Temporal Representations. Patterns (N Y) 2020; 1:100051. [PMID: 32835307 PMCID: PMC7301790 DOI: 10.1016/j.patter.2020.100051] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/13/2020] [Revised: 04/27/2020] [Accepted: 05/26/2020] [Indexed: 12/13/2022]
Abstract
Electronic health records (EHRs) contain important temporal information about the progression of disease and treatment outcomes. This paper proposes a transitive sequencing approach for constructing temporal representations from EHR observations for downstream machine learning. Using clinical data from a cohort of patients with congestive heart failure, we mined temporal representations by transitive sequencing of EHR medication and diagnosis records for classification and prediction tasks. We compared the classification and prediction performances of the transitive sequential representations (bag-of-sequences approach) with the conventional approach of using aggregated vectors of EHR data (aggregated vector representation) across different classifiers. We found that the transitive sequential representations are better phenotype "differentiators" and predictors than the "atemporal" EHR records. Our results also demonstrated that data representations obtained from transitive sequencing of EHR observations can present novel insights about the progression of the disease that are difficult to discern when clinical data are treated independently of the patient's history.
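The abstract describes transitive sequencing only at a high level. The following is a minimal sketch under stated assumptions, not the paper's published implementation. Each patient's records are first reduced to the earliest date on which each code appears. Every pair of codes (a, b) in which a is first seen strictly before b then becomes a sequence "a->b", and the patient is represented as a bag (here, a set) of these sequences instead of an aggregated vector of codes. The first-occurrence reduction, the skipping of same-day pairs, the "a->b" string format, and all function and code names (`HTN`, `CHF`, `FUROSEMIDE`) are hypothetical choices made for illustration.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record schema: (patient_id, code, observation_date).
Record = Tuple[str, str, date]


def first_occurrences(records: Iterable[Record]) -> Dict[str, Dict[str, date]]:
    """Keep the earliest date of each code per patient (an assumption)."""
    first: Dict[str, Dict[str, date]] = {}
    for pid, code, d in records:
        per_patient = first.setdefault(pid, {})
        if code not in per_patient or d < per_patient[code]:
            per_patient[code] = d
    return first


def transitive_sequences(records: Iterable[Record]) -> Dict[str, Set[str]]:
    """Pair each code with every code first seen strictly later."""
    out: Dict[str, Set[str]] = {}
    for pid, codes in first_occurrences(records).items():
        ordered = sorted(codes.items(), key=lambda kv: (kv[1], kv[0]))
        seqs: Set[str] = set()
        for i, (a, da) in enumerate(ordered):
            for b, db in ordered[i + 1:]:
                if da < db:  # same-day pairs carry no order, so skip them
                    seqs.add(f"{a}->{b}")
        out[pid] = seqs
    return out


def aggregated_vectors(records: Iterable[Record]) -> Dict[str, Counter]:
    """Conventional 'atemporal' representation: code counts per patient."""
    out: Dict[str, Counter] = {}
    for pid, code, _ in records:
        out.setdefault(pid, Counter())[code] += 1
    return out


def to_binary_matrix(
    bags: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Turn bags of sequences into a patient-by-sequence 0/1 feature matrix."""
    vocab = sorted(set().union(*bags.values()))
    patients = sorted(bags)
    rows = [[1 if s in bags[p] else 0 for s in vocab] for p in patients]
    return patients, vocab, rows


if __name__ == "__main__":
    records = [
        ("p1", "HTN", date(2019, 1, 1)),
        ("p1", "CHF", date(2020, 3, 1)),
        ("p1", "FUROSEMIDE", date(2020, 3, 1)),
        ("p2", "CHF", date(2018, 6, 1)),
        ("p2", "HTN", date(2019, 2, 1)),
    ]
    for pid, seqs in transitive_sequences(records).items():
        print(pid, sorted(seqs))
```

In the example, both patients carry HTN and CHF codes, which an aggregated vector records identically. The sequence representation, by contrast, separates a patient whose hypertension preceded heart failure (`HTN->CHF`) from one whose heart failure came first (`CHF->HTN`). This is the kind of temporal signal the abstract says is lost when records are treated independently of the patient's history.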
Affiliation(s)
- Hossein Estiri
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
- Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
- Harvard Medical School, Boston, MA 02115, USA
- Zachary H. Strasser
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
- Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
- Harvard Medical School, Boston, MA 02115, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Jeffrey G. Klann
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
- Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
- Harvard Medical School, Boston, MA 02115, USA
- Thomas H. McCoy
- Harvard Medical School, Boston, MA 02115, USA
- Center for Quantitative Health, Massachusetts General Hospital, Boston, MA 02114, USA
- Kavishwar B. Wagholikar
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
- Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
- Harvard Medical School, Boston, MA 02115, USA
- Sebastien Vasey
- Department of Mathematics, Harvard University, Cambridge, MA 02138, USA
- Victor M. Castro
- Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
- MaryKate E. Murphy
- Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
- Shawn N. Murphy
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
- Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
- Harvard Medical School, Boston, MA 02115, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA