Reference Citation Analysis

For: Zhang Y, Cai T, Yu S, Cho K, Hong C, Sun J, Huang J, Ho YL, Ananthakrishnan AN, Xia Z, Shaw SY, Gainer V, Castro V, Link N, Honerlaw J, Huang S, Gagnon D, Karlson EW, Plenge RM, Szolovits P, Savova G, Churchill S, O'Donnell C, Murphy SN, Gaziano JM, Kohane I, Cai T, Liao KP. High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP). Nat Protoc 2019;14:3426-44. [PMID: 31748751 DOI: 10.1038/s41596-019-0227-6] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Received: 10/19/2018] [Accepted: 07/22/2019] [Indexed: 01/12/2023]

Cited by Other Article(s)

Sadeghi P, Karimi H, Lavafian A, Rashedi R, Samieefar N, Shafiekhani S, Rezaei N. Machine learning and artificial intelligence within pediatric autoimmune diseases: applications, challenges, future perspective. Expert Rev Clin Immunol 2024:1-18. [PMID: 38771915 DOI: 10.1080/1744666x.2024.2359019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Accepted: 05/20/2024] [Indexed: 05/23/2024]

McCaw ZR, Gao J, Lin X, Gronsbell J. Synthetic surrogates improve power for genome-wide association studies of partially missing phenotypes in population biobanks. Nat Genet 2024:10.1038/s41588-024-01793-9. [PMID: 38872030 DOI: 10.1038/s41588-024-01793-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2023] [Accepted: 05/08/2024] [Indexed: 06/15/2024]

Xiao T, Kong S, Zhang Z, Hua D, Liu F. A review of big data technology and its application in cancer care. Comput Biol Med 2024;176:108577. [PMID: 38739981 DOI: 10.1016/j.compbiomed.2024.108577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Revised: 05/07/2024] [Accepted: 05/07/2024] [Indexed: 05/16/2024]

Jiang K, Cao T. Automated HIV Case Identification from the MIMIC-IV Database. AMIA Jt Summits Transl Sci Proc 2024;2024:555-564. [PMID: 38827090 PMCID: PMC11141847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]

Steinfeldt J, Wild B, Buergel T, Pietzner M, Upmeier Zu Belzen J, Vauvelle A, Hegselmann S, Denaxas S, Hemingway H, Langenberg C, Landmesser U, Deanfield J, Eils R. Medical history predicts phenome-wide disease onset and enables the rapid response to emerging health threats. Nat Commun 2024;15:4257. [PMID: 38763986 PMCID: PMC11102902 DOI: 10.1038/s41467-024-48568-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 05/03/2024] [Indexed: 05/21/2024]

Affiliation(s)

Jakob Steinfeldt Department of Cardiology, Angiology and Intensive Care Medicine, Deutsches Herzzentrum der Charité (DHZC), Berlin, Germany; Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Klinik/Centrum, Charitéplatz 1, 10117, Berlin, Germany; Computational Medicine, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany; Friede Springer Cardiovascular Prevention Center@Charite, Charite - University Medicine Berlin, Berlin, Germany; Institute of Cardiovascular Sciences, University College London, London, UK
Benjamin Wild Center for Digital Health, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany
Thore Buergel Institute of Cardiovascular Sciences, University College London, London, UK; Center for Digital Health, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany
Maik Pietzner Computational Medicine, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany; MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, UK; Precision Health University Research Institute, Queen Mary University of London and Barts NHS Trust, London, UK
Julius Upmeier Zu Belzen Center for Digital Health, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany
Andre Vauvelle Institute of Health Informatics, University College London, London, UK
Stefan Hegselmann Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Massachusetts, USA; Pattern Recognition and Image Analysis Lab, University of Münster, Münster, Germany
Spiros Denaxas Institute of Health Informatics, University College London, London, UK; British Heart Foundation Data Science Centre, London, UK; Health Data Research UK, London, UK; National Institute for Health Research, Biomedical Research Centre at University College London Hospitals National Institute for Health Research, Biomedical Research Centre, London, UK
Harry Hemingway Institute of Health Informatics, University College London, London, UK; Health Data Research UK, London, UK; National Institute for Health Research, Biomedical Research Centre at University College London Hospitals National Institute for Health Research, Biomedical Research Centre, London, UK
Claudia Langenberg Computational Medicine, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany; MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, UK; Precision Health University Research Institute, Queen Mary University of London and Barts NHS Trust, London, UK
Ulf Landmesser Department of Cardiology, Angiology and Intensive Care Medicine, Deutsches Herzzentrum der Charité (DHZC), Berlin, Germany; Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Klinik/Centrum, Charitéplatz 1, 10117, Berlin, Germany; Friede Springer Cardiovascular Prevention Center@Charite, Charite - University Medicine Berlin, Berlin, Germany; Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, Berlin, Berlin, Germany
John Deanfield Institute of Cardiovascular Sciences, University College London, London, UK
Roland Eils Center for Digital Health, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany; Health Data Science Unit, Heidelberg University Hospital and BioQuant, Heidelberg, Germany

Lee HJ, Schwamm LH, Sansing LH, Kamel H, de Havenon A, Turner AC, Sheth KN, Krishnaswamy S, Brandt C, Zhao H, Krumholz H, Sharma R. StrokeClassifier: ischemic stroke etiology classification by ensemble consensus modeling using electronic health records. NPJ Digit Med 2024;7:130. [PMID: 38760474 PMCID: PMC11101464 DOI: 10.1038/s41746-024-01120-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Accepted: 04/23/2024] [Indexed: 05/19/2024]

Abstract

Determining acute ischemic stroke (AIS) etiology is fundamental to secondary stroke prevention efforts but can be diagnostically challenging. We trained and validated an automated classification tool, StrokeClassifier, using electronic health record (EHR) text from 2039 non-cryptogenic AIS patients at 2 academic hospitals to predict the 4-level outcome of stroke etiology adjudicated by agreement of at least 2 board-certified vascular neurologists' review of the EHR. StrokeClassifier is an ensemble consensus meta-model of 9 machine learning classifiers applied to features extracted from discharge summary texts by natural language processing. StrokeClassifier was externally validated in 406 discharge summaries from the MIMIC-III dataset reviewed by a vascular neurologist to ascertain stroke etiology. Compared with vascular neurologists' diagnoses, StrokeClassifier achieved the mean cross-validated accuracy of 0.74 and weighted F1 of 0.74 for multi-class classification. In MIMIC-III, its accuracy and weighted F1 were 0.70 and 0.71, respectively. In binary classification, the two metrics ranged from 0.77 to 0.96. The top 5 features contributing to stroke etiology prediction were atrial fibrillation, age, middle cerebral artery occlusion, internal carotid artery occlusion, and frontal stroke location. We designed a certainty heuristic to grade the confidence of StrokeClassifier's diagnosis as non-cryptogenic by the degree of consensus among the 9 classifiers and applied it to 788 cryptogenic patients, reducing cryptogenic diagnoses from 25.2% to 7.2%. StrokeClassifier is a validated artificial intelligence tool that rivals the performance of vascular neurologists in classifying ischemic stroke etiology. With further training, StrokeClassifier may have downstream applications including its use as a clinical decision support system.
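
The consensus-based certainty heuristic described in the abstract can be pictured with a minimal sketch. The abstract does not give the actual rule, so the agreement threshold (`min_agreement`), the placeholder class names, and the function name below are all assumptions for illustration, not the StrokeClassifier implementation.

```python
from collections import Counter

def consensus_label(votes, min_agreement=0.75, fallback="cryptogenic"):
    """Return (label, agreement) from a list of per-classifier votes.

    Hypothetical rule: accept the most common vote only when the share of
    classifiers agreeing on it reaches min_agreement; otherwise keep the
    fallback label. The threshold value is an assumption.
    """
    if not votes:
        raise ValueError("votes must not be empty")
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    if agreement >= min_agreement:
        return label, agreement
    return fallback, agreement

# Nine classifiers, seven of which agree on a placeholder class name.
votes = ["etiology_a"] * 7 + ["etiology_b"] * 2
print(consensus_label(votes))
```

With nine voters, a threshold of 0.75 requires at least seven matching votes; a different threshold would trade more reclassified patients for lower certainty.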

Shyr C, Sulieman L, Harris PA. Illuminating the landscape of high-level clinical trial opportunities in the All of Us Research Program. J Am Med Inform Assoc 2024:ocae062. [PMID: 38622899 DOI: 10.1093/jamia/ocae062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 03/02/2024] [Accepted: 03/07/2024] [Indexed: 04/17/2024]

Abstract

OBJECTIVE

With its size and diversity, the All of Us Research Program has the potential to power and improve representation in clinical trials through ancillary studies like Nutrition for Precision Health. We sought to characterize high-level trial opportunities for the diverse participants and sponsors of future trial investment.

MATERIALS AND METHODS

We matched All of Us participants with available trials on ClinicalTrials.gov based on medical conditions, age, sex, and geographic location. Based on the number of matched trials, we (1) developed the Trial Opportunities Compass (TOC) to help sponsors assess trial investment portfolios, (2) characterized the landscape of trial opportunities in a phenome-wide association study (PheWAS), and (3) assessed the relationship between trial opportunities and social determinants of health (SDoH) to identify potential barriers to trial participation.

RESULTS

Our study included 181 529 All of Us participants and 18 634 trials. The TOC identified opportunities for portfolio investment and gaps in currently available trials across federal, industrial, and academic sponsors. PheWAS results revealed an emphasis on mental disorder-related trials, with anxiety disorder having the highest adjusted increase in the number of matched trials (59% [95% CI, 57-62]; P < 1e-300). Participants from certain communities underrepresented in biomedical research, including self-reported racial and ethnic minorities, had more matched trials after adjusting for other factors. Living in a nonmetropolitan area was associated with up to 13.1 times fewer matched trials.

DISCUSSION AND CONCLUSION

All of Us data are a valuable resource for identifying trial opportunities to inform trial portfolio planning. Characterizing these opportunities with consideration for SDoH can provide guidance on prioritizing the most pressing barriers to trial participation.

Yan C, Ong HH, Grabowska ME, Krantz MS, Su WC, Dickson AL, Peterson JF, Feng Q, Roden DM, Stein CM, Kerchberger VE, Malin BA, Wei WQ. Large language models facilitate the generation of electronic health record phenotyping algorithms. J Am Med Inform Assoc 2024:ocae072. [PMID: 38613820 DOI: 10.1093/jamia/ocae072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 02/21/2024] [Accepted: 03/22/2024] [Indexed: 04/15/2024]

Abstract

OBJECTIVES

Phenotyping is a core task in observational health research utilizing electronic health records (EHRs). Developing an accurate algorithm demands substantial input from domain experts, involving extensive literature review and evidence synthesis. This burdensome process limits scalability and delays knowledge discovery. We investigate the potential for leveraging large language models (LLMs) to enhance the efficiency of EHR phenotyping by generating high-quality algorithm drafts.

MATERIALS AND METHODS

We prompted four LLMs (GPT-4 and GPT-3.5 of ChatGPT, Claude 2, and Bard) in October 2023, asking them to generate executable phenotyping algorithms in the form of SQL queries adhering to a common data model (CDM) for three phenotypes (ie, type 2 diabetes mellitus, dementia, and hypothyroidism). Three phenotyping experts evaluated the returned algorithms across several critical metrics. We further implemented the top-rated algorithms and compared them against clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network.

RESULTS

GPT-4 and GPT-3.5 exhibited significantly higher overall expert evaluation scores in instruction following, algorithmic logic, and SQL executability, when compared to Claude 2 and Bard. Although GPT-4 and GPT-3.5 effectively identified relevant clinical concepts, they exhibited immature capability in organizing phenotyping criteria with the proper logic, leading to phenotyping algorithms that were either excessively restrictive (with low recall) or overly broad (with low positive predictive values).

CONCLUSION

GPT versions 3.5 and 4 are capable of drafting phenotyping algorithms by identifying relevant clinical criteria aligned with a CDM. However, expertise in informatics and clinical experience is still required to assess and further refine generated algorithms.

Affiliation(s)

Chao Yan Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Henry H Ong Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Monika E Grabowska Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Matthew S Krantz Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Wu-Chen Su Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Alyson L Dickson Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Josh F Peterson Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
QiPing Feng Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Dan M Roden Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
C Michael Stein Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
V Eric Kerchberger Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Bradley A Malin Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States; Department of Computer Science, Vanderbilt University, Nashville, TN 37203, United States; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Wei-Qi Wei Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States; Department of Computer Science, Vanderbilt University, Nashville, TN 37203, United States

Wei WQ, Rowley R, Wood A, MacArthur J, Embi PJ, Denaxas S. Improving reporting standards for phenotyping algorithm in biomedical research: 5 fundamental dimensions. J Am Med Inform Assoc 2024;31:1036-1041. [PMID: 38269642 PMCID: PMC10990558 DOI: 10.1093/jamia/ocae005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/12/2023] [Accepted: 01/08/2024] [Indexed: 01/26/2024]

Jeffery AD, Fabbri D, Reeves RM, Matheny ME. Use of noisy labels as weak learners to identify incompletely ascertainable outcomes: A Feasibility study with opioid-induced respiratory depression. Heliyon 2024;10:e26434. [PMID: 38444495 PMCID: PMC10912240 DOI: 10.1016/j.heliyon.2024.e26434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 02/09/2024] [Accepted: 02/13/2024] [Indexed: 03/07/2024]

Abstract

Objective

Assigning outcome labels to large observational data sets in a timely and accurate manner, particularly when outcomes are rare or not directly ascertainable, remains a significant challenge within biomedical informatics. We examined whether noisy labels generated from subject matter experts' heuristics using heterogeneous data types within a data programming paradigm could provide outcome labels to a large, observational data set. We chose the clinical condition of opioid-induced respiratory depression for our use case because it is rare, has no administrative codes to easily identify the condition, and typically requires at least some unstructured text to ascertain its presence.

Materials and methods

Using de-identified electronic health records of 52,861 post-operative encounters, we applied a data programming paradigm (implemented in the Snorkel software) for the development of a machine learning classifier for opioid-induced respiratory depression. Our approach included subject matter experts creating 14 labeling functions that served as noisy labels for developing a probabilistic Generative model. We used probabilistic labels from the Generative model as outcome labels for training a Discriminative model on the source data. We evaluated performance of the Discriminative model with a hold-out test set of 599 independently-reviewed patient records.
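
The idea of labeling functions that emit noisy votes can be sketched in plain Python. This is not the Snorkel API: the three rules, the record field names, and the respiratory-rate cutoff are hypothetical, and a simple average of non-abstaining votes stands in for the probabilistic Generative model described above.

```python
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def lf_naloxone(rec):
    # Hypothetical rule: a reversal agent was given during the encounter.
    return POSITIVE if rec.get("naloxone_given") else ABSTAIN

def lf_low_resp_rate(rec):
    # Hypothetical rule; the cutoff of 8 breaths/min is an assumption.
    rate = rec.get("min_resp_rate")
    if rate is None:
        return ABSTAIN
    return POSITIVE if rate < 8 else NEGATIVE

def lf_note_mention(rec):
    # Hypothetical rule: free-text note mentions the condition.
    text = rec.get("note", "").lower()
    return POSITIVE if "respiratory depression" in text else ABSTAIN

def probabilistic_label(rec, labeling_functions):
    """Average the non-abstaining votes; None if every function abstains."""
    votes = [lf(rec) for lf in labeling_functions]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return None
    return sum(votes) / len(votes)

lfs = [lf_naloxone, lf_low_resp_rate, lf_note_mention]
record = {"naloxone_given": True, "min_resp_rate": 16, "note": "Stable overnight."}
print(probabilistic_label(record, lfs))
```

A real label model would weight each function by its estimated accuracy and correlation with the others instead of averaging votes equally; the resulting soft labels would then train the downstream classifier.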

Results

The final Discriminative classification model achieved an accuracy of 0.977, an F1 score of 0.417, a sensitivity of 1.0, and an AUC of 0.988 in the hold-out test set with a prevalence of 0.83% (5/599).
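
The reported metrics can be cross-checked with a little arithmetic. The sketch below back-calculates a confusion matrix from the figures in the abstract (599 records, 5 cases, sensitivity 1.0, F1 0.417); the counts it yields are implied by those figures, not reported by the authors, and the function name is illustrative.

```python
def implied_counts(n, cases, sensitivity, f1):
    """Back-calculate (TP, FP, FN, TN) from summary metrics.

    Uses F1 = 2PR / (P + R), rearranged to P = F1 * R / (2R - F1),
    where P is precision and R is sensitivity (recall).
    """
    tp = round(sensitivity * cases)
    fn = cases - tp
    precision = f1 * sensitivity / (2 * sensitivity - f1)
    fp = round(tp / precision - tp)
    tn = n - tp - fn - fp
    return tp, fp, fn, tn

tp, fp, fn, tn = implied_counts(n=599, cases=5, sensitivity=1.0, f1=0.417)
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
print(tp, fp, fn, tn, round(accuracy, 3), round(precision, 3))
```

Under these assumptions the figures are mutually consistent: about 14 false positives alongside the 5 true positives give a precision near 0.26, which explains a modest F1 despite perfect sensitivity and high accuracy at 0.83% prevalence.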

Discussion

All of the confirmed Cases were identified by the classifier. For rare outcomes, this finding is encouraging because it reduces the number of manual reviews needed by excluding visits/patients with low probabilities.

Conclusion

Application of a data programming paradigm with expert-informed labeling functions might have utility for phenotyping clinical phenomena that are not easily ascertainable from highly-structured data.

Almuwaqqat Z, Hui Q, Liu C, Zhou JJ, Voight BF, Ho YL, Posner DC, Vassy JL, Gaziano JM, Cho K, Wilson PWF, Sun YV. Long-Term Body Mass Index Variability and Adverse Cardiovascular Outcomes. JAMA Netw Open 2024;7:e243062. [PMID: 38512255 PMCID: PMC10958234 DOI: 10.1001/jamanetworkopen.2024.3062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 01/23/2024] [Indexed: 03/22/2024]

Abstract

Importance

Body mass index (BMI; calculated as weight in kilograms divided by height in meters squared) is a commonly used estimate of obesity, which is a complex trait affected by genetic and lifestyle factors. Marked weight gain and loss could be associated with adverse biological processes.

Objective

To evaluate the association between BMI variability and incident cardiovascular disease (CVD) events in 2 distinct cohorts.

Design, Setting, and Participants

This cohort study used data from the Million Veteran Program (MVP) between 2011 and 2018 and participants in the UK Biobank (UKB) enrolled between 2006 and 2010. Participants were followed up for a median of 3.8 (5th-95th percentile, 3.5) years. Participants with baseline CVD or cancer were excluded. Data were analyzed from September 2022 to September 2023.

Exposure

BMI variability was calculated by the retrospective SD and coefficient of variation (CV) using multiple clinical BMI measurements up to the baseline.
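
As an illustration of the two variability measures, the sketch below computes the SD and the coefficient of variation (SD divided by the mean) for one participant's BMI series. The abstract does not say whether the sample or population SD was used, so the sample SD (n - 1 denominator) here is an assumption, as are the function name and the example values.

```python
import statistics

def bmi_variability(bmis):
    """Return (sd, cv) for a sequence of BMI measurements.

    Assumes the sample standard deviation; cv is sd / mean and is unitless.
    """
    if len(bmis) < 2:
        raise ValueError("need at least two BMI measurements")
    mean = statistics.mean(bmis)
    sd = statistics.stdev(bmis)
    return sd, sd / mean

# Made-up measurements in kg/m^2 for a single participant.
sd, cv = bmi_variability([24.0, 26.0, 28.0])
print(round(sd, 3), round(cv, 4))
```

Because the CV scales the SD by the mean, it allows variability to be compared between participants whose average BMI differs.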

Main Outcomes and Measures

The main outcome was incident composite CVD events (incident nonfatal myocardial infarction, acute ischemic stroke, and cardiovascular death), assessed using Cox proportional hazards modeling after adjustment for CVD risk factors, including age, sex, mean BMI, systolic blood pressure, total cholesterol, high-density lipoprotein cholesterol, smoking status, diabetes status, and statin use. Secondary analysis assessed whether associations were dependent on the polygenic score of BMI.

Results

Among 92 363 US veterans in the MVP cohort (81 675 [88%] male; mean [SD] age, 56.7 [14.1] years), there were 9695 Hispanic participants, 22 488 non-Hispanic Black participants, and 60 180 non-Hispanic White participants. A total of 4811 composite CVD events were observed from 2011 to 2018. The CV of BMI was associated with 16% higher risk for composite CVD across all groups (hazard ratio [HR], 1.16; 95% CI, 1.13-1.19). These associations were unchanged among subgroups and after adjustment for the polygenic score of BMI. The UKB cohort included 65 047 individuals (mean [SD] age, 57.30 [7.77] years; 38 065 [59%] female) and had 6934 composite CVD events. Each 1-SD increase in BMI variability in the UKB cohort was associated with 8% increased risk of cardiovascular death (HR, 1.08; 95% CI, 1.04-1.11).

Conclusions and Relevance

This cohort study found that among US veterans, higher BMI variability was a significant risk marker associated with adverse cardiovascular events independent of mean BMI across major racial and ethnic groups. Results were consistent in the UKB for the cardiovascular death end point. Further studies should investigate the phenotype of high BMI variability.

Affiliation(s)

Zakaria Almuwaqqat Veterans Affairs Atlanta Healthcare System, Decatur, Georgia; Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, Georgia
Qin Hui Veterans Affairs Atlanta Healthcare System, Decatur, Georgia; Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, Georgia
Chang Liu Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, Georgia
Jin J. Zhou Department of Medicine and Biostatistics, University of California, Los Angeles; Veterans Affairs Phoenix Healthcare System, Phoenix, Arizona
Benjamin F. Voight Corporal Michael J. Crescenz VA Medical Center, Philadelphia, Pennsylvania; Department of Systems Pharmacology and Translational Therapeutics, Department of Genetics, University of Pennsylvania, Philadelphia
Yuk-Lam Ho Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston
Daniel C. Posner Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston
Jason L. Vassy Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston; Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
J. Michael Gaziano Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston; Division of Aging, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
Kelly Cho Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston; Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
Peter W. F. Wilson Veterans Affairs Atlanta Healthcare System, Decatur, Georgia; Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, Georgia
Yan V. Sun Veterans Affairs Atlanta Healthcare System, Decatur, Georgia; Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, Georgia; Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, Georgia

Wang DD, Li Y, Nguyen XM, Ho YL, Hu FB, Willett WC, Wilson PW, Cho K, Gaziano JM, Djoussé L. Red Meat Intake and the Risk of Cardiovascular Diseases: A Prospective Cohort Study in the Million Veteran Program. J Nutr 2024;154:886-895. [PMID: 38163586 DOI: 10.1016/j.tjnut.2023.12.051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 12/22/2023] [Accepted: 12/28/2023] [Indexed: 01/03/2024]

Abstract

BACKGROUND

Red meat consumption was associated with an increased risk of cardiovascular disease (CVD) in prospective cohort studies and a profile of biomarkers favoring high CVD risk in short-term controlled trials. However, several recent systematic reviews and meta-analyses concluded with no or weak evidence for limiting red meat intake.

OBJECTIVES

To prospectively examine the associations between red meat intake and incident CVD in an ongoing cohort study with diverse socioeconomic and racial or ethnic backgrounds.

METHODS

Our study included 148,506 participants [17,804 female (12.0%)] who were free of cancer, diabetes, and CVD at baseline from the Million Veteran Program. A food frequency questionnaire measured red meat intakes at baseline. Nonfatal myocardial infarction and acute ischemic stroke were identified through a high-throughput phenotyping algorithm, and fatal CVD events were identified by searching the National Death Index.

RESULTS

Comparing the extreme categories of intake, the multivariate-adjusted relative risk of CVD was 1.18 (95% CI: 1.01, 1.38; P-trend < 0.0001) for total red meat, 1.14 (95% CI: 0.96, 1.36; P-trend = 0.01) for unprocessed red meat, and 1.29 (95% CI: 1.04, 1.60; P-trend = 0.003) for processed red meat. We observed a more pronounced positive association between red meat intake and CVD in African American participants than in White participants (P-interaction = 0.01). Replacing 0.5 servings/d of red meat with 0.5 servings/d of nuts, whole grains, and skimmed milk was associated with 14% (RR: 0.86; 95% CI: 0.83, 0.90), 7% (RR: 0.93; 95% CI: 0.89, 0.96), and 4% (RR: 0.96; 95% CI: 0.94, 0.99) lower risks of CVD, respectively.

CONCLUSIONS

Red meat consumption is associated with an increased risk of CVD. Our findings support lowering red meat intake and replacing red meat with plant-based protein sources or low-fat dairy foods as a key dietary recommendation for the prevention of CVD.

Affiliation(s)

Dong D Wang Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, United States; The Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States; Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA, United States; Broad Institute of MIT and Harvard, Cambridge, MA, United States.
Yanping Li Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, United States; Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA, United States
Xuan-Mai Nguyen Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, United States; Department of Medicine, Division of Aging, Brigham and Women's Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
Yuk-Lam Ho Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, United States
Frank B Hu The Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States; Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA, United States; Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States
Walter C Willett The Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States; Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA, United States; Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States
Peter WF Wilson Atlanta VA Medical Center, Atlanta, GA, United States; Emory Clinical Cardiovascular Research Institute, Atlanta, GA, United States
Kelly Cho Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, United States; Department of Medicine, Division of Aging, Brigham and Women's Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
J Michael Gaziano Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, United States; Department of Medicine, Division of Aging, Brigham and Women's Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
Luc Djoussé Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, United States; Department of Medicine, Division of Aging, Brigham and Women's Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States

Yan C, Ong HH, Grabowska ME, Krantz MS, Su WC, Dickson AL, Peterson JF, Feng Q, Roden DM, Stein CM, Kerchberger VE, Malin BA, Wei WQ. Large Language Models Facilitate the Generation of Electronic Health Record Phenotyping Algorithms. medRxiv 2024:2023.12.19.23300230. [PMID: 38196578 PMCID: PMC10775330 DOI: 10.1101/2023.12.19.23300230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2024]

Abstract

Objectives

Materials and Methods

We prompted four LLMs (GPT-4 and GPT-3.5 of ChatGPT, Claude 2, and Bard) in October 2023, asking them to generate executable phenotyping algorithms in the form of SQL queries adhering to a common data model (CDM) for three phenotypes (i.e., type 2 diabetes mellitus, dementia, and hypothyroidism). Three phenotyping experts evaluated the returned algorithms across several critical metrics. We further implemented the top-rated algorithms and compared them against clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network.

Results

Conclusion

Gao J, Bonzel CL, Hong C, Varghese P, Zakir K, Gronsbell J. Semi-supervised ROC analysis for reliable and streamlined evaluation of phenotyping algorithms. J Am Med Inform Assoc 2024;31:640-650. [PMID: 38128118 PMCID: PMC10873838 DOI: 10.1093/jamia/ocad226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2023] [Revised: 09/22/2023] [Accepted: 11/20/2023] [Indexed: 12/23/2023] Open

Wu X, Li W, Tu H. Big data and artificial intelligence in cancer research. Trends Cancer 2024;10:147-160. [PMID: 37977902 DOI: 10.1016/j.trecan.2023.10.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2023] [Revised: 10/17/2023] [Accepted: 10/20/2023] [Indexed: 11/19/2023]

Jeffery AD, Fabbri D, Reeves RM, Matheny ME. Use of Noisy Labels as Weak Learners to Identify Incompletely Ascertainable Outcomes: A Feasibility Study with Opioid-Induced Respiratory Depression. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.01.29.24301963. [PMID: 38352435 PMCID: PMC10863026 DOI: 10.1101/2024.01.29.24301963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/19/2024]

Abstract

Objective

Materials and Methods

Results

Discussion

Conclusion

Application of a data programming paradigm with expert-informed labeling functions might have utility for phenotyping clinical phenomena that are not easily ascertainable from highly-structured data.

Li Y, Wang DD, Nguyen XMT, Song RJ, Ho YL, Hu FB, Willett WC, Wilson PWF, Cho K, Gaziano JM, Djousse L. Plant-based diets and the incidence of cardiovascular disease: the Million Veteran Program. BMJ Nutr Prev Health 2023;6:212-220. [PMID: 38264362 PMCID: PMC10800254 DOI: 10.1136/bmjnph-2021-000401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2021] [Accepted: 09/25/2023] [Indexed: 01/25/2024] Open

Abstract

Background

A healthful plant-based diet was associated with lower risks of coronary heart disease and type 2 diabetes, and a favourable profile of adiposity-associated biomarkers, while an unhealthful plant-based diet was associated with elevated risk of cardiometabolic disease in health professional populations. However, little is known about the associations between plant-based dietary patterns and risk of cardiovascular disease (CVD) in US veterans.

Methods

The study population consisted of 148 506 participants who were free of diabetes, CVD and cancer at baseline in the Veterans Affairs (VA) Million Veteran Program. Diet was assessed using a Food Frequency Questionnaire at baseline. We calculated an overall Plant-Based Diet Index (PDI), a healthful PDI (hPDI) and an unhealthful PDI (uPDI). The CVD endpoints included non-fatal myocardial infarction (MI) and acute ischaemic stroke (AIS), identified through a high-throughput phenotyping algorithm approach, and fatal CVD events, identified by searching the National Death Index.

Results

With up to 8 years of follow-up, we documented 5025 CVD cases. After adjustment for confounding factors, a higher PDI was significantly associated with a lower risk of CVD (HR comparing extreme quintiles=0.75, 95% CI 0.68 to 0.82, P trend<0.0001). We observed an inverse association between hPDI and the risk of CVD (HR comparing extreme quintiles=0.71, 95% CI 0.64 to 0.78, P trend<0.001), whereas uPDI was positively associated with the risk of CVD (HR comparing extreme quintiles=1.12, 95% CI 1.02 to 1.24, P trend<0.001). We found similar associations of hPDI with subtypes of CVD; a 10-unit increment in hPDI was associated with HRs (95% CI) of 0.81 (0.75 to 0.87) for fatal CVD, 0.86 (0.79 to 0.94) for non-fatal MI and 0.86 (0.78 to 0.95) for non-fatal AIS.

Conclusions

A plant-based dietary pattern enriched with healthier plant foods was associated with a substantially lower CVD risk in US veterans.

Affiliation(s)

Yanping Li Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA; Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
Dong D Wang Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA; The Channing Division for Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
Xuan-Mai T Nguyen Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA; Carle Illinois College of Medicine, University of Illinois Urbana Champaign, Champaign, Illinois, USA
Rebecca J Song Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA; Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts, USA
Yuk-Lam Ho Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
Frank B Hu Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA; The Channing Division for Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
Walter C Willett Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA; The Channing Division for Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
Peter W F Wilson Epidemiology and Genomic Medicine, Atlanta VA Medical Center, Atlanta, Georgia, USA; Division of Cardiology, Emory Clinical Cardiovascular Research Institute, Atlanta, Georgia, USA
Kelly Cho Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA; Division of Aging, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
John Michael Gaziano Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA; Division of Aging, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
Luc Djousse Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA; Division of Aging, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA

Chen F, Ahimaz P, Wang K, Chung WK, Ta C, Weng C, Liu C. Phenotype-Driven Molecular Genetic Test Recommendation for Diagnosing Pediatric Rare Disorders. RESEARCH SQUARE 2023:rs.3.rs-3593490. [PMID: 38045411 PMCID: PMC10690317 DOI: 10.21203/rs.3.rs-3593490/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/05/2023]

He S, Park S, Kuklina E, Therrien NL, Lundeen EA, Wall HK, Lampley K, Kompaniyets L, Pierce SL, Sperling L, Jackson SL. Leveraging Electronic Health Records to Construct a Phenotype for Hypertension Surveillance in the United States. Am J Hypertens 2023;36:677-685. [PMID: 37696605 PMCID: PMC10898654 DOI: 10.1093/ajh/hpad081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2023] [Revised: 05/10/2023] [Accepted: 09/06/2023] [Indexed: 09/13/2023] Open

Affiliation(s)

Siran He Division for Heart Disease and Stroke Prevention, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
Soyoun Park Division for Heart Disease and Stroke Prevention, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
Elena Kuklina Division for Heart Disease and Stroke Prevention, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
Nicole L Therrien Division for Heart Disease and Stroke Prevention, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
Elizabeth A Lundeen Division of Diabetes Translation, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
Hilary K Wall Division for Heart Disease and Stroke Prevention, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
Katrice Lampley Division for Heart Disease and Stroke Prevention, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA; ASRT, INC, Smyrna, GA, USA
Lyudmyla Kompaniyets Division of Nutrition, Physical Activity, and Obesity, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
Samantha L Pierce Division of Nutrition, Physical Activity, and Obesity, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
Laurence Sperling Division for Heart Disease and Stroke Prevention, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
Sandra L Jackson Division for Heart Disease and Stroke Prevention, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA

Lee HJ, Schwamm LH, Sansing L, Kamel H, de Havenon A, Turner AC, Sheth KN, Krishnaswamy S, Brandt C, Zhao H, Krumholz H, Sharma R. StrokeClassifier: Ischemic Stroke Etiology Classification by Ensemble Consensus Modeling Using Electronic Health Records. RESEARCH SQUARE 2023:rs.3.rs-3367169. [PMID: 37961532 PMCID: PMC10635373 DOI: 10.21203/rs.3.rs-3367169/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2023]

Abstract

Determining the etiology of an acute ischemic stroke (AIS) is fundamental to secondary stroke prevention efforts but can be diagnostically challenging. We trained and validated an automated classification machine intelligence tool, StrokeClassifier, using electronic health record (EHR) text data from 2,039 non-cryptogenic AIS patients at 2 academic hospitals to predict the 4-level outcome of stroke etiology determined by agreement of at least 2 board-certified vascular neurologists' review of the stroke hospitalization EHR. StrokeClassifier is an ensemble consensus meta-model of 9 machine learning classifiers applied to features extracted from discharge summary texts by natural language processing. StrokeClassifier was externally validated in 406 discharge summaries from the MIMIC-III dataset reviewed by a vascular neurologist to ascertain stroke etiology. Compared with stroke etiologies adjudicated by vascular neurologists, StrokeClassifier achieved a mean cross-validated accuracy of 0.74 (±0.01) and weighted F1 of 0.74 (±0.01). In the MIMIC-III cohort, the accuracy and weighted F1 of StrokeClassifier were 0.70 and 0.71, respectively. SHapley Additive exPlanation analysis elucidated that the top 5 features contributing to stroke etiology prediction were atrial fibrillation, age, middle cerebral artery occlusion, internal carotid artery occlusion, and frontal stroke location. We then designed a certainty heuristic to deem a StrokeClassifier diagnosis as confidently non-cryptogenic by the degree of consensus among the 9 classifiers, and applied it to 788 cryptogenic patients. This reduced the percentage of the cryptogenic strokes from 25.2% to 7.2% of all ischemic strokes. StrokeClassifier is a validated artificial intelligence tool that rivals the performance of vascular neurologists in classifying ischemic stroke etiology for individual patients. With further training, StrokeClassifier may have downstream applications including its use as a clinical decision support system.
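
The abstract above describes a certainty heuristic based on the degree of consensus among the 9 classifiers, but does not give its exact rule. The following is a minimal sketch of one plausible form of such a heuristic, assuming a simple majority vote with an agreement threshold. The function name `consensus_label`, the threshold `min_agreement=0.7`, and the etiology label strings are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def consensus_label(votes, min_agreement=0.7):
    """Return (label, agreement, confident) for one patient's classifier votes.

    votes: list of predicted labels, one per classifier.
    min_agreement: hypothetical fraction of classifiers that must agree
                   before the consensus label is treated as confident.
    """
    if not votes:
        raise ValueError("votes must be non-empty")
    counts = Counter(votes)
    # most_common(1) yields the label with the highest vote count
    label, top = counts.most_common(1)[0]
    agreement = top / len(votes)
    return label, agreement, agreement >= min_agreement


# Illustrative votes from 9 classifiers (label names are placeholders)
votes = ["cardioembolic"] * 7 + ["large-artery"] * 2
label, agreement, confident = consensus_label(votes)
print(label, round(agreement, 2), confident)
```

A stricter threshold labels fewer patients confidently but with higher agreement; the appropriate trade-off would need to be validated against expert adjudication.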

Srinivasan S, Wu P, Mercader JM, Udler MS, Porneala BC, Bartz TM, Floyd JS, Sitlani C, Guo X, Haessler J, Kooperberg C, Liu J, Ahmad S, van Duijn C, Liu CT, Goodarzi MO, Florez JC, Meigs JB, Rotter JI, Rich SS, Dupuis J, Leong A. A Type 1 Diabetes Polygenic Score Is Not Associated With Prevalent Type 2 Diabetes in Large Population Studies. J Endocr Soc 2023;7:bvad123. [PMID: 37841955 PMCID: PMC10576255 DOI: 10.1210/jendso/bvad123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/30/2023] [Indexed: 10/17/2023] Open

Affiliation(s)

Shylaja Srinivasan Division of Pediatric Endocrinology, University of California at San Francisco, San Francisco, CA 94158, USA
Peitao Wu Department of Biostatistics, Boston University School of Public Health, Boston, MA 02215, USA
Josep M Mercader Department of Medicine, Harvard Medical School, Boston, MA 02115, USA; Programs in Metabolism and Medical & Population Genetics, Broad Institute of Harvard & Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Center for Genomic Medicine and Diabetes Unit, Massachusetts General Hospital, Boston, MA 02114, USA
Miriam S Udler Department of Medicine, Harvard Medical School, Boston, MA 02115, USA; Programs in Metabolism and Medical & Population Genetics, Broad Institute of Harvard & Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Center for Genomic Medicine and Diabetes Unit, Massachusetts General Hospital, Boston, MA 02114, USA
Bianca C Porneala Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
Traci M Bartz Department of Biostatistics, University of Washington, Seattle, WA 98195, USA; Cardiovascular Health Research Unit, University of Washington, Seattle, WA 98195, USA
James S Floyd Cardiovascular Health Research Unit, University of Washington, Seattle, WA 98195, USA; Department of Medicine, University of Washington, Seattle, WA 98195, USA; Department of Epidemiology, University of Washington, Seattle, WA 98195, USA
Colleen Sitlani Cardiovascular Health Research Unit, University of Washington, Seattle, WA 98195, USA; Department of Medicine, University of Washington, Seattle, WA 98195, USA
Xiquing Guo The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
Jeffrey Haessler Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
Charles Kooperberg Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
Jun Liu Department of Epidemiology, Erasmus Medical Center, 3015 GD Rotterdam, The Netherlands; Nuffield Department of Population Health, University of Oxford, Oxford OX1 2JD, UK
Shahzad Ahmad Department of Epidemiology, Erasmus Medical Center, 3015 GD Rotterdam, The Netherlands
Cornelia van Duijn Department of Epidemiology, Erasmus Medical Center, 3015 GD Rotterdam, The Netherlands; Nuffield Department of Population Health, University of Oxford, Oxford OX1 2JD, UK
Ching-Ti Liu Programs in Metabolism and Medical & Population Genetics, Broad Institute of Harvard & Massachusetts Institute of Technology, Cambridge, MA 02142, USA
Mark O Goodarzi Division of Endocrinology, Diabetes and Metabolism, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
Jose C Florez Department of Medicine, Harvard Medical School, Boston, MA 02115, USA; Programs in Metabolism and Medical & Population Genetics, Broad Institute of Harvard & Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Center for Genomic Medicine and Diabetes Unit, Massachusetts General Hospital, Boston, MA 02114, USA
James B Meigs Department of Medicine, Harvard Medical School, Boston, MA 02115, USA; Programs in Metabolism and Medical & Population Genetics, Broad Institute of Harvard & Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
Jerome I Rotter The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
Stephen S Rich Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22903, USA
Josée Dupuis Department of Biostatistics, Boston University School of Public Health, Boston, MA 02215, USA
Aaron Leong Department of Medicine, Harvard Medical School, Boston, MA 02115, USA; Center for Genomic Medicine and Diabetes Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA 02114, USA

Hong C, Liang L, Yuan Q, Cho K, Liao KP, Pencina MJ, Christiani DC, Cai T. Semi-supervised calibration of noisy event risk (SCANER) with electronic health records. J Biomed Inform 2023;144:104425. [PMID: 37331495 PMCID: PMC10478159 DOI: 10.1016/j.jbi.2023.104425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2022] [Revised: 05/05/2023] [Accepted: 05/19/2023] [Indexed: 06/20/2023]

Abstract

OBJECTIVE

Electronic health records (EHR), containing detailed longitudinal clinical information on a large number of patients and covering broad patient populations, open opportunities for comprehensive predictive modeling of disease progression and treatment response. However, since EHRs were originally constructed for administrative purposes not for research, in the EHR-linked studies, it is often not feasible to capture reliable information for analytical variables, especially in the survival setting, when both accurate event status and event times are needed for model building. For example, progression-free survival (PFS), a commonly used survival outcome for cancer patients, often involves complex information embedded in free-text clinical notes and cannot be extracted reliably. Proxies of PFS time such as time to the first mention of progression in the notes are at best good approximations to the true event time. This leads to difficulty in efficiently estimating event rates for an EHR patient cohort. Estimating survival rates based on error-prone outcome definitions can lead to biased results and hamper the power in the downstream analysis. On the other hand, extracting accurate event time information via manual annotation is time and resource intensive. The objective of this study is to develop a calibrated survival rate estimator using noisy outcomes from EHR data.

MATERIALS AND METHODS

In this paper, we propose a two-stage semi-supervised calibration of noisy event rate (SCANER) estimator that can effectively overcome censoring induced dependency and attains more robust performance (i.e., not sensitive to misspecification of the imputation model) by fully utilizing both a small-labeled set of gold-standard survival outcomes annotated via manual chart review and a set of proxy features automatically captured via EHR in the unlabeled set. We validate the SCANER estimator by estimating the PFS rates for a virtual cohort of lung cancer patients from one large tertiary care center and the ICU-free survival rates for COVID patients from two large tertiary care centers.

RESULTS

In terms of survival rate estimates, the SCANER had very similar point estimates compared to the complete-case Kaplan Meier estimator. On the other hand, other benchmark methods for comparison, which fail to account for the induced dependency between event time and the censoring time conditioning on surrogate outcomes, produced biased results across all three case studies. In terms of standard errors, the SCANER estimator was more efficient than the KM estimator, with up to 50% efficiency gain.

CONCLUSION

The SCANER estimator achieves more efficient, robust, and accurate survival rate estimates compared to existing approaches. This promising new approach can also improve the resolution (i.e., granularity of event time) by using labels conditioning on multiple surrogates, particularly among less common or poorly coded conditions.
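
For orientation, the Kaplan-Meier (KM) estimator that the abstract above uses as its complete-case benchmark can be sketched in a few lines. This is not the SCANER estimator, whose details are not given here; it is only the standard product-limit estimator, written as a minimal illustration. The function name `kaplan_meier` and the toy data are assumptions made for this example.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each subject.
    events: 1 if the event was observed at that time, 0 if censored.
    Returns a list of (t, S(t)) pairs at each distinct observed event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all subjects sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        # Both events and censored subjects leave the risk set
        n_at_risk -= n_removed
    return curve


# Toy data: five subjects, two of them censored (at t=2 and t=5)
print(kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0]))
```

Censored subjects contribute to the risk set up to their censoring time but never reduce the survival estimate directly, which is why restricting to a small labeled set costs efficiency relative to methods that also use proxy information.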

Yin Y. Prediction and analysis of time series data based on granular computing. Front Comput Neurosci 2023;17:1192876. [PMID: 37576071 PMCID: PMC10413556 DOI: 10.3389/fncom.2023.1192876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2023] [Accepted: 07/06/2023] [Indexed: 08/15/2023] Open

Berloco F, Ciavarella S, Colucci S, Grieco LA, Guarini A, Zaccaria GM. ARGO 2.0: a Hybrid NLP/ML Framework for Diagnosis Standardization. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2023;2023:1-4. [PMID: 38083100 DOI: 10.1109/embc40787.2023.10340022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/18/2023]

Penrod N, Okeh C, Velez Edwards DR, Barnhart K, Senapati S, Verma SS. Leveraging electronic health record data for endometriosis research. Front Digit Health 2023;5:1150687. [PMID: 37342866 PMCID: PMC10278662 DOI: 10.3389/fdgth.2023.1150687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2023] [Accepted: 05/10/2023] [Indexed: 06/23/2023] Open

Vassy JL, Posner DC, Ho YL, Gagnon DR, Galloway A, Tanukonda V, Houghton SC, Madduri RK, McMahon BH, Tsao PS, Damrauer SM, O’Donnell CJ, Assimes TL, Casas JP, Gaziano JM, Pencina MJ, Sun YV, Cho K, Wilson PW. Cardiovascular Disease Risk Assessment Using Traditional Risk Factors and Polygenic Risk Scores in the Million Veteran Program. JAMA Cardiol 2023;8:564-574. [PMID: 37133828 PMCID: PMC10157509 DOI: 10.1001/jamacardio.2023.0857] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/02/2022] [Accepted: 03/09/2023] [Indexed: 05/04/2023]

Abstract

Importance

Primary prevention of atherosclerotic cardiovascular disease (ASCVD) relies on risk stratification. Genome-wide polygenic risk scores (PRSs) are proposed to improve ASCVD risk estimation.

Objective

To determine whether genome-wide PRSs for coronary artery disease (CAD) and acute ischemic stroke improve ASCVD risk estimation with traditional clinical risk factors in an ancestrally diverse midlife population.

Design, Setting, and Participants

This was a prognostic analysis of incident events in a retrospectively defined longitudinal cohort conducted from January 1, 2011, to December 31, 2018. Included in the study were adults free of ASCVD and statin naive at baseline from the Million Veteran Program (MVP), a mega biobank with genetic, survey, and electronic health record data from a large US health care system. Data were analyzed from March 15, 2021, to January 5, 2023.

Exposures

PRSs for CAD and ischemic stroke derived from cohorts of largely European descent and risk factors, including age, sex, systolic blood pressure, total cholesterol, high-density lipoprotein (HDL) cholesterol, smoking, and diabetes status.

Main Outcomes and Measures

Incident nonfatal myocardial infarction (MI), ischemic stroke, ASCVD death, and composite ASCVD events.

Results

A total of 79 151 participants (mean [SD] age, 57.8 [13.7] years; 68 503 male [86.5%]) were included in the study. The cohort included participants from the following harmonized genetic ancestry and race and ethnicity categories: 18 505 non-Hispanic Black (23.4%), 6785 Hispanic (8.6%), and 53 861 non-Hispanic White (68.0%) with a median (5th-95th percentile) follow-up of 4.3 (0.7-6.9) years. From 2011 to 2018, 3186 MIs (4.0%), 1933 ischemic strokes (2.4%), 867 ASCVD deaths (1.1%), and 5485 composite ASCVD events (6.9%) were observed. CAD PRS was associated with incident MI in non-Hispanic Black (hazard ratio [HR], 1.10; 95% CI, 1.02-1.19), Hispanic (HR, 1.26; 95% CI, 1.09-1.46), and non-Hispanic White (HR, 1.23; 95% CI, 1.18-1.29) participants. Stroke PRS was associated with incident stroke in non-Hispanic White participants (HR, 1.15; 95% CI, 1.08-1.21). A combined CAD plus stroke PRS was associated with ASCVD deaths among non-Hispanic Black (HR, 1.19; 95% CI, 1.03-1.17) and non-Hispanic White (HR, 1.11; 95% CI, 1.03-1.21) participants. The combined PRS was also associated with composite ASCVD across all ancestry groups, but the association was greater among non-Hispanic White (HR, 1.20; 95% CI, 1.16-1.24) than non-Hispanic Black (HR, 1.11; 95% CI, 1.05-1.17) and Hispanic (HR, 1.12; 95% CI, 1.00-1.25) participants. Net reclassification improvement from adding PRS to a traditional risk model was modest for the intermediate risk group for composite CVD among men (5-year risk >3.75%, 0.38%; 95% CI, 0.07%-0.68%), among women (6.79%; 95% CI, 3.01%-10.58%), for age older than 55 years (0.25%; 95% CI, 0.03%-0.47%), and for ages 40 to 55 years (1.61%; 95% CI, -0.07% to 3.30%).

Conclusions and Relevance

Study results suggest that PRSs derived predominantly in European samples were statistically significantly associated with ASCVD in the multiancestry midlife and older-age MVP cohort. Overall, modest improvements in discrimination metrics were observed with the addition of PRSs to traditional risk factors, with greater magnitude in women and younger age groups.
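
The net reclassification improvement reported in the abstract above can be illustrated with the standard categorical definition: the net proportion of events moved to a higher risk category plus the net proportion of non-events moved to a lower one. The sketch below assumes integer-coded risk categories and is not the study's analysis code. The function name `categorical_nri` and the toy inputs are hypothetical.

```python
def categorical_nri(old_cat, new_cat, event):
    """Categorical net reclassification improvement (NRI).

    old_cat, new_cat: integer risk categories under the old and new models
                      (higher number = higher predicted risk).
    event: 1 if the subject had the outcome, 0 otherwise.
    """
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for old, new, had_event in zip(old_cat, new_cat, event):
        if had_event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one non-event")
    nri_events = (up_e - down_e) / n_e        # correct moves are upward
    nri_nonevents = (down_n - up_n) / n_n     # correct moves are downward
    return nri_events + nri_nonevents


# Toy example: 3 events and 3 non-events
old = [0, 0, 1, 1, 0, 1]
new = [1, 0, 1, 0, 0, 0]
evt = [1, 1, 1, 0, 0, 0]
print(categorical_nri(old, new, evt))
```

Because the two components are summed, the NRI ranges from -2 to 2 and is not itself a proportion of subjects, which is worth keeping in mind when reading percentages such as those in the abstract.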

Affiliation(s)

Jason L. Vassy Veterans Affairs Boston Healthcare System, Boston, Massachusetts; Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, Massachusetts
Daniel C. Posner Veterans Affairs Boston Healthcare System, Boston, Massachusetts
Yuk-Lam Ho Veterans Affairs Boston Healthcare System, Boston, Massachusetts
David R. Gagnon Veterans Affairs Boston Healthcare System, Boston, Massachusetts; Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts
Ashley Galloway Veterans Affairs Boston Healthcare System, Boston, Massachusetts
Vidisha Tanukonda Veterans Affairs Atlanta Healthcare System, Decatur, Georgia
Serena C. Houghton Veterans Affairs Boston Healthcare System, Boston, Massachusetts
Ravi K. Madduri Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois; University of Chicago Consortium for Advanced Science and Engineering, The University of Chicago, Chicago, Illinois
Benjamin H. McMahon Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, New Mexico
Philip S. Tsao Palo Alto VA Healthcare System, Palo Alto, California; Stanford Cardiovascular Institute, Stanford University, Stanford, California
Scott M. Damrauer Corporal Michael J. Crescenz VA Medical Center, Philadelphia, Pennsylvania; Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia
Christopher J. O’Donnell Veterans Affairs Boston Healthcare System, Boston, Massachusetts
Themistocles L. Assimes Palo Alto VA Healthcare System, Palo Alto, California; Division of Cardiovascular Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, California; Stanford Cardiovascular Institute, Stanford University, Stanford, California
Juan P. Casas Veterans Affairs Boston Healthcare System, Boston, Massachusetts; Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, Massachusetts
J. Michael Gaziano Veterans Affairs Boston Healthcare System, Boston, Massachusetts; Division of Aging, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
Michael J. Pencina Department of Biostatistics, Duke University Medical Center, Durham, North Carolina
Yan V. Sun Veterans Affairs Atlanta Healthcare System, Decatur, Georgia; Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia
Kelly Cho Veterans Affairs Boston Healthcare System, Boston, Massachusetts; Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, Massachusetts
Peter W.F. Wilson Veterans Affairs Atlanta Healthcare System, Decatur, Georgia; Division of Cardiology, Emory University School of Medicine, Atlanta, Georgia; Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia

Huang S, Cai T, Weber BN, He Z, Dahal KP, Hong C, Hou J, Seyok T, Cagan A, DiCarli MF, Joseph J, Kim SC, Solomon DH, Cai T, Liao KP. Association Between Inflammation, Incident Heart Failure, and Heart Failure Subtypes in Patients With Rheumatoid Arthritis. Arthritis Care Res (Hoboken) 2023;75:1036-1045. [PMID: 34623035 PMCID: PMC8989720 DOI: 10.1002/acr.24804] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2021] [Revised: 09/27/2021] [Accepted: 10/05/2021] [Indexed: 12/14/2022]

Abstract

OBJECTIVE

In rheumatoid arthritis (RA), there are limited data on risk factors for the clinical heart failure (HF) subtypes of HF with reduced ejection fraction (HFrEF) and HF with preserved ejection fraction (HFpEF). This study examined the association between inflammation and incident HF subtypes in RA. Because inflammation changes over time with disease activity, we hypothesized that the effect of inflammation may be stronger at the 5-year follow-up than at the standard 10-year follow-up from general population studies of cardiovascular risk.

METHODS

We studied an electronic health record (EHR)-based RA cohort with data pre- and post-RA incidence. We applied a validated approach to identify HF and extract ejection fraction to classify HFrEF and HFpEF. Follow-up started from the RA incidence date (index date) to the earliest occurrence of incident HF, death, last EHR encounter, or 10 years. Baseline inflammation was assessed using erythrocyte sedimentation rate or C-reactive protein values. Covariates included demographic characteristics, established HF risk factors, and RA-related factors. We tested the association between baseline inflammation with incident HF and its subtypes using Cox proportional hazards models.

RESULTS

We studied 9,087 patients with RA; 8.2% developed HF during 10 years of follow-up. Elevated inflammation was associated with increased risk for HF at both 5- and 10-year follow-ups (hazard ratio [HR] 1.66, 95% confidence interval [95% CI] 1.12-2.46 and HR 1.46, 95% CI 1.13-1.90, respectively), which is also seen for HFpEF at 5 years (HR 1.72, 95% CI 1.09-2.70) and 10 years (HR 1.45, 95% CI 1.07-1.94). HFrEF was not associated with inflammation for either follow-up time.

CONCLUSION

Elevated inflammation early in RA diagnosis was associated with HF; this association was driven by HFpEF and not HFrEF, suggesting a window of opportunity for prevention of HFpEF in RA.
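
The hazard ratios and confidence intervals reported above can be sanity-checked with a short calculation. The sketch below assumes the intervals are Wald intervals computed on the log-hazard scale (the usual output of a Cox model, though the abstract does not say so); the function name `log_scale_summary` and the multiplier z = 1.96 are illustrative choices, not taken from the article.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Recover the implied point estimate and log-scale standard error
    from a ratio's confidence limits, assuming a symmetric Wald interval
    on the log scale (an assumption, not stated in the abstract)."""
    log_mid = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return math.exp(log_mid), se


# Confidence limits as reported in the abstract for HF at 5 and 10 years.
hr_5y, se_5y = log_scale_summary(1.12, 2.46)
hr_10y, se_10y = log_scale_summary(1.13, 1.90)
print(round(hr_5y, 2), round(se_5y, 3))
print(round(hr_10y, 2), round(se_10y, 3))
```

Under that assumption, the geometric midpoint of each interval should land close to the reported hazard ratio, which is a quick way to catch transcription errors in a copied estimate.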

Affiliation(s)

Sicong Huang Brigham and Women’s Hospital and Harvard Medical School Division of Rheumatology, Inflammation, and Immunity Section of Rheumatology Veterans Administration Boston Healthcare System
Tianrun Cai Brigham and Women’s Hospital and Harvard Medical School Division of Rheumatology, Inflammation, and Immunity Veterans Administration Boston Healthcare System
Brittany N. Weber Brigham and Women’s Hospital and Harvard Medical School Cardiovascular Division
Zeling He Brigham and Women’s Hospital and Harvard Medical School Division of Rheumatology, Inflammation, and Immunity
Kumar P. Dahal Brigham and Women’s Hospital and Harvard Medical School Division of Rheumatology, Inflammation, and Immunity Veterans Administration Boston Healthcare System
Chuan Hong Veterans Administration Boston Healthcare System Department of Biomedical Informatics, Harvard Medical School Biostatistics, Harvard T.H. Chan School of Public Health
Jue Hou Veterans Administration Boston Healthcare System Biostatistics, Harvard T.H. Chan School of Public Health
Thany Seyok Brigham and Women’s Hospital and Harvard Medical School Division of Rheumatology, Inflammation, and Immunity
Andrew Cagan Brigham and Women’s Hospital and Harvard Medical School Research Information Science and Computing, Mass General Brigham
Marcelo F. DiCarli Brigham and Women’s Hospital and Harvard Medical School Cardiovascular Division
Jacob Joseph Brigham and Women’s Hospital and Harvard Medical School Veterans Administration Boston Healthcare System Cardiovascular Division
Seoyoung C. Kim Brigham and Women’s Hospital and Harvard Medical School Division of Rheumatology, Inflammation, and Immunity Division of Pharmacoepidemiology and Pharmacoeconomics
Daniel H. Solomon Brigham and Women’s Hospital and Harvard Medical School Division of Rheumatology, Inflammation, and Immunity
Tianxi Cai Veterans Administration Boston Healthcare System Department of Biomedical Informatics, Harvard Medical School Biostatistics, Harvard T.H. Chan School of Public Health
Katherine P. Liao Brigham and Women’s Hospital and Harvard Medical School Division of Rheumatology, Inflammation, and Immunity Section of Rheumatology Veterans Administration Boston Healthcare System Department of Biomedical Informatics, Harvard Medical School

Zhang HG, Honerlaw JP, Maripuri M, Samayamuthu MJ, Beaulieu-Jones BR, Baig HS, L'Yi S, Ho YL, Morris M, Panickan VA, Wang X, Weber GM, Liao KP, Visweswaran S, Tan BWQ, Yuan W, Gehlenborg N, Muralidhar S, Ramoni RB, Kohane IS, Xia Z, Cho K, Cai T, Brat GA. Potential pitfalls in the use of real-world data for studying long COVID. Nat Med 2023;29:1040-1043. [PMID: 37055567 PMCID: PMC10205658 DOI: 10.1038/s41591-023-02274-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 04/15/2023]

Affiliation(s)

Harrison G Zhang Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
Jacqueline P Honerlaw Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
Monika Maripuri Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
Malarkodi Jebathilagam Samayamuthu Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
Brendin R Beaulieu-Jones Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Huma S Baig Department of Surgery, Beth Israel Deaconess Medical Center, Boston, MA, USA
Sehi L'Yi Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Yuk-Lam Ho Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
Michele Morris Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
Vidul Ayakulangara Panickan Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Xuan Wang Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Griffin M Weber Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Katherine P Liao Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
Shyam Visweswaran Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
Bryce W Q Tan Department of Medicine, National University Hospital, Singapore, Singapore, Singapore
William Yuan Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Nils Gehlenborg Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Sumitra Muralidhar Office of Research and Development, US Department of Veterans Affairs, Washington DC, USA
Rachel B Ramoni Office of Research and Development, US Department of Veterans Affairs, Washington DC, USA
Isaac S Kohane Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Zongqi Xia Department of Neurology, University of Pittsburgh, Pittsburgh, PA, USA
Kelly Cho Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
Tianxi Cai Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
Gabriel A Brat Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

He T, Belouali A, Patricoski J, Lehmann H, Ball R, Anagnostou V, Kreimeyer K, Botsis T. Trends and opportunities in computable clinical phenotyping: A scoping review. J Biomed Inform 2023;140:104335. [PMID: 36933631 DOI: 10.1016/j.jbi.2023.104335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 03/07/2023] [Accepted: 03/09/2023] [Indexed: 03/18/2023]

Abstract

Identifying patient cohorts meeting the criteria of specific phenotypes is essential in biomedicine and particularly timely in precision medicine. Many research groups deliver pipelines that automatically retrieve and analyze data elements from one or more sources to automate this task and deliver high-performing computable phenotypes. We applied a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines to conduct a thorough scoping review on computable clinical phenotyping. Five databases were searched using a query that combined the concepts of automation, clinical context, and phenotyping. Subsequently, four reviewers screened 7960 records (after removing over 4000 duplicates) and selected 139 that satisfied the inclusion criteria. This dataset was analyzed to extract information on target use cases, data-related topics, phenotyping methodologies, evaluation strategies, and portability of developed solutions. Most studies supported patient cohort selection without discussing the application to specific use cases, such as precision medicine. Electronic Health Records were the primary source in 87.1% (N = 121) of all studies, and International Classification of Diseases codes were heavily used in 55.4% (N = 77) of all studies; however, only 25.9% (N = 36) of the records described compliance with a common data model. In terms of the presented methods, traditional Machine Learning (ML) was the dominant method, often combined with natural language processing and other approaches, while external validation and portability of computable phenotypes were pursued in many cases. These findings revealed that defining target use cases precisely, moving away from sole ML strategies, and evaluating the proposed solutions in the real setting are essential opportunities for future work. There is also momentum and an emerging need for computable phenotyping to support clinical and epidemiological research and precision medicine.

James KN, Phadke S, Wong TC, Chowdhury S. Artificial Intelligence in the Genetic Diagnosis of Rare Disease. Clin Lab Med 2023;43:127-143. [PMID: 36764805 DOI: 10.1016/j.cll.2022.09.023] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/11/2023]

Wan NC, Yaqoob AA, Ong HH, Zhao J, Wei WQ. Evaluating resources composing the PheMAP knowledge base to enhance high-throughput phenotyping. J Am Med Inform Assoc 2023;30:456-465. [PMID: 36451277 PMCID: PMC9933070 DOI: 10.1093/jamia/ocac234] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/24/2022] [Revised: 10/28/2022] [Accepted: 11/23/2022] [Indexed: 12/02/2022]

Abstract

OBJECTIVE

A previous study, PheMAP, combined independent, online resources to enable high-throughput phenotyping (HTP) using electronic health records (EHRs). However, online resources offer distinct quality descriptions of diseases which may affect phenotyping performance. We aimed to evaluate the phenotyping performance of single resource-based PheMAPs and investigate an optimized strategy for HTP.

MATERIALS AND METHODS

We compared how each resource produced top-ranked concept unique identifiers (CUIs) by term frequency-inverse document frequency with Jaccard matrices comparing single resources and the original PheMAP. We correlated top-ranked concepts from each resource to features used in established Phenotype KnowledgeBase (PheKB) algorithms for hypothyroidism, type II diabetes mellitus (T2DM), and dementias. Using resources separately, we calculated multiple phenotype risk scores for individuals from Vanderbilt University Medical Center's BioVU DNA Biobank and compared phenotyping performance against rule-based eMERGE algorithms. Lastly, we implemented an ensemble strategy which classified patient case/control status based upon PheMAP resource agreement.

RESULTS

Jaccard similarity matrices indicate that the similarity of CUIs comprising single resource-based PheMAPs varies. Single resource-based PheMAPs generated from MedlinePlus and MedicineNet outperformed others but only encompass 81.6% of overall disease phenotypes. We propose the PheMAP-Ensemble which provides higher average accuracy and precision than the combined average accuracy and precision of single resource-based PheMAPs. While offering complete phenotype coverage, PheMAP-Ensemble significantly increases phenotyping recall compared to the original iteration.

CONCLUSIONS

Resources comprising the PheMAP produce different phenotyping performance when implemented individually. The ensemble method significantly improves the quality of PheMAP by fully utilizing dissimilar resources to capture accurate phenotyping data from EHRs.
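
The Jaccard comparison of top-ranked concepts described in the methods above can be illustrated with a minimal sketch. The resource names and CUI strings below are hypothetical placeholders, not data from the study, and the helper names `jaccard` and `jaccard_matrix` are illustrative; the study's actual ranking step (term frequency-inverse document frequency) is not reproduced here.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(union)


def jaccard_matrix(resources):
    """Pairwise Jaccard similarities between the concept sets of named resources."""
    names = sorted(resources)
    return {x: {y: jaccard(resources[x], resources[y]) for y in names} for x in names}


# Hypothetical top-ranked CUIs per resource (placeholder identifiers, not real data).
top_cuis = {
    "resource_a": ["C0000001", "C0000002", "C0000003"],
    "resource_b": ["C0000002", "C0000003", "C0000004"],
    "resource_c": ["C0000005"],
}
matrix = jaccard_matrix(top_cuis)
print(matrix["resource_a"]["resource_b"])  # 2 shared concepts out of 4 distinct: 0.5
```

A matrix like this makes it easy to see which resources contribute overlapping concepts and which contribute distinct ones, which is the kind of dissimilarity an ensemble strategy could exploit.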

Yang S, Varghese P, Stephenson E, Tu K, Gronsbell J. Machine learning approaches for electronic health records phenotyping: a methodical review. J Am Med Inform Assoc 2023;30:367-381. [PMID: 36413056 PMCID: PMC9846699 DOI: 10.1093/jamia/ocac216] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 07/28/2022] [Revised: 09/27/2022] [Accepted: 10/27/2022] [Indexed: 11/23/2022]

Abstract

OBJECTIVE

Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used.

MATERIALS AND METHODS

We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies.

RESULTS

Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions.

DISCUSSION

Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released.

CONCLUSION

Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.

Carrell DS, Gruber S, Floyd JS, Bann MA, Cushing-Haugen KL, Johnson RL, Graham V, Cronkite DJ, Hazlehurst BL, Felcher AH, Bejan CA, Kennedy A, Shinde MU, Karami S, Ma Y, Stojanovic D, Zhao Y, Ball R, Nelson JC. Improving Methods of Identifying Anaphylaxis for Medical Product Safety Surveillance Using Natural Language Processing and Machine Learning. Am J Epidemiol 2022;192:283-295. [PMID: 36331289 PMCID: PMC9896464 DOI: 10.1093/aje/kwac182] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/11/2021] [Revised: 07/06/2022] [Accepted: 10/11/2022] [Indexed: 11/06/2022]

Nelson AE, Arbeeva L. Narrative Review of Machine Learning in Rheumatic and Musculoskeletal Diseases for Clinicians and Researchers: Biases, Goals, and Future Directions. J Rheumatol 2022;49:1191-1200. [PMID: 35840150 PMCID: PMC9633365 DOI: 10.3899/jrheum.220326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/21/2022] [Indexed: 11/22/2022]

Hamamoto R, Koyama T, Kouno N, Yasuda T, Yui S, Sudo K, Hirata M, Sunami K, Kubo T, Takasawa K, Takahashi S, Machino H, Kobayashi K, Asada K, Komatsu M, Kaneko S, Yatabe Y, Yamamoto N. Introducing AI to the molecular tumor board: one direction toward the establishment of precision medicine using large-scale cancer clinical and biological information. Exp Hematol Oncol 2022;11:82. [PMID: 36316731 PMCID: PMC9620610 DOI: 10.1186/s40164-022-00333-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 10/05/2022] [Indexed: 11/10/2022]

Affiliation(s)

Ryuji Hamamoto Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
Takafumi Koyama Department of Experimental Therapeutics, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
Nobuji Kouno Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Department of Surgery, Graduate School of Medicine, Kyoto University, Yoshida-konoe-cho, Sakyo-ku, Kyoto, 606-8303 Japan
Tomohiro Yasuda Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Research and Development Group, Hitachi, Ltd., 1-280 Higashi-koigakubo, Kokubunji, Tokyo, 185-8601 Japan
Shuntaro Yui Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Research and Development Group, Hitachi, Ltd., 1-280 Higashi-koigakubo, Kokubunji, Tokyo, 185-8601 Japan
Kazuki Sudo Department of Experimental Therapeutics, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Department of Medical Oncology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
Makoto Hirata Department of Genetic Medicine and Services, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
Kuniko Sunami Department of Laboratory Medicine, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
Takashi Kubo Department of Laboratory Medicine, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
Ken Takasawa Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
Satoshi Takahashi Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
Hidenori Machino Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
Kazuma Kobayashi Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
Ken Asada Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
Masaaki Komatsu Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
Syuzo Kaneko Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
Yasushi Yatabe Department of Diagnostic Pathology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Division of Molecular Pathology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
Noboru Yamamoto Department of Experimental Therapeutics, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan

Nogues IE, Wen J, Lin Y, Liu M, Tedeschi SK, Geva A, Cai T, Hong C. Weakly Semi-supervised phenotyping using Electronic Health records. J Biomed Inform 2022;134:104175. [PMID: 36064111 PMCID: PMC10112494 DOI: 10.1016/j.jbi.2022.104175] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/07/2021] [Revised: 04/23/2022] [Accepted: 08/15/2022] [Indexed: 01/07/2023]

Abstract

OBJECTIVE

Electronic Health Record (EHR) based phenotyping is a crucial yet challenging problem in the biomedical field. Though clinicians typically determine patient-level diagnoses via manual chart review, the sheer volume and heterogeneity of EHR data renders such tasks challenging, time-consuming, and prohibitively expensive, thus leading to a scarcity of clinical annotations in EHRs. Weakly supervised learning algorithms have been successfully applied to various EHR phenotyping problems, due to their ability to leverage information from large quantities of unlabeled samples to better inform predictions based on a far smaller number of patients. However, most weakly supervised methods are subject to the challenge to choose the right cutoff value to generate an optimal classifier. Furthermore, since they only utilize the most informative features (i.e., main ICD and NLP counts) they may fail for episodic phenotypes that cannot be consistently detected via ICD and NLP data. In this paper, we propose a label-efficient, weakly semi-supervised deep learning algorithm for EHR phenotyping (WSS-DL), which overcomes the limitations above.

MATERIALS AND METHODS

WSS-DL classifies patient-level disease status through a series of learning stages: 1) generating silver standard labels, 2) deriving enhanced-silver-standard labels by fitting a weakly supervised deep learning model to data with silver standard labels as outcomes and high dimensional EHR features as input, and 3) obtaining the final prediction score and classifier by fitting a supervised learning model to data with a minimal number of gold standard labels as the outcome, and the enhanced-silver-standard labels and a minimal set of most informative EHR features as input. To assess the generalizability of WSS-DL across different phenotypes and medical institutions, we apply WSS-DL to classify a total of 17 diseases, including both acute and chronic conditions, using EHR data from three healthcare systems. Additionally, we determine the minimum quantity of training labels required by WSS-DL to outperform existing supervised and semi-supervised phenotyping methods.

RESULTS

The proposed method, in combining the strengths of deep learning and weakly semi-supervised learning, successfully leverages the crucial phenotyping information contained in EHR features from unlabeled samples. Indeed, the deep learning model's ability to handle high-dimensional EHR features allows it to generate strong phenotype status predictions from silver standard labels. These predictions, in turn, provide highly effective features in the final logistic regression stage, leading to high phenotyping accuracy in notably small subsets of labeled data (e.g. n = 40 labeled samples).

CONCLUSION

Our method's high performance in EHR datasets with very small numbers of labels indicates its potential value in aiding doctors to diagnose rare diseases as well as conditions susceptible to misdiagnosis.

Ferolito B, do Valle IF, Gerlovin H, Costa L, Casas JP, Gaziano JM, Gagnon DR, Begoli E, Barabási AL, Cho K. Visualizing novel connections and genetic similarities across diseases using a network-medicine based approach. Sci Rep 2022;12:14914. [PMID: 36050444 PMCID: PMC9436158 DOI: 10.1038/s41598-022-19244-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Accepted: 08/26/2022] [Indexed: 11/08/2022]

Affiliation(s)

Brian Ferolito VA Boston Healthcare System, Massachusetts Veterans Epidemiology and Research Information Center, (MAVERIC), 150 S. Huntington Avenue, Boston, 02130, USA.
Italo Faria do Valle VA Boston Healthcare System, Massachusetts Veterans Epidemiology and Research Information Center, (MAVERIC), 150 S. Huntington Avenue, Boston, 02130, USA Center for Complex Network Research, Department of Physics, Northeastern University, Boston, 02115, USA
Hanna Gerlovin VA Boston Healthcare System, Massachusetts Veterans Epidemiology and Research Information Center, (MAVERIC), 150 S. Huntington Avenue, Boston, 02130, USA
Lauren Costa VA Boston Healthcare System, Massachusetts Veterans Epidemiology and Research Information Center, (MAVERIC), 150 S. Huntington Avenue, Boston, 02130, USA
Juan P Casas VA Boston Healthcare System, Massachusetts Veterans Epidemiology and Research Information Center, (MAVERIC), 150 S. Huntington Avenue, Boston, 02130, USA Brigham and Women's Hospital, Division of Aging, Department of Medicine, Harvard Medical School, Boston, 02115, USA
J Michael Gaziano VA Boston Healthcare System, Massachusetts Veterans Epidemiology and Research Information Center, (MAVERIC), 150 S. Huntington Avenue, Boston, 02130, USA Brigham and Women's Hospital, Division of Aging, Department of Medicine, Harvard Medical School, Boston, 02115, USA
David R Gagnon VA Boston Healthcare System, Massachusetts Veterans Epidemiology and Research Information Center, (MAVERIC), 150 S. Huntington Avenue, Boston, 02130, USA School of Public Health, Department of Biostatistics, Boston University, Boston, 02215, USA
Edmon Begoli Oak Ridge National Laboratory, Oak Ridge, 37830, USA
Albert-László Barabási Center for Complex Network Research, Department of Physics, Northeastern University, Boston, 02115, USA
Kelly Cho VA Boston Healthcare System, Massachusetts Veterans Epidemiology and Research Information Center, (MAVERIC), 150 S. Huntington Avenue, Boston, 02130, USA Brigham and Women's Hospital, Division of Aging, Department of Medicine, Harvard Medical School, Boston, 02115, USA

Noori A, Magdamo C, Liu X, Tyagi T, Li Z, Kondepudi A, Alabsi H, Rudmann E, Wilcox D, Brenner L, Robbins GK, Moura L, Zafar S, Benson NM, Hsu J, R Dickson J, Serrano-Pozo A, Hyman BT, Blacker D, Westover MB, Mukerji SS, Das S. Development and Evaluation of a Natural Language Processing Annotation Tool to Facilitate Phenotyping of Cognitive Status in Electronic Health Records: Diagnostic Study. J Med Internet Res 2022;24:e40384. [PMID: 36040790 PMCID: PMC9472045 DOI: 10.2196/40384] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/17/2022] [Revised: 07/29/2022] [Accepted: 07/31/2022] [Indexed: 11/13/2022]

Abstract

BACKGROUND

Electronic health records (EHRs) with large sample sizes and rich information offer great potential for dementia research, but current methods of phenotyping cognitive status are not scalable.

OBJECTIVE

The aim of this study was to evaluate whether natural language processing (NLP)-powered semiautomated annotation can improve the speed and interrater reliability of chart reviews for phenotyping cognitive status.

METHODS

In this diagnostic study, we developed and evaluated a semiautomated NLP-powered annotation tool (NAT) to facilitate phenotyping of cognitive status. Clinical experts adjudicated the cognitive status of 627 patients at Mass General Brigham (MGB) health care, using NAT or traditional chart reviews. Patient charts contained EHR data from two data sets: (1) records from January 1, 2017, to December 31, 2018, for 100 Medicare beneficiaries from the MGB Accountable Care Organization and (2) records from 2 years prior to COVID-19 diagnosis to the date of COVID-19 diagnosis for 527 MGB patients. All EHR data from the relevant period were extracted; diagnosis codes, medications, and laboratory test values were processed and summarized; clinical notes were processed through an NLP pipeline; and a web tool was developed to present an integrated view of all data. Cognitive status was rated as cognitively normal, cognitively impaired, or undetermined. Assessment time and interrater agreement of NAT compared to manual chart reviews for cognitive status phenotyping was evaluated.

RESULTS

NAT adjudication provided higher interrater agreement (Cohen κ=0.89 vs κ=0.80) and significant speed up (time difference mean 1.4, SD 1.3 minutes; P<.001; ratio median 2.2, min-max 0.4-20) over manual chart reviews. There was moderate agreement with manual chart reviews (Cohen κ=0.67). In the cases that exhibited disagreement with manual chart reviews, NAT adjudication was able to produce assessments that had broader clinical consensus due to its integrated view of highlighted relevant information and semiautomated NLP features.

CONCLUSIONS

NAT adjudication improves the speed and interrater reliability for phenotyping cognitive status compared to manual chart reviews. This study underscores the potential of an NLP-based clinically adjudicated method to build large-scale dementia research cohorts from EHRs.
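
The interrater agreement statistic reported above, Cohen κ, compares observed agreement with the agreement expected by chance from each rater's label frequencies. The sketch below is a minimal, unweighted implementation; the example ratings are invented for illustration and are not data from the study, and `cohen_kappa` is a hypothetical helper name.

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters labeling the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater1)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from the product of each rater's marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[label] * c2[label] for label in c1) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented ratings for four charts (illustration only).
r1 = ["normal", "normal", "impaired", "impaired"]
r2 = ["normal", "normal", "impaired", "normal"]
print(cohen_kappa(r1, r2))  # observed 0.75, expected 0.5, kappa 0.5
```

Because κ discounts chance agreement, two raters who agree on 75% of items can still score only 0.5 when the labels are evenly split, which is why κ is reported rather than raw percent agreement.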

Affiliation(s)

Ayush Noori Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
Colin Magdamo Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
Xiao Liu Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
Tanish Tyagi Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
Zhaozhi Li Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
Akhil Kondepudi Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
Haitham Alabsi Department of Neurology, Massachusetts General Hospital, Boston, MA, United States Harvard Medical School, Boston, MA, United States
Emily Rudmann Department of Neurology, Massachusetts General Hospital, Boston, MA, United States Vaccine and Immunotherapy Center, Division of Infectious Disease, Boston, MA, United States
Douglas Wilcox Department of Neurology, Massachusetts General Hospital, Boston, MA, United States Harvard Medical School, Boston, MA, United States
Laura Brenner Harvard Medical School, Boston, MA, United States Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, MA, United States
Gregory K Robbins Harvard Medical School, Boston, MA, United States Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA, United States
Lidia Moura Department of Neurology, Massachusetts General Hospital, Boston, MA, United States Harvard Medical School, Boston, MA, United States
Sahar Zafar Department of Neurology, Massachusetts General Hospital, Boston, MA, United States Harvard Medical School, Boston, MA, United States
Nicole M Benson Harvard Medical School, Boston, MA, United States Mongan Institute, Massachusetts General Hospital, Boston, MA, United States McLean Hospital, Belmont, MA, United States
John Hsu Harvard Medical School, Boston, MA, United States Mongan Institute, Massachusetts General Hospital, Boston, MA, United States
John R Dickson Department of Neurology, Massachusetts General Hospital, Boston, MA, United States Harvard Medical School, Boston, MA, United States
Alberto Serrano-Pozo Department of Neurology, Massachusetts General Hospital, Boston, MA, United States Harvard Medical School, Boston, MA, United States
Bradley T Hyman Department of Neurology, Massachusetts General Hospital, Boston, MA, United States Harvard Medical School, Boston, MA, United States
Deborah Blacker Harvard Medical School, Boston, MA, United States Department of Psychiatry, Massachusetts General Hospital, Boston, MA, United States
M Brandon Westover Department of Neurology, Massachusetts General Hospital, Boston, MA, United States Harvard Medical School, Boston, MA, United States
Shibani S Mukerji Department of Neurology, Massachusetts General Hospital, Boston, MA, United States Harvard Medical School, Boston, MA, United States Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA, United States
Sudeshna Das Department of Neurology, Massachusetts General Hospital, Boston, MA, United States Harvard Medical School, Boston, MA, United States

Brandt PS, Pacheco JA, Adekkanattu P, Sholle ET, Abedian S, Stone DJ, Knaack DM, Xu J, Xu Z, Peng Y, Benda NC, Wang F, Luo Y, Jiang G, Pathak J, Rasmussen LV. Design and validation of a FHIR-based EHR-driven phenotyping toolbox. J Am Med Inform Assoc 2022;29:1449-1460. [PMID: 35799370 PMCID: PMC9382394 DOI: 10.1093/jamia/ocac063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/27/2021] [Revised: 04/04/2022] [Accepted: 06/17/2022] [Indexed: 12/14/2022] Open

Krantz MS, Kerchberger VE, Wei WQ. Novel Analysis Methods to Mine Immune-Mediated Phenotypes and Find Genetic Variation Within the Electronic Health Record (Roadmap for Phenotype to Genotype: Immunogenomics). J Allergy Clin Immunol Pract 2022;10:1757-1762. [PMID: 35487368 PMCID: PMC9624141 DOI: 10.1016/j.jaip.2022.04.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/27/2022] [Revised: 04/13/2022] [Accepted: 04/18/2022] [Indexed: 06/14/2023]

Liang L, Hou J, Uno H, Cho K, Ma Y, Cai T. Semi-supervised approach to event time annotation using longitudinal electronic health records. Lifetime Data Anal 2022;28:428-491. [PMID: 35753014 PMCID: PMC10044535 DOI: 10.1007/s10985-022-09557-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2021] [Accepted: 05/13/2022] [Indexed: 06/15/2023]

Ge T, Irvin MR, Patki A, Srinivasasainagendra V, Lin YF, Tiwari HK, Armstrong ND, Benoit B, Chen CY, Choi KW, Cimino JJ, Davis BH, Dikilitas O, Etheridge B, Feng YCA, Gainer V, Huang H, Jarvik GP, Kachulis C, Kenny EE, Khan A, Kiryluk K, Kottyan L, Kullo IJ, Lange C, Lennon N, Leong A, Malolepsza E, Miles AD, Murphy S, Namjou B, Narayan R, O'Connor MJ, Pacheco JA, Perez E, Rasmussen-Torvik LJ, Rosenthal EA, Schaid D, Stamou M, Udler MS, Wei WQ, Weiss ST, Ng MCY, Smoller JW, Lebo MS, Meigs JB, Limdi NA, Karlson EW. Development and validation of a trans-ancestry polygenic risk score for type 2 diabetes in diverse populations. Genome Med 2022;14:70. [PMID: 35765100 PMCID: PMC9241245 DOI: 10.1186/s13073-022-01074-2] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 10/31/2021] [Accepted: 06/16/2022] [Indexed: 02/05/2023] Open

Abstract

BACKGROUND

Type 2 diabetes (T2D) is a worldwide scourge caused by both genetic and environmental risk factors that disproportionately afflicts communities of color. Leveraging existing large-scale genome-wide association studies (GWAS), polygenic risk scores (PRS) have shown promise to complement established clinical risk factors and intervention paradigms, and improve early diagnosis and prevention of T2D. However, to date, T2D PRS have been most widely developed and validated in individuals of European descent. Comprehensive assessment of T2D PRS in non-European populations is critical for equitable deployment of PRS to clinical practice that benefits global populations.

METHODS

We integrated T2D GWAS in European, African, and East Asian populations to construct a trans-ancestry T2D PRS using a newly developed Bayesian polygenic modeling method, and assessed the prediction accuracy of the PRS in the multi-ethnic Electronic Medical Records and Genomics (eMERGE) study (11,945 cases; 57,694 controls), four Black cohorts (5137 cases; 9657 controls), and the Taiwan Biobank (4570 cases; 84,996 controls). We additionally evaluated a post hoc ancestry adjustment method that can express the polygenic risk on the same scale across ancestrally diverse individuals and facilitate the clinical implementation of the PRS in prospective cohorts.

RESULTS

The trans-ancestry PRS was significantly associated with T2D status across the ancestral groups examined. The top 2% of the PRS distribution can identify individuals with an approximately 2.5- to 4.5-fold increase in T2D risk, which corresponds to the increased risk of T2D for first-degree relatives. The post hoc ancestry adjustment method eliminated major distributional differences in the PRS across ancestries without compromising its predictive performance.

CONCLUSIONS

By integrating T2D GWAS from multiple populations, we developed and validated a trans-ancestry PRS, and demonstrated its potential as a meaningful index of risk among diverse patients in clinical settings. Our efforts represent the first step towards the implementation of the T2D PRS into routine healthcare.
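
For readers unfamiliar with polygenic risk scores, a PRS is in general a weighted sum of an individual's risk-allele dosages, with per-variant weights derived from GWAS effect sizes. The sketch below shows only that generic weighted sum. It is not the Bayesian polygenic modeling method or the ancestry adjustment described in this abstract, and the weights, dosages, and names are hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """Generic PRS: sum over variants of (allele dosage x per-variant weight)."""
    if len(dosages) != len(weights):
        raise ValueError("one weight is required per variant")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-variant weights (e.g., log odds ratios) for three variants.
weights = [0.12, -0.05, 0.30]

# Hypothetical allele dosages (0, 1, or 2 copies of the effect allele).
cohort = {
    "person_1": [0, 1, 2],
    "person_2": [2, 2, 0],
    "person_3": [1, 0, 1],
}

scores = {pid: polygenic_risk_score(d, weights) for pid, d in cohort.items()}

# Rank individuals from highest to lowest score.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In practice a score like this is compared against a reference distribution (for example, to flag the top few percent), which is where differences in score distributions across ancestries become important.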

Affiliation(s)

Tian Ge Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA. Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, USA. Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA. Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Marguerite R Irvin Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
Amit Patki Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
Vinodh Srinivasasainagendra Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
Yen-Feng Lin Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli, Taiwan Department of Public Health & Medical Humanities, School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan Institute of Behavioral Medicine, College of Medicine, National Cheng Kung University, Tainan, Taiwan
Hemant K Tiwari Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
Nicole D Armstrong Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
Barbara Benoit Mass General Brigham Research Information Science & Computing, Boston, MA, USA
Chia-Yen Chen Translational Biology, Biogen Inc., Cambridge, MA, USA
Karmel W Choi Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, USA Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
James J Cimino Informatics Institute, University of Alabama at Birmingham, Birmingham, AL, USA
Brittney H Davis Department of Neurology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
Ozan Dikilitas Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA Department of Internal Medicine, Mayo Clinician-Investigator Training Program, Mayo Clinic, Rochester, MN, USA
Bethany Etheridge Department of Neurology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
Yen-Chen Anne Feng Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
Vivian Gainer Mass General Brigham Research Information Science & Computing, Boston, MA, USA
Hailiang Huang Broad Institute of MIT and Harvard, Cambridge, MA, USA Department of Medicine, Massachusetts General Hospital, Boston, MA, USA Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
Gail P Jarvik Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
Christopher Kachulis Broad Institute of MIT and Harvard, Cambridge, MA, USA
Eimear E Kenny Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Atlas Khan Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, USA
Krzysztof Kiryluk Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, USA
Leah Kottyan Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
Iftikhar J Kullo Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
Christoph Lange Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Niall Lennon Broad Institute of MIT and Harvard, Cambridge, MA, USA
Aaron Leong Broad Institute of MIT and Harvard, Cambridge, MA, USA Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA, USA Diabetes Unit, Massachusetts General Hospital, Boston, MA, USA
Edyta Malolepsza Broad Institute of MIT and Harvard, Cambridge, MA, USA
Ayme D Miles Informatics Institute, University of Alabama at Birmingham, Birmingham, AL, USA
Shawn Murphy Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
Bahram Namjou Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
Renuka Narayan Department of Neurology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
Mark J O'Connor UMass Memorial Health Care, Worcester, MA, USA
Jennifer A Pacheco Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Emma Perez Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA Mass General Brigham Personalized Medicine, Boston, MA, USA
Laura J Rasmussen-Torvik Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Elisabeth A Rosenthal Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
Daniel Schaid Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
Maria Stamou Division of Endocrinology, Massachusetts General Hospital, Boston, MA, USA
Miriam S Udler Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA Broad Institute of MIT and Harvard, Cambridge, MA, USA Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
Wei-Qi Wei Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
Scott T Weiss Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
Maggie C Y Ng Vanderbilt Genetics Institute, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
Jordan W Smoller Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, USA Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA Broad Institute of MIT and Harvard, Cambridge, MA, USA
Matthew S Lebo Broad Institute of MIT and Harvard, Cambridge, MA, USA Mass General Brigham Personalized Medicine, Boston, MA, USA Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
James B Meigs Broad Institute of MIT and Harvard, Cambridge, MA, USA Department of Medicine, Massachusetts General Hospital, Boston, MA, USA Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA, USA
Nita A Limdi Department of Neurology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
Elizabeth W Karlson Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA Mass General Brigham Personalized Medicine, Boston, MA, USA

Ghosh D, Mastej E, Jain R, Choi YS. Causal Inference in Radiomics: Framework, Mechanisms, and Algorithms. Front Neurosci 2022;16:884708. [PMID: 35812228 PMCID: PMC9261933 DOI: 10.3389/fnins.2022.884708] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/26/2022] [Accepted: 05/20/2022] [Indexed: 12/30/2022] Open

Link NB, Huang S, Cai T, Sun J, Dahal K, Costa L, Cho K, Liao K, Cai T, Hong C. Binary acronym disambiguation in clinical notes from electronic health records with an application in computational phenotyping. Int J Med Inform 2022;162:104753. [PMID: 35405530 DOI: 10.1016/j.ijmedinf.2022.104753] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/18/2022] [Revised: 03/11/2022] [Accepted: 03/27/2022] [Indexed: 01/05/2023]

Abstract

OBJECTIVE

The use of electronic health records (EHR) systems has grown over the past decade, and with it, the need to extract information from unstructured clinical narratives. Clinical notes, however, frequently contain acronyms with several potential senses (meanings) and traditional natural language processing (NLP) techniques cannot differentiate between these senses. In this study we introduce a semi-supervised method for binary acronym disambiguation, the task of classifying a target sense for acronyms in the clinical EHR notes.

METHODS

We developed a semi-supervised ensemble machine learning (CASEml) algorithm to automatically identify when an acronym means a target sense by leveraging semantic embeddings, visit-level text and billing information. The algorithm was validated using note data from the Veterans Affairs hospital system to classify the meaning of three acronyms: RA, MS, and MI. We compared the performance of CASEml against another standard semi-supervised method and a baseline metric selecting the most frequent acronym sense. Along with evaluating the performance of these methods for specific instances of acronyms, we evaluated the impact of acronym disambiguation on NLP-driven phenotyping of rheumatoid arthritis.

RESULTS

CASEml achieved accuracies of 0.947, 0.911, and 0.706 for RA, MS, and MI, respectively, higher than a standard baseline metric and (on average) higher than a state-of-the-art semi-supervised method. We also demonstrated that applying CASEml to medical notes improves the AUC of a phenotype algorithm for rheumatoid arthritis.

CONCLUSION

CASEml is a novel method that accurately disambiguates acronyms in clinical notes and has advantages over commonly used supervised and semi-supervised machine learning approaches. In addition, CASEml improves the performance of NLP tasks that rely on ambiguous acronyms, such as phenotyping.
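
The abstract compares CASEml against a baseline that always selects the most frequent acronym sense. A minimal sketch of that kind of baseline and of the accuracy metric is given below. It does not reproduce CASEml itself, and the labels are hypothetical (True meaning the acronym carries the target sense in that note).

```python
from collections import Counter


def most_frequent_sense(training_labels):
    """Return the label that occurs most often in the training data."""
    if not training_labels:
        raise ValueError("training labels must not be empty")
    return Counter(training_labels).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of instances where the predicted label equals the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and aligned")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical labels: does "RA" mean rheumatoid arthritis in this note?
train_labels = [True, True, False, True, False]
test_labels = [True, False, True, True]

baseline_sense = most_frequent_sense(train_labels)
baseline_predictions = [baseline_sense] * len(test_labels)
baseline_accuracy = accuracy(baseline_predictions, test_labels)
print(baseline_sense, baseline_accuracy)
```

A baseline like this does well whenever one sense dominates, which is why a disambiguation method is usually judged by how much it improves on it.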

Chan JTH, Liew DFL, Stojanova J, McMaster C. Better Pharmacovigilance Through Artificial Intelligence: What Is Needed To Make This A Reality? Health Policy Technol 2022. [DOI: 10.1016/j.hlpt.2022.100638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]

Wang DD, Li Y, Nguyen XMT, Song RJ, Ho YL, Hu FB, Willett WC, Wilson PWF, Cho K, Gaziano JM, Djoussé L. Dietary Sodium and Potassium Intake and Risk of Non-Fatal Cardiovascular Diseases: The Million Veteran Program. Nutrients 2022;14:1121. [PMID: 35268096 PMCID: PMC8912456 DOI: 10.3390/nu14051121] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/25/2022] [Revised: 03/02/2022] [Accepted: 03/03/2022] [Indexed: 11/16/2022] Open

Abstract

Objective: To examine the association between intakes of sodium and potassium and the ratio of sodium to potassium and incident myocardial infarction and stroke. Design, Setting and Participants: Prospective cohort study of 180,156 Veterans aged 19 to 107 years with plausible dietary intake measured by food frequency questionnaire (FFQ) who were free of cardiovascular disease (CVD) and cancer at baseline in the VA Million Veteran Program (MVP). Main outcome measures: CVD defined as non-fatal myocardial infarction (MI) or acute ischemic stroke (AIS) ascertained using high-throughput phenotyping algorithms applied to electronic health records. Results: During up to 8 years of follow-up, we documented 4090 CVD cases (2499 MI and 1712 AIS). After adjustment for confounding factors, a higher sodium intake was associated with a higher risk of CVD, whereas potassium intake was inversely associated with the risk of CVD [hazard ratio (HR) comparing extreme quintiles, 95% confidence interval (CI): 1.09 (95% CI: 0.99−1.21, p trend = 0.01) for sodium and 0.87 (95% CI: 0.79−0.96, p trend = 0.005) for potassium]. In addition, the ratio of sodium to potassium (Na/K ratio) was positively associated with the risk of CVD (HR comparing extreme quintiles = 1.26, 95% CI: 1.14−1.39, p trend < 0.0001). The associations of Na/K ratio were consistent for two subtypes of CVD; one standard deviation increment in the ratio was associated with HRs (95% CI) of 1.12 (1.06−1.19) for MI and 1.11 (1.03−1.19) for AIS. In secondary analyses, the observed associations were consistent across race and status for diabetes, hypertension, and high cholesterol at baseline. Associations appeared to be more pronounced among participants with poor dietary quality. Conclusions: A high sodium intake and a low potassium intake were associated with a higher risk of CVD in this large population of US veterans.

Affiliation(s)

Dong D Wang Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA 02111, USA The Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
Yanping Li Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA 02111, USA Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
Xuan-Mai T Nguyen Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA 02111, USA Division of Aging, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA Harvard Medical School, Boston, MA 02115, USA
Rebecca J Song Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA 02111, USA Department of Epidemiology, Boston University School of Public Health, Boston, MA 02115, USA
Yuk-Lam Ho Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA 02111, USA
Frank B Hu The Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
Walter C Willett The Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
Peter W F Wilson Atlanta VA Medical Center, Atlanta, GA 30033, USA Emory Clinical Cardiovascular Research Institute, Atlanta, GA 30033, USA
Kelly Cho Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA 02111, USA Division of Aging, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA Harvard Medical School, Boston, MA 02115, USA
J Michael Gaziano Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA 02111, USA Division of Aging, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA Harvard Medical School, Boston, MA 02115, USA
Luc Djoussé Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA 02111, USA Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA Division of Aging, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA Harvard Medical School, Boston, MA 02115, USA

Cade BE, Hassan SM, Dashti HS, Kiernan M, Pavlova MK, Redline S, Karlson EW. Sleep apnea phenotyping and relationship to disease in a large clinical biobank. JAMIA Open 2022;5:ooab117. [PMID: 35156000 PMCID: PMC8826997 DOI: 10.1093/jamiaopen/ooab117] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/09/2021] [Revised: 12/08/2021] [Accepted: 12/28/2021] [Indexed: 11/14/2022] Open

Abstract

Objective

Sleep apnea is associated with a broad range of pathophysiology. While electronic health record (EHR) information has the potential for revealing relationships between sleep apnea and associated risk factors and outcomes, practical challenges hinder its use. Our objectives were to develop a sleep apnea phenotyping algorithm that improves the precision of EHR case/control information using natural language processing (NLP); identify novel associations between sleep apnea and comorbidities in a large clinical biobank; and investigate the relationship between polysomnography statistics and comorbid disease using NLP phenotyping.

Materials and Methods

We performed clinical chart reviews on 300 participants putatively diagnosed with sleep apnea and applied International Classification of Sleep Disorders criteria to classify true cases and noncases. We evaluated 2 NLP and diagnosis code-only methods for their abilities to maximize phenotyping precision. The lead algorithm was used to identify incident and cross-sectional associations between sleep apnea and common comorbidities using 4876 NLP-defined sleep apnea cases and 3× matched controls.

Results

The optimal NLP phenotyping strategy had improved model precision (≥0.943) compared to the use of one diagnosis code (≤0.733). Of the tested diseases, 170 disorders had significant incidence odds ratios (ORs) between cases and controls, 8 of which were confirmed using polysomnography (n = 4544), and 281 disorders had significant prevalence ORs between sleep apnea cases and controls, 41 of which were confirmed using polysomnography data.

Discussion and Conclusion

An NLP-informed algorithm can improve the accuracy of case-control sleep apnea ascertainment and thus improve the performance of phenome-wide, genetic, and other EHR analyses of a highly prevalent disorder.

Sleep apnea is a common disease in which breathing partially or completely pauses during sleep, leading to less oxygen in the blood, repeated awakenings, and increased risk of developing multiple diseases. Current studies of sleep apnea often have relatively few participants due to the challenge of performing overnight sleep recordings. Electronic health record (EHR) billing code diagnoses of sleep apnea could be repurposed to increase the size of research studies, but the accuracy of the diagnoses is reduced. We developed a reusable algorithm that improves the accuracy of EHR sleep apnea diagnoses using natural language processing to extract information from clinical notes. As a proof of concept, we used the algorithm to identify hundreds of diseases that are increased among participants with sleep apnea compared to similar patients without sleep apnea. Many of these disease relationships with sleep apnea have not been previously recognized. This improved algorithm will help to accelerate future large-scale investigations of the causes and consequences of sleep apnea.
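
The precision values reported in this abstract measure how many algorithm-flagged cases are true cases according to chart review. As a small illustration of the metric only, the sketch below computes precision from hypothetical algorithm flags and hypothetical chart-review labels. It does not reproduce the study's NLP algorithm.

```python
def precision(flagged_as_case, true_case):
    """Precision = true positives / (true positives + false positives)."""
    if len(flagged_as_case) != len(true_case):
        raise ValueError("flags and chart-review labels must be aligned")
    true_pos = sum(1 for f, t in zip(flagged_as_case, true_case) if f and t)
    false_pos = sum(1 for f, t in zip(flagged_as_case, true_case) if f and not t)
    if true_pos + false_pos == 0:
        raise ValueError("precision is undefined when nothing is flagged")
    return true_pos / (true_pos + false_pos)


# Hypothetical data: algorithm flags vs. chart-review truth for five patients.
algorithm_flags = [True, True, True, False, True]
chart_review = [True, False, True, True, True]

result = precision(algorithm_flags, chart_review)
print(result)
```

Note that the unflagged true case (the fourth patient) does not affect precision. It would lower recall instead, which is a separate metric.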

Affiliation(s)

Brian E Cade Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, Massachusetts, USA Division of Sleep Medicine, Harvard Medical School, Boston, Massachusetts, USA Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, USA
Syed Moin Hassan Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, Massachusetts, USA Division of Sleep Medicine, Harvard Medical School, Boston, Massachusetts, USA Division of Pulmonary Disease and Critical Care Medicine, University of Vermont, Burlington, Vermont, USA
Hassan S Dashti Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, USA Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA Department of Anesthesia, Pain, and Critical Care Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
Melissa Kiernan Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, Massachusetts, USA NeuroCare Center for Sleep, Newton, Massachusetts, USA
Milena K Pavlova Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, Massachusetts, USA Division of Sleep Medicine, Harvard Medical School, Boston, Massachusetts, USA
Susan Redline Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, Massachusetts, USA Division of Sleep Medicine, Harvard Medical School, Boston, Massachusetts, USA Division of Pulmonary, Critical Care, and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
Elizabeth W Karlson Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Boston, Massachusetts, USA

Zhang Y, Liu M, Neykov M, Cai T. Prior Adaptive Semi-supervised Learning with Application to EHR Phenotyping. J Mach Learn Res 2022;23:83. [PMID: 37974910 PMCID: PMC10653017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2023]

Artificial Intelligence in Clinical Immunology. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_83] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]

Liang L, Kim N, Hou J, Cai T, Dahal K, Lin C, Finan S, Savovoa G, Rosso M, Polgar-Tucsanyi M, Weiner H, Chitnis T, Cai T, Xia Z. Temporal trends of multiple sclerosis disease activity: Electronic health records indicators. Mult Scler Relat Disord 2022;57:103333. [PMID: 35158446 PMCID: PMC8849591 DOI: 10.1016/j.msard.2021.103333] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/16/2021] [Revised: 10/03/2021] [Accepted: 10/14/2021] [Indexed: 01/03/2023]