1. Sadeghi P, Karimi H, Lavafian A, Rashedi R, Samieefar N, Shafiekhani S, Rezaei N. Machine learning and artificial intelligence within pediatric autoimmune diseases: applications, challenges, future perspective. Expert Rev Clin Immunol 2024:1-18. [PMID: 38771915 DOI: 10.1080/1744666x.2024.2359019] [Received: 11/19/2023] [Accepted: 05/20/2024] [Indexed: 05/23/2024]
Abstract
INTRODUCTION Autoimmune disorders affect 4.5% to 9.4% of children, significantly reducing their quality of life. The diagnosis and prognosis of autoimmune diseases are uncertain because of the variety of onset and development. Machine learning can identify clinically relevant patterns from vast amounts of data. Hence, its introduction has been beneficial in the diagnosis and management of patients. AREAS COVERED This narrative review was conducted through searching various electronic databases, including PubMed, Scopus, and Web of Science. This study thoroughly explores the current knowledge and identifies the remaining gaps in the applications of machine learning specifically in the context of pediatric autoimmune and related diseases. EXPERT OPINION Machine learning algorithms have the potential to completely change how pediatric autoimmune disorders are identified, treated, and managed. Machine learning can assist physicians in making more precise and fast judgments, identifying new biomarkers and therapeutic targets, and personalizing treatment strategies for each patient by utilizing massive datasets and powerful analytics.
Affiliation(s)
- Parniyan Sadeghi
  - Network of Interdisciplinarity in Neonates and Infants (NINI), Universal Scientific Education and Research Network (USERN), Tehran, Iran
  - Student Research Committee, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Hanie Karimi
  - Network of Interdisciplinarity in Neonates and Infants (NINI), Universal Scientific Education and Research Network (USERN), Tehran, Iran
  - School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Atiye Lavafian
  - Network of Interdisciplinarity in Neonates and Infants (NINI), Universal Scientific Education and Research Network (USERN), Tehran, Iran
  - School of Medicine, Semnan University of Medical Science, Semnan, Iran
- Ronak Rashedi
  - Network of Interdisciplinarity in Neonates and Infants (NINI), Universal Scientific Education and Research Network (USERN), Tehran, Iran
  - USERN Office, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Noosha Samieefar
  - Network of Interdisciplinarity in Neonates and Infants (NINI), Universal Scientific Education and Research Network (USERN), Tehran, Iran
  - USERN Office, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Sajad Shafiekhani
  - Department of Biomedical Engineering, Buein Zahra Technical University, Qazvin, Iran
- Nima Rezaei
  - Network of Interdisciplinarity in Neonates and Infants (NINI), Universal Scientific Education and Research Network (USERN), Tehran, Iran
  - Research Center for Immunodeficiencies, Children's Medical Center, Tehran University of Medical Sciences, Tehran, Iran
  - Department of Immunology, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
2. McCaw ZR, Gao J, Lin X, Gronsbell J. Synthetic surrogates improve power for genome-wide association studies of partially missing phenotypes in population biobanks. Nat Genet 2024:10.1038/s41588-024-01793-9. [PMID: 38872030 DOI: 10.1038/s41588-024-01793-9] [Received: 01/14/2023] [Accepted: 05/08/2024] [Indexed: 06/15/2024]
Abstract
Within population biobanks, incomplete measurement of certain traits limits the power for genetic discovery. Machine learning is increasingly used to impute the missing values from the available data. However, performing genome-wide association studies (GWAS) on imputed traits can introduce spurious associations, identifying genetic variants that are not associated with the original trait. Here we introduce a new method, synthetic surrogate (SynSurr) analysis, which makes GWAS on imputed phenotypes robust to imputation errors. Rather than replacing missing values, SynSurr jointly analyzes the original and imputed traits. We show that SynSurr estimates the same genetic effect as standard GWAS and improves power in proportion to the quality of the imputations. SynSurr requires a commonly made missing-at-random assumption but relaxes the requirements of existing imputation methods by not requiring correct model specification. We present extensive simulations and ablation analyses to validate SynSurr and apply it to empower the GWAS of dual-energy X-ray absorptiometry traits within the UK Biobank.
Affiliation(s)
- Zachary R McCaw
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Jianhui Gao
  - Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Xihong Lin
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Statistics, Harvard University, Cambridge, MA, USA
- Jessica Gronsbell
  - Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
  - Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
3. Xiao T, Kong S, Zhang Z, Hua D, Liu F. A review of big data technology and its application in cancer care. Comput Biol Med 2024; 176:108577. [PMID: 38739981 DOI: 10.1016/j.compbiomed.2024.108577] [Received: 11/19/2023] [Revised: 05/07/2024] [Accepted: 05/07/2024] [Indexed: 05/16/2024]
Abstract
The development of modern medical devices and information technology has led to rapid growth in the amount of health-related data, and the concept of medical big data has emerged globally alongside significant advances in data-driven cancer care. However, outstanding issues such as fragmented data governance, low-quality data specification, and data lock-in still make sharing challenging. Big data technology provides solutions for managing massive heterogeneous data while combining artificial intelligence (AI) techniques such as machine learning (ML) and deep learning (DL) to better mine the intrinsic connections between data. This paper surveys and organizes recent articles on big data technology and its applications in cancer, dividing them into three different types to outline their primary content and summarize their critical role in assisting cancer care. It then examines the latest research directions in big data technology in cancer and evaluates the current state of development of each type of application. Finally, current challenges and opportunities are discussed, and recommendations are made for the further integration of big data technology into the medical industry.
Affiliation(s)
- Tianyun Xiao
  - Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, Hebei, 063210, China; The Key Laboratory of Engineering Computing in Tangshan City, North China University of Science and Technology, Tangshan, Hebei, 063210, China; College of Science, North China University of Science and Technology, Tangshan, Hebei, 063210, China
- Shanshan Kong
  - College of Science, North China University of Science and Technology, Tangshan, Hebei, 063210, China
- Zichen Zhang
  - Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, Hebei, 063210, China; The Key Laboratory of Engineering Computing in Tangshan City, North China University of Science and Technology, Tangshan, Hebei, 063210, China; College of Science, North China University of Science and Technology, Tangshan, Hebei, 063210, China
- Dianbo Hua
  - Beijing Sitairui Cancer Data Analysis Joint Laboratory, Beijing, 101149, China
- Fengchun Liu
  - Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, Hebei, 063210, China; The Key Laboratory of Engineering Computing in Tangshan City, North China University of Science and Technology, Tangshan, Hebei, 063210, China; College of Science, North China University of Science and Technology, Tangshan, Hebei, 063210, China; Hebei Engineering Research Center for the Intelligentization of Iron Ore Optimization and Ironmaking Raw Materials Preparation Processes, North China University of Science and Technology, Tangshan, Hebei, China; Tangshan Intelligent Industry and Image Processing Technology Innovation Center, North China University of Science and Technology, Tangshan, Hebei, China
4. Jiang K, Cao T. Automated HIV Case Identification from the MIMIC-IV Database. AMIA Joint Summits on Translational Science Proceedings 2024; 2024:555-564. [PMID: 38827090 PMCID: PMC11141847] [Indexed: 06/04/2024]
Abstract
Automatic HIV phenotyping is needed for HIV research based on electronic health records (EHRs). MIMIC-IV, an extension of MIMIC-III, contains more than 520,000 hospital admissions and has become a valuable EHR database for secondary medical research. However, there was no prior phenotyping algorithm to extract HIV cases from MIMIC-IV, which requires a comprehensive knowledge of the database. Moreover, previous HIV phenotyping algorithms did not consider the new HIV-1/HIV-2 antibody differentiation immunoassay tests that MIMIC-IV contains. Our work provided insight into the structure and data elements in MIMIC-IV and proposed a new HIV phenotyping algorithm to fill in these gaps. The results included MIMIC-IV's data tables and elements used, 1,781 and 1,843 HIV cases from MIMIC-IV's versions 0.4 and 2.1, respectively, and summary statistics of these two HIV case cohorts. They could be used for the development of statistical and machine learning models in future studies about the disease.
Affiliation(s)
- Kai Jiang
  - The University of Texas Health Science Center at Houston School of Public Health, Houston, Texas, United States
- Tru Cao
  - The University of Texas Health Science Center at Houston School of Public Health, Houston, Texas, United States
5. Steinfeldt J, Wild B, Buergel T, Pietzner M, Upmeier Zu Belzen J, Vauvelle A, Hegselmann S, Denaxas S, Hemingway H, Langenberg C, Landmesser U, Deanfield J, Eils R. Medical history predicts phenome-wide disease onset and enables the rapid response to emerging health threats. Nat Commun 2024; 15:4257. [PMID: 38763986 PMCID: PMC11102902 DOI: 10.1038/s41467-024-48568-8] [Received: 11/17/2023] [Accepted: 05/03/2024] [Indexed: 05/21/2024]
Abstract
The COVID-19 pandemic exposed a global deficiency of systematic, data-driven guidance to identify high-risk individuals. Here, we illustrate the utility of routinely recorded medical history to predict the risk for 1883 diseases across clinical specialties and support the rapid response to emerging health threats such as COVID-19. We developed a neural network to learn from health records of 502,460 UK Biobank participants. Importantly, we observed discriminative improvements over basic demographic predictors for 1774 (94.3%) endpoints. After transferring the unmodified risk models to the All of Us cohort, we replicated these improvements for 1347 (89.8%) of 1500 investigated endpoints, demonstrating generalizability across healthcare systems and historically underrepresented groups. Ultimately, we showed how this approach could have been used to identify individuals vulnerable to severe COVID-19. Our study demonstrates the potential of medical history to support guidance for emerging pandemics by systematically estimating risk for thousands of diseases at once at minimal cost.
Affiliation(s)
- Jakob Steinfeldt
  - Department of Cardiology, Angiology and Intensive Care Medicine, Deutsches Herzzentrum der Charité (DHZC), Berlin, Germany
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Klinik/Centrum, Charitéplatz 1, 10117, Berlin, Germany
  - Computational Medicine, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany
  - Friede Springer Cardiovascular Prevention Center@Charite, Charite - University Medicine Berlin, Berlin, Germany
  - Institute of Cardiovascular Sciences, University College London, London, UK
- Benjamin Wild
  - Center for Digital Health, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany
- Thore Buergel
  - Institute of Cardiovascular Sciences, University College London, London, UK
  - Center for Digital Health, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany
- Maik Pietzner
  - Computational Medicine, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany
  - MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, UK
  - Precision Health University Research Institute, Queen Mary University of London and Barts NHS Trust, London, UK
- Julius Upmeier Zu Belzen
  - Center for Digital Health, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany
- Andre Vauvelle
  - Institute of Health Informatics, University College London, London, UK
- Stefan Hegselmann
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Massachusetts, USA
  - Pattern Recognition and Image Analysis Lab, University of Münster, Münster, Germany
- Spiros Denaxas
  - Institute of Health Informatics, University College London, London, UK
  - British Heart Foundation Data Science Centre, London, UK
  - Health Data Research UK, London, UK
  - National Institute for Health Research, Biomedical Research Centre at University College London Hospitals National Institute for Health Research, Biomedical Research Centre, London, UK
- Harry Hemingway
  - Institute of Health Informatics, University College London, London, UK
  - Health Data Research UK, London, UK
  - National Institute for Health Research, Biomedical Research Centre at University College London Hospitals National Institute for Health Research, Biomedical Research Centre, London, UK
- Claudia Langenberg
  - Computational Medicine, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany
  - MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, UK
  - Precision Health University Research Institute, Queen Mary University of London and Barts NHS Trust, London, UK
- Ulf Landmesser
  - Department of Cardiology, Angiology and Intensive Care Medicine, Deutsches Herzzentrum der Charité (DHZC), Berlin, Germany
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Klinik/Centrum, Charitéplatz 1, 10117, Berlin, Germany
  - Friede Springer Cardiovascular Prevention Center@Charite, Charite - University Medicine Berlin, Berlin, Germany
  - Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany
  - DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, Berlin, Berlin, Germany
- John Deanfield
  - Institute of Cardiovascular Sciences, University College London, London, UK
- Roland Eils
  - Center for Digital Health, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany
  - Health Data Science Unit, Heidelberg University Hospital and BioQuant, Heidelberg, Germany
6. Lee HJ, Schwamm LH, Sansing LH, Kamel H, de Havenon A, Turner AC, Sheth KN, Krishnaswamy S, Brandt C, Zhao H, Krumholz H, Sharma R. StrokeClassifier: ischemic stroke etiology classification by ensemble consensus modeling using electronic health records. NPJ Digit Med 2024; 7:130. [PMID: 38760474 PMCID: PMC11101464 DOI: 10.1038/s41746-024-01120-w] [Received: 10/02/2023] [Accepted: 04/23/2024] [Indexed: 05/19/2024]
Abstract
Determining acute ischemic stroke (AIS) etiology is fundamental to secondary stroke prevention efforts but can be diagnostically challenging. We trained and validated an automated classification tool, StrokeClassifier, using electronic health record (EHR) text from 2039 non-cryptogenic AIS patients at 2 academic hospitals to predict the 4-level outcome of stroke etiology adjudicated by agreement of at least 2 board-certified vascular neurologists' review of the EHR. StrokeClassifier is an ensemble consensus meta-model of 9 machine learning classifiers applied to features extracted from discharge summary texts by natural language processing. StrokeClassifier was externally validated in 406 discharge summaries from the MIMIC-III dataset reviewed by a vascular neurologist to ascertain stroke etiology. Compared with vascular neurologists' diagnoses, StrokeClassifier achieved the mean cross-validated accuracy of 0.74 and weighted F1 of 0.74 for multi-class classification. In MIMIC-III, its accuracy and weighted F1 were 0.70 and 0.71, respectively. In binary classification, the two metrics ranged from 0.77 to 0.96. The top 5 features contributing to stroke etiology prediction were atrial fibrillation, age, middle cerebral artery occlusion, internal carotid artery occlusion, and frontal stroke location. We designed a certainty heuristic to grade the confidence of StrokeClassifier's diagnosis as non-cryptogenic by the degree of consensus among the 9 classifiers and applied it to 788 cryptogenic patients, reducing cryptogenic diagnoses from 25.2% to 7.2%. StrokeClassifier is a validated artificial intelligence tool that rivals the performance of vascular neurologists in classifying ischemic stroke etiology. With further training, StrokeClassifier may have downstream applications including its use as a clinical decision support system.
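The consensus idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the etiology labels, the plurality-vote rule, and the agreement thresholds below are all hypothetical, chosen only to show how the degree of agreement among 9 classifiers could be turned into a confidence grade.

```python
from collections import Counter

# Hypothetical thresholds on the fraction of agreeing classifiers.
HIGH_AGREEMENT = 7 / 9
MODERATE_AGREEMENT = 5 / 9


def consensus(votes):
    """Return the most common label and the fraction of classifiers voting for it."""
    if not votes:
        raise ValueError("at least one vote is required")
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


def certainty_grade(agreement):
    """Map an agreement fraction to a coarse, illustrative confidence grade."""
    if agreement >= HIGH_AGREEMENT:
        return "high"
    if agreement >= MODERATE_AGREEMENT:
        return "moderate"
    return "low"


# Hypothetical predictions from 9 classifiers for one discharge summary.
votes = ["cardioembolic"] * 6 + ["large_artery"] * 2 + ["small_vessel"]
label, agreement = consensus(votes)
print(label, round(agreement, 2), certainty_grade(agreement))
```

A plurality vote is only one possible consensus rule; ties here resolve to whichever label was seen first, which a real system would need to handle explicitly.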
Affiliation(s)
- Ho-Joon Lee
  - Department of Genetics and Yale Center for Genome Analysis, Yale School of Medicine, New Haven, CT, USA
- Lee H Schwamm
  - Department of Neurology and Comprehensive Stroke Center, Massachusetts General Hospital and Harvard Medical School Boston, Boston, MA, USA
  - Department of Neurology, Yale School of Medicine, New Haven, CT, USA
- Lauren H Sansing
  - Department of Neurology, Yale School of Medicine, New Haven, CT, USA
- Hooman Kamel
  - Department of Neurology, Weill Cornell Medicine, New York City, NY, USA
- Adam de Havenon
  - Department of Neurology, Yale School of Medicine, New Haven, CT, USA
- Ashby C Turner
  - Department of Neurology and Comprehensive Stroke Center, Massachusetts General Hospital and Harvard Medical School Boston, Boston, MA, USA
- Kevin N Sheth
  - Department of Neurology, Yale School of Medicine, New Haven, CT, USA
- Smita Krishnaswamy
  - Departments of Genetics and Computer Science, Yale School of Medicine, New Haven, CT, USA
- Cynthia Brandt
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Hongyu Zhao
  - Departments of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Harlan Krumholz
  - Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Richa Sharma
  - Department of Neurology, Yale School of Medicine, New Haven, CT, USA
7. Shyr C, Sulieman L, Harris PA. Illuminating the landscape of high-level clinical trial opportunities in the All of Us Research Program. J Am Med Inform Assoc 2024:ocae062. [PMID: 38622899 DOI: 10.1093/jamia/ocae062] [Received: 01/27/2024] [Revised: 03/02/2024] [Accepted: 03/07/2024] [Indexed: 04/17/2024]
Abstract
OBJECTIVE With its size and diversity, the All of Us Research Program has the potential to power and improve representation in clinical trials through ancillary studies like Nutrition for Precision Health. We sought to characterize high-level trial opportunities for the diverse participants and sponsors of future trial investment. MATERIALS AND METHODS We matched All of Us participants with available trials on ClinicalTrials.gov based on medical conditions, age, sex, and geographic location. Based on the number of matched trials, we (1) developed the Trial Opportunities Compass (TOC) to help sponsors assess trial investment portfolios, (2) characterized the landscape of trial opportunities in a phenome-wide association study (PheWAS), and (3) assessed the relationship between trial opportunities and social determinants of health (SDoH) to identify potential barriers to trial participation. RESULTS Our study included 181 529 All of Us participants and 18 634 trials. The TOC identified opportunities for portfolio investment and gaps in currently available trials across federal, industrial, and academic sponsors. PheWAS results revealed an emphasis on mental disorder-related trials, with anxiety disorder having the highest adjusted increase in the number of matched trials (59% [95% CI, 57-62]; P < 1e-300). Participants from certain communities underrepresented in biomedical research, including self-reported racial and ethnic minorities, had more matched trials after adjusting for other factors. Living in a nonmetropolitan area was associated with up to 13.1 times fewer matched trials. DISCUSSION AND CONCLUSION All of Us data are a valuable resource for identifying trial opportunities to inform trial portfolio planning. Characterizing these opportunities with consideration for SDoH can provide guidance on prioritizing the most pressing barriers to trial participation.
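The matching step described under MATERIALS AND METHODS (conditions, age, sex, and geographic location) can be sketched as a simple eligibility filter. This is an illustration under stated assumptions, not the study's pipeline: the `Participant` and `Trial` records, their field names, and the use of a state code as the location criterion are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    conditions: frozenset  # hypothetical: condition names recorded for the participant
    age: int
    sex: str               # "FEMALE" or "MALE"
    state: str             # hypothetical stand-in for geographic location


@dataclass(frozen=True)
class Trial:
    condition: str
    min_age: int
    max_age: int
    sex: str               # "ALL", "FEMALE", or "MALE"
    states: frozenset      # hypothetical: states with a recruiting site


def is_match(p, t):
    """True if the participant meets all four illustrative criteria."""
    return (
        t.condition in p.conditions
        and t.min_age <= p.age <= t.max_age
        and t.sex in ("ALL", p.sex)
        and p.state in t.states
    )


def count_matched_trials(p, trials):
    """Number of trials for which the participant is a match."""
    return sum(is_match(p, t) for t in trials)


participant = Participant(
    conditions=frozenset({"anxiety disorder", "type 2 diabetes"}),
    age=45, sex="FEMALE", state="TN",
)
trials = [
    Trial("anxiety disorder", 18, 65, "ALL", frozenset({"TN", "KY"})),  # matches
    Trial("type 2 diabetes", 50, 80, "ALL", frozenset({"TN"})),         # age fails
    Trial("anxiety disorder", 18, 99, "MALE", frozenset({"TN"})),       # sex fails
    Trial("type 2 diabetes", 18, 99, "FEMALE", frozenset({"CA"})),      # location fails
]
print(count_matched_trials(participant, trials))
```

A per-participant count of this kind is the sort of quantity that could then be related to conditions or social determinants of health, as the abstract describes.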
Affiliation(s)
- Cathy Shyr
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Lina Sulieman
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Paul A Harris
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
  - Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37240, United States
8. Yan C, Ong HH, Grabowska ME, Krantz MS, Su WC, Dickson AL, Peterson JF, Feng Q, Roden DM, Stein CM, Kerchberger VE, Malin BA, Wei WQ. Large language models facilitate the generation of electronic health record phenotyping algorithms. J Am Med Inform Assoc 2024:ocae072. [PMID: 38613820 DOI: 10.1093/jamia/ocae072] [Received: 12/19/2023] [Revised: 02/21/2024] [Accepted: 03/22/2024] [Indexed: 04/15/2024]
Abstract
OBJECTIVES Phenotyping is a core task in observational health research utilizing electronic health records (EHRs). Developing an accurate algorithm demands substantial input from domain experts, involving extensive literature review and evidence synthesis. This burdensome process limits scalability and delays knowledge discovery. We investigate the potential for leveraging large language models (LLMs) to enhance the efficiency of EHR phenotyping by generating high-quality algorithm drafts. MATERIALS AND METHODS We prompted four LLMs (GPT-4 and GPT-3.5 of ChatGPT, Claude 2, and Bard) in October 2023, asking them to generate executable phenotyping algorithms in the form of SQL queries adhering to a common data model (CDM) for three phenotypes (ie, type 2 diabetes mellitus, dementia, and hypothyroidism). Three phenotyping experts evaluated the returned algorithms across several critical metrics. We further implemented the top-rated algorithms and compared them against clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network. RESULTS GPT-4 and GPT-3.5 exhibited significantly higher overall expert evaluation scores in instruction following, algorithmic logic, and SQL executability, when compared to Claude 2 and Bard. Although GPT-4 and GPT-3.5 effectively identified relevant clinical concepts, they exhibited immature capability in organizing phenotyping criteria with the proper logic, leading to phenotyping algorithms that were either excessively restrictive (with low recall) or overly broad (with low positive predictive values). CONCLUSION GPT versions 3.5 and 4 are capable of drafting phenotyping algorithms by identifying relevant clinical criteria aligned with a CDM. However, expertise in informatics and clinical experience is still required to assess and further refine generated algorithms.
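The trade-off named in the RESULTS (restrictive algorithms with low recall versus broad algorithms with low positive predictive value) can be made concrete with a small sketch. The patient identifiers and cohorts below are invented for illustration and do not come from the study; the function simply compares the set of patients an algorithm flags against a reference set.

```python
def recall_and_ppv(predicted, reference):
    """Compare predicted case IDs with reference case IDs.

    recall = true positives / reference cases
    PPV    = true positives / predicted cases
    """
    true_positives = len(predicted & reference)
    recall = true_positives / len(reference) if reference else float("nan")
    ppv = true_positives / len(predicted) if predicted else float("nan")
    return recall, ppv


# Hypothetical reference cohort of 8 confirmed cases (patient IDs 1-8).
reference_cases = set(range(1, 9))

# An overly restrictive algorithm flags only 2 patients, both true cases.
restrictive = {1, 2}

# An overly broad algorithm flags 20 patients, including all 8 true cases.
broad = set(range(1, 21))

print("restrictive:", recall_and_ppv(restrictive, reference_cases))
print("broad:", recall_and_ppv(broad, reference_cases))
```

In this toy setup the restrictive algorithm reaches perfect PPV at a recall of 0.25, while the broad one reaches perfect recall at a PPV of 0.4.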
Affiliation(s)
- Chao Yan
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Henry H Ong
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Monika E Grabowska
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Matthew S Krantz
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Wu-Chen Su
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Alyson L Dickson
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Josh F Peterson
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- QiPing Feng
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Dan M Roden
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- C Michael Stein
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- V Eric Kerchberger
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Bradley A Malin
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
  - Department of Computer Science, Vanderbilt University, Nashville, TN 37203, United States
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
  - Department of Computer Science, Vanderbilt University, Nashville, TN 37203, United States
9. Wei WQ, Rowley R, Wood A, MacArthur J, Embi PJ, Denaxas S. Improving reporting standards for phenotyping algorithm in biomedical research: 5 fundamental dimensions. J Am Med Inform Assoc 2024; 31:1036-1041. [PMID: 38269642 PMCID: PMC10990558 DOI: 10.1093/jamia/ocae005] [Received: 10/03/2023] [Revised: 12/12/2023] [Accepted: 01/08/2024] [Indexed: 01/26/2024]
Abstract
INTRODUCTION Phenotyping algorithms enable the interpretation of complex health data and definition of clinically relevant phenotypes; they have become crucial in biomedical research. However, the lack of standardization and transparency inhibits the cross-comparison of findings among different studies, limits large-scale meta-analyses, confuses the research community, and prevents the reuse of algorithms, which results in duplication of efforts and the waste of valuable resources. RECOMMENDATIONS Here, we propose five independent fundamental dimensions of phenotyping algorithms (complexity, performance, efficiency, implementability, and maintenance) through which researchers can describe, measure, and deploy any algorithm efficiently and effectively. These dimensions must be considered in the context of explicit use cases and transparent methods to ensure that they do not reflect unexpected biases or exacerbate inequities.
Affiliation(s)
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Robb Rowley
  - National Human Genome Research Institute, Bethesda, MD 20892, United States
- Angela Wood
  - Department of Public Health and Primary Care, University of Cambridge, Cambridge, CB2 1TN, United Kingdom
- Jacqueline MacArthur
  - British Heart Foundation Data Science Center, Health Data Research, London, NW1 2BE, United Kingdom
- Peter J Embi
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Spiros Denaxas
  - British Heart Foundation Data Science Center, Health Data Research, London, NW1 2BE, United Kingdom
  - Institute of Health Informatics, University College London, London, WC1E 6BT, United Kingdom
10. Jeffery AD, Fabbri D, Reeves RM, Matheny ME. Use of noisy labels as weak learners to identify incompletely ascertainable outcomes: A Feasibility study with opioid-induced respiratory depression. Heliyon 2024; 10:e26434. [PMID: 38444495 PMCID: PMC10912240 DOI: 10.1016/j.heliyon.2024.e26434] [Received: 06/09/2023] [Revised: 02/09/2024] [Accepted: 02/13/2024] [Indexed: 03/07/2024]
Abstract
Objective: Assigning outcome labels to large observational data sets in a timely and accurate manner, particularly when outcomes are rare or not directly ascertainable, remains a significant challenge within biomedical informatics. We examined whether noisy labels generated from subject matter experts' heuristics using heterogeneous data types within a data programming paradigm could provide outcome labels to a large, observational data set. We chose the clinical condition of opioid-induced respiratory depression for our use case because it is rare, has no administrative codes to easily identify the condition, and typically requires at least some unstructured text to ascertain its presence.
Materials and methods: Using de-identified electronic health records of 52,861 post-operative encounters, we applied a data programming paradigm (implemented in the Snorkel software) for the development of a machine learning classifier for opioid-induced respiratory depression. Our approach included subject matter experts creating 14 labeling functions that served as noisy labels for developing a probabilistic Generative model. We used probabilistic labels from the Generative model as outcome labels for training a Discriminative model on the source data. We evaluated performance of the Discriminative model with a hold-out test set of 599 independently-reviewed patient records.
Results: The final Discriminative classification model achieved an accuracy of 0.977, an F1 score of 0.417, a sensitivity of 1.0, and an AUC of 0.988 in the hold-out test set with a prevalence of 0.83% (5/599).
Discussion: All of the confirmed Cases were identified by the classifier. For rare outcomes, this finding is encouraging because it reduces the number of manual reviews needed by excluding visits/patients with low probabilities.
Conclusion: Application of a data programming paradigm with expert-informed labeling functions might have utility for phenotyping clinical phenomena that are not easily ascertainable from highly-structured data.
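The reported hold-out metrics can be cross-checked with a little arithmetic. The sketch below backs out the confusion matrix implied by the abstract's figures (599 records, 5 cases, sensitivity 1.0, F1 0.417). The false-positive and true-negative counts are inferred here, not reported by the authors, and the variable names are illustrative.

```python
# Reported in the abstract
n_records = 599        # hold-out test set size
n_cases = 5            # confirmed cases (prevalence 5/599)
sensitivity = 1.0
f1_reported = 0.417

# Sensitivity = TP / (TP + FN), so all 5 cases are true positives.
tp = round(sensitivity * n_cases)
fn = n_cases - tp

# F1 = 2*TP / (2*TP + FP + FN)  =>  FP = 2*TP / F1 - 2*TP - FN
# (inferred value, rounded to the nearest whole record)
fp = round(2 * tp / f1_reported - 2 * tp - fn)
tn = n_records - tp - fn - fp

precision = tp / (tp + fp)
accuracy = (tp + tn) / n_records
f1 = 2 * tp / (2 * tp + fp + fn)

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"precision={precision:.3f} accuracy={accuracy:.3f} F1={f1:.3f}")
```

Under these assumptions the inferred matrix reproduces the reported accuracy and F1, and it shows why F1 is low despite perfect sensitivity: with only 5 true cases, a small number of false positives dominates precision.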
Affiliation(s)
- Alvin D. Jeffery
  - Vanderbilt University School of Nursing, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
- Daniel Fabbri
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Ruth M. Reeves
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
- Michael E. Matheny
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
11
Almuwaqqat Z, Hui Q, Liu C, Zhou JJ, Voight BF, Ho YL, Posner DC, Vassy JL, Gaziano JM, Cho K, Wilson PWF, Sun YV. Long-Term Body Mass Index Variability and Adverse Cardiovascular Outcomes. JAMA Netw Open 2024; 7:e243062. [PMID: 38512255 PMCID: PMC10958234 DOI: 10.1001/jamanetworkopen.2024.3062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 01/23/2024] [Indexed: 03/22/2024] Open
Abstract
Importance: Body mass index (BMI; calculated as weight in kilograms divided by height in meters squared) is a commonly used estimate of obesity, which is a complex trait affected by genetic and lifestyle factors. Marked weight gain and loss could be associated with adverse biological processes.
Objective: To evaluate the association between BMI variability and incident cardiovascular disease (CVD) events in 2 distinct cohorts.
Design, Setting, and Participants: This cohort study used data from the Million Veteran Program (MVP) between 2011 and 2018 and participants in the UK Biobank (UKB) enrolled between 2006 and 2010. Participants were followed up for a median of 3.8 (5th-95th percentile, 3.5) years. Participants with baseline CVD or cancer were excluded. Data were analyzed from September 2022 to September 2023.
Exposure: BMI variability was calculated by the retrospective SD and coefficient of variation (CV) using multiple clinical BMI measurements up to the baseline.
Main Outcomes and Measures: The main outcome was incident composite CVD events (incident nonfatal myocardial infarction, acute ischemic stroke, and cardiovascular death), assessed using Cox proportional hazards modeling after adjustment for CVD risk factors, including age, sex, mean BMI, systolic blood pressure, total cholesterol, high-density lipoprotein cholesterol, smoking status, diabetes status, and statin use. Secondary analysis assessed whether associations were dependent on the polygenic score of BMI.
Results: Among 92 363 US veterans in the MVP cohort (81 675 [88%] male; mean [SD] age, 56.7 [14.1] years), there were 9695 Hispanic participants, 22 488 non-Hispanic Black participants, and 60 180 non-Hispanic White participants. A total of 4811 composite CVD events were observed from 2011 to 2018. The CV of BMI was associated with 16% higher risk for composite CVD across all groups (hazard ratio [HR], 1.16; 95% CI, 1.13-1.19). These associations were unchanged among subgroups and after adjustment for the polygenic score of BMI. The UKB cohort included 65 047 individuals (mean [SD] age, 57.30 [7.77] years; 38 065 [59%] female) and had 6934 composite CVD events. Each 1-SD increase in BMI variability in the UKB cohort was associated with 8% increased risk of cardiovascular death (HR, 1.08; 95% CI, 1.04-1.11).
Conclusions and Relevance: This cohort study found that among US veterans, higher BMI variability was a significant risk marker associated with adverse cardiovascular events independent of mean BMI across major racial and ethnic groups. Results were consistent in the UKB for the cardiovascular death end point. Further studies should investigate the phenotype of high BMI variability.
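The exposure definition (SD and coefficient of variation of repeated BMI measurements) can be illustrated with a short calculation. This is a minimal sketch, not the study's code: the weights and height are made-up values, and the use of the sample SD (n - 1 denominator) is an assumption, since the abstract does not say which SD estimator was used.

```python
from statistics import mean, stdev


def bmi(weight_kg: float, height_m: float) -> float:
    """BMI as defined in the abstract: weight (kg) / height (m) squared."""
    return weight_kg / height_m ** 2


def bmi_variability(bmi_values: list[float]) -> tuple[float, float]:
    """Return (SD, CV) for a series of BMI measurements.

    Assumes the sample SD; CV is SD divided by the mean.
    """
    if len(bmi_values) < 2:
        raise ValueError("need at least two BMI measurements")
    sd = stdev(bmi_values)
    return sd, sd / mean(bmi_values)


# Hypothetical pre-baseline clinic measurements for one participant
weights_kg = [80.0, 84.0, 78.0, 86.0]
height_m = 1.75
series = [bmi(w, height_m) for w in weights_kg]

sd, cv = bmi_variability(series)
print(f"mean BMI={mean(series):.2f}  SD={sd:.2f}  CV={cv:.4f}")
```

Because the CV divides by the mean, it expresses variability relative to a person's typical BMI, which is one reason it is reported alongside the raw SD.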
Affiliation(s)
- Zakaria Almuwaqqat
  - Veterans Affairs Atlanta Healthcare System, Decatur, Georgia
  - Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, Georgia
- Qin Hui
  - Veterans Affairs Atlanta Healthcare System, Decatur, Georgia
  - Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, Georgia
- Chang Liu
  - Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, Georgia
- Jin J. Zhou
  - Department of Medicine and Biostatistics, University of California, Los Angeles
  - Veterans Affairs Phoenix Healthcare System, Phoenix, Arizona
- Benjamin F. Voight
  - Corporal Michael J. Crescenz VA Medical Center, Philadelphia, Pennsylvania
  - Department of Systems Pharmacology and Translational Therapeutics, Department of Genetics, University of Pennsylvania, Philadelphia
- Yuk-Lam Ho
  - Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston
- Daniel C. Posner
  - Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston
- Jason L. Vassy
  - Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston
  - Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
- J. Michael Gaziano
  - Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston
  - Division of Aging, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
- Kelly Cho
  - Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston
  - Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
- Peter W. F. Wilson
  - Veterans Affairs Atlanta Healthcare System, Decatur, Georgia
  - Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, Georgia
- Yan V. Sun
  - Veterans Affairs Atlanta Healthcare System, Decatur, Georgia
  - Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, Georgia
  - Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, Georgia
12
Wang DD, Li Y, Nguyen XM, Ho YL, Hu FB, Willett WC, Wilson PW, Cho K, Gaziano JM, Djoussé L. Red Meat Intake and the Risk of Cardiovascular Diseases: A Prospective Cohort Study in the Million Veteran Program. J Nutr 2024; 154:886-895. [PMID: 38163586 DOI: 10.1016/j.tjnut.2023.12.051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 12/22/2023] [Accepted: 12/28/2023] [Indexed: 01/03/2024] Open
Abstract
BACKGROUND: Red meat consumption was associated with an increased risk of cardiovascular disease (CVD) in prospective cohort studies and a profile of biomarkers favoring high CVD risk in short-term controlled trials. However, several recent systematic reviews and meta-analyses concluded with no or weak evidence for limiting red meat intake.
OBJECTIVES: To prospectively examine the associations between red meat intake and incident CVD in an ongoing cohort study with diverse socioeconomic and racial or ethnic backgrounds.
METHODS: Our study included 148,506 participants [17,804 female (12.0%)] who were free of cancer, diabetes, and CVD at baseline from the Million Veteran Program. A food frequency questionnaire measured red meat intakes at baseline. Nonfatal myocardial infarction and acute ischemic stroke were identified through a high-throughput phenotyping algorithm, and fatal CVD events were identified by searching the National Death Index.
RESULTS: Comparing the extreme categories of intake, the multivariate-adjusted relative risk of CVD was 1.18 (95% CI: 1.01, 1.38; P-trend < 0.0001) for total red meat, 1.14 (95% CI: 0.96, 1.36; P-trend = 0.01) for unprocessed red meat, and 1.29 (95% CI: 1.04, 1.60; P-trend = 0.003) for processed red meat. We observed a more pronounced positive association between red meat intake and CVD in African American participants than in White participants (P-interaction = 0.01). Replacing 0.5 servings/d of red meat with 0.5 servings/d of nuts, whole grains, and skimmed milk was associated with 14% (RR: 0.86; 95% CI: 0.83, 0.90), 7% (RR: 0.93; 95% CI: 0.89, 0.96), and 4% (RR: 0.96; 95% CI: 0.94, 0.99) lower risks of CVD, respectively.
CONCLUSIONS: Red meat consumption is associated with an increased risk of CVD. Our findings support lowering red meat intake and replacing red meat with plant-based protein sources or low-fat dairy foods as a key dietary recommendation for the prevention of CVD.
Affiliation(s)
- Dong D Wang
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, United States; The Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States; Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA, United States; Broad Institute of MIT and Harvard, Cambridge, MA, United States
- Yanping Li
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, United States; Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA, United States
- Xuan-Mai Nguyen
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, United States; Department of Medicine, Division of Aging, Brigham and Women's Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
- Yuk-Lam Ho
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, United States
- Frank B Hu
  - The Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States; Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA, United States; Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States
- Walter C Willett
  - The Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States; Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA, United States; Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States
- Peter Wf Wilson
  - Atlanta VA Medical Center, Atlanta, GA, United States; Emory Clinical Cardiovascular Research Institute, Atlanta, GA, United States
- Kelly Cho
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, United States; Department of Medicine, Division of Aging, Brigham and Women's Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
- J Michael Gaziano
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, United States; Department of Medicine, Division of Aging, Brigham and Women's Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
- Luc Djoussé
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, United States; Department of Medicine, Division of Aging, Brigham and Women's Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
13
Yan C, Ong HH, Grabowska ME, Krantz MS, Su WC, Dickson AL, Peterson JF, Feng Q, Roden DM, Stein CM, Kerchberger VE, Malin BA, Wei WQ. Large Language Models Facilitate the Generation of Electronic Health Record Phenotyping Algorithms. medRxiv 2024:2023.12.19.23300230. [PMID: 38196578 PMCID: PMC10775330 DOI: 10.1101/2023.12.19.23300230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2024]
Abstract
Objectives: Phenotyping is a core task in observational health research utilizing electronic health records (EHRs). Developing an accurate algorithm demands substantial input from domain experts, involving extensive literature review and evidence synthesis. This burdensome process limits scalability and delays knowledge discovery. We investigate the potential for leveraging large language models (LLMs) to enhance the efficiency of EHR phenotyping by generating high-quality algorithm drafts.
Materials and Methods: We prompted four LLMs (GPT-4 and GPT-3.5 of ChatGPT, Claude 2, and Bard) in October 2023, asking them to generate executable phenotyping algorithms in the form of SQL queries adhering to a common data model (CDM) for three phenotypes (i.e., type 2 diabetes mellitus, dementia, and hypothyroidism). Three phenotyping experts evaluated the returned algorithms across several critical metrics. We further implemented the top-rated algorithms and compared them against clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network.
Results: GPT-4 and GPT-3.5 exhibited significantly higher overall expert evaluation scores in instruction following, algorithmic logic, and SQL executability, when compared to Claude 2 and Bard. Although GPT-4 and GPT-3.5 effectively identified relevant clinical concepts, they exhibited immature capability in organizing phenotyping criteria with the proper logic, leading to phenotyping algorithms that were either excessively restrictive (with low recall) or overly broad (with low positive predictive values).
Conclusion: GPT versions 3.5 and 4 are capable of drafting phenotyping algorithms by identifying relevant clinical criteria aligned with a CDM. However, expertise in informatics and clinical experience is still required to assess and further refine generated algorithms.
Affiliation(s)
- Chao Yan
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Henry H. Ong
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Monika E. Grabowska
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Matthew S. Krantz
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Wu-Chen Su
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Alyson L. Dickson
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Josh F. Peterson
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- QiPing Feng
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Dan M. Roden
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- C. Michael Stein
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- V. Eric Kerchberger
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Bradley A. Malin
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
  - Department of Computer Science, Vanderbilt University, Nashville, TN
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
  - Department of Computer Science, Vanderbilt University, Nashville, TN
14
Gao J, Bonzel CL, Hong C, Varghese P, Zakir K, Gronsbell J. Semi-supervised ROC analysis for reliable and streamlined evaluation of phenotyping algorithms. J Am Med Inform Assoc 2024; 31:640-650. [PMID: 38128118 PMCID: PMC10873838 DOI: 10.1093/jamia/ocad226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 09/22/2023] [Accepted: 11/20/2023] [Indexed: 12/23/2023] Open
Abstract
OBJECTIVE: High-throughput phenotyping will accelerate the use of electronic health records (EHRs) for translational research. A critical roadblock is the extensive medical supervision required for phenotyping algorithm (PA) estimation and evaluation. To address this challenge, numerous weakly-supervised learning methods have been proposed. However, there is a paucity of methods for reliably evaluating the predictive performance of PAs when a very small proportion of the data is labeled. To fill this gap, we introduce a semi-supervised approach (ssROC) for estimation of the receiver operating characteristic (ROC) parameters of PAs (eg, sensitivity, specificity).
MATERIALS AND METHODS: ssROC uses a small labeled dataset to nonparametrically impute missing labels. The imputations are then used for ROC parameter estimation to yield more precise estimates of PA performance relative to classical supervised ROC analysis (supROC) using only labeled data. We evaluated ssROC with synthetic, semi-synthetic, and EHR data from Mass General Brigham (MGB).
RESULTS: ssROC produced ROC parameter estimates with minimal bias and significantly lower variance than supROC in the simulated and semi-synthetic data. For the 5 PAs from MGB, the estimates from ssROC are 30% to 60% less variable than supROC on average.
DISCUSSION: ssROC enables precise evaluation of PA performance without demanding large volumes of labeled data. ssROC is also easily implementable in open-source R software.
CONCLUSION: When used in conjunction with weakly-supervised PAs, ssROC facilitates the reliable and streamlined phenotyping necessary for EHR-based research.
Affiliation(s)
- Jianhui Gao
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Clara-Lea Bonzel
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Chuan Hong
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
- Paul Varghese
  - Health Informatics, Verily Life Sciences, Cambridge, MA, United States
- Karim Zakir
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Jessica Gronsbell
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
  - Department of Family and Community Medicine, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
15
Wu X, Li W, Tu H. Big data and artificial intelligence in cancer research. Trends Cancer 2024; 10:147-160. [PMID: 37977902 DOI: 10.1016/j.trecan.2023.10.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/31/2023] [Revised: 10/17/2023] [Accepted: 10/20/2023] [Indexed: 11/19/2023]
Abstract
The field of oncology has witnessed an extraordinary surge in the application of big data and artificial intelligence (AI). AI development has made multiscale and multimodal data fusion and analysis possible. A new era of extracting information from complex big data is rapidly evolving. However, challenges related to efficient data curation, in-depth analysis, and utilization remain. We provide a comprehensive overview of the current state of the art in big data and computational analysis, highlighting key applications, challenges, and future opportunities in cancer research. By sketching the current landscape, we seek to foster a deeper understanding and facilitate the advancement of big data utilization in oncology, and we call for interdisciplinary collaborations, ultimately contributing to improved patient outcomes and a profound understanding of cancer.
Affiliation(s)
- Xifeng Wu
  - Department of Big Data in Health Science, School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China; National Institute for Data Science in Health and Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Wenyuan Li
  - Department of Big Data in Health Science, School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China; The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, Zhejiang, China
- Huakang Tu
  - Department of Big Data in Health Science, School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China; Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
16
Jeffery AD, Fabbri D, Reeves RM, Matheny ME. Use of Noisy Labels as Weak Learners to Identify Incompletely Ascertainable Outcomes: A Feasibility Study with Opioid-Induced Respiratory Depression. medRxiv 2024:2024.01.29.24301963. [PMID: 38352435 PMCID: PMC10863026 DOI: 10.1101/2024.01.29.24301963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2024]
Abstract
Objective: Assigning outcome labels to large observational data sets in a timely and accurate manner, particularly when outcomes are rare or not directly ascertainable, remains a significant challenge within biomedical informatics. We examined whether noisy labels generated from subject matter experts' heuristics using heterogeneous data types within a data programming paradigm could provide outcome labels to a large, observational data set. We chose the clinical condition of opioid-induced respiratory depression for our use case because it is rare, has no administrative codes to easily identify the condition, and typically requires at least some unstructured text to ascertain its presence.
Materials and Methods: Using de-identified electronic health records of 52,861 post-operative encounters, we applied a data programming paradigm (implemented in the Snorkel software) for the development of a machine learning classifier for opioid-induced respiratory depression. Our approach included subject matter experts creating 14 labeling functions that served as noisy labels for developing a probabilistic Generative model. We used probabilistic labels from the Generative model as outcome labels for training a Discriminative model on the source data. We evaluated performance of the Discriminative model with a hold-out test set of 599 independently-reviewed patient records.
Results: The final Discriminative classification model achieved an accuracy of 0.977, an F1 score of 0.417, a sensitivity of 1.0, and an AUC of 0.988 in the hold-out test set with a prevalence of 0.83% (5/599).
Discussion: All of the confirmed Cases were identified by the classifier. For rare outcomes, this finding is encouraging because it reduces the number of manual reviews needed by excluding visits/patients with low probabilities.
Conclusion: Application of a data programming paradigm with expert-informed labeling functions might have utility for phenotyping clinical phenomena that are not easily ascertainable from highly-structured data.
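The idea of labeling functions as noisy voters can be sketched in plain Python. This is an illustration of the data programming concept only: the three heuristics, the field names, and the respiratory-rate threshold are hypothetical and are not the study's 14 labeling functions. The unweighted vote average below is a simple stand-in for the probabilistic Generative model, which instead learns how much to trust each function. No Snorkel API is used.

```python
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def lf_naloxone_given(enc: dict) -> int:
    """Hypothetical heuristic: reversal agent administered."""
    return POSITIVE if enc.get("naloxone_administered") else ABSTAIN


def lf_low_resp_rate(enc: dict) -> int:
    """Hypothetical heuristic on the lowest recorded respiratory rate."""
    rate = enc.get("min_resp_rate")
    if rate is None:
        return ABSTAIN
    return POSITIVE if rate < 8 else NEGATIVE  # threshold is illustrative


def lf_note_mentions(enc: dict) -> int:
    """Hypothetical heuristic on unstructured note text."""
    text = (enc.get("note_text") or "").lower()
    return POSITIVE if "respiratory depression" in text else ABSTAIN


LABELING_FUNCTIONS = [lf_naloxone_given, lf_low_resp_rate, lf_note_mentions]


def noisy_probability(enc: dict):
    """Fraction of non-abstaining functions voting POSITIVE, or None."""
    votes = [lf(enc) for lf in LABELING_FUNCTIONS]
    cast = [v for v in votes if v != ABSTAIN]
    return sum(cast) / len(cast) if cast else None


encounter = {"naloxone_administered": True, "min_resp_rate": 14}
print(noisy_probability(encounter))  # one positive vote, one negative vote
```

Each function may abstain when its evidence is missing, which is what lets heterogeneous data types (medications, vitals, free text) contribute to one probabilistic label.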
Affiliation(s)
- Alvin D Jeffery
  - School of Nursing, Vanderbilt University, Department of Biomedical Informatics, Vanderbilt University Medical Center, Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
- Daniel Fabbri
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Ruth M Reeves
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
- Michael E Matheny
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
17
Li Y, Wang DD, Nguyen XMT, Song RJ, Ho YL, Hu FB, Willett WC, Wilson PWF, Cho K, Gaziano JM, Djousse L. Plant-based diets and the incidence of cardiovascular disease: the Million Veteran Program. BMJ Nutr Prev Health 2023; 6:212-220. [PMID: 38264362 PMCID: PMC10800254 DOI: 10.1136/bmjnph-2021-000401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Accepted: 09/25/2023] [Indexed: 01/25/2024] Open
Abstract
Background: A healthful plant-based diet was associated with lower risks of coronary heart disease and type 2 diabetes, and a favourable profile of adiposity-associated biomarkers, while an unhealthful plant-based diet was associated with elevated risk of cardiometabolic disease in health professional populations. However, little is known about the associations between plant-based dietary patterns and risk of cardiovascular disease (CVD) in US veterans.
Methods: The study population consisted of 148 506 participants who were free of diabetes, CVD and cancer at baseline in the Veterans Affairs (VA) Million Veteran Program. Diet was assessed using a Food Frequency Questionnaire at baseline. We calculated an overall Plant-Based Diet Index (PDI), a healthful PDI (hPDI) and an unhealthful PDI (uPDI). The CVD endpoints included non-fatal myocardial infarction (MI) and acute ischaemic stroke (AIS) identified through a high-throughput phenotyping algorithm approach and fatal CVD events identified by searching the National Death Index.
Results: With up to 8 years of follow-up, we documented 5025 CVD cases. After adjustment for confounding factors, a higher PDI was significantly associated with a lower risk of CVD (HR comparing extreme quintiles=0.75, 95% CI 0.68 to 0.82, P trend<0.0001). We observed an inverse association between hPDI and the risk of CVD (HR comparing extreme quintiles=0.71, 95% CI 0.64 to 0.78, P trend<0.001), whereas uPDI was positively associated with the risk of CVD (HR comparing extreme quintiles=1.12, 95% CI 1.02 to 1.24, P trend<0.001). We found similar associations of hPDI with subtypes of CVD; a 10-unit increment in hPDI was associated with HRs (95% CI) of 0.81 (0.75 to 0.87) for fatal CVD, 0.86 (0.79 to 0.94) for non-fatal MI and 0.86 (0.78 to 0.95) for non-fatal AIS.
Conclusions: A plant-based dietary pattern enriched with healthier plant foods was associated with a substantially lower CVD risk in US veterans.
Affiliation(s)
- Yanping Li
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
  - Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Dong D Wang
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
  - The Channing Division for Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Xuan-Mai T Nguyen
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
  - Carle Illinois College of Medicine, University of Illinois Urbana Champaign, Champaign, Illinois, USA
- Rebecca J Song
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
  - Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts, USA
- Yuk-Lam Ho
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
- Frank B Hu
  - Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
  - The Channing Division for Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Walter C Willett
  - Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
  - The Channing Division for Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Peter W F Wilson
  - Epidemiology and Genomic Medicine, Atlanta VA Medical Center, Atlanta, Georgia, USA
  - Division of Cardiology, Emory Clinical Cardiovascular Research Institute, Atlanta, Georgia, USA
- Kelly Cho
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
  - Division of Aging, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- John Michael Gaziano
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
  - Division of Aging, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Luc Djousse
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
  - Division of Aging, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
18
Chen F, Ahimaz P, Wang K, Chung WK, Ta C, Weng C, Liu C. Phenotype-Driven Molecular Genetic Test Recommendation for Diagnosing Pediatric Rare Disorders. Research Square 2023:rs.3.rs-3593490. [PMID: 38045411 PMCID: PMC10690317 DOI: 10.21203/rs.3.rs-3593490/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Rare disease patients often endure prolonged diagnostic odysseys and may still remain undiagnosed for years. Selecting the appropriate genetic tests is crucial for timely diagnosis. Phenotypic features offer great potential for aiding genomic diagnosis in rare disease cases. We see great promise in effective integration of phenotypic information into the genetic test selection workflow. In this study, we present a phenotype-driven molecular genetic test recommendation (Phen2Test) for pediatric rare disease diagnosis. Phen2Test was constructed using a frequency matrix of phecodes and demographic data from the EHR before ordering genetic tests, with the objective of streamlining the selection of molecular genetic tests (whole-exome / whole-genome sequencing, or gene panels) for clinicians with minimal genetic training. We developed and evaluated binary classifiers based on 1,005 individuals referred to genetic counselors for potential genetic evaluation. In the evaluation using the gold standard cohort, the model achieved strong performance with an AUROC of 0.82 and an AUPRC of 0.92. Furthermore, we tested the model on a separate silver standard cohort (n=6,458), achieving an overall AUROC of 0.72 and an AUPRC of 0.671. Phen2Test was adjusted to align with current clinical guidelines, showing superior performance with more recent data, demonstrating its potential for use within a learning healthcare system as a genomic medicine intervention that adapts to guideline updates. This study showcases the practical utility of phenotypic features in recommending molecular genetic tests with performance comparable to clinical geneticists. Phen2Test could assist clinicians with limited genetic training and knowledge to order appropriate genetic tests.
Affiliation(s)
- Fangyi Chen
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Priyanka Ahimaz
- Department of Pediatrics, Columbia University, New York, NY, USA
- Institute of Genomic Medicine, Columbia University, New York, NY, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
| | - Wendy K. Chung
- Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
| | - Casey Ta
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
|
19
|
He S, Park S, Kuklina E, Therrien NL, Lundeen EA, Wall HK, Lampley K, Kompaniyets L, Pierce SL, Sperling L, Jackson SL. Leveraging Electronic Health Records to Construct a Phenotype for Hypertension Surveillance in the United States. Am J Hypertens 2023; 36:677-685. [PMID: 37696605 PMCID: PMC10898654 DOI: 10.1093/ajh/hpad081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 05/10/2023] [Accepted: 09/06/2023] [Indexed: 09/13/2023] Open
Abstract
BACKGROUND Hypertension is an important risk factor for cardiovascular diseases. Electronic health records (EHRs) may augment chronic disease surveillance. We aimed to develop an electronic phenotype (e-phenotype) for hypertension surveillance. METHODS We included 11,031,368 eligible adults from the 2019 IQVIA Ambulatory Electronic Medical Records-US (AEMR-US) dataset. We identified hypertension using three criteria, alone or in combination: diagnosis codes, blood pressure (BP) measurements, and antihypertensive medications. We compared AEMR-US estimates of hypertension prevalence and control against those from the National Health and Nutrition Examination Survey (NHANES) 2017-18, which defined hypertension as BP ≥130/80 mm Hg or ≥1 antihypertensive medication. RESULTS The study population had a mean (SD) age of 52.3 (6.7) years, and 56.7% were women. The selected three-criteria e-phenotype (≥1 diagnosis code, ≥2 BP measurements of ≥130/80 mm Hg, or ≥1 antihypertensive medication) yielded similar trends in hypertension prevalence as NHANES: 42.2% (AEMR-US) vs. 44.9% (NHANES) overall, 39.0% vs. 38.7% among women, and 46.5% vs. 50.9% among men. The pattern of age-related increase in hypertension prevalence was similar between AEMR-US and NHANES. The prevalence of hypertension control in AEMR-US was 31.5% using the three-criteria e-phenotype, which was higher than NHANES (14.5%). CONCLUSIONS Using an EHR dataset of 11 million adults, we constructed a hypertension e-phenotype using three criteria, which can be used for surveillance of hypertension prevalence and control.
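The three-criteria e-phenotype described in this abstract can be illustrated with a minimal Python sketch. The record layout (`PatientRecord` and its field names) is hypothetical and not taken from the study, and reading "≥130/80 mm Hg" as systolic ≥130 or diastolic ≥80 is an assumption; the abstract does not spell out how the two components are combined.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PatientRecord:
    # Hypothetical record layout, for illustration only.
    diagnosis_codes: List[str] = field(default_factory=list)          # hypertension diagnosis codes
    bp_readings: List[Tuple[int, int]] = field(default_factory=list)  # (systolic, diastolic) in mm Hg
    antihypertensive_meds: List[str] = field(default_factory=list)


def is_elevated(systolic: int, diastolic: int) -> bool:
    # Assumption: ">=130/80" means systolic >= 130 OR diastolic >= 80.
    return systolic >= 130 or diastolic >= 80


def meets_e_phenotype(p: PatientRecord) -> bool:
    # Any one of the three criteria is sufficient.
    has_dx = len(p.diagnosis_codes) >= 1
    has_bp = sum(1 for s, d in p.bp_readings if is_elevated(s, d)) >= 2
    has_med = len(p.antihypertensive_meds) >= 1
    return has_dx or has_bp or has_med


def prevalence(patients: List[PatientRecord]) -> float:
    # Share of patients flagged by the e-phenotype.
    if not patients:
        raise ValueError("prevalence is undefined for an empty cohort")
    return sum(meets_e_phenotype(p) for p in patients) / len(patients)
```

Under these assumptions, a patient with a single elevated reading and no diagnosis code or medication would not be counted, while two elevated readings alone would suffice.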
Affiliation(s)
- Siran He
- Division for Heart Disease and Stroke Prevention, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | - Soyoun Park
- Division for Heart Disease and Stroke Prevention, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | - Elena Kuklina
- Division for Heart Disease and Stroke Prevention, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | - Nicole L Therrien
- Division for Heart Disease and Stroke Prevention, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | - Elizabeth A Lundeen
- Division of Diabetes Translation, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | - Hilary K Wall
- Division for Heart Disease and Stroke Prevention, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | - Katrice Lampley
- Division for Heart Disease and Stroke Prevention, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
- ASRT, INC, Smyrna, GA, USA
| | - Lyudmyla Kompaniyets
- Division of Nutrition, Physical Activity, and Obesity, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | - Samantha L Pierce
- Division of Nutrition, Physical Activity, and Obesity, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | - Laurence Sperling
- Division for Heart Disease and Stroke Prevention, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | - Sandra L Jackson
- Division for Heart Disease and Stroke Prevention, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
|
20
|
Lee HJ, Schwamm LH, Sansing L, Kamel H, de Havenon A, Turner AC, Sheth KN, Krishnaswamy S, Brandt C, Zhao H, Krumholz H, Sharma R. StrokeClassifier: Ischemic Stroke Etiology Classification by Ensemble Consensus Modeling Using Electronic Health Records. RESEARCH SQUARE 2023:rs.3.rs-3367169. [PMID: 37961532 PMCID: PMC10635373 DOI: 10.21203/rs.3.rs-3367169/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Determining the etiology of an acute ischemic stroke (AIS) is fundamental to secondary stroke prevention efforts but can be diagnostically challenging. We trained and validated an automated classification machine intelligence tool, StrokeClassifier, using electronic health record (EHR) text data from 2,039 non-cryptogenic AIS patients at 2 academic hospitals to predict the 4-level outcome of stroke etiology determined by agreement of at least 2 board-certified vascular neurologists' review of the stroke hospitalization EHR. StrokeClassifier is an ensemble consensus meta-model of 9 machine learning classifiers applied to features extracted from discharge summary texts by natural language processing. StrokeClassifier was externally validated in 406 discharge summaries from the MIMIC-III dataset reviewed by a vascular neurologist to ascertain stroke etiology. Compared with stroke etiologies adjudicated by vascular neurologists, StrokeClassifier achieved the mean cross-validated accuracy of 0.74 (±0.01) and weighted F1 of 0.74 (±0.01). In the MIMIC-III cohort, the accuracy and weighted F1 of StrokeClassifier were 0.70 and 0.71, respectively. SHapley Additive exPlanation analysis elucidated that the top 5 features contributing to stroke etiology prediction were atrial fibrillation, age, middle cerebral artery occlusion, internal carotid artery occlusion, and frontal stroke location. We then designed a certainty heuristic to deem a StrokeClassifier diagnosis as confidently non-cryptogenic by the degree of consensus among the 9 classifiers, and applied it to 788 cryptogenic patients. This reduced the percentage of the cryptogenic strokes from 25.2% to 7.2% of all ischemic strokes. StrokeClassifier is a validated artificial intelligence tool that rivals the performance of vascular neurologists in classifying ischemic stroke etiology for individual patients. 
With further training, StrokeClassifier may have downstream applications including its use as a clinical decision support system.
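The certainty heuristic in this abstract rests on the degree of consensus among the 9 classifiers. The sketch below shows one plausible way such a rule could work; it is not the authors' implementation. The agreement threshold `min_agreement=7` and the etiology labels used in the usage note are illustrative assumptions, since the abstract states neither.

```python
from collections import Counter
from typing import List, Tuple


def consensus(votes: List[str], min_agreement: int = 7) -> Tuple[str, int, bool]:
    """Return (majority label, number of agreeing classifiers, confident?).

    min_agreement is a hypothetical cut-off; the abstract does not report one.
    """
    if not votes:
        raise ValueError("at least one classifier vote is required")
    label, n_agree = Counter(votes).most_common(1)[0]
    return label, n_agree, n_agree >= min_agreement
```

For example, if 8 of 9 classifiers vote for a placeholder label such as "cardioembolic", the call returns that label with `confident=True`; a 4/3/2 split returns the plurality label with `confident=False`, and the case would stay cryptogenic under this sketch.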
Affiliation(s)
- Ho-Joon Lee
- Department of Genetics and Yale Center for Genome Analysis, Yale School of Medicine, New Haven, CT
| | - Lee H. Schwamm
- Department of Neurology and Comprehensive Stroke Center, Massachusetts General Hospital and Harvard Medical School Boston, MA
- Department of Neurology, Yale School of Medicine, New Haven, CT
| | - Lauren Sansing
- Department of Neurology, Yale School of Medicine, New Haven, CT
| | - Hooman Kamel
- Department of Neurology, Weill Cornell Medicine, New York City, NY
| | - Adam de Havenon
- Department of Neurology, Yale School of Medicine, New Haven, CT
| | - Ashby C. Turner
- Department of Neurology and Comprehensive Stroke Center, Massachusetts General Hospital and Harvard Medical School Boston, MA
| | - Kevin N. Sheth
- Department of Neurology, Yale School of Medicine, New Haven, CT
| | - Smita Krishnaswamy
- Departments of Genetics and Computer Science, Yale School of Medicine, New Haven, CT
| | - Cynthia Brandt
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT
| | - Hongyu Zhao
- Departments of Biostatistics, Yale School of Public Health, New Haven, CT
| | - Harlan Krumholz
- Department of Internal Medicine, Yale School of Medicine, New Haven, CT
| | - Richa Sharma
- Department of Neurology, Yale School of Medicine, New Haven, CT
|
21
|
Srinivasan S, Wu P, Mercader JM, Udler MS, Porneala BC, Bartz TM, Floyd JS, Sitlani C, Guo X, Haessler J, Kooperberg C, Liu J, Ahmad S, van Duijn C, Liu CT, Goodarzi MO, Florez JC, Meigs JB, Rotter JI, Rich SS, Dupuis J, Leong A. A Type 1 Diabetes Polygenic Score Is Not Associated With Prevalent Type 2 Diabetes in Large Population Studies. J Endocr Soc 2023; 7:bvad123. [PMID: 37841955 PMCID: PMC10576255 DOI: 10.1210/jendso/bvad123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Indexed: 10/17/2023] Open
Abstract
Context Both type 1 diabetes (T1D) and type 2 diabetes (T2D) have significant genetic contributions to risk, and understanding their overlap can offer clinical insight. Objective We examined whether a T1D polygenic score (PS) was associated with a diagnosis of T2D in the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium. Methods We constructed a T1D PS using 79 known single nucleotide polymorphisms associated with T1D risk. We analyzed 13 792 T2D cases and 14 169 controls from CHARGE cohorts to determine the association between the T1D PS and T2D prevalence. We validated findings in an independent sample of 2256 T2D cases and 27 052 controls from the Mass General Brigham Biobank (MGB Biobank). As secondary analyses in 5228 T2D cases from CHARGE, we used multivariable regression models to assess the association of the T1D PS with clinical outcomes associated with T1D. Results The T1D PS was not associated with T2D in either CHARGE (P = .15) or the MGB Biobank (P = .87). The partitioned human leukocyte antigen-only PS was associated with T2D in CHARGE (OR 1.02 per 1 SD increase in PS, 95% CI 1.01-1.03, P = .006) but not in the MGB Biobank. The T1D PS was weakly associated with insulin use (OR 1.007, 95% CI 1.001-1.012, P = .03) in CHARGE T2D cases but not with other outcomes. Conclusion In large biobank samples, a common variant PS for T1D was not consistently associated with prevalent T2D. However, possible heterogeneity in T2D cannot be ruled out, and future studies are needed to do subphenotyping.
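A polygenic score of the kind described here is commonly computed as a weighted sum of risk-allele dosages, then standardized so that effects can be reported per 1 SD. The following is a minimal sketch under that assumption; the SNP identifiers and weights are invented placeholders, and treating a missing genotype as dosage 0 is a simplification, not the study's procedure.

```python
from statistics import mean, pstdev
from typing import Dict, List


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    # Weighted sum over the SNPs in the score; dosage is the risk-allele count (0, 1, or 2).
    # Assumption: a SNP missing from `dosages` contributes 0.
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())


def standardize(scores: List[float]) -> List[float]:
    # Convert raw scores to z-scores so a regression coefficient reads "per 1 SD increase".
    mu, sd = mean(scores), pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sd for s in scores]
```

With hypothetical weights `{"snp_a": 0.5, "snp_b": 0.2}`, a person carrying two risk alleles at `snp_a` and one at `snp_b` would score 1.2 before standardization.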
Affiliation(s)
- Shylaja Srinivasan
- Division of Pediatric Endocrinology, University of California at San Francisco, San Francisco, CA 94158, USA
| | - Peitao Wu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02215, USA
| | - Josep M Mercader
- Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of Harvard & Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Center for Genomic Medicine and Diabetes Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Miriam S Udler
- Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of Harvard & Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Center for Genomic Medicine and Diabetes Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Bianca C Porneala
- Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Traci M Bartz
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA 98195, USA
| | - James S Floyd
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA 98195, USA
- Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Department of Epidemiology, University of Washington, Seattle, WA 98195, USA
| | - Colleen Sitlani
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA 98195, USA
- Department of Medicine, University of Washington, Seattle, WA 98195, USA
| | - Xiquing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
| | - Jeffrey Haessler
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
| | - Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
| | - Jun Liu
- Department of Epidemiology, Erasmus Medical Center, 3015 GD Rotterdam, The Netherlands
- Nuffield Department of Population Health, University of Oxford, Oxford OX1 2JD, UK
| | - Shahzad Ahmad
- Department of Epidemiology, Erasmus Medical Center, 3015 GD Rotterdam, The Netherlands
| | - Cornelia van Duijn
- Department of Epidemiology, Erasmus Medical Center, 3015 GD Rotterdam, The Netherlands
- Nuffield Department of Population Health, University of Oxford, Oxford OX1 2JD, UK
| | - Ching-Ti Liu
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of Harvard & Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Mark O Goodarzi
- Division of Endocrinology, Diabetes and Metabolism, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
| | - Jose C Florez
- Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of Harvard & Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Center for Genomic Medicine and Diabetes Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - James B Meigs
- Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of Harvard & Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
| | - Stephen S Rich
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22903, USA
| | - Josée Dupuis
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02215, USA
| | - Aaron Leong
- Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Center for Genomic Medicine and Diabetes Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
|
22
|
Hong C, Liang L, Yuan Q, Cho K, Liao KP, Pencina MJ, Christiani DC, Cai T. Semi-supervised calibration of noisy event risk (SCANER) with electronic health records. J Biomed Inform 2023; 144:104425. [PMID: 37331495 PMCID: PMC10478159 DOI: 10.1016/j.jbi.2023.104425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2022] [Revised: 05/05/2023] [Accepted: 05/19/2023] [Indexed: 06/20/2023]
Abstract
OBJECTIVE Electronic health records (EHR), containing detailed longitudinal clinical information on a large number of patients and covering broad patient populations, open opportunities for comprehensive predictive modeling of disease progression and treatment response. However, since EHRs were originally constructed for administrative purposes not for research, in the EHR-linked studies, it is often not feasible to capture reliable information for analytical variables, especially in the survival setting, when both accurate event status and event times are needed for model building. For example, progression-free survival (PFS), a commonly used survival outcome for cancer patients, often involves complex information embedded in free-text clinical notes and cannot be extracted reliably. Proxies of PFS time such as time to the first mention of progression in the notes are at best good approximations to the true event time. This leads to difficulty in efficiently estimating event rates for an EHR patient cohort. Estimating survival rates based on error-prone outcome definitions can lead to biased results and hamper the power in the downstream analysis. On the other hand, extracting accurate event time information via manual annotation is time and resource intensive. The objective of this study is to develop a calibrated survival rate estimator using noisy outcomes from EHR data. MATERIALS AND METHODS In this paper, we propose a two-stage semi-supervised calibration of noisy event rate (SCANER) estimator that can effectively overcome censoring induced dependency and attains more robust performance (i.e., not sensitive to misspecification of the imputation model) by fully utilizing both a small-labeled set of gold-standard survival outcomes annotated via manual chart review and a set of proxy features automatically captured via EHR in the unlabeled set. 
We validate the SCANER estimator by estimating the PFS rates for a virtual cohort of lung cancer patients from one large tertiary care center and the ICU-free survival rates for COVID patients from two large tertiary care centers. RESULTS In terms of survival rate estimates, the SCANER had very similar point estimates compared to the complete-case Kaplan Meier estimator. On the other hand, other benchmark methods for comparison, which fail to account for the induced dependency between event time and the censoring time conditioning on surrogate outcomes, produced biased results across all three case studies. In terms of standard errors, the SCANER estimator was more efficient than the KM estimator, with up to 50% efficiency gain. CONCLUSION The SCANER estimator achieves more efficient, robust, and accurate survival rate estimates compared to existing approaches. This promising new approach can also improve the resolution (i.e., granularity of event time) by using labels conditioning on multiple surrogates, particularly among less common or poorly coded conditions.
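The benchmark that this abstract compares against, the Kaplan-Meier (KM) estimator, can be sketched in a few lines of Python. This is the standard product-limit estimator applied to labeled data, not the SCANER estimator itself, whose two-stage calibration is not specified in enough detail here to reproduce. The function name and input layout are assumptions made for illustration.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    Returns (time, survival probability) at each distinct observed event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after time t.
        n_at_risk -= n_removed
    return curve
```

Censored patients lower the number at risk without producing a step in the curve, which is why a complete-case KM estimate based only on a small labeled set tends to have wide standard errors.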
Affiliation(s)
- Chuan Hong
- Duke University, Durham, NC, USA; Harvard Medical School, Boston, MA, USA
| | - Liang Liang
- Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Qianyu Yuan
- Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Kelly Cho
- Harvard Medical School, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA; Brigham and Women's Hospital, Boston, MA, USA
| | - Katherine P Liao
- Harvard Medical School, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA; Brigham and Women's Hospital, Boston, MA, USA
| | | | - David C Christiani
- Harvard T.H. Chan School of Public Health, Boston, MA, USA; Massachusetts General Hospital, Boston, MA, USA
| | - Tianxi Cai
- Harvard T.H. Chan School of Public Health, Boston, MA, USA; Massachusetts General Hospital, Boston, MA, USA.
|
23
|
Yin Y. Prediction and analysis of time series data based on granular computing. Front Comput Neurosci 2023; 17:1192876. [PMID: 37576071 PMCID: PMC10413556 DOI: 10.3389/fncom.2023.1192876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 07/06/2023] [Indexed: 08/15/2023] Open
Abstract
The advent of the Big Data era and the rapid development of the Internet of Things have led to a dramatic increase in the amount of data from various time series. Classifying these large-sample time series data, mining correlation rules from them, and predicting them are crucial tasks. However, due to the high dimensionality, large data volume and transmission lag of sensor data, large-sample time series data are affected by multiple factors and have complex characteristics such as multi-scale structure, non-linearity and burstiness. Traditional time series prediction methods are no longer applicable to the study of large-sample time series data. Granular computing has unique advantages in dealing with continuous and complex data, and can compensate for the limitations of traditional support vector machines in dealing with large-sample data. Therefore, this paper proposes to combine granular computing theory with support vector machines to achieve large-sample time series data prediction. Firstly, the definition of time series is analyzed, and the basic principles of traditional time series forecasting methods and granular computing are investigated. Secondly, to predict the trend of data changes, the fuzzy granulation algorithm is first applied to convert the sample data into coarser granules. The granulated data are then fed to a support vector machine to predict the range of change of continuous time series data over a period of time. The results of the simulation experiments show that the proposed model is able to make accurate predictions of the range of data changes in future time periods. Compared with other prediction models, the proposed model reduces the complexity of the samples and improves the prediction accuracy.
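The granulation step in this abstract, converting raw samples into coarser granules before prediction, can be illustrated with a simplified sketch. Summarizing each window as a (low, typical, high) triple built from the minimum, median, and maximum is an assumption chosen for illustration; the paper's actual fuzzy granulation algorithm and membership functions are not given in the abstract, and the support vector machine stage is omitted.

```python
from statistics import median
from typing import List, Sequence, Tuple


def granulate(series: Sequence[float], window: int) -> List[Tuple[float, float, float]]:
    """Summarize each non-overlapping window as a (low, typical, high) granule.

    Simplification: low = min, typical = median, high = max of the window.
    A trailing partial window is dropped.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    granules = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        granules.append((min(chunk), median(chunk), max(chunk)))
    return granules
```

Each of the three resulting sequences is much shorter than the raw series, so a regressor trained on them would predict the lower bound, typical level, and upper bound of the next window rather than every individual point.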
Affiliation(s)
- Yushan Yin
- School of Electro-Mechanical Engineering, Xidian University, Xi’an, China
|
24
|
Berloco F, Ciavarella S, Colucci S, Grieco LA, Guarini A, Zaccaria GM. ARGO 2.0: a Hybrid NLP/ML Framework for Diagnosis Standardization. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2023; 2023:1-4. [PMID: 38083100 DOI: 10.1109/embc40787.2023.10340022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
A relevant problem in medicine is the standardization of the diagnosis associated with a clinical case. Although diagnosis formulation is an intrinsically subjective and uncertain process, its standardization may benefit from digital solutions that automate the routines underlying such a decision. In this work, we propose ARGO 2.0: a framework for the development of decision support systems for diagnosis formulation. The framework can read free-text reports and store their clinically relevant information as personalized electronic Case Report Forms. A hybrid strategy, exploiting the synergy of Natural Language Processing and Machine Learning techniques, is used to automatically suggest a diagnosis in a standardized fashion. ARGO 2.0 has been designed to be template-independent and easily tailored to specific medical fields. We here demonstrate its feasibility in hemo lympho-pathology by detailing its implementation, which is the object of an ongoing validation campaign in a standing medical institute. ARGO 2.0 achieved an average accuracy of 95.07%, an average precision of 94.85%, an average recall of 96.31% and an F-score of 95.32% on the test set, outperforming both of its embedded components, which are based on Natural Language Processing and Machine Learning.
|
25
|
Penrod N, Okeh C, Velez Edwards DR, Barnhart K, Senapati S, Verma SS. Leveraging electronic health record data for endometriosis research. Front Digit Health 2023; 5:1150687. [PMID: 37342866 PMCID: PMC10278662 DOI: 10.3389/fdgth.2023.1150687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Accepted: 05/10/2023] [Indexed: 06/23/2023] Open
Abstract
Endometriosis is a chronic, complex disease for which there are vast disparities in diagnosis and treatment between sociodemographic groups. Clinical presentation of endometriosis can vary from asymptomatic disease, often identified during (in)fertility consultations, to dysmenorrhea and debilitating pelvic pain. Because of this complexity, delayed diagnosis (mean time to diagnosis is 1.7-3.6 years) and misdiagnosis are common. Early and accurate diagnosis of endometriosis remains a research priority for patient advocates and healthcare providers. Electronic health records (EHRs) have been widely adopted as a data source in biomedical research. However, they remain a largely untapped source of data for endometriosis research. EHRs capture diverse, real-world patient populations and care trajectories and can be used to learn patterns of underlying risk factors for endometriosis. These patterns, in turn, can inform screening guidelines that help clinicians efficiently and effectively recognize and diagnose the disease in all patient populations, reducing inequities in care. Here, we provide an overview of the advantages and limitations of using EHR data to study endometriosis. We describe the prevalence of endometriosis observed in diverse populations from multiple healthcare institutions, examples of variables that can be extracted from EHRs to enhance the accuracy of endometriosis prediction, and opportunities to leverage longitudinal EHR data to improve our understanding of long-term health consequences for all patients.
Affiliation(s)
- Nadia Penrod
- College of Agriculture and Life Sciences, Texas A&M University, College Station, TX, United States
| | - Chelsea Okeh
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, Philadelphia, PA, United States
| | - Digna R. Velez Edwards
- Department of Obstetrics and Gynecology, Vanderbilt University, Nashville, TN, United States
| | - Kurt Barnhart
- Department of Obstetrics and Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
| | - Suneeta Senapati
- Department of Obstetrics and Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
| | - Shefali S. Verma
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, Philadelphia, PA, United States
|
26
|
Vassy JL, Posner DC, Ho YL, Gagnon DR, Galloway A, Tanukonda V, Houghton SC, Madduri RK, McMahon BH, Tsao PS, Damrauer SM, O’Donnell CJ, Assimes TL, Casas JP, Gaziano JM, Pencina MJ, Sun YV, Cho K, Wilson PW. Cardiovascular Disease Risk Assessment Using Traditional Risk Factors and Polygenic Risk Scores in the Million Veteran Program. JAMA Cardiol 2023; 8:564-574. [PMID: 37133828 PMCID: PMC10157509 DOI: 10.1001/jamacardio.2023.0857] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 09/02/2022] [Accepted: 03/09/2023] [Indexed: 05/04/2023]
Abstract
Importance Primary prevention of atherosclerotic cardiovascular disease (ASCVD) relies on risk stratification. Genome-wide polygenic risk scores (PRSs) are proposed to improve ASCVD risk estimation. Objective To determine whether genome-wide PRSs for coronary artery disease (CAD) and acute ischemic stroke improve ASCVD risk estimation with traditional clinical risk factors in an ancestrally diverse midlife population. Design, Setting, and Participants This was a prognostic analysis of incident events in a retrospectively defined longitudinal cohort conducted from January 1, 2011, to December 31, 2018. Included in the study were adults free of ASCVD and statin naive at baseline from the Million Veteran Program (MVP), a mega biobank with genetic, survey, and electronic health record data from a large US health care system. Data were analyzed from March 15, 2021, to January 5, 2023. Exposures PRSs for CAD and ischemic stroke derived from cohorts of largely European descent and risk factors, including age, sex, systolic blood pressure, total cholesterol, high-density lipoprotein (HDL) cholesterol, smoking, and diabetes status. Main Outcomes and Measures Incident nonfatal myocardial infarction (MI), ischemic stroke, ASCVD death, and composite ASCVD events. Results A total of 79 151 participants (mean [SD] age, 57.8 [13.7] years; 68 503 male [86.5%]) were included in the study. The cohort included participants from the following harmonized genetic ancestry and race and ethnicity categories: 18 505 non-Hispanic Black (23.4%), 6785 Hispanic (8.6%), and 53 861 non-Hispanic White (68.0%) with a median (5th-95th percentile) follow-up of 4.3 (0.7-6.9) years. From 2011 to 2018, 3186 MIs (4.0%), 1933 ischemic strokes (2.4%), 867 ASCVD deaths (1.1%), and 5485 composite ASCVD events (6.9%) were observed. 
CAD PRS was associated with incident MI in non-Hispanic Black (hazard ratio [HR], 1.10; 95% CI, 1.02-1.19), Hispanic (HR, 1.26; 95% CI, 1.09-1.46), and non-Hispanic White (HR, 1.23; 95% CI, 1.18-1.29) participants. Stroke PRS was associated with incident stroke in non-Hispanic White participants (HR, 1.15; 95% CI, 1.08-1.21). A combined CAD plus stroke PRS was associated with ASCVD deaths among non-Hispanic Black (HR, 1.19; 95% CI, 1.03-1.17) and non-Hispanic (HR, 1.11; 95% CI, 1.03-1.21) participants. The combined PRS was also associated with composite ASCVD across all ancestry groups, but the association was greater among non-Hispanic White (HR, 1.20; 95% CI, 1.16-1.24) than non-Hispanic Black (HR, 1.11; 95% CI, 1.05-1.17) and Hispanic (HR, 1.12; 95% CI, 1.00-1.25) participants. Net reclassification improvement from adding PRS to a traditional risk model was modest for the intermediate risk group for composite CVD among men (5-year risk >3.75%, 0.38%; 95% CI, 0.07%-0.68%), among women (6.79%; 95% CI, 3.01%-10.58%), for age older than 55 years (0.25%; 95% CI, 0.03%-0.47%), and for ages 40 to 55 years (1.61%; 95% CI, -0.07% to 3.30%). Conclusions and Relevance Study results suggest that PRSs derived predominantly in European samples were statistically significantly associated with ASCVD in the multiancestry midlife and older-age MVP cohort. Overall, modest improvements in discrimination metrics were observed with the addition of PRSs to traditional risk factors, with greater magnitude in women and younger age groups.
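The net reclassification improvement (NRI) reported in this abstract can be made concrete with a short sketch of the standard categorical NRI: the net share of people with events who move to a higher risk category plus the net share of people without events who move to a lower one. The function name and the integer coding of risk categories are assumptions for illustration; the study's exact category definitions and confidence-interval method are not reproduced here.

```python
from typing import Sequence


def net_reclassification_improvement(
    old_category: Sequence[int],
    new_category: Sequence[int],
    event: Sequence[int],
) -> float:
    """Categorical NRI when moving from an old to a new risk model.

    Categories are ordered integers (higher = higher predicted risk);
    event is 1 if the outcome occurred, else 0.
    """
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for old, new, had_event in zip(old_category, new_category, event):
        if had_event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one non-event")
    # Upward moves are correct for events; downward moves are correct for non-events.
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n
```

A value near zero, as for the men and older participants in this study, means that adding the PRS moved roughly as many people into a worse-fitting category as into a better-fitting one.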
Affiliation(s)
- Jason L. Vassy
- Veterans Affairs Boston Healthcare System, Boston, Massachusetts
- Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, Massachusetts
| | - Daniel C. Posner
- Veterans Affairs Boston Healthcare System, Boston, Massachusetts
| | - Yuk-Lam Ho
- Veterans Affairs Boston Healthcare System, Boston, Massachusetts
| | - David R. Gagnon
- Veterans Affairs Boston Healthcare System, Boston, Massachusetts
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts
| | - Ashley Galloway
- Veterans Affairs Boston Healthcare System, Boston, Massachusetts
| | | | | | - Ravi K. Madduri
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois
- University of Chicago Consortium for Advanced Science and Engineering, The University of Chicago, Chicago, Illinois
| | - Benjamin H. McMahon
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, New Mexico
| | - Philip S. Tsao
- Palo Alto VA Healthcare System, Palo Alto, California
- Stanford Cardiovascular Institute, Stanford University, Stanford, California
| | - Scott M. Damrauer
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, Pennsylvania
- Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia
| | | | - Themistocles L. Assimes
- Palo Alto VA Healthcare System, Palo Alto, California
- Division of Cardiovascular Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, California
- Stanford Cardiovascular Institute, Stanford University, Stanford, California
| | - Juan P. Casas
- Veterans Affairs Boston Healthcare System, Boston, Massachusetts
- Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, Massachusetts
| | - J. Michael Gaziano
- Veterans Affairs Boston Healthcare System, Boston, Massachusetts
- Division of Aging, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
| | - Michael J. Pencina
- Department of Biostatistics, Duke University Medical Center, Durham, North Carolina
| | - Yan V. Sun
- Veterans Affairs Atlanta Healthcare System, Decatur, Georgia
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia
| | - Kelly Cho
- Veterans Affairs Boston Healthcare System, Boston, Massachusetts
- Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, Massachusetts
| | - Peter W.F. Wilson
- Veterans Affairs Atlanta Healthcare System, Decatur, Georgia
- Division of Cardiology, Emory University School of Medicine, Atlanta, Georgia
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia
| |
|
27
|
Huang S, Cai T, Weber BN, He Z, Dahal KP, Hong C, Hou J, Seyok T, Cagan A, DiCarli MF, Joseph J, Kim SC, Solomon DH, Cai T, Liao KP. Association Between Inflammation, Incident Heart Failure, and Heart Failure Subtypes in Patients With Rheumatoid Arthritis. Arthritis Care Res (Hoboken) 2023; 75:1036-1045. [PMID: 34623035 PMCID: PMC8989720 DOI: 10.1002/acr.24804] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/24/2021] [Revised: 09/27/2021] [Accepted: 10/05/2021] [Indexed: 12/14/2022]
Abstract
OBJECTIVE In rheumatoid arthritis (RA), there are limited data on risk factors for the clinical heart failure (HF) subtypes of HF with reduced ejection fraction (HFrEF) and HF with preserved ejection fraction (HFpEF). This study examined the association between inflammation and incident HF subtypes in RA. Because inflammation changes over time with disease activity, we hypothesized that the effect of inflammation may be stronger at the 5-year follow-up than at the standard 10-year follow-up from general population studies of cardiovascular risk. METHODS We studied an electronic health record (EHR)-based RA cohort with data pre- and post-RA incidence. We applied a validated approach to identify HF and extract ejection fraction to classify HFrEF and HFpEF. Follow-up started from the RA incidence date (index date) to the earliest occurrence of incident HF, death, last EHR encounter, or 10 years. Baseline inflammation was assessed using erythrocyte sedimentation rate or C-reactive protein values. Covariates included demographic characteristics, established HF risk factors, and RA-related factors. We tested the association between baseline inflammation with incident HF and its subtypes using Cox proportional hazards models. RESULTS We studied 9,087 patients with RA; 8.2% developed HF during 10 years of follow-up. Elevated inflammation was associated with increased risk for HF at both 5- and 10-year follow-ups (hazard ratio [HR] 1.66, 95% confidence interval [95% CI] 1.12-2.46 and HR 1.46, 95% CI 1.13-1.90, respectively), which is also seen for HFpEF at 5 years (HR 1.72, 95% CI 1.09-2.70) and 10 years (HR 1.45, 95% CI 1.07-1.94). HFrEF was not associated with inflammation for either follow-up time. CONCLUSION Elevated inflammation early in RA diagnosis was associated with HF; this association was driven by HFpEF and not HFrEF, suggesting a window of opportunity for prevention of HFpEF in RA.
Affiliation(s)
- Sicong Huang
- Brigham and Women’s Hospital and Harvard Medical School
- Division of Rheumatology, Inflammation, and Immunity
- Section of Rheumatology
- Veterans Administration Boston Healthcare System
| | - Tianrun Cai
- Brigham and Women’s Hospital and Harvard Medical School
- Division of Rheumatology, Inflammation, and Immunity
- Veterans Administration Boston Healthcare System
| | - Brittany N. Weber
- Brigham and Women’s Hospital and Harvard Medical School
- Cardiovascular Division
| | - Zeling He
- Brigham and Women’s Hospital and Harvard Medical School
- Division of Rheumatology, Inflammation, and Immunity
| | - Kumar P. Dahal
- Brigham and Women’s Hospital and Harvard Medical School
- Division of Rheumatology, Inflammation, and Immunity
- Veterans Administration Boston Healthcare System
| | - Chuan Hong
- Veterans Administration Boston Healthcare System
- Department of Biomedical Informatics, Harvard Medical School
- Biostatistics, Harvard T.H. Chan School of Public Health
| | - Jue Hou
- Veterans Administration Boston Healthcare System
- Biostatistics, Harvard T.H. Chan School of Public Health
| | - Thany Seyok
- Brigham and Women’s Hospital and Harvard Medical School
- Division of Rheumatology, Inflammation, and Immunity
| | - Andrew Cagan
- Brigham and Women’s Hospital and Harvard Medical School
- Research Information Science and Computing, Mass General Brigham
| | - Marcelo F. DiCarli
- Brigham and Women’s Hospital and Harvard Medical School
- Cardiovascular Division
| | - Jacob Joseph
- Brigham and Women’s Hospital and Harvard Medical School
- Veterans Administration Boston Healthcare System
- Cardiovascular Division
| | - Seoyoung C. Kim
- Brigham and Women’s Hospital and Harvard Medical School
- Division of Rheumatology, Inflammation, and Immunity
- Division of Pharmacoepidemiology and Pharmacoeconomics
| | - Daniel H. Solomon
- Brigham and Women’s Hospital and Harvard Medical School
- Division of Rheumatology, Inflammation, and Immunity
| | - Tianxi Cai
- Veterans Administration Boston Healthcare System
- Department of Biomedical Informatics, Harvard Medical School
- Biostatistics, Harvard T.H. Chan School of Public Health
| | - Katherine P. Liao
- Brigham and Women’s Hospital and Harvard Medical School
- Division of Rheumatology, Inflammation, and Immunity
- Section of Rheumatology
- Veterans Administration Boston Healthcare System
- Department of Biomedical Informatics, Harvard Medical School
| |
|
28
|
Zhang HG, Honerlaw JP, Maripuri M, Samayamuthu MJ, Beaulieu-Jones BR, Baig HS, L'Yi S, Ho YL, Morris M, Panickan VA, Wang X, Weber GM, Liao KP, Visweswaran S, Tan BWQ, Yuan W, Gehlenborg N, Muralidhar S, Ramoni RB, Kohane IS, Xia Z, Cho K, Cai T, Brat GA. Potential pitfalls in the use of real-world data for studying long COVID. Nat Med 2023; 29:1040-1043. [PMID: 37055567 PMCID: PMC10205658 DOI: 10.1038/s41591-023-02274-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 04/15/2023]
Affiliation(s)
- Harrison G Zhang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
| | - Jacqueline P Honerlaw
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
| | - Monika Maripuri
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
| | | | | | - Huma S Baig
- Department of Surgery, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Sehi L'Yi
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Yuk-Lam Ho
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
| | - Michele Morris
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
| | | | - Xuan Wang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Griffin M Weber
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Katherine P Liao
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
| | - Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Bryce W Q Tan
- Department of Medicine, National University Hospital, Singapore, Singapore, Singapore
| | - William Yuan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Nils Gehlenborg
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Sumitra Muralidhar
- Office of Research and Development, US Department of Veterans Affairs, Washington DC, USA
| | - Rachel B Ramoni
- Office of Research and Development, US Department of Veterans Affairs, Washington DC, USA
| | - Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Zongqi Xia
- Department of Neurology, University of Pittsburgh, Pittsburgh, PA, USA
| | - Kelly Cho
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
| | - Tianxi Cai
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
| | - Gabriel A Brat
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
| |
|
29
|
He T, Belouali A, Patricoski J, Lehmann H, Ball R, Anagnostou V, Kreimeyer K, Botsis T. Trends and opportunities in computable clinical phenotyping: A scoping review. J Biomed Inform 2023; 140:104335. [PMID: 36933631 DOI: 10.1016/j.jbi.2023.104335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 03/07/2023] [Accepted: 03/09/2023] [Indexed: 03/18/2023]
Abstract
Identifying patient cohorts meeting the criteria of specific phenotypes is essential in biomedicine and particularly timely in precision medicine. Many research groups deliver pipelines that automatically retrieve and analyze data elements from one or more sources to automate this task and deliver high-performing computable phenotypes. We applied a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines to conduct a thorough scoping review on computable clinical phenotyping. Five databases were searched using a query that combined the concepts of automation, clinical context, and phenotyping. Subsequently, four reviewers screened 7960 records (after removing over 4000 duplicates) and selected 139 that satisfied the inclusion criteria. This dataset was analyzed to extract information on target use cases, data-related topics, phenotyping methodologies, evaluation strategies, and portability of developed solutions. Most studies supported patient cohort selection without discussing the application to specific use cases, such as precision medicine. Electronic Health Records were the primary source in 87.1 % (N = 121) of all studies, and International Classification of Diseases codes were heavily used in 55.4 % (N = 77) of all studies, however, only 25.9 % (N = 36) of the records described compliance with a common data model. In terms of the presented methods, traditional Machine Learning (ML) was the dominant method, often combined with natural language processing and other approaches, while external validation and portability of computable phenotypes were pursued in many cases. These findings revealed that defining target use cases precisely, moving away from sole ML strategies, and evaluating the proposed solutions in the real setting are essential opportunities for future work. 
There is also momentum and an emerging need for computable phenotyping to support clinical and epidemiological research and precision medicine.
Affiliation(s)
- Ting He
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
| | - Anas Belouali
- Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Jessica Patricoski
- Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Harold Lehmann
- Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Robert Ball
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US FDA, Silver Spring, MD, USA
| | - Valsamo Anagnostou
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Kory Kreimeyer
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Taxiarchis Botsis
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
| |
|
30
|
James KN, Phadke S, Wong TC, Chowdhury S. Artificial Intelligence in the Genetic Diagnosis of Rare Disease. Clin Lab Med 2023; 43:127-143. [PMID: 36764805 DOI: 10.1016/j.cll.2022.09.023] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/11/2023]
Affiliation(s)
- Kiely N James
- Genomics, Rady Children's Institute for Genomic Medicine, 7910 Frost Street, MC5129, San Diego, CA 92123, USA
| | - Sujal Phadke
- Genomics, Rady Children's Institute for Genomic Medicine, 7910 Frost Street, MC5129, San Diego, CA 92123, USA
| | - Terence C Wong
- Genomics, Rady Children's Institute for Genomic Medicine, 7910 Frost Street, MC5129, San Diego, CA 92123, USA
| | - Shimul Chowdhury
- Rady Children's Institute for Genomic Medicine, 7910 Frost Street, MC5129, San Diego, CA 92123, USA.
| |
|
31
|
Wan NC, Yaqoob AA, Ong HH, Zhao J, Wei WQ. Evaluating resources composing the PheMAP knowledge base to enhance high-throughput phenotyping. J Am Med Inform Assoc 2023; 30:456-465. [PMID: 36451277 PMCID: PMC9933070 DOI: 10.1093/jamia/ocac234] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/24/2022] [Revised: 10/28/2022] [Accepted: 11/23/2022] [Indexed: 12/02/2022] Open
Abstract
OBJECTIVE A previous study, PheMAP, combined independent, online resources to enable high-throughput phenotyping (HTP) using electronic health records (EHRs). However, online resources offer distinct quality descriptions of diseases which may affect phenotyping performance. We aimed to evaluate the phenotyping performance of single resource-based PheMAPs and investigate an optimized strategy for HTP. MATERIALS AND METHODS We compared how each resource produced top-ranked concept unique identifiers (CUIs) by term frequency-inverse document frequency with Jaccard matrices comparing single resources and the original PheMAP. We correlated top-ranked concepts from each resource to features used in established Phenotype KnowledgeBase (PheKB) algorithms for hypothyroidism, type II diabetes mellitus (T2DM), and dementias. Using resources separately, we calculated multiple phenotype risk scores for individuals from Vanderbilt University Medical Center's BioVU DNA Biobank and compared phenotyping performance against rule-based eMERGE algorithms. Lastly, we implemented an ensemble strategy which classified patient case/control status based upon PheMAP resource agreement. RESULTS Jaccard similarity matrices indicate that the similarity of CUIs comprising single resource-based PheMAPs varies. Single resource-based PheMAPs generated from MedlinePlus and MedicineNet outperformed others but only encompass 81.6% of overall disease phenotypes. We propose the PheMAP-Ensemble which provides higher average accuracy and precision than the combined average accuracy and precision of single resource-based PheMAPs. While offering complete phenotype coverage, PheMAP-Ensemble significantly increases phenotyping recall compared to the original iteration. CONCLUSIONS Resources comprising the PheMAP produce different phenotyping performance when implemented individually. 
The ensemble method significantly improves the quality of PheMAP by fully utilizing dissimilar resources to capture accurate phenotyping data from EHRs.
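The two ideas named in this abstract, Jaccard similarity between the concept sets produced by different resources and case/control classification by resource agreement, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the concept identifiers, the resource names, and the agreement threshold `min_agree` are hypothetical placeholders, and the abstract does not specify how the authors defined agreement.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two collections of concept IDs."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical by convention (an assumption).
    return len(a & b) / len(union) if union else 1.0


def jaccard_matrix(resources):
    """Pairwise Jaccard similarities for a dict {resource_name: concept_ids}."""
    names = sorted(resources)
    return {(x, y): jaccard(resources[x], resources[y]) for x in names for y in names}


def ensemble_vote(calls, min_agree):
    """Label a patient a case when at least `min_agree` resources call them a case.

    `calls` maps a resource name to a boolean case call; `min_agree` is a
    hypothetical threshold chosen here for illustration.
    """
    return sum(bool(v) for v in calls.values()) >= min_agree


# Toy, made-up concept IDs and resource names (not real CUIs or real data).
toy_resources = {
    "resource_a": {"C1", "C2", "C3"},
    "resource_b": {"C2", "C3", "C4"},
}
similarities = jaccard_matrix(toy_resources)
is_case = ensemble_vote({"resource_a": True, "resource_b": True, "resource_c": False}, min_agree=2)
```

A threshold-style vote is only one possible reading of "resource agreement"; a weighted or unanimous rule would be an equally plausible design.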
Affiliation(s)
- Nicholas C Wan
- Department of Biomedical Engineering, Vanderbilt University, Nashville, Tennessee, USA
| | - Ali A Yaqoob
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Henry H Ong
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Juan Zhao
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| |
|
32
|
Yang S, Varghese P, Stephenson E, Tu K, Gronsbell J. Machine learning approaches for electronic health records phenotyping: a methodical review. J Am Med Inform Assoc 2023; 30:367-381. [PMID: 36413056 PMCID: PMC9846699 DOI: 10.1093/jamia/ocac216] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 07/28/2022] [Revised: 09/27/2022] [Accepted: 10/27/2022] [Indexed: 11/23/2022] Open
Abstract
OBJECTIVE Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used. MATERIALS AND METHODS We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies. RESULTS Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions. DISCUSSION Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released. CONCLUSION Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.
Affiliation(s)
- Siyue Yang
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
| | | | - Ellen Stephenson
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Karen Tu
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Jessica Gronsbell
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
| |
|
33
|
Carrell DS, Gruber S, Floyd JS, Bann MA, Cushing-Haugen KL, Johnson RL, Graham V, Cronkite DJ, Hazlehurst BL, Felcher AH, Bejan CA, Kennedy A, Shinde MU, Karami S, Ma Y, Stojanovic D, Zhao Y, Ball R, Nelson JC. Improving Methods of Identifying Anaphylaxis for Medical Product Safety Surveillance Using Natural Language Processing and Machine Learning. Am J Epidemiol 2022; 192:283-295. [PMID: 36331289 PMCID: PMC9896464 DOI: 10.1093/aje/kwac182] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/11/2021] [Revised: 07/06/2022] [Accepted: 10/11/2022] [Indexed: 11/06/2022] Open
Abstract
We sought to determine whether machine learning and natural language processing (NLP) applied to electronic medical records could improve performance of automated health-care claims-based algorithms to identify anaphylaxis events using data on 516 patients with outpatient, emergency department, or inpatient anaphylaxis diagnosis codes during 2015-2019 in 2 integrated health-care institutions in the Northwest United States. We used one site's manually reviewed gold-standard outcomes data for model development and the other's for external validation based on cross-validated area under the receiver operating characteristic curve (AUC), positive predictive value (PPV), and sensitivity. In the development site 154 (64%) of 239 potential events met adjudication criteria for anaphylaxis compared with 180 (65%) of 277 in the validation site. Logistic regression models using only structured claims data achieved a cross-validated AUC of 0.58 (95% CI: 0.54, 0.63). Machine learning improved cross-validated AUC to 0.62 (0.58, 0.66); incorporating NLP-derived covariates further increased cross-validated AUCs to 0.70 (0.66, 0.75) in development and 0.67 (0.63, 0.71) in external validation data. A classification threshold with cross-validated PPV of 79% and cross-validated sensitivity of 66% in development data had cross-validated PPV of 78% and cross-validated sensitivity of 56% in external data. Machine learning and NLP-derived data improved identification of validated anaphylaxis events.
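The abstract reports positive predictive value (PPV) and sensitivity at a chosen classification threshold. As a minimal sketch of how those two quantities follow from model scores and adjudicated labels, the Python below uses made-up scores and labels; the function name and the 0.5 threshold are assumptions for illustration, not the authors' code, and no cross-validation is performed here.

```python
def ppv_and_sensitivity(scores, labels, threshold):
    """Return (PPV, sensitivity) when scores >= threshold are called positive.

    PPV = TP / (TP + FP); sensitivity = TP / (TP + FN).
    `labels` are gold-standard outcomes (truthy = confirmed event).
    """
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
    # Undefined ratios are reported as NaN rather than raising an error.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Made-up example: five potential events with model scores and chart-review labels.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.7]
toy_labels = [1, 0, 1, 0, 1]
ppv, sensitivity = ppv_and_sensitivity(toy_scores, toy_labels, threshold=0.5)
```

Raising the threshold usually trades sensitivity for PPV, which is the kind of trade-off the reported 79%/66% and 78%/56% pairs reflect.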
Affiliation(s)
- David S Carrell
- Correspondence to Dr. David Carrell, Kaiser Permanente Washington Health Research Institute, 1730 Minor Avenue, Suite 1600, Seattle, WA 98101
|
34
|
Nelson AE, Arbeeva L. Narrative Review of Machine Learning in Rheumatic and Musculoskeletal Diseases for Clinicians and Researchers: Biases, Goals, and Future Directions. J Rheumatol 2022; 49:1191-1200. [PMID: 35840150 PMCID: PMC9633365 DOI: 10.3899/jrheum.220326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/21/2022] [Indexed: 11/22/2022]
Abstract
There has been rapid growth in the use of artificial intelligence (AI) analytics in medicine in recent years, including in rheumatic and musculoskeletal diseases (RMDs). Such methods represent a challenge to clinicians, patients, and researchers, given the "black box" nature of most algorithms, the unfamiliarity of the terms, and the lack of awareness of potential issues around these analyses. Therefore, this review aims to introduce this subject area in a way that is relevant and meaningful to clinicians and researchers. We hope to provide some insights into relevant strengths and limitations, reporting guidelines, as well as recent examples of such analyses in key areas, with a focus on lessons learned and future directions in diagnosis, phenotyping, prognosis, and precision medicine in RMDs.
Affiliation(s)
- Amanda E Nelson
- A.E. Nelson, MD, MSCR, Department of Medicine, Division of Rheumatology, Allergy, and Immunology, University of North Carolina at Chapel Hill;
| | - Liubov Arbeeva
- L. Arbeeva, MS, Thurston Arthritis Research Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| |
|
35
|
Hamamoto R, Koyama T, Kouno N, Yasuda T, Yui S, Sudo K, Hirata M, Sunami K, Kubo T, Takasawa K, Takahashi S, Machino H, Kobayashi K, Asada K, Komatsu M, Kaneko S, Yatabe Y, Yamamoto N. Introducing AI to the molecular tumor board: one direction toward the establishment of precision medicine using large-scale cancer clinical and biological information. Exp Hematol Oncol 2022; 11:82. [PMID: 36316731 PMCID: PMC9620610 DOI: 10.1186/s40164-022-00333-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 10/05/2022] [Indexed: 11/10/2022] Open
Abstract
Since U.S. President Barack Obama announced the Precision Medicine Initiative in his New Year's State of the Union address in 2015, the establishment of a precision medicine system has been emphasized worldwide, particularly in the field of oncology. With the advent of next-generation sequencers specifically, genome analysis technology has made remarkable progress, and there are active efforts to apply genome information to diagnosis and treatment. Generally, in the process of feeding back the results of next-generation sequencing analysis to patients, a molecular tumor board (MTB), consisting of experts in clinical oncology, genetic medicine, etc., is established to discuss the results. On the other hand, an MTB currently involves a large amount of work, with humans searching through vast databases and literature, selecting the best drug candidates, and manually confirming the status of available clinical trials. In addition, as personalized medicine advances, the burden on MTB members is expected to increase in the future. Under these circumstances, introducing cutting-edge artificial intelligence (AI) technology and information and communication technology to MTBs while reducing the burden on MTB members and building a platform that enables more accurate and personalized medical care would be of great benefit to patients. In this review, we introduced the latest status of elemental technologies that have potential for AI utilization in MTB, and discussed issues that may arise in the future as we progress with AI implementation.
Affiliation(s)
- Ryuji Hamamoto
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
| | - Takafumi Koyama
- Department of Experimental Therapeutics, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
| | - Nobuji Kouno
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Department of Surgery, Graduate School of Medicine, Kyoto University, Yoshida-konoe-cho, Sakyo-ku, Kyoto, 606-8303 Japan
| | - Tomohiro Yasuda
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Research and Development Group, Hitachi, Ltd., 1-280 Higashi-koigakubo, Kokubunji, Tokyo, 185-8601 Japan
| | - Shuntaro Yui
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Research and Development Group, Hitachi, Ltd., 1-280 Higashi-koigakubo, Kokubunji, Tokyo, 185-8601 Japan
| | - Kazuki Sudo
- Department of Experimental Therapeutics, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Department of Medical Oncology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
| | - Makoto Hirata
- Department of Genetic Medicine and Services, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
| | - Kuniko Sunami
- Department of Laboratory Medicine, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
| | - Takashi Kubo
- Department of Laboratory Medicine, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
| | - Ken Takasawa
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
| | - Satoshi Takahashi
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
| | - Hidenori Machino
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
| | - Kazuma Kobayashi
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
| | - Ken Asada
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
| | - Masaaki Komatsu
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
| | - Syuzo Kaneko
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
| | - Yasushi Yatabe
- Department of Diagnostic Pathology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan; Division of Molecular Pathology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
| | - Noboru Yamamoto
- Department of Experimental Therapeutics, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
|
36
|
Nogues IE, Wen J, Lin Y, Liu M, Tedeschi SK, Geva A, Cai T, Hong C. Weakly Semi-supervised phenotyping using Electronic Health records. J Biomed Inform 2022; 134:104175. [PMID: 36064111 PMCID: PMC10112494 DOI: 10.1016/j.jbi.2022.104175] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/07/2021] [Revised: 04/23/2022] [Accepted: 08/15/2022] [Indexed: 01/07/2023]
Abstract
OBJECTIVE Electronic Health Record (EHR) based phenotyping is a crucial yet challenging problem in the biomedical field. Though clinicians typically determine patient-level diagnoses via manual chart review, the sheer volume and heterogeneity of EHR data render such tasks challenging, time-consuming, and prohibitively expensive, thus leading to a scarcity of clinical annotations in EHRs. Weakly supervised learning algorithms have been successfully applied to various EHR phenotyping problems, due to their ability to leverage information from large quantities of unlabeled samples to better inform predictions based on a far smaller number of patients. However, most weakly supervised methods face the challenge of choosing the right cutoff value to generate an optimal classifier. Furthermore, since they only utilize the most informative features (i.e., main ICD and NLP counts), they may fail for episodic phenotypes that cannot be consistently detected via ICD and NLP data. In this paper, we propose a label-efficient, weakly semi-supervised deep learning algorithm for EHR phenotyping (WSS-DL), which overcomes the limitations above. MATERIALS AND METHODS WSS-DL classifies patient-level disease status through a series of learning stages: 1) generating silver standard labels, 2) deriving enhanced-silver-standard labels by fitting a weakly supervised deep learning model to data with silver standard labels as outcomes and high dimensional EHR features as input, and 3) obtaining the final prediction score and classifier by fitting a supervised learning model to data with a minimal number of gold standard labels as the outcome, and the enhanced-silver-standard labels and a minimal set of most informative EHR features as input. To assess the generalizability of WSS-DL across different phenotypes and medical institutions, we apply WSS-DL to classify a total of 17 diseases, including both acute and chronic conditions, using EHR data from three healthcare systems. 
Additionally, we determine the minimum quantity of training labels required by WSS-DL to outperform existing supervised and semi-supervised phenotyping methods. RESULTS The proposed method, in combining the strengths of deep learning and weakly semi-supervised learning, successfully leverages the crucial phenotyping information contained in EHR features from unlabeled samples. Indeed, the deep learning model's ability to handle high-dimensional EHR features allows it to generate strong phenotype status predictions from silver standard labels. These predictions, in turn, provide highly effective features in the final logistic regression stage, leading to high phenotyping accuracy in notably small subsets of labeled data (e.g. n = 40 labeled samples). CONCLUSION Our method's high performance in EHR datasets with very small numbers of labels indicates its potential value in aiding doctors to diagnose rare diseases as well as conditions susceptible to misdiagnosis.
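The three learning stages described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain gradient-descent logistic regression stands in for the weakly supervised deep learning model, the silver-standard rule (main count at or above a cutoff) is assumed, and every name here (`wss_sketch`, `fit_logistic`, the toy data) is hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.2, epochs=300):
    """Batch gradient-descent logistic regression; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def wss_sketch(features, main_count, gold_idx, gold_labels, cutoff=1):
    # Stage 1: silver-standard labels from a simple count rule (assumed rule).
    silver = [1 if c >= cutoff else 0 for c in main_count]
    # Stage 2: fit a model on ALL patients with silver labels as outcome;
    # its predictions play the role of enhanced-silver-standard labels.
    # (Logistic regression is a stand-in for the deep model.)
    enhanced = predict(fit_logistic(features, silver), features)
    # Stage 3: small supervised model on the few gold-labeled patients,
    # using the enhanced label plus the most informative feature as input.
    stage3_inputs = [[e, c] for e, c in zip(enhanced, main_count)]
    final_model = fit_logistic([stage3_inputs[i] for i in gold_idx], gold_labels)
    return predict(final_model, stage3_inputs)


# Toy data: 8 patients, 2 generic EHR features, one main code count.
features = [[3, 2], [2, 3], [3, 3], [2, 2], [0, 1], [1, 0], [0, 0], [1, 1]]
main_count = [3, 2, 2, 1, 0, 0, 0, 0]
gold_idx = [0, 1, 4, 5]      # only four patients have chart-reviewed labels
gold_labels = [1, 1, 0, 0]
scores = wss_sketch(features, main_count, gold_idx, gold_labels)
```

The point of the design is that stage 2 uses every patient, labeled or not, so that stage 3 needs only a handful of gold-standard labels.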
Affiliation(s)
| | - Jun Wen
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Yucong Lin
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Center for Statistical Science, Tsinghua University, Beijing, China
| | - Molei Liu
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | - Sara K Tedeschi
- Department of Medicine, Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Boston, MA, USA
| | - Alon Geva
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Department of Anesthesiology, Critical Care, and Pain Medicine, and Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA; Department of Anesthesia, Harvard Medical School, Boston, MA, USA
| | - Tianxi Cai
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Chuan Hong
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
|
37
|
Ferolito B, do Valle IF, Gerlovin H, Costa L, Casas JP, Gaziano JM, Gagnon DR, Begoli E, Barabási AL, Cho K. Visualizing novel connections and genetic similarities across diseases using a network-medicine based approach. Sci Rep 2022; 12:14914. [PMID: 36050444 PMCID: PMC9436158 DOI: 10.1038/s41598-022-19244-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Accepted: 08/26/2022] [Indexed: 11/08/2022]
Abstract
Understanding the genetic relationships between human disorders could lead to better treatment and prevention strategies, especially for individuals with multiple comorbidities. A common resource for studying genetic-disease relationships is the GWAS Catalog, a large and well-curated repository of SNP-trait associations from various studies and populations. Some of these populations are contained within mega-biobanks such as the Million Veteran Program (MVP), which has enabled the genetic classification of several diseases in a large, well-characterized, and heterogeneous population. Here we aim to provide a network of the genetic relationships among diseases and to demonstrate the utility of quantifying the extent to which a given resource such as MVP has contributed to the discovery of such relations. We use a network-based approach to evaluate shared variants among thousands of traits in the GWAS Catalog repository. Our results indicate many novel disease relationships that were not present in earlier studies and demonstrate that the network can reveal clusters of mechanistically related diseases. Finally, we show novel disease connections that emerge when MVP data is included, highlighting methodology that can be used to indicate the contributions of a given biobank.
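The shared-variant idea in this abstract can be illustrated with a small sketch. It assumes the simplest possible rule, linking two traits whenever they share at least `min_shared` variants, which may differ from the authors' actual edge definition; the function names, trait names, and variant identifiers (`v1` to `v4`) are hypothetical placeholders.

```python
from itertools import combinations


def shared_variant_edges(trait_to_snps, min_shared=1):
    """Return {(trait_a, trait_b): n_shared} for trait pairs sharing variants."""
    edges = {}
    for a, b in combinations(sorted(trait_to_snps), 2):
        n_shared = len(trait_to_snps[a] & trait_to_snps[b])
        if n_shared >= min_shared:
            edges[(a, b)] = n_shared
    return edges


def merge_sources(*sources):
    """Union the variant sets of several trait-to-variant mappings."""
    merged = {}
    for source in sources:
        for trait, snps in source.items():
            merged.setdefault(trait, set()).update(snps)
    return merged


# Toy associations: everything except one biobank, and that biobank alone.
catalog_without_biobank = {
    "asthma": {"v1", "v2"},
    "eczema": {"v2", "v3"},
    "gout": {"v4"},
}
biobank_only = {"gout": {"v3"}}

baseline = shared_variant_edges(catalog_without_biobank)
combined = shared_variant_edges(merge_sources(catalog_without_biobank, biobank_only))

# Edges that exist only once the biobank's associations are included.
novel = set(combined) - set(baseline)
```

Comparing the edge sets with and without one data source is one way to quantify what that source contributed to the network.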
Affiliation(s)
- Brian Ferolito
- VA Boston Healthcare System, Massachusetts Veterans Epidemiology and Research Information Center, (MAVERIC), 150 S. Huntington Avenue, Boston, 02130, USA.
| | - Italo Faria do Valle
- VA Boston Healthcare System, Massachusetts Veterans Epidemiology and Research Information Center, (MAVERIC), 150 S. Huntington Avenue, Boston, 02130, USA
- Center for Complex Network Research, Department of Physics, Northeastern University, Boston, 02115, USA
| | - Hanna Gerlovin
- VA Boston Healthcare System, Massachusetts Veterans Epidemiology and Research Information Center, (MAVERIC), 150 S. Huntington Avenue, Boston, 02130, USA
| | - Lauren Costa
- VA Boston Healthcare System, Massachusetts Veterans Epidemiology and Research Information Center, (MAVERIC), 150 S. Huntington Avenue, Boston, 02130, USA
| | - Juan P Casas
- VA Boston Healthcare System, Massachusetts Veterans Epidemiology and Research Information Center, (MAVERIC), 150 S. Huntington Avenue, Boston, 02130, USA
- Brigham and Women's Hospital, Division of Aging, Department of Medicine, Harvard Medical School, Boston, 02115, USA
| | - J Michael Gaziano
- VA Boston Healthcare System, Massachusetts Veterans Epidemiology and Research Information Center, (MAVERIC), 150 S. Huntington Avenue, Boston, 02130, USA
- Brigham and Women's Hospital, Division of Aging, Department of Medicine, Harvard Medical School, Boston, 02115, USA
| | - David R Gagnon
- VA Boston Healthcare System, Massachusetts Veterans Epidemiology and Research Information Center, (MAVERIC), 150 S. Huntington Avenue, Boston, 02130, USA
- School of Public Health, Department of Biostatistics, Boston University, Boston, 02215, USA
| | - Edmon Begoli
- Oak Ridge National Laboratory, Oak Ridge, 37830, USA
| | - Albert-László Barabási
- Center for Complex Network Research, Department of Physics, Northeastern University, Boston, 02115, USA
| | - Kelly Cho
- VA Boston Healthcare System, Massachusetts Veterans Epidemiology and Research Information Center, (MAVERIC), 150 S. Huntington Avenue, Boston, 02130, USA
- Brigham and Women's Hospital, Division of Aging, Department of Medicine, Harvard Medical School, Boston, 02115, USA
|
38
|
Noori A, Magdamo C, Liu X, Tyagi T, Li Z, Kondepudi A, Alabsi H, Rudmann E, Wilcox D, Brenner L, Robbins GK, Moura L, Zafar S, Benson NM, Hsu J, Dickson JR, Serrano-Pozo A, Hyman BT, Blacker D, Westover MB, Mukerji SS, Das S. Development and Evaluation of a Natural Language Processing Annotation Tool to Facilitate Phenotyping of Cognitive Status in Electronic Health Records: Diagnostic Study. J Med Internet Res 2022; 24:e40384. [PMID: 36040790 PMCID: PMC9472045 DOI: 10.2196/40384] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/17/2022] [Revised: 07/29/2022] [Accepted: 07/31/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND Electronic health records (EHRs) with large sample sizes and rich information offer great potential for dementia research, but current methods of phenotyping cognitive status are not scalable. OBJECTIVE The aim of this study was to evaluate whether natural language processing (NLP)-powered semiautomated annotation can improve the speed and interrater reliability of chart reviews for phenotyping cognitive status. METHODS In this diagnostic study, we developed and evaluated a semiautomated NLP-powered annotation tool (NAT) to facilitate phenotyping of cognitive status. Clinical experts adjudicated the cognitive status of 627 patients at Mass General Brigham (MGB) health care, using NAT or traditional chart reviews. Patient charts contained EHR data from two data sets: (1) records from January 1, 2017, to December 31, 2018, for 100 Medicare beneficiaries from the MGB Accountable Care Organization and (2) records from 2 years prior to COVID-19 diagnosis to the date of COVID-19 diagnosis for 527 MGB patients. All EHR data from the relevant period were extracted; diagnosis codes, medications, and laboratory test values were processed and summarized; clinical notes were processed through an NLP pipeline; and a web tool was developed to present an integrated view of all data. Cognitive status was rated as cognitively normal, cognitively impaired, or undetermined. Assessment time and interrater agreement of NAT compared to manual chart reviews for cognitive status phenotyping were evaluated. RESULTS NAT adjudication provided higher interrater agreement (Cohen κ=0.89 vs κ=0.80) and a significant speedup (time difference mean 1.4, SD 1.3 minutes; P<.001; ratio median 2.2, min-max 0.4-20) over manual chart reviews. There was moderate agreement with manual chart reviews (Cohen κ=0.67). 
In the cases that exhibited disagreement with manual chart reviews, NAT adjudication was able to produce assessments that had broader clinical consensus due to its integrated view of highlighted relevant information and semiautomated NLP features. CONCLUSIONS NAT adjudication improves the speed and interrater reliability for phenotyping cognitive status compared to manual chart reviews. This study underscores the potential of an NLP-based clinically adjudicated method to build large-scale dementia research cohorts from EHRs.
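The interrater agreement statistic reported here, Cohen's κ, compares observed agreement between two raters with the agreement expected by chance from their marginal rating frequencies. A minimal sketch follows; the function name and the example ratings are hypothetical and unrelated to the study's data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of four charts by two reviewers.
reviewer_1 = ["normal", "normal", "impaired", "impaired"]
reviewer_2 = ["normal", "impaired", "impaired", "impaired"]
kappa = cohen_kappa(reviewer_1, reviewer_2)
```

In this toy case the reviewers agree on 3 of 4 charts (0.75) while chance agreement is 0.5, so κ is 0.5.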
Affiliation(s)
- Ayush Noori
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
| | - Colin Magdamo
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
| | - Xiao Liu
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
| | - Tanish Tyagi
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
| | - Zhaozhi Li
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
| | - Akhil Kondepudi
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
| | - Haitham Alabsi
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Emily Rudmann
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Vaccine and Immunotherapy Center, Division of Infectious Disease, Boston, MA, United States
| | - Douglas Wilcox
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Laura Brenner
- Harvard Medical School, Boston, MA, United States
- Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, MA, United States
| | - Gregory K Robbins
- Harvard Medical School, Boston, MA, United States
- Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA, United States
| | - Lidia Moura
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Sahar Zafar
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Nicole M Benson
- Harvard Medical School, Boston, MA, United States
- Mongan Institute, Massachusetts General Hospital, Boston, MA, United States
- McLean Hospital, Belmont, MA, United States
| | - John Hsu
- Harvard Medical School, Boston, MA, United States
- Mongan Institute, Massachusetts General Hospital, Boston, MA, United States
| | - John R Dickson
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Alberto Serrano-Pozo
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Bradley T Hyman
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Deborah Blacker
- Harvard Medical School, Boston, MA, United States
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, United States
| | - M Brandon Westover
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Shibani S Mukerji
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA, United States
| | - Sudeshna Das
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
|
39
|
Brandt PS, Pacheco JA, Adekkanattu P, Sholle ET, Abedian S, Stone DJ, Knaack DM, Xu J, Xu Z, Peng Y, Benda NC, Wang F, Luo Y, Jiang G, Pathak J, Rasmussen LV. Design and validation of a FHIR-based EHR-driven phenotyping toolbox. J Am Med Inform Assoc 2022; 29:1449-1460. [PMID: 35799370 PMCID: PMC9382394 DOI: 10.1093/jamia/ocac063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/27/2021] [Revised: 04/04/2022] [Accepted: 06/17/2022] [Indexed: 12/14/2022]
Abstract
OBJECTIVES To develop and validate a standards-based phenotyping tool to author electronic health record (EHR)-based phenotype definitions and demonstrate execution of the definitions against heterogeneous clinical research data platforms. MATERIALS AND METHODS We developed an open-source, standards-compliant phenotyping tool known as the PhEMA Workbench that enables a phenotype representation using the Fast Healthcare Interoperability Resources (FHIR) and Clinical Quality Language (CQL) standards. We then demonstrated how this tool can be used to conduct EHR-based phenotyping, including phenotype authoring, execution, and validation. We validated the performance of the tool by executing a thrombotic event phenotype definition at 3 sites, Mayo Clinic (MC), Northwestern Medicine (NM), and Weill Cornell Medicine (WCM), and used manual review to determine precision and recall. RESULTS An initial version of the PhEMA Workbench has been released, which supports phenotype authoring, execution, and publishing to a shared phenotype definition repository. The resulting thrombotic event phenotype definition consisted of 11 CQL statements and 24 value sets containing a total of 834 codes. Technical validation showed satisfactory performance (both NM and MC had 100% precision and recall and WCM had a precision of 95% and a recall of 84%). CONCLUSIONS We demonstrate that the PhEMA Workbench can facilitate EHR-driven phenotype definition, execution, and phenotype sharing in heterogeneous clinical research data environments. A phenotype definition that integrates with existing standards-compliant systems, together with the use of a formal representation, facilitates automation and can decrease the potential for human error.
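The validation metrics reported above compare the patients a phenotype definition flags against those confirmed by manual review. A minimal sketch of that comparison, with hypothetical patient identifiers and a hypothetical function name (this is not part of the PhEMA Workbench):

```python
def precision_recall(flagged, confirmed):
    """Precision and recall of flagged patient IDs against reviewed true cases."""
    flagged, confirmed = set(flagged), set(confirmed)
    true_positives = len(flagged & confirmed)
    # Precision: share of flagged patients who are true cases.
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: share of true cases that the definition flagged.
    recall = true_positives / len(confirmed) if confirmed else 0.0
    return precision, recall


# Hypothetical example: 4 patients flagged, 5 true cases found by chart review.
flagged_by_definition = {"p1", "p2", "p3", "p4"}
confirmed_by_review = {"p1", "p2", "p3", "p5", "p6"}
precision, recall = precision_recall(flagged_by_definition, confirmed_by_review)
```

Here three flagged patients are true cases, giving a precision of 3/4 and a recall of 3/5.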
Affiliation(s)
- Pascal S Brandt
- Corresponding Author: Pascal S. Brandt, Department of Biomedical Informatics & Medical Education, University of Washington, Box 358047, Seattle, WA 98195, USA
| | - Jennifer A Pacheco
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Prakash Adekkanattu
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
| | - Evan T Sholle
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
| | - Sajjad Abedian
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
| | - Daniel J Stone
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
| | - David M Knaack
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
| | - Jie Xu
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
| | - Zhenxing Xu
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
| | - Yifan Peng
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
| | - Natalie C Benda
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
| | - Fei Wang
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Guoqian Jiang
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
| | - Jyotishman Pathak
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
| | - Luke V Rasmussen
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
|
40
|
Krantz MS, Kerchberger VE, Wei WQ. Novel Analysis Methods to Mine Immune-Mediated Phenotypes and Find Genetic Variation Within the Electronic Health Record (Roadmap for Phenotype to Genotype: Immunogenomics). J Allergy Clin Immunol Pract 2022; 10:1757-1762. [PMID: 35487368 PMCID: PMC9624141 DOI: 10.1016/j.jaip.2022.04.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/27/2022] [Revised: 04/13/2022] [Accepted: 04/18/2022] [Indexed: 06/14/2023]
Abstract
The field of immunogenomics has the opportunity for accelerated genetic discovery aided by the maturation of electronic health records (EHRs) linked to DNA biobanks. Novel analysis methods in deep phenotyping of EHR data allow the full realization of the paired and increasingly dense genetic/phenotypic information available. This enables researchers to uncover genetic risk factors for the prevention and optimal treatment of immune-mediated diseases and immune-mediated adverse drug reactions. This article reviews the background of EHRs linked to DNA biobanks, potential applications to immunogenomic discovery, and current and emerging techniques in EHR-based deep phenotyping.
Affiliation(s)
- Matthew S Krantz
- Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tenn.
| | - V Eric Kerchberger
- Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tenn; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tenn
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tenn
|
41
|
Liang L, Hou J, Uno H, Cho K, Ma Y, Cai T. Semi-supervised approach to event time annotation using longitudinal electronic health records. Lifetime Data Anal 2022; 28:428-491. [PMID: 35753014 PMCID: PMC10044535 DOI: 10.1007/s10985-022-09557-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2021] [Accepted: 05/13/2022] [Indexed: 06/15/2023]
Abstract
Large clinical datasets derived from insurance claims and electronic health record (EHR) systems are valuable sources for precision medicine research. These datasets can be used to develop models for personalized prediction of risk or treatment response. Efficiently deriving prediction models using real world data, however, faces practical and methodological challenges. Precise information on important clinical outcomes, such as time to cancer progression, is not readily available in these databases. The true clinical event times typically cannot be approximated well based on simple extracts of billing or procedure codes, while annotating event times manually is prohibitive in time and resources. In this paper, we propose a two-step semi-supervised multi-modal automated time annotation (MATA) method leveraging multi-dimensional longitudinal EHR encounter records. In step I, we employ a functional principal component analysis approach to estimate the underlying intensity functions based on observed point processes from the unlabeled patients. In step II, we fit a penalized proportional odds model to the event time outcomes with features derived in step I in the labeled data, where the non-parametric baseline function is approximated using B-splines. Under regularity conditions, the resulting estimator of the feature effect vector is shown to be root-n consistent. We demonstrate the superiority of our approach relative to existing approaches through simulations and a real data example on annotating lung cancer recurrence in an EHR cohort of lung cancer patients from the Veterans Health Administration.
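For orientation, a proportional odds model of the kind named in step II is commonly written as below. The notation is introduced here for illustration and is an assumption, not taken from the paper: T is the event time, Z the feature vector derived in step I, α(t) the non-parametric baseline function, and B_k the B-spline basis functions with coefficients γ_k. Sign and parameterization conventions vary between authors.

```latex
\operatorname{logit} P(T \le t \mid Z) = \alpha(t) + \beta^{\top} Z,
\qquad
\alpha(t) \approx \sum_{k=1}^{K} \gamma_k B_k(t)
```

Under this form, the penalty mentioned in the abstract would act on the feature effect vector β, and the B-spline expansion reduces estimation of the baseline to a finite set of coefficients.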
Affiliation(s)
- Liang Liang
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | - Jue Hou
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | - Hajime Uno
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Kelly Cho
- Massachusetts Veterans Epidemiology Research and Information Center, US Department of Veteran Affairs, Boston, MA, USA
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Yanyuan Ma
- Department of Statistics, Penn State University, University Park, PA, USA
| | - Tianxi Cai
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
|
42
|
Ge T, Irvin MR, Patki A, Srinivasasainagendra V, Lin YF, Tiwari HK, Armstrong ND, Benoit B, Chen CY, Choi KW, Cimino JJ, Davis BH, Dikilitas O, Etheridge B, Feng YCA, Gainer V, Huang H, Jarvik GP, Kachulis C, Kenny EE, Khan A, Kiryluk K, Kottyan L, Kullo IJ, Lange C, Lennon N, Leong A, Malolepsza E, Miles AD, Murphy S, Namjou B, Narayan R, O'Connor MJ, Pacheco JA, Perez E, Rasmussen-Torvik LJ, Rosenthal EA, Schaid D, Stamou M, Udler MS, Wei WQ, Weiss ST, Ng MCY, Smoller JW, Lebo MS, Meigs JB, Limdi NA, Karlson EW. Development and validation of a trans-ancestry polygenic risk score for type 2 diabetes in diverse populations. Genome Med 2022; 14:70. [PMID: 35765100 PMCID: PMC9241245 DOI: 10.1186/s13073-022-01074-2] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 10/31/2021] [Accepted: 06/16/2022] [Indexed: 02/05/2023]
Abstract
BACKGROUND Type 2 diabetes (T2D) is a worldwide scourge caused by both genetic and environmental risk factors that disproportionately afflicts communities of color. Leveraging existing large-scale genome-wide association studies (GWAS), polygenic risk scores (PRS) have shown promise to complement established clinical risk factors and intervention paradigms, and improve early diagnosis and prevention of T2D. However, to date, T2D PRS have been most widely developed and validated in individuals of European descent. Comprehensive assessment of T2D PRS in non-European populations is critical for equitable deployment of PRS to clinical practice that benefits global populations. METHODS We integrated T2D GWAS in European, African, and East Asian populations to construct a trans-ancestry T2D PRS using a newly developed Bayesian polygenic modeling method, and assessed the prediction accuracy of the PRS in the multi-ethnic Electronic Medical Records and Genomics (eMERGE) study (11,945 cases; 57,694 controls), four Black cohorts (5137 cases; 9657 controls), and the Taiwan Biobank (4570 cases; 84,996 controls). We additionally evaluated a post hoc ancestry adjustment method that can express the polygenic risk on the same scale across ancestrally diverse individuals and facilitate the clinical implementation of the PRS in prospective cohorts. RESULTS The trans-ancestry PRS was significantly associated with T2D status across the ancestral groups examined. The top 2% of the PRS distribution can identify individuals with an approximately 2.5- to 4.5-fold increase in T2D risk, which corresponds to the increased risk of T2D for first-degree relatives. The post hoc ancestry adjustment method eliminated major distributional differences in the PRS across ancestries without compromising its predictive performance. 
CONCLUSIONS By integrating T2D GWAS from multiple populations, we developed and validated a trans-ancestry PRS, and demonstrated its potential as a meaningful index of risk among diverse patients in clinical settings. Our efforts represent the first step towards the implementation of the T2D PRS into routine healthcare.
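A polygenic risk score is, at its core, a weighted sum of risk-allele dosages. The sketch below shows that sum and a simple rescaling against a reference group. The rescaling is only a stand-in to illustrate putting scores on a common scale; it is not the post hoc ancestry adjustment evaluated in the study, and the variant names, weights, and function names are hypothetical.

```python
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies) over scored variants."""
    # Variants missing from an individual's data contribute nothing here;
    # real pipelines usually impute them instead (simplifying assumption).
    return sum(weight * dosages.get(variant, 0.0) for variant, weight in weights.items())


def standardize(scores, reference_scores):
    """Express scores in standard-deviation units of a reference group."""
    mu = mean(reference_scores)
    sd = pstdev(reference_scores)
    if sd == 0:
        raise ValueError("reference scores have zero variance")
    return [(score - mu) / sd for score in scores]


# Hypothetical per-variant effect weights and one person's dosages.
weights = {"variant_a": 0.2, "variant_b": -0.1}
person = {"variant_a": 2, "variant_b": 1}
raw_score = polygenic_score(person, weights)

# Hypothetical reference distribution used only to rescale one score.
z_scores = standardize([3.0], reference_scores=[1.0, 3.0])
```

Once scores are on a common scale, a single cutoff such as a top percentile can be applied to everyone being scored.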
Affiliation(s)
- Tian Ge
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, USA.
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Marguerite R Irvin
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Amit Patki
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Vinodh Srinivasasainagendra
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Yen-Feng Lin
- Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli, Taiwan
- Department of Public Health & Medical Humanities, School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Institute of Behavioral Medicine, College of Medicine, National Cheng Kung University, Tainan, Taiwan
| | - Hemant K Tiwari
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Nicole D Armstrong
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Barbara Benoit
- Mass General Brigham Research Information Science & Computing, Boston, MA, USA
| | - Chia-Yen Chen
- Translational Biology, Biogen Inc., Cambridge, MA, USA
| | - Karmel W Choi
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
| | - James J Cimino
- Informatics Institute, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Brittney H Davis
- Department of Neurology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Ozan Dikilitas
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Department of Internal Medicine, Mayo Clinician-Investigator Training Program, Mayo Clinic, Rochester, MN, USA
| | - Bethany Etheridge
- Department of Neurology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Yen-Chen Anne Feng
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
| | - Vivian Gainer
- Mass General Brigham Research Information Science & Computing, Boston, MA, USA
| | - Hailiang Huang
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Gail P Jarvik
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
| | | | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Atlas Khan
- Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, USA
| | - Krzysztof Kiryluk
- Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, USA
| | - Leah Kottyan
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | - Christoph Lange
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Niall Lennon
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Aaron Leong
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA, USA
- Diabetes Unit, Massachusetts General Hospital, Boston, MA, USA
| | | | - Ayme D Miles
- Informatics Institute, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Shawn Murphy
- Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
| | - Bahram Namjou
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Renuka Narayan
- Department of Neurology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | | | - Jennifer A Pacheco
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Emma Perez
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Mass General Brigham Personalized Medicine, Boston, MA, USA
| | - Laura J Rasmussen-Torvik
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Elisabeth A Rosenthal
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Daniel Schaid
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
| | - Maria Stamou
- Division of Endocrinology, Massachusetts General Hospital, Boston, MA, USA
| | - Miriam S Udler
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Scott T Weiss
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Maggie C Y Ng
- Vanderbilt Genetics Institute, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Jordan W Smoller
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Matthew S Lebo
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mass General Brigham Personalized Medicine, Boston, MA, USA
- Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
| | - James B Meigs
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Nita A Limdi
- Department of Neurology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Elizabeth W Karlson
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Mass General Brigham Personalized Medicine, Boston, MA, USA
| |
|
43
|
Ghosh D, Mastej E, Jain R, Choi YS. Causal Inference in Radiomics: Framework, Mechanisms, and Algorithms. Front Neurosci 2022; 16:884708. [PMID: 35812228 PMCID: PMC9261933 DOI: 10.3389/fnins.2022.884708] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/26/2022] [Accepted: 05/20/2022] [Indexed: 12/30/2022] [Open Access]
Abstract
The widespread use of machine learning algorithms in radiomics has led to a proliferation of flexible prognostic models for clinical outcomes. However, a limitation of these techniques is their black-box nature, which prevents the ability for increased mechanistic phenomenological understanding. In this article, we develop an inferential framework for estimating causal effects with radiomics data. A new challenge is that the exposure of interest is latent so that new estimation procedures are needed. We leverage a multivariate version of partial least squares for causal effect estimation. The methodology is illustrated with applications to two radiomics datasets, one in osteosarcoma and one in glioblastoma.
Affiliation(s)
- Debashis Ghosh
- Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, United States
- *Correspondence: Debashis Ghosh
| | - Emily Mastej
- Computational Biosciences Program, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Rajan Jain
- Department of Radiology and Neurosurgery, New York University Langone Medical Center, New York, NY, United States
| | - Yoon Seong Choi
- Department of Radiology, Yonsei University College of Medicine, Seoul, South Korea
| |
|
44
|
Link NB, Huang S, Cai T, Sun J, Dahal K, Costa L, Cho K, Liao K, Cai T, Hong C. Binary acronym disambiguation in clinical notes from electronic health records with an application in computational phenotyping. Int J Med Inform 2022; 162:104753. [PMID: 35405530 DOI: 10.1016/j.ijmedinf.2022.104753] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/18/2022] [Revised: 03/11/2022] [Accepted: 03/27/2022] [Indexed: 01/05/2023]
Abstract
OBJECTIVE The use of electronic health records (EHR) systems has grown over the past decade, and with it, the need to extract information from unstructured clinical narratives. Clinical notes, however, frequently contain acronyms with several potential senses (meanings) and traditional natural language processing (NLP) techniques cannot differentiate between these senses. In this study we introduce a semi-supervised method for binary acronym disambiguation, the task of classifying a target sense for acronyms in the clinical EHR notes. METHODS We developed a semi-supervised ensemble machine learning (CASEml) algorithm to automatically identify when an acronym means a target sense by leveraging semantic embeddings, visit-level text and billing information. The algorithm was validated using note data from the Veterans Affairs hospital system to classify the meaning of three acronyms: RA, MS, and MI. We compared the performance of CASEml against another standard semi-supervised method and a baseline metric selecting the most frequent acronym sense. Along with evaluating the performance of these methods for specific instances of acronyms, we evaluated the impact of acronym disambiguation on NLP-driven phenotyping of rheumatoid arthritis. RESULTS CASEml achieved accuracies of 0.947, 0.911, and 0.706 for RA, MS, and MI, respectively, higher than a standard baseline metric and (on average) higher than a state-of-the-art semi-supervised method. As well, we demonstrated that applying CASEml to medical notes improves the AUC of a phenotype algorithm for rheumatoid arthritis. CONCLUSION CASEml is a novel method that accurately disambiguates acronyms in clinical notes and has advantages over commonly used supervised and semi-supervised machine learning approaches. In addition, CASEml improves the performance of NLP tasks that rely on ambiguous acronyms, such as phenotyping.
Affiliation(s)
- Nicholas B Link
- VA Boston Healthcare System, Boston, MA, United States; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
| | - Sicong Huang
- VA Boston Healthcare System, Boston, MA, United States; Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, United States
| | - Tianrun Cai
- VA Boston Healthcare System, Boston, MA, United States; Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, United States
| | - Jiehuan Sun
- VA Boston Healthcare System, Boston, MA, United States; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
| | - Kumar Dahal
- VA Boston Healthcare System, Boston, MA, United States; Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, United States
| | - Lauren Costa
- VA Boston Healthcare System, Boston, MA, United States
| | - Kelly Cho
- VA Boston Healthcare System, Boston, MA, United States
| | - Katherine Liao
- VA Boston Healthcare System, Boston, MA, United States; Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, United States
| | - Tianxi Cai
- VA Boston Healthcare System, Boston, MA, United States; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
| | - Chuan Hong
- VA Boston Healthcare System, Boston, MA, United States; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
| |
|
45
|
Chan JTH, Liew DFL, Stojanova J, McMaster C. Better Pharmacovigilance Through Artificial Intelligence: What Is Needed To Make This A Reality? HEALTH POLICY AND TECHNOLOGY 2022. [DOI: 10.1016/j.hlpt.2022.100638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
|
46
|
Wang DD, Li Y, Nguyen XMT, Song RJ, Ho YL, Hu FB, Willett WC, Wilson PWF, Cho K, Gaziano JM, Djoussé L. Dietary Sodium and Potassium Intake and Risk of Non-Fatal Cardiovascular Diseases: The Million Veteran Program. Nutrients 2022; 14:nu14051121. [PMID: 35268096 PMCID: PMC8912456 DOI: 10.3390/nu14051121] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/25/2022] [Revised: 03/02/2022] [Accepted: 03/03/2022] [Indexed: 11/16/2022] [Open Access]
Abstract
Objective: To examine the association between intakes of sodium and potassium and the ratio of sodium to potassium and incident myocardial infarction and stroke. Design, Setting and Participants: Prospective cohort study of 180,156 Veterans aged 19 to 107 years with plausible dietary intake measured by food frequency questionnaire (FFQ) who were free of cardiovascular disease (CVD) and cancer at baseline in the VA Million Veteran Program (MVP). Main outcome measures: CVD defined as non-fatal myocardial infarction (MI) or acute ischemic stroke (AIS) ascertained using high-throughput phenotyping algorithms applied to electronic health records. Results: During up to 8 years of follow-up, we documented 4090 CVD cases (2499 MI and 1712 AIS). After adjustment for confounding factors, a higher sodium intake was associated with a higher risk of CVD, whereas potassium intake was inversely associated with the risk of CVD [hazard ratio (HR) comparing extreme quintiles, 95% confidence interval (CI): 1.09 (95% CI: 0.99−1.21, p trend = 0.01) for sodium and 0.87 (95% CI: 0.79−0.96, p trend = 0.005) for potassium]. In addition, the ratio of sodium to potassium (Na/K ratio) was positively associated with the risk of CVD (HR comparing extreme quintiles = 1.26, 95% CI: 1.14−1.39, p trend < 0.0001). The associations of Na/K ratio were consistent for two subtypes of CVD; one standard deviation increment in the ratio was associated with HRs (95% CI) of 1.12 (1.06−1.19) for MI and 1.11 (1.03−1.19) for AIS. In secondary analyses, the observed associations were consistent across race and status for diabetes, hypertension, and high cholesterol at baseline. Associations appeared to be more pronounced among participants with poor dietary quality. Conclusions: A high sodium intake and a low potassium intake were associated with a higher risk of CVD in this large population of US veterans.
Affiliation(s)
- Dong D Wang
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA 02111, USA
- The Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
| | - Yanping Li
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA 02111, USA
- Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
| | - Xuan-Mai T Nguyen
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA 02111, USA
- Division of Aging, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
- Harvard Medical School, Boston, MA 02115, USA
| | - Rebecca J Song
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA 02111, USA
- Department of Epidemiology, Boston University School of Public Health, Boston, MA 02115, USA
| | - Yuk-Lam Ho
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA 02111, USA
| | - Frank B Hu
- The Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
| | - Walter C Willett
- The Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
| | - Peter W F Wilson
- Atlanta VA Medical Center, Atlanta, GA 30033, USA
- Emory Clinical Cardiovascular Research Institute, Atlanta, GA 30033, USA
| | - Kelly Cho
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA 02111, USA
- Division of Aging, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
- Harvard Medical School, Boston, MA 02115, USA
| | - J Michael Gaziano
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA 02111, USA
- Division of Aging, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
- Harvard Medical School, Boston, MA 02115, USA
| | - Luc Djoussé
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA 02111, USA
- Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
- Division of Aging, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
- Harvard Medical School, Boston, MA 02115, USA
| |
|
47
|
Cade BE, Hassan SM, Dashti HS, Kiernan M, Pavlova MK, Redline S, Karlson EW. Sleep apnea phenotyping and relationship to disease in a large clinical biobank. JAMIA Open 2022; 5:ooab117. [PMID: 35156000 PMCID: PMC8826997 DOI: 10.1093/jamiaopen/ooab117] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/09/2021] [Revised: 12/08/2021] [Accepted: 12/28/2021] [Indexed: 11/14/2022] [Open Access]
Abstract
Objective: Sleep apnea is associated with a broad range of pathophysiology. While electronic health record (EHR) information has the potential for revealing relationships between sleep apnea and associated risk factors and outcomes, practical challenges hinder its use. Our objectives were to develop a sleep apnea phenotyping algorithm that improves the precision of EHR case/control information using natural language processing (NLP); identify novel associations between sleep apnea and comorbidities in a large clinical biobank; and investigate the relationship between polysomnography statistics and comorbid disease using NLP phenotyping. Materials and Methods: We performed clinical chart reviews on 300 participants putatively diagnosed with sleep apnea and applied International Classification of Sleep Disorders criteria to classify true cases and noncases. We evaluated 2 NLP and diagnosis code-only methods for their abilities to maximize phenotyping precision. The lead algorithm was used to identify incident and cross-sectional associations between sleep apnea and common comorbidities using 4876 NLP-defined sleep apnea cases and 3× matched controls. Results: The optimal NLP phenotyping strategy had improved model precision (≥0.943) compared to the use of one diagnosis code (≤0.733). Of the tested diseases, 170 disorders had significant incidence odds ratios (ORs) between cases and controls, 8 of which were confirmed using polysomnography (n = 4544), and 281 disorders had significant prevalence OR between sleep apnea cases versus controls, 41 of which were confirmed using polysomnography data. Discussion and Conclusion: An NLP-informed algorithm can improve the accuracy of case-control sleep apnea ascertainment and thus improve the performance of phenome-wide, genetic, and other EHR analyses of a highly prevalent disorder.
Lay Summary: Sleep apnea is a common disease in which breathing partially or completely pauses during sleep, leading to less oxygen in the blood, repeated awakenings, and increased risk of developing multiple diseases. Current studies of sleep apnea often have relatively few participants due to the challenge of performing overnight sleep recordings. Electronic health record (EHR) billing code diagnoses of sleep apnea could be repurposed to increase the size of research studies, but the accuracy of the diagnoses is reduced. We developed a reusable algorithm that improves the accuracy of EHR sleep apnea diagnoses using natural language processing to extract information from clinical notes. As a proof of concept, we used the algorithm to identify hundreds of diseases that are increased among participants with sleep apnea compared to similar patients without sleep apnea. Many of these disease relationships with sleep apnea have not been previously recognized. This improved algorithm will help to accelerate future large-scale investigations of the causes and consequences of sleep apnea.
Affiliation(s)
- Brian E Cade
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, Massachusetts, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, Massachusetts, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, USA
| | - Syed Moin Hassan
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, Massachusetts, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, Massachusetts, USA
- Division of Pulmonary Disease and Critical Care Medicine, University of Vermont, Burlington, Vermont, USA
| | - Hassan S Dashti
- Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, USA
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Department of Anesthesia, Pain, and Critical Care Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
| | - Melissa Kiernan
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, Massachusetts, USA
- NeuroCare Center for Sleep, Newton, Massachusetts, USA
| | - Milena K Pavlova
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, Massachusetts, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, Massachusetts, USA
| | - Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, Massachusetts, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, Massachusetts, USA
- Division of Pulmonary, Critical Care, and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
| | - Elizabeth W Karlson
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Boston, Massachusetts, USA
| |
|
48
|
Zhang Y, Liu M, Neykov M, Cai T. Prior Adaptive Semi-supervised Learning with Application to EHR Phenotyping. JOURNAL OF MACHINE LEARNING RESEARCH: JMLR 2022; 23:83. [PMID: 37974910 PMCID: PMC10653017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2023]
Abstract
Electronic Health Record (EHR) data, a rich source for biomedical research, have been successfully used to gain novel insight into a wide range of diseases. Despite its potential, EHR is currently underutilized for discovery research due to its major limitation in the lack of precise phenotype information. To overcome such difficulties, recent efforts have been devoted to developing supervised algorithms to accurately predict phenotypes based on relatively small training datasets with gold standard labels extracted via chart review. However, supervised methods typically require a sizable training set to yield generalizable algorithms, especially when the number of candidate features, p, is large. In this paper, we propose a semi-supervised (SS) EHR phenotyping method that borrows information from both a small, labeled dataset (where both the label Y and the feature set X are observed) and a much larger, weakly-labeled dataset in which the feature set X is accompanied only by a surrogate label S that is available to all patients. Under a working prior assumption that S is related to X only through Y and allowing it to hold approximately, we propose a prior adaptive semi-supervised (PASS) estimator that incorporates the prior knowledge by shrinking the estimator towards a direction derived under the prior. We derive asymptotic theory for the proposed estimator and justify its efficiency and robustness to prior information of poor quality. We also demonstrate its superiority over existing estimators under various scenarios via simulation studies and on three real-world EHR phenotyping studies at a large tertiary hospital.
Affiliation(s)
- Yichi Zhang
- Department of Computer Science and Statistics, University of Rhode Island
| | - Molei Liu
- Department of Biostatistics, Harvard T.H. Chan School of Public Health
| | - Matey Neykov
- Department of Statistics and Data Science, Carnegie Mellon University
| | - Tianxi Cai
- Department of Biostatistics, Harvard T.H. Chan School of Public Health
| |
|
49
|
Artificial Intelligence in Clinical Immunology. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_83] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
50
|
Liang L, Kim N, Hou J, Cai T, Dahal K, Lin C, Finan S, Savovoa G, Rosso M, Polgar-Tucsanyi M, Weiner H, Chitnis T, Cai T, Xia Z. Temporal trends of multiple sclerosis disease activity: Electronic health records indicators. Mult Scler Relat Disord 2022; 57:103333. [PMID: 35158446 PMCID: PMC8849591 DOI: 10.1016/j.msard.2021.103333] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/16/2021] [Revised: 10/03/2021] [Accepted: 10/14/2021] [Indexed: 01/03/2023]
Abstract
BACKGROUND Long-term data on multiple sclerosis (MS) inflammatory disease activity are limited. We examined electronic health records (EHR) indicators of disease activity in people with MS. METHODS We analyzed prospectively collected research registry data and linked EHR data in a clinic-based cohort from 2000 to 2016. We used the trend of the yearly incident relapse rate from the registry data as benchmark. We then calculated the temporal trends of potentially relevant EHR measures, including mean count of the MS diagnostic code, mentions of MS-related concepts, MS-related health utilizations and selected prescriptions. RESULTS 1,555 MS patients had both registry and EHR data. Between 2000 and 2016, the registry data showed a declining trend in the yearly incident relapse rate, parallel to an increasing trend of DMT usage. Among the EHR measures, covariate-adjusted frequency of diagnostic code of MS, procedure codes of MS-related imaging studies and emergency room visits, and electronic prescription for steroids declined over time, mirroring the temporal trend of the benchmark yearly incident relapse rate. CONCLUSION This study highlights EHR indicators of MS relapse that could enable large-scale examination of long-term disease activities or inform individual patient monitoring in clinical settings where EHR data are available.
Affiliation(s)
- Liang Liang
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | - Nicole Kim
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | - Jue Hou
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | - Tianrun Cai
- Division of Rheumatology, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
| | - Kumar Dahal
- Division of Rheumatology, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
| | - Chen Lin
- Clinical Natural Language Processing Program, Boston Children’s Hospital, Boston, MA, USA
| | - Sean Finan
- Clinical Natural Language Processing Program, Boston Children’s Hospital, Boston, MA, USA
| | - Guergana Savovoa
- Clinical Natural Language Processing Program, Boston Children’s Hospital, Boston, MA, USA
| | - Mattia Rosso
- Department of Neurology, Brigham and Women’s Hospital, Boston, MA, USA
| | | | - Howard Weiner
- Department of Neurology, Brigham and Women’s Hospital, Boston, MA, USA
| | - Tanuja Chitnis
- Department of Neurology, Brigham and Women’s Hospital, Boston, MA, USA
| | - Tianxi Cai
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Zongqi Xia
- Department of Neurology and Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
| |
|