1
|
Grabowska ME, Van Driest SL, Robinson JR, Patrick AE, Guardo C, Gangireddy S, Ong HH, Feng Q, Carroll R, Kannankeril PJ, Wei WQ. Developing and evaluating pediatric phecodes (Peds-Phecodes) for high-throughput phenotyping using electronic health records. J Am Med Inform Assoc 2024; 31:386-395. [PMID: 38041473 PMCID: PMC10797257 DOI: 10.1093/jamia/ocad233] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2023] [Revised: 10/04/2023] [Accepted: 11/20/2023] [Indexed: 12/03/2023] Open
Abstract
OBJECTIVE Pediatric patients have different diseases and outcomes than adults; however, existing phecodes do not capture the distinctive pediatric spectrum of disease. We aim to develop specialized pediatric phecodes (Peds-Phecodes) to enable efficient, large-scale phenotypic analyses of pediatric patients. MATERIALS AND METHODS We adopted a hybrid data- and knowledge-driven approach leveraging electronic health records (EHRs) and genetic data from Vanderbilt University Medical Center to modify the most recent version of phecodes to better capture pediatric phenotypes. First, we compared the prevalence of patient diagnoses in pediatric and adult populations to identify disease phenotypes differentially affecting children and adults. We then used clinical domain knowledge to remove phecodes representing phenotypes unlikely to affect pediatric patients and create new phecodes for phenotypes relevant to the pediatric population. We further compared phenome-wide association study (PheWAS) outcomes replicating known pediatric genotype-phenotype associations between Peds-Phecodes and phecodes. RESULTS The Peds-Phecodes aggregate 15 533 ICD-9-CM codes and 82 949 ICD-10-CM codes into 2051 distinct phecodes. Peds-Phecodes replicated more known pediatric genotype-phenotype associations than phecodes (248 vs 192 out of 687 SNPs, P < .001). DISCUSSION We introduce Peds-Phecodes, a high-throughput EHR phenotyping tool tailored for use in pediatric populations. We successfully validated the Peds-Phecodes using genetic replication studies. Our findings also reveal the potential use of Peds-Phecodes in detecting novel genotype-phenotype associations for pediatric conditions. We expect that Peds-Phecodes will facilitate large-scale phenomic and genomic analyses in pediatric populations. CONCLUSION Peds-Phecodes capture higher-quality pediatric phenotypes and deliver superior PheWAS outcomes compared to phecodes.
Collapse
Affiliation(s)
- Monika E Grabowska
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| | - Sara L Van Driest
- Department of Pediatrics and the Center for Pediatric Precision Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, United States
| | - Jamie R Robinson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Department of Pediatric Surgery, Vanderbilt University Medical Center, Nashville, TN 37232, United States
| | - Anna E Patrick
- Department of Pediatrics and the Center for Pediatric Precision Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, United States
| | - Chris Guardo
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| | - Srushti Gangireddy
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| | - Henry H Ong
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| | - QiPing Feng
- Department of Medicine, Division of Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, TN 37232, United States
| | - Robert Carroll
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| | - Prince J Kannankeril
- Department of Pediatrics and the Center for Pediatric Precision Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, United States
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
| |
Collapse
|
2
|
Bastarache L, Delozier S, Pandit A, He J, Lewis A, Annis AC, LeFaive J, Denny JC, Carroll RJ, Altman RB, Hughey JJ, Zawistowski M, Peterson JF. The phenotype-genotype reference map: Improving biobank data science through replication. Am J Hum Genet 2023; 110:1522-1533. [PMID: 37607538 PMCID: PMC10502848 DOI: 10.1016/j.ajhg.2023.07.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2023] [Revised: 07/26/2023] [Accepted: 07/27/2023] [Indexed: 08/24/2023] Open
Abstract
Population-scale biobanks linked to electronic health record data provide vast opportunities to extend our knowledge of human genetics and discover new phenotype-genotype associations. Given their dense phenotype data, biobanks can also facilitate replication studies on a phenome-wide scale. Here, we introduce the phenotype-genotype reference map (PGRM), a set of 5,879 genetic associations from 523 GWAS publications that can be used for high-throughput replication experiments. PGRM phenotypes are standardized as phecodes, ensuring interoperability between biobanks. We applied the PGRM to five ancestry-specific cohorts from four independent biobanks and found evidence of robust replications across a wide array of phenotypes. We show how the PGRM can be used to detect data corruption and to empirically assess parameters for phenome-wide studies. Finally, we use the PGRM to explore factors associated with replicability of GWAS results.
Collapse
Affiliation(s)
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
| | - Sarah Delozier
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Anita Pandit
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
| | - Jing He
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Adam Lewis
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Aubrey C Annis
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
| | - Jonathon LeFaive
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
| | - Joshua C Denny
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Robert J Carroll
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Russ B Altman
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Jacob J Hughey
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Matthew Zawistowski
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
| | - Josh F Peterson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| |
Collapse
|
3
|
Abstract
Electronic health records (EHRs) are a rich source of data for researchers, but extracting meaningful information out of this highly complex data source is challenging. Phecodes represent one strategy for defining phenotypes for research using EHR data. They are a high-throughput phenotyping tool based on ICD (International Classification of Diseases) codes that can be used to rapidly define the case/control status of thousands of clinically meaningful diseases and conditions. Phecodes were originally developed to conduct phenome-wide association studies to scan for phenotypic associations with common genetic variants. Since then, phecodes have been used to support a wide range of EHR-based phenotyping methods, including the phenotype risk score. This review aims to comprehensively describe the development, validation, and applications of phecodes and suggest some future directions for phecodes and high-throughput phenotyping.
Collapse
Affiliation(s)
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA;
| |
Collapse
|