2
Beesley LJ, Salvatore M, Fritsche LG, Pandit A, Rao A, Brummett C, Willer CJ, Lisabeth LD, Mukherjee B. The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities. Stat Med 2020; 39:773-800. [PMID: 31859414 PMCID: PMC7983809 DOI: 10.1002/sim.8445] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 10/10/2018] [Revised: 09/10/2019] [Accepted: 11/16/2019] [Indexed: 01/03/2023]
Abstract
Biobanks linked to electronic health records provide rich resources for health-related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large-scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis-generating studies of disease-treatment, disease-exposure, and disease-gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank-based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank-based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.
Affiliation(s)
- Anita Pandit
  - University of Michigan, Department of Biostatistics
- Arvind Rao
  - University of Michigan, Department of Computational Medicine and Bioinformatics
- Chad Brummett
  - University of Michigan, Department of Anesthesiology
- Cristen J. Willer
  - University of Michigan, Department of Computational Medicine and Bioinformatics
3
Glicksberg BS, Oskotsky B, Thangaraj PM, Giangreco N, Badgeley MA, Johnson KW, Datta D, Rudrapatna VA, Rappoport N, Shervey MM, Miotto R, Goldstein TC, Rutenberg E, Frazier R, Lee N, Israni S, Larsen R, Percha B, Li L, Dudley JT, Tatonetti NP, Butte AJ. PatientExploreR: an extensible application for dynamic visualization of patient clinical history from electronic health records in the OMOP common data model. Bioinformatics 2019; 35:4515-4518. [PMID: 31214700 PMCID: PMC6821222 DOI: 10.1093/bioinformatics/btz409] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 10/16/2018] [Revised: 03/20/2019] [Accepted: 06/13/2019] [Indexed: 01/05/2023] Open
Abstract
MOTIVATION: Electronic health records (EHRs) are quickly becoming omnipresent in healthcare, but interoperability issues and technical demands limit their use for biomedical and clinical research. Interactive and flexible software that interfaces directly with EHR data structured around a common data model (CDM) could accelerate more EHR-based research by making the data more accessible to researchers who lack computational expertise and/or domain knowledge.
RESULTS: We present PatientExploreR, an extensible application built on the R/Shiny framework that interfaces with a relational database of EHR data in the Observational Medical Outcomes Partnership CDM format. PatientExploreR produces patient-level interactive and dynamic reports and facilitates visualization of clinical data without any programming required. It allows researchers to easily construct and export patient cohorts from the EHR for analysis with other software. This application could enable easier exploration of patient-level data for physicians and researchers. PatientExploreR can incorporate EHR data from any institution that employs the CDM for users with approved access. The software code is free and open source under the MIT license, enabling institutions to install and users to expand and modify the application for their own purposes.
AVAILABILITY AND IMPLEMENTATION: PatientExploreR can be freely obtained from GitHub: https://github.com/BenGlicksberg/PatientExploreR. We provide instructions for how researchers with approved access to their institutional EHR can use this package. We also release an open sandbox server of synthesized patient data for users without EHR access to explore: http://patientexplorer.ucsf.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Benjamin S Glicksberg
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Boris Oskotsky
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Phyllis M Thangaraj
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Department of Medicine, Columbia University, New York, NY, USA
- Nicholas Giangreco
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Department of Medicine, Columbia University, New York, NY, USA
- Marcus A Badgeley
  - Departments of Genomics and Data Science, Icahn Institute for Genomic Sciences and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Institute of Next Generation Healthcare, New York, NY, USA
- Kipp W Johnson
  - Departments of Genomics and Data Science, Icahn Institute for Genomic Sciences and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Institute of Next Generation Healthcare, New York, NY, USA
- Debajyoti Datta
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Vivek A Rudrapatna
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
  - Division of Gastroenterology, Department of Medicine, University of California, San Francisco, CA, USA
- Nadav Rappoport
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Mark M Shervey
  - Departments of Genomics and Data Science, Icahn Institute for Genomic Sciences and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Institute of Next Generation Healthcare, New York, NY, USA
- Riccardo Miotto
  - Departments of Genomics and Data Science, Icahn Institute for Genomic Sciences and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Institute of Next Generation Healthcare, New York, NY, USA
- Theodore C Goldstein
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Eugenia Rutenberg
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Remi Frazier
  - Enterprise Information and Analytics, University of California, San Francisco, San Francisco, CA, USA
- Nelson Lee
  - Enterprise Information and Analytics, University of California, San Francisco, San Francisco, CA, USA
- Sharat Israni
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Rick Larsen
  - Enterprise Information and Analytics, University of California, San Francisco, San Francisco, CA, USA
- Bethany Percha
  - Departments of Genomics and Data Science, Icahn Institute for Genomic Sciences and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Institute of Next Generation Healthcare, New York, NY, USA
- Li Li
  - Departments of Genomics and Data Science, Icahn Institute for Genomic Sciences and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Institute of Next Generation Healthcare, New York, NY, USA
- Joel T Dudley
  - Departments of Genomics and Data Science, Icahn Institute for Genomic Sciences and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Institute of Next Generation Healthcare, New York, NY, USA
- Nicholas P Tatonetti
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Department of Medicine, Columbia University, New York, NY, USA
- Atul J Butte
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
  - Center for Data-Driven Insights and Innovation, University of California Health, Oakland, CA, USA
4
Glicksberg BS, Oskotsky B, Giangreco N, Thangaraj PM, Rudrapatna V, Datta D, Frazier R, Lee N, Larsen R, Tatonetti NP, Butte AJ. ROMOP: a light-weight R package for interfacing with OMOP-formatted electronic health record data. JAMIA Open 2019; 2:10-14. [PMID: 31633087 PMCID: PMC6800657 DOI: 10.1093/jamiaopen/ooy059] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/03/2018] [Revised: 10/26/2018] [Accepted: 12/02/2018] [Indexed: 12/03/2022] Open
Abstract
OBJECTIVES: Electronic health record (EHR) data are increasingly used for biomedical discoveries. The nature of the data, however, requires expertise in both data science and EHR structure. The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) standardizes the language and structure of EHR data to promote interoperability of EHR data for research. While the OMOP CDM is valuable and more attuned to research purposes, it still requires extensive domain knowledge to utilize effectively, potentially limiting more widespread adoption of EHR data for research and quality improvement.
MATERIALS AND METHODS: We have created ROMOP: an R package for direct interfacing with EHR data in the OMOP CDM format.
RESULTS: ROMOP streamlines typical EHR-related data processes. Its functions include exploration of data types, extraction and summarization of patient clinical and demographic data, and patient searches using any CDM vocabulary concept.
CONCLUSION: ROMOP is freely available under the Massachusetts Institute of Technology (MIT) license and can be obtained from GitHub (http://github.com/BenGlicksberg/ROMOP). We detail instructions for setup and use in the Supplementary Materials. Additionally, we provide a public sandbox server containing synthesized clinical data for users to explore OMOP data and ROMOP (http://romop.ucsf.edu).
Affiliation(s)
- Benjamin S Glicksberg
  - Department of Pediatrics Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, California, USA
- Boris Oskotsky
  - Department of Pediatrics Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, California, USA
- Nicholas Giangreco
  - Departments of Biomedical Informatics, Systems Biology, and Medicine, Columbia University, New York, New York, USA
- Phyllis M Thangaraj
  - Departments of Biomedical Informatics, Systems Biology, and Medicine, Columbia University, New York, New York, USA
- Vivek Rudrapatna
  - Department of Pediatrics Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, California, USA
- Debajyoti Datta
  - Department of Pediatrics Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, California, USA
- Remi Frazier
  - Academic Research Systems, Department of Enterprise Data Warehouse University of California San Francisco, San Francisco, California, USA
- Nelson Lee
  - Academic Research Systems, Department of Enterprise Data Warehouse University of California San Francisco, San Francisco, California, USA
- Rick Larsen
  - Academic Research Systems, Department of Enterprise Data Warehouse University of California San Francisco, San Francisco, California, USA
- Nicholas P Tatonetti
  - Departments of Biomedical Informatics, Systems Biology, and Medicine, Columbia University, New York, New York, USA
- Atul J Butte
  - Department of Pediatrics Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, California, USA