1
Rashid R, Copelli S, Silverstein JC, Becich MJ. REDCap and the National Mesothelioma Virtual Bank-a scalable and sustainable model for rare disease biorepositories. J Am Med Inform Assoc 2023; 30:1634-1644. [PMID: 37487555 PMCID: PMC10531116 DOI: 10.1093/jamia/ocad132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 05/16/2023] [Accepted: 07/10/2023] [Indexed: 07/26/2023] Open
Abstract
OBJECTIVE Rare disease research requires data sharing networks to power translational studies. We describe novel use of Research Electronic Data Capture (REDCap), a web application for managing clinical data, by the National Mesothelioma Virtual Bank, a federated biospecimen and data sharing network. MATERIALS AND METHODS National Mesothelioma Virtual Bank (NMVB) uses REDCap to integrate honest broker activities, enabling biospecimen and associated clinical data provisioning to investigators. A Web Portal Query tool was developed to source and visualize REDCap data in interactive, faceted search, enabling cohort discovery by public users. An AWS Lambda function behind an API calculates the counts visually presented, while protecting record level data. The user-friendly interface, quick responsiveness, automatic generation from REDCap, and flexibility to new data were engineered to sustain the NMVB research community. RESULTS NMVB implementations enabled a network of 8 research institutions with over 2000 mesothelioma cases, including clinical annotations and biospecimens, and public users' cohort discovery and summary statistics. NMVB usage and impact are demonstrated by high website visits (>150 unique queries per month), resource use requests (>50 letters of interest), and citations (>900) to papers published using NMVB resources. DISCUSSION NMVB's REDCap implementation and query tool form a framework for implementing federated and integrated rare disease biobanks and registries. Advantages of this framework include being low-cost, modular, scalable, and efficient. Future advances to NMVB's implementations will include incorporation of -omics data and development of downstream analysis tools to advance mesothelioma and rare disease research. CONCLUSION NMVB presents a framework for integrating biobanks and patient registries to enable translational research for rare diseases.
Affiliation(s)
- Rumana Rashid
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
  - Medical Scientist Training Program, University of Pittsburgh-Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Susan Copelli
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
- Jonathan C Silverstein
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
- Michael J Becich
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
2
Pennington JW, Ruth B, Miller JM, Peterson J, Xu B, Masino A, Krantz I, Manganella J, Gomes T, Stiles D, Kenna M, Hood LJ, Germiller J, Crenshaw EB. Perspective on the Development of a Large-Scale Clinical Data Repository for Pediatric Hearing Research. Ear Hear 2021; 41:231-238. [PMID: 31408044 PMCID: PMC7007829 DOI: 10.1097/aud.0000000000000779] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/26/2022]
Abstract
The use of "big data" for pediatric hearing research requires new approaches to both data collection and research methods. The widespread deployment of electronic health record systems creates new opportunities and corresponding challenges in the secondary use of large volumes of audiological and medical data. Opportunities include cost-effective hypothesis generation, rapid cohort expansion for rare conditions, and observational studies based on sample sizes in the thousands to tens of thousands. Challenges include finding and forming appropriately skilled teams, access to data, data quality assessment, and engagement with a research community new to big data. The authors share their experience and perspective on the work required to build and validate a pediatric hearing research database that integrates clinical data for over 185,000 patients from the electronic health record systems of three major academic medical centers.
Affiliation(s)
- Jeffrey W. Pennington
  - Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Byron Ruth
  - Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Jeffrey M. Miller
  - Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Joy Peterson
  - Center for Childhood Communication, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Baichen Xu
  - Center for Childhood Communication, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Aaron Masino
  - Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Ian Krantz
  - Division of Human Genetics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - The Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Juliana Manganella
  - Department of Otolaryngology and Communication Enhancement, Boston Children's Hospital, Boston, MA, USA
- Tamar Gomes
  - Department of Otolaryngology and Communication Enhancement, Boston Children's Hospital, Boston, MA, USA
- Derek Stiles
  - Department of Otolaryngology and Communication Enhancement, Boston Children's Hospital, Boston, MA, USA
- Margaret Kenna
  - Department of Otolaryngology and Communication Enhancement, Boston Children's Hospital, Boston, MA, USA
- Linda J. Hood
  - Department of Hearing and Speech Sciences, Vanderbilt Bill Wilkerson Center, Vanderbilt University, Nashville, TN, USA
- John Germiller
  - Division of Otolaryngology, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Otorhinolaryngology: Head and Neck Surgery, The Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- E. Bryan Crenshaw
  - Center for Childhood Communication, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Otorhinolaryngology: Head and Neck Surgery, The Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
3
Gagalova KK, Leon Elizalde MA, Portales-Casamar E, Görges M. What You Need to Know Before Implementing a Clinical Research Data Warehouse: Comparative Review of Integrated Data Repositories in Health Care Institutions. JMIR Form Res 2020; 4:e17687. [PMID: 32852280 PMCID: PMC7484778 DOI: 10.2196/17687] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/03/2020] [Revised: 06/09/2020] [Accepted: 07/17/2020] [Indexed: 12/23/2022] Open
Abstract
Background Integrated data repositories (IDRs), also referred to as clinical data warehouses, are platforms used for the integration of several data sources through specialized analytical tools that facilitate data processing and analysis. IDRs offer several opportunities for clinical data reuse, and the number of institutions implementing an IDR has grown steadily in the past decade. Objective The architectural choices of major IDRs are highly diverse and determining their differences can be overwhelming. This review aims to explore the underlying models and common features of IDRs, provide a high-level overview for those entering the field, and propose a set of guiding principles for small- to medium-sized health institutions embarking on IDR implementation. Methods We reviewed manuscripts published in peer-reviewed scientific literature between 2008 and 2020, and selected those that specifically describe IDR architectures. Of 255 shortlisted articles, we found 34 articles describing 29 different architectures. The different IDRs were analyzed for common features and classified according to their data processing and integration solution choices. Results Despite common trends in the selection of standard terminologies and data models, the IDRs examined showed heterogeneity in the underlying architecture design. We identified 4 common architecture models that use different approaches for data processing and integration. These different approaches were driven by a variety of features such as data sources, whether the IDR was for a single institution or a collaborative project, the intended primary data user, and purpose (research-only or including clinical or operational decision making). Conclusions IDR implementations are diverse and complex undertakings, which benefit from being preceded by an evaluation of requirements and definition of scope in the early planning stage. Factors such as data source diversity and intended users of the IDR influence data flow and synchronization, both of which are crucial factors in IDR architecture planning.
Affiliation(s)
- Kristina K Gagalova
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada
  - Research Institute, BC Children's Hospital, Vancouver, BC, Canada
- M Angelica Leon Elizalde
  - Research Institute, BC Children's Hospital, Vancouver, BC, Canada
  - School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada
- Elodie Portales-Casamar
  - Research Institute, BC Children's Hospital, Vancouver, BC, Canada
  - Department of Pediatrics, University of British Columbia, Vancouver, BC, Canada
- Matthias Görges
  - Research Institute, BC Children's Hospital, Vancouver, BC, Canada
  - Department of Anesthesiology, Pharmacology and Therapeutics, University of British Columbia, Vancouver, BC, Canada
4
Trifan A, Oliveira JL. Patient data discovery platforms as enablers of biomedical and translational research: A systematic review. J Biomed Inform 2019; 93:103154. [PMID: 30922867 DOI: 10.1016/j.jbi.2019.103154] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/07/2018] [Revised: 03/15/2019] [Accepted: 03/18/2019] [Indexed: 11/28/2022]
Abstract
BACKGROUND The global shift from paper health records to electronic ones has led to an impressive growth of biomedical digital data over the past two decades. Exploring and extracting knowledge from these data has the potential to enhance translational research and lead to positive outcomes for the population's health and healthcare. OBJECTIVE The aim of this study was to conduct a systematic review to identify software platforms that enable discovery, secondary use and interoperability of biomedical data. Additionally, we aimed to evaluate the identified solutions in terms of clinical interest and main healthcare-related outcomes. METHODS A systematic search of the scientific literature published and indexed in PubMed between January 2014 and September 2018 was performed. Inclusion criteria were as follows: relevance for the topic of biomedical data discovery, English language, and free full text. To increase the recall, we developed a semi-automatic and incremental methodology to retrieve articles that cite one or more of the previous set. RESULTS A total of 500 candidate papers were retrieved through this methodology. Of these, 85 were eligible for abstract assessment. Finally, 37 studies qualified for a full-text review, and 20 provided enough information for the study objectives. CONCLUSIONS This study revealed that biomedical discovery platforms are both a current necessity and a significantly innovative agent in the area of healthcare. The outcomes that were identified, in terms of scientific publications, clinical studies and research collaborations, stand as evidence.
5
Dietrich G, Krebs J, Fette G, Ertl M, Kaspar M, Störk S, Puppe F. Ad Hoc Information Extraction for Clinical Data Warehouses. Methods Inf Med 2018; 57:e22-e29. [PMID: 29801178 PMCID: PMC6193399 DOI: 10.3414/me17-02-0010] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/27/2022]
Abstract
Background:
Clinical Data Warehouses (CDW) reuse electronic health records (EHR) to make their data retrievable for research purposes or patient recruitment for clinical trials. However, much information is hidden in unstructured data like discharge letters. These texts can be preprocessed and converted to structured data via information extraction (IE), which is unfortunately a laborious task and therefore usually not available for most of the text data in a CDW.
Objectives:
The goal of our work is to provide an ad hoc IE service that allows users to query text data ad hoc in a manner similar to querying structured data in a CDW. While search engines just return text snippets, our system also returns frequencies (e.g. how many patients exist with “heart failure” including textual synonyms or how many patients have an LVEF < 45) based on the content of discharge letters or textual reports for special investigations like heart echo. Three subtasks are addressed: (1) to recognize and to exclude negations and their scopes, (2) to extract concepts, i.e. Boolean values, and (3) to extract numerical values.
Methods:
We implemented an extended version of the NegEx algorithm for German texts that detects negations and determines their scope. Furthermore, our document-oriented CDW PaDaWaN was extended with query functions, e.g. context-sensitive queries and regex queries, and an extraction mode for computing the frequencies for Boolean and numerical values.
Results:
Evaluations in chest X-ray reports and in discharge letters showed high F1-scores for the three subtasks: Detection of negated concepts in chest X-ray reports with an F1-score of 0.99 and in discharge letters with 0.97; of Boolean values in chest X-ray reports about 0.99, and of numerical values in chest X-ray reports and discharge letters also around 0.99 with the exception of the concept age.
Discussion:
The advantages of an ad hoc IE over a standard IE are the low development effort (just entering the concept with its variants), the promptness of the results and the adaptability by the user to his or her particular question. Disadvantages are usually lower accuracy and confidence.
This ad hoc information extraction approach is novel and exceeds existing systems: Roogle [1] extracts predefined concepts from texts at preprocessing and makes them retrievable at runtime. Dr. Warehouse [2] applies negation detection and indexes the produced subtexts which include affirmed findings. Our approach combines negation detection and the extraction of concepts. However, the extraction does not take place during preprocessing, but at runtime. That provides an ad hoc, dynamic, interactive and adjustable information extraction of random concepts and even their values on the fly at runtime.
Conclusions:
We developed an ad hoc information extraction query feature for Boolean and numerical values within a CDW with high recall and precision based on a pipeline that detects and removes negations and their scope in clinical texts.
Affiliation(s)
- Georg Dietrich
  - Computer Science, University of Wuerzburg, Wuerzburg, Germany
  - Correspondence to: Georg Dietrich, University of Wuerzburg, Computer Science, Am Hubland, 97070 Wuerzburg, Germany
- Jonathan Krebs
  - Computer Science, University of Wuerzburg, Wuerzburg, Germany
- Georg Fette
  - Computer Science, University of Wuerzburg, Wuerzburg, Germany
  - Comprehensive Heart Failure Center (CHFC), University Hospital of Wuerzburg, Wuerzburg, Germany
- Maximilian Ertl
  - Service Center Medical Informatics, University Hospital of Wuerzburg, Wuerzburg, Germany
- Mathias Kaspar
  - Comprehensive Heart Failure Center (CHFC), University Hospital of Wuerzburg, Wuerzburg, Germany
- Stefan Störk
  - Comprehensive Heart Failure Center (CHFC), University Hospital of Wuerzburg, Wuerzburg, Germany
- Frank Puppe
  - Computer Science, University of Wuerzburg, Wuerzburg, Germany
6
Tao S, Cui L, Wu X, Zhang GQ. Facilitating Cohort Discovery by Enhancing Ontology Exploration, Query Management and Query Sharing for Large Clinical Data Repositories. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018; 2017:1685-1694. [PMID: 29854239 PMCID: PMC5977665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
To help researchers better access clinical data, we developed a prototype query engine called DataSphere for exploring large-scale integrated clinical data repositories. DataSphere expedites data importing using a NoSQL data management system and dynamically renders its user interface for concept-based querying tasks. DataSphere provides an interactive query-building interface together with query translation and optimization strategies, which enable users to build and execute queries effectively and efficiently. We successfully loaded a dataset of one million patients for University of Kentucky (UK) Healthcare into DataSphere with more than 300 million clinical data records. We evaluated DataSphere by comparing it with an instance of i2b2 deployed at UK Healthcare, demonstrating that DataSphere provides enhanced user experience for both query building and execution.
Affiliation(s)
- Shiqiang Tao
  - Institute for Biomedical Informatics, University of Kentucky, Lexington, KY
- Licong Cui
  - Computer Science Department, University of Kentucky, Lexington, KY
- Xi Wu
  - Computer Science Department, University of Kentucky, Lexington, KY
- Guo-Qiang Zhang
  - Institute for Biomedical Informatics, University of Kentucky, Lexington, KY
7
Williams R, Kontopantelis E, Buchan I, Peek N. Clinical code set engineering for reusing EHR data for research: A review. J Biomed Inform 2017; 70:1-13. [PMID: 28442434 DOI: 10.1016/j.jbi.2017.04.010] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 01/23/2017] [Revised: 03/21/2017] [Accepted: 04/13/2017] [Indexed: 01/26/2023]
Abstract
INTRODUCTION The construction of reliable, reusable clinical code sets is essential when re-using Electronic Health Record (EHR) data for research. Yet code set definitions are rarely transparent and their sharing is almost non-existent. There is a lack of methodological standards for the management (construction, sharing, revision and reuse) of clinical code sets, which needs to be addressed to ensure the reliability and credibility of studies which use code sets. OBJECTIVE To review methodological literature on the management of sets of clinical codes used in research on clinical databases and to provide a list of best practice recommendations for future studies and software tools. METHODS We performed an exhaustive search for methodological papers about clinical code set engineering for re-using EHR data in research. This was supplemented with papers identified by snowball sampling. In addition, a list of e-phenotyping systems was constructed by merging references from several systematic reviews on this topic, and the processes adopted by those systems for code set management were reviewed. RESULTS Thirty methodological papers were reviewed. Common approaches included: creating an initial list of synonyms for the condition of interest (n=20); making use of the hierarchical nature of coding terminologies during searching (n=23); reviewing sets with clinician input (n=20); and reusing and updating an existing code set (n=20). Several open source software tools (n=3) were discovered. DISCUSSION There is a need for software tools that enable users to easily and quickly create, revise, extend, review and share code sets and we provide a list of recommendations for their design and implementation. CONCLUSION Research re-using EHR data could be improved through the further development, more widespread use and routine reporting of the methods by which clinical codes were selected.
Affiliation(s)
- Richard Williams
  - MRC Health eResearch Centre, University of Manchester, Manchester, UK
  - NIHR Greater Manchester Primary Care Patient Safety Translational Research Centre, University of Manchester, Manchester, UK
- Evangelos Kontopantelis
  - MRC Health eResearch Centre, University of Manchester, Manchester, UK
  - NIHR School for Primary Care Research, University of Manchester, Manchester, UK
- Iain Buchan
  - MRC Health eResearch Centre, University of Manchester, Manchester, UK
  - NIHR Greater Manchester Primary Care Patient Safety Translational Research Centre, University of Manchester, Manchester, UK
  - NIHR Manchester Biomedical Research Centre, University of Manchester, Manchester, UK
- Niels Peek
  - MRC Health eResearch Centre, University of Manchester, Manchester, UK
  - NIHR Greater Manchester Primary Care Patient Safety Translational Research Centre, University of Manchester, Manchester, UK
8
Felmeister AS, Masino AJ, Rivera TJ, Resnick AC, Pennington JW. The biorepository portal toolkit: an honest brokered, modular service oriented software tool set for biospecimen-driven translational research. BMC Genomics 2016; 17 Suppl 4:434. [PMID: 27535360 PMCID: PMC5001241 DOI: 10.1186/s12864-016-2797-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND High throughput molecular sequencing and increased biospecimen variety have introduced significant informatics challenges for research biorepository infrastructures. We applied a modular system integration approach to develop an operational biorepository management system. This method enables aggregation of the clinical, specimen and genomic data collected for biorepository resources. METHODS We introduce an electronic Honest Broker (eHB) and Biorepository Portal (BRP) open source project that, in tandem, allow for data integration while protecting patient privacy. This modular approach allows data and specimens to be associated with a biorepository subject at any time point asynchronously. This lowers the bar to develop new research projects based on scientific merit without institutional review for a proposal. RESULTS By facilitating the automated de-identification of specimen and associated clinical and genomic data we create a future-proofed specimen set that can withstand new workflows and be connected to new associated information over time, thus facilitating collaborative advanced genomic and tissue research. CONCLUSIONS As of January 2016 there are 23 unique protocols/patient cohorts being managed in the Biorepository Portal (BRP). There are over 4000 unique subject records in the electronic honest broker (eHB), over 30,000 specimens accessioned and 8 institutions participating in various biobanking activities using this tool kit. We specifically set out to build rich annotation of biospecimens with longitudinal clinical data; BRP/REDCap integration for multi-institutional repositories; EMR integration; further annotated specimens with genomic data specific to a domain; build application hooks for experiments at the specimen level integrated with analytic software; while protecting privacy per the Office of Civil Rights (OCR) and HIPAA.
Affiliation(s)
- Alex S Felmeister
  - Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, 3401 Civic Center Blvd, Philadelphia, PA, USA
  - College of Computing and Informatics, Drexel University, 3141 Chestnut Street, Philadelphia, PA, USA
- Aaron J Masino
  - Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, 3401 Civic Center Blvd, Philadelphia, PA, USA
- Tyler J Rivera
  - Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, 3401 Civic Center Blvd, Philadelphia, PA, USA
- Adam C Resnick
  - Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, 3401 Civic Center Blvd, Philadelphia, PA, USA
  - Department of Neurosurgery, Perelman School of Medicine at the University of Pennsylvania, 3400 Civic Center Boulevard, Building 421, Philadelphia, PA, USA
- Jeffrey W Pennington
  - Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, 3401 Civic Center Blvd, Philadelphia, PA, USA
9
Badgeley MA, Shameer K, Glicksberg BS, Tomlinson MS, Levin MA, McCormick PJ, Kasarskis A, Reich DL, Dudley JT. EHDViz: clinical dashboard development using open-source technologies. BMJ Open 2016; 6:e010579. [PMID: 27013597 PMCID: PMC4809078 DOI: 10.1136/bmjopen-2015-010579] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Indexed: 11/06/2022] Open
Abstract
OBJECTIVE To design, develop and prototype clinical dashboards to integrate high-frequency health and wellness data streams using interactive and real-time data visualisation and analytics modalities. MATERIALS AND METHODS We developed a clinical dashboard development framework called electronic healthcare data visualization (EHDViz) toolkit for generating web-based, real-time clinical dashboards for visualising heterogeneous biomedical, healthcare and wellness data. The EHDViz is an extensible toolkit that uses R packages for data management, normalisation and producing high-quality visualisations over the web using R/Shiny web server architecture. We have developed use cases to illustrate utility of EHDViz in different scenarios of clinical and wellness settings as a visualisation aid for improving healthcare delivery. RESULTS Using EHDViz, we prototyped clinical dashboards to demonstrate the contextual versatility of EHDViz toolkit. An outpatient cohort was used to visualise population health management tasks (n=14,221), an inpatient cohort was used to visualise real-time acuity risk in a clinical unit (n=445), and a quantified-self example using wellness data from a fitness activity monitor worn by a single individual was also discussed (n-of-1). The back-end system retrieves relevant data from the data source, populates the main panel of the application, integrates user-defined data features in real-time and renders output using modern web browsers. The visualisation elements can be customised using health features, disease names, procedure names or medical codes to populate the visualisations. The source code of EHDViz and various prototypes developed using EHDViz are available in the public domain at http://ehdviz.dudleylab.org. CONCLUSIONS Collaborative data visualisations, wellness trend predictions, risk estimation, proactive acuity status monitoring and knowledge of complex disease indicators are essential components of implementing data-driven precision medicine. As an open-source visualisation framework capable of integrating health assessment, EHDViz aims to be a valuable toolkit for rapid design, development and implementation of scalable clinical data visualisation dashboards.
Affiliation(s)
- Marcus A Badgeley
  - Harris Center for Precision Wellness, Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
- Khader Shameer
  - Harris Center for Precision Wellness, Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
- Benjamin S Glicksberg
  - Harris Center for Precision Wellness, Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
- Max S Tomlinson
  - Harris Center for Precision Wellness, Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
- Matthew A Levin
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
  - Department of Anesthesiology, Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
- Patrick J McCormick
  - Department of Anesthesiology, Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
- Andrew Kasarskis
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
- David L Reich
  - Department of Anesthesiology, Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
- Joel T Dudley
  - Harris Center for Precision Wellness, Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
10
Xu J, Rasmussen LV, Shaw PL, Jiang G, Kiefer RC, Mo H, Pacheco JA, Speltz P, Zhu Q, Denny JC, Pathak J, Thompson WK, Montague E. Review and evaluation of electronic health records-driven phenotype algorithm authoring tools for clinical and translational research. J Am Med Inform Assoc 2015. [PMID: 26224336 DOI: 10.1093/jamia/ocv070] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Indexed: 12/30/2022] Open
Abstract
OBJECTIVE To review and evaluate available software tools for electronic health record-driven phenotype authoring in order to identify gaps and needs for future development. MATERIALS AND METHODS Candidate phenotype authoring tools were identified through (1) literature search in four publication databases (PubMed, Embase, Web of Science, and Scopus) and (2) a web search. A collection of tools was compiled and reviewed after the searches. A survey was designed and distributed to the developers of the reviewed tools to discover their functionalities and features. RESULTS Twenty-four different phenotype authoring tools were identified and reviewed. Developers of 16 of these identified tools completed the evaluation survey (67% response rate). The surveyed tools showed commonalities but also varied in their capabilities in algorithm representation, logic functions, data support and software extensibility, search functions, user interface, and data outputs. DISCUSSION Positive trends identified in the evaluation included: algorithms can be represented in both computable and human readable formats; and most tools offer a web interface for easy access. However, issues were also identified: many tools were lacking advanced logic functions for authoring complex algorithms; the ability to construct queries that leveraged un-structured data was not widely implemented; and many tools had limited support for plug-ins or external analytic software. CONCLUSIONS Existing phenotype authoring tools could enable clinical researchers to work with electronic health record data more efficiently, but gaps still exist in terms of the functionalities of such tools. The present work can serve as a reference point for the future development of similar tools.
Affiliation(s)
- Jie Xu
  - Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Luke V Rasmussen
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Pamela L Shaw
  - Galter Health Science Library, Clinical and Translational Sciences Institute (NUCATS), Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Guoqian Jiang
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Richard C Kiefer
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Huan Mo
  - Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA
- Jennifer A Pacheco
  - Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Peter Speltz
  - Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA
- Qian Zhu
  - Department of Information Systems, University of Maryland, Baltimore County (UMBC), Baltimore, MD, USA
- Joshua C Denny
  - Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA
- Jyotishman Pathak
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- William K Thompson
  - Center for Biomedical Research Informatics, NorthShore University Health System, Evanston, IL, USA
- Enid Montague
  - Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
|
11
|
McArt DG, Blayney JK, Boyle DP, Irwin GW, Moran M, Hutchinson RA, Bankhead P, Kieran D, Wang Y, Dunne PD, Kennedy RD, Mullan PB, Harkin DP, Catherwood MA, James JA, Salto-Tellez M, Hamilton PW. PICan: An integromics framework for dynamic cancer biomarker discovery. Mol Oncol 2015; 9:1234-40. [PMID: 25814194 DOI: 10.1016/j.molonc.2015.02.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 09/24/2014] [Revised: 12/23/2014] [Accepted: 02/05/2015] [Indexed: 02/05/2023]
Abstract
Modern cancer research on prognostic and predictive biomarkers demands the integration of established and emerging high-throughput technologies. However, these data are meaningless unless carefully integrated with patient clinical outcome and epidemiological information. Integrated datasets hold the key to discovering new biomarkers and therapeutic targets in cancer. We have developed a novel approach and set of methods for integrating and interrogating phenomic, genomic and clinical data sets to facilitate cancer biomarker discovery and patient stratification. Applied to a known paradigm, the biological and clinical relevance of TP53, PICan was able to recapitulate the known biomarker status and prognostic significance at the DNA, RNA and protein levels.
Affiliation(s)
- Darragh G McArt
  - Centre for Cancer Research and Cell Biology (CCRCB), Queen's University Belfast, Belfast, United Kingdom
- Jaine K Blayney
  - Centre for Cancer Research and Cell Biology (CCRCB), Queen's University Belfast, Belfast, United Kingdom
- David P Boyle
  - Centre for Cancer Research and Cell Biology (CCRCB), Queen's University Belfast, Belfast, United Kingdom
- Gareth W Irwin
  - Centre for Cancer Research and Cell Biology (CCRCB), Queen's University Belfast, Belfast, United Kingdom
- Michael Moran
  - Centre for Cancer Research and Cell Biology (CCRCB), Queen's University Belfast, Belfast, United Kingdom
- Ryan A Hutchinson
  - Centre for Cancer Research and Cell Biology (CCRCB), Queen's University Belfast, Belfast, United Kingdom
- Peter Bankhead
  - Centre for Cancer Research and Cell Biology (CCRCB), Queen's University Belfast, Belfast, United Kingdom
- Declan Kieran
  - Centre for Cancer Research and Cell Biology (CCRCB), Queen's University Belfast, Belfast, United Kingdom
- Yinhai Wang
  - Centre for Cancer Research and Cell Biology (CCRCB), Queen's University Belfast, Belfast, United Kingdom
- Philip D Dunne
  - Centre for Cancer Research and Cell Biology (CCRCB), Queen's University Belfast, Belfast, United Kingdom
- Richard D Kennedy
  - Centre for Cancer Research and Cell Biology (CCRCB), Queen's University Belfast, Belfast, United Kingdom
- Paul B Mullan
  - Centre for Cancer Research and Cell Biology (CCRCB), Queen's University Belfast, Belfast, United Kingdom
- D Paul Harkin
  - Centre for Cancer Research and Cell Biology (CCRCB), Queen's University Belfast, Belfast, United Kingdom
- Mark A Catherwood
  - Centre for Cancer Research and Cell Biology (CCRCB), Queen's University Belfast, Belfast, United Kingdom
- Jacqueline A James
  - Centre for Cancer Research and Cell Biology (CCRCB), Queen's University Belfast, Belfast, United Kingdom
- Manuel Salto-Tellez
  - Centre for Cancer Research and Cell Biology (CCRCB), Queen's University Belfast, Belfast, United Kingdom
- Peter W Hamilton
  - Centre for Cancer Research and Cell Biology (CCRCB), Queen's University Belfast, Belfast, United Kingdom
|
12
|
Dixit A, Dobson RJB. CohortExplorer: A Generic Application Programming Interface for Entity Attribute Value Database Schemas. JMIR Med Inform 2014; 2:e32. [PMID: 25601296 PMCID: PMC4288104 DOI: 10.2196/medinform.3339] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/03/2014] [Revised: 08/03/2014] [Accepted: 09/19/2014] [Indexed: 12/03/2022]
Abstract
BACKGROUND Most electronic data capture (EDC) and electronic data management (EDM) systems developed to collect and store clinical data from participants recruited into studies are based on generic entity-attribute-value (EAV) database schemas, which enable rapid and flexible deployment in a range of study designs. The drawback to such schemas is that they are cumbersome to query with structured query language (SQL). The problem increases when researchers involved in multiple studies use multiple electronic data capture and management systems, each with its own variation on the EAV schema. OBJECTIVE The aim of this study is to develop a generic application which allows easy and rapid exploration of data and metadata stored under EAV schemas that are organized into a survey format (questionnaires/events, questions, values), in other words, the Clinical Data Interchange Standards Consortium (CDISC) Observational Data Model (ODM). METHODS CohortExplorer is written in the Perl programming language and uses the concept of SQL abstract, which allows the SQL query to be treated like a hash (key-value pairs). RESULTS We have developed a tool, CohortExplorer, which once configured for an EAV system will "plug-n-play" with EAV schemas, enabling the easy construction of complex queries through an abstracted interface. To demonstrate the utility of the CohortExplorer system, we show how it can be used with two popular EAV-based frameworks, Opal (OBiBa) and REDCap. CONCLUSIONS The application is available under a GPL-3+ license at the CPAN website. Currently the application only provides datasource application programming interfaces (APIs) for Opal and REDCap. In the future the application will be available with datasource APIs for all major electronic data capture and management systems such as OpenClinica and LabKey. At present the application is only compatible with EAV systems where the metadata is organized into surveys, questionnaires and events. Further work is needed to make the application compatible with EAV schemas where the metadata is organized into hierarchies such as Informatics for Integrating Biology & the Bedside (i2b2). A video tutorial demonstrating the application setup, datasource configuration, and search features is available on YouTube. The application source code is available on GitHub, and users are encouraged to suggest new features and contribute to the development of APIs for new EAV systems.
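As an illustration of the problem this abstract describes, the following minimal Python sketch shows why an EAV layout is awkward to query directly and how a hash of key-value pairs can stand in for hand-written SQL. It is not CohortExplorer's code (the tool itself is written in Perl); the `eav` table, its columns, the sample rows, and the `build_demo_db` and `find_entities` helpers are all hypothetical and assumed only for this example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory EAV table: one row per (entity, attribute, value).

    The schema and rows are invented for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE eav (entity_id INTEGER, attribute TEXT, value TEXT)"
    )
    rows = [
        (1, "sex", "F"), (1, "age", "54"), (1, "diagnosis", "dementia"),
        (2, "sex", "M"), (2, "age", "61"), (2, "diagnosis", "dementia"),
        (3, "sex", "F"), (3, "age", "47"), (3, "diagnosis", "depression"),
    ]
    conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", rows)
    return conn


def find_entities(conn, conditions):
    """Return ids of entities matching every attribute == value pair.

    `conditions` is a plain dict, so the caller describes the query as
    key-value pairs instead of writing SQL against the EAV rows.
    """
    if not conditions:
        raise ValueError("at least one condition is required")
    # One (attribute, value) test per condition, combined with OR ...
    clause = " OR ".join("(attribute = ? AND value = ?)" for _ in conditions)
    params = [item for pair in conditions.items() for item in pair]
    # ... then keep only entities that satisfied all of the conditions.
    sql = (
        "SELECT entity_id FROM eav "
        f"WHERE {clause} "
        "GROUP BY entity_id "
        "HAVING COUNT(DISTINCT attribute) = ? "
        "ORDER BY entity_id"
    )
    return [row[0] for row in conn.execute(sql, params + [len(conditions)])]
```

Calling `find_entities(build_demo_db(), {"sex": "F", "diagnosis": "dementia"})` would select only the entity whose rows satisfy both pairs. The grouped query is one of several possible strategies; a self-join per attribute is a common alternative in EAV systems.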
Affiliation(s)
- Abhishek Dixit
  - Institute of Psychiatry, NIHR Biomedical Research Centre for Mental Health & Biomedical Research Unit for Dementia, South London and Maudsley NHS Foundation Trust & Institute of Psychiatry, King's College London, London, United Kingdom
|
13
|
Hanauer DA, Hruby GW, Fort DG, Rasmussen LV, Mendonça EA, Weng C. What Is Asked in Clinical Data Request Forms? A Multi-site Thematic Analysis of Forms Towards Better Data Access Support. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2014; 2014:616-25. [PMID: 25954367 PMCID: PMC4419980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Many academic medical centers have aggregated data from multiple clinical systems into centralized repositories. These repositories can then be queried by skilled data analysts who act as intermediaries between the data stores and the research teams. To obtain data, researchers are often expected to complete a data request form. Such forms are meant to support record-keeping and, most importantly, provide a means for conveying complex data needs in a clear and understandable manner. Yet little is known about how data request forms are constructed and how effective they are likely to be. We conducted a content analysis of ten data request forms from CTSA-supported institutions. We found that most of the forms over-emphasized the collection of metadata that were not considered germane to the actual data needs. Based on our findings, we provide recommendations to improve the quality of data request forms in support of clinical and translational research.
Affiliation(s)
- David A Hanauer
  - Dept. of Pediatrics, University of Michigan, Ann Arbor, MI; School of Information, University of Michigan, Ann Arbor, MI
- Gregory W Hruby
  - Dept. of Biomedical Informatics, Columbia University, New York, NY
- Daniel G Fort
  - Dept. of Biomedical Informatics, Columbia University, New York, NY
- Luke V Rasmussen
  - Dept. of Preventive Medicine, Northwestern University, Chicago, IL
- Eneida A Mendonça
  - Dept. of Pediatrics, University of Wisconsin, Madison, WI; Dept. of Biostatistics & Medical Informatics, University of Wisconsin, Madison, WI
- Chunhua Weng
  - Dept. of Biomedical Informatics, Columbia University, New York, NY
|
14
|
Secondary use of clinical data: the Vanderbilt approach. J Biomed Inform 2014; 52:28-35. [PMID: 24534443 DOI: 10.1016/j.jbi.2014.02.003] [Citation(s) in RCA: 168] [Impact Index Per Article: 16.8] [Received: 07/21/2013] [Revised: 12/21/2013] [Accepted: 02/04/2014] [Indexed: 01/04/2023]
Abstract
The last decade has seen exponential growth in the quantity of clinical data collected nationwide, triggering an increase in opportunities to reuse the data for biomedical research. The Vanderbilt research data warehouse framework consists of identified and de-identified clinical data repositories, fee-for-service custom services, and tools built atop the data layer to assist researchers across the enterprise. Providing resources dedicated to research initiatives benefits not only the research community, but also clinicians, patients and institutional leadership. This work provides a summary of our approach to the secondary use of clinical data for research, including a description of key components and a list of lessons learned, designed to assist others assembling similar services and infrastructure.
|