1
Chuang YN, Tang R, Jiang X, Hu X. SPeC: A Soft Prompt-Based Calibration on Performance Variability of Large Language Model in Clinical Notes Summarization. J Biomed Inform 2024; 151:104606. [PMID: 38325698 DOI: 10.1016/j.jbi.2024.104606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 01/26/2024] [Accepted: 02/04/2024] [Indexed: 02/09/2024]
Abstract
Electronic health records (EHRs) store an extensive array of patient information, encompassing medical histories, diagnoses, treatments, and test outcomes. These records are crucial for enabling healthcare providers to make well-informed decisions regarding patient care. Summarizing clinical notes further assists healthcare professionals in pinpointing potential health risks and making better-informed decisions. This process contributes to reducing errors and enhancing patient outcomes by ensuring providers have access to the most pertinent and current patient data. Recent research has shown that incorporating instruction prompts with large language models (LLMs) substantially boosts the efficacy of summarization tasks. However, we show that this approach also leads to increased performance variance, resulting in significantly distinct summaries even when instruction prompts share similar meanings. To tackle this challenge, we introduce a model-agnostic Soft Prompt-Based Calibration (SPeC) pipeline that employs soft prompts to lower variance while preserving the advantages of prompt-based summarization. Experimental findings on multiple clinical note tasks and LLMs indicate that our method not only bolsters performance but also effectively regulates variance across different LLMs, providing a more consistent and reliable approach to summarizing critical medical information.
Affiliation(s)
- Ruixiang Tang
- Rice University, Houston, TX, United States of America
- Xiaoqian Jiang
- University of Texas Health Science Center, Houston, TX, United States of America
- Xia Hu
- Rice University, Houston, TX, United States of America.
2
Oniani D, Parmanto B, Saptono A, Bove A, Freburger J, Visweswaran S, Cappella N, McLay B, Silverstein JC, Becich MJ, Delitto A, Skidmore E, Wang Y. ReDWINE: A clinical datamart with text analytical capabilities to facilitate rehabilitation research. Int J Med Inform 2023; 177:105144. [PMID: 37459703 PMCID: PMC10528160 DOI: 10.1016/j.ijmedinf.2023.105144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 06/14/2023] [Accepted: 07/06/2023] [Indexed: 08/12/2023]
Abstract
Rehabilitation research focuses on determining the components of a treatment intervention, the mechanism of how these components lead to recovery and rehabilitation, and ultimately the optimal intervention strategies to maximize patients' physical, psychologic, and social functioning. Traditional randomized clinical trials that study and establish new interventions face challenges, such as high cost and time commitment. Observational studies that use existing clinical data to observe the effect of an intervention have shown several advantages over RCTs. Electronic Health Records (EHRs) have become an increasingly important resource for conducting observational studies. To support these studies, we developed a clinical research datamart, called ReDWINE (Rehabilitation Datamart With Informatics iNfrastructure for rEsearch), that transforms the rehabilitation-related EHR data collected from the UPMC health care system to the Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to facilitate rehabilitation research. The standardized EHR data stored in ReDWINE will further reduce the time and effort required by investigators to pool, harmonize, clean, and analyze data from multiple sources, leading to more robust and comprehensive research findings. ReDWINE also includes deployment of data visualization and data analytics tools to facilitate cohort definition and clinical data analysis. These include among others the Open Health Natural Language Processing (OHNLP) toolkit, a high-throughput NLP pipeline, to provide text analytical capabilities at scale in ReDWINE. Using this comprehensive representation of patient data in ReDWINE for rehabilitation research will facilitate real-world evidence for health interventions and outcomes.
Affiliation(s)
- David Oniani
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Bambang Parmanto
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Andi Saptono
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Allyn Bove
- Department of Physical Therapy, University of Pittsburgh, Pittsburgh, PA, USA
- Janet Freburger
- Department of Physical Therapy, University of Pittsburgh, Pittsburgh, PA, USA
- Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA; Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Nickie Cappella
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA; Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Brian McLay
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA; Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Jonathan C Silverstein
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA; Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Michael J Becich
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA; Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Anthony Delitto
- Department of Physical Therapy, University of Pittsburgh, Pittsburgh, PA, USA
- Elizabeth Skidmore
- Department of Occupational Therapy, University of Pittsburgh, Pittsburgh, PA, USA
- Yanshan Wang
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA; Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA; Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA.
3
Rogers JR, Pavisic J, Ta CN, Liu C, Soroush A, Cheung YK, Hripcsak G, Weng C. Leveraging electronic health record data for clinical trial planning by assessing eligibility criteria's impact on patient count and safety. J Biomed Inform 2022; 127:104032. [PMID: 35189334 PMCID: PMC8920749 DOI: 10.1016/j.jbi.2022.104032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Revised: 02/14/2022] [Accepted: 02/15/2022] [Indexed: 10/19/2022]
Abstract
OBJECTIVE To present an approach on using electronic health record (EHR) data that assesses how different eligibility criteria, either individually or in combination, can impact patient count and safety (exemplified by all-cause hospitalization risk) and further assist with criteria selection for prospective clinical trials. MATERIALS AND METHODS Trials in three disease domains - relapsed/refractory (r/r) lymphoma/leukemia; hepatitis C virus (HCV); stages 3 and 4 chronic kidney disease (CKD) - were analyzed as case studies for this approach. For each disease domain, criteria were identified and all criteria combinations were used to create EHR cohorts. Per combination, two values were derived: (1) number of eligible patients meeting the selected criteria; (2) hospitalization risk, measured as the hazard ratio between those that qualified and those that did not. From these values, k-means clustering was applied to derive which criteria combinations maximized patient counts but minimized hospitalization risk. RESULTS Criteria combinations that reduced hospitalization risk without substantial reductions on patient counts were as follows: for r/r lymphoma/leukemia (23 trials; 9 criteria; 623 patients), applying no infection and adequate absolute neutrophil count while forgoing no prior malignancy; for HCV (15; 7; 751), applying no human immunodeficiency virus and no hepatocellular carcinoma while forgoing no decompensated liver disease/cirrhosis; for CKD (10; 9; 23893), applying no congestive heart failure. CONCLUSIONS Within each disease domain, the more drastic effects were generally driven by a few criteria. Similar criteria across different disease domains introduce different changes. Although results are contingent on the trial sample and the EHR data used, this approach demonstrates how EHR data can inform the impact on safety and available patients when exploring different criteria combinations for designing clinical trials.
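The enumeration step in the methods above ("all criteria combinations were used to create EHR cohorts") can be illustrated with a minimal sketch. The patient records below are invented toy data, the criterion identifiers are hypothetical names loosely modeled on the lymphoma/leukemia criteria mentioned in the abstract, and the hazard-ratio and k-means steps are deliberately left out; this is not the authors' implementation.

```python
from itertools import combinations

# Hypothetical toy cohort: each record says whether a patient satisfies each criterion.
patients = [
    {"no_infection": True, "adequate_anc": True, "no_prior_malignancy": False},
    {"no_infection": True, "adequate_anc": False, "no_prior_malignancy": True},
    {"no_infection": True, "adequate_anc": True, "no_prior_malignancy": True},
    {"no_infection": False, "adequate_anc": True, "no_prior_malignancy": True},
]
criteria = ["no_infection", "adequate_anc", "no_prior_malignancy"]


def eligible_counts(patients, criteria):
    """Map every combination of criteria to the number of patients meeting all of them."""
    counts = {}
    for size in range(len(criteria) + 1):
        for combo in combinations(criteria, size):
            # A patient is eligible when every criterion in the combination holds.
            counts[combo] = sum(all(p[c] for c in combo) for p in patients)
    return counts


counts = eligible_counts(patients, criteria)
for combo, n in counts.items():
    print(combo or ("<no criteria>",), n)
```

With n criteria there are 2^n combinations, so exhaustive enumeration of this kind is only practical for small criteria sets such as the 7 to 9 criteria per domain reported in the abstract.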
Affiliation(s)
- James R. Rogers
- Department of Biomedical Informatics, Columbia University, New York, NY
- Jovana Pavisic
- Department of Pediatrics, Division of Pediatric Hematology, Oncology, and Stem Cell Transplantation, Columbia University Irving Medical Center, New York, NY
- Casey N. Ta
- Department of Biomedical Informatics, Columbia University, New York, NY
- Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY
- Ali Soroush
- Department of Biomedical Informatics, Columbia University, New York, NY; Division of Gastroenterology, Department of Medicine, Columbia University Irving Medical Center, New York, NY
- George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, NY; Medical Informatics Services, New York-Presbyterian Hospital, New York, NY
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, United States.
4
Taxter AJ, Natter MD. Using the Electronic Health Record to Enhance Care in Pediatric Rheumatology. Rheum Dis Clin North Am 2021; 48:245-258. [PMID: 34798950 DOI: 10.1016/j.rdc.2021.08.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
The electronic health record (EHR) ecosystem is undergoing rapid evolution in response to new rules and regulations promulgated by the US HITECH Act (2009) and the 21st Century Cures Act (2016), which together promote and support enhanced information use, access, exchange, as well as vendor-agnostic application development. By leveraging emerging new standards and technology for EHR data interchange, for example, FHIR and SMART, pediatric rheumatology clinical care, research, and quality improvement communities will have the opportunity to streamline documentation workflows, integrate patient-reported outcomes into clinical care, reuse clinical data for research purposes, and embed implementation science approaches within the EHR.
Affiliation(s)
- Alysha J Taxter
- Nationwide Children's Hospital, 700 Children's Drive, Columbus, OH 43205, USA.
- Marc D Natter
- Computational Health Informatics Program, Boston Children's Hospital, 300 Longwood Avenue BCH3187, Boston, MA 02115, USA; Mass General Hospital for Children, 55 Fruit Street, Boston, MA 02114, USA
5
Mazzotti DR. Landscape of biomedical informatics standards and terminologies for clinical sleep medicine research: A systematic review. Sleep Med Rev 2021; 60:101529. [PMID: 34455108 DOI: 10.1016/j.smrv.2021.101529] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/09/2020] [Revised: 05/14/2021] [Accepted: 07/03/2021] [Indexed: 12/31/2022]
Abstract
A systematic literature review was conducted to understand the current landscape of standards and terminologies used in clinical sleep medicine. Literature search on PubMed, EMBASE, Medline and Web of Science was performed in March 2021 using terms related to sleep, terminologies, standards, harmonization, semantics, ontology, and electronic health records (EHR). Systematic review was carried out according to PRISMA. Among 128 included studies, 35 were eligible for review. Articles were broadly classified into six topics: standard terminology efforts, reporting standards, databases and resources, data integration efforts, EHR abstraction and standards for automated sleep scoring. This review highlights the progress and challenges related to establishing computable terminologies in sleep medicine, and identifies gaps, limitations and research opportunities related to data integration that could improve adoption of clinical research informatics in this field. There is a need for the systematic adoption of standardized terminologies in all areas of sleep medicine. Existing data aggregation resources could be leveraged to support the development of an integrated infrastructure and subsequent deployment in EHR systems within sleep centers. Ultimately, the adoption of standardized practices for documenting sleep disorders and related traits facilitates data sharing, thus accelerating discovery and clinical translation of informatics approaches applied to sleep medicine.
Affiliation(s)
- Diego R Mazzotti
- Division of Medical Informatics, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS, USA.
6
Rogers JR, Hripcsak G, Cheung YK, Weng C. Clinical comparison between trial participants and potentially eligible patients using electronic health record data: A generalizability assessment method. J Biomed Inform 2021; 119:103822. [PMID: 34044156 DOI: 10.1016/j.jbi.2021.103822] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/04/2021] [Revised: 05/19/2021] [Accepted: 05/20/2021] [Indexed: 01/21/2023]
Abstract
OBJECTIVE To present a generalizability assessment method that compares baseline clinical characteristics of trial participants (TP) to potentially eligible (PE) patients as presented in their electronic health record (EHR) data while controlling for clinical setting and recruitment period. METHODS For each clinical trial, a clinical event was defined to identify patients of interest using available EHR data from one clinical setting during the trial's recruitment timeframe. The trial's eligibility criteria were then applied and patients were separated into two mutually exclusive groups: (1) TP, which were patients that participated in the trial per trial enrollment data; (2) PE, the remaining patients. The primary outcome was standardized differences in clinical characteristics between TP and PE per trial. A standardized difference was considered prominent if its absolute value was greater than or equal to 0.1. The secondary outcome was the difference in mean propensity scores (PS) between TP and PE per trial, in which the PS represented prediction for a patient to be in the trial. Three diverse trials were selected for illustration: one focused on hepatitis C virus (HCV) patients receiving a liver transplantation; one focused on leukemia patients and lymphoma patients; and one focused on appendicitis patients. RESULTS For the HCV trial, 43 TP and 83 PE were found, with 61 characteristics evaluated. Prominent differences were found among 69% of characteristics, with a mean PS difference of 0.13. For the leukemia/lymphoma trial, 23 TP and 23 PE were found, with 39 characteristics evaluated. Prominent differences were found among 82% of characteristics, with a mean PS difference of 0.76. For the appendicitis trial, 123 TP and 242 PE were found, with 52 characteristics evaluated. Prominent differences were found among 52% of characteristics, with a mean PS difference of 0.15. 
CONCLUSIONS Differences in clinical characteristics were observed between TP and PE among all three trials. In two of the three trials, not all of the differences necessarily compromised trial generalizability and subsets of PE could be considered similar to their corresponding TP. In the remaining trial, lack of generalizability appeared present, but may be a result of other factors such as small sample size or site recruitment strategy. These inconsistent findings suggest eligibility criteria alone are sometimes insufficient in defining a target group to generalize to. With caveats in limited scalability, EHR data quality, and lack of patient perspective on trial participation, this generalizability assessment method that incorporates control for temporality and clinical setting promises to better pinpoint clinical patterns and trial considerations.
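The abstract flags a standardized difference as prominent when its absolute value is at least 0.1 but does not give the formula. As a minimal sketch, the code below assumes the commonly used definition for a continuous characteristic: the difference in group means divided by the square root of the average of the two sample variances. The age values are invented for illustration, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(group_a, group_b):
    """Standardized mean difference for a continuous characteristic.

    Assumed definition: (mean_a - mean_b) / sqrt((var_a + var_b) / 2),
    with sample variances. Raises ZeroDivisionError if both variances are zero.
    """
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd


def is_prominent(d, threshold=0.1):
    """Apply the abstract's rule: prominent when |d| >= 0.1."""
    return abs(d) >= threshold


# Invented ages for trial participants (TP) and potentially eligible (PE) patients.
tp_ages = [50, 52, 54, 56, 58]
pe_ages = [60, 62, 64, 66, 68]

d = standardized_difference(tp_ages, pe_ages)
print(round(d, 3), is_prominent(d))
```

Binary characteristics would need the proportion-based variant of the formula, which is not shown here.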
Affiliation(s)
- James R Rogers
- Department of Biomedical Informatics, Columbia University, New York, NY, United States
- George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, NY, United States; Medical Informatics Services, New York-Presbyterian Hospital, New York, NY, United States
- Ying Kuen Cheung
- Department of Biostatistics, Columbia University, New York, NY, United States
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, United States.
7
Fu S, Leung LY, Raulli AO, Kallmes DF, Kinsman KA, Nelson KB, Clark MS, Luetmer PH, Kingsbury PR, Kent DM, Liu H. Assessment of the impact of EHR heterogeneity for clinical research through a case study of silent brain infarction. BMC Med Inform Decis Mak 2020; 20:60. [PMID: 32228556 PMCID: PMC7106829 DOI: 10.1186/s12911-020-1072-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/08/2019] [Accepted: 03/12/2020] [Indexed: 01/14/2023]
Abstract
Background The rapid adoption of electronic health records (EHRs) holds great promise for advancing medicine through practice-based knowledge discovery. However, the validity of EHR-based clinical research is questionable due to poor research reproducibility caused by the heterogeneity and complexity of healthcare institutions and EHR systems, the cross-disciplinary nature of the research team, and the lack of standard processes and best practices for conducting EHR-based clinical research. Method We developed a data abstraction framework to standardize the process for multi-site EHR-based clinical studies aiming to enhance research reproducibility. The framework was implemented for a multi-site EHR-based research project, the ESPRESSO project, with the goal to identify individuals with silent brain infarctions (SBI) at Tufts Medical Center (TMC) and Mayo Clinic. The heterogeneity of healthcare institutions, EHR systems, documentation, and process variation in case identification was assessed quantitatively and qualitatively. Result We discovered a significant variation in the patient populations, neuroimaging reporting, EHR systems, and abstraction processes across the two sites. The prevalence of SBI for patients over age 50 for TMC and Mayo is 7.4% and 12.5%, respectively. There is a variation regarding neuroimaging reporting, where TMC's reports are lengthy, standardized and descriptive while Mayo's reports are short and definitive with more textual variations. Furthermore, differences in the EHR system, technology infrastructure, and data collection process were identified. Conclusion The implementation of the framework identified the institutional and process variations and the heterogeneity of EHRs across the sites participating in the case study. The experiment demonstrates the necessity to have a standardized process for data abstraction when conducting EHR-based clinical studies.
Affiliation(s)
- Sunyang Fu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Lester Y Leung
- Department of Neurology, Tufts Medical Center, Boston, MA, USA
- Paul R Kingsbury
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- David M Kent
- Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, USA
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.
8
Vassy JL, Ho YL, Honerlaw J, Cho K, Gaziano JM, Wilson PWF, Gagnon DR. Yield and bias in defining a cohort study baseline from electronic health record data. J Biomed Inform 2018; 78:54-59. [PMID: 29305952 PMCID: PMC5846098 DOI: 10.1016/j.jbi.2017.12.017] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/30/2017] [Revised: 11/07/2017] [Accepted: 12/31/2017] [Indexed: 01/24/2023]
Abstract
AIMS Despite growing interest in using electronic health records (EHR) to create longitudinal cohort studies, the distribution and missingness of EHR data might introduce selection bias and information bias to such analyses. We aimed to examine the yield and potential for these healthcare process biases in defining a study baseline using EHR data, using the example of cholesterol and blood pressure (BP) measurements. METHODS We created a virtual cohort study of cardiovascular disease (CVD) from patients with eligible cholesterol profiles in the New England (NE) and Southeast (SE) networks of the Veterans Health Administration in the United States. Using clinical data from the EHR, we plotted the yield of patients with BP measurements within an expanding timeframe around an index date of cholesterol testing. We compared three groups: (1) patients with BP from the exact index date; (2) patients with BP not on the index date but within the network-specific 90th percentile around the index date; and (3) patients with no BP within the network-specific 90th percentile. RESULTS Among 589,361 total patients in the two networks, 146,636 (61.0%) of 240,479 patients from NE and 289,906 (83.1%) of 348,882 patients from SE had BP measurements on the index date. Ninety percent had BP measured within 11 days of the index date in NE and within 5 days of the index date in SE. Group 3 in both networks had fewer available race data, fewer comorbidities and CVD medications, and fewer health system encounters. CONCLUSIONS Requiring same-day risk factor measurement in the creation of a virtual CVD cohort study from EHR data might exclude 40% of eligible patients, but including patients with infrequent visits might introduce bias. Data visualization can inform study-specific strategies to address these challenges for the research use of EHR data.
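The yield analysis described above (share of patients with a BP measurement inside an expanding window around the index date) can be sketched as follows. This is an illustrative assumption of how such a yield curve might be computed, not the authors' code: the day offsets are invented, `None` marks a patient with no BP measurement, and yield is taken as a fraction of all patients in the list.

```python
def yield_within(offsets_days, window_days):
    """Fraction of patients whose BP measurement lies within +/- window_days of the index date.

    offsets_days holds one signed day offset per patient, or None when no BP was recorded.
    """
    hits = sum(1 for d in offsets_days if d is not None and abs(d) <= window_days)
    return hits / len(offsets_days)


def smallest_window(offsets_days, target=0.9):
    """Smallest observed window (in days) whose yield reaches the target, or None if it never does."""
    for window in sorted(abs(d) for d in offsets_days if d is not None):
        if yield_within(offsets_days, window) >= target:
            return window
    return None


# Invented offsets: six same-day measurements, three nearby ones, one patient with no BP.
offsets = [0, 0, 0, 0, 0, 0, -2, 3, 7, None]

print(yield_within(offsets, 0))       # same-day yield
print(smallest_window(offsets, 0.9))  # window needed to reach 90% yield
```

In this toy data the same-day yield is 60%, which mirrors the trade-off in the abstract: requiring same-day measurement drops a large share of patients, while widening the window recovers them at the risk of including patients with infrequent visits.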
Affiliation(s)
- Jason L Vassy
- VA Boston Healthcare System, Boston, MA, USA; Division of General Internal Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA.
- Yuk-Lam Ho
- VA Boston Healthcare System, Boston, MA, USA
- Kelly Cho
- VA Boston Healthcare System, Boston, MA, USA; Division of Aging, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
- J Michael Gaziano
- VA Boston Healthcare System, Boston, MA, USA; Division of Aging, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
- Peter W F Wilson
- Atlanta VA Medical Center, Atlanta, GA, USA; Emory University Schools of Medicine and Public Health, Atlanta, GA, USA
- David R Gagnon
- VA Boston Healthcare System, Boston, MA, USA; Boston University School of Public Health, Boston, MA, USA
9
Oliveira BM, Guimarães RV, Antunes L, Rodrigues PP. Sifting Through Chaos: Extracting Information from Unstructured Legal Opinions. Stud Health Technol Inform 2018; 247:441-445. [PMID: 29677999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Abiding by the law is, in some cases, a delicate balance between the rights of different players. Re-using health records is such a case. While the law grants reuse rights to public administration documents, in which health records produced in public health institutions are included, it also grants privacy to personal records. To safeguard a correct usage of data, public hospitals in Portugal employ jurists that are responsible for allowing or withholding access rights to health records. To help decision making, these jurists can consult the legal opinions issued by the national committee on public administration documents usage. While these legal opinions are of undeniable value, due to their doctrine contribution, they are only available in a format best suited for printing, forcing individual consultation of each document, with no option whatsoever of clustered search, filtering or indexing, which are standard operations nowadays in a document management system. When having to decide on tens of data requests a day, it becomes unfeasible to consult the hundreds of legal opinions already available. With the objective to create a modern document management system, we devised an open, platform-agnostic system that extracts and compiles the legal opinions, extracts their contents and produces metadata, allowing for fast searching and filtering of said legal opinions.
Affiliation(s)
- Rui Vasconcellos Guimarães
- MEDCIDS - Community Medicine, Health Information and Decision Department, Faculty of Medicine of the University of Porto, Portugal
- Luís Antunes
- INESC-TEC - Faculty of Sciences of the University of Porto, Portugal
10
Luo J, Chen W, Wu M, Weng C. Systematic data ingratiation of clinical trial recruitment locations for geographic-based query and visualization. Int J Med Inform 2017; 108:85-91. [PMID: 29132636 PMCID: PMC5866921 DOI: 10.1016/j.ijmedinf.2017.10.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2016] [Revised: 09/27/2017] [Accepted: 10/02/2017] [Indexed: 10/18/2022]
Abstract
BACKGROUND Prior studies of clinical trial planning indicate that it is crucial to search and screen recruitment sites before starting to enroll participants. However, currently there is no systematic method developed to support clinical investigators to search candidate recruitment sites according to their interested clinical trial factors. OBJECTIVE In this study, we aim at developing a new approach to integrating the location data of over one million heterogeneous recruitment sites that are stored in clinical trial documents. The integrated recruitment location data can be searched and visualized using a map-based information retrieval method. The method enables systematic search and analysis of recruitment sites across a large amount of clinical trials. METHODS The location data of more than 1.4 million recruitment sites of over 183,000 clinical trials was normalized and integrated using a geocoding method. The integrated data can be used to support geographic information retrieval of recruitment sites. Additionally, the information of over 6000 clinical trial target disease conditions and close to 4000 interventions was also integrated into the system and linked to the recruitment locations. Such data integration enabled the construction of a novel map-based query system. The system will allow clinical investigators to search and visualize candidate recruitment sites for clinical trials based on target conditions and interventions. RESULTS The evaluation results showed that the coverage of the geographic location mapping for the 1.4 million recruitment sites was 99.8%. The evaluation of 200 randomly retrieved recruitment sites showed that the correctness of geographic information mapping was 96.5%. The recruitment intensities of the top 30 countries were also retrieved and analyzed. The data analysis results indicated that the recruitment intensity varied significantly across different countries and geographic areas. 
CONCLUSION This study contributed a new data processing framework to extract and integrate the location data of heterogeneous recruitment sites from clinical trial documents. The developed system can support effective retrieval and analysis of potential recruitment sites using target clinical trial factors.
Affiliation(s)
- Jake Luo
- Department of Health Informatics and Administration, University of Wisconsin Milwaukee, Milwaukee, WI, United States; Biomedical Data and Language Processing Center, University of Wisconsin Milwaukee, Milwaukee, WI, United States
- Weiheng Chen
- Department of Health Informatics and Administration, University of Wisconsin Milwaukee, Milwaukee, WI, United States; Biomedical Data and Language Processing Center, University of Wisconsin Milwaukee, Milwaukee, WI, United States
- Min Wu
- Department of Health Informatics and Administration, University of Wisconsin Milwaukee, Milwaukee, WI, United States; Biomedical Data and Language Processing Center, University of Wisconsin Milwaukee, Milwaukee, WI, United States
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York City, NY, United States
11
Alonso-Calvo R, Paraiso-Medina S, Perez-Rey D, Alonso-Oset E, van Stiphout R, Yu S, Taylor M, Buffa F, Fernandez-Lozano C, Pazos A, Maojo V. A semantic interoperability approach to support integration of gene expression and clinical data in breast cancer. Comput Biol Med 2017; 87:179-186. [PMID: 28601027 DOI: 10.1016/j.compbiomed.2017.06.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 01/19/2017] [Revised: 05/30/2017] [Accepted: 06/02/2017] [Indexed: 11/19/2022]
Abstract
INTRODUCTION The introduction of omics data and advances in technologies involved in clinical treatment has led to a broad range of approaches to represent clinical information. Within this context, patient stratification across health institutions due to omic profiling presents a complex scenario to carry out multi-center clinical trials. METHODS This paper presents a standards-based approach to ensure semantic integration required to facilitate the analysis of clinico-genomic clinical trials. To ensure interoperability across different institutions, we have developed a Semantic Interoperability Layer (SIL) to facilitate homogeneous access to clinical and genetic information, based on different well-established biomedical standards and following International Health (IHE) recommendations. RESULTS The SIL has shown suitability for integrating biomedical knowledge and technologies to match the latest clinical advances in healthcare and the use of genomic information. This genomic data integration in the SIL has been tested with a diagnostic classifier tool that takes advantage of harmonized multi-center clinico-genomic data for training statistical predictive models. CONCLUSIONS The SIL has been adopted in national and international research initiatives, such as the EURECA-EU research project and the CIMED collaborative Spanish project, where the proposed solution has been applied and evaluated by clinical experts focused on clinico-genomic studies.
Affiliation(s)
- Raul Alonso-Calvo
  - Biomedical Informatics Group, DIA & DLSIIS, ETSI Informáticos, Universidad Politécnica de Madrid, Spain.
- Sergio Paraiso-Medina
  - Biomedical Informatics Group, DIA & DLSIIS, ETSI Informáticos, Universidad Politécnica de Madrid, Spain.
- David Perez-Rey
  - Biomedical Informatics Group, DIA & DLSIIS, ETSI Informáticos, Universidad Politécnica de Madrid, Spain.
- Enrique Alonso-Oset
  - Biomedical Informatics Group, DIA & DLSIIS, ETSI Informáticos, Universidad Politécnica de Madrid, Spain.
- Ruud van Stiphout
  - Department of Oncology, Old Road Campus Research Building, Oxford, OX3 7DQ, United Kingdom.
- Sheng Yu
  - Department of Oncology, Old Road Campus Research Building, Oxford, OX3 7DQ, United Kingdom.
- Marian Taylor
  - Department of Oncology, Old Road Campus Research Building, Oxford, OX3 7DQ, United Kingdom.
- Francesca Buffa
  - Department of Oncology, Old Road Campus Research Building, Oxford, OX3 7DQ, United Kingdom.
- Carlos Fernandez-Lozano
  - Department of Information and Communication Technologies, Faculty of Computer Science, University of A Coruna, 15071, A Coruña, Spain.
- Alejandro Pazos
  - Department of Information and Communication Technologies, Faculty of Computer Science, University of A Coruna, 15071, A Coruña, Spain.
- Victor Maojo
  - Biomedical Informatics Group, DIA & DLSIIS, ETSI Informáticos, Universidad Politécnica de Madrid, Spain.

12
Abstract
OBJECTIVES: To summarize significant developments in Clinical Research Informatics (CRI) over the past two years and discuss future directions. METHODS: Survey of advances, open problems and opportunities in this field based on exploration of current literature. RESULTS: Recent advances are structured according to three use cases of clinical research: protocol feasibility, patient identification/recruitment and clinical trial execution. DISCUSSION: CRI is an evolving, dynamic field of research. Global collaboration, open metadata, content standards with semantics and computable eligibility criteria are key success factors for future developments in CRI.
Affiliation(s)
- M Dugas
  - Prof. Dr. Martin Dugas, Institute of Medical Informatics, University of Münster, Albert-Schweitzer-Campus 1 - A11, D-48149 Münster, Germany, Tel: +49 251 83 55262

13
Yamamoto K, Ota K, Akiya I, Shintani A. A pragmatic method for transforming clinical research data from the research electronic data capture "REDCap" to Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model (SDTM): Development and evaluation of REDCap2SDTM. J Biomed Inform 2017; 70:65-76. [PMID: 28487263 DOI: 10.1016/j.jbi.2017.05.003] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 07/25/2016] [Revised: 04/29/2017] [Accepted: 05/04/2017] [Indexed: 10/19/2022]
Abstract
The Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model (SDTM) can be used for new drug application studies as well as secondarily for creating a clinical research data warehouse to leverage clinical research study data across studies conducted within the same disease area. However, currently not all clinical research uses Clinical Data Acquisition Standards Harmonization (CDASH) beginning in the set-up phase of the study. Once already initiated, clinical studies that have not utilized CDASH are difficult to map in the SDTM format. In addition, most electronic data capture (EDC) systems are not equipped to export data in SDTM format; therefore, in many cases, statistical software is used to generate SDTM datasets from accumulated clinical data. In order to facilitate efficient secondary use of accumulated clinical research data using SDTM, it is necessary to develop a new tool to enable mapping of information for SDTM, even during or after the clinical research. REDCap is an EDC system developed by Vanderbilt University and is used globally by over 2100 institutions across 108 countries. In this study, we developed a simulated clinical trial to evaluate a tool called REDCap2SDTM that maps information in the Field Annotation of REDCap to SDTM and executes data conversion, including when data must be pivoted to accommodate the SDTM format, dynamically, by parsing the mapping information using R. We confirmed that generating SDTM data and the define.xml file from REDCap using REDCap2SDTM was possible. Conventionally, generation of SDTM data and the define.xml file from EDC systems requires the creation of individual programs for each clinical study. However, our proposed method can be used to generate this data and file dynamically without programming because it only involves entering the mapping information into the Field Annotation, and additional data into specific files. 
Our proposed method is adaptable not only to new drug application studies but also to all types of research, including observational and public health studies. Our method is also adaptable to clinical data collected with CDASH at the beginning of a study in non-standard format. We believe that this tool will reduce the workload of new drug application studies and will support data sharing and reuse of clinical research data in academia.
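The pivoting step described in this abstract (reshaping one wide record per visit into one row per measurement) can be illustrated with a minimal sketch. This is not REDCap2SDTM, which the abstract says is driven by R and by mapping information stored in the REDCap Field Annotation. The source field names, the `field_mapping` dictionary, and the `pivot_to_findings` helper below are all hypothetical, and the output columns merely follow SDTM-style vital-signs variable naming.

```python
# Hypothetical illustration only: REDCap2SDTM itself uses R and reads its
# mapping from REDCap Field Annotations; none of these names come from it.

# One wide record per subject visit, as an EDC export might provide it.
wide_records = [
    {"subject_id": "001", "visit": "BASELINE", "sysbp": 120, "diabp": 80},
    {"subject_id": "002", "visit": "BASELINE", "sysbp": 135, "diabp": 85},
]

# Assumed mapping from source field name to (test code, unit).
field_mapping = {
    "sysbp": ("SYSBP", "mmHg"),
    "diabp": ("DIABP", "mmHg"),
}


def pivot_to_findings(records, mapping):
    """Turn one wide row per visit into one long row per measurement."""
    long_rows = []
    for record in records:
        for field, (test_code, unit) in mapping.items():
            if record.get(field) is None:
                continue  # skip measurements that were not collected
            long_rows.append({
                "USUBJID": record["subject_id"],
                "VISIT": record["visit"],
                "VSTESTCD": test_code,
                "VSORRES": record[field],
                "VSORRESU": unit,
            })
    return long_rows


findings = pivot_to_findings(wide_records, field_mapping)
for row in findings:
    print(row)
```

Keeping the mapping as data rather than code mirrors the idea in the abstract that a new study needs only new mapping entries, not a new conversion program.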
Affiliation(s)
- Keiichi Yamamoto
  - REDCap Group, Department of Medical Innovation, Osaka University Hospital, Osaka 565-0871, Japan.
- Keiko Ota
  - REDCap Group, Department of Medical Innovation, Osaka University Hospital, Osaka 565-0871, Japan
- Ayumi Shintani
  - REDCap Group, Department of Medical Innovation, Osaka University Hospital, Osaka 565-0871, Japan; Department of Clinical Epidemiology and Biostatistics, Graduate School of Medicine, Osaka University, Osaka 565-0871, Japan

14
Menti E, Lanera C, Lorenzoni G, Giachino DF, Marchi MD, Gregori D, Berchialla P. Bayesian Machine Learning Techniques for revealing complex interactions among genetic and clinical factors in association with extra-intestinal Manifestations in IBD patients. AMIA Annu Symp Proc 2017; 2016:884-893. [PMID: 28269885 PMCID: PMC5333221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
The objective of the study is to assess the predictive performance of three different techniques as classifiers for extra-intestinal manifestations in 152 patients with Crohn's disease. Naïve Bayes, Bayesian Additive Regression Trees and Bayesian Networks implemented using a Greedy Thick Thinning algorithm for learning dependencies among variables and EM algorithm for learning conditional probabilities associated to each variable are taken into account. Three sets of variables were considered: (i) disease characteristics: presentation, behavior and location (ii) risk factors: age, gender, smoke and familiarity and (iii) genetic polymorphisms of the NOD2, CD14, TNFA, IL12B, and IL1RN genes, whose involvement in Crohn's disease is known or suspected. Extra-intestinal manifestations occurred in 75 patients. Bayesian Networks achieved accuracy of 82% when considering only clinical factors and 89% when considering also genetic information, outperforming the other techniques. CD14 has a small predicting capability. Adding TNFA, IL12B to the 3020insC NOD2 variant improved the accuracy.
Affiliation(s)
- E Menti
  - Unit of Biostatistics, Epidemiology and Public Health, University of Padova, Italy
- C Lanera
  - Unit of Biostatistics, Epidemiology and Public Health, University of Padova, Italy
- G Lorenzoni
  - Unit of Biostatistics, Epidemiology and Public Health, University of Padova, Italy
- Daniela F Giachino
  - Medical Genetics Unit, Department of Clinical and Biological Sciences, University of Torino, Italy
- Mario De Marchi
  - Medical Genetics Unit, Department of Clinical and Biological Sciences, University of Torino, Italy
- Dario Gregori
  - Unit of Biostatistics, Epidemiology and Public Health, University of Padova, Italy
- Paola Berchialla
  - Medical Statistics Unit, Department of Clinical and Biological Sciences, University of Torino, Italy

15
Dunn WD, Cobb J, Levey AI, Gutman DA. REDLetr: Workflow and tools to support the migration of legacy clinical data capture systems to REDCap. Int J Med Inform 2016; 93:103-10. [PMID: 27396629 PMCID: PMC5452680 DOI: 10.1016/j.ijmedinf.2016.06.015] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 11/24/2015] [Revised: 06/21/2016] [Accepted: 06/26/2016] [Indexed: 11/20/2022]
Abstract
OBJECTIVE: A memory clinic at an academic medical center has relied on several ad hoc data capture systems including Microsoft Access and Excel for cognitive assessments over the last several years. However, these solutions are challenging to maintain and limit the potential of hypothesis-driven or longitudinal research. REDCap, a secure web application based on PHP and MySQL, is a practical solution for improving data capture and organization. Here, we present a workflow and toolset to facilitate legacy data migration and real-time clinical research data collection into REDCap as well as challenges encountered. MATERIALS AND METHODS: Legacy data consisted of neuropsychological tests stored in over 4000 Excel workbooks. Functions for data extraction, norm scoring, converting to REDCap-compatible formats, accessing the REDCap API, and clinical report generation were developed and executed in Python. RESULTS: Over 400 unique data points for each workbook were migrated and integrated into our REDCap database. Moving forward, our REDCap-based system replaces the Excel-based data collection method as well as eases the integration into the standard clinical research workflow and Electronic Health Record. CONCLUSION: In the age of growing data, efficient organization and storage of clinical and research data is critical for advancing research and providing efficient patient care. We believe that the workflow and tools described in this work to promote legacy data integration as well as real-time data collection into REDCap ultimately facilitate these goals.
Affiliation(s)
- William D Dunn
  - Department of Neurology, Emory University, Atlanta, GA, USA; Department of Biomedical Informatics, Emory University, Atlanta, GA, USA
- Jake Cobb
  - College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
- Allan I Levey
  - Department of Neurology, Emory University, Atlanta, GA, USA
- David A Gutman
  - Department of Neurology, Emory University, Atlanta, GA, USA; Department of Biomedical Informatics, Emory University, Atlanta, GA, USA.

16
Abstract
Translational bioinformatics and clinical research (biomedical) informatics are the primary domains related to informatics activities that support translational research. Translational bioinformatics focuses on computational techniques in genetics, molecular biology, and systems biology. Clinical research (biomedical) informatics involves the use of informatics in discovery and management of new knowledge relating to health and disease. This article details 3 projects that are hybrid applications of translational bioinformatics and clinical research (biomedical) informatics: The Cancer Genome Atlas, the cBioPortal for Cancer Genomics, and the Memorial Sloan Kettering Cancer Center clinical variants and results database, all designed to facilitate insights into cancer biology and clinical/therapeutic correlations.
Affiliation(s)
- S Joseph Sirintrapun
  - Department of Pathology, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA.
- Ahmet Zehir
  - Memorial Sloan Kettering Cancer Center, 417 East 68th Street, New York, NY 10065, USA
- Aijazuddin Syed
  - Memorial Sloan Kettering Cancer Center, 417 East 68th Street, New York, NY 10065, USA
- JianJiong Gao
  - Memorial Sloan Kettering Cancer Center, 417 East 68th Street, New York, NY 10065, USA
- Nikolaus Schultz
  - Memorial Sloan Kettering Cancer Center, 417 East 68th Street, New York, NY 10065, USA
- Donavan T Cheng
  - Memorial Sloan Kettering Cancer Center, 417 East 68th Street, New York, NY 10065, USA

17
Tien M, Kashyap R, Wilson GA, Hernandez-Torres V, Jacob AK, Schroeder DR, Mantilla CB. Retrospective Derivation and Validation of an Automated Electronic Search Algorithm to Identify Post Operative Cardiovascular and Thromboembolic Complications. Appl Clin Inform 2015; 6:565-76. [PMID: 26448798 DOI: 10.4338/aci-2015-03-ra-0026] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 03/10/2015] [Accepted: 07/28/2015] [Indexed: 12/15/2022]
Abstract
BACKGROUND: With increasing numbers of hospitals adopting electronic medical records, electronic search algorithms for identifying postoperative complications can be invaluable tools to expedite data abstraction and clinical research to improve patient outcomes. OBJECTIVES: To derive and validate an electronic search algorithm to identify postoperative thromboembolic and cardiovascular complications such as deep venous thrombosis, pulmonary embolism, or myocardial infarction within 30 days of total hip or knee arthroplasty. METHODS: A total of 34 517 patients undergoing total hip or knee arthroplasty between January 1, 1996 and December 31, 2013 were identified. Using a derivation cohort of 418 patients, several iterations of a free-text electronic search were developed and refined for each complication. Subsequently, the automated search algorithm was validated on an independent cohort of 2 857 patients, and its sensitivity and specificity were compared to the results of manual chart review. RESULTS: In the final derivation subset, the automated search algorithm achieved a sensitivity of 91% and specificity of 85% for deep vein thrombosis, a sensitivity of 96% and specificity of 100% for pulmonary embolism, and a sensitivity of 100% and specificity of 95% for myocardial infarction. When applied to the validation cohort, the search algorithm achieved a sensitivity of 97% and specificity of 99% for deep vein thrombosis, a sensitivity of 97% and specificity of 100% for pulmonary embolism, and a sensitivity of 100% and specificity of 99% for myocardial infarction. CONCLUSIONS: The derivation and validation of an electronic search strategy can accelerate the data abstraction process for research, quality improvement, and enhancement of patient care, while maintaining superb reliability compared to manual review.
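For readers unfamiliar with how sensitivity and specificity figures like those above are obtained, the following minimal sketch computes both from confusion-matrix counts, treating manual chart review as the reference standard. The function name and the counts are hypothetical illustrations, not data or code from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: reference-positive cases the search found / missed.
    tn/fp: reference-negative cases the search cleared / flagged.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of true complications detected
    specificity = tn / (tn + fp)  # share of non-cases correctly cleared
    return sensitivity, specificity


# Made-up counts for illustration only.
sens, spec = sensitivity_specificity(tp=29, fn=1, tn=95, fp=5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```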
Affiliation(s)
- M Tien
  - Mayo Clinic, College of Medicine, Rochester, MN, United States
- R Kashyap
  - Mayo Clinic, Department of Anesthesiology, Rochester, MN, United States
- G A Wilson
  - Mayo Clinic, Division of Pulmonary and Critical Care Medicine, Rochester, MN, United States
- V Hernandez-Torres
  - Mayo Clinic, Department of Anesthesiology, Rochester, MN, United States
- A K Jacob
  - Mayo Clinic, Department of Anesthesiology, Rochester, MN, United States
- D R Schroeder
  - Mayo Clinic, Health Sciences Research - Biomedical Statistics and Informatics, Rochester, MN, United States
- C B Mantilla
  - Mayo Clinic, Department of Anesthesiology, Rochester, MN, United States

18
Huser V, Sastry C, Breymaier M, Idriss A, Cimino JJ. Standardizing data exchange for clinical research protocols and case report forms: An assessment of the suitability of the Clinical Data Interchange Standards Consortium (CDISC) Operational Data Model (ODM). J Biomed Inform 2015; 57:88-99. [PMID: 26188274 DOI: 10.1016/j.jbi.2015.06.023] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 10/24/2014] [Revised: 03/27/2015] [Accepted: 06/26/2015] [Indexed: 01/27/2023]
Abstract
Efficient communication of a clinical study protocol and case report forms during all stages of a human clinical study is important for many stakeholders. An electronic and structured study representation format that can be used throughout the whole study life-span can improve such communication and potentially lower total study costs. The most relevant standard for representing clinical study data, applicable to unregulated as well as regulated studies, is the Operational Data Model (ODM) in development since 1999 by the Clinical Data Interchange Standards Consortium (CDISC). ODM's initial objective was exchange of case report forms data but it is increasingly utilized in other contexts. An ODM extension called Study Design Model, introduced in 2011, provides additional protocol representation elements. Using a case study approach, we evaluated ODM's ability to capture all necessary protocol elements during a complete clinical study lifecycle in the Intramural Research Program of the National Institutes of Health. ODM offers the advantage of a single format for institutions that deal with hundreds or thousands of concurrent clinical studies and maintain a data warehouse for these studies. For each study stage, we present a list of gaps in the ODM standard and identify necessary vendor or institutional extensions that can compensate for such gaps. The current version of ODM (1.3.2) has only partial support for study protocol and study registration data mainly because it is outside the original development goal. ODM provides comprehensive support for representation of case report forms (in both the design stage and with patient level data). Inclusion of requirements of observational, non-regulated or investigator-initiated studies (outside Food and Drug Administration (FDA) regulation) can further improve future revisions of the standard.

19
Leskošek B, Pajntar M. Lightweight application for generating clinical research information systems: MAGIC. Wien Klin Wochenschr 2015; 127 Suppl 5:S228-34. [PMID: 25994874 DOI: 10.1007/s00508-015-0794-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2013] [Accepted: 04/20/2015] [Indexed: 12/01/2022]
Abstract
BACKGROUND: Our purpose was to build and test a lightweight solution for generating clinical research information systems (CRIS) that would allow non-IT professionals with basic knowledge of computer usage to quickly define and build a ready-to-use, safe and secure web-based clinical research system for data management. We use the acronym MAGIC (Medical Application Generator InteraCtive) for the system. METHODS: The generated CRIS should be very easy to build and use, so a common LAMP (Linux Apache MySQL Perl) platform was used, which also enables short development cycles. The application was built and tested using eXtreme Programming (XP) principles by a small development team consisting of one informatics specialist, one physician and one graphical designer/programmer. RESULTS: The parameter and graphical user interface (GUI) definitions for the CRIS can be made by non-IT professionals using an intuitive English-language-like formalism called application definition language (ADL). From these definitions, MAGIC builds an end-user CRIS that can be used on a wide variety of platforms (from standard workstations to hand-held devices). A working example of a national health-care-quality assessment program is presented to illustrate this process. CONCLUSION: The lightweight application for generating CRIS (MAGIC) has proven to be useful for both clinical and analytical users in a real working environment. To achieve better performance and interoperability, we are planning to recompile the application using XML schemas (XSD) in HL7 CDA or openEHR archetypes formats used for parameters definition and for data interchange between different information systems.
Affiliation(s)
- Brane Leskošek
  - Faculty of Medicine, University of Maribor, Taborska ulica 8, 2000, Maribor, Slovenia; Faculty of Medicine, Institute for Biostatistics and Medical Informatics, University of Ljubljana, Vrazov trg 2, 1000, Ljubljana, Slovenia.
- Marjan Pajntar
  - Faculty of Medicine, University of Ljubljana, Vrazov trg 2, 1000, Ljubljana, Slovenia

21
Alonso-Calvo R, Perez-Rey D, Paraiso-Medina S, Claerhout B, Hennebert P, Bucur A. Enabling semantic interoperability in multi-centric clinical trials on breast cancer. Comput Methods Programs Biomed 2015; 118:322-329. [PMID: 25682737 DOI: 10.1016/j.cmpb.2015.01.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 08/22/2014] [Revised: 12/10/2014] [Accepted: 01/23/2015] [Indexed: 06/04/2023]
Abstract
BACKGROUND AND OBJECTIVES: Post-genomic clinical trials require the participation of multiple institutions and the collection of data from several hospitals, laboratories and research facilities. This paper presents a standard-based solution to provide a uniform access endpoint to patient data involved in current clinical research. METHODS: The proposed approach exploits well-established standards such as HL7 v3 or SPARQL and medical vocabularies such as SNOMED CT, LOINC and HGNC. A novel mechanism to exploit semantic normalization among HL7-based data models and biomedical ontologies has been created by using Semantic Web technologies. RESULTS: Different types of queries have been used for testing the semantic interoperability solution described in this paper. The execution times obtained in the tests enable the development of end user tools within a framework that requires efficient retrieval of integrated data. CONCLUSIONS: The proposed approach has been successfully tested by applications within the INTEGRATE and EURECA EU projects. These applications have been deployed and tested for: (i) patient screening, (ii) trial recruitment, and (iii) retrospective analysis; exploiting semantically interoperable access to clinical patient data from heterogeneous data sources.
Affiliation(s)
- Raul Alonso-Calvo
  - Biomedical Informatics Group, DLSIIS & DIA, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660 Boadilla del Monte, Madrid, Spain.
- David Perez-Rey
  - Biomedical Informatics Group, DLSIIS & DIA, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660 Boadilla del Monte, Madrid, Spain
- Sergio Paraiso-Medina
  - Biomedical Informatics Group, DLSIIS & DIA, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660 Boadilla del Monte, Madrid, Spain
- Brecht Claerhout
  - Custodix NV, Kortrijksesteenweg 214b3, Sint-Martens-Latem, Belgium
- Anca Bucur
  - PHILIPS Research Europe, High Tech Campus 34, Eindhoven, Netherlands

22
He S, Narus SP, Facelli JC, Lau LM, Botkin JR, Hurdle JF. A domain analysis model for eIRB systems: addressing the weak link in clinical research informatics. J Biomed Inform 2014; 52:121-9. [PMID: 24929181 PMCID: PMC4384433 DOI: 10.1016/j.jbi.2014.05.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 07/20/2013] [Revised: 02/20/2014] [Accepted: 05/06/2014] [Indexed: 10/25/2022]
Abstract
Institutional Review Boards (IRBs) are a critical component of clinical research and can become a significant bottleneck due to the dramatic increase in both the volume and complexity of clinical research. Despite the interest in developing clinical research informatics (CRI) systems and supporting data standards to increase clinical research efficiency and interoperability, informatics research in the IRB domain has not attracted much attention in the scientific community. The lack of standardized and structured application forms across different IRBs causes inefficient and inconsistent proposal reviews and cumbersome workflows. These issues are even more prominent in multi-institutional clinical research, which is rapidly becoming the norm. This paper proposes and evaluates a domain analysis model for electronic IRB (eIRB) systems, paving the way for streamlined clinical research workflow via integration with other CRI systems and improved IRB application throughput via computer-assisted decision support.
Affiliation(s)
- Shan He
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA.
- Scott P Narus
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Intermountain Medical Center, Intermountain Healthcare, Murray, UT, USA
- Julio C Facelli
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Lee Min Lau
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; 3M Health Information Systems, Murray, UT, USA
- Jefferey R Botkin
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- John F Hurdle
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA

23
Sim I, Tu SW, Carini S, Lehmann HP, Pollock BH, Peleg M, Wittkowski KM. The Ontology of Clinical Research (OCRe): an informatics foundation for the science of clinical research. J Biomed Inform 2013; 52:78-91. [PMID: 24239612 DOI: 10.1016/j.jbi.2013.11.002] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 07/10/2013] [Revised: 10/11/2013] [Accepted: 11/03/2013] [Indexed: 11/25/2022]
Abstract
To date, the scientific process for generating, interpreting, and applying knowledge has received less informatics attention than operational processes for conducting clinical studies. The activities of these scientific processes - the science of clinical research - are centered on the study protocol, which is the abstract representation of the scientific design of a clinical study. The Ontology of Clinical Research (OCRe) is an OWL 2 model of the entities and relationships of study design protocols for the purpose of computationally supporting the design and analysis of human studies. OCRe's modeling is independent of any specific study design or clinical domain. It includes a study design typology and a specialized module called ERGO Annotation for capturing the meaning of eligibility criteria. In this paper, we describe the key informatics use cases of each phase of a study's scientific lifecycle, present OCRe and the principles behind its modeling, and describe applications of OCRe and associated technologies to a range of clinical research use cases. OCRe captures the central semantics that underlies the scientific processes of clinical research and can serve as an informatics foundation for supporting the entire range of knowledge activities that constitute the science of clinical research.
Affiliation(s)
- Ida Sim
  - Department of Medicine, University of California, San Francisco, CA, United States.
- Samson W Tu
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States
- Simona Carini
  - Department of Medicine, University of California, San Francisco, CA, United States
- Harold P Lehmann
  - Division of Health Sciences Informatics, Johns Hopkins University, Baltimore, MD, United States
- Brad H Pollock
  - Department of Epidemiology and Biostatistics, University of Texas Health Science Center at San Antonio, San Antonio, TX, United States
- Mor Peleg
  - Department of Information Systems, University of Haifa, Haifa, Israel
- Knut M Wittkowski
  - Department of Research Design and Biostatistics, The Rockefeller University, New York, NY, United States

24
Patel VN, Kaelber DC. Using aggregated, de-identified electronic health record data for multivariate pharmacosurveillance: a case study of azathioprine. J Biomed Inform 2013; 52:36-42. [PMID: 24177317 DOI: 10.1016/j.jbi.2013.10.009] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 07/18/2013] [Revised: 09/30/2013] [Accepted: 10/22/2013] [Indexed: 10/26/2022]
Abstract
OBJECTIVE: To demonstrate the use of aggregated and de-identified electronic health record (EHR) data for multivariate post-marketing pharmacosurveillance in a case study of azathioprine (AZA). METHODS: Using aggregated, standardized, normalized, and de-identified, population-level data from the Explore platform (Explorys, Inc.) we searched over 10 million individuals, of which 14,580 were prescribed AZA based on RxNorm drug orders. Based on logical observation identifiers names and codes (LOINC) and vital sign data, we examined the following side effects: anemia, cell lysis, fever, hepatotoxicity, hypertension, nephrotoxicity, neutropenia, and neutrophilia. Patients prescribed AZA were compared to patients prescribed one of 11 other anti-rheumatologic drugs to determine the relative risk of side effect pairs. RESULTS: Compared to AZA case report trends, hepatotoxicity (marked by elevated transaminases or elevated bilirubin) did not occur as an isolated event more frequently in patients prescribed AZA than other anti-rheumatic agents. While neutropenia occurred in 24% of patients (RR 1.15, 95% CI 1.07-1.23), neutrophilia was also frequent (45%) and increased in patients prescribed AZA (RR 1.28, 95% CI 1.22-1.34). After constructing a pairwise side effect network, neutropenia had no dependencies. A reduced risk of neutropenia was found in patients with co-existing elevations in total bilirubin or liver transaminases, supporting classic clinical knowledge that agranulocytosis is a largely unpredictable phenomenon. Rounding errors propagated in the statistically de-identified datasets for cohorts as small as 40 patients only contributed marginally to the calculated risk.
CONCLUSION: Our work demonstrates that aggregated, standardized, normalized and de-identified population level EHR data can provide both sufficient insight and statistical power to detect potential patterns of medication side effect associations, serving as a multivariate and generalizable approach to post-marketing drug surveillance.
Affiliation(s)
- Vishal N Patel
- Center for Clinical Informatics Research and Education, The MetroHealth System, Case Western Reserve University, Cleveland, OH, United States; Center for Proteomics and Bioinformatics, The MetroHealth System, Case Western Reserve University, Cleveland, OH, United States.
- David C Kaelber
- Center for Clinical Informatics Research and Education, The MetroHealth System, Case Western Reserve University, Cleveland, OH, United States; Departments of Information Services, The MetroHealth System, Case Western Reserve University, Cleveland, OH, United States; Department of Internal Medicine, The MetroHealth System, Case Western Reserve University, Cleveland, OH, United States; Department of Pediatrics, The MetroHealth System, Case Western Reserve University, Cleveland, OH, United States; Departments of Epidemiology and Biostatistics, The MetroHealth System, Case Western Reserve University, Cleveland, OH, United States.
25
Weng C, Li Y, Berhe S, Boland MR, Gao J, Hruby GW, Steinman RC, Lopez-Jimenez C, Busacca L, Hripcsak G, Bakken S, Bigger JT. An Integrated Model for Patient Care and Clinical Trials (IMPACT) to support clinical research visit scheduling workflow for future learning health systems. J Biomed Inform 2013; 46:642-52. [PMID: 23684593 DOI: 10.1016/j.jbi.2013.05.001] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 02/15/2013] [Revised: 04/06/2013] [Accepted: 05/01/2013] [Indexed: 11/29/2022]
Abstract
We describe a clinical research visit scheduling system that can potentially coordinate clinical research visits with patient care visits and increase efficiency at clinical sites where clinical and research activities occur simultaneously. Participatory Design methods were applied to support requirements engineering and to create this software called Integrated Model for Patient Care and Clinical Trials (IMPACT). Using a multi-user constraint satisfaction and resource optimization algorithm, IMPACT automatically synthesizes temporal availability of various research resources and recommends the optimal dates and times for pending research visits. We conducted scenario-based evaluations with 10 clinical research coordinators (CRCs) from diverse clinical research settings to assess the usefulness, feasibility, and user acceptance of IMPACT. We obtained qualitative feedback using semi-structured interviews with the CRCs. Most CRCs acknowledged the usefulness of IMPACT features. Support for collaboration within research teams and interoperability with electronic health records and clinical trial management systems were highly requested features. Overall, IMPACT received satisfactory user acceptance and proves to be potentially useful for a variety of clinical research settings. Our future work includes comparing the effectiveness of IMPACT with that of existing scheduling solutions on the market and conducting field tests to formally assess user adoption.
Affiliation(s)
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
26
Payne P, Ervin D, Dhaval R, Borlawsky T, Lai A. TRIAD: The Translational Research Informatics and Data Management Grid. Appl Clin Inform 2011; 2:331-44. [PMID: 23616879 DOI: 10.4338/aci-2011-02-ra-0014] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 02/18/2011] [Accepted: 06/15/2011] [Indexed: 11/23/2022] Open
Abstract
OBJECTIVE: Multi-disciplinary and multi-site biomedical research programs frequently require infrastructures capable of enabling the collection, management, analysis, and dissemination of heterogeneous, multi-dimensional, and distributed data and knowledge collections spanning organizational boundaries. We report on the design and initial deployment of an extensible biomedical informatics platform that is intended to address such requirements.
METHODS: A common approach to distributed data, information, and knowledge management needs in the healthcare and life science settings is the deployment and use of a service-oriented architecture (SOA). Such SOA technologies provide for strongly-typed, semantically annotated, and stateful data and analytical services that can be combined into data and knowledge integration and analysis "pipelines." Using this overall design pattern, we have implemented and evaluated an extensible SOA platform for clinical and translational science applications known as the Translational Research Informatics and Data-management grid (TRIAD). TRIAD is a derivative and extension of the caGrid middleware and has an emphasis on supporting agile "working interoperability" between data, information, and knowledge resources.
RESULTS: Based upon initial verification and validation studies conducted in the context of a collection of driving clinical and translational research problems, we have been able to demonstrate that TRIAD achieves agile "working interoperability" between distributed data and knowledge sources.
CONCLUSION: Informed by our initial verification and validation studies, we believe TRIAD provides an example instance of a lightweight and readily adoptable approach to the use of SOA technologies in the clinical and translational research setting. Furthermore, our initial use cases illustrate the importance and efficacy of enabling "working interoperability" in heterogeneous biomedical environments.
Affiliation(s)
- P Payne
- The Ohio State University, Department of Biomedical Informatics, Center for IT Innovations in Healthcare, and Center for Clinical and Translational Science, Columbus, OH