1
Schweinar A, Wagner F, Klingner C, Festag S, Spreckelsen C, Brodoehl S. Simplifying Multimodal Clinical Research Data Management: Introducing an Integrated and User-friendly Database Concept. Appl Clin Inform 2024; 15:234-249. [PMID: 38301729 PMCID: PMC10972680 DOI: 10.1055/a-2259-0008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 11/22/2023] [Indexed: 02/03/2024]
Abstract
BACKGROUND Clinical research, particularly in scientific data, grapples with the efficient management of multimodal and longitudinal clinical data. Especially in neuroscience, the volume of heterogeneous longitudinal data challenges researchers. While current research data management systems offer rich functionality, they suffer from architectural complexity that makes them difficult to install and maintain and require extensive user training. OBJECTIVES The focus is the development and presentation of a data management approach specifically tailored for clinical researchers involved in active patient care, especially in the neuroscientific environment of German university hospitals. Our design considers the implementation of FAIR (Findable, Accessible, Interoperable, and Reusable) principles and the secure handling of sensitive data in compliance with the General Data Protection Regulation. METHODS We introduce a streamlined database concept, featuring an intuitive graphical interface built on Hypertext Markup Language revision 5 (HTML5)/Cascading Style Sheets (CSS) technology. The system can be effortlessly deployed within local networks, that is, in Microsoft Windows 10 environments. Our design incorporates FAIR principles for effective data management. Moreover, we have streamlined data interchange through established standards like HL7 Clinical Document Architecture (CDA). To ensure data integrity, we have integrated real-time validation mechanisms that cover data type, plausibility, and Clinical Quality Language logic during data import and entry. RESULTS We have developed and evaluated our concept with clinicians using a sample dataset of subjects who visited our memory clinic over a 3-year period and collected several multimodal clinical parameters. A notable advantage is the unified data matrix, which simplifies data aggregation, anonymization, and export. 
This streamlines data exchange and enhances database integration with platforms like Konstanz Information Miner (KNIME). CONCLUSION Our approach offers a significant advancement for capturing and managing clinical research data, specifically tailored for small-scale initiatives operating within limited information technology (IT) infrastructures. It is designed for immediate, hassle-free deployment by clinicians and researchers. The database template and precompiled versions of the user interface are available at: https://github.com/stebro01/research_database_sqlite_i2b2.git.
Affiliation(s)
- Anna Schweinar
  - Biomagnetic Center, University Hospital Jena, Friedrich Schiller University, Jena, Germany
  - Else Kröner Graduate School for Medical Students “JSAM,” Jena University Hospital, Jena, Germany
- Franziska Wagner
  - Biomagnetic Center, University Hospital Jena, Friedrich Schiller University, Jena, Germany
  - Department of Neurology, Jena University Hospital, Jena, Germany
- Carsten Klingner
  - Biomagnetic Center, University Hospital Jena, Friedrich Schiller University, Jena, Germany
  - Department of Neurology, Jena University Hospital, Jena, Germany
- Sven Festag
  - Institute of Medical Statistics, Computer and Data Sciences, Jena University Hospital, Jena, Thüringen, Germany
- Cord Spreckelsen
  - Institute of Medical Statistics, Computer and Data Sciences, Jena University Hospital, Jena, Thüringen, Germany
- Stefan Brodoehl
  - Biomagnetic Center, University Hospital Jena, Friedrich Schiller University, Jena, Germany
  - Department of Neurology, Jena University Hospital, Jena, Germany
2
Amirmahani F, Ebrahimi N, Molaei F, Faghihkhorasani F, Jamshidi Goharrizi K, Mirtaghi SM, Borjian‐Boroujeni M, Hamblin MR. Approaches for the integration of big data in translational medicine: single‐cell and computational methods. Ann N Y Acad Sci 2021; 1493:3-28. [DOI: 10.1111/nyas.14544] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/23/2020] [Revised: 10/31/2020] [Accepted: 11/12/2020] [Indexed: 12/11/2022]
Affiliation(s)
- Farzane Amirmahani
  - Genetics Division, Department of Cell and Molecular Biology and Microbiology, Faculty of Science and Technology, University of Isfahan, Isfahan, Iran
- Nasim Ebrahimi
  - Genetics Division, Department of Cell and Molecular Biology and Microbiology, Faculty of Science and Technology, University of Isfahan, Isfahan, Iran
- Fatemeh Molaei
  - Department of Anesthesiology, Faculty of Paramedical, Jahrom University of Medical Sciences, Jahrom, Iran
- Michael R. Hamblin
  - Laser Research Centre, Faculty of Health Science, University of Johannesburg, South Africa
3
Gu W, Yildirimman R, Van der Stuyft E, Verbeeck D, Herzinger S, Satagopam V, Barbosa-Silva A, Schneider R, Lange B, Lehrach H, Guo Y, Henderson D, Rowe A. Data and knowledge management in translational research: implementation of the eTRIKS platform for the IMI OncoTrack consortium. BMC Bioinformatics 2019; 20:164. [PMID: 30935364 PMCID: PMC6444691 DOI: 10.1186/s12859-019-2748-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/28/2018] [Accepted: 03/18/2019] [Indexed: 01/04/2023]
Abstract
Background For large international research consortia, such as those funded by the European Union’s Horizon 2020 programme or the Innovative Medicines Initiative, good data coordination practices and tools are essential for the successful collection, organization and analysis of the resulting data. Research consortia are attempting ever more ambitious science to better understand disease, by leveraging technologies such as whole genome sequencing, proteomics, patient-derived biological models and computer-based systems biology simulations. Results The IMI eTRIKS consortium is charged with the task of developing an integrated knowledge management platform capable of supporting the complexity of the data generated by such research programmes. In this paper, using the example of the OncoTrack consortium, we describe a typical use case in translational medicine. The tranSMART knowledge management platform was implemented to support data from observational clinical cohorts, drug response data from cell culture models and drug response data from mouse xenograft tumour models. The high dimensional (omics) data from the molecular analyses of the corresponding biological materials were linked to these collections, so that users could browse and analyse these to derive candidate biomarkers. Conclusions In all these steps, data mapping, linking and preparation are handled automatically by the tranSMART integration platform. Therefore, researchers without specialist data handling skills can focus directly on the scientific questions, without spending undue effort on processing the data and data integration, which are otherwise a burden and the most time-consuming part of translational research data analysis. Electronic supplementary material The online version of this article (10.1186/s12859-019-2748-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wei Gu
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Sascha Herzinger
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Venkata Satagopam
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Adriano Barbosa-Silva
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Reinhard Schneider
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Bodo Lange
  - Alacris Theranostics GmbH, Berlin, Germany
- Hans Lehrach
  - Alacris Theranostics GmbH, Berlin, Germany
  - Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Dahlem Centre for Genome Research and Medical Systems Biology, Berlin, Germany
- Yike Guo
  - Data Science Institute, Imperial College London, London, UK
- Anthony Rowe
  - Janssen Research and Development Ltd, High Wycombe, UK
4
Satagopam V, Gu W, Eifes S, Gawron P, Ostaszewski M, Gebel S, Barbosa-Silva A, Balling R, Schneider R. Integration and Visualization of Translational Medicine Data for Better Understanding of Human Diseases. Big Data 2016; 4:97-108. [PMID: 27441714 PMCID: PMC4932659 DOI: 10.1089/big.2015.0057] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Indexed: 06/06/2023]
Abstract
Translational medicine is a domain turning results of basic life science research into new tools and methods in a clinical environment, for example, as new diagnostics or therapies. Nowadays, the process of translation is supported by large amounts of heterogeneous data ranging from medical data to a whole range of -omics data. It is not only a great opportunity but also a great challenge, as translational medicine big data is difficult to integrate and analyze, and requires the involvement of biomedical experts for the data processing. We show here that visualization and interoperable workflows, combining multiple complex steps, can address at least parts of the challenge. In this article, we present an integrated workflow for exploration, analysis, and interpretation of translational medicine data in the context of human health. Three Web services (tranSMART, a Galaxy Server, and a MINERVA platform) are combined into one big data pipeline. Native visualization capabilities enable the biomedical experts to get a comprehensive overview and control over separate steps of the workflow. The capabilities of tranSMART enable a flexible filtering of multidimensional integrated data sets to create subsets suitable for downstream processing. A Galaxy Server offers visually aided construction of analytical pipelines, with the use of existing or custom components. A MINERVA platform supports the exploration of health and disease-related mechanisms in a contextualized analytical visualization system. We demonstrate the utility of our workflow by illustrating its subsequent steps using an existing data set, for which we propose a filtering scheme, an analytical pipeline, and a corresponding visualization of analytical results. The workflow is available as a sandbox environment, where readers can work with the described setup themselves.
Overall, our work shows how visualization and interfacing of big data processing services facilitate exploration, analysis, and interpretation of translational medicine data.
Affiliation(s)
- Venkata Satagopam
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
- Wei Gu
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
- Serge Eifes
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
  - Information Technology for Translational Medicine (ITTM) S.A., Esch-Belval, Luxembourg
- Piotr Gawron
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
- Marek Ostaszewski
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
- Stephan Gebel
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
- Adriano Barbosa-Silva
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
- Rudi Balling
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
- Reinhard Schneider
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
5
Meeker D, Jiang X, Matheny ME, Farcas C, D'Arcy M, Pearlman L, Nookala L, Day ME, Kim KK, Kim H, Boxwala A, El-Kareh R, Kuo GM, Resnic FS, Kesselman C, Ohno-Machado L. A system to build distributed multivariate models and manage disparate data sharing policies: implementation in the scalable national network for effectiveness research. J Am Med Inform Assoc 2015; 22:1187-95. [PMID: 26142423 PMCID: PMC4639714 DOI: 10.1093/jamia/ocv017] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 05/02/2014] [Accepted: 02/18/2015] [Indexed: 11/29/2022]
Abstract
Background Centralized and federated models for sharing data in research networks currently exist. To build multivariate data analysis for centralized networks, transfer of patient-level data to a central computation resource is necessary. The authors implemented distributed multivariate models for federated networks in which patient-level data is kept at each site and data exchange policies are managed in a study-centric manner. Objective The objective was to implement infrastructure that supports the functionality of some existing research networks (e.g., cohort discovery, workflow management, and estimation of multivariate analytic models on centralized data) while adding additional important new features, such as algorithms for distributed iterative multivariate models, a graphical interface for multivariate model specification, synchronous and asynchronous response to network queries, investigator-initiated studies, and study-based control of staff, protocols, and data sharing policies. Materials and Methods Based on the requirements gathered from statisticians, administrators, and investigators from multiple institutions, the authors developed infrastructure and tools to support multisite comparative effectiveness studies using web services for multivariate statistical estimation in the SCANNER federated network. Results The authors implemented massively parallel (map-reduce) computation methods and a new policy management system to enable each study initiated by network participants to define the ways in which data may be processed, managed, queried, and shared. The authors illustrated the use of these systems among institutions with highly different policies and operating under different state laws. Discussion and Conclusion Federated research networks need not limit distributed query functionality to count queries, cohort discovery, or independently estimated analytic models. 
Multivariate analyses can be efficiently and securely conducted without patient-level data transport, allowing institutions with strict local data storage requirements to participate in sophisticated analyses based on federated research networks.
Affiliation(s)
- Daniella Meeker
  - Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
  - Information Sciences Institute, University of Southern California, Marina Del Rey, CA
- Xiaoqian Jiang
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093
- Michael E Matheny
  - Geriatrics Research, Education, and Clinical Care Service Department of Biomedical Informatics, Division of General Internal Medicine, Department of Biostatistics
- Claudiu Farcas
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093
- Michel D'Arcy
  - Information Sciences Institute, University of Southern California, Marina Del Rey, CA
- Laura Pearlman
  - Information Sciences Institute, University of Southern California, Marina Del Rey, CA
- Michele E Day
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093
- Katherine K Kim
  - Department of Pathology and Laboratory Medicine and Department of Internal Medicine, University of California Davis, Sacramento, CA
- Hyeoneui Kim
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093
- Aziz Boxwala
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093
- Robert El-Kareh
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093
- Grace M Kuo
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego
- Carl Kesselman
  - Information Sciences Institute, University of Southern California, Marina Del Rey, CA
- Lucila Ohno-Machado
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093
6
Hazlehurst BL, Kurtz SE, Masica A, Stevens VJ, McBurnie MA, Puro JE, Vijayadeva V, Au DH, Brannon ED, Sittig DF. CER Hub: An informatics platform for conducting comparative effectiveness research using multi-institutional, heterogeneous, electronic clinical data. Int J Med Inform 2015; 84:763-73. [PMID: 26138036 DOI: 10.1016/j.ijmedinf.2015.06.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 12/10/2014] [Revised: 02/17/2015] [Accepted: 06/02/2015] [Indexed: 02/08/2023]
Abstract
OBJECTIVES Comparative effectiveness research (CER) requires the capture and analysis of data from disparate sources, often from a variety of institutions with diverse electronic health record (EHR) implementations. In this paper we describe the CER Hub, a web-based informatics platform for developing and conducting research studies that combine comprehensive electronic clinical data from multiple health care organizations. METHODS The CER Hub platform implements a data processing pipeline that employs informatics standards for data representation and web-based tools for developing study-specific data processing applications, providing standardized access to the patient-centric electronic health record (EHR) across organizations. RESULTS The CER Hub is being used to conduct two CER studies utilizing data from six geographically distributed and demographically diverse health systems. These foundational studies address the effectiveness of medications for controlling asthma and the effectiveness of smoking cessation services delivered in primary care. DISCUSSION The CER Hub includes four key capabilities: the ability to process and analyze both free-text and coded clinical data in the EHR; a data processing environment supported by distributed data and study governance processes; a clinical data-interchange format for facilitating standardized extraction of clinical data from EHRs; and a library of shareable clinical data processing applications. CONCLUSION CER requires coordinated and scalable methods for extracting, aggregating, and analyzing complex, multi-institutional clinical data. By offering a range of informatics tools integrated into a framework for conducting studies using EHR data, the CER Hub provides a solution to the challenges of multi-institutional research using electronic medical record data.
Affiliation(s)
- Brian L Hazlehurst
  - Kaiser Permanente Northwest, Center for Health Research, Portland, OR, USA
- Stephen E Kurtz
  - Kaiser Permanente Northwest, Center for Health Research, Portland, OR, USA
- Andrew Masica
  - Baylor Scott & White Health, Center for Clinical Effectiveness, Dallas, TX, USA
- Victor J Stevens
  - Kaiser Permanente Northwest, Center for Health Research, Portland, OR, USA
- Mary Ann McBurnie
  - Kaiser Permanente Northwest, Center for Health Research, Portland, OR, USA
- David H Au
  - VA Puget Sound Health Care System, Seattle, WA, USA
- Dean F Sittig
  - University of Texas Health Science Center, School of Biomedical Informatics, Houston, TX, USA
7
Rasmussen LV, Kiefer RC, Mo H, Speltz P, Thompson WK, Jiang G, Pacheco JA, Xu J, Zhu Q, Denny JC, Montague E, Pathak J. A Modular Architecture for Electronic Health Record-Driven Phenotyping. AMIA Joint Summits on Translational Science Proceedings 2015; 2015:147-51. [PMID: 26306258 PMCID: PMC4525215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Increasing interest in and experience with electronic health record (EHR)-driven phenotyping has yielded multiple challenges that are at present only partially addressed. Many solutions require the adoption of a single software platform, often with an additional cost of mapping existing patient and phenotypic data to multiple representations. We propose a set of guiding design principles and a modular software architecture to bridge the gap to a standardized phenotype representation, dissemination and execution. Ongoing development leveraging this proposed architecture has shown its ability to address existing limitations.
Affiliation(s)
- Huan Mo
  - Vanderbilt University, Nashville, TN
- Jie Xu
  - Northwestern University, Chicago, IL
- Qian Zhu
  - University of Maryland Baltimore County, Baltimore, MD
8
Frey LJ, Sward KA, Newth CJL, Khemani RG, Cryer ME, Thelen JL, Enriquez R, Shaoyu S, Pollack MM, Harrison RE, Meert KL, Berg RA, Wessel DL, Shanley TP, Dalton H, Carcillo J, Jenkins TL, Dean JM. Virtualization of open-source secure web services to support data exchange in a pediatric critical care research network. J Am Med Inform Assoc 2015; 22:1271-6. [PMID: 25796596 DOI: 10.1093/jamia/ocv009] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 11/04/2014] [Accepted: 01/21/2015] [Indexed: 11/13/2022]
Abstract
OBJECTIVES To examine the feasibility of deploying a virtual web service for sharing data within a research network, and to evaluate the impact on data consistency and quality. MATERIAL AND METHODS Virtual machines (VMs) encapsulated an open-source, semantically and syntactically interoperable secure web service infrastructure along with a shadow database. The VMs were deployed to 8 Collaborative Pediatric Critical Care Research Network Clinical Centers. RESULTS Virtual web services could be deployed in hours. The interoperability of the web services reduced format misalignment from 56% to 1% and demonstrated that 99% of the data consistently transferred using the data dictionary and 1% needed human curation. CONCLUSIONS Use of virtualized open-source secure web service technology could enable direct electronic abstraction of data from hospital databases for research purposes.
Affiliation(s)
- Lewis J Frey
  - Biomedical Informatics Center, Department Public Health Sciences, Medical University of South Carolina, Charleston, USA
- Katherine A Sward
  - College of Nursing; Department of Biomedical Informatics, University of Utah, Salt Lake City, USA
- Christopher J L Newth
  - USC Keck School of Medicine; Department of Anesthesiology and Critical Care Medicine, Children's Hospital Los Angeles, Los Angeles, USA
- Robinder G Khemani
  - USC Keck School of Medicine; Department of Anesthesiology and Critical Care Medicine, Children's Hospital Los Angeles, Los Angeles, USA
- Martin E Cryer
  - Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, USA
- Julie L Thelen
  - Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
- Rene Enriquez
  - Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
- Su Shaoyu
  - Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
- Murray M Pollack
  - Phoenix Children's Hospital, Department of Pediatrics, University of Arizona Phoenix, Phoenix, USA
- Rick E Harrison
  - Department of Pediatrics, University of California at Los Angeles, Los Angeles, USA
- Kathleen L Meert
  - Department of Pediatrics, Children's Hospital of Michigan, Detroit, USA
- Robert A Berg
  - Department of Anesthesiology and Critical Care, The Children's Hospital of Philadelphia, University of Pennsylvania Perelman School of Medicine, Philadelphia, USA
- David L Wessel
  - Department of Pediatrics, Children's National Medical Center, Washington, DC, USA
- Thomas P Shanley
  - Department of Pediatrics, University of Michigan, Ann Arbor, USA
- Heidi Dalton
  - Department of Child Health, Phoenix Children's Hospital, University of Arizona College of Medicine-Phoenix, Phoenix, USA
- Joseph Carcillo
  - Department of Critical Care Medicine, Children's Hospital of Pittsburgh, Pittsburgh, USA
- Tammara L Jenkins
  - Eunice Kennedy Shriver National Institutes of Child Health and Human Development (NICHD), National Institutes of Health, Bethesda, USA
- J Michael Dean
  - Department of Pediatrics, Division of Pediatric Critical Care Medicine, University of Utah School of Medicine; NICHD Collaborative Pediatric Critical Care Research Network, Salt Lake City, USA
9
Christoph J, Griebel L, Leb I, Engel I, Köpcke F, Toddenroth D, Prokosch HU, Laufer J, Marquardt K, Sedlmayr M. Secure Secondary Use of Clinical Data with Cloud-based NLP Services. Towards a Highly Scalable Research Infrastructure. Methods Inf Med 2014; 54:276-82. [PMID: 25377309 DOI: 10.3414/me13-01-0133] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 12/01/2013] [Accepted: 10/08/2014] [Indexed: 01/26/2023]
Abstract
OBJECTIVES The secondary use of clinical data provides large opportunities for clinical and translational research as well as quality assurance projects. For such purposes, it is necessary to provide a flexible and scalable infrastructure that is compliant with privacy requirements. The major goals of the cloud4health project are to define such an architecture, to implement a technical prototype that fulfills these requirements and to evaluate it with three use cases. METHODS The architecture provides components for multiple data provider sites such as hospitals to extract free text as well as structured data from local sources and de-identify such data for further anonymous or pseudonymous processing. Free text documentation is analyzed and transformed into structured information by text-mining services, which are provided within a cloud-computing environment. Thus, newly gained annotations can be integrated along with the already available structured data items and the resulting data sets can be uploaded to a central study portal for further analysis. RESULTS Based on the architecture design, a prototype has been implemented and is under evaluation in three clinical use cases. Data from several hundred patients provided by a University Hospital and a private hospital chain have already been processed. CONCLUSIONS Cloud4health has shown how existing components for secondary use of structured data can be complemented with text-mining in a privacy compliant manner. The cloud-computing paradigm allows a flexible and dynamically adaptable service provision that facilitates the adoption of services by data providers without own investments in respective hardware resources and software tools.
Affiliation(s)
- M Sedlmayr
  - Dr. Martin Sedlmayr, Lehrstuhl für Medizinische Informatik, Friedrich-Alexander-Universität Erlangen-Nürnberg, Wetterkreuz 13, 91058 Erlangen, Germany
10
Lin CH, Wu NY, Liou DM. A multi-technique approach to bridge electronic case report form design and data standard adoption. J Biomed Inform 2014; 53:49-57. [PMID: 25200473 DOI: 10.1016/j.jbi.2014.08.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 12/19/2013] [Revised: 08/22/2014] [Accepted: 08/30/2014] [Indexed: 10/24/2022]
Abstract
BACKGROUND AND OBJECTIVE The importance of data standards when integrating clinical research data has been recognized. The common data element (CDE) is a consensus-based data element for data harmonization and sharing between clinical researchers; it can support data standards adoption and mapping. However, the lack of a suitable methodology has become a barrier to data standard adoption. Our aim was to demonstrate an approach that allowed clinical researchers to design electronic case report forms (eCRFs) that complied with the data standard. METHODS We used a multi-technique approach, including information retrieval, natural language processing and an ontology-based knowledgebase to facilitate data standard adoption using the eCRF design. The approach took research questions as query texts with the aim of retrieving and associating relevant CDEs with the research questions. RESULTS The approach was implemented using a CDE-based eCRF builder, which was evaluated using CDE-related questions from CRFs used in the Parkinson Disease Biomarker Program, as well as CDE-unrelated questions from a technique support website. Our approach had a precision of 0.84, a recall of 0.80, an F-measure of 0.82 and an error of 0.31. Using the 303 testing CDE-related questions, our approach responded and provided suggested CDEs for 88.8% (269/303) of the study questions with a 90.3% accuracy (243/269). The reasons for any missed and failed responses were also analyzed. CONCLUSION This study demonstrates an approach that helps to cross the barrier that inhibits data standard adoption in eCRF building and our evaluation reveals the approach has satisfactory performance. Our CDE-based form builder provides an alternative perspective regarding data standard compliant eCRF design.
Affiliation(s)
- Ching-Heng Lin
  - Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Nai-Yuan Wu
  - Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Der-Ming Liou
  - Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
11
Payne PRO. Sustainability Through Technology Licensing and Commercialization: Lessons Learned from the TRIAD Project. EGEMS 2014; 2:1075. [PMID: 25848609 PMCID: PMC4371525 DOI: 10.13063/2327-9214.1075] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/24/2022]
Abstract
Ongoing transformation relative to the funding climate for healthcare research programs housed in academic and non-profit research organizations has led to a new (or renewed) emphasis on the pursuit of non-traditional sustainability models. This need is often particularly acute in the context of data management and sharing infrastructure that is developed under the auspices of such research initiatives. One option for achieving sustainability of such data management and sharing infrastructure is the pursuit of technology licensing and commercialization, in an effort to establish public-private or equivalent partnerships that sustain and even expand upon the development and dissemination of research-oriented data management and sharing technologies. However, the critical success factors for technology licensing and commercialization efforts are often unknown to individuals outside of the private sector, thus making this type of endeavor challenging to investigators in academic and non-profit settings. In response to such a gap in knowledge, this article will review a number of generalizable lessons learned from an effort undertaken at The Ohio State University to commercialize a prototypical research-oriented data management and sharing infrastructure, known as the Translational Research Informatics and Data Management (TRIAD) Grid. It is important to note that the specific emphasis of these lessons learned is on the early stages of moving a technology from the research setting into a private-sector entity and as such are particularly relevant to academic investigators interested in pursuing such activities.
|
12
|
Wade TD, Zelarney PT, Hum RC, McGee S, Batson DH. Using patient lists to add value to integrated data repositories. J Biomed Inform 2014; 52:72-7. [PMID: 24534444 DOI: 10.1016/j.jbi.2014.02.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/17/2013] [Revised: 12/20/2013] [Accepted: 02/04/2014] [Indexed: 01/16/2023]
Abstract
Patient lists are project-specific sets of patients that can be queried in integrated data repositories (IDR's). By allowing a set of patients to be an addition to the qualifying conditions of a query, returned results will refer to, and only to, that set of patients. We report a variety of use cases for such lists, including: restricting retrospective chart review to a defined set of patients; following a set of patients for practice management purposes; distributing "honest-brokered" (deidentified) data; adding phenotypes to biosamples; and enhancing the content of study or registry data. Among the capabilities needed to implement patient lists in an IDR are: capture of patient identifiers from a query and feedback of these into the IDR; the existence of a permanent internal identifier in the IDR that is mappable to external identifiers; the ability to add queryable attributes to the IDR; the ability to merge data from multiple queries; and suitable control over user access and de-identification of results. We implemented patient lists in a custom IDR of our own design. We reviewed capabilities of other published IDRs for focusing on sets of patients. The widely used i2b2 IDR platform has various ways to address patient sets, and it could be modified to add the low-overhead version of patient lists that we describe.
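The core mechanism in this abstract, adding a set of patients to the qualifying conditions of a query, can be illustrated with a minimal sketch. The table names (`observation`, `patient_list`), the list name `asthma_review`, and the sample values are hypothetical and are not taken from the authors' IDR; SQLite stands in for whatever database an actual repository uses.

```python
import sqlite3


def build_demo_idr() -> sqlite3.Connection:
    """Create a toy in-memory repository (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE observation (patient_id INTEGER, code TEXT, value REAL);
        CREATE TABLE patient_list (list_name TEXT, patient_id INTEGER);
        INSERT INTO observation VALUES
            (1, 'FEV1', 2.1), (2, 'FEV1', 3.4), (3, 'FEV1', 1.8), (3, 'BMI', 27.0);
        INSERT INTO patient_list VALUES
            ('asthma_review', 1), ('asthma_review', 3);
        """
    )
    return conn


def query_with_list(conn: sqlite3.Connection, code: str, list_name: str):
    """Return observations for `code`, restricted to members of `list_name`."""
    return conn.execute(
        """
        SELECT o.patient_id, o.value
        FROM observation AS o
        JOIN patient_list AS pl ON pl.patient_id = o.patient_id
        WHERE o.code = ? AND pl.list_name = ?
        ORDER BY o.patient_id
        """,
        (code, list_name),
    ).fetchall()


if __name__ == "__main__":
    idr = build_demo_idr()
    # Patient 2 has an FEV1 value but is not on the list, so it is excluded.
    print(query_with_list(idr, "FEV1", "asthma_review"))
```

Because the list is an ordinary join condition, the same query can be reused with a different list name without changing its other criteria.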
Affiliation(s)
- Ted D Wade
  - Division of Biostatistics and Bioinformatics, National Jewish Health, Denver, CO 80206, USA
- Pearlanne T Zelarney
  - Division of Biostatistics and Bioinformatics, National Jewish Health, Denver, CO 80206, USA
- Richard C Hum
  - Division of Biostatistics and Bioinformatics, National Jewish Health, Denver, CO 80206, USA
- Sylvia McGee
  - Division of Biostatistics and Bioinformatics, National Jewish Health, Denver, CO 80206, USA
- Deborah H Batson
  - Department of Research Informatics, Children's Hospital Colorado Research Institute, Aurora, CO 80045, USA
|
13
|
Abstract
The growing amount and availability of electronic health record (EHR) data present enhanced opportunities for discovering new knowledge about diseases. In the past decade, there has been an increasing number of data and text mining studies focused on the identification of disease associations (e.g., disease-disease, disease-drug, and disease-gene) in structured and unstructured EHR data. This chapter presents a knowledge discovery framework for mining the EHR for disease knowledge and describes each step for data selection, preprocessing, transformation, data mining, and interpretation/validation. Topics including natural language processing, standards, and data privacy and security are also discussed in the context of this framework.
Affiliation(s)
- Elizabeth S Chen
  - Center for Clinical and Translational Science, University of Vermont, Burlington, VT, USA
|
14
|
Wyatt MC, Hendrickson RC, Ames M, Bondy J, Ranauro P, English TM, Bobitt K, Davidson A, Houston TK, Embi PJ, Berner ES. Federated Aggregate Cohort Estimator (FACE): an easy to deploy, vendor neutral, multi-institutional cohort query architecture. J Biomed Inform 2013; 52:65-71. [PMID: 24316052 DOI: 10.1016/j.jbi.2013.11.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 08/05/2013] [Revised: 10/24/2013] [Accepted: 11/24/2013] [Indexed: 10/25/2022]
Abstract
Cross-institutional data sharing for cohort discovery is critical to enabling future research. While particularly useful in rare diseases, the ability to target enrollment and to determine if an institution has a sufficient number of patients is valuable in all research, particularly in the initiation of projects and collaborations. An optimal technology solution would work with any source database with minimal resource investment for deployment and would meet all necessary security and confidentiality requirements of participating organizations. We describe a platform-neutral reference implementation to meet these requirements: the Federated Aggregate Cohort Estimator (FACE). FACE was developed and implemented through a collaboration of The University of Alabama at Birmingham (UAB), The Ohio State University (OSU), the University of Massachusetts Medical School (UMMS), and the Denver Health and Hospital Authority (DHHA), a clinical affiliate of the Colorado Clinical and Translational Sciences Institute. The reference implementation of FACE federated diverse SQL data sources and an i2b2 instance to estimate combined research subject availability from three institutions. It used easily-deployed virtual machines and addressed privacy and security concerns for data sharing.
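The general idea of federated aggregate cohort estimation can be sketched as follows: each site evaluates the cohort criterion locally and returns only a count, and a coordinator sums the counts. This is an illustrative sketch, not the FACE implementation; the function names, record fields, and site labels are hypothetical, and in-memory lists stand in for the SQL and i2b2 sources the abstract mentions.

```python
from typing import Callable, Dict, List, Tuple

Patient = Dict[str, object]
Criterion = Callable[[Patient], bool]


def site_count(records: List[Patient], criterion: Criterion) -> int:
    """Run at a site: count matching patients; no row-level data is returned."""
    return sum(1 for record in records if criterion(record))


def federated_estimate(
    sites: Dict[str, List[Patient]], criterion: Criterion
) -> Tuple[Dict[str, int], int]:
    """Run at the coordinator: collect per-site counts and their total."""
    per_site = {name: site_count(records, criterion) for name, records in sites.items()}
    return per_site, sum(per_site.values())


if __name__ == "__main__":
    # Hypothetical site data; a real deployment would query each site's database.
    demo_sites = {
        "site_a": [{"age": 70, "dx": "X"}, {"age": 40, "dx": "X"}],
        "site_b": [{"age": 66, "dx": "X"}, {"age": 80, "dx": "Y"}],
    }
    eligible = lambda p: p["age"] >= 65 and p["dx"] == "X"
    print(federated_estimate(demo_sites, eligible))
```

Only aggregate counts cross the site boundary in this sketch; authentication, transport, and any disclosure controls a real network would need are out of scope here.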
Affiliation(s)
- Matthew C Wyatt
  - Biomedical Informatics, Center for Clinical and Translational Science, University of Alabama at Birmingham, Suite 175 Sparks Building, 1720 7th Avenue South, Birmingham, AL 35233, United States
- R Curtis Hendrickson
  - Biomedical Informatics, Center for Clinical and Translational Science, University of Alabama at Birmingham, Suite 175 Sparks Building, 1720 7th Avenue South, Birmingham, AL 35233, United States
- Michael Ames
  - University of Colorado Cancer Center, 13001 E. 17th Place, Aurora, CO 80045, United States
- Jessica Bondy
  - University of Colorado Cancer Center, 13001 E. 17th Place, Aurora, CO 80045, United States
- Paul Ranauro
  - Research Computing Services, University of Massachusetts Medical School, 55 Lake Avenue North, Worcester, MA 01655-0002, United States
- Thomas M English
  - Department of Quantitative Health Sciences, University of Massachusetts Medical School, AS8-1063, 368 Plantation Street, Worcester, MA 01605, United States
- Keith Bobitt
  - Biomedical Informatics, Center for Clinical and Translational Science, University of Alabama at Birmingham, Suite 175 Sparks Building, 1720 7th Avenue South, Birmingham, AL 35233, United States
- Arthur Davidson
  - Denver Public Health, Denver Health, 605 Bannock Street, Denver, CO 80204, United States
- Thomas K Houston
  - Department of Quantitative Health Sciences, University of Massachusetts Medical School, 55 Lake Avenue North, Worcester, MA 01655-0002, United States
- Peter J Embi
  - Department of Biomedical Informatics, The Ohio State University College of Medicine, 3190 Graves Hall, 333 W. Tenth Avenue, Columbus, OH 43210, United States
- Eta S Berner
  - Health Informatics Program, Department of Health Services Administration, University of Alabama at Birmingham, 1705 University Blvd #590J, Birmingham, AL 35294-1212, United States
|
15
|
Schilling LM, Kwan BM, Drolshagen CT, Hosokawa PW, Brandt E, Pace WD, Uhrich C, Kamerick M, Bunting A, Payne PRO, Stephens WE, George JM, Vance M, Giacomini K, Braddy J, Green MK, Kahn MG. Scalable Architecture for Federated Translational Inquiries Network (SAFTINet) Technology Infrastructure for a Distributed Data Network. EGEMS 2013; 1:1027. [PMID: 25848567 PMCID: PMC4371513 DOI: 10.13063/2327-9214.1027] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 12/03/2022]
Abstract
Introduction: Distributed Data Networks (DDNs) offer infrastructure solutions for sharing electronic health data from across disparate data sources to support comparative effectiveness research. Data sharing mechanisms must address technical and governance concerns stemming from network security and data disclosure laws and best practices, such as HIPAA. Methods: The Scalable Architecture for Federated Translational Inquiries Network (SAFTINet) deploys TRIAD grid technology, a common data model, detailed technical documentation, and custom software for data harmonization to facilitate data sharing in collaboration with stakeholders in the care of safety net populations. Data sharing partners host TRIAD grid nodes containing harmonized clinical data within their internal or hosted network environments. Authorized users can use a central web-based query system to request analytic data sets. Discussion: SAFTINet DDN infrastructure achieved a number of data sharing objectives, including scalable and sustainable systems for ensuring harmonized data structures and terminologies and secure distributed queries. Initial implementation challenges were resolved through iterative discussions, development and implementation of technical documentation, governance, and technology solutions.
|
16
|
Sittig DF, Hazlehurst BL, Brown J, Murphy S, Rosenman M, Tarczy-Hornoch P, Wilcox AB. A survey of informatics platforms that enable distributed comparative effectiveness research using multi-institutional heterogenous clinical data. Med Care 2012; 50 Suppl:S49-59. [PMID: 22692259 DOI: 10.1097/mlr.0b013e318259c02b] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 02/05/2023]
Abstract
Comparative effectiveness research (CER) has the potential to transform the current health care delivery system by identifying the most effective medical and surgical treatments, diagnostic tests, disease prevention methods, and ways to deliver care for specific clinical conditions. To be successful, such research requires the identification, capture, aggregation, integration, and analysis of disparate data sources held by different institutions with diverse representations of the relevant clinical events. In an effort to address these diverse demands, there have been multiple new designs and implementations of informatics platforms that provide access to electronic clinical data and the governance infrastructure required for interinstitutional CER. The goal of this manuscript is to help investigators understand why these informatics platforms are required and to compare and contrast 6 large-scale, recently funded, CER-focused informatics platform development efforts. We utilized an 8-dimension, sociotechnical model of health information technology to help guide our work. We identified 6 generic steps that are necessary in any distributed, multi-institutional CER project: data identification, extraction, modeling, aggregation, analysis, and dissemination. We expect that over the next several years these projects will provide answers to many important, and heretofore unanswerable, clinical research questions.
Affiliation(s)
- Dean F Sittig
  - School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, USA
|
17
|
The Electronic Data Methods (EDM) Forum for Comparative Effectiveness Research (CER). Med Care 2012; 50 Suppl:S7-10. [DOI: 10.1097/mlr.0b013e318257a66b] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 11/25/2022]
|