1
Daniel Boie S, Meyer-Eschenbach F, Schreiber F, Giesa N, Barrenetxea J, Guinemer C, Haufe S, Krämer M, Brunecker P, Prasser F, Balzer F. A scalable approach for critical care data extraction and analysis in an academic medical center. Int J Med Inform 2024; 192:105611. [PMID: 39255725 DOI: 10.1016/j.ijmedinf.2024.105611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Revised: 08/16/2024] [Accepted: 08/28/2024] [Indexed: 09/12/2024]
Abstract
BACKGROUND Electronic health records are a valuable asset for research, but their use is challenging due to inconsistencies in records, heterogeneous formats and the distribution of data over multiple, non-integrated information systems. Hence, specialized health data engineering and data science expertise are required to enable research. To facilitate secondary use of clinical routine data collected in our intensive care wards, we developed a scalable approach consisting of cohort generation, variable filtering and data extraction steps. OBJECTIVE With this report we share our workflow of data request, cohort identification and data extraction. We present an algorithm for automatic data extraction from our critical care information system (CCIS) that can be adapted to other object-oriented databases. METHODS We introduced a data request process with functionalities for automated identification of patient cohorts and a specialized hierarchical data structure that supports filtering relevant variables from the CCIS and further systems for the specified cohorts. The data extraction algorithm takes patient pseudonyms and variable lists as inputs. Algorithms are implemented in Python, leveraging the PySpark framework running on our data lake infrastructure. RESULTS Our data request process has been in operational use since June 2022. Since then we have served 121 projects with 148 service requests in total. We discuss the hierarchical structure and the frequently used data items of our CCIS in detail and present an application example, including cohort selection, data extraction and data transformation into an analysis-ready format. CONCLUSIONS Using clinical routine data for secondary research is challenging and requires an interdisciplinary team. We developed a scalable approach that automates cohort identification, data extraction and common data pre-processing steps. Additionally, we facilitate data harmonization and integration, and consult on typical data analysis scenarios, machine learning algorithms and visualizations in dashboards.
Affiliation(s)
- Sebastian Daniel Boie
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117 Berlin, Germany
- Falk Meyer-Eschenbach
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Clinical Study Center, Charitéplatz 1, 10117 Berlin, Germany
- Fabian Schreiber
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117 Berlin, Germany
- Niklas Giesa
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Clinical Study Center, Charitéplatz 1, 10117 Berlin, Germany
- Jon Barrenetxea
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117 Berlin, Germany
- Camille Guinemer
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Unit Research IT, Charitéplatz 1, 10117 Berlin, Germany
- Stefan Haufe
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117 Berlin, Germany; Physikalisch-Technische Bundesanstalt, Abbestrasse 2-12, 10587 Berlin, Germany; Technische Universität Berlin, Str. des 17. Juni 135, 10623 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin Center for Advanced Neuroimaging, Charitéplatz 1, 10117 Berlin, Germany
- Michael Krämer
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Unit Research IT, Charitéplatz 1, 10117 Berlin, Germany
- Peter Brunecker
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Unit Research IT, Charitéplatz 1, 10117 Berlin, Germany
- Fabian Prasser
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117 Berlin, Germany
- Felix Balzer
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117 Berlin, Germany
2
Cure P, ElShourbagy Ferreira S, Fessel JP, Ossip D, Zand MS, Steele SJ, Gersing K, Hartshorn CM. Real-world data for 21st-century medicine: The clinical and translational science awards program perspective. J Clin Transl Sci 2023; 7:e201. [PMID: 37830007 PMCID: PMC10565194 DOI: 10.1017/cts.2023.588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 06/30/2023] [Accepted: 06/30/2023] [Indexed: 10/14/2023]
Affiliation(s)
- Pablo Cure
- National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, USA
- Joshua P. Fessel
- National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, USA
- Deborah Ossip
- Center for Leading Innovation and Collaboration (CLIC), Clinical and Translational Science Program National Coordinating Center, University of Rochester Medical Center, Rochester, NY, USA
- Department of Public Health Sciences, University of Rochester Medical Center, Rochester, NY, USA
- Martin S. Zand
- Center for Leading Innovation and Collaboration (CLIC), Clinical and Translational Science Program National Coordinating Center, University of Rochester Medical Center, Rochester, NY, USA
- Department of Public Health Sciences, University of Rochester Medical Center, Rochester, NY, USA
- Department of Medicine, Division of Nephrology, University of Rochester Medical Center, Rochester, NY, USA
- Scott J. Steele
- Center for Leading Innovation and Collaboration (CLIC), Clinical and Translational Science Program National Coordinating Center, University of Rochester Medical Center, Rochester, NY, USA
- Department of Public Health Sciences, University of Rochester Medical Center, Rochester, NY, USA
- Kenneth Gersing
- National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, USA
- Christopher M. Hartshorn
- National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, USA
3
Parciak M, Suhr M, Schmidt C, Bönisch C, Löhnhardt B, Kesztyüs D, Kesztyüs T. FAIRness through automation: development of an automated medical data integration infrastructure for FAIR health data in a maximum care university hospital. BMC Med Inform Decis Mak 2023; 23:94. [PMID: 37189148 DOI: 10.1186/s12911-023-02195-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 05/09/2023] [Indexed: 05/17/2023]
Abstract
BACKGROUND Secondary use of routine medical data is key to large-scale clinical and health services research. In a maximum care hospital, the volume of data generated exceeds the limits of big data on a daily basis. These so-called "real-world data" are essential to complement knowledge and results from clinical trials. Furthermore, big data may help in establishing precision medicine. However, manual data extraction and annotation workflows to transfer routine data into research data would be complex and inefficient. Generally, best practices for managing research data focus on data output rather than on the entire data journey from primary sources to analysis. To eventually make routinely collected data usable and available for research, many hurdles have to be overcome. In this work, we present the implementation of an automated framework for timely processing of clinical care data, including free texts and genetic data (non-structured data), and their centralized storage as Findable, Accessible, Interoperable, Reusable (FAIR) research data in a maximum care university hospital. METHODS We identify the data processing workflows necessary to operate a medical research data service unit in a maximum care hospital. We decompose structurally equal tasks into elementary sub-processes and propose a framework for general data processing. We base our processes on open-source software components and, where necessary, custom-built generic tools. RESULTS We demonstrate the application of our proposed framework in practice by describing its use in our Medical Data Integration Center (MeDIC). Our microservices-based and fully open-source data processing automation framework incorporates a complete record of data management and manipulation activities. The prototype implementation also includes a metadata schema for data provenance and a process validation concept. All requirements of a MeDIC are orchestrated within the proposed framework: data input from many heterogeneous sources, pseudonymization and harmonization, integration in a data warehouse and, finally, options for extracting or aggregating data for research purposes in accordance with data protection requirements. CONCLUSION Though the framework is not a panacea for bringing routine-based research data into compliance with FAIR principles, it provides a much-needed way to process data in a fully automated, traceable, and reproducible manner.
Affiliation(s)
- Marcel Parciak
- Department of Medical Informatics, University Medical Center Göttingen, Von-Siebold-Straße 3, 37075, Göttingen, Germany
- University MS Center, Biomedical Research Institute (BIOMED), Hasselt University, Agoralaan Building C, 3590, Diepenbeek, Belgium
- Data Science Institute (DSI), Hasselt University, Agoralaan Building D, 3590, Diepenbeek, Belgium
- Markus Suhr
- Department of Medical Informatics, University Medical Center Göttingen, Von-Siebold-Straße 3, 37075, Göttingen, Germany
- NextLytics AG, Kapellenstrasse 37, 65719, Hofheim Am Taunus, Germany
- Christian Schmidt
- Department of Medical Informatics, University Medical Center Göttingen, Von-Siebold-Straße 3, 37075, Göttingen, Germany
- Caroline Bönisch
- Department of Medical Informatics, University Medical Center Göttingen, Von-Siebold-Straße 3, 37075, Göttingen, Germany
- Benjamin Löhnhardt
- Department of Medical Informatics, University Medical Center Göttingen, Von-Siebold-Straße 3, 37075, Göttingen, Germany
- Dorothea Kesztyüs
- Department of Medical Informatics, University Medical Center Göttingen, Von-Siebold-Straße 3, 37075, Göttingen, Germany
- Tibor Kesztyüs
- Department of Medical Informatics, University Medical Center Göttingen, Von-Siebold-Straße 3, 37075, Göttingen, Germany
4
Baum L, Johns M, Poikela M, Möller R, Ananthasubramaniam B, Prasser F. Data integration and analysis for circadian medicine. Acta Physiol (Oxf) 2023; 237:e13951. [PMID: 36790321 DOI: 10.1111/apha.13951] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/22/2022] [Revised: 02/04/2023] [Accepted: 02/12/2023] [Indexed: 02/16/2023]
Abstract
Data integration, data sharing, and standardized analyses are important enablers for data-driven medical research. Circadian medicine is an emerging field with a particularly high need for coordinated and systematic collaboration between researchers from different disciplines. Datasets in circadian medicine are multimodal, ranging from molecular circadian profiles and clinical parameters to physiological measurements and data obtained from (wearable) sensors or reported by patients. Uniquely, data spanning both the time dimension and the spatial dimension (across tissues) are needed to obtain a holistic view of the circadian system. The study of human rhythms in the context of circadian medicine has to confront the heterogeneity of clock properties within and across subjects and our inability to repeatedly obtain relevant biosamples from one subject. This requires informatics solutions for integrating and visualizing relevant data types at various temporal resolutions ranging from milliseconds and seconds to minutes and several hours. Associated challenges range from a lack of standards that can be used to represent all required data in a common interoperable form, to challenges related to data storage, to the need to perform transformations for integrated visualizations, and to privacy issues. The downstream analysis of circadian rhythms requires specialized approaches for the identification, characterization, and discrimination of rhythms. We conclude that circadian medicine research provides an ideal environment for developing innovative methods to address challenges related to the collection, integration, visualization, and analysis of multimodal multidimensional biomedical data.
Affiliation(s)
- Lena Baum
- Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Marco Johns
- Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Maija Poikela
- Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Ralf Möller
- Institute of Information Systems, University of Lübeck, Lübeck, Germany
- Fabian Prasser
- Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany