1. Johns M, Meurers T, Wirth FN, Haber AC, Müller A, Halilovic M, Balzer F, Prasser F. Data Provenance in Biomedical Research: Scoping Review. J Med Internet Res 2023; 25:e42289. [PMID: 36972116 PMCID: PMC10132013 DOI: 10.2196/42289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 12/14/2022] [Accepted: 12/23/2022] [Indexed: 03/29/2023]
Abstract
BACKGROUND Data provenance refers to the origin, processing, and movement of data. Reliable and precise knowledge about data provenance has great potential to improve reproducibility as well as quality in biomedical research and, therefore, to foster good scientific practice. However, despite the increasing interest in data provenance technologies in the literature and their implementation in other disciplines, these technologies have not yet been widely adopted in biomedical research.
OBJECTIVE The aim of this scoping review was to provide a structured overview of the body of knowledge on provenance methods in biomedical research by systematizing articles covering data provenance technologies developed for or used in this application area; describing and comparing the functionalities as well as the design of the provenance technologies used; and identifying gaps in the literature, which could provide opportunities for future research on technologies that could receive more widespread adoption.
METHODS Following a methodological framework for scoping studies and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines, articles were identified by searching the PubMed, IEEE Xplore, and Web of Science databases and subsequently screened for eligibility. We included original articles covering software-based provenance management for scientific research published between 2010 and 2021. A set of data items was defined along the following five axes: publication metadata, application scope, provenance aspects covered, data representation, and functionalities. The data items were extracted from the articles, stored in a charting spreadsheet, and summarized in tables and figures.
RESULTS We identified 44 original articles published between 2010 and 2021. We found that the solutions described were heterogeneous along all axes. We also identified relationships among motivations for the use of provenance information, feature sets (capture, storage, retrieval, visualization, and analysis), and implementation details such as the data models and technologies used. The important gap that we identified is that only a few publications address the analysis of provenance data or use established provenance standards, such as PROV.
CONCLUSIONS The heterogeneity of provenance methods, models, and implementations found in the literature points to the lack of a unified understanding of provenance concepts for biomedical data. Providing a common framework, a biomedical reference, and benchmarking data sets could foster the development of more comprehensive provenance solutions.
Affiliation(s)
- Marco Johns
  - Medical Informatics Group, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Thierry Meurers
  - Medical Informatics Group, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Felix N Wirth
  - Medical Informatics Group, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Anna C Haber
  - Medical Informatics Group, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Armin Müller
  - Medical Informatics Group, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Mehmed Halilovic
  - Medical Informatics Group, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Felix Balzer
  - Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Fabian Prasser
  - Medical Informatics Group, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
2. Böttcher S, Vieluf S, Bruno E, Joseph B, Epitashvili N, Biondi A, Zabler N, Glasstetter M, Dümpelmann M, Van Laerhoven K, Nasseri M, Brinkman BH, Richardson MP, Schulze-Bonhage A, Loddenkemper T. Data quality evaluation in wearable monitoring. Sci Rep 2022; 12:21412. [PMID: 36496546 PMCID: PMC9741649 DOI: 10.1038/s41598-022-25949-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 09/15/2022] [Accepted: 11/25/2022] [Indexed: 12/13/2022]
Abstract
Wearable recordings of neurophysiological signals captured from the wrist offer enormous potential for seizure monitoring. Yet, data quality remains one of the most challenging factors that impact data reliability. We suggest a combined data quality assessment tool for the evaluation of multimodal wearable data. We analyzed data from patients with epilepsy from four epilepsy centers. Patients wore wristbands recording accelerometry, electrodermal activity, blood volume pulse, and skin temperature. We calculated data completeness and assessed the time the device was worn (on-body), and modality-specific signal quality scores. We included 37,166 h from 632 patients in the inpatient and 90,776 h from 39 patients in the outpatient setting. All modalities were affected by artifacts. Data loss was higher when using data streaming (up to 49% among inpatient cohorts, averaged across respective recordings) as compared to onboard device recording and storage (up to 9%). On-body scores, estimating the percentage of time a device was worn on the body, were consistently high across cohorts (more than 80%). Signal quality of some modalities, based on established indices, was higher at night than during the day. A uniformly reported data quality and multimodal signal quality index is feasible, makes study results more comparable, and contributes to the development of devices and evaluation routines necessary for seizure monitoring.
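The abstract does not state how data completeness was computed. As a minimal sketch, assuming completeness is defined as the fraction of expected samples that were actually recorded (a common convention, not necessarily the authors' definition), the calculation could look like the following. The function name, the sampling rate, and the sample counts are hypothetical and chosen only for illustration.

```python
def data_completeness(n_samples_received: int,
                      duration_s: float,
                      sampling_rate_hz: float) -> float:
    """Fraction of expected samples that were recorded, capped at 1.0.

    Assumed definition: expected samples = duration * nominal sampling rate.
    """
    if duration_s <= 0 or sampling_rate_hz <= 0:
        raise ValueError("duration_s and sampling_rate_hz must be positive")
    expected = duration_s * sampling_rate_hz
    return min(n_samples_received / expected, 1.0)


# Hypothetical example: a 1-hour recording at a nominal 4 Hz,
# of which 7,344 of the 14,400 expected samples arrived.
completeness = data_completeness(7344, 3600, 4.0)
loss = 1.0 - completeness
print(f"completeness={completeness:.2f}, loss={loss:.2f}")
```

Reporting the loss as one minus the completeness makes results from streamed and onboard recordings directly comparable, provided the nominal sampling rate is known for each modality.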
Affiliation(s)
- Sebastian Böttcher
  - Department of Neurosurgery, Epilepsy Center, Medical Center – University of Freiburg, Freiburg, Germany
  - Ubiquitous Computing, Department of Electrical Engineering and Computer Science, University of Siegen, Siegen, Germany
- Solveig Vieluf
  - Division of Epilepsy and Clinical Neurophysiology, Boston Children’s Hospital, Harvard Medical School, Boston, MA USA
- Elisa Bruno
  - Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King’s College, London, UK
- Boney Joseph
  - Bioelectronics Neurophysiology and Engineering Laboratory, Department of Neurology, Mayo Clinic, Rochester, MN USA
- Nino Epitashvili
  - Department of Neurosurgery, Epilepsy Center, Medical Center – University of Freiburg, Freiburg, Germany
- Andrea Biondi
  - Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King’s College, London, UK
- Nicolas Zabler
  - Department of Neurosurgery, Epilepsy Center, Medical Center – University of Freiburg, Freiburg, Germany
- Martin Glasstetter
  - Department of Neurosurgery, Epilepsy Center, Medical Center – University of Freiburg, Freiburg, Germany
- Matthias Dümpelmann
  - Department of Neurosurgery, Epilepsy Center, Medical Center – University of Freiburg, Freiburg, Germany
  - Department of Microsystems Engineering (IMTEK), University of Freiburg, Freiburg, Germany
- Kristof Van Laerhoven
  - Ubiquitous Computing, Department of Electrical Engineering and Computer Science, University of Siegen, Siegen, Germany
- Mona Nasseri
  - Bioelectronics Neurophysiology and Engineering Laboratory, Department of Neurology, Mayo Clinic, Rochester, MN USA
  - School of Engineering, University of North Florida, Jacksonville, FL USA
- Benjamin H. Brinkman
  - Bioelectronics Neurophysiology and Engineering Laboratory, Department of Neurology, Mayo Clinic, Rochester, MN USA
- Mark P. Richardson
  - Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King’s College, London, UK
- Andreas Schulze-Bonhage
  - Department of Neurosurgery, Epilepsy Center, Medical Center – University of Freiburg, Freiburg, Germany
- Tobias Loddenkemper
  - Division of Epilepsy and Clinical Neurophysiology, Boston Children’s Hospital, Harvard Medical School, Boston, MA USA
3. Cho S, Ensari I, Elhadad N, Weng C, Radin JM, Bent B, Desai P, Natarajan K. An interactive fitness-for-use data completeness tool to assess activity tracker data. J Am Med Inform Assoc 2022; 29:2032-2040. [PMID: 36173371 PMCID: PMC9667174 DOI: 10.1093/jamia/ocac166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Revised: 07/29/2022] [Accepted: 09/16/2022] [Indexed: 11/12/2022]
Abstract
OBJECTIVE To design and evaluate an interactive data quality (DQ) characterization tool focused on fitness-for-use completeness measures to support researchers' assessment of a dataset.
MATERIALS AND METHODS Design requirements were identified through a conceptual framework on DQ, literature review, and interviews. The prototype of the tool was developed based on the requirements gathered and was further refined by domain experts. The Fitness-for-Use Tool was evaluated through a within-subjects controlled experiment comparing it with a baseline tool that provides information on missing data based on intrinsic DQ measures. The tools were evaluated on task performance and perceived usability.
RESULTS The Fitness-for-Use Tool allows users to define data completeness by customizing the measures and their thresholds to fit their research task and provides a data summary based on the customized definition. Using the Fitness-for-Use Tool, study participants were able to accurately complete fitness-for-use assessment in less time than when using the Intrinsic DQ Tool. The study participants perceived that the Fitness-for-Use Tool was more useful in determining the fitness-for-use of a dataset than the Intrinsic DQ Tool.
DISCUSSION Incorporating fitness-for-use measures in a DQ characterization tool could provide a data summary that meets researchers' needs. The design features identified in this study have the potential to be applied to other biomedical data types.
CONCLUSION A tool that summarizes a dataset in terms of fitness-for-use dimensions and measures specific to a research question supports dataset assessment better than a tool that only presents information on intrinsic DQ measures.
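To illustrate what a customizable, task-specific completeness definition might look like, here is a minimal sketch. It is not the Fitness-for-Use Tool's implementation; the `CompletenessRule` class, its two thresholds, and the example wear times are all hypothetical. It shows how the same activity tracker data can pass or fail depending on the thresholds a researcher chooses.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CompletenessRule:
    """Hypothetical, user-defined completeness definition."""
    min_wear_hours_per_day: float  # a day is "valid" at or above this wear time
    min_valid_days: int            # data are "complete" at or above this many valid days


def is_fit_for_use(daily_wear_hours: list[float], rule: CompletenessRule) -> bool:
    """Return True if enough days meet the rule's wear-time threshold."""
    valid_days = sum(1 for h in daily_wear_hours
                     if h >= rule.min_wear_hours_per_day)
    return valid_days >= rule.min_valid_days


# Hypothetical week of wear time (hours per day) for one participant.
wear = [12.0, 9.5, 3.0, 14.0, 0.0, 11.0, 10.0]

strict = CompletenessRule(min_wear_hours_per_day=10.0, min_valid_days=5)
lenient = CompletenessRule(min_wear_hours_per_day=8.0, min_valid_days=4)

print(is_fit_for_use(wear, strict))   # 4 valid days, fewer than the 5 required
print(is_fit_for_use(wear, lenient))  # 5 valid days, at least the 4 required
```

Keeping the thresholds in a separate rule object, rather than hard-coding them, mirrors the idea that completeness depends on the research question and not on the dataset alone.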
Affiliation(s)
- Sylvia Cho
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Ipek Ensari
  - Department of Artificial Intelligence and Human Health, Icahn School of Medicine, New York, New York, USA
  - Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Noémie Elhadad
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
  - Data Science Institute, Columbia University, New York, New York, USA
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
  - Data Science Institute, Columbia University, New York, New York, USA
- Jennifer M Radin
  - Scripps Research Translational Institute, La Jolla, California, USA
- Brinnae Bent
  - Department of Biomedical Engineering, Duke University, Durham, North Carolina, USA
- Pooja Desai
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Karthik Natarajan
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
  - Data Science Institute, Columbia University, New York, New York, USA