1
Iken AR, Poolman RW, Gademan MGJ. Data quality assessment of interventional trials in public trial databases. J Clin Epidemiol 2024; 175:111516. [PMID: 39243872 DOI: 10.1016/j.jclinepi.2024.111516] [Received: 01/09/2024] [Revised: 08/26/2024] [Accepted: 09/02/2024] [Indexed: 09/09/2024]
Abstract
OBJECTIVE High-quality data entry in clinical trial databases is crucial to the usefulness, validity, and replicability of research findings, as it influences evidence-based medical practice and future research. Our aim is to assess the quality of self-reported data in trial registries and present practical and systematic methods for identifying and evaluating data quality. STUDY DESIGN AND SETTING We searched ClinicalTrials.Gov (CTG) for interventional total knee arthroplasty (TKA) trials between 2000 and 2015. We extracted required and optional trial information elements and used the CTG's variables' definitions. We performed a literature review on data quality reporting on frameworks, checklists, and overviews of irregularities in healthcare databases. We identified and assessed data quality attributes as follows: consistency, accuracy, completeness, and timeliness. RESULTS We included 816 interventional TKA trials. Data irregularities varied widely: 0%-100%. Inconsistency ranged from 0% to 36%, and most often nonrandomized labeled allocation was combined with a "single-group" assignment trial design. Inaccuracy ranged from 0% to 100%. Incompleteness ranged from 0% to 61%; 61% of finished TKA trials did not report their outcome. With regard to irregularities in timeliness, 49% of the trials were registered more than 3 months after the start date. CONCLUSION We found significant variations in the data quality of registered clinical TKA trials. Trial sponsors should be committed to ensuring that the information they provide is reliable, consistent, up-to-date, transparent, and accurate. CTG's users need to be critical when drawing conclusions based on the registered data. We believe this awareness will increase well-informed decisions about published articles and treatment protocols, including replicating and improving trial designs.
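The irregularity checks described in this abstract can be sketched as simple rules over registry records. The following is a minimal illustration, not the authors' method: the `TrialRecord` fields, the literal values `"Non-Randomized"` and `"Single Group Assignment"`, and the helper names are assumptions made for the example. Only the three rules themselves (nonrandomized allocation combined with a single-group design, finished trials without reported outcomes, and registration more than 3 months after the start date) come from the abstract.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass
class TrialRecord:
    # Hypothetical record layout; field names are illustrative assumptions.
    trial_id: str
    allocation: str
    intervention_model: str
    start_date: date
    registration_date: date
    completed: bool
    has_results: bool


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def flag_irregularities(t: TrialRecord) -> list[str]:
    """Return the data quality attributes for which this record looks irregular."""
    flags = []
    # Consistency: a single-group design has nothing to allocate between,
    # so a "nonrandomized" allocation label contradicts it.
    if t.allocation == "Non-Randomized" and t.intervention_model == "Single Group Assignment":
        flags.append("inconsistency")
    # Completeness: a finished trial that reports no outcome.
    if t.completed and not t.has_results:
        flags.append("incompleteness")
    # Timeliness: registered more than 3 months after the start date.
    if t.registration_date > add_months(t.start_date, 3):
        flags.append("timeliness")
    return flags


def irregularity_rates(trials: list[TrialRecord]) -> dict[str, float]:
    """Percentage of trials flagged for each attribute."""
    counts = {"inconsistency": 0, "incompleteness": 0, "timeliness": 0}
    for t in trials:
        for flag in flag_irregularities(t):
            counts[flag] += 1
    n = len(trials)
    return {k: (100.0 * v / n if n else 0.0) for k, v in counts.items()}
```

Accuracy is left out of the sketch because checking it requires an external reference for each field, which the abstract does not specify.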
Affiliation(s)
- Annabelle R Iken
  - Leiden University Medical Center, Department of Orthopaedics, Albinusdreef 2, Leiden, 2333 ZA, The Netherlands.
- Rudolf W Poolman
  - Leiden University Medical Center, Department of Orthopaedics, Albinusdreef 2, Leiden, 2333 ZA, The Netherlands; Department of Orthopaedic Surgery, Joint Research, OLVG, Amsterdam, The Netherlands
- Maaike G J Gademan
  - Leiden University Medical Center, Department of Orthopaedics, Albinusdreef 2, Leiden, 2333 ZA, The Netherlands; Department of Clinical Epidemiology, Leiden University Medical Center, Albinusdreef 2, Leiden, 2333 ZA, The Netherlands
2
Forero DA, Curioso WH, Wang W. Ten simple rules for successfully carrying out funded research projects. PLoS Comput Biol 2024; 20:e1012431. [PMID: 39298382 DOI: 10.1371/journal.pcbi.1012431] [Indexed: 09/21/2024] Open
Affiliation(s)
- Diego A Forero
  - School of Health and Sport Sciences, Fundación Universitaria del Área Andina, Bogotá, Colombia
- Walter H Curioso
  - Vicerrectorado de Investigación, Universidad Continental, Lima, Peru
- Wei Wang
  - Clinical Research Centre, The First Affiliated Hospital of Shantou University Medical College, Shantou, China
  - School of Public Health, Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, Shandong, China
  - Beijing Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing, China
  - Centre for Precision Health, Edith Cowan University, Perth, Australia
3
Lighterness A, Adcock M, Scanlon LA, Price G. Data Quality-Driven Improvement in Health Care: Systematic Literature Review. J Med Internet Res 2024; 26:e57615. [PMID: 39173155 PMCID: PMC11377907 DOI: 10.2196/57615] [Received: 03/11/2024] [Revised: 05/10/2024] [Accepted: 05/30/2024] [Indexed: 08/24/2024] Open
Abstract
BACKGROUND The promise of real-world evidence and the learning health care system primarily depends on access to high-quality data. Despite widespread awareness of the prevalence and potential impacts of poor data quality (DQ), best practices for its assessment and improvement are unknown. OBJECTIVE This review aims to investigate how existing research studies define, assess, and improve the quality of structured real-world health care data. METHODS A systematic literature search of studies in the English language was implemented in the Embase and PubMed databases to select studies that specifically aimed to measure and improve the quality of structured real-world data within any clinical setting. The time frame for the analysis was from January 1945 to June 2023. We standardized DQ concepts according to the Data Management Association (DAMA) DQ framework to enable comparison between studies. After screening and filtering by 2 independent authors, we identified 39 relevant articles reporting DQ improvement initiatives. RESULTS The studies were characterized by considerable heterogeneity in settings and approaches to DQ assessment and improvement. Affiliated institutions were from 18 different countries and 18 different health domains. DQ assessment methods were largely manual and targeted completeness and 1 other DQ dimension. Use of DQ frameworks was limited to the Weiskopf and Weng (3/6, 50%) or Kahn harmonized model (3/6, 50%). Use of standardized methodologies to design and implement quality improvement was lacking, but mainly included plan-do-study-act (PDSA) or define-measure-analyze-improve-control (DMAIC) cycles. Most studies reported DQ improvements using multiple interventions, which included either DQ reporting and personalized feedback (24/39, 61%), IT-related solutions (21/39, 54%), training (17/39, 44%), improvements in workflows (5/39, 13%), or data cleaning (3/39, 8%). Most studies reported improvements in DQ through a combination of these interventions. Statistical methods were used to determine the significance of treatment effect (22/39, 56%), but only 1 study implemented a randomized controlled study design. Variability in study designs, approaches to delivering interventions, and reporting DQ changes hindered a robust meta-analysis of treatment effects. CONCLUSIONS There is an urgent need for standardized guidelines in DQ improvement research to enable comparison and effective synthesis of lessons learned. Frameworks such as PDSA learning cycles and the DAMA DQ framework can facilitate this unmet need. In addition, DQ improvement studies can also benefit from prioritizing root cause analysis of DQ issues to ensure the most appropriate intervention is implemented, thereby ensuring long-term, sustainable improvement. Despite the rise in DQ improvement studies in the last decade, significant heterogeneity in methodologies and reporting remains a challenge. Adopting standardized frameworks for DQ assessment, analysis, and improvement can enhance the effectiveness, comparability, and generalizability of DQ improvement initiatives.
Affiliation(s)
- Anthony Lighterness
  - Clinical Outcomes and Data Unit, The Christie NHS Foundation Trust, Manchester, United Kingdom
- Michael Adcock
  - Clinical Outcomes and Data Unit, The Christie NHS Foundation Trust, Manchester, United Kingdom
- Lauren Abigail Scanlon
  - Clinical Outcomes and Data Unit, The Christie NHS Foundation Trust, Manchester, United Kingdom
- Gareth Price
  - Radiotherapy Related Research Group, University of Manchester, Manchester, United Kingdom
4
Xiong K, Xu X, Fu S, Weng D, Wang Y, Wu Y. JsonCurer: Data Quality Management for JSON Based on an Aggregated Schema. IEEE Transactions on Visualization and Computer Graphics 2024; 30:3008-3021. [PMID: 38625779 DOI: 10.1109/tvcg.2024.3388556] [Indexed: 04/18/2024]
Abstract
High-quality data is critical to deriving useful and reliable information. However, real-world data often contains quality issues undermining the value of the derived information. Most existing research on data quality management focuses on tabular data, leaving semi-structured data under-exploited. Due to the schema-less and hierarchical features of semi-structured data, discovering and fixing quality issues is challenging and time-consuming. To address the challenge, this paper presents JsonCurer, an interactive visualization system to assist with data quality management in the context of JSON data. To have an overview of quality issues, we first construct a taxonomy based on interviews with data practitioners and a review of 119 real-world JSON files. Then we highlight a schema visualization that presents structural information, statistical features, and quality issues of JSON data. Based on a similarity-based aggregation technique, the visualization depicts the entire JSON data with a concise tree, where summary visualizations are given above each node, and quality issues are illustrated using Bubble Sets across nodes. We evaluate the effectiveness and usability of JsonCurer with two case studies. One is in the domain of data analysis while the other concerns quality assurance in MongoDB documents.
5
Declerck J, Kalra D, Vander Stichele R, Coorevits P. Frameworks, Dimensions, Definitions of Aspects, and Assessment Methods for the Appraisal of Quality of Health Data for Secondary Use: Comprehensive Overview of Reviews. JMIR Med Inform 2024; 12:e51560. [PMID: 38446534 PMCID: PMC10955383 DOI: 10.2196/51560] [Received: 08/03/2023] [Revised: 11/07/2023] [Accepted: 01/09/2024] [Indexed: 03/07/2024] Open
Abstract
BACKGROUND Health care has not reached the full potential of the secondary use of health data because of, among other issues, concerns about the quality of the data being used. The shift toward digital health has led to an increase in the volume of health data. However, this increase in quantity has not been matched by a proportional improvement in the quality of health data. OBJECTIVE This review aims to offer a comprehensive overview of the existing frameworks for data quality dimensions and assessment methods for the secondary use of health data. In addition, it aims to consolidate the results into a unified framework. METHODS A review of reviews was conducted including reviews describing frameworks of data quality dimensions and their assessment methods, specifically from a secondary use perspective. Reviews were excluded if they were not related to the health care ecosystem, lacked relevant information related to our research objective, or were published in languages other than English. RESULTS A total of 22 reviews were included, comprising 22 frameworks, with 23 different terms for dimensions, and 62 definitions of dimensions. All dimensions were mapped toward the data quality framework of the European Institute for Innovation through Health Data. In total, 8 reviews mentioned 38 different assessment methods, pertaining to 31 definitions of the dimensions. CONCLUSIONS The findings in this review revealed a lack of consensus in the literature regarding the terminology, definitions, and assessment methods for data quality dimensions. This creates ambiguity and difficulties in developing specific assessment methods. This study goes a step further by assigning all observed definitions to a consolidated framework of 9 data quality dimensions.
Affiliation(s)
- Jens Declerck
  - Department of Public Health and Primary Care, Unit of Medical Informatics and Statistics, Ghent University, Ghent, Belgium
  - The European Institute for Innovation through Health Data, Ghent, Belgium
- Dipak Kalra
  - Department of Public Health and Primary Care, Unit of Medical Informatics and Statistics, Ghent University, Ghent, Belgium
  - The European Institute for Innovation through Health Data, Ghent, Belgium
- Robert Vander Stichele
  - Faculty of Medicine and Health Sciences, Heymans Institute of Pharmacology, Ghent, Belgium
- Pascal Coorevits
  - Department of Public Health and Primary Care, Unit of Medical Informatics and Statistics, Ghent University, Ghent, Belgium
6
Saucedo SCM, Silva KR, Silva LDA, Crivelari JM, Costa R. The impact of data quality monitoring of a multicenter prospective registry of cardiac implantable electronic devices. MethodsX 2023; 11:102454. [PMID: 37920872 PMCID: PMC10618759 DOI: 10.1016/j.mex.2023.102454] [Received: 07/25/2023] [Accepted: 10/19/2023] [Indexed: 11/04/2023] Open
Abstract
Data quality monitoring plays a crucial role in multicenter prospective registries. By maintaining high data accuracy, completeness, and consistency, researchers can improve the overall quality and reliability of the registry data, enabling meaningful conclusions and supporting evidence-based decisions. The purpose of the present study was to evaluate data quality metrics (completeness, accuracy, and temporal plausibility) of a Multicenter Registry of Cardiac Implantable Electronic Devices (CIEDs) and to perform a direct data audit of a random sample of records to assess the agreement levels with the source documents. The CIED Registry was a prospective, multicenter, real-world observational study carried out from January 2020 to December 2022 in five designated centers across Sao Paulo, Brazil. We assessed the data quality of the CIED Registry by using two distinct approaches:
- Dynamic data monitoring using features of the REDCap (Research Electronic Data Capture) software, including data reports and data quality rules
- Direct data audit in which information from a random sample of 10% of cases from the coordinating center was compared with original source documents
Our findings suggest that the methodological approach applied to the CIED Registry resulted in high data completeness, accuracy, temporal plausibility, and excellent agreement levels with the source documents.
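The direct data audit described in this abstract (a random 10% sample compared against source documents) can be sketched as follows. This is an illustration under stated assumptions, not the registry's actual procedure: the function names, the dictionary-based record layout, the fixed seed, and the use of simple percent agreement as the agreement measure are all hypothetical choices, since the abstract does not say how agreement was computed.

```python
import random


def draw_audit_sample(record_ids, fraction=0.10, seed=0):
    """Draw a reproducible random sample of record IDs for auditing.

    The 10% default mirrors the abstract; the seed is an assumption
    added so that the draw can be repeated.
    """
    ids = list(record_ids)
    if not ids:
        return []
    k = max(1, round(len(ids) * fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(ids, k))


def percent_agreement(registry, source, sample_ids, fields):
    """Percentage of audited field values that match the source documents.

    `registry` and `source` map record ID -> {field name: value}.
    A field missing on both sides counts as agreement in this sketch.
    """
    checked = 0
    agreed = 0
    for rid in sample_ids:
        for field in fields:
            checked += 1
            if registry[rid].get(field) == source[rid].get(field):
                agreed += 1
    return 100.0 * agreed / checked if checked else 0.0
```

Percent agreement is the simplest choice; a chance-corrected statistic would be a reasonable alternative when the audited fields have few possible values.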
Affiliation(s)
- Sarah Caroline Martins Saucedo
  - Unidade de Estimulação Elétrica e Marcapasso, Instituto do Coração do Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo (InCor-HCFMUSP), Sao Paulo, Brazil
- Katia Regina Silva
  - Unidade de Estimulação Elétrica e Marcapasso, Instituto do Coração do Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo (InCor-HCFMUSP), Sao Paulo, Brazil
- Laísa de Arruda Silva
  - Unidade de Estimulação Elétrica e Marcapasso, Instituto do Coração do Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo (InCor-HCFMUSP), Sao Paulo, Brazil
- Jéssica Moretto Crivelari
  - Unidade de Estimulação Elétrica e Marcapasso, Instituto do Coração do Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo (InCor-HCFMUSP), Sao Paulo, Brazil
- Roberto Costa
  - Unidade de Estimulação Elétrica e Marcapasso, Instituto do Coração do Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo (InCor-HCFMUSP), Sao Paulo, Brazil
7
Mashoufi M, Ayatollahi H, Khorasani-Zavareh D, Talebi Azad Boni T. Data Quality in Health Care: Main Concepts and Assessment Methodologies. Methods Inf Med 2023; 62:5-18. [PMID: 36716776 DOI: 10.1055/s-0043-1761500] [Indexed: 02/01/2023]
Abstract
INTRODUCTION In the health care environment, a huge volume of data is produced on a daily basis. However, the processes of collecting, storing, sharing, analyzing, and reporting health data usually face numerous challenges that lead to incomplete, inaccurate, and untimely data. As a result, data quality issues have received more attention than before. OBJECTIVE The purpose of this article is to provide an insight into the data quality definitions, dimensions, and assessment methodologies. METHODS In this article, a scoping literature review approach was used to describe and summarize the main concepts related to data quality and data quality assessment methodologies. Search terms were selected to find the relevant articles published between January 1, 2012 and September 31, 2022. The retrieved articles were then reviewed and the results were reported narratively. RESULTS In total, 23 papers were included in the study. According to the results, data quality dimensions varied, and different methodologies were used to assess them. Most studies used quantitative methods to measure data quality dimensions in either paper-based or computer-based medical records. Only two studies investigated respondents' opinions about data quality. CONCLUSION In health care, high-quality data are not only important for patient care but also vital for improving the quality of health care services and supporting better decision making. Therefore, using technical and nontechnical solutions as well as constant assessment and supervision is suggested to improve data quality.
Affiliation(s)
- Mehrnaz Mashoufi
  - Department of Health Information Management, School of Medicine, Ardabil University of Medical Sciences, Ardabil, Iran
- Haleh Ayatollahi
  - Health Management and Economics Research Center, Health Management Research Institute, Iran University of Medical Sciences, Tehran, Iran
  - Department of Health Information Management, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
- Davoud Khorasani-Zavareh
  - Department of Health in Emergencies and Disasters, Safety Promotion and Injury Prevention Research Center, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Tahere Talebi Azad Boni
  - Department of Health Information Management, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
  - Social Determinants of Health Research Center, Saveh University of Medical Sciences, Saveh, Iran