1
|
Morozyuk D, Weiner MG. Outliers in diagnosis ratios: A clue toward possibly absent data. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2024; 2023:1175-1182. [PMID: 38222346 PMCID: PMC10785923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
The evaluation of completeness of real-world data is a particularly challenging component of data quality assessment because the degree of truly versus erroneously absent data is unknown. Among inpatient data sets, while absolute counts of admissions having specific categories of diagnoses in the principal or any position may vary depending on hospital size, we hypothesized that the ratio of these parameters will be preserved across sites, with outliers suggesting the potential for erroneously absent data. For several categories of clinical conditions assigned to inpatient admissions, we analyzed the ratio of their recording as the principal diagnosis versus any diagnosis across several hospitals and compared the ratios against a national benchmark. Our analysis showed ratios that matched clinical expectations, with reasonable preservation of ratios across sites. However, some conditions exhibited more variability in the ratios and some sites had many outliers possibly reflecting data quality issues that warrant further attention.
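The ratio check described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the `SiteCounts` type, the `flag_outliers` function, and the relative-tolerance rule for calling a site an outlier are all assumptions, since the abstract does not state how outliers were defined.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SiteCounts:
    """Admission counts for one condition category at one site (hypothetical type)."""
    site: str
    principal: int      # admissions with the condition as the principal diagnosis
    any_position: int   # admissions with the condition in any diagnosis position


def principal_to_any_ratio(counts: SiteCounts) -> Optional[float]:
    """Return principal/any, or None when the site records no such admissions."""
    if counts.any_position == 0:
        return None
    return counts.principal / counts.any_position


def flag_outliers(
    sites: List[SiteCounts], benchmark_ratio: float, tolerance: float = 0.5
) -> List[Tuple[str, Optional[float]]]:
    """Flag sites whose ratio deviates from the benchmark by more than
    `tolerance` (relative deviation; an assumed rule, not from the paper).
    Sites with no recorded admissions are flagged with a ratio of None."""
    if benchmark_ratio <= 0:
        raise ValueError("benchmark_ratio must be positive")
    flagged = []
    for counts in sites:
        ratio = principal_to_any_ratio(counts)
        if ratio is None:
            flagged.append((counts.site, None))
        elif abs(ratio - benchmark_ratio) / benchmark_ratio > tolerance:
            flagged.append((counts.site, ratio))
    return flagged
```

Because the ratio is scale-free, a small and a large hospital can be compared directly; a flagged site is only a prompt for review, not proof that data are absent.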
|
2
|
Klann JG, Henderson DW, Morris M, Estiri H, Weber GM, Visweswaran S, Murphy SN. A broadly applicable approach to enrich electronic-health-record cohorts by identifying patients with complete data: a multisite evaluation. J Am Med Inform Assoc 2023; 30:1985-1994. [PMID: 37632234 PMCID: PMC10654861 DOI: 10.1093/jamia/ocad166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 07/25/2023] [Accepted: 08/08/2023] [Indexed: 08/27/2023]
Abstract
OBJECTIVE Patients who receive most care within a single healthcare system (colloquially called a "loyalty cohort" since they typically return to the same providers) have mostly complete data within that organization's electronic health record (EHR). Loyalty cohorts have low data missingness, which can unintentionally bias research results. Using proxies of routine care and healthcare utilization metrics, we compute a per-patient score that identifies a loyalty cohort. MATERIALS AND METHODS We implemented a computable program for the widely adopted i2b2 platform that identifies loyalty cohorts in EHRs based on a machine-learning model, which was previously validated using linked claims data. We developed a novel validation approach, which tests, using only EHR data, whether patients returned to the same healthcare system after the training period. We evaluated these tools at 3 institutions using data from 2017 to 2019. RESULTS Loyalty cohort calculations to identify patients who returned during a 1-year follow-up yielded a mean area under the receiver operating characteristic curve of 0.77 using the original model and 0.80 after calibrating the model at individual sites. Factors such as multiple medications or visits contributed significantly at all sites. Screening tests' contributions (eg, colonoscopy) varied across sites, likely due to coding and population differences. DISCUSSION This open-source implementation of a "loyalty score" algorithm had good predictive power. Enriching research cohorts by utilizing these low-missingness patients is a way to obtain the data completeness necessary for accurate causal analysis. CONCLUSION i2b2 sites can use this approach to select cohorts with mostly complete EHR data.
Affiliation(s)
- Jeffrey G Klann
  - Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, United States
  - Department of Medicine, Harvard Medical School, Boston, MA 02115, United States
- Darren W Henderson
  - Institute of Biomedical Informatics, University of Kentucky, Lexington, KY 40506, United States
- Michele Morris
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Hossein Estiri
  - Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, United States
  - Department of Medicine, Harvard Medical School, Boston, MA 02115, United States
- Griffin M Weber
  - Beth Israel Deaconess Medical Center, Boston, MA 02115, United States
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Shyam Visweswaran
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Shawn N Murphy
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
  - Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, United States
  - Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, United States
|
3
|
Syed R, Eden R, Makasi T, Chukwudi I, Mamudu A, Kamalpour M, Kapugama Geeganage D, Sadeghianasl S, Leemans SJJ, Goel K, Andrews R, Wynn MT, Ter Hofstede A, Myers T. Digital Health Data Quality Issues: Systematic Review. J Med Internet Res 2023; 25:e42615. [PMID: 37000497 PMCID: PMC10131725 DOI: 10.2196/42615] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 09/12/2022] [Revised: 12/07/2022] [Accepted: 12/31/2022] [Indexed: 04/01/2023]
Abstract
BACKGROUND The promise of digital health is principally dependent on the ability to electronically capture data that can be analyzed to improve decision-making. However, the ability to effectively harness data has proven elusive, largely because of the quality of the data captured. Despite the importance of data quality (DQ), an agreed-upon DQ taxonomy has eluded the literature. When consolidated frameworks are developed, the dimensions are often fragmented, without consideration of the interrelationships among the dimensions or their resultant impact. OBJECTIVE The aim of this study was to develop a consolidated digital health DQ dimension and outcome (DQ-DO) framework to provide insights into 3 research questions: What are the dimensions of digital health DQ? How are the dimensions of digital health DQ related? and What are the impacts of digital health DQ? METHODS Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, a developmental systematic literature review was conducted of peer-reviewed literature focusing on digital health DQ in predominantly hospital settings. A total of 227 relevant articles were retrieved and inductively analyzed to identify digital health DQ dimensions and outcomes. The inductive analysis was performed through open coding, constant comparison, and card sorting with subject matter experts to identify digital health DQ dimensions and digital health DQ outcomes. Subsequently, a computer-assisted analysis was performed and verified by DQ experts to identify the interrelationships among the DQ dimensions and relationships between DQ dimensions and outcomes. The analysis resulted in the development of the DQ-DO framework.
RESULTS The digital health DQ-DO framework consists of 6 dimensions of DQ, namely accessibility, accuracy, completeness, consistency, contextual validity, and currency; interrelationships among the dimensions of digital health DQ, with consistency being the most influential dimension impacting all other digital health DQ dimensions; 5 digital health DQ outcomes, namely clinical, clinician, research-related, business process, and organizational outcomes; and relationships between the digital health DQ dimensions and DQ outcomes, with the consistency and accessibility dimensions impacting all DQ outcomes. CONCLUSIONS The DQ-DO framework developed in this study demonstrates the complexity of digital health DQ and the necessity for reducing digital health DQ issues. The framework further provides health care executives with holistic insights into DQ issues and resultant outcomes, which can help them prioritize which DQ-related problems to tackle first.
Affiliation(s)
- Rehan Syed
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Rebekah Eden
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Tendai Makasi
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Ignatius Chukwudi
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Azumah Mamudu
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Mostafa Kamalpour
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Dakshi Kapugama Geeganage
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Sareh Sadeghianasl
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Sander J J Leemans
  - Rheinisch-Westfälische Technische Hochschule, Aachen University, Aachen, Germany
- Kanika Goel
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Robert Andrews
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Moe Thandar Wynn
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Arthur Ter Hofstede
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Trina Myers
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
|
4
|
Five-dimensional evaluation system and perceptron intelligent computing performance measurement methods based on medical heterogeneous equipment health data. Neural Comput Appl 2023. [DOI: 10.1007/s00521-023-08316-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
|
5
|
Ozonze O, Scott PJ, Hopgood AA. Automating Electronic Health Record Data Quality Assessment. J Med Syst 2023; 47:23. [PMID: 36781551 PMCID: PMC9925537 DOI: 10.1007/s10916-022-01892-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/10/2021] [Accepted: 11/15/2022] [Indexed: 02/15/2023]
Abstract
Information systems such as Electronic Health Record (EHR) systems are susceptible to data quality (DQ) issues. Given the growing importance of EHR data, there is an increasing demand for strategies and tools to help ensure that available data are fit for use. However, developing reliable data quality assessment (DQA) tools necessary for guiding and evaluating improvement efforts has remained a fundamental challenge. This review examines the state of research on operationalising EHR DQA, mainly automated tooling, and highlights necessary considerations for future implementations. We reviewed 1841 articles from PubMed, Web of Science, and Scopus published between 2011 and 2021. We identified 23 DQA programs: 14 deployed in real-world settings to assess EHR data quality and 9 experimental prototypes. Many of these programs investigate the completeness (n = 15) and value conformance (n = 12) quality dimensions and are backed by knowledge items gathered from domain experts (n = 9), literature reviews and existing DQ measurements (n = 3). A few DQA programs also explore the feasibility of using data-driven techniques to assess EHR data quality automatically. Overall, the automation of EHR DQA is gaining traction, but current efforts are fragmented and not backed by relevant theory. Existing programs also vary in scope, type of data supported, and how measurements are sourced. There is a need to standardise programs for assessing EHR data quality, as current evidence suggests their quality may be unknown.
Affiliation(s)
- Obinwa Ozonze
  - School of Computing, University of Portsmouth, Buckingham Building, Lion Terrace, Portsmouth, PO1 3HE, UK
- Philip J Scott
  - Institute of Management and Health, University of Wales Trinity Saint David, Lampeter, SA48 7ED, UK
- Adrian A Hopgood
  - School of Computing, University of Portsmouth, Buckingham Building, Lion Terrace, Portsmouth, PO1 3HE, UK
|
6
|
Evans L, London JW, Palchuk MB. Assessing real-world medication data completeness. J Biomed Inform 2021; 119:103847. [PMID: 34161824 DOI: 10.1016/j.jbi.2021.103847] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/11/2021] [Revised: 06/16/2021] [Accepted: 06/17/2021] [Indexed: 11/29/2022]
Abstract
OBJECTIVE Analysis of healthcare Real-World Data (RWD) provides an opportunity to observe actual patient diagnostic, treatment and outcomes events. However, researchers should understand the possible limitations of RWD. In particular, these data may be incomplete, which would affect the validity of study conclusions. MATERIALS AND METHODS The completeness of medication RWD was investigated by analyzing the incidence of various diagnosis-medication couplets: the occurrence of a certain medication in the RWD for a patient having a certain diagnosis. Diagnosis and medication data were obtained from 61 U.S. medical data provider organizations, members of the TriNetX global research network. The number of patients having 22 diagnoses and expected medications were obtained at each institution, and the percent completion of each diagnosis-medication couplet calculated. The study hypothesis is that the degree of couplet completeness can serve as a proxy for overall completeness of medication data for a given organization. RESULTS Five diagnosis-medication couplets were found to be reliable proxies, having at least a peak 87% observed completeness for the organizations studied: Type 1 diabetes mellitus and insulin; asthma and albuterol; congestive heart failure and diuretics; cardiovascular disease and aspirin; hypothyroidism and levothyroxine. DISCUSSION These couplets were validated as reliable indicators by determining their status as standards of care. The degree to which patients with these five diagnoses had the specified associated medication was consistent within an organization data set. CONCLUSION The overall degree of medication data completeness for an organization can be assessed by measuring the completeness of certain indicator diagnosis-medication couplets.
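The couplet completeness measure described in this abstract, the share of patients with a given diagnosis who also have the expected medication recorded, can be sketched as below. The function name, the dictionary-of-sets input layout, and the example patient data are illustrative assumptions; the abstract does not describe the actual query implementation.

```python
from typing import Dict, Optional, Set


def couplet_completeness(
    patient_diagnoses: Dict[str, Set[str]],
    patient_medications: Dict[str, Set[str]],
    diagnosis: str,
    medication: str,
) -> Optional[float]:
    """Percent of patients with `diagnosis` who also have `medication` recorded.

    Returns None when no patient has the diagnosis, because the
    percentage is undefined in that case.
    """
    with_dx = {pid for pid, dxs in patient_diagnoses.items() if diagnosis in dxs}
    if not with_dx:
        return None
    with_both = {
        pid for pid in with_dx if medication in patient_medications.get(pid, set())
    }
    return 100.0 * len(with_both) / len(with_dx)


# Hypothetical toy data: four patients with hypothyroidism, three with levothyroxine.
diagnoses = {
    "p1": {"hypothyroidism"},
    "p2": {"hypothyroidism", "asthma"},
    "p3": {"hypothyroidism"},
    "p4": {"hypothyroidism"},
    "p5": {"asthma"},
}
medications = {
    "p1": {"levothyroxine"},
    "p2": {"levothyroxine", "albuterol"},
    "p3": {"levothyroxine"},
    "p5": {"albuterol"},
}
print(couplet_completeness(diagnoses, medications, "hypothyroidism", "levothyroxine"))
```

A patient missing from the medication table entirely (like `p4` here) counts against completeness, which is the point of the proxy: the medication is expected as standard of care, so its absence suggests missing data.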
Affiliation(s)
- Jack W London
  - Cancer Biology, Thomas Jefferson University, Philadelphia, USA
|
7
|
Tobochnik S, Pisano W, Lapinskas E, Ligon KL, Lee JW. Effect of PIK3CA variants on glioma-related epilepsy and response to treatment. Epilepsy Res 2021; 175:106681. [PMID: 34102393 DOI: 10.1016/j.eplepsyres.2021.106681] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/17/2021] [Revised: 05/11/2021] [Accepted: 05/31/2021] [Indexed: 11/18/2022]
Abstract
Upregulation of the PI3K/AKT/mTOR pathway has been implicated in glioma-related epileptogenesis. In this retrospective analysis, epilepsy characteristics and response to treatment were evaluated in patients with gliomas harboring somatic mutation variants in PIK3CA. A cohort of 134 patients with 150 PIK3CA variants was extracted from previously validated databases. Patients with the hotspot H1047R, R88Q, E542K, and G118D variants comprised a subset (n = 41) for epilepsy phenotyping. In multivariate analysis, the presence of H1047R (n = 15) was associated with worse seizure control (p = 0.026). These results support preclinical findings and suggest that glioma PIK3CA variation may have promise as a biomarker for epilepsy severity and response to treatment.
Affiliation(s)
- Steven Tobochnik
  - Department of Neurology, Brigham and Women's Hospital, Boston, MA, United States
  - VA Boston Healthcare System, Boston, MA, United States
- William Pisano
  - Department of Oncologic Pathology, Dana-Farber Cancer Institute, Boston, MA, United States
- Emily Lapinskas
  - Department of Oncologic Pathology, Dana-Farber Cancer Institute, Boston, MA, United States
- Keith L Ligon
  - Department of Oncologic Pathology, Dana-Farber Cancer Institute, Boston, MA, United States
  - Department of Pathology, Brigham and Women's Hospital, Boston, MA, United States
- Jong Woo Lee
  - Department of Neurology, Brigham and Women's Hospital, Boston, MA, United States
|
8
|
Estiri H, Klann JG, Weiler SR, Alema-Mensah E, Joseph Applegate R, Lozinski G, Patibandla N, Wei K, Adams WG, Natter MD, Ofili EO, Ostasiewski B, Quarshie A, Rosenthal GE, Bernstam EV, Mandl KD, Murphy SN. A federated EHR network data completeness tracking system. J Am Med Inform Assoc 2020; 26:637-645. [PMID: 30925587 PMCID: PMC6586954 DOI: 10.1093/jamia/ocz014] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/09/2018] [Revised: 01/04/2019] [Accepted: 01/17/2019] [Indexed: 02/03/2023]
Abstract
OBJECTIVE The study sought to design, pilot, and evaluate a federated data completeness tracking system (CTX) for assessing completeness in research data extracted from electronic health record data across the Accessible Research Commons for Health (ARCH) Clinical Data Research Network. MATERIALS AND METHODS The CTX applies a systems-based approach to design workflow and technology for assessing completeness across distributed electronic health record data repositories participating in a queryable, federated network. The CTX invokes 2 positive feedback loops that utilize open source tools (DQe-c and Vue) to integrate technology and human actors in a system geared for increasing capacity and taking action. A pilot implementation of the system involved 6 ARCH partner sites between January 2017 and May 2018. RESULTS The ARCH CTX has enabled the network to monitor and, if needed, adjust its data management processes to maintain complete datasets for secondary use. The system allows the network and its partner sites to profile data completeness both at the network and partner site levels. Interactive visualizations presenting the current state of completeness in the context of the entire network as well as changes in completeness across time were valued among the CTX user base. DISCUSSION Distributed clinical data networks are complex systems. Top-down approaches that solely rely on technology to report data completeness may be necessary but not sufficient for improving completeness (and quality) of data in large-scale clinical data networks. Improving and maintaining complete (high-quality) data in such complex environments entails sociotechnical systems that exploit technology and empower human actors to engage in the process of high-quality data curating. 
CONCLUSIONS The CTX has increased the network's capacity to rapidly identify data completeness issues and empowered ARCH partner sites to get involved in improving the completeness of respective data in their repositories.
Affiliation(s)
- Hossein Estiri
  - Laboratory of Computer Science, Massachusetts General Hospital, Boston, Massachusetts, USA
  - Research Information Science and Computing, Partners HealthCare, Charlestown, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
- Jeffrey G Klann
  - Laboratory of Computer Science, Massachusetts General Hospital, Boston, Massachusetts, USA
  - Research Information Science and Computing, Partners HealthCare, Charlestown, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
- R Joseph Applegate
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Galina Lozinski
  - Boston University School of Medicine/Boston Medical Center, Boston, Massachusetts, USA
- Nandan Patibandla
  - Information Services Department, Boston Children's Hospital, Boston, Massachusetts, USA
- Kun Wei
  - Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- William G Adams
  - Department of Pediatrics, Boston University School of Medicine/Boston Medical Center, Boston, Massachusetts, USA
- Marc D Natter
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
  - Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
  - Program in Pediatric Rheumatology, Department of Pediatrics, Mass General Hospital for Children, Boston, Massachusetts, USA
- Gary E Rosenthal
  - Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Elmer V Bernstam
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
  - Division of General Internal Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Kenneth D Mandl
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
  - Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Shawn N Murphy
  - Laboratory of Computer Science, Massachusetts General Hospital, Boston, Massachusetts, USA
  - Research Information Science and Computing, Partners HealthCare, Charlestown, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
  - Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA
|
9
|
Virkus S, Garoufallou E. Data science and its relationship to library and information science: a content analysis. DATA TECHNOLOGIES AND APPLICATIONS 2020. [DOI: 10.1108/dta-07-2020-0167] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/17/2022]
Abstract
PURPOSE The purpose of this paper is to present the results of a study exploring the emerging field of data science from the library and information science (LIS) perspective. DESIGN/METHODOLOGY/APPROACH A content analysis was performed on data science research publications indexed in the Web of Science database to identify the main themes discussed from the LIS perspective. FINDINGS A content analysis of 80 publications is presented. The articles belonged to six broad categories: data science education and training; knowledge and skills of the data professional; the role of libraries and librarians in the data science movement; tools, techniques and applications of data science; data science from the knowledge management perspective; and data science from the perspective of health sciences. The category of tools, techniques and applications of data science was most addressed by the authors, followed by data science from the perspective of health sciences, data science education and training, and knowledge and skills of the data professional. However, several publications fell into more than one category because these topics were closely related. RESEARCH LIMITATIONS/IMPLICATIONS Only publications recorded in the Web of Science database with the term "data science" in the topic area were analyzed. Therefore, several relevant studies are not discussed in this paper, either because they were related to other keywords such as "e-science", "e-research", "data service", "data curation", "research data management" or "scientific data management", or because they were not present in the Web of Science database. ORIGINALITY/VALUE The paper provides the first exploration by content analysis of the field of data science from the perspective of LIS.
|
10
|
Looten V, Kong Win Chang L, Neuraz A, Landau-Loriot MA, Vedie B, Paul JL, Mauge L, Rivet N, Bonifati A, Chatellier G, Burgun A, Rance B. What can millions of laboratory test results tell us about the temporal aspect of data quality? Study of data spanning 17 years in a clinical data warehouse. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2019; 181:104825. [PMID: 30612785 DOI: 10.1016/j.cmpb.2018.12.030] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/09/2018] [Revised: 12/24/2018] [Accepted: 12/28/2018] [Indexed: 06/09/2023]
Abstract
OBJECTIVE To identify common temporal evolution profiles in biological data and propose a semi-automated method to detect these patterns in a clinical data warehouse (CDW). MATERIALS AND METHODS We leveraged the CDW of the European Hospital Georges Pompidou and tracked the evolution of 192 biological parameters over a period of 17 years (for 445,000+ patients, and 131 million laboratory test results). RESULTS We identified three common profiles of evolution: discretization, breakpoints, and trends. We developed computational and statistical methods to identify these profiles in the CDW. Overall, of the 192 observed biological parameters (87,814,136 values), 135 presented at least one evolution. We identified breakpoints in 30 distinct parameters, discretizations in 32, and trends in 79. DISCUSSION AND CONCLUSION Our method allowed the identification of several temporal events in the data. Considering the distribution over time of these events, we identified probable causes for the observed profiles: instrument or software upgrades and changes in computation formulas. We evaluated the potential impact for data reuse. Finally, we formulated recommendations to enable safe use and sharing of biological data collections and to limit the impact of data evolution in retrospective and federated studies (e.g. the annotation of laboratory parameters presenting breakpoints or trends).
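To make the notion of a "breakpoint" concrete, the sketch below scans a chronologically ordered series of per-period averages for the split that maximizes the difference between the mean before and the mean after. This is a deliberately simple stand-in technique chosen for illustration; the abstract does not specify the authors' statistical method, and the function name and both thresholds are assumptions.

```python
from statistics import mean
from typing import Optional, Sequence


def find_breakpoint(
    values: Sequence[float], min_segment: int = 3, min_shift: float = 1.0
) -> Optional[int]:
    """Return the index where the series mean shifts the most, or None.

    `values` is assumed to be ordered in time (e.g. monthly means of one
    laboratory parameter). A split at index i compares values[:i] with
    values[i:]; each side must hold at least `min_segment` points, and the
    absolute mean difference must reach `min_shift` (assumed thresholds).
    """
    best_index, best_shift = None, 0.0
    for i in range(min_segment, len(values) - min_segment + 1):
        shift = abs(mean(values[i:]) - mean(values[:i]))
        if shift > best_shift:
            best_index, best_shift = i, shift
    return best_index if best_shift >= min_shift else None
```

A detected index would then be checked against known events such as an instrument or software upgrade, the kind of probable cause the abstract mentions.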
Affiliation(s)
- Vincent Looten
  - INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
  - Hôpital Européen Georges Pompidou, Department of Medical Informatics, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, 20 rue Leblanc, 75015 Paris, France
- Antoine Neuraz
  - INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
  - Hôpital Necker - Enfants Malades, Department of Medical Informatics, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, France
- Marie-Anne Landau-Loriot
  - Hôpital Européen Georges Pompidou, Department of Biochemistry, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, France
- Benoit Vedie
  - Hôpital Européen Georges Pompidou, Department of Biochemistry, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, France
- Jean-Louis Paul
  - Hôpital Européen Georges Pompidou, Department of Biochemistry, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, France
- Laëtitia Mauge
  - Hôpital Européen Georges Pompidou, Department of Hematology, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, France
- Nadia Rivet
  - Hôpital Européen Georges Pompidou, Department of Hematology, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, France
- Angela Bonifati
  - LIRIS UMR CNRS 5205, Université Claude Bernard Lyon 1, Villeurbanne, France
- Gilles Chatellier
  - Hôpital Européen Georges Pompidou, Department of Medical Informatics, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, 20 rue Leblanc, 75015 Paris, France
- Anita Burgun
  - INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
  - Hôpital Européen Georges Pompidou, Department of Medical Informatics, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, 20 rue Leblanc, 75015 Paris, France
  - Hôpital Necker - Enfants Malades, Department of Medical Informatics, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, France
- Bastien Rance
  - INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
  - Hôpital Européen Georges Pompidou, Department of Medical Informatics, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, 20 rue Leblanc, 75015 Paris, France
|
11
|
Abstract
OBJECTIVES To summarize key contributions to current research in the field of Clinical Research Informatics (CRI) and to select best papers published in 2018. METHOD A bibliographic search using a combination of MeSH descriptors and free-text terms on CRI was performed using PubMed, followed by a double-blind review in order to select a list of candidate best papers to be then peer-reviewed by external reviewers. After peer-review ranking, a consensus meeting of the editorial team was organized to conclude on the selection of best papers. RESULTS Among the 1,469 retrieved papers published in 2018 in the various areas of CRI, the full review process selected four best papers. The first best paper describes a simple algorithm detecting co-morbidities in Electronic Healthcare Records (EHRs) using a clinical data warehouse and a knowledge base. The authors of the second best paper present a federated algorithm for predicting heart failure hospital admissions based on patients' medical history described in their distributed EHRs. The third best paper reports the evaluation of an open source, interoperable, and scalable data quality assessment tool measuring completeness of data items, which can be run on different architectures (EHRs and Clinical Data Warehouses (CDWs) based on PCORnet or OMOP data models). The fourth best paper reports a data quality program conducted across 37 hospitals addressing data quality issues through the whole data life cycle from patient to researcher. CONCLUSIONS Research efforts in the CRI field currently focus on consolidating promises of early Distributed Research Networks aimed at maximizing the potential of large-scale, harmonized data from diverse, quickly developing digital sources. Data quality assessment methods and tools as well as privacy-enhancing techniques are major concerns.
It is also notable that, following examples in the US and Asia, ambitious regional or national plans in Europe are launched that aim at developing big data and new artificial intelligence technologies to contribute to the understanding of health and diseases in whole populations and whole health systems, and returning actionable feedback loops to improve existing models of research and care. The use of "real-world" data is continuously increasing but the ultimate role of this data in clinical research remains to be determined.
Affiliation(s)
- Christel Daniel
  - AP-HP Information Systems Direction, Paris, France
  - Sorbonne University, University Paris 13, Sorbonne Paris Cité, INSERM UMR_S 1142, LIMICS, Paris, France
|
12
|
Abstract
Introduction: In aggregate, existing data quality (DQ) checks are currently represented in heterogeneous formats, making it difficult to compare, categorize, and index checks. This study contributes a data element-function conceptual model to facilitate the categorization and indexing of DQ checks and explores the feasibility of leveraging natural language processing (NLP) for scalable acquisition of knowledge of common data elements and functions from DQ checks narratives. Methods: The model defines a “data element”, the primary focus of the check, and a “function”, the qualitative or quantitative measure over a data element. We applied NLP techniques to extract both from 172 checks for Observational Health Data Sciences and Informatics (OHDSI) and 3,434 checks for Kaiser Permanente’s Center for Effectiveness and Safety Research (CESR). Results: The model was able to classify all checks. A total of 751 unique data elements and 24 unique functions were extracted. The top five frequent data element-function pairings for OHDSI were Person-Count (55 checks), Insurance-Distribution (17), Medication-Count (16), Condition-Count (14), and Observations-Count (13); for CESR, they were Medication-Variable Type (175), Medication-Missing (172), Medication-Existence (152), Medication-Count (127), and Socioeconomic Factors-Variable Type (114). Conclusions: This study shows the efficacy of the data element-function conceptual model for classifying DQ checks, demonstrates early promise of NLP-assisted knowledge acquisition, and reveals the great heterogeneity in the focus in DQ checks, confirming variation in intrinsic checks and use-case specific “fitness-for-use” checks.
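The data element-function model described in this abstract can be sketched as a small data structure plus a frequency count over pairings. The `DQCheck` type, the `top_pairings` helper, and the sample checks are hypothetical illustrations; the NLP extraction step that the study used to populate the two fields is not shown.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class DQCheck(NamedTuple):
    """One data quality check, classified under the element-function model."""
    source: str        # check library the check came from, e.g. "OHDSI"
    data_element: str  # primary focus of the check, e.g. "Medication"
    function: str      # measure applied to the element, e.g. "Count"


def top_pairings(
    checks: List[DQCheck], source: str, n: int = 5
) -> List[Tuple[Tuple[str, str], int]]:
    """Return the n most frequent (data_element, function) pairs for one source."""
    counts = Counter(
        (check.data_element, check.function)
        for check in checks
        if check.source == source
    )
    return counts.most_common(n)


# Hypothetical sample checks, not taken from either check library.
sample_checks = [
    DQCheck("OHDSI", "Person", "Count"),
    DQCheck("OHDSI", "Person", "Count"),
    DQCheck("OHDSI", "Medication", "Count"),
    DQCheck("CESR", "Medication", "Missing"),
]
print(top_pairings(sample_checks, "OHDSI"))
```

Indexing checks by this pair is what makes heterogeneous check libraries comparable: two checks written in different formats land in the same bucket when they apply the same function to the same element.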
|
13
|
Brennan PF, Chiang MF, Ohno-Machado L. Biomedical informatics and data science: evolving fields with significant overlap. J Am Med Inform Assoc 2019; 25:2-3. [PMID: 29267964 DOI: 10.1093/jamia/ocx146] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022]
Affiliation(s)
- Patricia Flatley Brennan
  - 9500 Gilman Dr, MC 0728, La Jolla, CA 92093, USA. Phone: 858-822-4931; Fax: 858-822-7685; E-mail:
- Michael F Chiang
  - 9500 Gilman Dr, MC 0728, La Jolla, CA 92093, USA. Phone: 858-822-4931; Fax: 858-822-7685; E-mail:
- Lucila Ohno-Machado
  - 9500 Gilman Dr, MC 0728, La Jolla, CA 92093, USA. Phone: 858-822-4931; Fax: 858-822-7685; E-mail:
|