1
Bazoge A, Morin E, Daille B, Gourraud PA. Applying Natural Language Processing to Textual Data From Clinical Data Warehouses: Systematic Review. JMIR Med Inform 2023; 11:e42477. [PMID: 38100200 PMCID: PMC10757232 DOI: 10.2196/42477] [Received: 09/05/2022] [Revised: 01/16/2023] [Accepted: 09/07/2023] [Indexed: 12/17/2023]
Abstract
BACKGROUND: In recent years, health data collected during the clinical care process have been often repurposed for secondary use through clinical data warehouses (CDWs), which interconnect disparate data from different sources. A large amount of information of high clinical value is stored in unstructured text format. Natural language processing (NLP), which implements algorithms that can operate on massive unstructured textual data, has the potential to structure the data and make clinical information more accessible.
OBJECTIVE: The aim of this review was to provide an overview of studies applying NLP to textual data from CDWs. It focuses on identifying the (1) NLP tasks applied to data from CDWs and (2) NLP methods used to tackle these tasks.
METHODS: This review was performed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We searched for relevant articles in 3 bibliographic databases: PubMed, Google Scholar, and ACL Anthology. We reviewed the titles and abstracts and included articles according to the following inclusion criteria: (1) focus on NLP applied to textual data from CDWs, (2) articles published between 1995 and 2021, and (3) written in English.
RESULTS: We identified 1353 articles, of which 194 (14.34%) met the inclusion criteria. Among all identified NLP tasks in the included papers, information extraction from clinical text (112/194, 57.7%) and the identification of patients (51/194, 26.3%) were the most frequent tasks. To address the various tasks, symbolic methods were the most common NLP methods (124/232, 53.4%), showing that some tasks can be partially achieved with classical NLP techniques, such as regular expressions or pattern matching that exploit specialized lexica, such as drug lists and terminologies. Machine learning (70/232, 30.2%) and deep learning (38/232, 16.4%) have been increasingly used in recent years, including the most recent approaches based on transformers. NLP methods were mostly applied to English language data (153/194, 78.9%).
CONCLUSIONS: CDWs are central to the secondary use of clinical texts for research purposes. Although the use of NLP on data from CDWs is growing, there remain challenges in this field, especially with regard to languages other than English. Clinical NLP is an effective strategy for accessing, extracting, and transforming data from CDWs. Information retrieved with NLP can assist in clinical research and have an impact on clinical practice.
Affiliation(s)
- Adrien Bazoge
  - Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
  - Nantes Université, CHU de Nantes, Pôle Hospitalo-Universitaire 11: Santé Publique, Clinique des données, INSERM, CIC 1413, F-44000 Nantes, France
- Emmanuel Morin
  - Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- Béatrice Daille
  - Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- Pierre-Antoine Gourraud
  - Nantes Université, CHU de Nantes, Pôle Hospitalo-Universitaire 11: Santé Publique, Clinique des données, INSERM, CIC 1413, F-44000 Nantes, France
  - Nantes Université, INSERM, CHU de Nantes, École Centrale Nantes, Centre de Recherche Translationnelle en Transplantation et Immunologie, CR2TI, F-44000 Nantes, France
2
Benis A, Tamburis O. The Need for Green and Responsible Medical Informatics and Digital Health: Looking Forward with One Digital Health. Yearb Med Inform 2023; 32:7-9. [PMID: 37414027 PMCID: PMC10751118 DOI: 10.1055/s-0043-1768717] [Indexed: 07/08/2023]
Abstract
One Health is an important initiative to view the world in a more integrative sense of our health and environment. Digital Health provides essential support to all of us as healthcare professionals and customers. One Digital Health (ODH) combines both One Health and Digital Health to provide a technologically integrative view. ODH gives an essential place to the environment and ecosystems. Thus, health technologies and digital health must be as "green" and eco-friendly as possible. In this position paper, we suggest examples of developing and implementing ODH-related concepts, systems, and products with respectful consideration of the environment. For humans and animals, developing cutting-edge technologies to improve wellness and healthcare is critical. Nevertheless, we can learn from One Health that digitalization, and therefore One Digital Health, must be built on green, eco-friendly, and responsible thinking.
Affiliation(s)
- Arriel Benis
  - Department of Digital Medical Technologies, Holon Institute of Technology, Israel
- Oscar Tamburis
  - Institute of Biostructures and Bioimaging, National Research Council, Naples, Italy
3
Doutreligne M, Degremont A, Jachiet PA, Lamer A, Tannier X. Good practices for clinical data warehouse implementation: A case study in France. PLOS Digital Health 2023; 2:e0000298. [PMID: 37410797 PMCID: PMC10325086 DOI: 10.1371/journal.pdig.0000298] [Indexed: 07/08/2023]
Abstract
Real-world data (RWD) bear great promise for improving the quality of care. However, specific infrastructures and methodologies are required to derive robust knowledge and bring innovations to the patient. Drawing upon a national case study of the governance of the 32 French regional and university hospitals, we highlight key aspects of modern clinical data warehouses (CDWs): governance, transparency, types of data, data reuse, technical tools, documentation, and data quality control processes. Semi-structured interviews and a review of reported studies on French CDWs were conducted from March to November 2022. Out of 32 regional and university hospitals in France, 14 have a CDW in production, 5 are experimenting, 5 have a prospective CDW project, and 8 did not have any CDW project at the time of writing. The implementation of CDWs in France dates from 2011 and accelerated in late 2020. From this case study, we draw some general guidelines for CDWs. The current orientation of CDWs towards research requires efforts in governance stabilization, standardization of data schemas, and development of data quality and data documentation. Particular attention must be paid to the sustainability of the warehouse teams and to multilevel governance. The transparency of the studies and of the data transformation tools must improve to allow successful multicentric data reuse as well as innovations in routine care.
Affiliation(s)
- Matthieu Doutreligne
  - Mission Data, Haute Autorité de Santé, Saint-Denis, France
  - Inria, Soda team, Palaiseau, France
- Antoine Lamer
  - Univ. Lille, CHU Lille, ULR 2694—METRICS: Évaluation des Technologies de santé et des Pratiques médicales, Lille, France
  - Fédération régionale de recherche en psychiatrie et santé mentale (F2RSM Psy), Hauts-de-France, Saint-André-Lez-Lille, France
- Xavier Tannier
  - Sorbonne Université, Inserm, Université Sorbonne Paris-Nord, Laboratoire d’informatique médicale et d’ingénierie des connaissances en e-Santé, LIMICS, France
4
Eysenbach G, Ulrich H, Bergh B, Schreiweis B. Functional Requirements for Medical Data Integration into Knowledge Management Environments: Requirements Elicitation Approach Based on Systematic Literature Analysis. J Med Internet Res 2023; 25:e41344. [PMID: 36757764 PMCID: PMC9951079 DOI: 10.2196/41344] [Received: 07/22/2022] [Revised: 10/24/2022] [Accepted: 11/17/2022] [Indexed: 11/18/2022]
Abstract
BACKGROUND: In patient care, data are historically generated and stored in heterogeneous databases that are domain specific and often noninteroperable or isolated. As the amount of health data increases, the number of isolated data silos is also expected to grow, limiting the accessibility of the collected data. Medical informatics is developing ways to move from siloed data to a more harmonized arrangement in information architectures. This paradigm shift will allow future research to integrate medical data at various levels and from various sources. Currently, comprehensive requirements engineering is working on data integration projects in both patient care- and research-oriented contexts, and it is significantly contributing to the success of such projects. In addition to various stakeholder-based methods, document-based requirement elicitation is a valid method for improving the scope and quality of requirements.
OBJECTIVE: Our main objective was to provide a general catalog of functional requirements for integrating medical data into knowledge management environments. We aimed to identify where integration projects intersect to derive consistent and representative functional requirements from the literature. On the basis of these findings, we identified which functional requirements for data integration exist in the literature and thus provide a general catalog of requirements.
METHODS: This work began by conducting a literature-based requirement elicitation based on a broad requirement engineering approach. Thus, in the first step, we performed a web-based systematic literature review to identify published articles that dealt with the requirements for medical data integration. We identified and analyzed the available literature by applying the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. In the second step, we screened the results for functional requirements using the requirements engineering method of document analysis and derived the requirements into a uniform requirement syntax. Finally, we classified the elicited requirements into a category scheme that represents the data life cycle.
RESULTS: Our 2-step requirements elicitation approach yielded 821 articles, of which 61 (7.4%) were included in the requirement elicitation process. There, we identified 220 requirements, which were covered by 314 references. We assigned the requirements to different data life cycle categories as follows: 25% (55/220) to data acquisition, 35.9% (79/220) to data processing, 12.7% (28/220) to data storage, 9.1% (20/220) to data analysis, 6.4% (14/220) to metadata management, 2.3% (5/220) to data lineage, 3.2% (7/220) to data traceability, and 5.5% (12/220) to data security.
CONCLUSIONS: The aim of this study was to present a cross-section of functional data integration-related requirements defined in the literature by other researchers. The aim was achieved with 220 distinct requirements from 61 publications. We concluded that scientific publications are, in principle, a reliable source of information for functional requirements with respect to medical data integration. Finally, we provide a broad catalog to support other scientists in the requirement elicitation phase.
Affiliation(s)
- G Eysenbach
  - Institute for Medical Informatics and Statistics, Kiel University and University Hospital Schleswig-Holstein, Kiel, Germany
- Hannes Ulrich
  - Institute for Medical Informatics and Statistics, Kiel University and University Hospital Schleswig-Holstein, Kiel, Germany
- Björn Bergh
  - Institute for Medical Informatics and Statistics, Kiel University and University Hospital Schleswig-Holstein, Kiel, Germany
- Björn Schreiweis
  - Institute for Medical Informatics and Statistics, Kiel University and University Hospital Schleswig-Holstein, Kiel, Germany
5
Gosselin L, Letord C, Leguillon R, Soualmia LF, Dahamna B, Mouazer A, Disson F, Darmoni SJ, Grosjean J. Modeling and integrating interactions involving the CYP450 enzyme system in a multi-terminology server: Contribution to information extraction from a clinical data warehouse. Int J Med Inform 2023; 170:104976. [PMID: 36599261 DOI: 10.1016/j.ijmedinf.2022.104976] [Received: 03/23/2022] [Revised: 12/21/2022] [Accepted: 12/22/2022] [Indexed: 12/31/2022]
Abstract
INTRODUCTION: The cytochrome P450 (CYP450) enzyme system is involved in the metabolism of certain drugs and is responsible for most drug interactions. These interactions result in either an enzymatic inhibition or an enzymatic induction mechanism that has an impact on the therapeutic management of patients. Detecting these drug interactions will allow for better predictability of therapeutic response. Computerized solutions can therefore be a valuable help for clinicians in their detection tasks.
OBJECTIVE: The objective of this study is to provide a structured data source of interactions involving the CYP450 enzyme system. These interactions are intended to be integrated into the cross-lingual multi-terminology server HeTOP (Health Terminologies and Ontologies Portal), to support the query processing of the clinical data warehouse (CDW) EDSaN (Entrepôt de Données de Santé Normand).
MATERIAL AND METHODS: A selection and curation of drug components (DCs) that share a relationship with the CYP450 system was performed from several international data sources. The DCs were linked according to the type of relationship, which can be substrate, inhibitor, or inducer. These relationships were then integrated into the HeTOP server. To validate the CYP450 relationships, a semantic query was performed on the CDW, whose search engine is founded on HeTOP data (concepts, terms, and relations).
RESULTS: A total of 776 DCs are associated with 14 enzymes through a new interaction relationship integrated in HeTOP. These are CYP450 1A2, 2A6, 2B6, 2C8, 2C9, 2C18, 2C19, 2D6, 2E1, 3A4, 3A7, 11B1, 11B2 mitochondrial and P-glycoprotein, constituting a total of 2,088 relationships. A general modelling of cytochromic interactions was performed. From this model, 233,006 queries were processed in less than two hours, demonstrating the usefulness and performance of our CDW implementation. Moreover, they showed that in our university hospital, the concurrent prescription that could cause a cytochromic interaction is Bisoprolol with Amiodarone by enzymatic inhibition for 2,493 patients.
DISCUSSION: The queries submitted to the CDW EDSaN made it possible to highlight the molecules most often prescribed simultaneously and potentially responsible for cytochromic interactions. In a second step, it would be interesting to evaluate the real clinical impact by looking for possible adverse effects of these interactions in the patients' files. Other computational solutions for cytochromic interactions exist. The impact of CYP450 is particularly important for drugs with a narrow therapeutic window (NTW), as they can lead to increased toxicity or therapeutic failure. It is also important to define which drug component is a pro-drug and to consider the many genetic polymorphisms of patients.
CONCLUSION: The HeTOP server contains a non-negligible number of relationships between drug components and CYP450 from multiple reference sources. These data allow us to query our Clinical Data Warehouse to highlight these cytochromic interactions. It would be interesting in the future to assess the actual clinical impact in hospital reports.
Affiliation(s)
- Laura Gosselin
  - Department of Digital Health, Rouen University Hospital, Rouen, France; Department of Pharmacy, Rouen University Hospital, Rouen, France
- Catherine Letord
  - Department of Digital Health, Rouen University Hospital, Rouen, France; Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé (LIMICS), U1142, INSERM, Sorbonne Université, Paris, France
- Romain Leguillon
  - Department of Digital Health, Rouen University Hospital, Rouen, France; Department of Pharmacy, Rouen University Hospital, Rouen, France; Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé (LIMICS), U1142, INSERM, Sorbonne Université, Paris, France
- Lina F Soualmia
  - Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé (LIMICS), U1142, INSERM, Sorbonne Université, Paris, France; Normandy University, UNIROUEN, LITIS-TIBS, UR 4108 Rouen, France
- Badisse Dahamna
  - Department of Digital Health, Rouen University Hospital, Rouen, France; Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé (LIMICS), U1142, INSERM, Sorbonne Université, Paris, France
- Abdelmalek Mouazer
  - Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé (LIMICS), U1142, INSERM, Sorbonne Université, Paris, France
- Flavien Disson
  - Department of Digital Health, Rouen University Hospital, Rouen, France
- Stéfan J Darmoni
  - Department of Digital Health, Rouen University Hospital, Rouen, France; Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé (LIMICS), U1142, INSERM, Sorbonne Université, Paris, France
- Julien Grosjean
  - Department of Digital Health, Rouen University Hospital, Rouen, France; Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé (LIMICS), U1142, INSERM, Sorbonne Université, Paris, France