1
Bazoge A, Morin E, Daille B, Gourraud PA. Applying Natural Language Processing to Textual Data From Clinical Data Warehouses: Systematic Review. JMIR Med Inform 2023; 11:e42477. [PMID: 38100200 PMCID: PMC10757232 DOI: 10.2196/42477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 01/16/2023] [Accepted: 09/07/2023] [Indexed: 12/17/2023] Open
Abstract
BACKGROUND In recent years, health data collected during the clinical care process have often been repurposed for secondary use through clinical data warehouses (CDWs), which interconnect disparate data from different sources. A large amount of information of high clinical value is stored in unstructured text format. Natural language processing (NLP), which implements algorithms that can operate on massive unstructured textual data, has the potential to structure the data and make clinical information more accessible. OBJECTIVE The aim of this review was to provide an overview of studies applying NLP to textual data from CDWs. It focuses on identifying the (1) NLP tasks applied to data from CDWs and (2) NLP methods used to tackle these tasks. METHODS This review was performed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We searched for relevant articles in 3 bibliographic databases: PubMed, Google Scholar, and ACL Anthology. We reviewed the titles and abstracts and included articles according to the following inclusion criteria: (1) focus on NLP applied to textual data from CDWs, (2) articles published between 1995 and 2021, and (3) written in English. RESULTS We identified 1353 articles, of which 194 (14.34%) met the inclusion criteria. Among all identified NLP tasks in the included papers, information extraction from clinical text (112/194, 57.7%) and the identification of patients (51/194, 26.3%) were the most frequent tasks. To address the various tasks, symbolic methods were the most common NLP methods (124/232, 53.4%), showing that some tasks can be partially achieved with classical NLP techniques, such as regular expressions or pattern matching that exploit specialized lexica, such as drug lists and terminologies. Machine learning (70/232, 30.2%) and deep learning (38/232, 16.4%) have been increasingly used in recent years, including the most recent approaches based on transformers. NLP methods were mostly applied to English language data (153/194, 78.9%). CONCLUSIONS CDWs are central to the secondary use of clinical texts for research purposes. Although the use of NLP on data from CDWs is growing, there remain challenges in this field, especially with regard to languages other than English. Clinical NLP is an effective strategy for accessing, extracting, and transforming data from CDWs. Information retrieved with NLP can assist in clinical research and have an impact on clinical practice.
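The symbolic approach mentioned in this abstract (regular expressions that exploit a specialized lexicon such as a drug list) can be illustrated with a minimal sketch. The lexicon, the function name `extract_drug_mentions`, and the sample note are all hypothetical and are not taken from the review; a real system would load a curated terminology instead.

```python
import re

# Hypothetical mini-lexicon (assumption); a real system would load a curated drug terminology.
DRUG_LEXICON = {"aspirin", "metformin", "naloxone"}

# One alternation pattern over the lexicon, matched on word boundaries, case-insensitively.
DRUG_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, DRUG_LEXICON))) + r")\b",
    flags=re.IGNORECASE,
)


def extract_drug_mentions(note: str) -> list[tuple[str, int]]:
    """Return (lowercased drug name, character offset) for each lexicon hit in the note."""
    return [(m.group(1).lower(), m.start()) for m in DRUG_PATTERN.finditer(note)]


# Illustrative free-text note (invented for this sketch).
note = "Patient started on Metformin; aspirin was discontinued."
mentions = extract_drug_mentions(note)
```

Word boundaries keep the pattern from firing inside longer tokens, which is one reason lexicon matching of this kind is often a reasonable first baseline before moving to statistical models.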
Affiliation(s)
- Adrien Bazoge
- Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- Nantes Université, CHU de Nantes, Pôle Hospitalo-Universitaire 11: Santé Publique, Clinique des données, INSERM, CIC 1413, F-44000 Nantes, France
- Emmanuel Morin
- Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- Béatrice Daille
- Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- Pierre-Antoine Gourraud
- Nantes Université, CHU de Nantes, Pôle Hospitalo-Universitaire 11: Santé Publique, Clinique des données, INSERM, CIC 1413, F-44000 Nantes, France
- Nantes Université, INSERM, CHU de Nantes, École Centrale Nantes, Centre de Recherche Translationnelle en Transplantation et Immunologie, CR2TI, F-44000 Nantes, France
2
Hjaltelin JX, Novitski SI, Jørgensen IF, Siggaard T, Vulpius SA, Westergaard D, Johansen JS, Chen IM, Juhl Jensen L, Brunak S. Pancreatic cancer symptom trajectories from Danish registry data and free text in electronic health records. eLife 2023; 12:e84919. [PMID: 37988407 PMCID: PMC10662947 DOI: 10.7554/elife.84919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Accepted: 10/19/2023] [Indexed: 11/23/2023] Open
Abstract
Pancreatic cancer is one of the deadliest cancer types with poor treatment options. Better detection of early symptoms and relevant disease correlations could improve pancreatic cancer prognosis. In this retrospective study, we used symptom and disease codes (ICD-10) from the Danish National Patient Registry (NPR) encompassing 6.9 million patients from 1994 to 2018, of whom 23,592 were diagnosed with pancreatic cancer. The Danish cancer registry included 18,523 of these patients. To complement and compare the registry diagnosis codes with deeper clinical data, we used a text mining approach to extract symptoms from free text clinical notes in electronic health records (3078 pancreatic cancer patients and 30,780 controls). We used both data sources to generate and compare symptom-disease trajectories to uncover temporal patterns of symptoms prior to pancreatic cancer diagnosis for the same patients. We show that the text mining of the clinical notes was able to complement the registry-based symptoms by capturing more symptoms prior to pancreatic cancer diagnosis. For example, 'Blood pressure reading without diagnosis', 'Abnormalities of heartbeat', and 'Intestinal obstruction' were not found in the registry-based analysis. Chaining symptoms together in trajectories identified two groups of patients with lower median survival (<90 days) following the trajectories 'Cough→Jaundice→Intestinal obstruction' and 'Pain→Jaundice→Abnormal results of function studies'. These results provide a comprehensive comparison of the two types of pancreatic cancer symptom trajectories, which in combination can leverage the full potential of the health data and ultimately provide a fuller picture for detection of early risk factors for pancreatic cancer.
Affiliation(s)
- Jessica Xin Hjaltelin
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Sif Ingibergsdóttir Novitski
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Isabella Friis Jørgensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Troels Siggaard
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Siri Amalie Vulpius
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- David Westergaard
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Inna M Chen
- Department of Oncology, Copenhagen University Hospital - Herlev and Gentofte, Herlev, Denmark
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Søren Brunak
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Copenhagen University Hospital, Rigshospitalet, Blegdamsvej, Copenhagen, Denmark
3
Hacking C, Verbeek H, Hamers JPH, Aarts S. Comparing text mining and manual coding methods: Analysing interview data on quality of care in long-term care for older adults. PLoS One 2023; 18:e0292578. [PMID: 37939098 PMCID: PMC10631650 DOI: 10.1371/journal.pone.0292578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 09/24/2023] [Indexed: 11/10/2023] Open
Abstract
OBJECTIVES In long-term care for older adults, large amounts of text are collected relating to the quality of care, such as transcribed interviews. Researchers currently analyze textual data manually to gain insights, which is a time-consuming process. Text mining could provide a solution, as this methodology can be used to analyze large amounts of text automatically. This study aims to compare text mining to manual coding with regard to sentiment analysis and thematic content analysis. METHODS Data were collected from interviews with residents (n = 21), family members (n = 20), and care professionals (n = 20). Text mining models were developed and compared to the manual approach. The results of the manual and text mining approaches were evaluated based on three criteria: accuracy, consistency, and expert feedback. Accuracy assessed the similarity between the two approaches, while consistency determined whether each individual approach found the same themes in similar text segments. Expert feedback served as a representation of the perceived correctness of the text mining approach. RESULTS An accuracy analysis revealed that more than 80% of the text segments were assigned the same themes and sentiment using both text mining and manual approaches. Interviews coded with text mining demonstrated higher consistency compared to those coded manually. Expert feedback identified certain limitations in both the text mining and manual approaches. CONCLUSIONS AND IMPLICATIONS While these analyses highlighted the current limitations of text mining, they also exposed certain inconsistencies in manual analysis. This information suggests that text mining has the potential to be an effective and efficient tool for analysing large volumes of textual data in the context of long-term care for older adults.
Affiliation(s)
- Coen Hacking
- Faculty of Health Medicine and Life Sciences, Department of Health Services Research, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, The Netherlands
- The Living Lab in Ageing & Long-Term Care, Maastricht, The Netherlands
- Hilde Verbeek
- Faculty of Health Medicine and Life Sciences, Department of Health Services Research, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, The Netherlands
- The Living Lab in Ageing & Long-Term Care, Maastricht, The Netherlands
- Jan P. H. Hamers
- Faculty of Health Medicine and Life Sciences, Department of Health Services Research, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, The Netherlands
- The Living Lab in Ageing & Long-Term Care, Maastricht, The Netherlands
- Sil Aarts
- Faculty of Health Medicine and Life Sciences, Department of Health Services Research, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, The Netherlands
- The Living Lab in Ageing & Long-Term Care, Maastricht, The Netherlands
4
Hacking C, Verbeek H, Hamers JPH, Aarts S. The development of an automatic speech recognition model using interview data from long-term care for older adults. J Am Med Inform Assoc 2022; 30:411-417. [PMID: 36495570 PMCID: PMC9933064 DOI: 10.1093/jamia/ocac241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 11/08/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022] Open
Abstract
OBJECTIVE In long-term care (LTC) for older adults, interviews are used to collect client perspectives that are often recorded and transcribed verbatim, which is a time-consuming, tedious task. Automatic speech recognition (ASR) could provide a solution; however, current ASR systems are not effective for certain demographic groups. This study aims to show how data from specific groups, such as older adults or people with accents, can be used to develop an effective ASR. MATERIALS AND METHODS An initial ASR model was developed using the Mozilla Common Voice dataset. Audio and transcript data (34 h) from interviews with residents, family, and care professionals on quality of care were used. Interview data were continuously processed to reduce the word error rate (WER). RESULTS Due to background noise and mispronunciations, an initial ASR model had a WER of 48.3% on interview data. After finetuning using interview data, the average WER was reduced to 24.3%. When tested on speech data from the interviews, a median WER of 22.1% was achieved, with residents displaying the highest WER (22.7%). The resulting ASR model was at least 6 times faster than manual transcription. DISCUSSION The current method decreased the WER substantially, verifying its efficacy. Moreover, using local transcription of audio can be beneficial to the privacy of participants. CONCLUSIONS The current study shows that interview data from LTC for older adults can be effectively used to improve an ASR model. While the model output does still contain some errors, researchers reported that it saved much time during transcription.
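The word error rate (WER) reported in this abstract is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of reference words. The sketch below assumes that standard definition with whitespace tokenization; the abstract does not say how the authors tokenized or normalized text, and the function name and sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substitution ("care" -> "car") and one insertion ("very").
wer = word_error_rate("the care was good", "the car was very good")
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 100% for very noisy output.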
Affiliation(s)
- Coen Hacking
- Corresponding Author: Coen Hacking, MSc, Maastricht University, Duboisdomein 30, P.O. Box 616, 6200 MD Maastricht, The Netherlands
- Hilde Verbeek
- Department of Health Services Research, CAPHRI Care and Public Health Research Institute, Faculty of Health Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- The Living Lab in Ageing & Long-Term Care, Maastricht, The Netherlands
- Jan P H Hamers
- Department of Health Services Research, CAPHRI Care and Public Health Research Institute, Faculty of Health Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- The Living Lab in Ageing & Long-Term Care, Maastricht, The Netherlands
- Sil Aarts
- Department of Health Services Research, CAPHRI Care and Public Health Research Institute, Faculty of Health Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- The Living Lab in Ageing & Long-Term Care, Maastricht, The Netherlands
5
Kakoti BB, Bezbaruah R, Ahmed N. Therapeutic drug repositioning with special emphasis on neurodegenerative diseases: Threats and issues. Front Pharmacol 2022; 13:1007315. [PMID: 36263141 PMCID: PMC9574100 DOI: 10.3389/fphar.2022.1007315] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/30/2022] [Accepted: 09/12/2022] [Indexed: 11/21/2022] Open
Abstract
Drug repositioning or repurposing is the process of discovering new indications for authorized or rejected/abandoned molecules for use in different diseases. This approach revitalizes the traditional drug discovery method by revealing new therapeutic applications for existing drugs. Numerous studies highlight the success of several drugs as repurposed therapeutics; examples include sildenafil, aspirin, thalidomide, and adalimumab. Millions of people worldwide are affected by neurodegenerative diseases. In its 2021 report, the Alzheimer's Association estimated that 6.2 million Americans are living with Alzheimer's disease. By 2030, approximately 1.2 million people in the United States may have Parkinson's disease. Drugs that act on a single molecular target benefit people suffering from neurodegenerative diseases. Current pharmacological approaches, on the other hand, are limited in their capacity to alter the course of the disease and provide patients with only inadequate and temporary benefits. Drug repositioning-based approaches appear to be pertinent, cost- and time-saving strategies for expanding the therapeutic options for such diseases. Kinase inhibitors, for example, which were developed for various oncology indications, demonstrated significant neuroprotective effects in neurodegenerative diseases. This review expounds on the classical and recent examples of drug repositioning at various stages of drug development, with a special focus on neurodegenerative disorders and on the associated threats and issues, namely the regulatory, scientific, and economic aspects.
Affiliation(s)
- Bibhuti Bhusan Kakoti
- Department of Pharmaceutical Sciences, Faculty of Science and Engineering, Dibrugarh University, Dibrugarh, India
6
Baty F, Hegermann J, Locatelli T, Rüegg C, Gysin C, Rassouli F, Brutsche M. Text mining-based measurement of precision of polysomnographic reports as basis for intervention. J Biomed Semantics 2022; 13:5. [PMID: 35101128 PMCID: PMC8805265 DOI: 10.1186/s13326-022-00259-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2020] [Accepted: 01/06/2022] [Indexed: 11/10/2022] Open
Abstract
Background Text mining can be applied to automate knowledge extraction from unstructured data included in medical reports and generate quality indicators applicable for medical documentation. The primary objective of this study was to apply text mining methodology for the analysis of polysomnographic medical reports in order to quantify sources of variation – here the diagnostic precision vs. the inter-rater variability – in the work-up of sleep-disordered breathing. The secondary objective was to assess the impact of a text block standardization on the diagnostic precision of polysomnography reports in an independent test set. Results Polysomnography reports of 243 laboratory-based overnight sleep investigations scored by 9 trained sleep specialists of the Sleep Center St. Gallen were analyzed using a text-mining methodology. Patterns in the usage of discriminating terms allowed for the characterization of type and severity of disease and inter-rater homogeneity. The variation introduced by the inter-rater (technician/physician) heterogeneity was found to be twice as high compared to the variation introduced by effective diagnostic information. A simple text block standardization could significantly reduce the inter-rater variability by 44%, enhance the predictive value and ultimately improve the diagnostic accuracy of polysomnography reports. Conclusions Text mining was successfully used to assess and optimize the quality, as well as the precision and homogeneity of medical reporting of diagnostic procedures – here exemplified with sleep studies. Text mining methodology could lay the ground for objective and systematic qualitative assessment of medical reports. Supplementary Information The online version contains supplementary material available at (10.1186/s13326-022-00259-3).
Affiliation(s)
- Florent Baty
- Lung Center, Cantonal Hospital St. Gallen, Rorschacherstrasse 95, St. Gallen, 9007, Switzerland
- Jemima Hegermann
- Lung Center, Cantonal Hospital St. Gallen, Rorschacherstrasse 95, St. Gallen, 9007, Switzerland
- Tiziana Locatelli
- Lung Center, Cantonal Hospital St. Gallen, Rorschacherstrasse 95, St. Gallen, 9007, Switzerland
- Claudio Rüegg
- Division of General Internal Medicine, Cantonal Hospital St. Gallen, Rorschacherstrasse 95, St. Gallen, 9007, Switzerland
- Christian Gysin
- Lung Center, Cantonal Hospital St. Gallen, Rorschacherstrasse 95, St. Gallen, 9007, Switzerland
- Frank Rassouli
- Lung Center, Cantonal Hospital St. Gallen, Rorschacherstrasse 95, St. Gallen, 9007, Switzerland
- Martin Brutsche
- Lung Center, Cantonal Hospital St. Gallen, Rorschacherstrasse 95, St. Gallen, 9007, Switzerland.
7
van Laar SA, Gombert-Handoko KB, Wassenaar S, Kroep JR, Guchelaar HJ, Zwaveling J. Real-world evaluation of supportive care using an electronic health record text-mining tool: G-CSF use in breast cancer patients. Support Care Cancer 2022; 30:9181-9189. [PMID: 36044088 PMCID: PMC9633501 DOI: 10.1007/s00520-022-07343-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 08/24/2022] [Indexed: 01/05/2023]
Abstract
PURPOSE Chemotherapy-induced febrile neutropenia (FN) is a life-threatening and chemotherapy dose-limiting adverse event. FN can be prevented with granulocyte-colony stimulating factors (G-CSFs). Guidelines recommend primary G-CSF use for patients receiving either high (> 20%) FN risk (HR) chemotherapy, or intermediate (10-20%) FN risk (IR) chemotherapy if the overall risk with additional patient-related risk factors exceeds 20%. In this study, we applied an EHR text-mining tool for real-world G-CSF treatment evaluation in breast cancer patients. METHODS Breast cancer patients receiving IR or HR chemotherapy treatments between January 2015 and February 2021 at LUMC, the Netherlands, were included. We retrospectively collected data from the EHR with a text-mining tool and assessed G-CSF use, risk factors, and the incidence of FN and neutropenia (grades 3-4). RESULTS A total of 190 female patients were included, who received 77 HR and 113 IR treatments. In 88.3% of the HR regimens, G-CSF was administered; 7.3% of these patients developed FN vs. 33.3% without G-CSF. Although most IR regimen patients had ≥ 2 risk factors, only 4% received G-CSF, none of whom developed neutropenia. However, without G-CSF, 11.9% developed FN and 31.2% severe neutropenia. CONCLUSIONS Our text-mining study shows high G-CSF use among HR regimen patients, and low use among IR regimen patients, although most had ≥ 2 risk factors. Therefore, current practice is not completely in accordance with the guidelines. This shows the need for increased awareness and clarity regarding risk factors. Also, text-mining can effectively be implemented for the evaluation of patient care.
Affiliation(s)
- Sylvia A. van Laar
- Department of Clinical Pharmacy and Toxicology, Leiden University Medical Center, Albinusdreef 2, 2333ZA Leiden, The Netherlands
- Kim B. Gombert-Handoko
- Department of Clinical Pharmacy and Toxicology, Leiden University Medical Center, Albinusdreef 2, 2333ZA Leiden, The Netherlands
- Sophie Wassenaar
- Department of Clinical Pharmacy and Toxicology, Leiden University Medical Center, Albinusdreef 2, 2333ZA Leiden, The Netherlands
- Judith R. Kroep
- Department of Medical Oncology, Leiden University Medical Center, Leiden, The Netherlands
- Henk-Jan Guchelaar
- Department of Clinical Pharmacy and Toxicology, Leiden University Medical Center, Albinusdreef 2, 2333ZA Leiden, The Netherlands
- Juliette Zwaveling
- Department of Clinical Pharmacy and Toxicology, Leiden University Medical Center, Albinusdreef 2, 2333ZA Leiden, The Netherlands
8
Park S, Kim-Knauss Y, Sim JA. Leveraging Text Mining Approach to Identify What People Want to Know About Mental Disorders From Online Inquiry Platforms. Front Public Health 2021; 9:759802. [PMID: 34712643 PMCID: PMC8546111 DOI: 10.3389/fpubh.2021.759802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Accepted: 09/13/2021] [Indexed: 11/13/2022] Open
Abstract
Online inquiry platforms, where a person can anonymously ask questions, have become an important information source for those who are concerned about the social stigma and discrimination that follow mental disorders. Therefore, examining what people inquire about regarding mental disorders would be useful when designing educational programs for communities. The present study aimed to examine the contents of the queries regarding mental disorders that were posted on online inquiry platforms. A total of 4,714 relevant queries from the two major online inquiry platforms were collected. We computed word frequencies and centralities and applied latent Dirichlet allocation (LDA) topic modeling. Words such as "symptom," "hospital," and "treatment" ranked as the most frequently used, and the word "my" had the highest centrality. LDA identified four latent topics: (1) the understanding of general symptoms, (2) a disability grading system and welfare entitlement, (3) stressful life events, and (4) social adaptation with mental disorders. People are interested in practical information concerning mental disorders, such as social benefits, social adaptation, and more general information about symptoms and treatments. Our findings suggest that instructions encompassing different scopes of information are needed when developing educational programs.
Affiliation(s)
- Soowon Park
- Department of Education, Kyonggi University, Suwon, South Korea
- Yaeji Kim-Knauss
- Faculty of Humanities, Social Sciences, and Theology, University of Erlangen-Nuremberg, Nuremberg, Germany
- Jin-ah Sim
- School of AI Convergence, Hallym University, Chuncheon, South Korea
9
Du YQ, Zhu GD, Cao J, Huang JY. Research supporting malaria control and elimination in China over four decades: a bibliometric analysis of academic articles published in Chinese from 1980 to 2019. Malar J 2021; 20:158. [PMID: 33743712 PMCID: PMC7980574 DOI: 10.1186/s12936-021-03698-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/06/2020] [Accepted: 03/12/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND China has accumulated considerable experience in malaria control and elimination over the past decades. Many research papers have been published in Chinese journals. This study intends to describe the development and experience of malaria control and elimination in China by quantitatively analysing relevant research using a bibliometric analysis. METHODS A long-term, multistage bibliometric analysis was performed. Research articles published in Chinese journals from 1980 to 2019 were retrieved from the Wanfang and China National Knowledge Infrastructure (CNKI) databases. Year of publication, journal name and keywords were extracted by the Bibliographic Items Co-occurrence Matrix Builder (BICOMB). The K/A ratio (the frequency of a keyword among the total number of articles within a certain period) was considered an indicator of the popularity of a keyword in different decades. VOSviewer software was used to construct keyword co-occurrence network maps. RESULTS A total of 16,290 articles were included. The overall number of articles continually increased. However, the number of articles published in the last three years decreased. There were two kinds of keyword frequency trends among the different decades. The K/A ratio of the keyword 'Plasmodium falciparum' decreased (17.05 in the 1980s, 13.04 in the 1990s, 9.86 in the 2000s, 5.28 in the 2010s), but those of 'imported case' and 'surveillance' increased. Drug resistance has been a continuous concern. The keyword co-occurrence network maps showed that the themes of malaria research diversified, and the degree of multidisciplinary cooperation gradually increased. CONCLUSIONS This bibliometric analysis revealed the trends in malaria research in China over the past 40 years. The results suggest emphasis on investigation, multidisciplinary participation and drug resistance by researchers and policymakers in malaria epidemic areas. The results also provide domestic experts with qualitative evidence of China's experience in malaria control and elimination.
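The K/A ratio defined in this abstract (frequency of a keyword among the total number of articles in a period) can be sketched as a per-decade computation. Expressing the ratio as a percentage is an assumption, since the abstract does not state the unit; the function name `ka_ratio_by_decade` and the six toy records below are hypothetical and do not reproduce the study's data.

```python
from collections import defaultdict


def ka_ratio_by_decade(articles, keyword):
    """Percentage of articles per decade whose keyword set contains `keyword`.

    `articles` is an iterable of (publication_year, set_of_keywords) pairs.
    Scaling by 100 is an assumption about how the ratio is reported.
    """
    totals = defaultdict(int)  # articles published in each decade
    hits = defaultdict(int)    # articles in each decade that carry the keyword
    for year, keywords in articles:
        decade = (year // 10) * 10
        totals[decade] += 1
        if keyword in keywords:
            hits[decade] += 1
    return {decade: 100.0 * hits[decade] / totals[decade] for decade in sorted(totals)}


# Toy records invented for illustration only.
toy_articles = [
    (1985, {"Plasmodium falciparum", "drug resistance"}),
    (1988, {"surveillance"}),
    (2012, {"imported case"}),
    (2015, {"Plasmodium falciparum"}),
    (2018, {"surveillance"}),
    (2019, {"imported case"}),
]
ratios = ka_ratio_by_decade(toy_articles, "Plasmodium falciparum")
```

Normalizing by the number of articles per decade is what makes keyword popularity comparable across periods with very different publication volumes.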
Affiliation(s)
- Yan-Qiu Du
- Key Lab of Health Technology Assessment, School of Public Health, National Health Commission, Fudan University, 200433, Shanghai, China
- Global Health Institute, Fudan University, Shanghai, 200433, China
- Guo-Ding Zhu
- Jiangsu Provincial Key Laboratory of Parasite and Vector Control Technology, National Health Commission Key Laboratory of Parasitic Disease Control and Prevention, Jiangsu Institute of Parasitic Diseases, Wuxi, 214064, China
- Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, 211166, China
- Jun Cao
- Key Lab of Health Technology Assessment, School of Public Health, National Health Commission, Fudan University, 200433, Shanghai, China.
- Jiangsu Provincial Key Laboratory of Parasite and Vector Control Technology, National Health Commission Key Laboratory of Parasitic Disease Control and Prevention, Jiangsu Institute of Parasitic Diseases, Wuxi, 214064, China.
- Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, 211166, China.
- Jia-Yan Huang
- Key Lab of Health Technology Assessment, School of Public Health, National Health Commission, Fudan University, 200433, Shanghai, China.
- Global Health Institute, Fudan University, Shanghai, 200433, China.
10
Derington CG, Mueller SR, Glanz JM, Binswanger IA. Identifying naloxone administrations in electronic health record data using a text-mining tool. Subst Abus 2020; 42:806-812. [PMID: 33320803 PMCID: PMC8203755 DOI: 10.1080/08897077.2020.1856288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Abstract
Background: Effective and efficient methods are needed to identify naloxone administrations within electronic health record (EHR) data to conduct overdose surveillance and research. The objective of this study was to develop and validate a text-mining tool to identify naloxone administrations in EHR data. Methods: Clinical notes stored in databases between January 2017 and March 2018 were used to iteratively develop a text-mining tool to identify naloxone administrations. The first iteration of the tool used broad search terms. Then, after reviewing clinical notes of overdose encounters, we developed a list of phrases that described naloxone administrations to inform iteration two. While validating iteration two, additional phrases were found, which were then added to inform the final iteration. The comparator was an administrative code query extracted from the EHR. Medical record review was used to identify true positives. The primary outcome was the positive predictive values (PPV) of the second iteration, final iteration, and administrative code query. Results: Iteration two, the final iteration, and the administrative code had PPVs of 84.3% (95% confidence interval [CI] 78.6-89.0%), 83.8% (95% CI 78.6-88.2%), and 57.1% (95% CI 47.1-66.8%), respectively. Both iterations of the tool had a significantly higher PPV than the administrative code (p < 0.001). Conclusions: A text-mining tool improved the identification of naloxone administrations in EHR data from less than 60% with the administrative code to greater than 80% with both versions of the tool. Text-mining tools can inform the use of more sophisticated informatics methods, which often require significant time, resource, and expertise investment.
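The primary outcome in this abstract, positive predictive value (PPV), is the share of flagged records that chart review confirms as true positives: TP / (TP + FP). The sketch below pairs it with a Wilson score interval; the abstract does not say which interval method the authors used, so that choice is an assumption, and the counts in the example are invented rather than taken from the study.

```python
import math


def ppv_with_wilson_ci(true_positives: int, false_positives: int, z: float = 1.96):
    """Return (PPV, lower bound, upper bound) using a Wilson score interval.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    n = true_positives + false_positives
    if n == 0:
        raise ValueError("at least one flagged record is required")
    p = true_positives / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, center - half_width, center + half_width


# Invented counts: 80 confirmed administrations among 100 flagged notes.
ppv, lower, upper = ppv_with_wilson_ci(80, 20)
```

The Wilson interval stays within [0, 1] and behaves reasonably for proportions near the boundaries, which is why it is a common default for validation metrics of this kind.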
Affiliation(s)
- Catherine G. Derington
  - Department of Population Health Sciences, University of Utah, 295 Chipeta Way, Salt Lake City UT 84112
- Shane R. Mueller
  - Institute for Health Research, Kaiser Permanente Colorado, 2550 S. Parker Road, Suite 200, Aurora CO 80014
- Jason M. Glanz
  - Institute for Health Research, Kaiser Permanente Colorado, 2550 S. Parker Road, Suite 200, Aurora CO 80014
  - Department of Epidemiology, Colorado School of Public Health, 13001 E 17 Place, Mail Stop B-119, Aurora CO 80045
- Ingrid A. Binswanger
  - Institute for Health Research, Kaiser Permanente Colorado, 2550 S. Parker Road, Suite 200, Aurora CO 80014
  - Colorado Permanente Medical Group, 10350 E. Dakota Ave, Denver CO 80247
  - Division of General Internal Medicine, University of Colorado School of Medicine, 13001 E 17 Place, Aurora CO 80045
|
11
|
Lee HJ, Chung YJ, Jang S, Seo DW, Lee HK, Yoon D, Lim D, Lee SH. Genome-wide identification of major genes and genomic prediction using high-density and text-mined gene-based SNP panels in Hanwoo (Korean cattle). PLoS One 2020; 15:e0241848. [PMID: 33264312 PMCID: PMC7710051 DOI: 10.1371/journal.pone.0241848] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/26/2020] [Accepted: 10/21/2020] [Indexed: 11/24/2022] Open
Abstract
It was hypothesized that single-nucleotide polymorphisms (SNPs) extracted from text-mined genes could be more tightly related to the causal variant for each trait and that differential weighting of this SNP panel in the genomic best linear unbiased prediction (GBLUP) model could improve the performance of genomic prediction in cattle. Fitting two genomic relationship matrices (GRMs), one constructed from text-mined SNPs and one from the remaining SNPs of the 777k SNP set (exp_777K), as different random effects showed better accuracy than fitting one GRM (Im_777K) for six traits (e.g. backfat thickness: +0.002, eye muscle area: +0.014, Warner–Bratzler Shear Force of semimembranosus and longissimus dorsi: +0.024 and +0.068, intramuscular fat content of semimembranosus and longissimus dorsi: +0.008 and +0.018). These results suggest that incorporating text mining into genomic prediction is valuable, and further studies using text mining can be expected to yield significant results.
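The contrast between fitting one GRM and fitting two GRMs as separate random effects can be written out as a pair of mixed models. The notation below is an assumed, textbook-style formulation for illustration only; the abstract does not give the authors' equations, and the symbols ($\mathbf{G}_{\mathrm{tm}}$ for the GRM built from text-mined SNPs, $\mathbf{G}_{\mathrm{rest}}$ for the GRM built from the remaining SNPs) are introduced here.

```latex
% One-GRM model: a single genomic random effect
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{g} + \mathbf{e},
\qquad \mathbf{g} \sim N(\mathbf{0}, \mathbf{G}\,\sigma^2_g),
\qquad \mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\,\sigma^2_e)

% Two-GRM model: text-mined and remaining SNPs get separate variance components
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{g}_{\mathrm{tm}} + \mathbf{g}_{\mathrm{rest}} + \mathbf{e},
\qquad \mathbf{g}_{\mathrm{tm}} \sim N(\mathbf{0}, \mathbf{G}_{\mathrm{tm}}\,\sigma^2_{\mathrm{tm}}),
\qquad \mathbf{g}_{\mathrm{rest}} \sim N(\mathbf{0}, \mathbf{G}_{\mathrm{rest}}\,\sigma^2_{\mathrm{rest}})
```

Under this formulation, the separate variance components $\sigma^2_{\mathrm{tm}}$ and $\sigma^2_{\mathrm{rest}}$ are what allow the text-mined panel to be weighted differently from the rest of the SNP set.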
Affiliation(s)
- Hyo Jun Lee
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon, Korea
- Yoon Ji Chung
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon, Korea
- Sungbong Jang
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, United States of America
- Dong Won Seo
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon, Korea
- Hak Kyo Lee
  - Department of Animal Biotechnology, Chonbuk National University, Jeonju, Korea
- Duhak Yoon
  - Department of Animal Science, Kyungpook National University, Sangju, Korea
- Dajeong Lim
  - Animal Genome & Bioinformatics, National Institute of Animal Science, Wanju, Korea
  - * E-mail: (DL); (SHL)
- Seung Hwan Lee
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon, Korea
  - * E-mail: (DL); (SHL)
|
12
|
Application of Text Mining to Nursing Texts. Comput Inform Nurs 2020; 38:475-482. [DOI: 10.1097/cin.0000000000000681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
13
|
Gonzalez-Garcia J, Telleria-Orriols C, Estupinan-Romero F, Bernal-Delgado E. Construction of Empirical Care Pathways Process Models From Multiple Real-World Datasets. IEEE J Biomed Health Inform 2020; 24:2671-2680. [PMID: 32092019 DOI: 10.1109/jbhi.2020.2971146] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
Abstract
Care pathways (CPWs) are "multidisciplinary care plans that detail essential care steps for patients with specific clinical problems." While the impact of CPWs on health or cost outcomes has been widely studied, in-depth analysis of the real-world implementation of CPWs remains underexplored. The present work describes how to apply an existing process mining methodology to construct empirical CPW process models. These process models are a unique source of information for health services research, for example to evaluate their conformance with the theoretical CPW described in clinical guidelines or to evaluate the impact of the process on health outcomes. To this end, this work relies on the design and implementation of a solution that (a) synthesizes the expert knowledge on how health care is delivered within and across providers as an activity log and (b) constructs the CPW process model from that activity log using process mining techniques. Unlike previous research based on ad hoc data captures, the current approach is built on the linkage of various heterogeneous real-world data (RWD) sets that share a minimum semantic linkage. RWD, defined as the secondary use of routinely collected data as opposed to ad hoc data extractions, is a unique source of information for CPW analysis because of its coverage of caregiving activities and its wide availability. The viability of the solution is demonstrated by constructing the CPW process model of Code Stroke (Acute Stroke CPW) in the Aragon region (Spain).
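As a rough illustration of step (b), one of the simplest process mining constructions is a directly-follows graph derived from an activity log. The Python sketch below assumes a log of (case_id, timestamp, activity) tuples; the tuple layout, the activity names, and the function name are hypothetical, and the abstract does not state which process mining algorithm the authors used.

```python
from collections import Counter, defaultdict


def directly_follows(log):
    """Count how often activity a is directly followed by activity b within a case.

    `log` is an iterable of (case_id, timestamp, activity) tuples (assumed layout).
    """
    traces = defaultdict(list)
    for case_id, timestamp, activity in log:
        traces[case_id].append((timestamp, activity))

    counts = Counter()
    for events in traces.values():
        events.sort(key=lambda e: e[0])  # order each case's events by time
        for (_, a), (_, b) in zip(events, events[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical activity log for two patients; not taken from the study.
EXAMPLE_LOG = [
    ("p1", 1, "emergency call"),
    ("p1", 2, "ambulance transfer"),
    ("p1", 3, "CT scan"),
    ("p2", 1, "emergency call"),
    ("p2", 2, "CT scan"),
]
```

The resulting edge counts can be read as an empirical pathway model and compared against the steps a guideline prescribes.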
|
14
|
Paranjpe MD, Taubes A, Sirota M. Insights into Computational Drug Repurposing for Neurodegenerative Disease. Trends Pharmacol Sci 2019; 40:565-576. [PMID: 31326236 PMCID: PMC6771436 DOI: 10.1016/j.tips.2019.06.003] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 02/22/2019] [Revised: 04/26/2019] [Accepted: 06/12/2019] [Indexed: 12/14/2022]
Abstract
Computational drug repurposing can markedly reduce drug development time and cost in an era where both are prohibitively high. Several examples of successfully repurposed drugs exist in fields such as oncology, diabetes, leprosy, and inflammatory bowel disease, among others. However, computational drug repurposing in neurodegenerative disease has presented several unique challenges stemming from the lack of validation methods and the difficulty of studying heterogeneous diseases of aging. Here, we examine existing approaches to computational drug repurposing, including molecular, clinical, and biophysical methods, and propose data sources and methods to advance computational drug repurposing in neurodegenerative disease using Alzheimer's disease as an example.
Affiliation(s)
- Manish D Paranjpe
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, CA 94158, USA
- Alice Taubes
  - Gladstone Institutes, San Francisco, CA 94158, USA
- Marina Sirota
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, CA 94158, USA; Gladstone Institutes, San Francisco, CA 94158, USA
|
15
|
Delespierre T, Josseran L. Issues in Building a Nursing Home Syndromic Surveillance System with Textmining: Longitudinal Observational Study. JMIR Public Health Surveill 2018; 4:e69. [PMID: 30545816 PMCID: PMC6315244 DOI: 10.2196/publichealth.9022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2017] [Revised: 01/23/2018] [Accepted: 07/23/2018] [Indexed: 11/17/2022] Open
Abstract
Background: New nursing home (NH) data warehouses fed from residents’ medical records allow the health of the elderly population to be monitored on a daily basis. Elsewhere, syndromic surveillance has already shown that professional data can be used for public health (PH) surveillance, but not during a long-term follow-up of the same cohort. Objective: This study aimed to build and assess a national ecological NH PH surveillance system (SS). Methods: Using a national network of 126 NHs, we built a residents’ cohort, extracted medical and personal data from their electronic health records, and transmitted them through the internet to a national server almost in real time. After recording sociodemographic, autonomic, and syndromic information, a set of 26 syndromes was defined using pattern matching with the Structured Query Language (SQL) LIKE operator and a Delphi-like technique, between November 2010 and June 2016. We used the early aberration reporting system (EARS) and Bayes surveillance algorithms of the R surveillance package (Höhle) to assess our influenza and acute gastroenteritis (AGE) syndromic data against the Sentinelles network data, the French gold standard for epidemics, following Centers for Disease Control and Prevention surveillance system assessment guidelines. Results: By extracting all residents’ sociodemographic data, a cohort of 41,061 senior citizens was built. The EARS_C3 algorithm on NH influenza and AGE syndromic data gave sensitivities of 0.482 and 0.539 and specificities of 0.844 and 0.952, respectively, over a 6-year period, forecasting the last influenza outbreak by catching early flu signals. In addition, assessment of influenza and AGE syndromic data quality showed precisions of 0.98 and 0.96 during the last season’s epidemic peak weeks (weeks 03-2017 and 01-2017) and precisions of 0.95 and 0.92 during the last summer’s epidemic low (week 33-2016). Conclusions: This study confirmed that using syndromic information gives a good opportunity to develop a genuine French national PH SS dedicated to senior citizens. Access to senior citizens’ free-text validated health data on influenza and AGE responds to a PH issue for the surveillance of this fragile population. This database will also make possible new ecological research on other subjects that will improve prevention, care, and rapid response when facing health threats.
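The syndrome definition step, pattern matching on free text with the SQL LIKE operator, can be sketched in Python with the standard-library `sqlite3` module. The use of SQLite, the table layout, the function name, and the patterns are all assumptions for illustration; the abstract does not name the database engine or list the study's 26 syndrome definitions.

```python
import sqlite3

# Hypothetical syndrome definitions (syndrome name -> LIKE patterns); not the study's.
SYNDROME_PATTERNS = {
    "influenza": ["%grippe%", "%syndrome grippal%"],
    "acute gastroenteritis": ["%gastro-ent%", "%diarrh%"],
}


def tag_notes(notes):
    """Return {note_id: sorted list of syndromes} using SQL LIKE matching.

    `notes` is a list of (id, body) tuples loaded into an in-memory SQLite table.
    SQLite's LIKE is case-insensitive for ASCII characters by default.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO notes (id, body) VALUES (?, ?)", notes)

    tagged = {}
    for syndrome, patterns in SYNDROME_PATTERNS.items():
        where = " OR ".join("body LIKE ?" for _ in patterns)
        rows = conn.execute(f"SELECT id FROM notes WHERE {where}", patterns).fetchall()
        for (note_id,) in rows:
            tagged.setdefault(note_id, []).append(syndrome)
    conn.close()
    return {note_id: sorted(names) for note_id, names in tagged.items()}
```

Daily counts of tagged notes per syndrome would then form the time series that an aberration detection algorithm such as EARS C3 operates on.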
Affiliation(s)
- Tiba Delespierre
  - Equipe de recherche (HANDIReSP), UFR des Sciences de la Santé Simone Veil, Université de Versailles Saint-Quentin-en-Yvelines et Université Paris-Saclay, Montigny-le-Bretonneux, France
- Loic Josseran
  - Equipe de recherche (HANDIReSP), UFR des Sciences de la Santé Simone Veil, Université de Versailles Saint-Quentin-en-Yvelines et Université Paris-Saclay, Montigny-le-Bretonneux, France
|