1
Chan JC, Mbanya JC, Chantelot JM, Shestakova M, Ramachandran A, Ilkova H, Deplante L, Rollot M, Melas-Melt L, Gagliardino JJ, Aschner P. Patient-reported outcomes and treatment adherence in type 2 diabetes using natural language processing: Wave 8 of the Observational International Diabetes Management Practices Study. J Diabetes Investig 2024. [PMID: 38840439 DOI: 10.1111/jdi.14228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 04/17/2024] [Accepted: 04/20/2024] [Indexed: 06/07/2024] Open
Abstract
AIMS/INTRODUCTION We analyzed patient-reported outcomes of people with type 2 diabetes to better understand perceptions and experiences contributing to treatment adherence. MATERIALS AND METHODS In the ongoing International Diabetes Management Practices Study, we collected patient-reported outcomes data from structured questionnaires (chronic treatment acceptance questionnaire and Diabetes Self-Management Questionnaire) and free-text answers to open-ended questions to assess perceptions of treatment value and side-effects, as well as barriers to, and enablers for, adherence and self-management. Free-text answers were analyzed by natural language processing. RESULTS In 2018-2020, we recruited 2,475 patients with type 2 diabetes (43.3% insulin-treated, glycated hemoglobin (HbA1c) 8.0 ± 1.8%; 30.9% with HbA1c <7%) from 13 countries across Africa, the Middle East, Europe, Latin America and Asia. Mean ± standard deviation scores of chronic treatment acceptance questionnaire (acceptance of medication, rated out of 100) and Diabetes Self-Management Questionnaire (self-management, rated out of 10) were 87.8 ± 24.5 and 3.3 ± 0.9, respectively. Based on free-text analysis and coded responses, one in three patients reported treatment non-adherence. Overall, although most patients accepted treatment values and side-effects, self-management was suboptimal. Treatment duration, regimen complexity and disruption of daily routines were major barriers to adherence, whereas habit formation was a key enabler. Treatment-adherent patients were older (60 ± 11.6 vs 55 ± 11.7 years, P < 0.001), and more likely to have longer disease duration (12 ± 8.6 vs 10 ± 7.7 years, P < 0.001), exposure to diabetes education (73.1% vs 67.8%, P < 0.05), lower HbA1c (7.9 ± 1.8% vs 8.3 ± 1.9%, P < 0.001) and attainment of HbA1c <7% (29.7% vs 23.3%, P < 0.01). CONCLUSIONS Patient perceptions/experiences influence treatment adherence and self-management. 
Patient-centered education and support programs that consider patient-reported outcomes aimed at promoting empowerment and developing new routines might improve glycemic control.
Affiliation(s)
- Juliana Cn Chan
- Department of Medicine and Therapeutics, Hong Kong Institute of Diabetes and Obesity and Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Prince of Wales Hospital, Hong Kong SAR, China
- Jean Claude Mbanya
- Doctoral School of Life Sciences, Health and Environment, and Department of Medicine and Specialties, Faculty of Medicine and Biomedical Sciences, University of Yaoundé I, Yaoundé, Cameroon
- Ambady Ramachandran
- India Diabetes Research Foundation, Dr. A. Ramachandran's Diabetes Hospitals, Chennai, India
- Pablo Aschner
- Javeriana University School of Medicine and San Ignacio University Hospital, Bogotá, Colombia
2
Petit-Jean T, Gérardin C, Berthelot E, Chatellier G, Frank M, Tannier X, Kempf E, Bey R. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions. J Am Med Inform Assoc 2024; 31:1280-1290. [PMID: 38573195 PMCID: PMC11105139 DOI: 10.1093/jamia/ocae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 02/28/2024] [Accepted: 03/13/2024] [Indexed: 04/05/2024] Open
Abstract
OBJECTIVE To develop and validate a natural language processing (NLP) pipeline that detects 18 conditions in French clinical notes, including 16 comorbidities of the Charlson index, while exploring a collaborative and privacy-enhancing workflow. MATERIALS AND METHODS The detection pipeline relied on rule-based and machine learning algorithms for named entity recognition and entity qualification, respectively. We used a large language model pre-trained on millions of clinical notes along with annotated clinical notes in the context of 3 cohort studies related to oncology, cardiology, and rheumatology. The overall workflow was conceived to foster collaboration between studies while respecting the privacy constraints of the data warehouse. We estimated the added values of the advanced technologies and of the collaborative setting. RESULTS The pipeline reached macro-averaged F1-score, positive predictive value, sensitivity, and specificity of 95.7 (95%CI 94.5-96.3), 95.4 (95%CI 94.0-96.3), 96.0 (95%CI 94.0-96.7), and 99.2 (95%CI 99.0-99.4), respectively. F1-scores were superior to those observed using alternative technologies or non-collaborative settings. The models were shared through a secured registry. CONCLUSIONS We demonstrated that a community of investigators working on a common clinical data warehouse could efficiently and securely collaborate to develop, validate and use sensitive artificial intelligence models. In particular, we provided an efficient and robust NLP pipeline that detects conditions mentioned in clinical notes.
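As an illustration of the macro-averaged F1-score reported in the abstract above, the following is a minimal sketch of how such a score is typically computed: one F1 value per condition, then an unweighted mean. The function names and the example counts are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 for one condition from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of the per-condition F1 scores."""
    scores: List[float] = [f1(*c) for c in counts.values()]
    return sum(scores) / len(scores)


# Hypothetical (tp, fp, fn) counts for three conditions; not data from the study.
example_counts = {
    "diabetes": (90, 5, 5),
    "heart_failure": (40, 10, 10),
    "dementia": (8, 2, 2),
}
macro = macro_f1(example_counts)
print(f"macro-averaged F1 = {macro:.3f}")
```

Because every condition carries the same weight, a rare condition with few annotated mentions influences the macro average as much as a common one.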
Affiliation(s)
- Thomas Petit-Jean
- Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, 75012, France
- Christel Gérardin
- Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, 75012, France
- Institut Pierre-Louis d’Epidémiologie et de Santé Publique, INSERM, Sorbonne Université, Paris, 75012, France
- Emmanuelle Berthelot
- Department of Cardiology, Hôpital Bicêtre, Assistance Publique-Hôpitaux de Paris, Le Kremlin Bicêtre, 94270, France
- Gilles Chatellier
- Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, 75012, France
- Department of Medical Informatics, Assistance Publique-Hôpitaux de Paris, Centre-Université de Paris (APHP-CUP), Université de Paris, Paris, 75015, France
- Marie Frank
- Department of Medical Informatics, Hôpitaux Universitaires Paris-Saclay, Assistance Publique-Hôpitaux de Paris, Le Kremlin-Bicêtre, 94270, France
- Xavier Tannier
- Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé (LIMICS), INSERM, Université Sorbonne Paris Nord, Sorbonne Université, Paris, 75005, France
- Emmanuelle Kempf
- Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé (LIMICS), INSERM, Université Sorbonne Paris Nord, Sorbonne Université, Paris, 75005, France
- Department of Medical Oncology, Henri Mondor and Albert Chenevier Teaching Hospital, Assistance Publique-Hôpitaux de Paris, Créteil, 94000, France
- Romain Bey
- Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, 75012, France
3
Lee S, Martin EA, Pan J, Eastwood CA, Southern DA, Campbell DJT, Shaheen AA, Quan H, Butalia S. Exploring the reliability of inpatient EMR algorithms for diabetes identification. BMJ Health Care Inform 2023; 30:e100894. [PMID: 38123357 DOI: 10.1136/bmjhci-2023-100894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 12/04/2023] [Indexed: 12/23/2023] Open
Abstract
INTRODUCTION Accurate identification of medical conditions within a real-time inpatient setting is crucial for health systems. Current inpatient comorbidity algorithms rely on integrating various sources of administrative data, but at times, there is a considerable lag in obtaining and linking these data. Our study objective was to develop electronic medical records (EMR) data-based inpatient diabetes phenotyping algorithms. MATERIALS AND METHODS A chart review on 3040 individuals was completed, and 583 had diabetes. We linked EMR data on these individuals to the International Classification of Disease (ICD) administrative databases. The following EMR-data-based diabetes algorithms were developed: (1) laboratory data, (2) medication data, (3) laboratory and medications data, (4) diabetes concept keywords and (5) diabetes free-text algorithm. Combined algorithms used OR statements between the above algorithms. Algorithm performances were measured using chart review as a gold standard. We determined the best-performing algorithm as the one that showed high sensitivity (SN) and positive predictive value (PPV). RESULTS The algorithms tested generally performed well: ICD-coded data, SN 0.84, specificity (SP) 0.98, PPV 0.93 and negative predictive value (NPV) 0.96; medication and laboratory algorithm, SN 0.90, SP 0.95, PPV 0.80 and NPV 0.97; all document types algorithm, SN 0.95, SP 0.98, PPV 0.94 and NPV 0.99. DISCUSSION A free-text data-based diabetes algorithm can yield comparable or superior performance to a commonly used ICD-coded algorithm and could supplement existing methods. These types of inpatient EMR-based algorithms for case identification may become a key method for timely resource planning and care delivery.
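The evaluation described in the abstract above (component algorithms combined with OR statements, then scored against chart review) can be sketched roughly as follows. This is an illustrative sketch only: the function names and the toy flags are hypothetical, and the study's actual algorithm definitions are not reproduced here.

```python
from typing import Dict, List


def or_combine(*flags: List[bool]) -> List[bool]:
    """Flag a patient if any of the component algorithms flags them."""
    return [any(row) for row in zip(*flags)]


def _ratio(num: int, den: int) -> float:
    """Safe division; returns NaN when the denominator is zero."""
    return num / den if den else float("nan")


def diagnostic_metrics(predicted: List[bool], reference: List[bool]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV against a reference standard."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    return {
        "SN": _ratio(tp, tp + fn),
        "SP": _ratio(tn, tn + fp),
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
    }


# Hypothetical flags for eight patients; chart review is the reference standard.
chart_review = [True, True, True, False, False, False, False, False]
lab_flag = [True, False, False, False, False, False, False, True]
med_flag = [False, True, False, False, False, False, False, False]

combined = or_combine(lab_flag, med_flag)
metrics = diagnostic_metrics(combined, chart_review)
print(metrics)
```

An OR combination can only add flagged patients, so it tends to raise sensitivity at the possible cost of PPV, which is the trade-off visible in the reported medication and laboratory algorithm.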
Affiliation(s)
- Seungwon Lee
- Community Health Sciences, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- Provincial Research Data Services, Alberta Health Services, Edmonton, Alberta, Canada
- Elliot A Martin
- Community Health Sciences, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- Provincial Research Data Services, Alberta Health Services, Edmonton, Alberta, Canada
- Jie Pan
- Community Health Sciences, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- Centre for Health Informatics, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- Cathy A Eastwood
- Community Health Sciences, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- Centre for Health Informatics, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- Danielle A Southern
- Centre for Health Informatics, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- David J T Campbell
- Community Health Sciences, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- Department of Medicine, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- Abdel Aziz Shaheen
- Community Health Sciences, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- Department of Medicine, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- Hude Quan
- Community Health Sciences, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- Centre for Health Informatics, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- Sonia Butalia
- Community Health Sciences, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- Department of Medicine, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
4
Lan Z, Turchin A. Impact of possible errors in natural language processing-derived data on downstream epidemiologic analysis. JAMIA Open 2023; 6:ooad111. [PMID: 38152447 PMCID: PMC10752385 DOI: 10.1093/jamiaopen/ooad111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 12/14/2023] [Accepted: 12/19/2023] [Indexed: 12/29/2023] Open
Abstract
Objective To assess the impact of potential errors in natural language processing (NLP) on the results of epidemiologic studies. Materials and Methods We utilized data from three outcomes research studies where the primary predictor variable was generated using NLP. For each of these studies, Monte Carlo simulations were applied to generate datasets simulating potential errors in NLP-derived variables. We subsequently fit the original regression models to these partially simulated datasets and compared the distribution of coefficient estimates to the original study results. Results Among the four models evaluated, the mean change in the point estimate of the relationship between the predictor variable and the outcome ranged from -21.9% to 4.12%. In three of the four models, significance of this relationship was not eliminated in any of the 500 simulations, and in one model it was eliminated in 12% of simulations. Mean changes in the estimates for confounder variables ranged from 0.27% to 2.27% and significance of the relationship was eliminated between 0% and 9.25% of the time. No variable underwent a shift in the direction of its interpretation. Discussion Impact of simulated NLP errors on the results of epidemiologic studies was modest, with only small changes in effect estimates and no changes in the interpretation of the findings (direction and significance of association with the outcome) for either the NLP-generated variables or other variables in the models. Conclusion NLP errors are unlikely to affect the results of studies that use NLP as the source of data.
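The general idea in the abstract above (inject simulated NLP errors into a predictor, re-estimate the association, and compare with the original estimate) can be sketched as follows. This is a deliberately simplified stand-in, not the authors' method: it assumes a single binary NLP-derived predictor, uses a log odds ratio from a 2x2 table in place of the original regression models, and flips values with one symmetric error rate. All names, the data, and the 5% error rate are hypothetical.

```python
import math
import random
from typing import List


def log_odds_ratio(x: List[int], y: List[int]) -> float:
    """Log odds ratio of outcome y by binary predictor x (0.5 continuity correction)."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1) + 0.5
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0) + 0.5
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1) + 0.5
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0) + 0.5
    return math.log((a * d) / (b * c))


def simulate_nlp_errors(
    x: List[int], y: List[int], error_rate: float, n_sims: int = 500, seed: int = 0
) -> List[float]:
    """Re-estimate the association after randomly flipping predictor values."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_sims):
        noisy = [1 - xi if rng.random() < error_rate else xi for xi in x]
        estimates.append(log_odds_ratio(noisy, y))
    return estimates


# Hypothetical cohort of 200 patients; not data from the study.
x = [1] * 100 + [0] * 100
y = [1] * 60 + [0] * 40 + [1] * 30 + [0] * 70

original = log_odds_ratio(x, y)
sims = simulate_nlp_errors(x, y, error_rate=0.05)
mean_change_pct = 100 * (sum(sims) / len(sims) - original) / original
print(f"original log OR = {original:.3f}, mean change = {mean_change_pct:.1f}%")
```

With non-differential flipping like this, the estimate is expected to shrink toward the null on average, which matches the direction of most of the changes the abstract reports.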
Affiliation(s)
- Zhou Lan
- Center for Clinical Investigation, Brigham & Women’s Hospital, Boston, MA 02115, United States
- Harvard Medical School, Boston, MA 02115, United States
- Alexander Turchin
- Harvard Medical School, Boston, MA 02115, United States
- Division of Endocrinology, Brigham & Women’s Hospital, Boston, MA 02115, United States
5
Mermin-Bunnell K, Zhu Y, Hornback A, Damhorst G, Walker T, Robichaux C, Mathew L, Jaquemet N, Peters K, Johnson TM, Wang MD, Anderson B. Use of Natural Language Processing of Patient-Initiated Electronic Health Record Messages to Identify Patients With COVID-19 Infection. JAMA Netw Open 2023; 6:e2322299. [PMID: 37418261 PMCID: PMC10329205 DOI: 10.1001/jamanetworkopen.2023.22299] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/06/2023] [Accepted: 05/19/2023] [Indexed: 07/08/2023] Open
Abstract
Importance Natural language processing (NLP) has the potential to enable faster treatment access by reducing clinician response time and improving electronic health record (EHR) efficiency. Objective To develop an NLP model that can accurately classify patient-initiated EHR messages and triage COVID-19 cases to reduce clinician response time and improve access to antiviral treatment. Design, Setting, and Participants This retrospective cohort study assessed development of a novel NLP framework to classify patient-initiated EHR messages and subsequently evaluate the model's accuracy. Included patients sent messages via the EHR patient portal from 5 Atlanta, Georgia, hospitals between March 30 and September 1, 2022. Assessment of the model's accuracy consisted of manual review of message contents to confirm the classification label by a team of physicians, nurses, and medical students, followed by retrospective propensity score-matched clinical outcomes analysis. Exposure Prescription of antiviral treatment for COVID-19. Main Outcomes and Measures The 2 primary outcomes were (1) physician-validated evaluation of the NLP model's message classification accuracy and (2) analysis of the model's potential clinical effect via increased patient access to treatment. The model classified messages into COVID-19-other (pertaining to COVID-19 but not reporting a positive test), COVID-19-positive (reporting a positive at-home COVID-19 test result), and non-COVID-19 (not pertaining to COVID-19). Results Among 10 172 patients whose messages were included in analyses, the mean (SD) age was 58 (17) years; 6509 patients (64.0%) were women and 3663 (36.0%) were men. In terms of race and ethnicity, 2544 patients (25.0%) were African American or Black, 20 (0.2%) were American Indian or Alaska Native, 1508 (14.8%) were Asian, 28 (0.3%) were Native Hawaiian or other Pacific Islander, 5980 (58.8%) were White, 91 (0.9%) were more than 1 race or ethnicity, and 1 (0.01%) chose not to answer. 
The NLP model had high accuracy and sensitivity, with a macro F1 score of 94% and sensitivity of 85% for COVID-19-other, 96% for COVID-19-positive, and 100% for non-COVID-19 messages. Among the 3048 patient-generated messages reporting positive SARS-CoV-2 test results, 2982 (97.8%) were not documented in structured EHR data. Mean (SD) message response time for COVID-19-positive patients who received treatment (364.10 [784.47] minutes) was faster than for those who did not (490.38 [1132.14] minutes; P = .03). Likelihood of antiviral prescription was inversely correlated with message response time (odds ratio, 0.99 [95% CI, 0.98-1.00]; P = .003). Conclusions and Relevance In this cohort study of 2982 COVID-19-positive patients, a novel NLP model classified patient-initiated EHR messages reporting positive COVID-19 test results with high sensitivity. Furthermore, when responses to patient messages occurred faster, patients were more likely to receive antiviral medical prescription within the 5-day treatment window. Although additional analysis on the effect on clinical outcomes is needed, these findings represent a possible use case for integration of NLP algorithms into clinical care.
Affiliation(s)
- Kellen Mermin-Bunnell
- Currently a medical student at Emory University School of Medicine, Atlanta, Georgia
- Yuanda Zhu
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta
- Andrew Hornback
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta
- Gregory Damhorst
- Division of Infectious Diseases, Emory University School of Medicine, Atlanta, Georgia
- Tiffany Walker
- Division of General Internal Medicine, Emory University School of Medicine, Atlanta, Georgia
- Chad Robichaux
- Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, Georgia
- Lejy Mathew
- Division of General Internal Medicine, Emory University School of Medicine, Atlanta, Georgia
- Nour Jaquemet
- Currently a medical student at Emory University School of Medicine, Atlanta, Georgia
- Theodore M. Johnson
- Division of General Internal Medicine, Emory University School of Medicine, Atlanta, Georgia
- Atlanta Veterans Affairs Healthcare System, Decatur, Georgia
- May Dongmei Wang
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia
- Blake Anderson
- Division of General Internal Medicine, Emory University School of Medicine, Atlanta, Georgia
- Atlanta Veterans Affairs Healthcare System, Decatur, Georgia
6
Krebs B, Nataraj A, McCabe E, Clark S, Sufiyan Z, Yamamoto SS, Zaïane O, Gross DP. Developing a triage predictive model for access to a spinal surgeon using clinical variables and natural language processing of radiology reports. Eur Spine J 2023. [PMID: 36740609 DOI: 10.1007/s00586-023-07552-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/12/2022] [Revised: 01/17/2023] [Accepted: 01/22/2023] [Indexed: 02/07/2023]
Abstract
PURPOSE To utilize natural language processing (NLP) of MRI reports and various clinical variables to develop a preliminary model predictive of the need for surgery in patients with low back and neck pain. Such a model would be beneficial for informing clinical practice decisions and help reduce the number of unnecessary surgical referrals, streamlining the surgical process. METHODS A historical cohort study was conducted using de-identified data from patients referred to a spine assessment clinic. Various demographic, clinical, and radiological variables were included as potential predictors. Full-text radiology reports of patients' MRI findings were vectorized using NLP before applying machine learning algorithms to develop models predicting who underwent surgery. Outputs from these models were then entered into a logistic regression model with clinical variables to develop a preliminary model predictive of surgical recommendations. RESULTS Of the 398 patients assessed, 71 underwent spine surgery. NLP variables were significant predictors in univariate analysis but did not remain in the final logistic regression model. An outcome of receiving surgery was predicted by a primary symptom of low back and leg pain (adjusted odds ratio 2.81), distal pain indicated by a pain diagram (adjusted odds ratio 2.49) and self-reported difficulties walking (adjusted odds ratio 2.73). CONCLUSION A logistic regression model was created to predict which patients may require spine surgery. Simple clinical variables appeared more predictive than variables created using NLP. However, additional research with more data samples is needed to validate this model and fully evaluate the usefulness of NLP for this task.
Affiliation(s)
- Brandon Krebs
- Faculty of Rehabilitation Medicine, University of Alberta, Edmonton, Canada
- Andrew Nataraj
- Department of Surgery, University of Alberta, Edmonton, Canada
- Erin McCabe
- Faculty of Rehabilitation Medicine, University of Alberta, Edmonton, Canada
- Shannon Clark
- Department of Computing Science, University of Alberta, Edmonton, Canada
- Zahin Sufiyan
- Department of Computing Science, University of Alberta, Edmonton, Canada
- Osmar Zaïane
- Department of Computing Science, University of Alberta, Edmonton, Canada
- Douglas P Gross
- Department of Physical Therapy, University of Alberta, 2-50 Corbett Hall, Edmonton, Alberta, T6G 2G4, Canada.
7
Pethani F, Dunn AG. Natural language processing for clinical notes in dentistry: A systematic review. J Biomed Inform 2023; 138:104282. [PMID: 36623780 DOI: 10.1016/j.jbi.2023.104282] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/29/2022] [Revised: 12/01/2022] [Accepted: 01/04/2023] [Indexed: 01/09/2023]
Abstract
OBJECTIVE To identify and synthesise research on applications of natural language processing (NLP) for information extraction and retrieval from clinical notes in dentistry. MATERIALS AND METHODS A predefined search strategy was applied in EMBASE, CINAHL and Medline. Studies eligible for inclusion were those that described, evaluated, or applied NLP to clinical notes containing either human or simulated patient information. Quality of the study design and reporting was independently assessed based on a set of questions derived from relevant tools including CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS). A narrative synthesis was conducted to present the results. RESULTS Of the 17 included studies, 10 developed and evaluated NLP methods and 7 described applications of NLP-based information retrieval methods in dental records. Studies were published between 2015 and 2021, most were missing key details needed for reproducibility, and there was no consistency in design or reporting. The 10 studies developing or evaluating NLP methods used document classification or entity extraction, and 4 compared NLP methods to non-NLP methods. The quality of reporting on NLP studies in dentistry has modestly improved over time. CONCLUSIONS Study design heterogeneity and incomplete reporting of studies currently limit our ability to synthesise NLP applications in dental records. Standardisation of reporting and improved connections between NLP methods and applied NLP in dentistry may improve how we can make use of clinical notes from dentistry in population health or decision support systems. PROTOCOL REGISTRATION PROSPERO CRD42021227823.
Affiliation(s)
- Farhana Pethani
- Biomedical Informatics and Digital Health, Faculty of Medicine and Health, the University of Sydney, Sydney, Australia
- Adam G Dunn
- Biomedical Informatics and Digital Health, Faculty of Medicine and Health, the University of Sydney, Sydney, Australia.
8
Kwabena AE, Wiafe OB, John BD, Bernard A, Boateng FA. An automated method for developing search strategies for systematic review using Natural Language Processing (NLP). MethodsX 2022; 10:101935. [PMID: 36590320 PMCID: PMC9795520 DOI: 10.1016/j.mex.2022.101935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Accepted: 11/18/2022] [Indexed: 11/24/2022] Open
Abstract
The design and implementation of systematic reviews and meta-analyses are often hampered by high financial costs, significant time commitment, and biases due to researchers' familiarity with studies. We proposed and implemented a fast and standardized method for search term selection using Natural Language Processing (NLP) and co-occurrence networks to identify relevant search terms to reduce biases in conducting systematic reviews and meta-analyses.
- The method was implemented using a Python package dubbed Ananse, which is benchmarked on the search terms strategy for naïve search proposed by Grames et al. (2019) written in "R". Ananse was applied to a case example towards finding search terms to implement a systematic literature review on cumulative effect studies on forest ecosystems.
- The software automatically corrected and classified 100% of the duplicate articles identified by manual deduplication. Ananse was applied to the cumulative effects assessment case study, but it can serve as a general-purpose, open-source software system that can support extensive systematic reviews within a relatively short period with reduced biases.
- Besides generating keywords, Ananse can act as middleware or a data converter for integrating multiple datasets into a database.
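To make the co-occurrence network idea in the abstract above concrete, here is a minimal sketch using only the Python standard library. It does not use or reproduce the Ananse package's API. The function names and keyword lists are hypothetical, and ranking terms by weighted degree (node strength) is an assumption made for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import List


def cooccurrence_counts(docs: List[List[str]]) -> Counter:
    """Count how often each unordered pair of terms appears in the same record."""
    pairs: Counter = Counter()
    for terms in docs:
        # sorted(set(...)) makes each pair canonical and ignores repeated terms
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs


def node_strength(pairs: Counter) -> Counter:
    """Weighted degree of each term: the sum of its co-occurrence edge weights."""
    strength: Counter = Counter()
    for (a, b), weight in pairs.items():
        strength[a] += weight
        strength[b] += weight
    return strength


# Hypothetical keyword lists extracted from three article records.
docs = [
    ["cumulative effects", "forest", "disturbance"],
    ["forest", "disturbance", "harvesting"],
    ["cumulative effects", "forest"],
]

pairs = cooccurrence_counts(docs)
strength = node_strength(pairs)
ranked = [term for term, _ in strength.most_common()]
print(ranked)
```

Terms near the top of such a ranking co-occur with many other candidate terms, so they are natural candidates for a search string, while weakly connected terms can be reviewed or dropped.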
Affiliation(s)
- Antwi Effah Kwabena
- Canadian Forest Service, Great Lakes Forestry Centre, 1219 Queen Street East, Sault Ste. Marie, Ontario, P6A 2E5 (corresponding author)
- Owusu-Banahene Wiafe
- University of Ghana, Department of Computer Engineering, P.O. BOX LG 77, Legon, Accra, Ghana
- Boakye-Danquah John
- Canadian Forest Service, Great Lakes Forestry Centre, 1219 Queen Street East, Sault Ste. Marie, Ontario, P6A 2E5
- Asare Bernard
- University of Ghana, Department of Computer Engineering, P.O. BOX LG 77, Legon, Accra, Ghana
- Frimpong A.F. Boateng
- University of Ghana, Department of Computer Engineering, P.O. BOX LG 77, Legon, Accra, Ghana
9
Wang S, Song F, Qiao Q, Liu Y, Chen J, Ma J. A Comparative Study of Natural Language Processing Algorithms Based on Cities Changing Diabetes Vulnerability Data. Healthcare (Basel) 2022; 10:1119. [PMID: 35742169 PMCID: PMC9223144 DOI: 10.3390/healthcare10061119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 06/08/2022] [Accepted: 06/13/2022] [Indexed: 11/16/2022] Open
Abstract
(1) Background: Poor adherence to management behaviors in Chinese Type 2 diabetes mellitus (T2DM) patients leads to an uncontrolled prognosis of diabetes, which results in significant economic costs for China. It is imperative to quickly locate vulnerability factors in the management behavior of patients with T2DM. (2) Methods: In this study, a thematic analysis of the collected interview materials was conducted to construct the themes of T2DM management vulnerability. We explored the applicability of the pre-trained models based on the evaluation metrics in text classification. (3) Results: We constructed 12 themes of vulnerability related to the health and well-being of people with T2DM in Tianjin. We considered that Bidirectional Encoder Representation from Transformers (BERT) performed better in this Natural Language Processing (NLP) task with a shorter completion time. With the splitting ratio of 6:3:1 and batch size of 64 for BERT, the test accuracy was 97.71%, the completion time was 10 min 24 s, and the macro-F1 score was 0.9752. (4) Conclusions: Our results proved the applicability of NLP techniques in this specific Chinese-language medical environment. We filled the knowledge gap in the application of NLP technologies in diabetes management. Our study provided strong support for using NLP techniques to rapidly locate vulnerability factors in T2DM management.
10
Berman AN, Biery DW, Ginder C, Hulme OL, Marcusa D, Leiva O, Wu WY, Cardin N, Hainer J, Bhatt DL, Di Carli MF, Turchin A, Blankstein R. Natural language processing for the assessment of cardiovascular disease comorbidities: The cardio-Canary comorbidity project. Clin Cardiol 2021; 44:1296-1304. [PMID: 34347314 PMCID: PMC8428009 DOI: 10.1002/clc.23687] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/10/2021] [Accepted: 06/24/2021] [Indexed: 12/02/2022] Open
Abstract
Objective: Accurate ascertainment of comorbidities is paramount in clinical research. While manual adjudication is labor‐intensive and expensive, the adoption of electronic health records enables computational analysis of free‐text documentation using natural language processing (NLP) tools. Hypothesis: We sought to develop highly accurate NLP modules to assess for the presence of five key cardiovascular comorbidities in a large electronic health record system. Methods: One‐thousand clinical notes were randomly selected from a cardiovascular registry at Mass General Brigham. Trained physicians manually adjudicated these notes for the following five diagnostic comorbidities: hypertension, dyslipidemia, diabetes, coronary artery disease, and stroke/transient ischemic attack. Using the open‐source Canary NLP system, five separate NLP modules were designed based on 800 “training‐set” notes and validated on 200 “test‐set” notes. Results: Across the five NLP modules, the sentence‐level and note‐level sensitivity, specificity, and positive predictive value was always greater than 85% and was most often greater than 90%. Accuracy tended to be highest for conditions with greater diagnostic clarity (e.g. diabetes and hypertension) and slightly lower for conditions whose greater diagnostic challenges (e.g. myocardial infarction and embolic stroke) may lead to less definitive documentation. Conclusion: We designed five open‐source and highly accurate NLP modules that can be used to assess for the presence of important cardiovascular comorbidities in free‐text health records. These modules have been placed in the public domain and can be used for clinical research, trial recruitment and population management at any institution as well as serve as the basis for further development of cardiovascular NLP tools.
Affiliation(s)
- Adam N Berman
- Cardiovascular Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- David W Biery
- Cardiovascular Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Curtis Ginder
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Olivia L Hulme
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Daniel Marcusa
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Orly Leiva
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Wanda Y Wu
- Cardiovascular Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Nicholas Cardin
- Division of Endocrinology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Jon Hainer
- Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Deepak L Bhatt
- Cardiovascular Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Marcelo F Di Carli
- Cardiovascular Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA; Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Alexander Turchin
- Division of Endocrinology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Ron Blankstein
- Cardiovascular Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA; Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA