1
Guo Q, Fu B, Tian Y, Xu S, Meng X. Recent progress in artificial intelligence and machine learning for novel diabetes mellitus medications development. Curr Med Res Opin 2024; 40:1483-1493. [PMID: 39083361 DOI: 10.1080/03007995.2024.2387187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Accepted: 07/29/2024] [Indexed: 08/02/2024]
Abstract
Diabetes mellitus, stemming from either insulin resistance or inadequate insulin secretion, is a complex disease that results in prolonged hyperglycemia and severe complications. Patients face kidney disease, vision impairment, cardiovascular disorders, and susceptibility to infections, which cause significant physical suffering and impose substantial socio-economic burdens. The condition has become an increasingly severe health crisis, and there is an urgent need for new treatments with improved efficacy and fewer adverse effects. However, novel drug development is costly and time-consuming, and candidates often show side effects or suboptimal efficacy, making it a major challenge. Artificial Intelligence (AI) and Machine Learning (ML) have revolutionized drug development across its lifecycle, spanning drug discovery, preclinical studies, clinical trials, and post-market surveillance. These technologies have significantly accelerated the identification of promising therapeutic candidates, optimized trial designs, and enhanced post-approval safety monitoring. Recent advances in AI, including data augmentation, interpretable AI, and integration of AI with traditional experimental methods, offer promising strategies for overcoming the challenges inherent in AI-based drug discovery. Despite these advancements, there is a notable lack of comprehensive reviews detailing AI and ML applications across the entire process of developing medications for diabetes mellitus. This review aims to fill this gap by evaluating the impact and potential of AI and ML technologies at various stages of diabetes mellitus drug development. It does so by synthesizing current research findings and technological advances, with the aim of supporting effective control of diabetes mellitus and mitigating its far-reaching social and economic impacts. The integration of AI and ML promises to revolutionize diabetes mellitus treatment strategies, offering hope for improved patient outcomes and reduced healthcare burdens worldwide.
Affiliation(s)
- Qi Guo
- School of Pharmacy, Heilongjiang University of Chinese Medicine, Harbin, P. R. China
- Bo Fu
- School of Pharmacy, Heilongjiang University of Chinese Medicine, Harbin, P. R. China
- Yuan Tian
- School of Pharmacy, Heilongjiang University of Chinese Medicine, Harbin, P. R. China
- Shujun Xu
- School of Pharmacy, Heilongjiang University of Chinese Medicine, Harbin, P. R. China
- Xin Meng
- School of Pharmacy, Heilongjiang University of Chinese Medicine, Harbin, P. R. China
2
Petit-Jean T, Gérardin C, Berthelot E, Chatellier G, Frank M, Tannier X, Kempf E, Bey R. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions. J Am Med Inform Assoc 2024; 31:1280-1290. [PMID: 38573195 PMCID: PMC11105139 DOI: 10.1093/jamia/ocae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 02/28/2024] [Accepted: 03/13/2024] [Indexed: 04/05/2024]
Abstract
OBJECTIVE To develop and validate a natural language processing (NLP) pipeline that detects 18 conditions in French clinical notes, including 16 comorbidities of the Charlson index, while exploring a collaborative and privacy-enhancing workflow. MATERIALS AND METHODS The detection pipeline relied on rule-based and machine learning algorithms for named entity recognition and entity qualification, respectively. We used a large language model pre-trained on millions of clinical notes along with annotated clinical notes in the context of 3 cohort studies related to oncology, cardiology, and rheumatology. The overall workflow was conceived to foster collaboration between studies while respecting the privacy constraints of the data warehouse. We estimated the added values of the advanced technologies and of the collaborative setting. RESULTS The pipeline reached macro-averaged F1-score, positive predictive value, sensitivity, and specificity of 95.7 (95%CI 94.5-96.3), 95.4 (95%CI 94.0-96.3), 96.0 (95%CI 94.0-96.7), and 99.2 (95%CI 99.0-99.4), respectively. F1-scores were superior to those observed using alternative technologies or non-collaborative settings. The models were shared through a secured registry. CONCLUSIONS We demonstrated that a community of investigators working on a common clinical data warehouse could efficiently and securely collaborate to develop, validate, and use sensitive artificial intelligence models. In particular, we provided an efficient and robust NLP pipeline that detects conditions mentioned in clinical notes.
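For readers unfamiliar with the term, a macro-averaged F1-score can be sketched as follows: F1 is computed separately for each condition and the per-condition values are then averaged with equal weight, so rare conditions count as much as common ones. The condition names and counts below are hypothetical toy values, not data from the study, and the abstract does not describe how the authors implemented the metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Per-condition detection counts (hypothetical toy values)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def f1(c: Counts) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there is nothing to score."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def macro_f1(per_condition: dict[str, Counts]) -> float:
    """Unweighted mean of the per-condition F1 values."""
    return sum(f1(c) for c in per_condition.values()) / len(per_condition)


# Hypothetical example: one frequent and one rare condition.
toy_counts = {
    "frequent_condition": Counts(tp=90, fp=5, fn=5),
    "rare_condition": Counts(tp=8, fp=2, fn=2),
}

if __name__ == "__main__":
    for name, counts in toy_counts.items():
        print(f"{name}: F1 = {f1(counts):.3f}")
    print(f"macro-averaged F1 = {macro_f1(toy_counts):.3f}")
```

In this toy case the rare condition pulls the macro average well below the F1 of the frequent condition, which is the behaviour that makes macro averaging informative when conditions differ in prevalence.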
Affiliation(s)
- Thomas Petit-Jean
- Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, 75012, France
- Christel Gérardin
- Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, 75012, France
- Institut Pierre-Louis d’Epidémiologie et de Santé Publique, INSERM, Sorbonne Université, Paris, 75012, France
- Emmanuelle Berthelot
- Department of Cardiology, Hôpital Bicêtre, Assistance Publique-Hôpitaux de Paris, Le Kremlin Bicêtre, 94270, France
- Gilles Chatellier
- Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, 75012, France
- Department of Medical Informatics, Assistance Publique-Hôpitaux de Paris, Centre-Université de Paris (APHP-CUP), Université de Paris, Paris, 75015, France
- Marie Frank
- Department of Medical Informatics, Hôpitaux Universitaires Paris-Saclay, Assistance Publique-Hôpitaux de Paris, Le Kremlin-Bicêtre, 94270, France
- Xavier Tannier
- Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé (LIMICS), INSERM, Université Sorbonne Paris Nord, Sorbonne Université, Paris, 75005, France
- Emmanuelle Kempf
- Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé (LIMICS), INSERM, Université Sorbonne Paris Nord, Sorbonne Université, Paris, 75005, France
- Department of Medical Oncology, Henri Mondor and Albert Chenevier Teaching Hospital, Assistance Publique-Hôpitaux de Paris, Créteil, 94000, France
- Romain Bey
- Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, 75012, France
3
Towler L, Bondaronek P, Papakonstantinou T, Amlôt R, Chadborn T, Ainsworth B, Yardley L. Applying machine-learning to rapidly analyze large qualitative text datasets to inform the COVID-19 pandemic response: comparing human and machine-assisted topic analysis techniques. Front Public Health 2023; 11:1268223. [PMID: 38026376 PMCID: PMC10644111 DOI: 10.3389/fpubh.2023.1268223] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/27/2023] [Accepted: 10/16/2023] [Indexed: 12/01/2023]
Abstract
Introduction Machine-assisted topic analysis (MATA) uses artificial intelligence methods to help qualitative researchers analyze large datasets. This helps researchers rapidly update healthcare interventions in changing healthcare contexts, such as a pandemic. We examined the potential to support healthcare interventions by comparing MATA with "human-only" thematic analysis techniques on the same dataset (1,472 user responses from a COVID-19 behavioral intervention). Methods In MATA, an unsupervised topic-modeling approach identified latent topics in the text, from which researchers identified broad themes. In human-only codebook analysis, researchers developed an initial codebook based on previous research that was applied to the dataset by the team, who met regularly to discuss and refine the codes. Formal triangulation using a "convergence coding matrix" compared findings between methods, categorizing them as "agreement", "complementary", "dissonant", or "silent". Results Human analysis took much longer than MATA (147.5 vs. 40 h). Both methods identified key themes about what users found helpful and unhelpful. Formal triangulation showed both sets of findings were highly similar. All MATA codes were classified as in agreement or complementary to the human themes. When findings differed slightly, this was due to human researcher interpretations or nuance from human-only analysis. Discussion Results produced by MATA were similar to human-only thematic analysis, with substantial time savings. For simple analyses that do not require an in-depth or subtle understanding of the data, MATA is a useful tool that can help qualitative researchers interpret and analyze large datasets quickly. This approach can support intervention development and implementation, such as enabling rapid optimization during public health emergencies.
Affiliation(s)
- Lauren Towler
- School of Psychology, University of Southampton, Southampton, United Kingdom
- School of Psychological Science, University of Bristol, Bristol, United Kingdom
- Paulina Bondaronek
- Department of Health and Social Care, Office for Health Improvement and Disparities, London, United Kingdom
- Institute for Health Informatics, University College London, London, United Kingdom
- Trisevgeni Papakonstantinou
- Department of Health and Social Care, Office for Health Improvement and Disparities, London, United Kingdom
- Department of Experimental Psychology, Division of Psychology and Language Sciences, University College London, London, United Kingdom
- Richard Amlôt
- Behavioural Science and Insights Unit, UK Health Security Agency, London, United Kingdom
- Tim Chadborn
- Department of Health and Social Care, Office for Health Improvement and Disparities, London, United Kingdom
- Ben Ainsworth
- Department of Psychology, University of Bath, Bath, United Kingdom
- National Institute for Health Research Biomedical Research Centre, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
- Lucy Yardley
- School of Psychology, University of Southampton, Southampton, United Kingdom
- School of Psychological Science, University of Bristol, Bristol, United Kingdom
4
Borna S, Maniaci MJ, Haider CR, Maita KC, Torres-Guzman RA, Avila FR, Lunde JJ, Coffey JD, Demaerschalk BM, Forte AJ. Artificial Intelligence Models in Health Information Exchange: A Systematic Review of Clinical Implications. Healthcare (Basel) 2023; 11:2584. [PMID: 37761781 PMCID: PMC10531020 DOI: 10.3390/healthcare11182584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 09/14/2023] [Accepted: 09/16/2023] [Indexed: 09/29/2023]
Abstract
Electronic health record (EHR) systems collate patient data, and the integration and standardization of documents through Health Information Exchange (HIE) play a pivotal role in refining patient management. Although the clinical implications of AI in EHR systems have been extensively analyzed, its application in HIE as a crucial source of patient data is less explored. Addressing this gap, our systematic review delves into utilizing AI models in HIE, gauging their predictive prowess and potential limitations. Employing databases such as Scopus, CINAHL, Google Scholar, PubMed/Medline, and Web of Science and adhering to the PRISMA guidelines, we unearthed 1021 publications. Of these, 11 were shortlisted for the final analysis. A noticeable preference for machine learning models in prognosticating clinical results, notably in oncology and cardiac failures, was evident. The metrics displayed AUC values ranging between 61% and 99.91%. Sensitivity metrics spanned from 12% to 96.50%, specificity from 76.30% to 98.80%, positive predictive values varied from 83.70% to 94.10%, and negative predictive values between 94.10% and 99.10%. Despite variations in specific metrics, AI models drawing on HIE data unfailingly showcased commendable predictive proficiency in clinical verdicts, emphasizing the transformative potential of melding AI with HIE. However, variations in sensitivity highlight underlying challenges. As healthcare's path becomes more enmeshed with AI, a well-rounded, enlightened approach is pivotal to guarantee the delivery of trustworthy and effective AI-augmented healthcare solutions.
Affiliation(s)
- Sahar Borna
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Michael J. Maniaci
- Division of Hospital Internal Medicine, Mayo Clinic, Jacksonville, FL 32224, USA
- Clifton R. Haider
- Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN 55902, USA
- Karla C. Maita
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Jordan D. Coffey
- Center for Digital Health, Mayo Clinic, Rochester, MN 55902, USA
- Bart M. Demaerschalk
- Center for Digital Health, Mayo Clinic, Rochester, MN 55902, USA
- Department of Neurology, Mayo Clinic College of Medicine and Science, Phoenix, AZ 85054, USA
- Antonio J. Forte
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
5
Pethani F, Dunn AG. Natural language processing for clinical notes in dentistry: A systematic review. J Biomed Inform 2023; 138:104282. [PMID: 36623780 DOI: 10.1016/j.jbi.2023.104282] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/29/2022] [Revised: 12/01/2022] [Accepted: 01/04/2023] [Indexed: 01/09/2023]
Abstract
OBJECTIVE To identify and synthesise research on applications of natural language processing (NLP) for information extraction and retrieval from clinical notes in dentistry. MATERIALS AND METHODS A predefined search strategy was applied in EMBASE, CINAHL and Medline. Studies eligible for inclusion were those that described, evaluated, or applied NLP to clinical notes containing either human or simulated patient information. Quality of the study design and reporting was independently assessed based on a set of questions derived from relevant tools including CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS). A narrative synthesis was conducted to present the results. RESULTS Of the 17 included studies, 10 developed and evaluated NLP methods and 7 described applications of NLP-based information retrieval methods in dental records. Studies were published between 2015 and 2021; most were missing key details needed for reproducibility, and there was no consistency in design or reporting. The 10 studies developing or evaluating NLP methods used document classification or entity extraction, and 4 compared NLP methods to non-NLP methods. The quality of reporting on NLP studies in dentistry has modestly improved over time. CONCLUSIONS Study design heterogeneity and incomplete reporting of studies currently limit our ability to synthesise NLP applications in dental records. Standardisation of reporting and improved connections between NLP methods and applied NLP in dentistry may improve how we can make use of clinical notes from dentistry in population health or decision support systems. PROTOCOL REGISTRATION PROSPERO CRD42021227823.
Affiliation(s)
- Farhana Pethani
- Biomedical Informatics and Digital Health, Faculty of Medicine and Health, the University of Sydney, Sydney, Australia
- Adam G Dunn
- Biomedical Informatics and Digital Health, Faculty of Medicine and Health, the University of Sydney, Sydney, Australia
6
Wang L, Zhang Y, Chignell M, Shan B, Sheehan KA, Razak F, Verma A. Boosting Delirium Identification Accuracy With Sentiment-Based Natural Language Processing: Mixed Methods Study. JMIR Med Inform 2022; 10:e38161. [PMID: 36538363 PMCID: PMC9812273 DOI: 10.2196/38161] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/21/2022] [Revised: 08/22/2022] [Accepted: 09/19/2022] [Indexed: 01/07/2023]
Abstract
BACKGROUND Delirium is an acute neurocognitive disorder that affects up to half of older hospitalized medical patients and can lead to dementia, longer hospital stays, increased health costs, and death. Although delirium can be prevented and treated, it is difficult to identify and predict. OBJECTIVE This study aimed to improve machine learning models that retrospectively identify the presence of delirium during hospital stays (eg, to measure the effectiveness of delirium prevention interventions) by using the natural language processing (NLP) technique of sentiment analysis (in this case a feature that identifies sentiment toward, or away from, a delirium diagnosis). METHODS Using data from the General Medicine Inpatient Initiative, a Canadian hospital data and analytics network, a detailed manual review of medical records was conducted for nearly 4000 admissions at 6 Toronto area hospitals. Furthermore, 25.74% (994/3862) of the eligible hospital admissions were labeled as having delirium. Using the data set collected from this study, we developed machine learning models with, and without, the benefit of NLP methods applied to diagnostic imaging reports, and we asked the question "can NLP improve machine learning identification of delirium?" RESULTS Among the eligible 3862 hospital admissions, 994 (25.74%) admissions were labeled as having delirium. Identification and calibration of the models were satisfactory. The accuracy and area under the receiver operating characteristic curve of the main model with NLP in the independent testing data set were 0.807 and 0.930, respectively. The accuracy and area under the receiver operating characteristic curve of the main model without NLP in the independent testing data set were 0.811 and 0.869, respectively. Model performance was also found to be stable over the 5-year period used in the experiment, with identification for a likely future holdout test set being no worse than identification for retrospective holdout test sets. CONCLUSIONS Our machine learning model that included NLP (ie, sentiment analysis in medical image description text mining) produced valid identification of delirium, with the sentiment analysis providing significant additional benefit over the model without NLP.
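The results above report nearly identical accuracy for the two models (0.807 vs. 0.811) but clearly different areas under the ROC curve (0.930 vs. 0.869). A minimal sketch of why the two metrics can diverge: accuracy depends on a single decision threshold, whereas the area under the curve equals the probability that a randomly chosen positive case is scored above a randomly chosen negative case. The labels, scores, and the 0.5 threshold below are hypothetical toy values, not data or settings from the study.

```python
def accuracy(labels: list[int], scores: list[float], threshold: float = 0.5) -> float:
    """Fraction of cases classified correctly at one fixed threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def auroc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = delirium, 0 = no delirium.
toy_labels = [1, 1, 0, 0, 0, 1]
toy_scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.7]

if __name__ == "__main__":
    print(f"accuracy at 0.5 = {accuracy(toy_labels, toy_scores):.3f}")
    print(f"AUROC           = {auroc(toy_labels, toy_scores):.3f}")
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; for large test sets a rank-based implementation would be the usual choice.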
Affiliation(s)
- Lu Wang
- Department of Mechanical & Industrial Engineering, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, Texas State University, San Marcos, TX, United States
- Yilun Zhang
- Department of Mechanical & Industrial Engineering, University of Toronto, Toronto, ON, Canada
- Mark Chignell
- Department of Mechanical & Industrial Engineering, University of Toronto, Toronto, ON, Canada
- Baizun Shan
- Department of Mechanical & Industrial Engineering, University of Toronto, Toronto, ON, Canada
- Kathleen A Sheehan
- GEMINI - The General Medicine Inpatient Initiative, Unity Health Toronto, Toronto, ON, Canada
- Department of Psychiatry, University of Toronto, Toronto, ON, Canada
- Fahad Razak
- GEMINI - The General Medicine Inpatient Initiative, Unity Health Toronto, Toronto, ON, Canada
- Faculty of Medicine & Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada
- Amol Verma
- GEMINI - The General Medicine Inpatient Initiative, Unity Health Toronto, Toronto, ON, Canada
- Faculty of Medicine & Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada
7
Kuo HC, Hao S, Jin B, Chou CJ, Han Z, Chang LS, Huang YH, Hwa K, Whitin JC, Sylvester KG, Reddy CD, Chubb H, Ceresnak SR, Kanegaye JT, Tremoulet AH, Burns JC, McElhinney D, Cohen HJ, Ling XB. Single center blind testing of a US multi-center validated diagnostic algorithm for Kawasaki disease in Taiwan. Front Immunol 2022; 13:1031387. [PMID: 36263040 PMCID: PMC9575935 DOI: 10.3389/fimmu.2022.1031387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 09/21/2022] [Indexed: 11/13/2022]
Abstract
Background Kawasaki disease (KD) is the leading cause of acquired heart disease in children. The major challenge in KD diagnosis is that its clinical signs overlap with those of other febrile children, here termed febrile control (FC) subjects. We sought to determine whether our algorithmic approach is applicable to a Taiwan cohort. Methods A single-center (Chang Gung Memorial Hospital in Taiwan) cohort of patients with suspected acute KD was prospectively enrolled by local KD specialists for KD analysis. Our computer-based two-step algorithm, previously developed at a single center, was further tested in a five-center validation in the US. This first blinded multi-center trial validated our approach, with sufficient sensitivity and positive predictive value, to identify most patients with KD diagnosed at centers across the US. This study involved 418 KDs and 259 FCs from the Chang Gung Memorial Hospital in Taiwan. Findings Our diagnostic algorithm retained sensitivity (379 of 418; 90.7%), specificity (223 of 259; 86.1%), PPV (379 of 409; 92.7%), and NPV (223 of 247; 90.3%) comparable to previous US 2016 single-center and US 2020 five-center results. Only 4.7% (15 of 418) of KD and 2.3% (6 of 259) of FC patients were identified as indeterminate. The algorithm identified 18 of 50 (36%) KD patients who presented 2 or 3 principal criteria. Of 418 KD patients, 157 were infants younger than one year and 89.2% (140 of 157) were classified correctly. Of the 44 patients with KD who had coronary artery abnormalities, our diagnostic algorithm correctly identified 43 (97.7%), including all patients with dilated coronary arteries except one, whose dilation was found to resolve in 8 weeks. Interpretation This work demonstrates the applicability of our algorithmic approach and diagnostic portability in Taiwan.
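The four headline figures in the findings can be reproduced from the reported fractions. The sketch below is only an arithmetic check under stated assumptions: the false-negative count (24) and false-positive count (30) are not given directly in the abstract and are derived here from the reported denominators (247 = 223 + 24 for NPV, 409 = 379 + 30 for PPV), and the class and field names are hypothetical. Note that, as in the abstract, sensitivity and specificity are taken over all enrolled KD and FC patients, indeterminate calls included, while PPV and NPV are taken over definite calls only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriageCounts:
    """Outcome counts for a classifier that may also return 'indeterminate'."""
    tp: int                # KD patients called KD
    fn: int                # KD patients called FC (derived, see lead-in)
    indeterminate_kd: int  # KD patients left indeterminate
    tn: int                # FC patients called FC
    fp: int                # FC patients called KD (derived, see lead-in)
    indeterminate_fc: int  # FC patients left indeterminate

    def metrics(self) -> dict[str, float]:
        """Return the four metrics as percentages."""
        n_kd = self.tp + self.fn + self.indeterminate_kd
        n_fc = self.tn + self.fp + self.indeterminate_fc
        return {
            "sensitivity": 100 * self.tp / n_kd,
            "specificity": 100 * self.tn / n_fc,
            "ppv": 100 * self.tp / (self.tp + self.fp),
            "npv": 100 * self.tn / (self.tn + self.fn),
        }


# Counts taken from, or derived from, the fractions reported in the abstract.
taiwan_cohort = TriageCounts(
    tp=379, fn=24, indeterminate_kd=15,
    tn=223, fp=30, indeterminate_fc=6,
)

if __name__ == "__main__":
    for name, value in taiwan_cohort.metrics().items():
        print(f"{name}: {value:.1f}%")
```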
Affiliation(s)
- Ho-Chang Kuo
- Kawasaki Disease Center, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
- Department of Pediatrics, Chang Gung University College of Medicine, Kaohsiung, Taiwan
- *Correspondence: Xuefeng B. Ling; Ho-Chang Kuo
- Shiying Hao
- School of Medicine, Stanford University, Stanford, CA, United States
- Bo Jin
- School of Medicine, Stanford University, Stanford, CA, United States
- C. James Chou
- School of Medicine, Stanford University, Stanford, CA, United States
- Zhi Han
- School of Medicine, Stanford University, Stanford, CA, United States
- Ling-Sai Chang
- Kawasaki Disease Center, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
- Department of Pediatrics, Chang Gung University College of Medicine, Kaohsiung, Taiwan
- Ying-Hsien Huang
- Kawasaki Disease Center, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
- Department of Pediatrics, Chang Gung University College of Medicine, Kaohsiung, Taiwan
- Kuoyuan Hwa
- Center for Biomedical Industry, Department of Molecular Science and Engineering, National Taipei University of Technology, Taipei, Taiwan
- John C. Whitin
- School of Medicine, Stanford University, Stanford, CA, United States
- Karl G. Sylvester
- School of Medicine, Stanford University, Stanford, CA, United States
- Charitha D. Reddy
- School of Medicine, Stanford University, Stanford, CA, United States
- Henry Chubb
- School of Medicine, Stanford University, Stanford, CA, United States
- Scott R. Ceresnak
- School of Medicine, Stanford University, Stanford, CA, United States
- John T. Kanegaye
- Pediatrics, University of California San Diego, San Diego, CA, United States
- Jane C. Burns
- Pediatrics, University of California San Diego, San Diego, CA, United States
- Doff McElhinney
- School of Medicine, Stanford University, Stanford, CA, United States
- Harvey J. Cohen
- School of Medicine, Stanford University, Stanford, CA, United States
- Xuefeng B. Ling
- School of Medicine, Stanford University, Stanford, CA, United States
- *Correspondence: Xuefeng B. Ling; Ho-Chang Kuo
8
Wang S, Song F, Qiao Q, Liu Y, Chen J, Ma J. A Comparative Study of Natural Language Processing Algorithms Based on Cities Changing Diabetes Vulnerability Data. Healthcare (Basel) 2022; 10:1119. [PMID: 35742169 PMCID: PMC9223144 DOI: 10.3390/healthcare10061119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 06/08/2022] [Accepted: 06/13/2022] [Indexed: 11/16/2022]
Abstract
(1) Background: Poor adherence to management behaviors in Chinese Type 2 diabetes mellitus (T2DM) patients leads to poorly controlled diabetes, which results in significant economic costs for China. It is imperative to quickly locate vulnerability factors in the management behavior of patients with T2DM. (2) Methods: In this study, a thematic analysis of the collected interview materials was conducted to construct the themes of T2DM management vulnerability. We explored the applicability of the pre-trained models based on the evaluation metrics in text classification. (3) Results: We constructed 12 themes of vulnerability related to the health and well-being of people with T2DM in Tianjin. We found that Bidirectional Encoder Representations from Transformers (BERT) performed better in this Natural Language Processing (NLP) task, with a shorter completion time. With the splitting ratio of 6:3:1 and batch size of 64 for BERT, the test accuracy was 97.71%, the completion time was 10 min 24 s, and the macro-F1 score was 0.9752. (4) Conclusions: Our results demonstrated the applicability of NLP techniques in this specific Chinese-language medical environment. We filled the knowledge gap in the application of NLP technologies in diabetes management. Our study provided strong support for using NLP techniques to rapidly locate vulnerability factors in T2DM management.
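As an illustration of the 6:3:1 splitting ratio mentioned in the results, the sketch below shuffles record indices and partitions them into three subsets. It assumes the ratio denotes training, validation, and test sets in that order, which the abstract does not state explicitly; the function name, the seed, and the data set size of 1000 are hypothetical.

```python
import random


def split_indices(
    n: int,
    ratios: tuple[int, int, int] = (6, 3, 1),
    seed: int = 0,
) -> tuple[list[int], list[int], list[int]]:
    """Shuffle 0..n-1 and split into three parts in the given ratio.

    Any remainder from integer division goes to the last part.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    total = sum(ratios)
    n_first = n * ratios[0] // total
    n_second = n * ratios[1] // total
    first = indices[:n_first]
    second = indices[n_first:n_first + n_second]
    third = indices[n_first + n_second:]
    return first, second, third


if __name__ == "__main__":
    # Hypothetical data set size; assumed order is train, validation, test.
    train_idx, val_idx, test_idx = split_indices(1000)
    print(len(train_idx), len(val_idx), len(test_idx))
```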
9
Multi-label text mining to identify reasons for appointments to drive population health analytics at a primary care setting. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07306-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
10
Montoto C, Gisbert JP, Guerra I, Plaza R, Pajares Villarroya R, Moreno Almazán L, López Martín MDC, Domínguez Antonaya M, Vera Mendoza I, Aparicio J, Martínez V, Tagarro I, Fernandez-Nistal A, Canales L, Menke S, Gomollón F. Evaluation of Natural Language Processing for the Identification of Crohn Disease-Related Variables in Spanish Electronic Health Records: A Validation Study for the PREMONITION-CD Project. JMIR Med Inform 2022; 10:e30345. [PMID: 35179507 PMCID: PMC8900906 DOI: 10.2196/30345] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/11/2021] [Revised: 07/22/2021] [Accepted: 01/02/2022] [Indexed: 12/29/2022]
Abstract
Background The exploration of clinically relevant information in the free text of electronic health records (EHRs) holds the potential to positively impact clinical practice as well as knowledge regarding Crohn disease (CD), an inflammatory bowel disease that may affect any segment of the gastrointestinal tract. The EHRead technology, a clinical natural language processing (cNLP) system, was designed to detect and extract clinical information from narratives in the clinical notes contained in EHRs. Objective The aim of this study is to validate the performance of the EHRead technology in identifying information of patients with CD. Methods We used the EHRead technology to explore and extract CD-related clinical information from EHRs. To validate this tool, we compared the output of the EHRead technology with a manually curated gold standard to assess the quality of our cNLP system in detecting records containing any reference to CD and its related variables. Results The validation metrics for the main variable (CD) were a precision of 0.88, a recall of 0.98, and an F1 score of 0.93. Regarding the secondary variables, we obtained a precision of 0.91, a recall of 0.71, and an F1 score of 0.80 for CD flare, while for the variable vedolizumab (treatment), a precision, recall, and F1 score of 0.86, 0.94, and 0.90 were obtained, respectively. Conclusions This evaluation demonstrates the ability of the EHRead technology to identify patients with CD and their related variables from the free text of EHRs. To the best of our knowledge, this study is the first to use a cNLP system for the identification of CD in EHRs written in Spanish.
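The F1 scores in the results follow from the reported precision and recall, since F1 is their harmonic mean, F1 = 2PR / (P + R). The short check below uses only the three (precision, recall, F1) triples quoted in the abstract; the function and variable names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as quoted in the abstract.
reported = {
    "CD": (0.88, 0.98, 0.93),
    "CD flare": (0.91, 0.71, 0.80),
    "vedolizumab": (0.86, 0.94, 0.90),
}

if __name__ == "__main__":
    for variable, (p, r, f1) in reported.items():
        print(f"{variable}: computed F1 = {f1_score(p, r):.2f}, reported F1 = {f1:.2f}")
```

The CD flare row shows the usual behaviour of the harmonic mean: a recall of 0.71 pulls F1 down to 0.80 even though precision is 0.91.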
Affiliation(s)
- Javier P Gisbert
- Hospital Universitario de La Princesa, Madrid, Spain; Instituto de Investigación Sanitaria Princesa (IIS-IP), Madrid, Spain; Universidad Autónoma de Madrid, Madrid, Spain; Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Madrid, Spain
- Iván Guerra
- Hospital Universitario de Fuenlabrada, Madrid, Spain
- Rocío Plaza
- Hospital Universitario Infanta Leonor, Madrid, Spain
- Lea Canales
- Department of Software and Computing System, University of Alicante, Alicante, Spain
- Fernando Gomollón
- Hospital Clínico Universitario Lozano Blesa, Zaragoza, Spain; Instituto de Investigación Sanitaria Aragón (IISA), Zaragoza, Spain; Universidad de Zaragoza, Zaragoza, Spain; Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Zaragoza, Spain
11
Chen X, Cheng G, Wang FL, Tao X, Xie H, Xu L. Machine and cognitive intelligence for human health: systematic review. Brain Inform 2022; 9:5. [PMID: 35150379 PMCID: PMC8840949 DOI: 10.1186/s40708-022-00153-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/03/2021] [Accepted: 01/25/2022] [Indexed: 12/27/2022]
Abstract
Brain informatics is a novel interdisciplinary area that focuses on scientifically studying the mechanisms of human brain information processing by integrating experimental cognitive neuroscience with advanced Web intelligence-centered information technologies. Web intelligence, which aims to understand the computational, cognitive, physical, and social foundations of the future Web, has attracted increasing attention to facilitate the study of brain informatics to promote human health. A large number of articles created in the recent few years are proof of the investment in Web intelligence-assisted human health. This study systematically reviews academic studies regarding article trends, top journals, subjects, countries/regions, and institutions, study design, artificial intelligence technologies, clinical tasks, and performance evaluation. Results indicate that literature is especially welcomed in subjects such as medical informatics and health care sciences and service. There are several promising topics, for example, random forests, support vector machines, and conventional neural networks for disease detection and diagnosis, semantic Web, ontology mining, and topic modeling for clinical or biomedical text mining, artificial neural networks and logistic regression for prediction, and convolutional neural networks and support vector machines for monitoring and classification. Additionally, future research should focus on algorithm innovations, additional information use, functionality improvement, model and system generalization, scalability, evaluation, and automation, data acquirement and quality improvement, and allowing interaction. The findings of this study help better understand what and how Web intelligence can be applied to promote healthcare procedures and clinical outcomes. This provides important insights into the effective use of Web intelligence to support informatics-enabled brain studies.
Affiliation(s)
- Xieling Chen
  - Department of Mathematics and Information Technology, The Education University of Hong Kong, Hong Kong SAR, China
- Gary Cheng
  - Department of Mathematics and Information Technology, The Education University of Hong Kong, Hong Kong SAR, China
- Fu Lee Wang
  - School of Science and Technology, Hong Kong Metropolitan University, Hong Kong SAR, China
- Xiaohui Tao
  - School of Sciences, University of Southern Queensland, Toowoomba, Australia
- Haoran Xie
  - Department of Computing and Decision Sciences, Lingnan University, Hong Kong SAR, China
- Lingling Xu
  - School of Science and Technology, Hong Kong Metropolitan University, Hong Kong SAR, China
|
12
|
Buchlak QD, Esmaili N, Bennett C, Farrokhi F. Natural Language Processing Applications in the Clinical Neurosciences: A Machine Learning Augmented Systematic Review. Acta Neurochirurgica. Supplement 2022; 134:277-289. [PMID: 34862552 DOI: 10.1007/978-3-030-85292-4_32] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 10/19/2022]
Abstract
Natural language processing (NLP), a domain of artificial intelligence (AI) that models human language, has been used in medicine to automate diagnostics, detect adverse events, support decision making and predict clinical outcomes. However, applications to the clinical neurosciences appear to be limited. NLP has matured with the implementation of deep transformer models (e.g., XLNet, BERT, T5, and RoBERTa) and transfer learning. The objectives of this study were to (1) systematically review NLP applications in the clinical neurosciences, and (2) explore NLP analysis to facilitate literature synthesis, providing clear examples to demonstrate the potential capabilities of these technologies for a clinical audience. Our NLP analysis consisted of keyword identification, text summarization and document classification. A total of 48 articles met inclusion criteria. NLP has been applied in the clinical neurosciences to facilitate literature synthesis, data extraction, patient identification, automated clinical reporting and outcome prediction. The number of publications applying NLP has increased rapidly over the past five years. Document classifiers trained to differentiate included and excluded articles demonstrated moderate performance (XLNet AUC = 0.66, BERT AUC = 0.59, RoBERTa AUC = 0.62). The T5 transformer model generated acceptable abstract summaries. The application of NLP has the potential to enhance research and practice in the clinical neurosciences.
Affiliation(s)
- Quinlan D Buchlak
  - School of Medicine, The University of Notre Dame Australia, Sydney, NSW, Australia
- Nazanin Esmaili
  - School of Medicine, The University of Notre Dame Australia, Sydney, NSW, Australia
  - Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW, Australia
- Christine Bennett
  - School of Medicine, The University of Notre Dame Australia, Sydney, NSW, Australia
- Farrokh Farrokhi
  - Neuroscience Institute, Virginia Mason Medical Center, Seattle, WA, USA
|
13
|
Turchin A, Florez Builes LF. Using Natural Language Processing to Measure and Improve Quality of Diabetes Care: A Systematic Review. J Diabetes Sci Technol 2021; 15:553-560. [PMID: 33736486 PMCID: PMC8120048 DOI: 10.1177/19322968211000831] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/14/2022]
Abstract
BACKGROUND Real-world evidence research plays an increasingly important role in diabetes care. However, a large fraction of real-world data are "locked" in narrative format. Natural language processing (NLP) technology offers a solution for analysis of narrative electronic data. METHODS We conducted a systematic review of studies of NLP technology focused on diabetes. Articles published prior to June 2020 were included. RESULTS We included 38 studies in the analysis. The majority (24; 63.2%) described only development of NLP tools; the remainder used NLP tools to conduct clinical research. A large fraction (17; 44.7%) of studies focused on identification of patients with diabetes; the rest covered a broad range of subjects that included hypoglycemia, lifestyle counseling, diabetic kidney disease, insulin therapy and others. The mean F1 score for all studies where it was available was 0.882. It tended to be lower (0.817) in studies of more linguistically complex concepts. Seven studies reported findings with potential implications for improving delivery of diabetes care. CONCLUSION Research in NLP technology to study diabetes is growing quickly, although challenges (e.g. in analysis of more linguistically complex concepts) remain. Its potential to deliver evidence on treatment and improving quality of diabetes care is demonstrated by a number of studies. Further growth in this area would be aided by deeper collaboration between developers and end-users of natural language processing tools as well as by broader sharing of the tools themselves and related resources.
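The F1 score cited in this abstract is the harmonic mean of precision (positive predictive value) and recall (sensitivity). A minimal sketch of that calculation follows; the function name and the counts in the usage note are illustrative assumptions and are not taken from any of the reviewed studies.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall, from confusion-matrix counts."""
    # Precision: share of flagged items that are truly positive.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall (sensitivity): share of truly positive items that were flagged.
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, a hypothetical tool that finds 8 true mentions, raises 2 false alarms, and misses 2 mentions has precision 0.8, recall 0.8, and therefore an F1 score of 0.8.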
Affiliation(s)
- Alexander Turchin
  - Brigham and Women’s Hospital, Boston, MA, USA
  - Alexander Turchin, MD, MS, Brigham and Women’s Hospital, 221 Longwood Avenue, Boston, MA 02115, USA
|
14
|
Lee S, Doktorchik C, Martin EA, D'Souza AG, Eastwood C, Shaheen AA, Naugler C, Lee J, Quan H. Electronic Medical Record-Based Case Phenotyping for the Charlson Conditions: Scoping Review. JMIR Med Inform 2021; 9:e23934. [PMID: 33522976 PMCID: PMC7884219 DOI: 10.2196/23934] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/28/2020] [Revised: 11/20/2020] [Accepted: 12/05/2020] [Indexed: 12/16/2022]
Abstract
Background Electronic medical records (EMRs) contain large amounts of rich clinical information. Developing EMR-based case definitions, also known as EMR phenotyping, is an active area of research that has implications for epidemiology, clinical care, and health services research. Objective This review aims to describe and assess the present landscape of EMR-based case phenotyping for the Charlson conditions. Methods A scoping review of EMR-based algorithms for defining the Charlson comorbidity index conditions was completed. This study covered articles published between January 2000 and April 2020, both inclusive. Embase (Excerpta Medica database) and MEDLINE (Medical Literature Analysis and Retrieval System Online) were searched using keywords developed in the following 3 domains: terms related to EMR, terms related to case finding, and disease-specific terms. The manuscript follows the Preferred Reporting Items for Systematic reviews and Meta-analyses extension for Scoping Reviews (PRISMA) guidelines. Results A total of 274 articles representing 299 algorithms were assessed and summarized. Most studies were undertaken in the United States (181/299, 60.5%), followed by the United Kingdom (42/299, 14.0%) and Canada (15/299, 5.0%). These algorithms were mostly developed either in primary care (103/299, 34.4%) or inpatient (168/299, 56.2%) settings. Diabetes, congestive heart failure, myocardial infarction, and rheumatology had the highest number of developed algorithms. Data-driven and clinical rule–based approaches have been identified. EMR-based phenotype and algorithm development reflect the data access allowed by respective health systems, and algorithms vary in their performance. Conclusions Recognizing similarities and differences in health systems, data collection strategies, extraction, data release protocols, and existing clinical pathways is critical to algorithm development strategies. 
Several strategies to assist with phenotype-based case definitions have been proposed.
Affiliation(s)
- Seungwon Lee
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Alberta Health Services, Calgary, AB, Canada
  - Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Chelsea Doktorchik
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Elliot Asher Martin
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Alberta Health Services, Calgary, AB, Canada
- Adam Giles D'Souza
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Alberta Health Services, Calgary, AB, Canada
- Cathy Eastwood
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Abdel Aziz Shaheen
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Christopher Naugler
  - Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Pathology and Laboratory Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Joon Lee
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Cardiac Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Hude Quan
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
|
15
|
Sai Prashanthi G, Deva A, Vadapalli R, Das AV. Automated Categorization of Systemic Disease and Duration From Electronic Medical Record System Data Using Finite-State Machine Modeling: Prospective Validation Study. JMIR Form Res 2020; 4:e24490. [PMID: 33331823 PMCID: PMC7775202 DOI: 10.2196/24490] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/22/2020] [Revised: 11/12/2020] [Accepted: 11/17/2020] [Indexed: 01/14/2023]
Abstract
BACKGROUND One of the major challenges in the health care sector is that approximately 80% of generated data remains unstructured and unused. Since it is difficult to handle unstructured data from electronic medical record systems, it tends to be neglected for analyses in most hospitals and medical centers. Therefore, there is a need to analyze unstructured big data in health care systems so that we can optimally utilize and unearth all unexploited information from it. OBJECTIVE In this study, we aimed to extract a list of diseases and associated keywords along with the corresponding time durations from an indigenously developed electronic medical record system and describe the possibility of analytics from the acquired datasets. METHODS We propose a novel, finite-state machine to sequentially detect and cluster disease names from patients' medical history. We defined 3 states in the finite-state machine and transition matrix, which depend on the identified keyword. In addition, we also defined a state-change action matrix, which is essentially an action associated with each transition. The dataset used in this study was obtained from an indigenously developed electronic medical record system called eyeSmart that was implemented across a large, multitier ophthalmology network in India. The dataset included patients' past medical history and contained records of 10,000 distinct patients. RESULTS We extracted disease names and associated keywords by using the finite-state machine with an accuracy of 95%, sensitivity of 94.9%, and positive predictive value of 100%. For the extraction of the duration of disease, the machine's accuracy was 93%, sensitivity was 92.9%, and the positive predictive value was 100%. CONCLUSIONS We demonstrated that the finite-state machine we developed in this study can be used to accurately identify disease names, associated keywords, and time durations from a large cohort of patient records obtained using an electronic medical record system.
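The finite-state approach summarized in this abstract (three states, a transition matrix keyed on the identified keyword, and an action tied to each transition) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration: the state names, the keyword lists, the action names, and the `extract` function are hypothetical and are not the authors' implementation.

```python
import re
from enum import Enum, auto
from typing import List, Optional, Tuple


class State(Enum):
    START = auto()     # no disease keyword pending
    DISEASE = auto()   # a disease keyword was just recorded
    DURATION = auto()  # a number was seen and awaits its time unit


# Hypothetical keyword lists, for illustration only.
DISEASES = {"diabetes", "hypertension", "asthma"}
UNITS = {"year", "years", "month", "months"}

# Transition table: (current state, token class) -> (next state, action).
# Pairs that are not listed leave the state unchanged and trigger no action.
TRANSITIONS = {
    (State.START, "disease"): (State.DISEASE, "new_record"),
    (State.DISEASE, "disease"): (State.DISEASE, "new_record"),
    (State.DURATION, "disease"): (State.DISEASE, "new_record"),
    (State.DISEASE, "number"): (State.DURATION, "hold_number"),
    (State.DURATION, "unit"): (State.START, "set_duration"),
}


def classify(token: str) -> str:
    """Map a token to the class used as the transition-table key."""
    if token in DISEASES:
        return "disease"
    if token.isdigit():
        return "number"
    if token in UNITS:
        return "unit"
    return "other"


def extract(history: str) -> List[Tuple[str, Optional[str]]]:
    """Return (disease, duration) pairs found in a free-text history."""
    records: List[List[Optional[str]]] = []
    state, number = State.START, None
    for token in re.findall(r"[a-z]+|\d+", history.lower()):
        next_state, action = TRANSITIONS.get((state, classify(token)), (state, None))
        if action == "new_record":
            records.append([token, None])
        elif action == "hold_number":
            number = token
        elif action == "set_duration":
            records[-1][1] = f"{number} {token}"
        state = next_state
    return [(disease, duration) for disease, duration in records]
```

Under these assumptions, `extract("Diabetes since 10 years, hypertension for 2 months, asthma")` yields one record per disease, with a duration attached only where a number and a time unit follow the disease name. Keeping transitions and actions in a single table makes it straightforward to add keywords or states without touching the loop.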
Affiliation(s)
- Ayush Deva
  - International Institute of Information Technology, Hyderabad, Telangana, India
- Ranganath Vadapalli
  - Department of eyeSmart EMR & AEye, LV Prasad Eye Institute, Hyderabad, Telangana, India
- Anthony Vipin Das
  - Department of eyeSmart EMR & AEye, LV Prasad Eye Institute, Hyderabad, Telangana, India
|
16
|
Nguyen H, Agu E, Tulu B, Strong D, Mombini H, Pedersen P, Lindsay C, Dunn R, Loretz L. Machine learning models for synthesizing actionable care decisions on lower extremity wounds. Smart Health (Amsterdam, Netherlands) 2020; 18. [PMID: 33299924 DOI: 10.1016/j.smhl.2020.100139] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/22/2023]
Abstract
Lower extremity chronic wounds affect 4.5 million Americans annually. Due to inadequate access to wound experts in underserved areas, many patients receive non-uniform, non-standard wound care, resulting in increased costs and lower quality of life. We explored machine learning classifiers to generate actionable wound care decisions about four chronic wound types (diabetic foot, pressure, venous, and arterial ulcers). These decisions (target classes) were: (1) Continue current treatment, (2) Request non-urgent change in treatment from a wound specialist, (3) Refer patient to a wound specialist. We compare classification methods (single classifiers, bagged & boosted ensembles, and a deep learning network) to investigate (1) whether visual wound features are sufficient for generating a decision and (2) whether adding unstructured text from wound experts increases classifier accuracy. Using 205 wound images, the Gradient Boosted Machine (XGBoost) outperformed other methods when using both visual and textual wound features, achieving 81% accuracy. Using only visual features decreased the accuracy to 76%, achieved by a Support Vector Machine classifier. We conclude that machine learning classifiers can generate accurate wound care decisions on lower extremity chronic wounds, an important step toward objective, standardized wound care. Higher decision-making accuracy was achieved by leveraging clinical comments from wound experts.
Affiliation(s)
- Holly Nguyen
  - Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Emmanuel Agu
  - Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Bengisu Tulu
  - Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Diane Strong
  - Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Haadi Mombini
  - Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Peder Pedersen
  - Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Clifford Lindsay
  - University of Massachusetts Medical School/UMass Memorial Health Care, 55 N Lake Ave, Worcester, MA 01655, United States
- Raymond Dunn
  - University of Massachusetts Medical School/UMass Memorial Health Care, 55 N Lake Ave, Worcester, MA 01655, United States
- Lorraine Loretz
  - University of Massachusetts Medical School/UMass Memorial Health Care, 55 N Lake Ave, Worcester, MA 01655, United States
|
17
|
Hendrickx JO, van Gastel J, Leysen H, Martin B, Maudsley S. High-dimensionality Data Analysis of Pharmacological Systems Associated with Complex Diseases. Pharmacol Rev 2020; 72:191-217. [PMID: 31843941 DOI: 10.1124/pr.119.017921] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/11/2022]
Abstract
It is widely accepted that molecular reductionist views of highly complex human physiologic activity, e.g., the aging process, as well as therapeutic drug efficacy are largely oversimplifications. Currently some of the most effective appreciation of biologic disease and drug response complexity is achieved using high-dimensionality (H-D) data streams from transcriptomic, proteomic, metabolomics, or epigenomic pipelines. Multiple H-D data sets are now common and freely accessible for complex diseases such as metabolic syndrome, cardiovascular disease, and neurodegenerative conditions such as Alzheimer's disease. Over the last decade our ability to interrogate these high-dimensionality data streams has been profoundly enhanced through the development and implementation of highly effective bioinformatic platforms. Employing these computational approaches to understand the complexity of age-related diseases provides a facile mechanism to then synergize this pathologic appreciation with a similar level of understanding of therapeutic-mediated signaling. For informative pathology and drug-based analytics that are able to generate meaningful therapeutic insight across diverse data streams, novel informatics processes such as latent semantic indexing and topological data analyses will likely be important. Elucidation of H-D molecular disease signatures from diverse data streams will likely generate and refine new therapeutic strategies that will be designed with a cognizance of a realistic appreciation of the complexity of human age-related disease and drug effects. We contend that informatic platforms should be synergistic with more advanced chemical/drug and phenotypic cellular/tissue-based analytical predictive models to assist in either de novo drug prioritization or effective repurposing for the intervention of aging-related diseases. SIGNIFICANCE STATEMENT: All diseases, as well as pharmacological mechanisms, are far more complex than previously thought a decade ago. 
With the advent of commonplace access to technologies that produce large volumes of high-dimensionality data (e.g., transcriptomics, proteomics, metabolomics), it is now imperative that effective tools to appreciate this highly nuanced data are developed. Being able to appreciate the subtleties of high-dimensionality data will allow molecular pharmacologists to develop the most effective multidimensional therapeutics with effectively engineered efficacy profiles.
Affiliation(s)
- Jhana O Hendrickx
  - Receptor Biology Laboratory, Department of Biomedical Research (J.O.H., J.v.G., H.L., S.M.) and Faculty of Pharmacy, Biomedical and Veterinary Sciences (J.O.H., J.v.G., H.L., B.M., S.M.), University of Antwerp, Antwerp, Belgium
- Jaana van Gastel
  - Receptor Biology Laboratory, Department of Biomedical Research (J.O.H., J.v.G., H.L., S.M.) and Faculty of Pharmacy, Biomedical and Veterinary Sciences (J.O.H., J.v.G., H.L., B.M., S.M.), University of Antwerp, Antwerp, Belgium
- Hanne Leysen
  - Receptor Biology Laboratory, Department of Biomedical Research (J.O.H., J.v.G., H.L., S.M.) and Faculty of Pharmacy, Biomedical and Veterinary Sciences (J.O.H., J.v.G., H.L., B.M., S.M.), University of Antwerp, Antwerp, Belgium
- Bronwen Martin
  - Receptor Biology Laboratory, Department of Biomedical Research (J.O.H., J.v.G., H.L., S.M.) and Faculty of Pharmacy, Biomedical and Veterinary Sciences (J.O.H., J.v.G., H.L., B.M., S.M.), University of Antwerp, Antwerp, Belgium
- Stuart Maudsley
  - Receptor Biology Laboratory, Department of Biomedical Research (J.O.H., J.v.G., H.L., S.M.) and Faculty of Pharmacy, Biomedical and Veterinary Sciences (J.O.H., J.v.G., H.L., B.M., S.M.), University of Antwerp, Antwerp, Belgium
|
18
|
Yu CS, Lin YJ, Lin CH, Lin SY, Wu JL, Chang SS. Development of an Online Health Care Assessment for Preventive Medicine: A Machine Learning Approach. J Med Internet Res 2020; 22:e18585. [PMID: 32501272 PMCID: PMC7305560 DOI: 10.2196/18585] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/05/2020] [Revised: 04/13/2020] [Accepted: 05/14/2020] [Indexed: 12/20/2022]
Abstract
BACKGROUND In the era of information explosion, the use of the internet to assist with clinical practice and diagnosis has become a cutting-edge area of research. The application of medical informatics allows patients to be aware of their clinical conditions, which may contribute toward the prevention of several chronic diseases and disorders. OBJECTIVE In this study, we applied machine learning techniques to construct a medical database system from electronic medical records (EMRs) of subjects who have undergone health examination. This system aims to provide online self-health evaluation to clinicians and patients worldwide, enabling personalized health and preventive health. METHODS We built a medical database system based on the literature, and data preprocessing and cleaning were performed for the database. We utilized both supervised and unsupervised machine learning technology to analyze the EMR data to establish prediction models. The models with EMR databases were then applied to the internet platform. RESULTS The validation data were used to validate the online diagnosis prediction system. The accuracy of the prediction model for metabolic syndrome reached 91%, and the area under the receiver operating characteristic (ROC) curve was 0.904 in this system. For chronic kidney disease, the prediction accuracy of the model reached 94.7%, and the area under the ROC curve (AUC) was 0.982. In addition, the system also provided disease diagnosis visualization via clustering, allowing users to check their outcome compared with those in the medical database, enabling increased awareness for a healthier lifestyle. CONCLUSIONS Our web-based health care machine learning system allowed users to access online diagnosis predictions and provided a health examination report. Users could understand and review their health status accordingly. 
In the future, we aim to connect hospitals worldwide with our platform, so that health care practitioners can make diagnoses or provide patient education to remote patients. This platform can increase the value of preventive medicine and telemedicine.
Affiliation(s)
- Cheng-Sheng Yu
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan
  - Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Yu-Jiun Lin
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan
  - Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Chang-Hsien Lin
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan
  - Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Shiyng-Yu Lin
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan
  - Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Jenny L Wu
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan
  - Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Shy-Shin Chang
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan
  - Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
|
19
|
Bloomgarden ZT. Use of online information in diabetes. J Diabetes 2020; 12:268-269. [PMID: 31943760 DOI: 10.1111/1753-0407.13022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Zachary T Bloomgarden
  - Department of Medicine, Division of Endocrinology, Diabetes, and Bone Disease, Icahn School of Medicine at Mount Sinai, New York, New York
|
20
|
Ye Q, Patel R, Khan U, Boren SA, Kim MS. Evaluation of provider documentation patterns as a tool to deliver ongoing patient-centred diabetes education and support. Int J Clin Pract 2020; 74:e13451. [PMID: 31769903 PMCID: PMC7047595 DOI: 10.1111/ijcp.13451] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/28/2019] [Revised: 10/08/2019] [Accepted: 11/20/2019] [Indexed: 01/28/2023]
Abstract
BACKGROUND Diabetes mellitus (DM) is one of the most common chronic diseases in the world. As a disease with long-term complications requiring changes in management, DM requires not only education at the time of diagnosis, but ongoing diabetes self-management education and support (DSME/S). In the United States, however, only a small proportion of people with DM receive DSME/S, although evidence supports benefits of ongoing DSME/S. The diabetes education that providers deliver during follow-up visits may be an important source for DSME/S for many people with DM. METHODS We collected 200 clinic notes of follow-up visits for 100 adults with DM and studied the History of Present Illness (HPI) and Impression and Plan (I&P) sections. Using a codebook based on the seven principles of American Association of Diabetes Educators Self-Care Behaviors (AADE7), we conducted a multi-step deductive thematic analysis to determine the patterns of DSME/S information occurrence in clinic notes. Additionally, we used the generalised linear mixed models for investigating whether providers delivered DSME/S to people with DM based on patient characteristics. RESULTS During follow-up visits, Monitoring was the most common self-care behaviour mentioned in both HPI and I&P sections. Being Active was the least common self-care behaviour mentioned in the HPI section and Healthy Coping was the least common self-care behaviour mentioned in the I&P section. We found providers delivered more information on Healthy Eating to men compared to women in I&P section. Generally, providers delivered DSME/S to people with DM regardless of patient characteristics. CONCLUSIONS This study focused on the frequency distribution of information providers delivered to the people with DM during follow-up clinic visits based on the AADE7. The results may indicate a lack of patient-centred education when people with DM visit providers for ongoing management. 
Further studies are needed to identify the underlying reasons why providers have difficulty delivering patient-centred education.
Affiliation(s)
- Qing Ye
  - University of Missouri Informatics Institute, University of Missouri, Columbia, MO, USA
  - Department of Health Management and Informatics, University of Missouri, Columbia, MO, USA
- Richa Patel
  - Department of Medicine, University of Missouri, Columbia, MO, USA
- Uzma Khan
  - Department of Medicine, University of Missouri, Columbia, MO, USA
- Suzanne Austin Boren
  - University of Missouri Informatics Institute, University of Missouri, Columbia, MO, USA
  - Department of Health Management and Informatics, University of Missouri, Columbia, MO, USA
- Min Soon Kim
  - University of Missouri Informatics Institute, University of Missouri, Columbia, MO, USA
  - Department of Health Management and Informatics, University of Missouri, Columbia, MO, USA
|
21
|
Kersloot MG, Lau F, Abu-Hanna A, Arts DL, Cornet R. Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES. J Biomed Semantics 2019; 10:14. [PMID: 31533810 PMCID: PMC6749652 DOI: 10.1186/s13326-019-0207-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/18/2019] [Accepted: 08/13/2019] [Indexed: 12/05/2022]
Abstract
Background Information in Electronic Health Records is largely stored as unstructured free text. Natural language processing (NLP), or Medical Language Processing (MLP) in medicine, aims at extracting structured information from free text, and is less expensive and time-consuming than manual extraction. However, most algorithms in MLP are institution-specific or address only one clinical need, and thus cannot be broadly applied. In addition, most MLP systems do not detect concepts in misspelled text and cannot detect attribute relationships between concepts. The objective of this study was to develop and evaluate an MLP application that includes generic algorithms for the detection of (misspelled) concepts and of attribute relationships between them. Methods An implementation of the MLP system cTAKES, called DIRECT, was developed with generic SNOMED CT concept filter, concept relationship detection, and attribute relationship detection algorithms and a custom dictionary. Four implementations of cTAKES were evaluated by comparing 98 manually annotated oncology charts with the output of DIRECT. The F1-score was determined for named-entity recognition and attribute relationship detection for the concepts ‘lung cancer’, ‘non-small cell lung cancer’, and ‘recurrence’. The performance of the four implementations was compared with a two-tailed permutation test. Results DIRECT detected lung cancer and non-small cell lung cancer concepts with F1-scores between 0.828 and 0.947 and between 0.862 and 0.933, respectively. The concept recurrence was detected with a significantly higher F1-score of 0.921, compared to the other implementations, and the relationship between recurrence and lung cancer with an F1-score of 0.857. The precision of the detection of lung cancer, non-small cell lung cancer, and recurrence concepts were 1.000, 0.966, and 0.879, compared to precisions of 0.943, 0.967, and 0.000 in the original implementation, respectively. 
Conclusion DIRECT can detect oncology concepts and attribute relationships with high precision and can detect recurrence with significant increase in F1-score, compared to the original implementation of cTAKES, due to the usage of a custom dictionary and a generic concept relationship detection algorithm. These concepts and relationships can be used to encode clinical narratives, and can thus substantially reduce manual chart abstraction efforts, saving time for clinicians and researchers.
Collapse
Affiliation(s)
- Martijn G Kersloot
- Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, 1105AZ, Amsterdam, The Netherlands
- Francis Lau
- School of Health Information Science, University of Victoria, Victoria, Canada
- Ameen Abu-Hanna
- Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, 1105AZ, Amsterdam, The Netherlands
- Derk L Arts
- Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, 1105AZ, Amsterdam, The Netherlands
- Ronald Cornet
- Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, 1105AZ, Amsterdam, The Netherlands
|
22
|
Moon S, Liu S, Scott CG, Samudrala S, Abidian MM, Geske JB, Noseworthy PA, Shellum JL, Chaudhry R, Ommen SR, Nishimura RA, Liu H, Arruda-Olson AM. Automated extraction of sudden cardiac death risk factors in hypertrophic cardiomyopathy patients by natural language processing. Int J Med Inform 2019; 128:32-38. [PMID: 31160009 DOI: 10.1016/j.ijmedinf.2019.05.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/17/2018] [Revised: 01/19/2019] [Accepted: 05/11/2019] [Indexed: 01/12/2023]
Abstract
BACKGROUND The management of hypertrophic cardiomyopathy (HCM) patients requires the knowledge of risk factors associated with sudden cardiac death (SCD). SCD risk factors such as syncope and family history of SCD (FH-SCD) as well as family history of HCM (FH-HCM) are documented in electronic health records (EHRs) as clinical narratives. Automated extraction of risk factors from clinical narratives by natural language processing (NLP) may expedite management workflow of HCM patients. The aim of this study was to develop and deploy NLP algorithms for automated extraction of syncope, FH-SCD, and FH-HCM from clinical narratives. METHODS AND RESULTS We randomly selected 200 patients from the Mayo HCM registry for development (n = 100) and testing (n = 100) of NLP algorithms for extraction of syncope, FH-SCD as well as FH-HCM from clinical narratives of EHRs. The clinical reference standard was manually abstracted by 2 independent annotators. Performance of NLP algorithms was compared to aggregation and summarization of data entries in the HCM registry for syncope, FH-SCD, and FH-HCM. We also compared the NLP algorithms with billing codes for syncope as well as responses to patient survey questions for FH-SCD and FH-HCM. These analyses demonstrated NLP had superior sensitivity (0.96 vs 0.39, p < 0.001) and comparable specificity (0.90 vs 0.92, p = 0.74) and PPV (0.90 vs 0.83, p = 0.37) compared to billing codes for syncope. For FH-SCD, NLP outperformed survey responses for all parameters (sensitivity: 0.91 vs 0.59, p = 0.002; specificity: 0.98 vs 0.50, p < 0.001; PPV: 0.97 vs 0.38, p < 0.001). NLP also achieved superior sensitivity (0.95 vs 0.24, p < 0.001) with comparable specificity (0.95 vs 1.0, p-value not calculable) and positive predictive value (PPV) (0.92 vs 1.0, p = 0.09) compared to survey responses for FH-HCM. 
CONCLUSIONS Automated extraction of syncope, FH-SCD and FH-HCM using NLP is feasible and has promise to increase efficiency of workflow for providers managing HCM patients.
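The sensitivity, specificity, and PPV compared in this abstract all derive from a 2x2 table of algorithm output against the manually abstracted reference standard. The sketch below shows that calculation in Python; the function name `diagnostic_metrics` is hypothetical, and the counts in the example are made up for illustration and are not taken from the study.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute sensitivity, specificity, and PPV from 2x2 counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives relative to a reference standard.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly rejected
        "ppv": tp / (tp + fp),          # share of positive calls that are correct
    }


# Illustrative counts only (NOT from the study).
example = diagnostic_metrics(tp=40, fp=10, fn=5, tn=45)
```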
Affiliation(s)
- Sungrim Moon
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Sijia Liu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Christopher G Scott
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Sujith Samudrala
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Mohamed M Abidian
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Jeffrey B Geske
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Jane L Shellum
- Robert and Patricia Kern Center for Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA
- Rajeev Chaudhry
- Robert and Patricia Kern Center for Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA; Division of Community Internal Medicine, Mayo Clinic, Rochester, MN, USA
- Steve R Ommen
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Rick A Nishimura
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Hongfang Liu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Adelaide M Arruda-Olson
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
|
23
|
Wang X, Zhang Y, Hao S, Zheng L, Liao J, Ye C, Xia M, Wang O, Liu M, Weng CH, Duong SQ, Jin B, Alfreds ST, Stearns F, Kanov L, Sylvester KG, Widen E, McElhinney DB, Ling XB. Prediction of the 1-Year Risk of Incident Lung Cancer: Prospective Study Using Electronic Health Records from the State of Maine. J Med Internet Res 2019; 21:e13260. [PMID: 31099339 PMCID: PMC6542253 DOI: 10.2196/13260] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 12/31/2018] [Revised: 04/18/2019] [Accepted: 04/23/2019] [Indexed: 02/05/2023] Open
Abstract
BACKGROUND Lung cancer is the leading cause of cancer death worldwide. Early detection of individuals at risk of lung cancer is critical to reduce the mortality rate. OBJECTIVE The aim of this study was to develop and validate a prospective risk prediction model to identify patients at risk of new incident lung cancer within the next 1 year in the general population. METHODS Data from individual patient electronic health records (EHRs) were extracted from the Maine Health Information Exchange network. The study population consisted of patients with at least one EHR between April 1, 2016, and March 31, 2018, who had no history of lung cancer. A retrospective cohort (N=873,598) and a prospective cohort (N=836,659) were formed for model construction and validation. An Extreme Gradient Boosting (XGBoost) algorithm was adopted to build the model. It assigned a score to each individual to quantify the probability of a new incident lung cancer diagnosis from October 1, 2016, to September 31, 2017. The model was trained with the clinical profile in the retrospective cohort from the preceding 6 months and validated with the prospective cohort to predict the risk of incident lung cancer from April 1, 2017, to March 31, 2018. RESULTS The model had an area under the curve (AUC) of 0.881 (95% CI 0.873-0.889) in the prospective cohort. Two thresholds of 0.0045 and 0.01 were applied to the predictive scores to stratify the population into low-, medium-, and high-risk categories. The incidence of lung cancer in the high-risk category (579/53,922, 1.07%) was 7.7 times higher than that in the overall cohort (1167/836,659, 0.14%). Age, a history of pulmonary diseases and other chronic diseases, medications for mental disorders, and social disparities were found to be associated with new incident lung cancer. CONCLUSIONS We retrospectively developed and prospectively validated an accurate risk prediction model of new incident lung cancer occurring in the next 1 year. 
Through statistical learning from the statewide EHR data in the preceding 6 months, our model was able to identify statewide high-risk patients, which will benefit the population health through establishment of preventive interventions or more intensive surveillance.
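The stratification step and the reported incidence ratio can be reproduced with a few lines of arithmetic. The Python sketch below is illustrative only: the function name `risk_category` is hypothetical, and whether a score exactly equal to a threshold falls in the lower or the higher category is an assumption, since the abstract does not say. The case counts and thresholds are those given in the abstract.

```python
def risk_category(score: float, low_cut: float = 0.0045, high_cut: float = 0.01) -> str:
    """Map a predictive score to low/medium/high risk using two thresholds.

    Assumption: scores equal to a threshold go to the higher category.
    """
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"


# Incidence figures reported in the abstract.
high_risk_incidence = 579 / 53_922    # about 1.07%
overall_incidence = 1167 / 836_659    # about 0.14%
ratio = high_risk_incidence / overall_incidence
```

Rounded to one decimal place, `ratio` comes out at 7.7, matching the "7.7 times higher" figure in the abstract.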
Affiliation(s)
- Xiaofang Wang
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China; Department of Surgery, Stanford University, Stanford, CA, United States
- Yan Zhang
- Department of Oncology, The First Hospital of Shijiazhuang, Shijiazhuang, China
- Shiying Hao
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Le Zheng
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Jiayu Liao
- Department of Bioengineering, University of California, Riverside, CA, United States; West China-California Multiomics Research Center, West China Hospital, Sichuan University, Chengdu, China
- Chengyin Ye
- Department of Health Management, Hangzhou Normal University, Hangzhou, China
- Minjie Xia
- Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Oliver Wang
- Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Modi Liu
- Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Ching Ho Weng
- Department of Surgery, Stanford University, Stanford, CA, United States
- Son Q Duong
- Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Bo Jin
- Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Frank Stearns
- Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Laura Kanov
- Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Karl G Sylvester
- Department of Surgery, Stanford University, Stanford, CA, United States
- Eric Widen
- Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Doff B McElhinney
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Xuefeng B Ling
- Department of Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
|
24
|
Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V. Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review. JMIR Med Inform 2019; 7:e12239. [PMID: 31066697 PMCID: PMC6528438 DOI: 10.2196/12239] [Citation(s) in RCA: 204] [Impact Index Per Article: 40.8] [Received: 09/17/2018] [Revised: 03/04/2019] [Accepted: 03/24/2019] [Indexed: 01/08/2023] Open
Abstract
Background Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions on the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Methods based on machine learning to process EHRs are resulting in improved understanding of patient clinical trajectories and chronic disease risk prediction, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical histories remains locked behind clinical narratives in free-form text. Consequently, unlocking the full potential of EHR data is contingent on the development of natural language processing (NLP) methods to automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset. Objective The goal of the research was to provide a comprehensive overview of the development and uptake of NLP methods applied to free-text clinical notes related to chronic diseases, including the investigation of challenges faced by NLP methodologies in understanding clinical narratives. Methods Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed and searches were conducted in 5 databases using “clinical notes,” “natural language processing,” and “chronic disease” and their variations as keywords to maximize coverage of the articles. Results Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers resulted in identification of 43 chronic diseases, which were then further classified into 10 disease categories using the International Classification of Diseases, 10th Revision. The majority of studies focused on diseases of the circulatory system (n=38) while endocrine and metabolic diseases were fewest (n=14). 
This was due to the structure of clinical records related to metabolic diseases, which typically contain much more structured data, compared with medical records for diseases of the circulatory system, which focus more on unstructured data and consequently have seen a stronger focus of NLP. The review has shown that there is a significant increase in the use of machine learning methods compared to rule-based approaches; however, deep learning methods remain emergent (n=3). Consequently, the majority of works focus on classification of disease phenotype with only a handful of papers addressing extraction of comorbidities from the free text or integration of clinical notes with structured data. There is a notable use of relatively simple methods, such as shallow classifiers (or combination with rule-based methods), due to the interpretability of predictions, which still represents a significant issue for more complex methods. Finally, scarcity of publicly available data may also have contributed to insufficient development of more advanced methods, such as extraction of word embeddings from clinical notes. Conclusions Efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora.
Affiliation(s)
- Seyedmostafa Sheikhalishahi
- eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy; Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
- Riccardo Miotto
- Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Joel T Dudley
- Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Alberto Lavelli
- NLP Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
- Fabio Rinaldi
- Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
- Venet Osmani
- eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
|
25
|
Guetterman TC, Chang T, DeJonckheere M, Basu T, Scruggs E, Vydiswaran VGV. Augmenting Qualitative Text Analysis with Natural Language Processing: Methodological Study. J Med Internet Res 2018; 20:e231. [PMID: 29959110 PMCID: PMC6045788 DOI: 10.2196/jmir.9702] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 12/20/2017] [Revised: 05/14/2018] [Accepted: 05/15/2018] [Indexed: 11/18/2022] Open
Abstract
Background Qualitative research methods are increasingly being used across disciplines because of their ability to help investigators understand the perspectives of participants in their own words. However, qualitative analysis is a laborious and resource-intensive process. To achieve depth, researchers are limited to smaller sample sizes when analyzing text data. One potential method to address this concern is natural language processing (NLP). Qualitative text analysis involves researchers reading data, assigning code labels, and iteratively developing findings; NLP has the potential to automate part of this process. Unfortunately, little methodological research has been done to compare automatic coding using NLP techniques and qualitative coding, which is critical to establish the viability of NLP as a useful, rigorous analysis procedure. Objective The purpose of this study was to compare the utility of a traditional qualitative text analysis, an NLP analysis, and an augmented approach that combines qualitative and NLP methods. Methods We conducted a 2-arm cross-over experiment to compare qualitative and NLP approaches to analyze data generated through 2 text (short message service) message survey questions, one about prescription drugs and the other about police interactions, sent to youth aged 14-24 years. We randomly assigned a question to each of the 2 experienced qualitative analysis teams for independent coding and analysis before receiving NLP results. A third team separately conducted NLP analysis of the same 2 questions. We examined the results of our analyses to compare (1) the similarity of findings derived, (2) the quality of inferences generated, and (3) the time spent in analysis. Results The qualitative-only analysis for the drug question (n=58) yielded 4 major findings, whereas the NLP analysis yielded 3 findings that missed contextual elements. The qualitative and NLP-augmented analysis was the most comprehensive. 
For the police question (n=68), the qualitative-only analysis yielded 4 primary findings and the NLP-only analysis yielded 4 slightly different findings. Again, the augmented qualitative and NLP analysis was the most comprehensive and produced the highest quality inferences, increasing our depth of understanding (ie, details and frequencies). In terms of time, the NLP-only approach was quicker than the qualitative-only approach for the drug (120 vs 270 minutes) and police (40 vs 270 minutes) questions. An approach beginning with qualitative analysis followed by qualitative- or NLP-augmented analysis took longer than one beginning with NLP for both the drug (450 vs 240 minutes) and police (390 vs 220 minutes) questions. Conclusions NLP provides both a foundation to code qualitatively more quickly and a method to validate qualitative findings. NLP methods were able to identify major themes found with traditional qualitative analysis but were not useful in identifying nuances. Traditional qualitative text analysis added important details and context.
Affiliation(s)
- Timothy C Guetterman
- Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States
- Tammy Chang
- Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States; Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, MI, United States
- Melissa DeJonckheere
- Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States
- Tanmay Basu
- Ramakrishna Mission Vivekananda Educational and Research Institute, Belur Math, West Bengal, India
- Elizabeth Scruggs
- Department of Internal Medicine-Pediatrics, University of Michigan, Ann Arbor, MI, United States
- V G Vinod Vydiswaran
- Department of Learning Health Sciences, Medical School, University of Michigan, Ann Arbor, MI, United States; School of Information, University of Michigan, Ann Arbor, MI, United States
|
26
|
Patel YR, Robbins JM, Kurgansky KE, Imran T, Orkaby AR, McLean RR, Ho YL, Cho K, Michael Gaziano J, Djousse L, Gagnon DR, Joseph J. Development and validation of a heart failure with preserved ejection fraction cohort using electronic medical records. BMC Cardiovasc Disord 2018; 18:128. [PMID: 29954337 PMCID: PMC6022342 DOI: 10.1186/s12872-018-0866-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 12/15/2017] [Accepted: 06/20/2018] [Indexed: 01/14/2023] Open
Abstract
Background Heart failure (HF) with preserved ejection fraction (HFpEF) comprises nearly half of prevalent HF, yet is challenging to curate in a large database of electronic medical records (EMR) since it requires both accurate HF diagnosis and left ventricular ejection fraction (EF) values to be consistently ≥50%. Methods We used the national Veterans Affairs EMR to curate a cohort of HFpEF patients from 2002 to 2014. EF values were extracted from clinical documents utilizing natural language processing and an iterative approach was used to refine the algorithm for verification of clinical HFpEF. The final algorithm utilized the following inclusion criteria: any International Classification of Diseases-9 (ICD-9) code of HF (428.xx); all recorded EF ≥50%; and either B-type natriuretic peptide (BNP) or aminoterminal pro-BNP (NT-proBNP) values recorded OR diuretic use within one month of diagnosis of HF. Validation of the algorithm was performed by 3 independent reviewers doing manual chart review of 100 HFpEF cases and 100 controls. Results We established a HFpEF cohort of 80,248 patients (out of a total 1,155,376 patients with the ICD-9 diagnosis of HF). Mean age was 72 years; 96% were males and 12% were African-Americans. Validation analysis of the HFpEF algorithm had a sensitivity of 88%, specificity of 96%, positive predictive value of 96%, and a negative predictive value of 87% to identify HFpEF cases. Conclusion We developed a sensitive, highly specific algorithm for detecting HFpEF in a large national database. This approach may be applicable to other large EMR databases to identify HFpEF patients.
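The final inclusion criteria listed in the Methods can be read as a simple conjunction of three conditions. The Python sketch below is one possible reading, not the authors' algorithm: the `PatientRecord` structure, its field names, and the dotted ICD-9 code format are all hypothetical, and the two supporting-evidence flags are assumed to have been computed upstream (for example, the one-month window around the HF diagnosis).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's EMR data."""
    icd9_codes: List[str]                 # e.g. ["428.0", "401.9"]
    ef_values: List[float]                # all recorded EF values, in percent
    has_bnp_or_ntprobnp: bool             # BNP or NT-proBNP value recorded
    diuretic_within_month_of_hf_dx: bool  # diuretic use near HF diagnosis


def meets_hfpef_criteria(p: PatientRecord) -> bool:
    """Apply the three inclusion criteria described in the abstract."""
    # 1. Any ICD-9 code for heart failure (428.xx).
    has_hf_code = any(code.startswith("428") for code in p.icd9_codes)
    # 2. Every recorded EF is >= 50% (at least one EF must exist).
    all_ef_preserved = bool(p.ef_values) and all(ef >= 50 for ef in p.ef_values)
    # 3. Natriuretic peptide recorded OR diuretic use.
    supporting = p.has_bnp_or_ntprobnp or p.diuretic_within_month_of_hf_dx
    return has_hf_code and all_ef_preserved and supporting
```

Requiring every recorded EF to be at least 50%, rather than only the most recent one, mirrors the "consistently ≥50%" wording in the Background.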
Affiliation(s)
- Yash R Patel
- Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA; Mount Sinai St Luke's & Mount Sinai West Hospitals, New York, NY, USA
- Jeremy M Robbins
- Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA; Division of Cardiology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Katherine E Kurgansky
- Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA
- Tasnim Imran
- Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA; Boston Medical Center, Boston University School of Medicine, Boston, MA, USA
- Ariela R Orkaby
- Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA; Geriatric Research, Education and Clinical Center (GRECC), Veterans Affairs Boston Healthcare System, Boston, MA, USA
- Robert R McLean
- Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA; Institute for Aging Research, Hebrew SeniorLife, Boston, MA, USA; Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Yuk-Lam Ho
- Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA
- Kelly Cho
- Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA
- J Michael Gaziano
- Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA; Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Luc Djousse
- Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA; Department of Biostatistics, Boston University School of Public Health, Boston, USA
- David R Gagnon
- Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA; Department of Biostatistics, Boston University School of Public Health, Boston, USA
- Jacob Joseph
- Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA; Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Cardiology Section, VA Boston Healthcare System, 1400 VFW Parkway, West Roxbury, MA, 02132, USA
|
27
|
Guo Y, Zheng G, Fu T, Hao S, Ye C, Zheng L, Liu M, Xia M, Jin B, Zhu C, Wang O, Wu Q, Culver DS, Alfreds ST, Stearns F, Kanov L, Bhatia A, Sylvester KG, Widen E, McElhinney DB, Ling XB. Assessing Statewide All-Cause Future One-Year Mortality: Prospective Study With Implications for Quality of Life, Resource Utilization, and Medical Futility. J Med Internet Res 2018; 20:e10311. [PMID: 29866643 PMCID: PMC6066632 DOI: 10.2196/10311] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/08/2018] [Revised: 04/24/2018] [Accepted: 04/26/2018] [Indexed: 01/19/2023] Open
Abstract
Background For many elderly patients, a disproportionate amount of health care resources and expenditures is spent during the last year of life, despite the discomfort and reduced quality of life associated with many aggressive medical approaches. However, few prognostic tools have focused on predicting all-cause 1-year mortality among elderly patients at a statewide level, an issue that has implications for improving quality of life while distributing scarce resources fairly. Objective Using data from a statewide elderly population (aged ≥65 years), we sought to prospectively validate an algorithm to identify patients at risk for dying in the next year for the purpose of minimizing decision uncertainty, improving quality of life, and reducing futile treatment. Methods Analysis was performed using electronic medical records from the Health Information Exchange in the state of Maine, which covered records of nearly 95% of the statewide population. The model was developed from 125,896 patients aged at least 65 years who were discharged from any care facility in the Health Information Exchange network from September 5, 2013, to September 4, 2015. Validation was conducted using 153,199 patients with same inclusion and exclusion criteria from September 5, 2014, to September 4, 2016. Patients were stratified into risk groups. The association between all-cause 1-year mortality and risk factors was screened by chi-squared test and manually reviewed by 2 clinicians. We calculated risk scores for individual patients using a gradient tree-based boost algorithm, which measured the probability of mortality within the next year based on the preceding 1-year clinical profile. Results The development sample included 125,896 patients (72,572 women, 57.64%; mean 74.2 [SD 7.7] years). The final validation cohort included 153,199 patients (88,177 women, 57.56%; mean 74.3 [SD 7.8] years). 
The c-statistic for discrimination was 0.96 (95% CI 0.93-0.98) in the development group and 0.91 (95% CI 0.90-0.94) in the validation cohort. The mortality was 0.99% in the low-risk group, 16.75% in the intermediate-risk group, and 72.12% in the high-risk group. A total of 99 independent risk factors for mortality were identified (reported as odds ratios; 95% CI). Age was at the top of the list (1.41; 1.06-1.48); congestive heart failure (20.90; 15.41-28.08) and different tumor sites were also recognized as driving risk factors, such as cancer of the ovaries (14.42; 2.24-53.04), colon (14.07; 10.08-19.08), and stomach (13.64; 3.26-86.57). Disparities were also found in patients’ social determinants like respiratory hazard index (1.24; 0.92-1.40) and unemployment rate (1.18; 0.98-1.24). Among high-risk patients who expired in our dataset, cerebrovascular accident, amputation, and type 1 diabetes were the top 3 diseases in terms of average cost in the last year of life. Conclusions Our study prospectively validated an accurate 1-year risk prediction model and stratification for the elderly population (≥65 years) at risk of mortality with statewide electronic medical record datasets. It should be a valuable adjunct for helping patients to make better quality-of-life choices and alerting caregivers to target high-risk elderly for appropriate care and discussions, thus cutting back on futile treatment.
Affiliation(s)
- Yanting Guo
- School of Management, Zhejiang University, Hangzhou, China; Department of Surgery, Stanford University, Stanford, CA, United States
- Gang Zheng
- School of Management, Zhejiang University, Hangzhou, China
- Tianyun Fu
- HBI Solutions Inc, Palo Alto, CA, United States
- Shiying Hao
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Chengyin Ye
- Department of Surgery, Stanford University, Stanford, CA, United States; Department of Health Management, Hangzhou Normal University, Hangzhou, China
- Le Zheng
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Modi Liu
- HBI Solutions Inc, Palo Alto, CA, United States
- Minjie Xia
- HBI Solutions Inc, Palo Alto, CA, United States
- Bo Jin
- HBI Solutions Inc, Palo Alto, CA, United States
- Oliver Wang
- HBI Solutions Inc, Palo Alto, CA, United States
- Qian Wu
- Department of Surgery, Stanford University, Stanford, CA, United States; China Electric Power Research Institute, Beijing, China
- Laura Kanov
- HBI Solutions Inc, Palo Alto, CA, United States
- Ajay Bhatia
- Department of Pediatrics, Stanford University, Stanford, CA, United States
- Karl G Sylvester
- Department of Surgery, Stanford University, Stanford, CA, United States
- Eric Widen
- HBI Solutions Inc, Palo Alto, CA, United States
- Doff B McElhinney
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Xuefeng Bruce Ling
- Department of Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States; Department of Epidemiology and Health Statistics, School of Public Health, School of Medicine, Zhejiang University, Hangzhou, China
|
28
|
Shim H, Ailshire J, Zelinski E, Crimmins E. The Health and Retirement Study: Analysis of Associations Between Use of the Internet for Health Information and Use of Health Services at Multiple Time Points. J Med Internet Res 2018; 20:e200. [PMID: 29802088 PMCID: PMC5993973 DOI: 10.2196/jmir.8203] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 07/03/2017] [Revised: 12/14/2017] [Accepted: 04/11/2018] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND The use of the internet for health information among older people is receiving increasing attention, but how it is associated with chronic health conditions and health service use at concurrent and subsequent time points using nationally representative data is less known. OBJECTIVE This study aimed to determine whether the use of the internet for health information is associated with health service utilization and whether the association is affected by specific health conditions. METHODS The study used data collected in a technology module from a nationally representative sample of community-dwelling older Americans aged 52 years and above from the 2012 Health and Retirement Study (HRS; N=991). Negative binomial regressions were used to examine the association between use of Web-based health information and the reported health service uses in 2012 and 2014. Analyses included additional covariates adjusting for predisposing, enabling, and need factors. Interactions between the use of the internet for health information and chronic health conditions were also tested. RESULTS A total of 48.0% (476/991) of Americans aged 52 years and above reported using Web-based health information. The use of Web-based health information was positively associated with the concurrent reports of doctor visits, but not over 2 years. However, an interaction of using Web-based health information with diabetes showed that users had significantly fewer doctor visits compared with nonusers with diabetes at both times. CONCLUSIONS The use of the internet for health information was associated with higher health service use at the concurrent time, but not at the subsequent time. The interaction between the use of the internet for health information and diabetes was significant at both time points, which suggests that health-related internet use may be associated with fewer doctor visits for certain chronic health conditions. 
Results provide some insight into how Web-based health information may provide an alternative health care resource for managing chronic conditions.
Affiliation(s)
- Hyunju Shim
- USC Davis School of Gerontology, University of Southern California, Los Angeles, CA, United States
- Jennifer Ailshire
- USC Davis School of Gerontology, University of Southern California, Los Angeles, CA, United States
- Elizabeth Zelinski
- USC Davis School of Gerontology, University of Southern California, Los Angeles, CA, United States
- Eileen Crimmins
- USC Davis School of Gerontology, University of Southern California, Los Angeles, CA, United States
|
29
|
Chen X, Xie H, Wang FL, Liu Z, Xu J, Hao T. A bibliometric analysis of natural language processing in medical research. BMC Med Inform Decis Mak 2018; 18:14. [PMID: 29589569 PMCID: PMC5872501 DOI: 10.1186/s12911-018-0594-x] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Indexed: 12/29/2022]
Abstract
Background Natural language processing (NLP) has become an increasingly significant role in advancing medicine. Rich research achievements of NLP methods and applications for medical information processing are available. It is of great significance to conduct a deep analysis to understand the recent development of NLP-empowered medical research field. However, limited study examining the research status of this field could be found. Therefore, this study aims to quantitatively assess the academic output of NLP in medical research field. Methods We conducted a bibliometric analysis on NLP-empowered medical research publications retrieved from PubMed in the period 2007–2016. The analysis focused on three aspects. Firstly, the literature distribution characteristics were obtained with a statistics analysis method. Secondly, a network analysis method was used to reveal scientific collaboration relations. Finally, thematic discovery and evolution was reflected using an affinity propagation clustering method. Results There were 1405 NLP-empowered medical research publications published during the 10 years with an average annual growth rate of 18.39%. 10 most productive publication sources together contributed more than 50% of the total publications. The USA had the highest number of publications. A moderately significant correlation between country’s publications and GDP per capita was revealed. Denny, Joshua C was the most productive author. Mayo Clinic was the most productive affiliation. The annual co-affiliation and co-country rates reached 64.04% and 15.79% in 2016, respectively. 10 main great thematic areas were identified including Computational biology, Terminology mining, Information extraction, Text classification, Social medium as data source, Information retrieval, etc. Conclusions A bibliometric analysis of NLP-empowered medical research publications for uncovering the recent research status is presented. 
The results can assist relevant researchers, especially newcomers in understanding the research development systematically, seeking scientific cooperation partners, optimizing research topic choices and monitoring new scientific or technological activities.
Affiliation(s)
- Xieling Chen
- College of Economics, Jinan University, Guangzhou, China
- Haoran Xie
- Department of Mathematics and Information Technology, The Education University of Hong Kong, Hong Kong, Hong Kong, Special Administrative Region of China
- Fu Lee Wang
- School of Science and Technology, The Open University of Hong Kong, Hong Kong, Hong Kong, Special Administrative Region of China
- Ziqing Liu
- The Second Clinical Medical College, Guangzhou University of Chinese Medicine, Guangzhou, China
- Juan Xu
- The Research Institute of National Supervision and Audit Law, Nanjing Audit University, Nanjing, China
- Tianyong Hao
- School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China; School of Computer, South China Normal University, Guangzhou, China
|
30
|
Afzal N, Mallipeddi VP, Sohn S, Liu H, Chaudhry R, Scott CG, Kullo IJ, Arruda-Olson AM. Natural language processing of clinical notes for identification of critical limb ischemia. Int J Med Inform 2018; 111:83-89. [PMID: 29425639 PMCID: PMC5808583 DOI: 10.1016/j.ijmedinf.2017.12.024] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 09/05/2017] [Revised: 12/17/2017] [Accepted: 12/27/2017] [Indexed: 12/27/2022]
Abstract
BACKGROUND Critical limb ischemia (CLI) is a complication of advanced peripheral artery disease (PAD) with diagnosis based on the presence of clinical signs and symptoms. However, automated identification of cases from electronic health records (EHRs) is challenging due to absence of a single definitive International Classification of Diseases (ICD-9 or ICD-10) code for CLI. METHODS AND RESULTS In this study, we extend a previously validated natural language processing (NLP) algorithm for PAD identification to develop and validate a subphenotyping NLP algorithm (CLI-NLP) for identification of CLI cases from clinical notes. We compared performance of the CLI-NLP algorithm with CLI-related ICD-9 billing codes. The gold standard for validation was human abstraction of clinical notes from EHRs. Compared to billing codes the CLI-NLP algorithm had higher positive predictive value (PPV) (CLI-NLP 96%, billing codes 67%, p < 0.001), specificity (CLI-NLP 98%, billing codes 74%, p < 0.001) and F1-score (CLI-NLP 90%, billing codes 76%, p < 0.001). The sensitivity of these two methods was similar (CLI-NLP 84%; billing codes 88%; p < 0.12). CONCLUSIONS The CLI-NLP algorithm for identification of CLI from narrative clinical notes in an EHR had excellent PPV and has potential for translation to patient care as it will enable automated identification of CLI cases for quality projects, clinical decision support tools and support a learning healthcare system.
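The F1-scores reported in this abstract can be cross-checked against the reported PPV and sensitivity, because F1 is the harmonic mean of the two. The Python sketch below is illustrative only and is not part of the study; `f1_score` is a hypothetical helper name, and the inputs are the rounded percentages quoted in the abstract.

```python
def f1_score(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of positive predictive value (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Rounded values as quoted in the abstract (assumed exact for this check).
cli_nlp_f1 = f1_score(ppv=0.96, sensitivity=0.84)
billing_f1 = f1_score(ppv=0.67, sensitivity=0.88)

print(f"CLI-NLP F1: {cli_nlp_f1:.2f}")
print(f"Billing codes F1: {billing_f1:.2f}")
```

To two decimal places both values agree with the reported F1-scores of 90% and 76%.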
Affiliation(s)
- Naveed Afzal
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, MN, United States
- Vishnu Priya Mallipeddi
- Department of Cardiovascular Diseases, Mayo Clinic and Mayo Foundation, Rochester, MN, United States
- Sunghwan Sohn
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, MN, United States
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, MN, United States
- Rajeev Chaudhry
- Division of Primary Care Medicine, Knowledge Delivery Center and Center for Innovation, Mayo Clinic and Mayo Foundation, Rochester, MN, United States
- Christopher G Scott
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, MN, United States
- Iftikhar J Kullo
- Department of Cardiovascular Diseases, Mayo Clinic and Mayo Foundation, Rochester, MN, United States
- Adelaide M Arruda-Olson
- Department of Cardiovascular Diseases, Mayo Clinic and Mayo Foundation, Rochester, MN, United States
|
31
|
Duncan I, Fitzner K, Handmaker KE. Augmented Intelligence: Enhancing the Roles of Health Actuaries and Health Economists for Population Health Management. Popul Health Manag 2017; 21:341-343. [PMID: 29064330 DOI: 10.1089/pop.2017.0146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Affiliation(s)
- Ian Duncan
- Department of Statistics and Applied Probability, University of California Santa Barbara, Santa Barbara, California
|
32
|
Wi CI, Sohn S, Rolfes MC, Seabright A, Ryu E, Voge G, Bachman KA, Park MA, Kita H, Croghan IT, Liu H, Juhn YJ. Application of a Natural Language Processing Algorithm to Asthma Ascertainment. An Automated Chart Review. Am J Respir Crit Care Med 2017; 196:430-437. [PMID: 28375665 DOI: 10.1164/rccm.201610-2006oc] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Indexed: 12/21/2022]
Abstract
RATIONALE Difficulty of asthma ascertainment and its associated methodologic heterogeneity have created significant barriers to asthma care and research. OBJECTIVES We evaluated the validity of an existing natural language processing (NLP) algorithm for asthma criteria to enable an automated chart review using electronic medical records (EMRs). METHODS The study was designed as a retrospective birth cohort study using a random sample of 500 subjects from the 1997-2007 Mayo Birth Cohort who were born at Mayo Clinic and enrolled in primary pediatric care at Mayo Clinic Rochester. Performance of NLP-based asthma ascertainment using predetermined asthma criteria was assessed by determining both criterion validity (chart review of EMRs by abstractor as a gold standard) and construct validity (association with known risk factors for asthma, such as allergic rhinitis). MEASUREMENTS AND MAIN RESULTS After excluding three subjects whose respiratory symptoms could be attributed to other conditions (e.g., tracheomalacia), among the remaining eligible 497 subjects, 51% were male, 77% white persons, and the median age at last follow-up date was 11.5 years. The asthma prevalence was 31% in the study cohort. Sensitivity, specificity, positive predictive value, and negative predictive value for NLP algorithm in predicting asthma status were 97%, 95%, 90%, and 98%, respectively. The risk factors for asthma (e.g., allergic rhinitis) that were identified either by NLP or the abstractor were the same. CONCLUSIONS Asthma ascertainment through NLP should be considered in the era of EMRs because it can enable large-scale clinical studies in a more time-efficient manner and improve the recognition and care of childhood asthma in practice.
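The predictive values in this abstract depend on the 31% prevalence as well as on sensitivity and specificity. As a rough illustration (not the authors' calculation), the Python sketch below applies Bayes' rule to the rounded figures from the abstract; `predictive_values` is a hypothetical helper name.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded inputs as quoted in the abstract.
ppv, npv = predictive_values(sensitivity=0.97, specificity=0.95, prevalence=0.31)
print(f"PPV: {ppv:.3f}, NPV: {npv:.3f}")
```

The PPV comes out close to the reported 90%, and the NPV falls between 98% and 99%; small differences from the published figures are expected because the inputs here are rounded percentages rather than the underlying counts.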
Affiliation(s)
- Chung-Il Wi
- Department of Pediatric and Adolescent Medicine; Asthma Epidemiology Research Unit
- Sunghwan Sohn
- Division of Biomedical Statistics and Informatics
- Mary C Rolfes
- Asthma Epidemiology Research Unit; Mayo Medical School, Rochester, Minnesota
- Euijung Ryu
- Division of Biomedical Statistics and Informatics
- Gretchen Voge
- Department of Pediatric and Adolescent Medicine; Asthma Epidemiology Research Unit; Division of Neonatology, Children's Hospitals and Clinics of Minnesota, Minneapolis, Minnesota
- Kay A Bachman
- Division of Allergic Diseases, Mayo Clinic, Rochester, Minnesota
- Miguel A Park
- Division of Allergic Diseases, Mayo Clinic, Rochester, Minnesota
- Hirohito Kita
- Division of Allergic Diseases, Mayo Clinic, Rochester, Minnesota
- Ivana T Croghan
- Department of Medicine Research, Mayo Clinic, Rochester, Minnesota
- Hongfang Liu
- Division of Biomedical Statistics and Informatics
- Young J Juhn
- Department of Pediatric and Adolescent Medicine; Asthma Epidemiology Research Unit
|
33
|
Névéol A, Zweigenbaum P. Making Sense of Big Textual Data for Health Care: Findings from the Section on Clinical Natural Language Processing. Yearb Med Inform 2017; 26:228-234. [PMID: 29063569 PMCID: PMC6239234 DOI: 10.15265/iy-2017-027] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 08/18/2017] [Indexed: 02/01/2023]
Abstract
Objectives: To summarize recent research and present a selection of the best papers published in 2016 in the field of clinical Natural Language Processing (NLP). Method: A survey of the literature was performed by the two section editors of the IMIA Yearbook NLP section. Bibliographic databases were searched for papers with a focus on NLP efforts applied to clinical texts or aimed at a clinical outcome. Papers were automatically ranked and then manually reviewed based on titles and abstracts. A shortlist of candidate best papers was first selected by the section editors before being peer-reviewed by independent external reviewers. Results: The five clinical NLP best papers provide a contribution that ranges from emerging original foundational methods to transitioning solid established research results to a practical clinical setting. They offer a framework for abbreviation disambiguation and coreference resolution, a classification method to identify clinically useful sentences, an analysis of counseling conversations to improve support to patients with mental disorder and grounding of gradable adjectives. Conclusions: Clinical NLP continued to thrive in 2016, with an increasing number of contributions towards applications compared to fundamental methods. Fundamental work addresses increasingly complex problems such as lexical semantics, coreference resolution, and discourse analysis. Research results translate into freely available tools, mainly for English.
Affiliation(s)
- A. Névéol
- LIMSI, CNRS, Université Paris Saclay, Orsay, France
|
34
|
Hao S, Fu T, Wu Q, Jin B, Zhu C, Hu Z, Guo Y, Zhang Y, Yu Y, Fouts T, Ng P, Culver DS, Alfreds ST, Stearns F, Sylvester KG, Widen E, McElhinney DB, Ling XB. Estimating One-Year Risk of Incident Chronic Kidney Disease: Retrospective Development and Validation Study Using Electronic Medical Record Data From the State of Maine. JMIR Med Inform 2017; 5:e21. [PMID: 28747298 PMCID: PMC5550735 DOI: 10.2196/medinform.7954] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 05/01/2017] [Revised: 06/29/2017] [Accepted: 07/10/2017] [Indexed: 01/28/2023]
Abstract
Background Chronic kidney disease (CKD) is a major public health concern in the United States with high prevalence, growing incidence, and serious adverse outcomes. Objective We aimed to develop and validate a model to identify patients at risk of receiving a new diagnosis of CKD (incident CKD) during the next 1 year in a general population. Methods The study population consisted of patients who had visited any care facility in the Maine Health Information Exchange network any time between January 1, 2013, and December 31, 2015, and had no history of CKD diagnosis. Two retrospective cohorts of electronic medical records (EMRs) were constructed for model derivation (N=1,310,363) and validation (N=1,430,772). The model was derived using a gradient tree-based boost algorithm to assign a score to each individual that measured the probability of receiving a new diagnosis of CKD from January 1, 2014, to December 31, 2014, based on the preceding 1-year clinical profile. A feature selection process was conducted to reduce the dimension of the data from 14,680 EMR features to 146 as predictors in the final model. Relative risk was calculated by the model to gauge the risk ratio of the individual to population mean of receiving a CKD diagnosis in next 1 year. The model was tested on the validation cohort to predict risk of CKD diagnosis in the period from January 1, 2015, to December 31, 2015, using the preceding 1-year clinical profile. Results The final model had a c-statistic of 0.871 in the validation cohort. It stratified patients into low-risk (score 0-0.005), intermediate-risk (score 0.005-0.05), and high-risk (score ≥ 0.05) levels. The incidence of CKD in the high-risk patient group was 7.94%, 13.7 times higher than the incidence in the overall cohort (0.58%). Survival analysis showed that patients in the 3 risk categories had significantly different CKD outcomes as a function of time (P<.001), indicating an effective classification of patients by the model. 
Conclusions We developed and validated a model that is able to identify patients at high risk of having CKD in the next 1 year by statistically learning from the EMR-based clinical history in the preceding 1 year. Identification of these patients indicates care opportunities such as monitoring and adopting intervention plans that may benefit the quality of care and outcomes in the long term.
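The risk bands and the incidence ratio quoted in this abstract can be restated as a short Python sketch. This is a minimal illustration, not the study's gradient-boosted model: `risk_level` is a hypothetical helper, and because the abstract's ranges share the boundary value 0.005, assigning a score of exactly 0.005 to the intermediate band is an assumption.

```python
def risk_level(score: float) -> str:
    """Map a model score to the risk bands quoted in the abstract.

    Assumption: a score of exactly 0.005 is treated as intermediate risk;
    the abstract states only that high risk is score >= 0.05.
    """
    if score >= 0.05:
        return "high"
    if score >= 0.005:
        return "intermediate"
    return "low"


# Incidence figures quoted in the abstract, in percent.
high_risk_incidence = 7.94
overall_incidence = 0.58
relative_incidence = high_risk_incidence / overall_incidence

print(risk_level(0.001), risk_level(0.02), risk_level(0.05))
print(f"High-risk vs overall incidence ratio: {relative_incidence:.1f}")
```

The ratio rounds to 13.7, matching the figure reported in the abstract.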
Affiliation(s)
- Shiying Hao
- Department of Epidemiology and Health Statistics, School of Public Health, School of Medicine, Zhejiang University, Hangzhou, China; Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Tianyun Fu
- HBI Solutions Inc, Palo Alto, CA, United States
- Qian Wu
- Department of Surgery, Stanford University, Stanford, CA, United States; China Electric Power Research Institute, Beijing, China
- Bo Jin
- HBI Solutions Inc, Palo Alto, CA, United States
- Zhongkai Hu
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Yanting Guo
- Department of Surgery, Stanford University, Stanford, CA, United States; School of Management, Zhejiang University, Hangzhou, China
- Yan Zhang
- Department of Surgery, Stanford University, Stanford, CA, United States; Department of Oncology, The First Hospital of Shijiazhuang, Shijiazhuang, China
- Yunxian Yu
- Department of Epidemiology and Health Statistics, School of Public Health, School of Medicine, Zhejiang University, Hangzhou, China
- Terry Fouts
- Empactful Capital, San Francisco, CA, United States
- Phillip Ng
- Sequoia Hospital, Redwood City, CA, United States
- Karl G Sylvester
- Department of Surgery, Stanford University, Stanford, CA, United States
- Eric Widen
- HBI Solutions Inc, Palo Alto, CA, United States
- Doff B McElhinney
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Xuefeng B Ling
- Department of Epidemiology and Health Statistics, School of Public Health, School of Medicine, Zhejiang University, Hangzhou, China; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States; Department of Surgery, Stanford University, Stanford, CA, United States
|
35
|
Defining and characterizing the critical transition state prior to the type 2 diabetes disease. PLoS One 2017; 12:e0180937. [PMID: 28686739 PMCID: PMC5501620 DOI: 10.1371/journal.pone.0180937] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 03/16/2017] [Accepted: 06/24/2017] [Indexed: 11/19/2022]
Abstract
Background Type 2 diabetes mellitus (T2DM), with increased risk of serious long-term complications, currently represents 8.3% of the adult population. We hypothesized that a critical transition state prior to the new onset T2DM can be revealed through the longitudinal electronic medical record (EMR) analysis. Method We applied the transition-based network entropy methodology which previously identified a dynamic driver network (DDN) underlying the critical T2DM transition at the tissue molecular biological level. To profile pre-disease phenotypical changes that indicated a critical transition state, a cohort of 7,334 patients was assembled from the Maine State Health Information Exchange (HIE). These patients all had their first confirmative diagnosis of T2DM between January 1, 2013 and June 30, 2013. The cohort’s EMRs from the 24 months preceding their date of first T2DM diagnosis were extracted. Results Analysis of these patients’ pre-disease clinical history identified a dynamic driver network (DDN) and an associated critical transition state six months prior to their first confirmative T2DM state. Conclusions This 6-month window before the disease state provides an early warning of the impending T2DM, warranting an opportunity to apply proactive interventions to prevent or delay the new onset of T2DM.
|