Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zheng L, Wang Y, Hao S, Shin AY, Jin B, Ngo AD, Jackson-Browne MS, Feller DJ, Fu T, Zhang K, Zhou X, Zhu C, Dai D, Yu Y, Zheng G, Li YM, McElhinney DB, Culver DS, Alfreds ST, Stearns F, Sylvester KG, Widen E, Ling XB. Web-based Real-Time Case Finding for the Population Health Management of Patients With Diabetes Mellitus: A Prospective Validation of the Natural Language Processing-Based Algorithm With Statewide Electronic Medical Records. JMIR Med Inform 2016;4:e37. [PMID: 27836816 PMCID: PMC5124114 DOI: 10.2196/medinform.6328] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 07/07/2016] [Revised: 10/01/2016] [Accepted: 10/12/2016] [Indexed: 02/06/2023] Open



Cited by Other Article(s)

Guo Q, Fu B, Tian Y, Xu S, Meng X. Recent progress in artificial intelligence and machine learning for novel diabetes mellitus medications development. Curr Med Res Opin 2024;40:1483-1493. [PMID: 39083361 DOI: 10.1080/03007995.2024.2387187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Accepted: 07/29/2024] [Indexed: 08/02/2024]

Abstract

Diabetes mellitus, stemming from either insulin resistance or inadequate insulin secretion, represents a complex ailment that results in prolonged hyperglycemia and severe complications. Patients endure severe ramifications such as kidney disease, vision impairment, cardiovascular disorders, and susceptibility to infections, leading to significant physical suffering and imposing substantial socio-economic burdens. This condition has evolved into an increasingly severe health crisis. There is an urgent need to develop new treatments with improved efficacy and fewer adverse effects to meet clinical demands. However, novel drug development is costly, time-consuming, and often associated with side effects and suboptimal efficacy, making it a major challenge. Artificial Intelligence (AI) and Machine Learning (ML) have revolutionized drug development across its comprehensive lifecycle, spanning drug discovery, preclinical studies, clinical trials, and post-market surveillance. These technologies have significantly accelerated the identification of promising therapeutic candidates, optimized trial designs, and enhanced post-approval safety monitoring. Recent advances in AI, including data augmentation, interpretable AI, and integration of AI with traditional experimental methods, offer promising strategies for overcoming the challenges inherent in AI-based drug discovery. Despite these advancements, there exists a notable gap in comprehensive reviews detailing AI and ML applications throughout the entirety of developing medications for diabetes mellitus. This review aims to fill this gap by evaluating the impact and potential of AI and ML technologies at various stages of diabetes mellitus drug development. It does that by synthesizing current research findings and technological advances so as to effectively control diabetes mellitus and mitigate its far-reaching social and economic impacts. The integration of AI and ML promises to revolutionize diabetes mellitus treatment strategies, offering hope for improved patient outcomes and reduced healthcare burdens worldwide.


Petit-Jean T, Gérardin C, Berthelot E, Chatellier G, Frank M, Tannier X, Kempf E, Bey R. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions. J Am Med Inform Assoc 2024;31:1280-1290. [PMID: 38573195 PMCID: PMC11105139 DOI: 10.1093/jamia/ocae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 02/28/2024] [Accepted: 03/13/2024] [Indexed: 04/05/2024] Open

Towler L, Bondaronek P, Papakonstantinou T, Amlôt R, Chadborn T, Ainsworth B, Yardley L. Applying machine-learning to rapidly analyze large qualitative text datasets to inform the COVID-19 pandemic response: comparing human and machine-assisted topic analysis techniques. Front Public Health 2023;11:1268223. [PMID: 38026376 PMCID: PMC10644111 DOI: 10.3389/fpubh.2023.1268223] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/27/2023] [Accepted: 10/16/2023] [Indexed: 12/01/2023] Open

Abstract

Introduction

Machine-assisted topic analysis (MATA) uses artificial intelligence methods to help qualitative researchers analyze large datasets. This is useful for researchers to rapidly update healthcare interventions during changing healthcare contexts, such as a pandemic. We examined the potential to support healthcare interventions by comparing MATA with "human-only" thematic analysis techniques on the same dataset (1,472 user responses from a COVID-19 behavioral intervention).

Methods

In MATA, an unsupervised topic-modeling approach identified latent topics in the text, from which researchers identified broad themes. In human-only codebook analysis, researchers developed an initial codebook based on previous research that was applied to the dataset by the team, who met regularly to discuss and refine the codes. Formal triangulation using a "convergence coding matrix" compared findings between methods, categorizing them as "agreement", "complementary", "dissonant", or "silent".

Results

Human analysis took much longer than MATA (147.5 vs. 40 h). Both methods identified key themes about what users found helpful and unhelpful. Formal triangulation showed that both sets of findings were highly similar. All MATA codes were classified as in agreement with or complementary to the human themes. When findings differed slightly, this was due to human researcher interpretations or nuance from human-only analysis.

Discussion

Results produced by MATA were similar to human-only thematic analysis, with substantial time savings. For simple analyses that do not require an in-depth or subtle understanding of the data, MATA is a useful tool that can support qualitative researchers to interpret and analyze large datasets quickly. This approach can support intervention development and implementation, such as enabling rapid optimization during public health emergencies.


Borna S, Maniaci MJ, Haider CR, Maita KC, Torres-Guzman RA, Avila FR, Lunde JJ, Coffey JD, Demaerschalk BM, Forte AJ. Artificial Intelligence Models in Health Information Exchange: A Systematic Review of Clinical Implications. Healthcare (Basel) 2023;11:2584. [PMID: 37761781 PMCID: PMC10531020 DOI: 10.3390/healthcare11182584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 09/14/2023] [Accepted: 09/16/2023] [Indexed: 09/29/2023] Open

Pethani F, Dunn AG. Natural language processing for clinical notes in dentistry: A systematic review. J Biomed Inform 2023;138:104282. [PMID: 36623780 DOI: 10.1016/j.jbi.2023.104282] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/29/2022] [Revised: 12/01/2022] [Accepted: 01/04/2023] [Indexed: 01/09/2023]

Wang L, Zhang Y, Chignell M, Shan B, Sheehan KA, Razak F, Verma A. Boosting Delirium Identification Accuracy With Sentiment-Based Natural Language Processing: Mixed Methods Study. JMIR Med Inform 2022;10:e38161. [PMID: 36538363 PMCID: PMC9812273 DOI: 10.2196/38161] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/21/2022] [Revised: 08/22/2022] [Accepted: 09/19/2022] [Indexed: 01/07/2023] Open

Abstract

BACKGROUND

Delirium is an acute neurocognitive disorder that affects up to half of older hospitalized medical patients and can lead to dementia, longer hospital stays, increased health costs, and death. Although delirium can be prevented and treated, it is difficult to identify and predict.

OBJECTIVE

This study aimed to improve machine learning models that retrospectively identify the presence of delirium during hospital stays (eg, to measure the effectiveness of delirium prevention interventions) by using the natural language processing (NLP) technique of sentiment analysis (in this case a feature that identifies sentiment toward, or away from, a delirium diagnosis).

METHODS

Using data from the General Medicine Inpatient Initiative, a Canadian hospital data and analytics network, a detailed manual review of medical records was conducted from nearly 4000 admissions at 6 Toronto area hospitals. Furthermore, 25.74% (994/3862) of the eligible hospital admissions were labeled as having delirium. Using the data set collected from this study, we developed machine learning models with, and without, the benefit of NLP methods applied to diagnostic imaging reports, and we asked the question "can NLP improve machine learning identification of delirium?"

RESULTS

Among the eligible 3862 hospital admissions, 994 (25.74%) admissions were labeled as having delirium. Identification and calibration of the models were satisfactory. The accuracy and area under the receiver operating characteristic curve of the main model with NLP in the independent testing data set were 0.807 and 0.930, respectively. The accuracy and area under the receiver operating characteristic curve of the main model without NLP in the independent testing data set were 0.811 and 0.869, respectively. Model performance was also found to be stable over the 5-year period used in the experiment, with identification for a likely future holdout test set being no worse than identification for retrospective holdout test sets.

CONCLUSIONS

Our machine learning model that included NLP (ie, sentiment analysis in medical image description text mining) produced valid identification of delirium with the sentiment analysis, providing significant additional benefit over the model without NLP.


Kuo HC, Hao S, Jin B, Chou CJ, Han Z, Chang LS, Huang YH, Hwa K, Whitin JC, Sylvester KG, Reddy CD, Chubb H, Ceresnak SR, Kanegaye JT, Tremoulet AH, Burns JC, McElhinney D, Cohen HJ, Ling XB. Single center blind testing of a US multi-center validated diagnostic algorithm for Kawasaki disease in Taiwan. Front Immunol 2022;13:1031387. [PMID: 36263040 PMCID: PMC9575935 DOI: 10.3389/fimmu.2022.1031387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 09/21/2022] [Indexed: 11/13/2022] Open

Affiliation(s)

Ho-Chang Kuo*: Kawasaki Disease Center, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan; Department of Pediatrics, Chang Gung University College of Medicine, Kaohsiung, Taiwan
Shiying Hao: School of Medicine, Stanford University, Stanford, CA, United States
Bo Jin: School of Medicine, Stanford University, Stanford, CA, United States
C. James Chou: School of Medicine, Stanford University, Stanford, CA, United States
Zhi Han: School of Medicine, Stanford University, Stanford, CA, United States
Ling-Sai Chang: Kawasaki Disease Center, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan; Department of Pediatrics, Chang Gung University College of Medicine, Kaohsiung, Taiwan
Ying-Hsien Huang: Kawasaki Disease Center, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan; Department of Pediatrics, Chang Gung University College of Medicine, Kaohsiung, Taiwan
Kuoyuan Hwa: Center for Biomedical Industry, Department of Molecular Science and Engineering, National Taipei University of Technology, Taipei, Taiwan
John C. Whitin: School of Medicine, Stanford University, Stanford, CA, United States
Karl G. Sylvester: School of Medicine, Stanford University, Stanford, CA, United States
Charitha D. Reddy: School of Medicine, Stanford University, Stanford, CA, United States
Henry Chubb: School of Medicine, Stanford University, Stanford, CA, United States
Scott R. Ceresnak: School of Medicine, Stanford University, Stanford, CA, United States
John T. Kanegaye: Pediatrics, University of California San Diego, San Diego, CA, United States
Adriana H. Tremoulet: Pediatrics, University of California San Diego, San Diego, CA, United States
Jane C. Burns: Pediatrics, University of California San Diego, San Diego, CA, United States
Doff McElhinney: School of Medicine, Stanford University, Stanford, CA, United States
Harvey J. Cohen: School of Medicine, Stanford University, Stanford, CA, United States
Xuefeng B. Ling*: School of Medicine, Stanford University, Stanford, CA, United States
*Correspondence: Xuefeng B. Ling; Ho-Chang Kuo


Wang S, Song F, Qiao Q, Liu Y, Chen J, Ma J. A Comparative Study of Natural Language Processing Algorithms Based on Cities Changing Diabetes Vulnerability Data. Healthcare (Basel) 2022;10:1119. [PMID: 35742169 PMCID: PMC9223144 DOI: 10.3390/healthcare10061119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 06/08/2022] [Accepted: 06/13/2022] [Indexed: 11/16/2022] Open

Multi-label text mining to identify reasons for appointments to drive population health analytics at a primary care setting. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07306-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]

Montoto C, Gisbert JP, Guerra I, Plaza R, Pajares Villarroya R, Moreno Almazán L, López Martín MDC, Domínguez Antonaya M, Vera Mendoza I, Aparicio J, Martínez V, Tagarro I, Fernandez-Nistal A, Canales L, Menke S, Gomollón F. Evaluation of Natural Language Processing for the Identification of Crohn Disease-Related Variables in Spanish Electronic Health Records: A Validation Study for the PREMONITION-CD Project. JMIR Med Inform 2022;10:e30345. [PMID: 35179507 PMCID: PMC8900906 DOI: 10.2196/30345] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/11/2021] [Revised: 07/22/2021] [Accepted: 01/02/2022] [Indexed: 12/29/2022] Open

Chen X, Cheng G, Wang FL, Tao X, Xie H, Xu L. Machine and cognitive intelligence for human health: systematic review. Brain Inform 2022;9:5. [PMID: 35150379 PMCID: PMC8840949 DOI: 10.1186/s40708-022-00153-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/03/2021] [Accepted: 01/25/2022] [Indexed: 12/27/2022] Open

Abstract

Brain informatics is a novel interdisciplinary area that focuses on scientifically studying the mechanisms of human brain information processing by integrating experimental cognitive neuroscience with advanced Web intelligence-centered information technologies. Web intelligence, which aims to understand the computational, cognitive, physical, and social foundations of the future Web, has attracted increasing attention to facilitate the study of brain informatics to promote human health. A large number of articles created in the recent few years are proof of the investment in Web intelligence-assisted human health. This study systematically reviews academic studies regarding article trends, top journals, subjects, countries/regions, and institutions, study design, artificial intelligence technologies, clinical tasks, and performance evaluation. Results indicate that literature is especially welcomed in subjects such as medical informatics and health care sciences and service. There are several promising topics, for example, random forests, support vector machines, and conventional neural networks for disease detection and diagnosis, semantic Web, ontology mining, and topic modeling for clinical or biomedical text mining, artificial neural networks and logistic regression for prediction, and convolutional neural networks and support vector machines for monitoring and classification. Additionally, future research should focus on algorithm innovations, additional information use, functionality improvement, model and system generalization, scalability, evaluation, and automation, data acquirement and quality improvement, and allowing interaction. The findings of this study help better understand what and how Web intelligence can be applied to promote healthcare procedures and clinical outcomes. This provides important insights into the effective use of Web intelligence to support informatics-enabled brain studies.


Buchlak QD, Esmaili N, Bennett C, Farrokhi F. Natural Language Processing Applications in the Clinical Neurosciences: A Machine Learning Augmented Systematic Review. Acta Neurochir Suppl 2022;134:277-289. [PMID: 34862552 DOI: 10.1007/978-3-030-85292-4_32] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 10/19/2022]

Turchin A, Florez Builes LF. Using Natural Language Processing to Measure and Improve Quality of Diabetes Care: A Systematic Review. J Diabetes Sci Technol 2021;15:553-560. [PMID: 33736486 PMCID: PMC8120048 DOI: 10.1177/19322968211000831] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/14/2022]

Lee S, Doktorchik C, Martin EA, D'Souza AG, Eastwood C, Shaheen AA, Naugler C, Lee J, Quan H. Electronic Medical Record-Based Case Phenotyping for the Charlson Conditions: Scoping Review. JMIR Med Inform 2021;9:e23934. [PMID: 33522976 PMCID: PMC7884219 DOI: 10.2196/23934] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/28/2020] [Revised: 11/20/2020] [Accepted: 12/05/2020] [Indexed: 12/16/2022] Open

Abstract

Background

Electronic medical records (EMRs) contain large amounts of rich clinical information. Developing EMR-based case definitions, also known as EMR phenotyping, is an active area of research that has implications for epidemiology, clinical care, and health services research.

Objective

This review aims to describe and assess the present landscape of EMR-based case phenotyping for the Charlson conditions.

Methods

A scoping review of EMR-based algorithms for defining the Charlson comorbidity index conditions was completed. This study covered articles published between January 2000 and April 2020, both inclusive. Embase (Excerpta Medica database) and MEDLINE (Medical Literature Analysis and Retrieval System Online) were searched using keywords developed in the following 3 domains: terms related to EMR, terms related to case finding, and disease-specific terms. The manuscript follows the Preferred Reporting Items for Systematic reviews and Meta-analyses extension for Scoping Reviews (PRISMA) guidelines.

Results

A total of 274 articles representing 299 algorithms were assessed and summarized. Most studies were undertaken in the United States (181/299, 60.5%), followed by the United Kingdom (42/299, 14.0%) and Canada (15/299, 5.0%). These algorithms were mostly developed either in primary care (103/299, 34.4%) or inpatient (168/299, 56.2%) settings. Diabetes, congestive heart failure, myocardial infarction, and rheumatology had the highest number of developed algorithms. Data-driven and clinical rule–based approaches have been identified. EMR-based phenotype and algorithm development reflect the data access allowed by respective health systems, and algorithms vary in their performance.
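The percentages in the Results paragraph above can be recomputed from the stated counts. The following is a minimal sketch, not part of the cited review; the dictionary labels and the `percent` helper are our own names, and every share is assumed to use the 299 algorithms as its denominator, as the text states.

```python
# Counts and percentages exactly as stated in the Results paragraph.
# The dictionary keys are illustrative labels, not terms defined by the review.
TOTAL_ALGORITHMS = 299

reported = {
    "United States": (181, 60.5),
    "United Kingdom": (42, 14.0),
    "Canada": (15, 5.0),
    "primary care": (103, 34.4),
    "inpatient": (168, 56.2),
}


def percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Recompute every share from its count.
recomputed = {
    label: percent(count, TOTAL_ALGORITHMS)
    for label, (count, _stated) in reported.items()
}

for label, (count, stated) in reported.items():
    print(f"{label}: {count}/{TOTAL_ALGORITHMS} = {recomputed[label]}% (stated {stated}%)")
```

Each recomputed value agrees with the stated percentage to one decimal place.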

Conclusions

Recognizing similarities and differences in health systems, data collection strategies, extraction, data release protocols, and existing clinical pathways is critical to algorithm development strategies. Several strategies to assist with phenotype-based case definitions have been proposed.


Affiliation(s)

Seungwon Lee: Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada; Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Chelsea Doktorchik: Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Elliot Asher Martin: Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada
Adam Giles D'Souza: Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada
Cathy Eastwood: Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Abdel Aziz Shaheen: Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Christopher Naugler: Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Pathology and Laboratory Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Joon Lee: Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Cardiac Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Hude Quan: Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada


Sai Prashanthi G, Deva A, Vadapalli R, Das AV. Automated Categorization of Systemic Disease and Duration From Electronic Medical Record System Data Using Finite-State Machine Modeling: Prospective Validation Study. JMIR Form Res 2020;4:e24490. [PMID: 33331823 PMCID: PMC7775202 DOI: 10.2196/24490] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/22/2020] [Revised: 11/12/2020] [Accepted: 11/17/2020] [Indexed: 01/14/2023] Open

Abstract

BACKGROUND

One of the major challenges in the health care sector is that approximately 80% of generated data remains unstructured and unused. Since it is difficult to handle unstructured data from electronic medical record systems, it tends to be neglected for analyses in most hospitals and medical centers. Therefore, there is a need to analyze unstructured big data in health care systems so that we can optimally utilize and unearth all unexploited information from it.

OBJECTIVE

In this study, we aimed to extract a list of diseases and associated keywords along with the corresponding time durations from an indigenously developed electronic medical record system and describe the possibility of analytics from the acquired datasets.

METHODS

We propose a novel, finite-state machine to sequentially detect and cluster disease names from patients' medical history. We defined 3 states in the finite-state machine and transition matrix, which depend on the identified keyword. In addition, we also defined a state-change action matrix, which is essentially an action associated with each transition. The dataset used in this study was obtained from an indigenously developed electronic medical record system called eyeSmart that was implemented across a large, multitier ophthalmology network in India. The dataset included patients' past medical history and contained records of 10,000 distinct patients.
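The Methods paragraph above describes a three-state machine driven by a keyword-dependent transition matrix and a matching action matrix. The sketch below illustrates that general idea only and is not the authors' implementation: the state names, the keyword lists, the token classes, and the `extract` function are all hypothetical, since the abstract does not specify them.

```python
import re

# Hypothetical vocabulary; the abstract does not list the real keywords.
DISEASE_KEYWORDS = {"diabetes", "hypertension", "asthma"}
UNIT_KEYWORDS = {"year", "years", "month", "months"}

# Three states, as in the abstract; the names are our assumption.
IDLE, IN_DISEASE, IN_DURATION = 0, 1, 2


def classify(token: str) -> str:
    """Map a token to the keyword class that drives the machine."""
    if token in DISEASE_KEYWORDS:
        return "disease"
    if token.isdigit():
        return "number"
    if token in UNIT_KEYWORDS:
        return "unit"
    return "other"


# Transition matrix: (state, keyword class) -> next state.
# Pairs that are not listed leave the state unchanged.
TRANSITIONS = {
    (IDLE, "disease"): IN_DISEASE,
    (IN_DISEASE, "disease"): IN_DISEASE,
    (IN_DISEASE, "number"): IN_DURATION,
    (IN_DURATION, "unit"): IDLE,
    (IN_DURATION, "disease"): IN_DISEASE,
}

# State-change action matrix: one action per listed transition.
ACTIONS = {
    (IDLE, "disease"): "open",
    (IN_DISEASE, "disease"): "open",
    (IN_DISEASE, "number"): "number",
    (IN_DURATION, "unit"): "unit",
    (IN_DURATION, "disease"): "open",
}


def extract(text: str) -> list:
    """Scan free text once and return disease/duration records."""
    records = []
    state = IDLE
    pending_number = None
    for token in re.findall(r"[a-z]+|\d+", text.lower()):
        cls = classify(token)
        action = ACTIONS.get((state, cls), "none")
        if action == "open":      # a new disease name starts a new record
            records.append({"disease": token, "duration": None})
        elif action == "number":  # remember the numeric part of a duration
            pending_number = token
        elif action == "unit":    # a unit completes the duration
            records[-1]["duration"] = f"{pending_number} {token}"
        state = TRANSITIONS.get((state, cls), state)
    return records


history = "Diabetes for 5 years, hypertension since 2 months, asthma"
print(extract(history))
```

Keeping the transitions and the actions in two separate tables mirrors the split the abstract describes, so new keywords or states can be added without touching the scanning loop.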

RESULTS

We extracted disease names and associated keywords by using the finite-state machine with an accuracy of 95%, sensitivity of 94.9%, and positive predictive value of 100%. For the extraction of the duration of disease, the machine's accuracy was 93%, sensitivity was 92.9%, and the positive predictive value was 100%.
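The three metrics reported above follow from the four cells of a confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical values chosen only so that the arithmetic lands on figures of the same size as those reported for disease-name extraction, and they are not the study's data.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def sensitivity(tp: int, fn: int) -> float:
    """Share of true cases that were detected (also called recall)."""
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Share of detections that were correct (also called precision)."""
    return tp / (tp + fp)


# HYPOTHETICAL counts for illustration only; not taken from the study.
tp, fp, fn, tn = 93, 0, 5, 2

print(f"accuracy    = {accuracy(tp, fp, fn, tn):.1%}")
print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"PPV         = {positive_predictive_value(tp, fp):.1%}")
```

A positive predictive value of 100% means there were no false positives, so under these definitions every error in such a result is a missed extraction rather than a spurious one.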

CONCLUSIONS

We demonstrated that the finite-state machine we developed in this study can be used to accurately identify disease names, associated keywords, and time durations from a large cohort of patient records obtained using an electronic medical record system.


Nguyen H, Agu E, Tulu B, Strong D, Mombini H, Pedersen P, Lindsay C, Dunn R, Loretz L. Machine learning models for synthesizing actionable care decisions on lower extremity wounds. Smart Health (Amst) 2020;18. [PMID: 33299924 DOI: 10.1016/j.smhl.2020.100139] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/22/2023]

Hendrickx JO, van Gastel J, Leysen H, Martin B, Maudsley S. High-dimensionality Data Analysis of Pharmacological Systems Associated with Complex Diseases. Pharmacol Rev 2020;72:191-217. [PMID: 31843941 DOI: 10.1124/pr.119.017921] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/11/2022] Open

Abstract

It is widely accepted that molecular reductionist views of highly complex human physiologic activity, e.g., the aging process, as well as therapeutic drug efficacy are largely oversimplifications. Currently some of the most effective appreciation of biologic disease and drug response complexity is achieved using high-dimensionality (H-D) data streams from transcriptomic, proteomic, metabolomics, or epigenomic pipelines. Multiple H-D data sets are now common and freely accessible for complex diseases such as metabolic syndrome, cardiovascular disease, and neurodegenerative conditions such as Alzheimer's disease. Over the last decade our ability to interrogate these high-dimensionality data streams has been profoundly enhanced through the development and implementation of highly effective bioinformatic platforms. Employing these computational approaches to understand the complexity of age-related diseases provides a facile mechanism to then synergize this pathologic appreciation with a similar level of understanding of therapeutic-mediated signaling. For informative pathology and drug-based analytics that are able to generate meaningful therapeutic insight across diverse data streams, novel informatics processes such as latent semantic indexing and topological data analyses will likely be important. Elucidation of H-D molecular disease signatures from diverse data streams will likely generate and refine new therapeutic strategies that will be designed with a cognizance of a realistic appreciation of the complexity of human age-related disease and drug effects. We contend that informatic platforms should be synergistic with more advanced chemical/drug and phenotypic cellular/tissue-based analytical predictive models to assist in either de novo drug prioritization or effective repurposing for the intervention of aging-related diseases.

SIGNIFICANCE STATEMENT

All diseases, as well as pharmacological mechanisms, are far more complex than previously thought a decade ago. With the advent of commonplace access to technologies that produce large volumes of high-dimensionality data (e.g., transcriptomics, proteomics, metabolomics), it is now imperative that effective tools to appreciate this highly nuanced data are developed. Being able to appreciate the subtleties of high-dimensionality data will allow molecular pharmacologists to develop the most effective multidimensional therapeutics with effectively engineered efficacy profiles.


Yu CS, Lin YJ, Lin CH, Lin SY, Wu JL, Chang SS. Development of an Online Health Care Assessment for Preventive Medicine: A Machine Learning Approach. J Med Internet Res 2020;22:e18585. [PMID: 32501272 PMCID: PMC7305560 DOI: 10.2196/18585] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/05/2020] [Revised: 04/13/2020] [Accepted: 05/14/2020] [Indexed: 12/20/2022] Open

Abstract

BACKGROUND

In the era of information explosion, the use of the internet to assist with clinical practice and diagnosis has become a cutting-edge area of research. The application of medical informatics allows patients to be aware of their clinical conditions, which may contribute toward the prevention of several chronic diseases and disorders.

OBJECTIVE

In this study, we applied machine learning techniques to construct a medical database system from electronic medical records (EMRs) of subjects who have undergone health examination. This system aims to provide online self-health evaluation to clinicians and patients worldwide, enabling personalized health and preventive health.

METHODS

We built a medical database system based on the literature, and data preprocessing and cleaning were performed for the database. We utilized both supervised and unsupervised machine learning technology to analyze the EMR data to establish prediction models. The models with EMR databases were then applied to the internet platform.

RESULTS

The validation data were used to validate the online diagnosis prediction system. The accuracy of the prediction model for metabolic syndrome reached 91%, and the area under the receiver operating characteristic (ROC) curve was 0.904 in this system. For chronic kidney disease, the prediction accuracy of the model reached 94.7%, and the area under the ROC curve (AUC) was 0.982. In addition, the system also provided disease diagnosis visualization via clustering, allowing users to check their outcome compared with those in the medical database, enabling increased awareness for a healthier lifestyle.
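The area under the ROC curve reported above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal sketch of that pairwise definition follows; the function name `auc` and the six example labels and scores are illustrative assumptions, not data or code from the study.

```python
def auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for a positive case, 0 for a negative case.
    scores: model output, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Each positive/negative pair counts 1 if ranked correctly, 0.5 if tied.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, invented for illustration only.
example_labels = [1, 1, 0, 0, 1, 0]
example_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]
example_auc = auc(example_labels, example_scores)
```

The quadratic loop is fine for small validation sets; for large cohorts a rank-based implementation would be the usual choice.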

CONCLUSIONS

Our web-based health care machine learning system allowed users to access online diagnosis predictions and provided a health examination report. Users could understand and review their health status accordingly. In the future, we aim to connect hospitals worldwide with our platform, so that health care practitioners can make diagnoses or provide patient education to remote patients. This platform can increase the value of preventive medicine and telemedicine.

Bloomgarden ZT. Use of online information in diabetes. J Diabetes 2020;12:268-269. [PMID: 31943760 DOI: 10.1111/1753-0407.13022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] Open

Ye Q, Patel R, Khan U, Boren SA, Kim MS. Evaluation of provider documentation patterns as a tool to deliver ongoing patient-centred diabetes education and support. Int J Clin Pract 2020;74:e13451. [PMID: 31769903 PMCID: PMC7047595 DOI: 10.1111/ijcp.13451] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/28/2019] [Revised: 10/08/2019] [Accepted: 11/20/2019] [Indexed: 01/28/2023] Open

Abstract

BACKGROUND

Diabetes mellitus (DM) is one of the most common chronic diseases in the world. As a disease with long-term complications requiring changes in management, DM requires not only education at the time of diagnosis, but ongoing diabetes self-management education and support (DSME/S). In the United States, however, only a small proportion of people with DM receive DSME/S, although evidence supports benefits of ongoing DSME/S. The diabetes education that providers deliver during follow-up visits may be an important source for DSME/S for many people with DM.

METHODS

We collected 200 clinic notes of follow-up visits for 100 adults with DM and studied the History of Present Illness (HPI) and Impression and Plan (I&P) sections. Using a codebook based on the seven principles of the American Association of Diabetes Educators Self-Care Behaviors (AADE7), we conducted a multi-step deductive thematic analysis to determine the patterns of DSME/S information occurrence in clinic notes. Additionally, we used generalised linear mixed models to investigate whether providers delivered DSME/S to people with DM based on patient characteristics.

RESULTS

During follow-up visits, Monitoring was the most common self-care behaviour mentioned in both HPI and I&P sections. Being Active was the least common self-care behaviour mentioned in the HPI section and Healthy Coping was the least common self-care behaviour mentioned in the I&P section. We found providers delivered more information on Healthy Eating to men than to women in the I&P section. Generally, providers delivered DSME/S to people with DM regardless of patient characteristics.

CONCLUSIONS

This study focused on the frequency distribution of the information that providers delivered to people with DM during follow-up clinic visits, based on the AADE7. The results may indicate a lack of patient-centred education when people with DM visit providers for ongoing management. Further studies are needed to identify the underlying reasons why providers have difficulty delivering patient-centred education.

Kersloot MG, Lau F, Abu-Hanna A, Arts DL, Cornet R. Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES. J Biomed Semantics 2019;10:14. [PMID: 31533810 PMCID: PMC6749652 DOI: 10.1186/s13326-019-0207-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/18/2019] [Accepted: 08/13/2019] [Indexed: 12/05/2022] Open

Abstract

Background

Information in Electronic Health Records is largely stored as unstructured free text. Natural language processing (NLP), or Medical Language Processing (MLP) in medicine, aims at extracting structured information from free text, and is less expensive and time-consuming than manual extraction. However, most algorithms in MLP are institution-specific or address only one clinical need, and thus cannot be broadly applied. In addition, most MLP systems do not detect concepts in misspelled text and cannot detect attribute relationships between concepts. The objective of this study was to develop and evaluate an MLP application that includes generic algorithms for the detection of (misspelled) concepts and of attribute relationships between them.

Methods

An implementation of the MLP system cTAKES, called DIRECT, was developed with generic SNOMED CT concept filter, concept relationship detection, and attribute relationship detection algorithms and a custom dictionary. Four implementations of cTAKES were evaluated by comparing 98 manually annotated oncology charts with the output of DIRECT. The F₁-score was determined for named-entity recognition and attribute relationship detection for the concepts ‘lung cancer’, ‘non-small cell lung cancer’, and ‘recurrence’. The performance of the four implementations was compared with a two-tailed permutation test.

Results

DIRECT detected lung cancer and non-small cell lung cancer concepts with F₁-scores between 0.828 and 0.947 and between 0.862 and 0.933, respectively. The concept recurrence was detected with an F₁-score of 0.921, significantly higher than in the other implementations, and the relationship between recurrence and lung cancer with an F₁-score of 0.857. The precisions for the detection of the lung cancer, non-small cell lung cancer, and recurrence concepts were 1.000, 0.966, and 0.879, respectively, compared with 0.943, 0.967, and 0.000 in the original implementation.
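The F₁-score used in these results is the harmonic mean of precision and recall. The sketch below shows that formula and its inversion; the helper names `f1_score` and `recall_from_f1` are hypothetical. The implied recall is only a back-of-the-envelope reading that assumes the reported precision (0.879) and F₁-score (0.921) for recurrence come from the same evaluation and ignores rounding; the abstract itself does not report recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_from_f1(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, given P and F1."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall for this precision/F1 pair")
    return f1 * precision / denominator


# Assumption: both figures describe the same 'recurrence' evaluation.
implied_recall = recall_from_f1(precision=0.879, f1=0.921)

# With no correct detections, precision and recall are both 0, so F1 is 0.
baseline_f1 = f1_score(0.0, 0.0)
```

Under that assumption the implied recall for recurrence would be roughly 0.97, which is consistent with a system tuned to miss few mentions at some cost in precision.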

Conclusion

DIRECT can detect oncology concepts and attribute relationships with high precision and can detect recurrence with significant increase in F₁-score, compared to the original implementation of cTAKES, due to the usage of a custom dictionary and a generic concept relationship detection algorithm. These concepts and relationships can be used to encode clinical narratives, and can thus substantially reduce manual chart abstraction efforts, saving time for clinicians and researchers.

Moon S, Liu S, Scott CG, Samudrala S, Abidian MM, Geske JB, Noseworthy PA, Shellum JL, Chaudhry R, Ommen SR, Nishimura RA, Liu H, Arruda-Olson AM. Automated extraction of sudden cardiac death risk factors in hypertrophic cardiomyopathy patients by natural language processing. Int J Med Inform 2019;128:32-38. [PMID: 31160009 DOI: 10.1016/j.ijmedinf.2019.05.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/17/2018] [Revised: 01/19/2019] [Accepted: 05/11/2019] [Indexed: 01/12/2023]

Abstract

BACKGROUND

The management of hypertrophic cardiomyopathy (HCM) patients requires the knowledge of risk factors associated with sudden cardiac death (SCD). SCD risk factors such as syncope and family history of SCD (FH-SCD) as well as family history of HCM (FH-HCM) are documented in electronic health records (EHRs) as clinical narratives. Automated extraction of risk factors from clinical narratives by natural language processing (NLP) may expedite management workflow of HCM patients. The aim of this study was to develop and deploy NLP algorithms for automated extraction of syncope, FH-SCD, and FH-HCM from clinical narratives.

METHODS AND RESULTS

We randomly selected 200 patients from the Mayo HCM registry for development (n = 100) and testing (n = 100) of NLP algorithms for extraction of syncope, FH-SCD as well as FH-HCM from clinical narratives of EHRs. The clinical reference standard was manually abstracted by 2 independent annotators. Performance of NLP algorithms was compared to aggregation and summarization of data entries in the HCM registry for syncope, FH-SCD, and FH-HCM. We also compared the NLP algorithms with billing codes for syncope as well as responses to patient survey questions for FH-SCD and FH-HCM. These analyses demonstrated NLP had superior sensitivity (0.96 vs 0.39, p < 0.001) and comparable specificity (0.90 vs 0.92, p = 0.74) and PPV (0.90 vs 0.83, p = 0.37) compared to billing codes for syncope. For FH-SCD, NLP outperformed survey responses for all parameters (sensitivity: 0.91 vs 0.59, p = 0.002; specificity: 0.98 vs 0.50, p < 0.001; PPV: 0.97 vs 0.38, p < 0.001). NLP also achieved superior sensitivity (0.95 vs 0.24, p < 0.001) with comparable specificity (0.95 vs 1.0, p-value not calculable) and positive predictive value (PPV) (0.92 vs 1.0, p = 0.09) compared to survey responses for FH-HCM.
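Sensitivity, specificity, and positive predictive value (PPV) as compared above are all ratios over the four cells of a confusion matrix built against the manually abstracted reference standard. A minimal sketch follows; the function name `diagnostic_metrics` and the counts passed to it are invented for illustration and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, and PPV from confusion-matrix counts.

    tp/fn: reference-positive cases the method did / did not flag.
    tn/fp: reference-negative cases the method did not / did flag.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
        "ppv": tp / (tp + fp),          # share of flagged cases that are true
    }


# Hypothetical counts for a 100-patient test set (illustration only).
example_metrics = diagnostic_metrics(tp=48, fp=5, tn=45, fn=2)
```

Because PPV depends on how common the condition is in the test set, it can differ between two methods even when their sensitivity and specificity are similar.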

CONCLUSIONS

Automated extraction of syncope, FH-SCD and FH-HCM using NLP is feasible and has promise to increase efficiency of workflow for providers managing HCM patients.

Wang X, Zhang Y, Hao S, Zheng L, Liao J, Ye C, Xia M, Wang O, Liu M, Weng CH, Duong SQ, Jin B, Alfreds ST, Stearns F, Kanov L, Sylvester KG, Widen E, McElhinney DB, Ling XB. Prediction of the 1-Year Risk of Incident Lung Cancer: Prospective Study Using Electronic Health Records from the State of Maine. J Med Internet Res 2019;21:e13260. [PMID: 31099339 PMCID: PMC6542253 DOI: 10.2196/13260] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 12/31/2018] [Revised: 04/18/2019] [Accepted: 04/23/2019] [Indexed: 02/05/2023] Open

Abstract

BACKGROUND

Lung cancer is the leading cause of cancer death worldwide. Early detection of individuals at risk of lung cancer is critical to reduce the mortality rate.

OBJECTIVE

The aim of this study was to develop and validate a prospective risk prediction model to identify patients at risk of new incident lung cancer within the next 1 year in the general population.

METHODS

Data from individual patient electronic health records (EHRs) were extracted from the Maine Health Information Exchange network. The study population consisted of patients with at least one EHR between April 1, 2016, and March 31, 2018, who had no history of lung cancer. A retrospective cohort (N=873,598) and a prospective cohort (N=836,659) were formed for model construction and validation. An Extreme Gradient Boosting (XGBoost) algorithm was adopted to build the model. It assigned a score to each individual to quantify the probability of a new incident lung cancer diagnosis from October 1, 2016, to September 31, 2017. The model was trained with the clinical profile in the retrospective cohort from the preceding 6 months and validated with the prospective cohort to predict the risk of incident lung cancer from April 1, 2017, to March 31, 2018.

RESULTS

The model had an area under the curve (AUC) of 0.881 (95% CI 0.873-0.889) in the prospective cohort. Two thresholds of 0.0045 and 0.01 were applied to the predictive scores to stratify the population into low-, medium-, and high-risk categories. The incidence of lung cancer in the high-risk category (579/53,922, 1.07%) was 7.7 times higher than that in the overall cohort (1167/836,659, 0.14%). Age, a history of pulmonary diseases and other chronic diseases, medications for mental disorders, and social disparities were found to be associated with new incident lung cancer.
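The stratification and the "7.7 times higher" figure above can be checked with a few lines of arithmetic. In this sketch the two thresholds and the case counts come from the abstract; the function name `stratify` and the handling of scores that fall exactly on a threshold are assumptions, since the abstract does not say which side a boundary score belongs to.

```python
LOW_CUT = 0.0045   # threshold between low and medium risk (from the abstract)
HIGH_CUT = 0.01    # threshold between medium and high risk (from the abstract)


def stratify(score, low_cut=LOW_CUT, high_cut=HIGH_CUT):
    """Map a predictive score to a risk category.

    Assumption: a score equal to a threshold falls into the higher category.
    """
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"


# Counts reported in the abstract for the prospective cohort.
high_risk_incidence = 579 / 53922     # about 1.07%
overall_incidence = 1167 / 836659     # about 0.14%
incidence_ratio = high_risk_incidence / overall_incidence  # about 7.7
```

The ratio is computed from the unrounded incidences; dividing the rounded percentages (1.07 by 0.14) would give a slightly different value.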

CONCLUSIONS

We retrospectively developed and prospectively validated an accurate risk prediction model of new incident lung cancer occurring in the next 1 year. Through statistical learning from the statewide EHR data in the preceding 6 months, our model was able to identify statewide high-risk patients, which will benefit the population health through establishment of preventive interventions or more intensive surveillance.

Affiliation(s)

Xiaofang Wang Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China.,Department of Surgery, Stanford University, Stanford, CA, United States
Yan Zhang Department of Oncology, The First Hospital of Shijiazhuang, Shijiazhuang, China
Shiying Hao Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States.,Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
Le Zheng Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States.,Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
Jiayu Liao Department of Bioengineering, University of California, Riverside, CA, United States.,West China-California Multiomics Research Center, West China Hospital, Sichuan University, Chengdu, China
Chengyin Ye Department of Health Management, Hangzhou Normal University, Hangzhou, China
Minjie Xia Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
Oliver Wang Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
Modi Liu Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
Ching Ho Weng Department of Surgery, Stanford University, Stanford, CA, United States
Son Q Duong Lucile Packard Children's Hospital, Palo Alto, CA, United States
Bo Jin Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
Shaun T Alfreds HealthInfoNet, Portland, ME, United States
Frank Stearns Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
Laura Kanov Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
Karl G Sylvester Department of Surgery, Stanford University, Stanford, CA, United States
Eric Widen Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
Doff B McElhinney Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States.,Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
Xuefeng B Ling Department of Surgery, Stanford University, Stanford, CA, United States.,Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States

Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V. Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review. JMIR Med Inform 2019;7:e12239. [PMID: 31066697 PMCID: PMC6528438 DOI: 10.2196/12239] [Citation(s) in RCA: 204] [Impact Index Per Article: 40.8] [Received: 09/17/2018] [Revised: 03/04/2019] [Accepted: 03/24/2019] [Indexed: 01/08/2023] Open

Abstract

Background

Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions in the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Methods based on machine learning to process EHRs are resulting in improved understanding of patient clinical trajectories and chronic disease risk prediction, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical histories remains locked behind clinical narratives in free-form text. Consequently, unlocking the full potential of EHR data is contingent on the development of natural language processing (NLP) methods to automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset.

Objective

The goal of the research was to provide a comprehensive overview of the development and uptake of NLP methods applied to free-text clinical notes related to chronic diseases, including the investigation of challenges faced by NLP methodologies in understanding clinical narratives.

Methods

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed and searches were conducted in 5 databases using “clinical notes,” “natural language processing,” and “chronic disease” and their variations as keywords to maximize coverage of the articles.

Results

Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers resulted in identification of 43 chronic diseases, which were then further classified into 10 disease categories using the International Classification of Diseases, 10th Revision. The majority of studies focused on diseases of the circulatory system (n=38) while endocrine and metabolic diseases were fewest (n=14). This was due to the structure of clinical records related to metabolic diseases, which typically contain much more structured data, compared with medical records for diseases of the circulatory system, which focus more on unstructured data and consequently have seen a stronger focus of NLP. The review has shown that there is a significant increase in the use of machine learning methods compared to rule-based approaches; however, deep learning methods remain emergent (n=3). Consequently, the majority of works focus on classification of disease phenotype with only a handful of papers addressing extraction of comorbidities from the free text or integration of clinical notes with structured data. There is a notable use of relatively simple methods, such as shallow classifiers (or combination with rule-based methods), due to the interpretability of predictions, which still represents a significant issue for more complex methods. Finally, scarcity of publicly available data may also have contributed to insufficient development of more advanced methods, such as extraction of word embeddings from clinical notes.

Conclusions

Efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora.

Guetterman TC, Chang T, DeJonckheere M, Basu T, Scruggs E, Vydiswaran VGV. Augmenting Qualitative Text Analysis with Natural Language Processing: Methodological Study. J Med Internet Res 2018;20:e231. [PMID: 29959110 PMCID: PMC6045788 DOI: 10.2196/jmir.9702] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 12/20/2017] [Revised: 05/14/2018] [Accepted: 05/15/2018] [Indexed: 11/18/2022] Open

Abstract

Background

Qualitative research methods are increasingly being used across disciplines because of their ability to help investigators understand the perspectives of participants in their own words. However, qualitative analysis is a laborious and resource-intensive process. To achieve depth, researchers are limited to smaller sample sizes when analyzing text data. One potential method to address this concern is natural language processing (NLP). Qualitative text analysis involves researchers reading data, assigning code labels, and iteratively developing findings; NLP has the potential to automate part of this process. Unfortunately, little methodological research has been done to compare automatic coding using NLP techniques and qualitative coding, which is critical to establish the viability of NLP as a useful, rigorous analysis procedure.

Objective

The purpose of this study was to compare the utility of a traditional qualitative text analysis, an NLP analysis, and an augmented approach that combines qualitative and NLP methods.

Methods

We conducted a 2-arm cross-over experiment to compare qualitative and NLP approaches to analyze data generated through 2 text (short message service) message survey questions, one about prescription drugs and the other about police interactions, sent to youth aged 14-24 years. We randomly assigned a question to each of the 2 experienced qualitative analysis teams for independent coding and analysis before receiving NLP results. A third team separately conducted NLP analysis of the same 2 questions. We examined the results of our analyses to compare (1) the similarity of findings derived, (2) the quality of inferences generated, and (3) the time spent in analysis.

Results

The qualitative-only analysis for the drug question (n=58) yielded 4 major findings, whereas the NLP analysis yielded 3 findings that missed contextual elements. The qualitative and NLP-augmented analysis was the most comprehensive. For the police question (n=68), the qualitative-only analysis yielded 4 primary findings and the NLP-only analysis yielded 4 slightly different findings. Again, the augmented qualitative and NLP analysis was the most comprehensive and produced the highest quality inferences, increasing our depth of understanding (ie, details and frequencies). In terms of time, the NLP-only approach was quicker than the qualitative-only approach for the drug (120 vs 270 minutes) and police (40 vs 270 minutes) questions. An approach beginning with qualitative analysis followed by qualitative- or NLP-augmented analysis took longer than one beginning with NLP for both the drug (450 vs 240 minutes) and police (390 vs 220 minutes) questions.

Conclusions

NLP provides both a foundation to code qualitatively more quickly and a method to validate qualitative findings. NLP methods were able to identify major themes found with traditional qualitative analysis but were not useful in identifying nuances. Traditional qualitative text analysis added important details and context.

Patel YR, Robbins JM, Kurgansky KE, Imran T, Orkaby AR, McLean RR, Ho YL, Cho K, Michael Gaziano J, Djousse L, Gagnon DR, Joseph J. Development and validation of a heart failure with preserved ejection fraction cohort using electronic medical records. BMC Cardiovasc Disord 2018;18:128. [PMID: 29954337 PMCID: PMC6022342 DOI: 10.1186/s12872-018-0866-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 12/15/2017] [Accepted: 06/20/2018] [Indexed: 01/14/2023] Open

Affiliation(s)

Yash R Patel Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA.,Mount Sinai St Luke's & Mount Sinai West Hospitals, New York, NY, USA
Jeremy M Robbins Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA.,Division of Cardiology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
Katherine E Kurgansky Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA
Tasnim Imran Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA.,Boston Medical Center, Boston University School of Medicine, Boston, MA, USA
Ariela R Orkaby Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA.,Geriatric Research, Education and Clinical Center (GRECC), Veterans Affairs Boston Healthcare System, Boston, MA, USA
Robert R McLean Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA.,Institute for Aging Research, Hebrew SeniorLife, Boston, MA, USA.,Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
Yuk-Lam Ho Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA
Kelly Cho Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA
J Michael Gaziano Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA.,Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
Luc Djousse Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA.,Department of Biostatistics, Boston University School of Public Health, Boston, USA
David R Gagnon Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA.,Department of Biostatistics, Boston University School of Public Health, Boston, USA
Jacob Joseph Massachusetts Veterans Epidemiology and Research Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA.,Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.,Cardiology Section, VA Boston Healthcare System, 1400 VFW Parkway, West Roxbury, MA, 02132, USA.

Guo Y, Zheng G, Fu T, Hao S, Ye C, Zheng L, Liu M, Xia M, Jin B, Zhu C, Wang O, Wu Q, Culver DS, Alfreds ST, Stearns F, Kanov L, Bhatia A, Sylvester KG, Widen E, McElhinney DB, Ling XB. Assessing Statewide All-Cause Future One-Year Mortality: Prospective Study With Implications for Quality of Life, Resource Utilization, and Medical Futility. J Med Internet Res 2018;20:e10311. [PMID: 29866643 PMCID: PMC6066632 DOI: 10.2196/10311] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/08/2018] [Revised: 04/24/2018] [Accepted: 04/26/2018] [Indexed: 01/19/2023] Open

Abstract

Background

For many elderly patients, a disproportionate amount of health care resources and expenditures is spent during the last year of life, despite the discomfort and reduced quality of life associated with many aggressive medical approaches. However, few prognostic tools have focused on predicting all-cause 1-year mortality among elderly patients at a statewide level, an issue that has implications for improving quality of life while distributing scarce resources fairly.

Objective

Using data from a statewide elderly population (aged ≥65 years), we sought to prospectively validate an algorithm to identify patients at risk for dying in the next year for the purpose of minimizing decision uncertainty, improving quality of life, and reducing futile treatment.

Methods

Analysis was performed using electronic medical records from the Health Information Exchange in the state of Maine, which covered records of nearly 95% of the statewide population. The model was developed from 125,896 patients aged at least 65 years who were discharged from any care facility in the Health Information Exchange network from September 5, 2013, to September 4, 2015. Validation was conducted using 153,199 patients with the same inclusion and exclusion criteria from September 5, 2014, to September 4, 2016. Patients were stratified into risk groups. The association between all-cause 1-year mortality and risk factors was screened by chi-squared test and manually reviewed by 2 clinicians. We calculated risk scores for individual patients using a gradient tree-based boost algorithm, which measured the probability of mortality within the next year based on the preceding 1-year clinical profile.

Results

The development sample included 125,896 patients (72,572 women, 57.64%; mean 74.2 [SD 7.7] years). The final validation cohort included 153,199 patients (88,177 women, 57.56%; mean 74.3 [SD 7.8] years). The c-statistic for discrimination was 0.96 (95% CI 0.93-0.98) in the development group and 0.91 (95% CI 0.90-0.94) in the validation cohort. The mortality was 0.99% in the low-risk group, 16.75% in the intermediate-risk group, and 72.12% in the high-risk group. A total of 99 independent risk factors for mortality were identified (reported as odds ratios; 95% CI). Age was at the top of the list (1.41; 1.06-1.48); congestive heart failure (20.90; 15.41-28.08) and different tumor sites were also recognized as driving risk factors, such as cancer of the ovaries (14.42; 2.24-53.04), colon (14.07; 10.08-19.08), and stomach (13.64; 3.26-86.57). Disparities were also found in patients’ social determinants like respiratory hazard index (1.24; 0.92-1.40) and unemployment rate (1.18; 0.98-1.24). Among high-risk patients who expired in our dataset, cerebrovascular accident, amputation, and type 1 diabetes were the top 3 diseases in terms of average cost in the last year of life.

Conclusions

Our study prospectively validated an accurate 1-year risk prediction model and stratification for the elderly population (≥65 years) at risk of mortality with statewide electronic medical record datasets. It should be a valuable adjunct for helping patients to make better quality-of-life choices and alerting caregivers to target high-risk elderly patients for appropriate care and discussions, thus cutting back on futile treatment.

Affiliation(s)

Yanting Guo: School of Management, Zhejiang University, Hangzhou, China; Department of Surgery, Stanford University, Stanford, CA, United States
Gang Zheng: School of Management, Zhejiang University, Hangzhou, China
Tianyun Fu: HBI Solutions Inc, Palo Alto, CA, United States
Shiying Hao: Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
Chengyin Ye: Department of Surgery, Stanford University, Stanford, CA, United States; Department of Health Management, Hangzhou Normal University, Hangzhou, China
Le Zheng: Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
Modi Liu: HBI Solutions Inc, Palo Alto, CA, United States
Minjie Xia: HBI Solutions Inc, Palo Alto, CA, United States
Bo Jin: HBI Solutions Inc, Palo Alto, CA, United States
Chunqing Zhu: HBI Solutions Inc, Palo Alto, CA, United States
Oliver Wang: HBI Solutions Inc, Palo Alto, CA, United States
Qian Wu: Department of Surgery, Stanford University, Stanford, CA, United States; China Electric Power Research Institute, Beijing, China
Devore S Culver: HealthInfoNet, Portland, ME, United States
Shaun T Alfreds: HealthInfoNet, Portland, ME, United States
Frank Stearns: HBI Solutions Inc, Palo Alto, CA, United States
Laura Kanov: HBI Solutions Inc, Palo Alto, CA, United States
Ajay Bhatia: Department of Pediatrics, Stanford University, Stanford, CA, United States
Karl G Sylvester: Department of Surgery, Stanford University, Stanford, CA, United States
Eric Widen: HBI Solutions Inc, Palo Alto, CA, United States
Doff B McElhinney: Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
Xuefeng Bruce Ling: Department of Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States; Department of Epidemiology and Health Statistics, School of Public Health, School of Medicine, Zhejiang University, Hangzhou, China

Shim H, Ailshire J, Zelinski E, Crimmins E. The Health and Retirement Study: Analysis of Associations Between Use of the Internet for Health Information and Use of Health Services at Multiple Time Points. J Med Internet Res 2018;20:e200. [PMID: 29802088 PMCID: PMC5993973 DOI: 10.2196/jmir.8203] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 07/03/2017] [Revised: 12/14/2017] [Accepted: 04/11/2018] [Indexed: 12/20/2022]

Abstract

BACKGROUND

The use of the internet for health information among older people is receiving increasing attention, but how it is associated with chronic health conditions and health service use at concurrent and subsequent time points using nationally representative data is less known.

OBJECTIVE

This study aimed to determine whether the use of the internet for health information is associated with health service utilization and whether the association is affected by specific health conditions.

METHODS

The study used data collected in a technology module from a nationally representative sample of community-dwelling older Americans aged 52 years and above from the 2012 Health and Retirement Study (HRS; N=991). Negative binomial regressions were used to examine the association between use of Web-based health information and the reported health service uses in 2012 and 2014. Analyses included additional covariates adjusting for predisposing, enabling, and need factors. Interactions between the use of the internet for health information and chronic health conditions were also tested.

RESULTS

A total of 48.0% (476/991) of Americans aged 52 years and above reported using Web-based health information. The use of Web-based health information was positively associated with the concurrent reports of doctor visits, but not over 2 years. However, an interaction of using Web-based health information with diabetes showed that users had significantly fewer doctor visits compared with nonusers with diabetes at both times.

CONCLUSIONS

The use of the internet for health information was associated with higher health service use at the concurrent time, but not at the subsequent time. The interaction between the use of the internet for health information and diabetes was significant at both time points, which suggests that health-related internet use may be associated with fewer doctor visits for certain chronic health conditions. Results provide some insight into how Web-based health information may provide an alternative health care resource for managing chronic conditions.

Chen X, Xie H, Wang FL, Liu Z, Xu J, Hao T. A bibliometric analysis of natural language processing in medical research. BMC Med Inform Decis Mak 2018;18:14. [PMID: 29589569 PMCID: PMC5872501 DOI: 10.1186/s12911-018-0594-x] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Indexed: 12/29/2022]

Abstract

Background

Natural language processing (NLP) has come to play an increasingly significant role in advancing medicine. A rich body of research on NLP methods and applications for medical information processing is available. It is of great significance to conduct an in-depth analysis to understand the recent development of the NLP-empowered medical research field. However, few studies examining the research status of this field could be found. Therefore, this study aims to quantitatively assess the academic output of NLP in the medical research field.

Methods

We conducted a bibliometric analysis of NLP-empowered medical research publications retrieved from PubMed for the period 2007–2016. The analysis focused on three aspects. First, the literature distribution characteristics were obtained with a statistical analysis method. Second, a network analysis method was used to reveal scientific collaboration relations. Finally, thematic discovery and evolution were examined using an affinity propagation clustering method.

Results

There were 1405 NLP-empowered medical research publications published during the 10 years, with an average annual growth rate of 18.39%. The 10 most productive publication sources together contributed more than 50% of the total publications. The USA had the highest number of publications. A moderately significant correlation between a country’s publications and its GDP per capita was revealed. Denny, Joshua C was the most productive author. Mayo Clinic was the most productive affiliation. The annual co-affiliation and co-country rates reached 64.04% and 15.79% in 2016, respectively. Ten main thematic areas were identified, including Computational biology, Terminology mining, Information extraction, Text classification, Social medium as data source, Information retrieval, etc.

Conclusions

A bibliometric analysis of NLP-empowered medical research publications for uncovering the recent research status is presented. The results can assist relevant researchers, especially newcomers, in understanding the research development systematically, seeking scientific cooperation partners, optimizing research topic choices, and monitoring new scientific or technological activities.

Afzal N, Mallipeddi VP, Sohn S, Liu H, Chaudhry R, Scott CG, Kullo IJ, Arruda-Olson AM. Natural language processing of clinical notes for identification of critical limb ischemia. Int J Med Inform 2018;111:83-89. [PMID: 29425639 PMCID: PMC5808583 DOI: 10.1016/j.ijmedinf.2017.12.024] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 09/05/2017] [Revised: 12/17/2017] [Accepted: 12/27/2017] [Indexed: 12/27/2022]

Duncan I, Fitzner K, Handmaker KE. Augmented Intelligence: Enhancing the Roles of Health Actuaries and Health Economists for Population Health Management. Popul Health Manag 2017;21:341-343. [PMID: 29064330 DOI: 10.1089/pop.2017.0146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]

Wi CI, Sohn S, Rolfes MC, Seabright A, Ryu E, Voge G, Bachman KA, Park MA, Kita H, Croghan IT, Liu H, Juhn YJ. Application of a Natural Language Processing Algorithm to Asthma Ascertainment. An Automated Chart Review. Am J Respir Crit Care Med 2017;196:430-437. [PMID: 28375665 DOI: 10.1164/rccm.201610-2006oc] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Indexed: 12/21/2022]

Névéol A, Zweigenbaum P. Making Sense of Big Textual Data for Health Care: Findings from the Section on Clinical Natural Language Processing. Yearb Med Inform 2017;26:228-234. [PMID: 29063569 PMCID: PMC6239234 DOI: 10.15265/iy-2017-027] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 08/18/2017] [Indexed: 02/01/2023]

Hao S, Fu T, Wu Q, Jin B, Zhu C, Hu Z, Guo Y, Zhang Y, Yu Y, Fouts T, Ng P, Culver DS, Alfreds ST, Stearns F, Sylvester KG, Widen E, McElhinney DB, Ling XB. Estimating One-Year Risk of Incident Chronic Kidney Disease: Retrospective Development and Validation Study Using Electronic Medical Record Data From the State of Maine. JMIR Med Inform 2017;5:e21. [PMID: 28747298 PMCID: PMC5550735 DOI: 10.2196/medinform.7954] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 05/01/2017] [Revised: 06/29/2017] [Accepted: 07/10/2017] [Indexed: 01/28/2023]

Abstract

Background

Chronic kidney disease (CKD) is a major public health concern in the United States with high prevalence, growing incidence, and serious adverse outcomes.

Objective

We aimed to develop and validate a model to identify patients at risk of receiving a new diagnosis of CKD (incident CKD) during the next 1 year in a general population.

Methods

The study population consisted of patients who had visited any care facility in the Maine Health Information Exchange network at any time between January 1, 2013, and December 31, 2015, and had no history of CKD diagnosis. Two retrospective cohorts of electronic medical records (EMRs) were constructed for model derivation (N=1,310,363) and validation (N=1,430,772). The model was derived using a gradient tree boosting algorithm to assign each individual a score that measured the probability of receiving a new diagnosis of CKD from January 1, 2014, to December 31, 2014, based on the preceding 1-year clinical profile. A feature selection process was conducted to reduce the dimension of the data from 14,680 EMR features to 146 predictors in the final model. Relative risk was calculated by the model as the ratio of the individual's risk of receiving a CKD diagnosis in the next 1 year to the population mean. The model was tested on the validation cohort to predict risk of CKD diagnosis in the period from January 1, 2015, to December 31, 2015, using the preceding 1-year clinical profile.

Results

The final model had a c-statistic of 0.871 in the validation cohort. It stratified patients into low-risk (score 0-0.005), intermediate-risk (score 0.005-0.05), and high-risk (score ≥ 0.05) levels. The incidence of CKD in the high-risk patient group was 7.94%, 13.7 times the incidence in the overall cohort (0.58%). Survival analysis showed that patients in the 3 risk categories had significantly different CKD outcomes as a function of time (P<.001), indicating an effective classification of patients by the model.
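The stratification and the incidence comparison reported above can be sketched in a few lines. The cutoffs 0.005 and 0.05 and the incidences 7.94% and 0.58% come from the abstract; the function names are hypothetical, and assigning a score that falls exactly on a cutoff to the higher level is an assumption, since the abstract states this explicitly only for the high-risk boundary.

```python
# Score cutoffs as reported in the abstract.
LOW_CUTOFF = 0.005
HIGH_CUTOFF = 0.05


def risk_level(score):
    """Map a model score to a risk level.

    Assumption: a score exactly on a cutoff goes to the higher level.
    """
    if score >= HIGH_CUTOFF:
        return "high"
    if score >= LOW_CUTOFF:
        return "intermediate"
    return "low"


def incidence_ratio(group_incidence, overall_incidence):
    """Ratio of a group's incidence to the overall cohort incidence."""
    return group_incidence / overall_incidence


# Incidences (in percent) reported in the abstract.
ratio = incidence_ratio(7.94, 0.58)
print(f"high-risk vs overall incidence: {ratio:.1f}x")

# Hypothetical example scores, for illustration only.
for s in (0.001, 0.02, 0.3):
    print(s, risk_level(s))
```

The ratio of 7.94 to 0.58 rounds to 13.7, which matches the figure quoted in the Results.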

Conclusions

We developed and validated a model that is able to identify patients at high risk of having CKD in the next 1 year by statistically learning from the EMR-based clinical history in the preceding 1 year. Identification of these patients indicates care opportunities such as monitoring and adopting intervention plans that may benefit the quality of care and outcomes in the long term.

Affiliation(s)

Shiying Hao: Department of Epidemiology and Health Statistics, School of Public Health, School of Medicine, Zhejiang University, Hangzhou, China; Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
Tianyun Fu: HBI Solutions Inc, Palo Alto, CA, United States
Qian Wu: Department of Surgery, Stanford University, Stanford, CA, United States; China Electric Power Research Institute, Beijing, China
Bo Jin: HBI Solutions Inc, Palo Alto, CA, United States
Chunqing Zhu: HBI Solutions Inc, Palo Alto, CA, United States
Zhongkai Hu: Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
Yanting Guo: Department of Surgery, Stanford University, Stanford, CA, United States; School of Management, Zhejiang University, Hangzhou, China
Yan Zhang: Department of Surgery, Stanford University, Stanford, CA, United States; Department of Oncology, The First Hospital of Shijiazhuang, Shijiazhuang, China
Yunxian Yu: Department of Epidemiology and Health Statistics, School of Public Health, School of Medicine, Zhejiang University, Hangzhou, China
Terry Fouts: Empactful Capital, San Francisco, CA, United States
Phillip Ng: Sequoia Hospital, Redwood City, CA, United States
Devore S Culver: HealthInfoNet, Portland, ME, United States
Shaun T Alfreds: HealthInfoNet, Portland, ME, United States
Frank Stearns: HBI Solutions Inc, Palo Alto, CA, United States
Karl G Sylvester: Department of Surgery, Stanford University, Stanford, CA, United States
Eric Widen: HBI Solutions Inc, Palo Alto, CA, United States
Doff B McElhinney: Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
Xuefeng B Ling: Department of Epidemiology and Health Statistics, School of Public Health, School of Medicine, Zhejiang University, Hangzhou, China; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States; Department of Surgery, Stanford University, Stanford, CA, United States

Defining and characterizing the critical transition state prior to the type 2 diabetes disease. PLoS One 2017;12:e0180937. [PMID: 28686739 PMCID: PMC5501620 DOI: 10.1371/journal.pone.0180937] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 03/16/2017] [Accepted: 06/24/2017] [Indexed: 11/19/2022]