Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Wang L, Luo L, Wang Y, Wampfler J, Yang P, Liu H. Natural language processing for populating lung cancer clinical research data. BMC Med Inform Decis Mak 2019;19:239. [PMID: 31801515 PMCID: PMC6894100 DOI: 10.1186/s12911-019-0931-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023] Open

For:	Wang L, Luo L, Wang Y, Wampfler J, Yang P, Liu H. Natural language processing for populating lung cancer clinical research data. BMC Med Inform Decis Mak 2019;19:239. [PMID: 31801515 PMCID: PMC6894100 DOI: 10.1186/s12911-019-0931-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023] Open

Number

Cited by Other Article(s)

Kenney RC, Chen X, Shintani K, Gagnon C, Liu J, DaCosta Byfield S, Ochs L, Currie AM. Validation of Non-Small Cell Lung Cancer Clinical Insights Using a Generalized Oncology Natural Language Processing Model. JCO Clin Cancer Inform 2024;8:e2300099. [PMID: 39230200 DOI: 10.1200/cci.23.00099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Revised: 04/02/2024] [Accepted: 06/18/2024] [Indexed: 09/05/2024] Open

Abstract

PURPOSE

Limited studies have used natural language processing (NLP) in the context of non-small cell lung cancer (NSCLC). This study aimed to validate the application of an NLP model to an NSCLC cohort by extracting NSCLC concepts from free-text medical notes and converting them to structured, interpretable data.

METHODS

Patients with a lung neoplasm, NSCLC histology, and treatment information in their notes were selected from a repository of over 27 million patients. From these, 200 were randomly selected for this study with the longest and the most recent note included for each patient. An NLP model developed and validated on a large solid and blood cancer oncology cohort was applied to this NSCLC cohort. Two certified tumor registrars and a curator abstracted concepts from the notes: neoplasm, histology, stage, TNM values, and metastasis sites. This manually abstracted gold standard was compared with the NLP model output. Precision and recall scores were calculated.

RESULTS

The NLP model extracted the NSCLC concepts with excellent precision and recall with the following scores, respectively: Lung neoplasm 100% and 100%, NSCLC histology 99% and 88%, histology correctly linked to neoplasm 98% and 79%, stage value 98.8% and 92%, stage TNM value 93% and 98%, and metastasis site 97% and 89%. High precision is related to a low number of false positives, and therefore, extracted concepts are likely accurate. High recall indicates that the model captured most of the desired concepts.

CONCLUSION

This study validates that Optum's oncology NLP model has high precision and recall with clinical real-world data and is a reliable model to support research studies and clinical trials. This validation study shows that our nonspecific solid tumor and blood cancer oncology model is generalizable to successfully extract clinical information from specific cancer cohorts.

Collapse

Wi S, Goldhoff PE, Fuller LA, Grewal K, Wentzensen N, Clarke MA, Lorey TS. Using Natural Language Processing to Improve Discrete Data Capture From Interpretive Cervical Biopsy Diagnoses at a Large Health Care Organization. Arch Pathol Lab Med 2023;147:222-226. [PMID: 35390126 DOI: 10.5858/arpa.2021-0410-oa] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 12/15/2021] [Indexed: 02/05/2023]

Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz IB, Wang N, Yang P, Xu H, Warner JL, Liu H. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. JCO Clin Cancer Inform 2022;6:e2200006. [PMID: 35917480 PMCID: PMC9470142 DOI: 10.1200/cci.22.00006] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2022] [Revised: 03/18/2022] [Accepted: 06/15/2022] [Indexed: 11/20/2022] Open

Zhou S, Wang N, Wang L, Liu H, Zhang R. CancerBERT: a cancer domain-specific language model for extracting breast cancer phenotypes from electronic health records. J Am Med Inform Assoc 2022;29:1208-1216. [PMID: 35333345 PMCID: PMC9196678 DOI: 10.1093/jamia/ocac040] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2021] [Revised: 03/06/2022] [Accepted: 03/09/2022] [Indexed: 11/16/2022] Open

Yoo S, Yoon E, Boo D, Kim B, Kim S, Paeng JC, Yoo IR, Choi IY, Kim K, Ryoo HG, Lee SJ, Song E, Joo YH, Kim J, Lee HY. Transforming Thyroid Cancer Diagnosis and Staging Information from Unstructured Reports to the Observational Medical Outcome Partnership Common Data Model. Appl Clin Inform 2022;13:521-531. [PMID: 35705182 PMCID: PMC9200482 DOI: 10.1055/s-0042-1748144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022] Open

Abstract

BACKGROUND

Cancer staging information is an essential component of cancer research. However, the information is primarily stored as either a full or semistructured free-text clinical document which is limiting the data use. By transforming the cancer-specific data to the Observational Medical Outcome Partnership Common Data Model (OMOP CDM), the information can contribute to establish multicenter observational cancer studies. To the best of our knowledge, there have been no studies on OMOP CDM transformation and natural language processing (NLP) for thyroid cancer to date.

OBJECTIVE

We aimed to demonstrate the applicability of the OMOP CDM oncology extension module for thyroid cancer diagnosis and cancer stage information by processing free-text medical reports.

METHODS

Thyroid cancer diagnosis and stage-related modifiers were extracted with rule-based NLP from 63,795 thyroid cancer pathology reports and 56,239 Iodine whole-body scan reports from three medical institutions in the Observational Health Data Sciences and Informatics data network. The data were converted into the OMOP CDM v6.0 according to the OMOP CDM oncology extension module. The cancer staging group was derived and populated using the transformed CDM data.

RESULTS

The extracted thyroid cancer data were completely converted into the OMOP CDM. The distributions of histopathological types of thyroid cancer were approximately 95.3 to 98.8% of papillary carcinoma, 0.9 to 3.7% of follicular carcinoma, 0.04 to 0.54% of adenocarcinoma, 0.17 to 0.81% of medullary carcinoma, and 0 to 0.3% of anaplastic carcinoma. Regarding cancer staging, stage-I thyroid cancer accounted for 55 to 64% of the cases, while stage III accounted for 24 to 26% of the cases. Stage-II and -IV thyroid cancers were detected at a low rate of 2 to 6%.

CONCLUSION

As a first study on OMOP CDM transformation and NLP for thyroid cancer, this study will help other institutions to standardize thyroid cancer-specific data for retrospective observational research and participate in multicenter studies.

Collapse

Affiliation(s)

Sooyoung Yoo Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
Eunsil Yoon Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
Dachung Boo Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
Borham Kim Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
Seok Kim Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
Jin Chul Paeng Department of Nuclear Medicine, Seoul National University, College of Medicine, Seoul, South Korea
Ie Ryung Yoo Division of Nuclear Medicine, Department of Radiology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, South Korea
In Young Choi Department of Medical Informatics, The Catholic University of Korea, College of Medicine, Seoul, South Korea.,Department of Biomedicine and Health Sciences, The Catholic University of Korea, College of Medicine, Seoul, South Korea
Kwangsoo Kim Transdisciplinary Department of Medicine and Advanced Technology, Seoul National University Hospital, Seoul, South Korea
Hyun Gee Ryoo Department of Nuclear Medicine, Seoul National University Hospital, Seoul, South Korea.,Department of Nuclear Medicine, Seoul National University Bundang Hospital, Seongnam, South Korea
Sun Jung Lee Department of Medical Informatics, The Catholic University of Korea, College of Medicine, Seoul, South Korea.,Department of Biomedicine and Health Sciences, The Catholic University of Korea, College of Medicine, Seoul, South Korea
Eunhye Song Department of Data Science Research, Innovative Medical Technology Research Institute, Seoul National University Hospital, Seoul, South Korea
Young-Hwan Joo Biomedical Research Institute, Seoul National University Hospital, Seoul, South Korea
Junmo Kim Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, South Korea
Ho-Young Lee Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea.,Department of Nuclear Medicine, Seoul National University, College of Medicine, Seoul, South Korea

Collapse

Bernstam EV, Shireman PK, Meric‐Bernstam F, N. Zozus M, Jiang X, Brimhall BB, Windham AK, Schmidt S, Visweswaran S, Ye Y, Goodrum H, Ling Y, Barapatre S, Becich MJ. Artificial intelligence in clinical and translational science: Successes, challenges and opportunities. Clin Transl Sci 2022;15:309-321. [PMID: 34706145 PMCID: PMC8841416 DOI: 10.1111/cts.13175] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2021] [Accepted: 10/01/2021] [Indexed: 01/12/2023] Open

Affiliation(s)

Elmer V. Bernstam School of Biomedical InformaticsThe University of Texas Health Science Center at HoustonHoustonTexasUSA Division of General Internal MedicineDepartment of Internal MedicineMcGovern Medical SchoolThe University of Texas Health Science Center at HoustonHoustonTexasUSA
Paula K. Shireman Departments of Surgery and MicrobiologyImmunology & Molecular GeneticsUniversity of Texas Health San AntonioSan AntonioTexasUSA University HealthSan AntonioTexasUSA South Texas Veterans Health Care SystemSan AntonioTexasUSA
Funda Meric‐Bernstam Department of Investigational Cancer TherapeuticsThe University of Texas MD Anderson Cancer CenterHoustonTexasUSA
Meredith N. Zozus Division of Clinical Research InformaticsDepartment of Population Health SciencesUniversity of Texas Health San AntonioSan AntonioTexasUSA
Xiaoqian Jiang School of Biomedical InformaticsThe University of Texas Health Science Center at HoustonHoustonTexasUSA
Bradley B. Brimhall University HealthSan AntonioTexasUSA Department of PathologyUniversity of Texas Health San AntonioSan AntonioTexasUSA
Ashley K. Windham University HealthSan AntonioTexasUSA Department of PathologyUniversity of Texas Health San AntonioSan AntonioTexasUSA
Susanne Schmidt Department of Population Health SciencesUniversity of Texas Health San AntonioSan AntonioTexasUSA
Shyam Visweswaran Department of Biomedical InformaticsUniversity of Pittsburgh School of MedicinePittsburghPennsylvaniaUSA
Ye Ye Department of Biomedical InformaticsUniversity of Pittsburgh School of MedicinePittsburghPennsylvaniaUSA
Heath Goodrum School of Biomedical InformaticsThe University of Texas Health Science Center at HoustonHoustonTexasUSA
Yaobin Ling School of Biomedical InformaticsThe University of Texas Health Science Center at HoustonHoustonTexasUSA
Seemran Barapatre Department of Biomedical InformaticsUniversity of Pittsburgh School of MedicinePittsburghPennsylvaniaUSA
Michael J. Becich Department of Biomedical InformaticsUniversity of Pittsburgh School of MedicinePittsburghPennsylvaniaUSA

Collapse

Zhan X, Long H, Gou F, Duan X, Kong G, Wu J. A Convolutional Neural Network-Based Intelligent Medical System with Sensors for Assistive Diagnosis and Decision-Making in Non-Small Cell Lung Cancer. SENSORS 2021;21:s21237996. [PMID: 34884000 PMCID: PMC8659811 DOI: 10.3390/s21237996] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/25/2021] [Revised: 11/26/2021] [Accepted: 11/28/2021] [Indexed: 12/15/2022]

Gianfrancesco MA, Goldstein ND. A narrative review on the validity of electronic health record-based research in epidemiology. BMC Med Res Methodol 2021;21:234. [PMID: 34706667 PMCID: PMC8549408 DOI: 10.1186/s12874-021-01416-5] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2021] [Accepted: 09/28/2021] [Indexed: 11/10/2022] Open

Bitterman DS, Miller TA, Mak RH, Savova GK. Clinical Natural Language Processing for Radiation Oncology: A Review and Practical Primer. Int J Radiat Oncol Biol Phys 2021;110:641-655. [PMID: 33545300 DOI: 10.1016/j.ijrobp.2021.01.044] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2020] [Revised: 12/22/2020] [Accepted: 01/23/2021] [Indexed: 02/07/2023]

Abstract

Natural language processing (NLP), which aims to convert human language into expressions that can be analyzed by computers, is one of the most rapidly developing and widely used technologies in the field of artificial intelligence. Natural language processing algorithms convert unstructured free text data into structured data that can be extracted and analyzed at scale. In medicine, this unlocking of the rich, expressive data within clinical free text in electronic medical records will help untap the full potential of big data for research and clinical purposes. Recent major NLP algorithmic advances have significantly improved the performance of these algorithms, leading to a surge in academic and industry interest in developing tools to automate information extraction and phenotyping from clinical texts. Thus, these technologies are poised to transform medical research and alter clinical practices in the future. Radiation oncology stands to benefit from NLP algorithms if they are appropriately developed and deployed, as they may enable advances such as automated inclusion of radiation therapy details into cancer registries, discovery of novel insights about cancer care, and improved patient data curation and presentation at the point of care. However, challenges remain before the full value of NLP is realized, such as the plethora of jargon specific to radiation oncology, nonstandard nomenclature, a lack of publicly available labeled data for model development, and interoperability limitations between radiation oncology data silos. Successful development and implementation of high quality and high value NLP models for radiation oncology will require close collaboration between computer scientists and the radiation oncology community. Here, we present a primer on artificial intelligence algorithms in general and NLP algorithms in particular; provide guidance on how to assess the performance of such algorithms; review prior research on NLP algorithms for oncology; and describe future avenues for NLP in radiation oncology research and clinics.

Collapse

Mensa E, Colla D, Dalmasso M, Giustini M, Mamo C, Pitidis A, Radicioni DP. Violence detection explanation via semantic roles embeddings. BMC Med Inform Decis Mak 2020;20:263. [PMID: 33059690 PMCID: PMC7559980 DOI: 10.1186/s12911-020-01237-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2020] [Accepted: 09/02/2020] [Indexed: 11/22/2022] Open

Abstract

Background

Emergency room reports pose specific challenges to natural language processing techniques. In this setting, violence episodes on women, elderly and children are often under-reported. Categorizing textual descriptions as containing violence-related injuries (V) vs. non-violence-related injuries (NV) is thus a relevant task to the ends of devising alerting mechanisms to track (and prevent) violence episodes.

Methods

We present ViDeS (so dubbed after Violence Detection System), a system to detect episodes of violence from narrative texts in emergency room reports. It employs a deep neural network for categorizing textual ER reports data, and complements such output by making explicit which elements corroborate the interpretation of the record as reporting about violence-related injuries. To these ends we designed a novel hybrid technique for filling semantic frames that employs distributed representations of terms herein, along with syntactic and semantic information. The system has been validated on real data annotated with two sorts of information: about the presence vs. absence of violence-related injuries, and about some semantic roles that can be interpreted as major cues for violent episodes, such as the agent that committed violence, the victim, the body district involved, etc.. The employed dataset contains over 150K records annotated with class (V,NV) information, and 200 records with finer-grained information on the aforementioned semantic roles.

Results

We used data coming from an Italian branch of the EU-Injury Database (EU-IDB) project, compiled by hospital staff. Categorization figures approach full precision and recall for negative cases and.97 precision and.94 recall on positive cases. As regards as the recognition of semantic roles, we recorded an accuracy varying from.28 to.90 according to the semantic roles involved. Moreover, the system allowed unveiling annotation errors committed by hospital staff.

Conclusions

Explaining systems’ results, so to make their output more comprehensible and convincing, is today necessary for AI systems. Our proposal is to combine distributed and symbolic (frame-like) representations as a possible answer to such pressing request for interpretability. Although presently focused on the medical domain, the proposed methodology is general and, in principle, it can be extended to further application areas and categorization tasks.

Collapse

Wang Y, Xu H, Uzuner O. Editorial: The second international workshop on health natural language processing (HealthNLP 2019). BMC Med Inform Decis Mak 2019;19:233. [PMID: 31801516 PMCID: PMC6894102 DOI: 10.1186/s12911-019-0930-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open