Reference Citation Analysis

For: Spasić I, Livsey J, Keane JA, Nenadić G. Text mining of cancer-related information: review of current status and future directions. Int J Med Inform 2014;83:605-23. [PMID: 25008281 DOI: 10.1016/j.ijmedinf.2014.06.009] [Citation(s) in RCA: 112] [Impact Index Per Article: 11.2] [Received: 10/23/2013] [Revised: 06/12/2014] [Accepted: 06/14/2014] [Indexed: 12/21/2022]

Cited by Other Article(s)

Ahmad F, Muhmood T. Clinical translation of nanomedicine with integrated digital medicine and machine learning interventions. Colloids Surf B Biointerfaces 2024;241:114041. [PMID: 38897022 DOI: 10.1016/j.colsurfb.2024.114041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 06/11/2024] [Accepted: 06/13/2024] [Indexed: 06/21/2024]

Kapoor R, Sleeman WC, Ghosh P, Palta J. Infrastructure tools to support an effective Radiation Oncology Learning Health System. J Appl Clin Med Phys 2023;24:e14127. [PMID: 37624227 PMCID: PMC10562037 DOI: 10.1002/acm2.14127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 07/17/2023] [Accepted: 07/19/2023] [Indexed: 08/26/2023]

Abstract

PURPOSE

Radiation Oncology Learning Health System (RO-LHS) is a promising approach to improving the quality of care by integrating clinical, dosimetry, treatment delivery, and research data in real time. This paper describes a novel set of tools to support the development of an RO-LHS and the current challenges they can address.

METHODS

We present a knowledge graph-based approach to map radiotherapy data from clinical databases to an ontology-based data repository using FAIR concepts. This strategy ensures that the data are easily discoverable and accessible and can be used by other clinical decision support systems. It allows for visualization, presentation, and analysis of valuable information to identify trends and patterns in patient outcomes. We designed a search engine that uses ontology-based keyword searching and synonym-based term matching, leverages the hierarchical nature of ontologies to retrieve patient records based on parent and child classes, and connects to the BioPortal database to retrieve relevant clinical attributes. To identify similar patients, a method involving text corpus creation and vector embedding models (Word2Vec, Doc2Vec, GloVe, and FastText) is employed, using cosine similarity and distance metrics.
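
To make the similarity measures concrete, here is a minimal pure-Python sketch of cosine similarity, Euclidean distance, and Manhattan distance between patient vectors, plus a nearest-patient lookup by cosine similarity. It assumes each patient has already been turned into a fixed-length embedding by one of the models named above; the patient IDs and three-dimensional vectors are hypothetical, and this is not the authors' implementation.

```python
from __future__ import annotations

import math
from typing import Mapping, Sequence


def _check_lengths(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    _check_lengths(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance; smaller means closer."""
    _check_lengths(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences (L1 distance)."""
    _check_lengths(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def most_similar(query: Sequence[float], patients: Mapping[str, Sequence[float]]) -> str:
    """ID of the patient whose embedding has the highest cosine similarity to `query`."""
    return max(patients, key=lambda pid: cosine_similarity(query, patients[pid]))


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings; real models use far more dimensions.
    patients = {"patient_A": [0.9, 0.1, 0.0], "patient_B": [0.1, 0.8, 0.3]}
    query = [1.0, 0.2, 0.0]
    print(most_similar(query, patients))  # patient_A
```

Cosine similarity compares only the direction of two vectors, whereas Euclidean and Manhattan distances also depend on their magnitude, so one embedding model can rank highest on cosine similarity while another yields smaller distances.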

RESULTS

The data pipeline and tool were tested with 1,660 patient clinical and dosimetry records, resulting in 504,180 RDF (Resource Description Framework) tuples, and data relationships were visualized using graph-based representations. Patient similarity analysis using embedding models showed that the Word2Vec model had the highest mean cosine similarity, while the GloVe model exhibited more compact embeddings with lower Euclidean and Manhattan distances.

CONCLUSIONS

The framework and tools described support the development of a RO-LHS. By integrating diverse data sources and facilitating data discovery and analysis, they contribute to continuous learning and improvement in patient care. The tools enhance the quality of care by enabling the identification of cohorts, clinical decision support, and the development of clinical studies and machine learning programs in radiation oncology.

Luo N, Zhong X, Su L, Cheng Z, Ma W, Hao P. Artificial intelligence-assisted dermatology diagnosis: From unimodal to multimodal. Comput Biol Med 2023;165:107413. [PMID: 37703714 DOI: 10.1016/j.compbiomed.2023.107413] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/12/2023] [Revised: 08/02/2023] [Accepted: 08/28/2023] [Indexed: 09/15/2023]

Berloco F, Ciavarella S, Colucci S, Grieco LA, Guarini A, Zaccaria GM. ARGO 2.0: a Hybrid NLP/ML Framework for Diagnosis Standardization. Annu Int Conf IEEE Eng Med Biol Soc 2023;2023:1-4. [PMID: 38083100 DOI: 10.1109/embc40787.2023.10340022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]

Honghong H, Xin Yi FL, Tianyu GG, Jiangchou MH, Hao Sen AF, Hui San EC, Yen Tze EB, Zhuling ST, Sun Sien HH, Shyi Peng JY, Aixin S, Kheng Sit JL. Natural language processing in urology: Automated extraction of clinical information from histopathology reports of uro-oncology procedures. Heliyon 2023;9:e14793. [PMID: 37025805 PMCID: PMC10070081 DOI: 10.1016/j.heliyon.2023.e14793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 03/16/2023] [Accepted: 03/17/2023] [Indexed: 03/30/2023]

Abstract

Objectives

We aimed to automate routine extraction of clinically relevant unstructured information from uro-oncological histopathology reports by applying rule-based and machine learning (ML)/deep learning (DL) methods to develop an oncology-focused natural language processing (NLP) algorithm.

Methods

Our algorithm employs a combination of a rule-based approach and support vector machines/neural networks (BioBERT/Clinical BERT) and is optimised for accuracy. We randomly extracted 5772 uro-oncological histology reports dated 2008 to 2018 from electronic health records (EHRs) and split the data into training and validation datasets in an 80:20 ratio. The training dataset was annotated by medical professionals and reviewed by cancer registrars. The validation dataset was annotated by cancer registrars and defined as the gold standard against which the algorithm outputs were compared. The accuracy of NLP-parsed data was measured against these human annotations. Following our cancer registry's definition for professional human extraction, we defined an accuracy rate of >95% as "acceptable".

Results

There were 11 extraction variables in 268 free-text reports. Our algorithm achieved accuracy rates between 61.2% and 99.0%. Eight of the 11 data fields met the acceptable accuracy standard, while the remaining 3 had accuracy rates between 61.2% and 89.7%. Notably, the rule-based approach was more effective and robust in extracting the variables of interest. In contrast, the ML/DL models had poorer predictive performance owing to highly imbalanced data distributions and to differences in writing style between the reports and the data used to pre-train the domain-specific models.

Conclusion

We designed an NLP algorithm that can automate clinical information extraction accurately from histopathology reports with an overall average micro accuracy of 93.3%.
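
The abstract reports an overall average micro accuracy alongside a per-field acceptability threshold. The sketch below, with hypothetical field names and counts, contrasts micro accuracy (pooling every extraction decision) with macro accuracy (averaging per-field accuracies) and applies the >95% cut-off to each field. It assumes accuracy is simply matched extractions divided by annotated instances; it is not the authors' evaluation code.

```python
from __future__ import annotations

from dataclasses import dataclass

# Registry threshold quoted in the abstract: accuracy strictly above 95% is "acceptable".
ACCEPTABLE_ACCURACY = 0.95


@dataclass
class FieldResult:
    """Agreement between NLP output and gold-standard annotation for one data field."""
    name: str
    correct: int  # extractions that match the gold standard
    total: int    # annotated instances of this field

    @property
    def accuracy(self) -> float:
        return self.correct / self.total

    @property
    def acceptable(self) -> bool:
        return self.accuracy > ACCEPTABLE_ACCURACY


def micro_accuracy(results: list[FieldResult]) -> float:
    """Pool every extraction decision before dividing, so larger fields weigh more."""
    return sum(r.correct for r in results) / sum(r.total for r in results)


def macro_accuracy(results: list[FieldResult]) -> float:
    """Average the per-field accuracies, so every field weighs the same."""
    return sum(r.accuracy for r in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    fields = [FieldResult("tumour_grade", 198, 200), FieldResult("margin_status", 30, 50)]
    for f in fields:
        print(f.name, round(f.accuracy, 3), "acceptable" if f.acceptable else "below threshold")
    print("micro:", round(micro_accuracy(fields), 3))  # micro: 0.912
    print("macro:", round(macro_accuracy(fields), 3))  # macro: 0.795
```

Because micro accuracy weights fields by their number of instances, a few poorly extracted but rare fields pull it down less than they pull down the macro average.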

Analysis of Risk Factors of Coal Chemical Enterprises Based on Text Mining. J Environ Public Health 2023;2023:4181159. [PMID: 36747503 PMCID: PMC9899145 DOI: 10.1155/2023/4181159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Revised: 09/20/2022] [Accepted: 10/10/2022] [Indexed: 01/29/2023]

Classification and diagnostic prediction of breast cancer metastasis on clinical data using machine learning algorithms. Sci Rep 2023;13:485. [PMID: 36627367 PMCID: PMC9831019 DOI: 10.1038/s41598-023-27548-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 02/03/2022] [Accepted: 01/04/2023] [Indexed: 01/12/2023]

Eggermont C, Wakkee M, Bruggink A, Voorham Q, Schreuder K, Louwman M, Mooyaart A, Hollestein L. Development and Validation of an Algorithm to Identify Patients with Advanced Cutaneous Squamous Cell Carcinoma from Pathology Reports. J Invest Dermatol 2023;143:98-104.e5. [PMID: 35926654 DOI: 10.1016/j.jid.2022.07.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/20/2022] [Revised: 06/22/2022] [Accepted: 07/11/2022] [Indexed: 10/16/2022]

Laurent G, Craynest F, Thobois M, Hajjaji N. Automatic Classification of Tumor Response From Radiology Reports With Rule-Based Natural Language Processing Integrated Into the Clinical Oncology Workflow. JCO Clin Cancer Inform 2023;7:e2200139. [PMID: 36780606 DOI: 10.1200/cci.22.00139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023]

Abstract

PURPOSE

Imaging reports in oncology provide critical information about disease evolution that should be shared promptly to tailor clinical decision making and care coordination for patients with advanced cancer. However, tumor response information remains unstructured in free text and underexploited. Natural language processing (NLP) methods can help bring this critical information into the electronic health record (EHR) in real time to assist health care workers.

METHODS

A rule-based algorithm was developed using SAS tools to automatically extract tumor response and categorize it as progression or no progression. A total of 2,970 magnetic resonance imaging, computed tomography, and positron emission tomography reports written in French were extracted from the EHR of a large comprehensive cancer center to build a 2,637-document training set and a 603-document validation set. The model was also tested on 189 imaging reports from 46 different radiology centers. A tumor dashboard was created in the EHR using the Timeline tool of the vis.js JavaScript library.

RESULTS

An NLP methodology was applied to create an ontology of radiographic terms defining tumor response, to map text to five main concepts, and to apply decision rules based on the RECIST guidelines used in clinical practice. The model achieved an overall accuracy of 0.88 (ranging from 0.87 to 0.94), with similar performance for progression and no-progression classification. The overall accuracy was 0.82 on reports from different radiology centers. Data were visualized and organized in a dynamic tumor response timeline. This tool was deployed successfully at our institution, both retrospectively and prospectively, as part of an automatic pipeline that screens reports and classifies tumor response in real time for all patients with metastatic disease.
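
A rule-based classifier of this kind maps text to concepts and then applies a decision rule. The Python sketch below is a toy illustration, not the authors' SAS implementation: the English term lists, the 40-character negation window, and the single rule (any non-negated progression concept means progression) are all assumptions, whereas the published system used French reports, five concepts, and RECIST-based rules.

```python
from __future__ import annotations

import re

# Hypothetical English term lists standing in for an ontology of tumor-response concepts.
CONCEPT_PATTERNS = {
    "progression": re.compile(r"\b(progression|progressive disease|new lesions?)\b", re.IGNORECASE),
    "stable": re.compile(r"\b(stable disease|unchanged)\b", re.IGNORECASE),
    "response": re.compile(r"\b(partial response|complete response|regression)\b", re.IGNORECASE),
}
# A negation cue ending at most two words before the matched term, e.g. "no new lesions".
NEGATION = re.compile(r"\b(no|without|absence of)\s+(?:\w+\s+){0,2}$", re.IGNORECASE)


def find_concepts(report: str) -> set[str]:
    """Map report text to concepts, skipping mentions preceded by a negation cue."""
    found: set[str] = set()
    for concept, pattern in CONCEPT_PATTERNS.items():
        for match in pattern.finditer(report):
            preceding = report[max(0, match.start() - 40):match.start()]
            if not NEGATION.search(preceding):
                found.add(concept)
    return found


def classify_response(report: str) -> str:
    """Toy decision rule: any non-negated progression concept means 'progression'."""
    return "progression" if "progression" in find_concepts(report) else "no progression"


if __name__ == "__main__":
    print(classify_response("Stable disease. No new lesions."))  # no progression
    print(classify_response("Progression of liver metastases."))  # progression
```

Keeping the concept mapping separate from the decision rule means the rule can be revised (for example, to follow RECIST thresholds more closely) without touching the term lists.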

CONCLUSION

Our approach provides an NLP-based framework to structure and classify tumor response from the EHR and integrate tumor response classification into the clinical oncology workflow.

Nundloll V, Smail R, Stevens C, Blair G. Automating the extraction of information from a historical text and building a linked data model for the domain of ecology and conservation science. Heliyon 2022;8:e10710. [PMID: 36262290 PMCID: PMC9573881 DOI: 10.1016/j.heliyon.2022.e10710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 08/05/2022] [Accepted: 09/15/2022] [Indexed: 11/06/2022]

Abstract

Data heterogeneity is a pressing issue, and it is further compounded when the data come from textual documents. The unstructured nature of such documents means that collating, comparing, and analysing the information they contain can be challenging. Automating these processes can help to unlock insightful knowledge that would otherwise remain buried in them. Moreover, integrating the information extracted from the documents with other related information enables more information-rich queries. In this context, the paper presents a comprehensive review of text extraction and data integration techniques to enable this automation in an ecological context. The paper investigates extracting valuable floristic information from a historical botany journal, with the aim of bringing relevant pieces of information in the document to light. In addition, the paper explores the need to integrate the extracted information with related information from disparate sources. All the information is then rendered into a queryable form so that unified queries can be made. To achieve this, the paper combines Machine Learning, Natural Language Processing, and Semantic Web techniques. The proposed approach is demonstrated through the information extracted from the journal and the information-rich queries enabled by the integration process. The paper shows that the approach has merit in extracting relevant information from the journal, discusses how the machine learning models were designed to classify complex information, and reports their performance. The paper also shows that the approach has merit in terms of query time when querying floristic information from a multi-source linked data model.

Automatic Text Summarization of Biomedical Text Data: A Systematic Review. Information 2022. [DOI: 10.3390/info13080393] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/25/2023]

Automated medical chart review for breast cancer outcomes research: a novel natural language processing extraction system. BMC Med Res Methodol 2022;22:136. [PMID: 35549854 PMCID: PMC9101856 DOI: 10.1186/s12874-022-01583-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2021] [Accepted: 03/15/2022] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Manually extracted data points from health records are collated at the institutional, provincial, and national levels to facilitate clinical research. However, the labour-intensive clinical chart review process places an increasing burden on healthcare system budgets. Therefore, an automated information extraction system is needed to ensure the timeliness and scalability of research data.

METHODS

We used a dataset of 100 synoptic operative and 100 pathology reports, with each report type split evenly into training and test sets of 50 reports. The training set guided our development of a Natural Language Processing (NLP) extraction pipeline, which accepts scanned images of operative and pathology reports. The system uses a combination of rule-based and transfer learning methods to extract numeric encodings from text. We also developed visualization tools to compare the manual and automated extractions. The code for this paper was made available on GitHub.

RESULTS

A test set of 50 operative and 50 pathology reports was used to evaluate the extraction accuracy of the NLP pipeline. The gold standard, defined as manual extraction by expert reviewers, yielded accuracies of 90.5% for operative reports and 96.0% for pathology reports, while the NLP system achieved overall accuracies of 91.9% (operative) and 95.4% (pathology). The pipeline successfully extracted outcomes data pertinent to breast cancer tumor characteristics (e.g. presence of invasive carcinoma, size, histologic type), prognostic factors (e.g. number of lymph nodes with micro-metastases and macro-metastases, pathologic stage), and treatment-related variables (e.g. margins, neo-adjuvant treatment, surgical indication) with high accuracy. Of the 48 variables across the operative and pathology codebooks, NLP yielded 43 variables with F-scores of at least 0.90; in comparison, a trained human annotator yielded 44 variables with F-scores of at least 0.90.
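
The variable counts above rest on per-variable F-scores. Assuming "F-score" means the balanced F1 score and that true positive (TP), false positive (FP), and false negative (FN) counts are available for each variable, the threshold check can be sketched as follows; the variable names and counts are hypothetical.

```python
from __future__ import annotations


def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score (F1) = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0  # 0.0 by convention when undefined


def variables_meeting_threshold(
    counts: dict[str, tuple[int, int, int]], threshold: float = 0.90
) -> list[str]:
    """Names of variables whose F-score is at least `threshold`."""
    return [name for name, (tp, fp, fn) in counts.items() if f_score(tp, fp, fn) >= threshold]


if __name__ == "__main__":
    # Hypothetical (TP, FP, FN) counts for two codebook variables.
    counts = {"invasive_carcinoma": (45, 3, 2), "pathologic_stage": (30, 10, 10)}
    print(variables_meeting_threshold(counts))  # ['invasive_carcinoma']
```

The form 2TP / (2TP + FP + FN) is algebraically equal to 2PR / (P + R) whenever TP > 0, and it avoids dividing by zero when precision or recall is undefined.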

CONCLUSIONS

The NLP system achieves near-human-level accuracy in both operative and pathology reports using a minimal curated dataset. This system uniquely provides a robust solution for transparent, adaptable, and scalable automation of data extraction from patient health records. It may serve to advance breast cancer clinical research by facilitating collection of vast amounts of valuable health data at a population level.

Yoo S, Yoon E, Boo D, Kim B, Kim S, Paeng JC, Yoo IR, Choi IY, Kim K, Ryoo HG, Lee SJ, Song E, Joo YH, Kim J, Lee HY. Transforming Thyroid Cancer Diagnosis and Staging Information from Unstructured Reports to the Observational Medical Outcome Partnership Common Data Model. Appl Clin Inform 2022;13:521-531. [PMID: 35705182 PMCID: PMC9200482 DOI: 10.1055/s-0042-1748144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]

Abstract

BACKGROUND

Cancer staging information is an essential component of cancer research. However, the information is primarily stored either as fully free-text or as semistructured clinical documents, which limits its use. By transforming the cancer-specific data to the Observational Medical Outcome Partnership Common Data Model (OMOP CDM), the information can contribute to establishing multicenter observational cancer studies. To the best of our knowledge, there have been no studies on OMOP CDM transformation and natural language processing (NLP) for thyroid cancer to date.

OBJECTIVE

We aimed to demonstrate the applicability of the OMOP CDM oncology extension module for thyroid cancer diagnosis and cancer stage information by processing free-text medical reports.

METHODS

Thyroid cancer diagnosis and stage-related modifiers were extracted with rule-based NLP from 63,795 thyroid cancer pathology reports and 56,239 iodine whole-body scan reports from three medical institutions in the Observational Health Data Sciences and Informatics data network. The data were converted into the OMOP CDM v6.0 according to the OMOP CDM oncology extension module. The cancer staging group was derived and populated using the transformed CDM data.

RESULTS

The extracted thyroid cancer data were completely converted into the OMOP CDM. The distribution of histopathological types of thyroid cancer was approximately 95.3 to 98.8% papillary carcinoma, 0.9 to 3.7% follicular carcinoma, 0.04 to 0.54% adenocarcinoma, 0.17 to 0.81% medullary carcinoma, and 0 to 0.3% anaplastic carcinoma. Regarding cancer staging, stage I thyroid cancer accounted for 55 to 64% of cases, while stage III accounted for 24 to 26%. Stage II and IV thyroid cancers were detected at a low rate of 2 to 6%.

CONCLUSION

As the first study on OMOP CDM transformation and NLP for thyroid cancer, this study will help other institutions standardize thyroid cancer-specific data for retrospective observational research and participate in multicenter studies.

Affiliation(s)

Sooyoung Yoo Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
Eunsil Yoon Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
Dachung Boo Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
Borham Kim Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
Seok Kim Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
Jin Chul Paeng Department of Nuclear Medicine, Seoul National University, College of Medicine, Seoul, South Korea
Ie Ryung Yoo Division of Nuclear Medicine, Department of Radiology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, South Korea
In Young Choi Department of Medical Informatics, The Catholic University of Korea, College of Medicine, Seoul, South Korea; Department of Biomedicine and Health Sciences, The Catholic University of Korea, College of Medicine, Seoul, South Korea
Kwangsoo Kim Transdisciplinary Department of Medicine and Advanced Technology, Seoul National University Hospital, Seoul, South Korea
Hyun Gee Ryoo Department of Nuclear Medicine, Seoul National University Hospital, Seoul, South Korea; Department of Nuclear Medicine, Seoul National University Bundang Hospital, Seongnam, South Korea
Sun Jung Lee Department of Medical Informatics, The Catholic University of Korea, College of Medicine, Seoul, South Korea; Department of Biomedicine and Health Sciences, The Catholic University of Korea, College of Medicine, Seoul, South Korea
Eunhye Song Department of Data Science Research, Innovative Medical Technology Research Institute, Seoul National University Hospital, Seoul, South Korea
Young-Hwan Joo Biomedical Research Institute, Seoul National University Hospital, Seoul, South Korea
Junmo Kim Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, South Korea
Ho-Young Lee Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea; Department of Nuclear Medicine, Seoul National University, College of Medicine, Seoul, South Korea

Dipnall JF, Lu J, Gabbe BJ, Cosic F, Edwards E, Page R, Du L. Comparison of state-of-the-art machine and deep learning algorithms to classify proximal humeral fractures using radiology text. Eur J Radiol 2022;153:110366. [DOI: 10.1016/j.ejrad.2022.110366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Revised: 04/08/2022] [Accepted: 05/16/2022] [Indexed: 12/01/2022]

Viscosi C, Fidelbo P, Benedetto A, Varvarà M, Ferrante M. Selection of diagnosis with oncologic relevance information from histopathology free text reports: A machine learning approach. Int J Med Inform 2022;160:104714. [DOI: 10.1016/j.ijmedinf.2022.104714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2021] [Revised: 01/22/2022] [Accepted: 02/03/2022] [Indexed: 10/19/2022]

Rule-Based Information Extraction from Free-Text Pathology Reports Reveals Trends in South African Female Breast Cancer Molecular Subtypes and Ki67 Expression. Biomed Res Int 2022;2022:6157861. [PMID: 35355821 PMCID: PMC8960023 DOI: 10.1155/2022/6157861] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/08/2021] [Accepted: 12/29/2021] [Indexed: 12/23/2022]

Chinese named-entity recognition via self-attention mechanism and position-aware influence propagation embedding. Data Knowl Eng 2022. [DOI: 10.1016/j.datak.2022.101983] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/20/2022]

Musa IH, Afolabi LO, Zamit I, Musa TH, Musa HH, Tassang A, Akintunde TY, Li W. Artificial Intelligence and Machine Learning in Cancer Research: A Systematic and Thematic Analysis of the Top 100 Cited Articles Indexed in Scopus Database. Cancer Control 2022;29:10732748221095946. [PMID: 35688650 PMCID: PMC9189515 DOI: 10.1177/10732748221095946] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 01/03/2023]

Zaccaria GM, Colella V, Colucci S, Clemente F, Pavone F, Vegliante MC, Esposito F, Opinto G, Scattone A, Loseto G, Minoia C, Rossini B, Quinto AM, Angiulli V, Grieco LA, Fama A, Ferrero S, Moia R, Di Rocco A, Quaglia FM, Tabanelli V, Guarini A, Ciavarella S. Electronic case report forms generation from pathology reports by ARGO, automatic record generator for onco-hematology. Sci Rep 2021;11:23823. [PMID: 34893665 PMCID: PMC8664934 DOI: 10.1038/s41598-021-03204-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/01/2021] [Accepted: 11/23/2021] [Indexed: 12/04/2022]

Affiliation(s)

Gian Maria Zaccaria Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
Vito Colella Department of Electrical and Information Engineering, Politecnico of Bari, Bari, Italy
Simona Colucci Department of Electrical and Information Engineering, Politecnico of Bari, Bari, Italy
Felice Clemente Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
Fabio Pavone Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
Maria Carmela Vegliante Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
Flavia Esposito Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy; Department of Mathematics, University of Bari Aldo Moro, Bari, Italy
Giuseppina Opinto Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
Anna Scattone Pathology Department, IRCCS Istituto Tumori 'Giovanni Paolo II', Bari, Italy
Giacomo Loseto Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
Carla Minoia Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
Bernardo Rossini Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
Angela Maria Quinto Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
Vito Angiulli Clinical Engineering Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Bari, Italy
Luigi Alfredo Grieco Department of Electrical and Information Engineering, Politecnico of Bari, Bari, Italy
Angelo Fama Hematology, Azienda USL - IRCCS Di Reggio Emilia, Reggio Emilia, Italy
Simone Ferrero Division of Hematology 1, AOU "Città Della Salute e Della Scienza di Torino", Torino, Italy; Department of Molecular Biotechnologies and Health Sciences, University of Torino, Torino, Italy
Riccardo Moia Division of Hematology, Azienda Ospedaliero-Universitaria Maggiore Della Carità Di Novara, Novara, Italy
Alice Di Rocco Unit of Hematology, Azienda Ospedaliero-Universitaria Policlinico Umberto I, Roma, Italy
Francesca Maria Quaglia Department of Medicine, Section of Hematology, University of Verona, Verona, Italy
Valentina Tabanelli Division of Diagnostic Haematopathology, European Institute of Oncology, IRCCS, Milano, Italy
Attilio Guarini Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
Sabino Ciavarella Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy

Cassim N, Mapundu M, Olago V, Celik T, George JA, Glencross DK. Using text mining techniques to extract prostate cancer predictive information (Gleason score) from semi-structured narrative laboratory reports in the Gauteng province, South Africa. BMC Med Inform Decis Mak 2021;21:330. [PMID: 34823522 PMCID: PMC8614040 DOI: 10.1186/s12911-021-01697-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2021] [Accepted: 11/18/2021] [Indexed: 12/24/2022]

Abstract

Background

Prostate cancer (PCa) is the leading male neoplasm in South Africa with an age-standardised incidence rate of 68.0 per 100,000 population in 2018. The Gleason score (GS) is the strongest predictive factor for PCa treatment and is embedded within semi-structured prostate biopsy narrative reports. The manual extraction of the GS is labour-intensive. The objective of our study was to explore the use of text mining techniques to automate the extraction of the GS from irregularly reported text-intensive patient reports.

Methods

We used the associated Systematized Nomenclature of Medicine clinical terms morphology and topography codes to identify prostate biopsies with a PCa diagnosis for men aged > 30 years between 2006 and 2016 in the Gauteng Province, South Africa. We developed a text mining algorithm to extract the GS from 1000 biopsy reports with a PCa diagnosis from the National Health Laboratory Service database and validated the algorithm using 1000 biopsies from the private sector. The logical steps for the algorithm were data acquisition, pre-processing, feature extraction, feature value representation, feature selection, information extraction, classification, and discovered knowledge. We evaluated the algorithm using precision, recall and F-score. The GS was manually coded by two experts for both datasets. The top five GS were reported, with the remaining scores categorised as “Other” for both datasets. The percentage of biopsies with a high-risk GS (≥ 8) was also reported.

Results

The first output reported an F-score of 0.99, which improved to 1.00 after the algorithm was amended to ignore the GS reported in the clinical history. For the validation dataset, an F-score of 0.99 was reported. In the National Health Laboratory Service dataset, the most commonly reported GS were 5 + 4 = 9 (17.6%), 3 + 3 = 6 (17.5%), 4 + 3 = 7 (16.4%), 3 + 4 = 7 (14.7%) and 4 + 4 = 8 (14.2%). For the validation dataset, the most commonly reported GS were: (i) 3 + 3 = 6 (37.7%), (ii) 3 + 4 = 7 (19.4%), (iii) 4 + 3 = 7 (14.9%), (iv) 4 + 4 = 8 (10.0%) and (v) 4 + 5 = 9 (7.4%). A high-risk GS was reported for 31.8% of biopsies in the National Health Laboratory Service dataset, compared with 17.4% in the validation dataset.
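
The extraction task above can be illustrated with a regular expression. This is a minimal sketch, not the authors' in-house algorithm: it assumes a hypothetical report layout in which sections begin with lines such as "Clinical history:", skips that section (mirroring the amendment described above), accepts only scores whose two components add up to the stated total, and counts a GS of 8 or more as high risk.

```python
from __future__ import annotations

import re
from collections import Counter

# "primary + secondary = total", e.g. "3 + 4 = 7" or "Gleason 4+5=9" (hypothetical wording).
GLEASON = re.compile(r"\b([1-5])\s*\+\s*([1-5])\s*=\s*(\d{1,2})\b")
# Assumed layout: each section starts with a line such as "Clinical history: ...".
SECTION_HEADER = re.compile(r"^\s*([A-Za-z ]+):")
HIGH_RISK_THRESHOLD = 8  # GS >= 8 counted as high risk, as in the abstract


def extract_gleason(report: str) -> str | None:
    """Return the first arithmetically consistent GS found outside the clinical history."""
    section = ""
    for line in report.splitlines():
        header = SECTION_HEADER.match(line)
        if header:
            section = header.group(1).strip().lower()
        if section == "clinical history":
            continue  # scores quoted from earlier biopsies are ignored
        for match in GLEASON.finditer(line):
            primary, secondary, total = (int(g) for g in match.groups())
            if primary + secondary == total:
                return f"{primary} + {secondary} = {total}"
    return None


def is_high_risk(score: str) -> bool:
    return int(score.rsplit("=", 1)[1]) >= HIGH_RISK_THRESHOLD


def summarise(reports: list[str]) -> tuple[list[tuple[str, int]], float]:
    """Top five scores and the percentage of high-risk scores among reports with a GS."""
    scores = [s for s in map(extract_gleason, reports) if s is not None]
    top_five = Counter(scores).most_common(5)
    high_risk_pct = 100 * sum(map(is_high_risk, scores)) / len(scores) if scores else 0.0
    return top_five, high_risk_pct


if __name__ == "__main__":
    report = (
        "Clinical history: previous biopsy 3 + 3 = 6\n"
        "Microscopy: adenocarcinoma, Gleason score 4 + 5 = 9"
    )
    print(extract_gleason(report))  # 4 + 5 = 9
```

Normalising every match to the same "a + b = c" string lets the top-five tally count "3+3=6" and "3 + 3 = 6" as one score.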

Conclusions

We demonstrated reliable extraction of information about GS from narrative text-based patient reports using an in-house developed text mining algorithm. A secondary outcome was that late presentation could be assessed.

Xiao Y, Zheng X, Song W, Tong F, Mao Y, Liu S, Zhao D. CIDO-COVID-19: An Ontology for COVID-19 Based on CIDO. Annu Int Conf IEEE Eng Med Biol Soc 2021;2021:2119-2122. [PMID: 34891707 DOI: 10.1109/embc46164.2021.9629555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]

Lin C, Lee YT, Wu FJ, Lin SA, Hsu CJ, Lee CC, Tsai DJ, Fang WH. The Application of Projection Word Embeddings on Medical Records Scoring System. Healthcare (Basel) 2021;9:1298. [PMID: 34682978 PMCID: PMC8544381 DOI: 10.3390/healthcare9101298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Revised: 09/24/2021] [Accepted: 09/28/2021] [Indexed: 11/16/2022]

Abstract

Medical records scoring is important in a health care system. Projection word embeddings, which retain both the vocabulary diversity of open internet databases and the medical terminology understanding of electronic health records (EHRs), have been validated for artificial intelligence (AI)-based disease coding tasks. We considered that an AI-enhanced system might also be applied to score medical records automatically. This study aimed to develop a series of deep learning models (DLMs) and validate their performance in a medical records scoring task. We also analyzed the practical value of the best model. We used admission medical records from the Tri-Service General Hospital from January 2016 to May 2020, which had been scored by visiting staff of different levels from different departments. Scores ranged from 0 to 10. All samples were divided by time into a training set (n = 74,959) and a testing set (n = 152,730), which were used to train and validate the DLMs, respectively. The mean absolute error (MAE) was used to evaluate the performance of each DLM. Before adjustment, the scores predicted by the BERT architecture were closer to the actual reviewer scores than those predicted by the projection word embedding and LSTM architecture: the MAE was 0.84 ± 0.27 for the BERT model and 1.00 ± 0.32 for the LSTM model. A linear mixed model improved performance, bringing the adjusted predicted scores closer to the actual scores than the original predictions. After this linear mixed model adjustment, however, the projection word embedding with the LSTM model (MAE 0.66 ± 0.39) outperformed BERT (0.70 ± 0.33; p < 0.001). In addition to comparing different architectures for scoring medical records, this study used a linear mixed model to adjust the AI-predicted medical record scores so that they more closely matched the actual physicians' scores.
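The MAE used above is the average absolute gap between each predicted score and the reviewer's score. The sketch below computes it; the scores in any example call are invented and are not study data. It does not reproduce the "±" spread reported in the abstract, whose basis is not stated there.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average of |predicted - actual| over paired scores."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
```

For instance, predictions of 7.5, 8.0 and 6.0 against reviewer scores of 8, 8 and 7 give an MAE of 0.5.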

Affiliation(s)

Chin Lin School of Medicine, National Defense Medical Center, Taipei 114, Taiwan; School of Public Health, National Defense Medical Center, Taipei 114, Taiwan; Graduate Institute of Life Sciences, National Defense Medical Center, Taipei 114, Taiwan; Artificial Intelligence of Things Center, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
Yung-Tsai Lee Division of Cardiovascular Surgery, Cheng Hsin Rehabilitation and Medical Center, Taipei 112, Taiwan
Feng-Jen Wu Department of Informatics, Taoyuan Armed Forces General Hospital, Taoyuan 325, Taiwan
Shing-An Lin Department of Medical Informatics, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
Chia-Jung Hsu Department of Medical Informatics, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
Chia-Cheng Lee Department of Medical Informatics, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan; Division of Colorectal Surgery, Department of Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
Dung-Jang Tsai School of Public Health, National Defense Medical Center, Taipei 114, Taiwan; Graduate Institute of Life Sciences, National Defense Medical Center, Taipei 114, Taiwan; Artificial Intelligence of Things Center, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan. Corresponding author; Tel.: +886-2-8792-3100 (ext. #18305); Fax: +886-2-8792-3147
Wen-Hui Fang Artificial Intelligence of Things Center, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan; Department of Family and Community Medicine, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan. Corresponding author; Tel.: +886-2-8792-3100 (ext. #12322); Fax: +886-2-8792-3147

Conceição SIR, Couto FM. Text Mining for Building Biomedical Networks Using Cancer as a Case Study. Biomolecules 2021;11:1430. [PMID: 34680062 PMCID: PMC8533101 DOI: 10.3390/biom11101430] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/09/2021] [Revised: 09/24/2021] [Accepted: 09/27/2021] [Indexed: 12/15/2022]

Rossi KR, Echeverria D, Carroll A, Luse T, Rennix C. Development and Evaluation of Perl-Based Algorithms to Classify Neoplasms From Pathology Records in Synoptic Report Format. JCO Clin Cancer Inform 2021;5:295-303. [PMID: 33760628 DOI: 10.1200/cci.20.00152] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]

Abstract

PURPOSE

Synoptic reporting provides a mechanism for uniform, structured pathology diagnostics. This paper demonstrates the use of Perl alternation and grouping expressions to classify electronic pathology reports generated at military treatment facilities. Eight Perl-based algorithms are validated to classify malignant melanoma, Hodgkin lymphoma, non-Hodgkin lymphoma, leukemia, and malignant neoplasms of the breast, ovary, testis, and thyroid.

METHODS

Case finding cohorts were developed using diagnostic codes for neoplasm groups and matched by unique identifiers to obtain pathology records. Preprocessing techniques and Perl-based algorithms were applied to classify records as malignant, in situ, suspect, or nonapplicable, followed by a hand-review process to determine the accuracy of the algorithm classifications. Interrater reliability, sensitivity, specificity, positive predictive values, and negative predictive values were computed following abstractor adjudication.
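The abstract names Perl alternation (`a|b|c`) and grouping (`(...)`) as the core technique. The sketch below applies the same idea with Python's `re` module, whose syntax for these constructs matches Perl's, to assign one of the four labels named in the methods.

Everything specific in it is a hypothetical placeholder, not the published SAS/Perl patterns:

- the term lists;
- the negative lookahead for "in situ";
- the rule order;
- the function name `classify_report`.

Real reports would need much richer negation and context handling.

```python
import re

# Illustrative patterns only: each group (?:...) wraps an alternation a|b|c.
NEGATED = re.compile(
    r"\b(?:no|negative\s+for|without)\s+(?:evidence\s+of\s+)?"
    r"(?:malignancy|carcinoma|tumou?r)\b",
    re.IGNORECASE,
)
SUSPECT = re.compile(
    r"\b(?:suspicious\s+for|cannot\s+(?:be\s+ruled|rule)\s+out)\b", re.IGNORECASE
)
# The negative lookahead keeps e.g. "carcinoma in situ" out of the malignant label.
MALIGNANT = re.compile(
    r"\b(?:carcinoma|melanoma|lymphoma|leuka?emia|malignan(?:t|cy))\b"
    r"(?!\s+in[\s-]situ)",
    re.IGNORECASE,
)
IN_SITU = re.compile(r"\b(?:in[\s-]situ|DCIS|LCIS)\b", re.IGNORECASE)


def classify_report(text: str) -> str:
    """Assign one label; checking rules in this fixed order is a simplifying assumption."""
    if NEGATED.search(text):
        return "nonapplicable"
    if SUSPECT.search(text):
        return "suspect"
    if MALIGNANT.search(text):
        return "malignant"
    if IN_SITU.search(text):
        return "in situ"
    return "nonapplicable"
```

With these patterns, "Ductal carcinoma in situ (DCIS)" is labelled "in situ", while "Invasive ductal carcinoma, grade 2." is labelled "malignant".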

RESULTS

The specificity of the Perl-based algorithms was consistently high, above 98%. Very few benign results were classified as malignant or in situ; the leukemia algorithm was the only group with a positive predictive value below 95%, at 91.9%. Three algorithm classification groups had a sensitivity below 80%: malignant neoplasm of the ovary (33.3%), leukemia (52.8%), and non-Hodgkin lymphoma (62.9%). The pathology records in these groups showed substantial linguistic variation.
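The four measures in this abstract all come from the same 2 × 2 table of algorithm versus adjudicated labels. The sketch below shows how they are derived. The counts in any example call are invented.

The pattern reported above shows up in these formulas. Specificity and PPV stay high when false positives are rare. Sensitivity drops whenever reports are missed (false negatives), as can happen when wording varies too much for fixed patterns.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticMetrics:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    ppv: float          # TP / (TP + FP), positive predictive value
    npv: float          # TN / (TN + FN), negative predictive value


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> DiagnosticMetrics:
    """Compute the four measures; an empty denominator yields NaN."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return DiagnosticMetrics(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
        npv=ratio(tn, tn + fn),
    )
```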

CONCLUSION

This paper describes the utility and value of algorithm logic built around synoptic reporting for identifying neoplasms from electronic pathology results. A major strength is the application of Perl-based coding in SAS, an accessible software application, to develop highly specific algorithms across institutional variation in diagnostic documentation.

Thompson BS, Hardy S, Pandeya N, Dusingize JC, Green AC, Millane A, Bourke D, Grande R, Bean CD, Olsen CM, Whiteman DC. Web Application for the Automated Extraction of Diagnosis and Site From Pathology Reports for Keratinocyte Cancers. JCO Clin Cancer Inform 2021;4:711-723. [PMID: 32755460 PMCID: PMC7469600 DOI: 10.1200/cci.19.00152] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/24/2022]

Abstract

PURPOSE

Keratinocyte cancers are exceedingly common in high-risk populations, but accurate measures of incidence are seldom derived because the burden of manually reviewing pathology reports to extract relevant diagnostic information is excessive. Thus, we sought to develop supervised learning algorithms for classifying basal and squamous cell carcinomas and other diagnoses, as well as disease site, and incorporate these into a Web application capable of processing large numbers of pathology reports.

METHODS

Participants in the QSkin study were recruited in 2011 and comprised men and women aged 40-69 years at baseline (N = 43,794) who were randomly selected from a population register in Queensland, Australia. Histologic data were manually extracted from free-text pathology reports for participants with histologically confirmed keratinocyte cancers for whom a pathology report was available (n = 25,786 reports). This provided a training data set for the development of algorithms capable of deriving diagnosis and site from free-text pathology reports. We calculated agreement statistics between algorithm-derived classifications and 3 independent validation data sets of manually abstracted pathology reports.

RESULTS

The agreement for classifications of basal cell carcinoma (κ = 0.97 and κ = 0.96) and squamous cell carcinoma (κ = 0.93 for both) was almost perfect in 2 validation data sets but was slightly lower for a third (κ = 0.82 and κ = 0.90, respectively). Agreement for total counts of specific diagnoses was also high (κ > 0.8). Similar levels of agreement between algorithm-derived and manually extracted data were observed for classifications of keratoacanthoma and intraepidermal carcinoma.
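The κ values above measure agreement beyond chance: κ = (p_o - p_e) / (1 - p_e). Here p_o is the observed proportion of agreement and p_e is the agreement expected from each rater's label frequencies. The abstract does not say which kappa variant was used. The sketch below assumes plain (unweighted) Cohen's kappa between algorithm-derived and manually abstracted labels, and the label names in any example are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # p_o: proportion of items on which both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout: kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)
```

For example, labels `["BCC", "BCC", "SCC", "SCC"]` against `["BCC", "SCC", "SCC", "SCC"]` give p_o = 0.75, p_e = 0.5 and κ = 0.5.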

CONCLUSION

Supervised learning methods were used to develop a Web application capable of accurately and rapidly classifying large numbers of pathology reports for keratinocyte cancers and related diagnoses. Such tools may provide the means to accurately measure subtype-specific skin cancer incidence.

Torous VF, Simpson RW, Balani JP, Baras AS, Berman MA, Birdsong GG, Giannico GA, Paner GP, Pettus JR, Sessions Z, Sirintrapun SJ, Srigley JR, Spencer S. College of American Pathologists Cancer Protocols: From Optimizing Cancer Patient Care to Facilitating Interoperable Reporting and Downstream Data Use. JCO Clin Cancer Inform 2021;5:47-55. [PMID: 33439728 PMCID: PMC8140812 DOI: 10.1200/cci.20.00104] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 01/21/2023]

Jung E, Jain H, Sinha AP, Gaudioso C. Building a specialized lexicon for breast cancer clinical trial subject eligibility analysis. Health Informatics J 2021;27:1460458221989392. [PMID: 33535885 DOI: 10.1177/1460458221989392] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]

Deshmukh PR, Phalnikar R. Prognostic elements extraction from documents to detect prognostic stage. Comput Methods Biomech Biomed Engin 2021;25:371-386. [PMID: 34319178 DOI: 10.1080/10255842.2021.1955359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]

Deshmukh PR, Phalnikar R. Information extraction for prognostic stage prediction from breast cancer medical records using NLP and ML. Med Biol Eng Comput 2021;59:1751-1772. [PMID: 34297300 DOI: 10.1007/s11517-021-02399-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/01/2020] [Accepted: 07/01/2021] [Indexed: 11/24/2022]

Kapoor R, Sleeman WC, Nalluri JJ, Turner P, Bose P, Cherevko A, Srinivasan S, Syed K, Ghosh P, Hagan M, Palta JR. Automated data abstraction for quality surveillance and outcome assessment in radiation oncology. J Appl Clin Med Phys 2021;22:177-187. [PMID: 34101349 PMCID: PMC8292697 DOI: 10.1002/acm2.13308] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/03/2021] [Revised: 04/22/2021] [Accepted: 05/10/2021] [Indexed: 11/24/2022]

Affiliation(s)

Rishabh Kapoor Department of Radiation Oncology, Virginia Commonwealth University, Richmond, VA, USA; National Radiation Oncology Program, US Veterans Healthcare Administration, Richmond, VA, USA
William C Sleeman Department of Radiation Oncology, Virginia Commonwealth University, Richmond, VA, USA; National Radiation Oncology Program, US Veterans Healthcare Administration, Richmond, VA, USA
Joseph J Nalluri Department of Radiation Oncology, Virginia Commonwealth University, Richmond, VA, USA; National Radiation Oncology Program, US Veterans Healthcare Administration, Richmond, VA, USA
Paul Turner Department of Radiation Oncology, Virginia Commonwealth University, Richmond, VA, USA; National Radiation Oncology Program, US Veterans Healthcare Administration, Richmond, VA, USA
Priyankar Bose Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
Andrii Cherevko Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
Sriram Srinivasan Department of Radiation Oncology, Virginia Commonwealth University, Richmond, VA, USA; National Radiation Oncology Program, US Veterans Healthcare Administration, Richmond, VA, USA
Khajamoinuddin Syed Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
Preetam Ghosh Department of Radiation Oncology, Virginia Commonwealth University, Richmond, VA, USA; Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
Michael Hagan Department of Radiation Oncology, Virginia Commonwealth University, Richmond, VA, USA; National Radiation Oncology Program, US Veterans Healthcare Administration, Richmond, VA, USA
Jatinder R Palta Department of Radiation Oncology, Virginia Commonwealth University, Richmond, VA, USA; National Radiation Oncology Program, US Veterans Healthcare Administration, Richmond, VA, USA

Withall A, Karystianis G, Duncan D, Hwang YI, Hagos Kidane A, Butler T. Domestic Violence in Residential Care Facilities in New South Wales, Australia: A Text Mining Study. Gerontologist 2021;62:223-231. [PMID: 34023902 DOI: 10.1093/geront/gnab068] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/20/2020] [Indexed: 11/14/2022]

Turchin A, Florez Builes LF. Using Natural Language Processing to Measure and Improve Quality of Diabetes Care: A Systematic Review. J Diabetes Sci Technol 2021;15:553-560. [PMID: 33736486 PMCID: PMC8120048 DOI: 10.1177/19322968211000831] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/14/2022]

Alawad M, Gao S, Qiu JX, Yoon HJ, Blair Christian J, Penberthy L, Mumphrey B, Wu XC, Coyle L, Tourassi G. Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks. J Am Med Inform Assoc 2021;27:89-98. [PMID: 31710668 PMCID: PMC7489089 DOI: 10.1093/jamia/ocz153] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 04/26/2019] [Revised: 07/09/2019] [Accepted: 07/22/2019] [Indexed: 11/13/2022]

Abstract

OBJECTIVE

We implement 2 different multitask learning (MTL) techniques, hard parameter sharing and cross-stitch, to train a word-level convolutional neural network (CNN) specifically designed for automatic extraction of cancer data from unstructured text in pathology reports. We show the importance of learning related information extraction (IE) tasks leveraging shared representations across the tasks to achieve state-of-the-art performance in classification accuracy and computational efficiency.

MATERIALS AND METHODS

Multitask CNN (MTCNN) attempts to tackle document information extraction by learning to extract multiple key cancer characteristics simultaneously. We trained our MTCNN to perform 5 information extraction tasks: (1) primary cancer site (65 classes), (2) laterality (4 classes), (3) behavior (3 classes), (4) histological type (63 classes), and (5) histological grade (5 classes). We evaluated the performance on a corpus of 95 231 pathology documents (71 223 unique tumors) obtained from the Louisiana Tumor Registry. We compared the performance of the MTCNN models against single-task CNN models and 2 traditional machine learning approaches, namely support vector machine (SVM) and random forest classifier (RFC).

RESULTS

MTCNNs offered superior performance across all 5 tasks in terms of classification accuracy as compared with the other machine learning models. Based on retrospective evaluation, the hard parameter sharing and cross-stitch MTCNN models correctly classified 59.04% and 57.93% of the pathology reports respectively across all 5 tasks. The baseline models achieved 53.68% (CNN), 46.37% (RFC), and 36.75% (SVM). Based on prospective evaluation, the percentages of correctly classified cases across the 5 tasks were 60.11% (hard parameter sharing), 58.13% (cross-stitch), 51.30% (single-task CNN), 42.07% (RFC), and 35.16% (SVM). Moreover, hard parameter sharing MTCNNs outperformed the other models in computational efficiency by using about the same number of trainable parameters as a single-task CNN.
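Percentages such as "correctly classified ... across all 5 tasks" read most naturally as an exact-match criterion: a report counts only if every task's prediction is correct. The sketch below computes that measure under this assumed reading. The task names and labels in any example are hypothetical.

```python
from typing import Mapping, Sequence


def all_tasks_accuracy(
    predictions: Sequence[Mapping[str, str]],
    gold: Sequence[Mapping[str, str]],
) -> float:
    """Fraction of reports whose predictions match the gold label on every task."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(
        all(pred.get(task) == label for task, label in truth.items())
        for pred, truth in zip(predictions, gold)
    )
    return correct / len(gold)
```

This criterion is much stricter than per-task accuracy: one wrong task (e.g. laterality) makes the whole report count as incorrect. That is one reason the all-task figures above sit well below typical per-task accuracies.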

CONCLUSIONS

The hard parameter sharing MTCNN offers superior classification accuracy for automated coding support of pathology documents across a wide range of cancers and multiple information extraction tasks while maintaining similar training and inference time as those of a single task-specific model.

Automated Machine Learning for Healthcare and Clinical Notes Analysis. Computers 2021. [DOI: 10.3390/computers10020024] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/16/2022]

Abstract

Machine learning (ML) has been slowly entering every aspect of our lives, and its positive impact has been astonishing. To accelerate embedding ML in more applications and incorporating it into real-world scenarios, automated machine learning (AutoML) is emerging. The main purpose of AutoML is to provide seamless integration of ML in various industries, which will facilitate better outcomes in everyday tasks. In healthcare, AutoML has already been applied in simpler settings with structured data, such as tabular lab data. However, there is still a need to apply AutoML to interpreting medical text, which is being generated at a tremendous rate. A promising direction is AutoML for clinical notes analysis, an unexplored research area that represents a gap in ML research. The main objective of this paper is to fill this gap and provide a comprehensive survey and analytical study of AutoML for clinical notes. To that end, we first introduce AutoML technology and review its various tools and techniques. We then survey the literature on AutoML in the healthcare industry and discuss developments specific to clinical settings, as well as those using general AutoML tools for healthcare applications. With this background, we discuss the challenges of working with clinical notes and highlight the benefits of developing AutoML for medical notes processing. Next, we survey relevant ML research for clinical notes and analyze the literature and the field of AutoML in the healthcare industry. Furthermore, we propose future research directions and shed light on the challenges and opportunities this emerging field holds. With this, we aim to assist the community with the implementation of an AutoML platform for medical notes, which, if realized, could revolutionize patient outcomes.

Integrating Speculation Detection and Deep Learning to Extract Lung Cancer Diagnosis from Clinical Notes. Applied Sciences-Basel 2021. [DOI: 10.3390/app11020865] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/24/2022]

Karystianis G, Simpson A, Adily A, Schofield P, Greenberg D, Wand H, Nenadic G, Butler T. Prevalence of Mental Illnesses in Domestic Violence Police Records: Text Mining Study. J Med Internet Res 2020;22:e23725. [PMID: 33361056 PMCID: PMC7790609 DOI: 10.2196/23725] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/24/2020] [Revised: 09/17/2020] [Accepted: 11/23/2020] [Indexed: 01/22/2023]

Esmaeili M, Ayyoubzadeh SM, Ahmadinejad N, Ghazisaeedi M, Nahvijou A, Maghooli K. A decision support system for mammography reports interpretation. Health Inf Sci Syst 2020;8:17. [PMID: 32257128 PMCID: PMC7113352 DOI: 10.1007/s13755-020-00109-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/01/2019] [Accepted: 03/30/2020] [Indexed: 02/06/2023]

Spasic I, Button K. Patient Triage by Topic Modeling of Referral Letters: Feasibility Study. JMIR Med Inform 2020;8:e21252. [PMID: 33155985 PMCID: PMC7679210 DOI: 10.2196/21252] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/15/2020] [Revised: 09/17/2020] [Accepted: 10/05/2020] [Indexed: 01/22/2023]

Abstract

Background

Musculoskeletal conditions are managed within primary care, but patients can be referred to secondary care if a specialist opinion is required. The ever-increasing demand for health care resources emphasizes the need to streamline care pathways with the ultimate aim of ensuring that patients receive timely and optimal care. Information contained in referral letters underpins the referral decision-making process but is yet to be explored systematically for the purposes of treatment prioritization for musculoskeletal conditions.

Objective

This study aims to explore the feasibility of using natural language processing and machine learning to automate the triage of patients with musculoskeletal conditions by analyzing information from referral letters. Specifically, we aim to determine whether referral letters can be automatically assorted into latent topics that are clinically relevant, that is, considered relevant when prescribing treatments. Here, clinical relevance is assessed by posing 2 research questions. Can latent topics be used to automatically predict treatment? Can clinicians interpret latent topics as cohorts of patients who share common characteristics or experiences such as medical history, demographics, and possible treatments?

Methods

We used latent Dirichlet allocation to model each referral letter as a finite mixture over an underlying set of topics and model each topic as an infinite mixture over an underlying set of topic probabilities. The topic model was evaluated in the context of automating patient triage. Given a set of treatment outcomes, a binary classifier was trained for each outcome using previously extracted topics as the input features of the machine learning algorithm. In addition, a qualitative evaluation was performed to assess the human interpretability of topics.

Results

The binary classifiers outperformed the stratified random classifier in prediction accuracy by a large margin, indicating that topic modeling could be used to predict treatment and thus effectively support patient triage. The qualitative evaluation confirmed the high clinical interpretability of the topic model.
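A stratified random classifier guesses each outcome with its observed frequency, ignoring the letter's content. Assume its guesses are independent of the true label and the guessing frequencies equal the evaluation set's class frequencies. Under those assumptions its expected accuracy is the sum of the squared class proportions, Σ p_k². This gives a rough sense of the baseline the topic-based classifiers were compared against. The sketch and its labels are illustrative only.

```python
from collections import Counter
from typing import Hashable, Sequence


def stratified_random_accuracy(labels: Sequence[Hashable]) -> float:
    """Expected accuracy of guessing each class with its observed frequency."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must be non-empty")
    # Each class k is guessed with probability p_k and is correct with probability p_k.
    return sum((count / n) ** 2 for count in Counter(labels).values())
```

For a binary outcome that is positive in 75% of letters, the baseline is 0.75² + 0.25² = 0.625. This is below the 0.75 achieved by always predicting the majority class.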

Conclusions

The results established the feasibility of using natural language processing and machine learning to automate triage of patients with knee or hip pain by analyzing information from their referral letters.

Quiroz JC, Laranjo L, Tufanaru C, Kocaballi AB, Rezazadegan D, Berkovsky S, Coiera E. Empirical analysis of Zipf's law, power law, and lognormal distributions in medical discharge reports. Int J Med Inform 2020;145:104324. [PMID: 33181446 DOI: 10.1016/j.ijmedinf.2020.104324] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/14/2020] [Revised: 10/04/2020] [Accepted: 10/29/2020] [Indexed: 11/15/2022]

Deshmukh PR, Phalnikar R. Anatomic stage extraction from medical reports of breast cancer patients using natural language processing. Health and Technology 2020. [DOI: 10.1007/s12553-020-00479-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]

Bozkurt S, Paul R, Coquet J, Sun R, Banerjee I, Brooks JD, Hernandez-Boussard T. Phenotyping severity of patient-centered outcomes using clinical notes: A prostate cancer use case. Learn Health Syst 2020;4:e10237. [PMID: 33083539 PMCID: PMC7556418 DOI: 10.1002/lrh2.10237] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/15/2020] [Revised: 06/15/2020] [Accepted: 06/23/2020] [Indexed: 01/12/2023]

Abstract

Introduction

A learning health system (LHS) must improve care in ways that are meaningful to patients, integrating patient‐centered outcomes (PCOs) into its core infrastructure. PCOs are common following cancer treatment, such as urinary incontinence (UI) following prostatectomy. However, PCOs are not systematically recorded because they can only be described by the patient, are subjective, and are captured as unstructured text in the electronic health record (EHR). Therefore, PCOs pose significant challenges for phenotyping patients. Here, we present a natural language processing (NLP) approach for phenotyping patients with UI that classifies their disease into severity subtypes, which can increase opportunities to provide precision‐based therapy and promote a value‐based delivery system.

Methods

Patients undergoing prostate cancer treatment from 2008 to 2018 were identified at an academic medical center. Using a hybrid NLP pipeline that combines rule‐based and deep learning methodologies, we classified positive UI cases as mild, moderate, and severe by mining clinical notes.

Results

The rule‐based model accurately classified UI into disease severity categories (accuracy: 0.86), outperforming the deep learning model (accuracy: 0.73). In the deep learning model, the recall rates for the mild and moderate groups were higher than the precision rates (0.78 and 0.79, respectively). A hybrid model combining both methods did not improve on the accuracy of the rule‐based model but did outperform the deep learning model (accuracy: 0.75).

Conclusion

Phenotyping patients based on the indication and severity of PCOs is essential to advance a patient‐centered LHS. EHRs contain valuable information on PCOs, and by using NLP methods it is feasible to phenotype PCO severity accurately and efficiently. Phenotyping must extend beyond the identification of disease to provide classification of disease severity that can be used to guide treatment and inform shared decision‐making. Our methods demonstrate a path to a patient‐centered LHS that could advance precision medicine.

Identification of Malignancies from Free-Text Histopathology Reports Using a Multi-Model Supervised Machine Learning Approach. Information 2020. [DOI: 10.3390/info11090455] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/29/2022]

Sanders C, Nahar P, Small N, Hodgson D, Ong BN, Dehghan A, Sharp CA, Dixon WG, Lewis S, Kontopantelis E, Daker-White G, Bower P, Davies L, Kayesh H, Spencer R, McAvoy A, Boaden R, Lovell K, Ainsworth J, Nowakowska M, Shepherd A, Cahoon P, Hopkins R, Allen D, Lewis A, Nenadic G. Digital methods to enhance the usefulness of patient experience data in services for long-term conditions: the DEPEND mixed-methods study. Health Services and Delivery Research 2020. [DOI: 10.3310/hsdr08280] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022]

Abstract

Background

Collecting NHS patient experience data is critical to ensure the delivery of high-quality services. Data are obtained from multiple sources, including service-specific surveys and widely used generic surveys. There are concerns about the timeliness of feedback, that some groups of patients and carers do not give feedback, and that free-text feedback may be useful but is difficult to analyse.

Objective

To understand how to improve the collection and usefulness of patient experience data in services for people with long-term conditions using digital data capture and improved analysis of comments.

Design

The DEPEND study is a mixed-methods study with four parts: qualitative research to explore the perspectives of patients, carers and staff; use of computer science text-analytics methods to analyse comments; co-design of new tools to improve data collection and usefulness; and implementation and process evaluation to assess use of the tools and any impacts.

Setting

Services for people with severe mental illness and musculoskeletal conditions at four sites, chosen as exemplars of both mental health and physical long-term conditions: an acute trust (site A), a mental health trust (site B) and two general practices (sites C1 and C2).

Participants

A total of 100 staff members with diverse roles in patient experience management, clinical practice and information technology, 59 patients and 21 carers participated in the qualitative research components.

Interventions

The tools comprised a digital survey completed using a tablet device (kiosk) or a pen-and-paper/online version; guidance and information for patients, carers and staff; text-mining programs; reporting templates; and a process for eliciting and recording verbal feedback in community mental health services.

Results

We found a lack of understanding and experience of the process of giving feedback. People wanted more meaningful and informal feedback to suit local contexts. Text mining enabled systematic analysis, although challenges remained, and qualitative analysis provided additional insights. All sites managed to collect feedback digitally; however, there was a perceived need for additional resources, and engagement varied. Observation indicated that patients were apprehensive about using kiosks but would often participate with support. The process for collecting and recording verbal feedback in mental health services made sense to participants but was not successfully adopted, with staff workload and technical problems often highlighted as barriers. Staff thought that the new methods were insightful, but observation did not reveal changes in services during the testing period.

Conclusions

The use of digital methods can produce some improvements in the collection and usefulness of feedback. Context and flexibility are important, and digital methods need to be complemented with alternative methods. Text mining can provide useful analysis for reporting on large data sets within large organisations, but qualitative analysis may be more useful for small data sets and in small organisations.

Limitations

New practices need time and support to be adopted, and this study had limited resources and a limited testing time.

Future work

Further research is needed to improve text-analysis methods for routine use in services and to evaluate the impact of methods (digital and non-digital) on service improvement in varied contexts and among diverse patients and carers.

Funding

This project was funded by the NIHR Health Services and Delivery Research programme and will be published in full in Health Services and Delivery Research; Vol. 8, No. 28. See the NIHR Journals Library website for further project information.

Affiliation(s)

Caroline Sanders National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
Papreen Nahar National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
Nicola Small National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
Damian Hodgson Alliance Manchester Business School, University of Manchester, Manchester, UK
Bie Nio Ong National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
Azad Dehghan Department of Computer Science, University of Manchester, Manchester, UK
Charlotte A Sharp National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
William G Dixon Centre for Epidemiology Versus Arthritis, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK
Shôn Lewis Division of Psychology and Mental Health, University of Manchester, Manchester, UK
Evangelos Kontopantelis National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
Gavin Daker-White National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
Peter Bower National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
Linda Davies Centre for Health Economics, University of Manchester, Manchester, UK
Humayun Kayesh Department of Computer Science, University of Manchester, Manchester, UK
Rebecca Spencer National Institute for Health Research Collaboration for Leadership in Applied Health Research and Care Greater Manchester, Salford Royal NHS Foundation Trust, Salford, UK
Aneela McAvoy National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK; National Institute for Health Research Collaboration for Leadership in Applied Health Research and Care Greater Manchester, Salford Royal NHS Foundation Trust, Salford, UK
Ruth Boaden Alliance Manchester Business School, University of Manchester, Manchester, UK; National Institute for Health Research Collaboration for Leadership in Applied Health Research and Care Greater Manchester, Salford Royal NHS Foundation Trust, Salford, UK
Karina Lovell National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK; Division of Nursing, Midwifery and Social Work, University of Manchester, Manchester, UK
John Ainsworth Centre for Health Informatics, University of Manchester, Manchester, UK
Magdalena Nowakowska National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
Andrew Shepherd National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
Patrick Cahoon Greater Manchester Mental Health NHS Foundation Trust, Manchester, UK
Richard Hopkins Greater Manchester Mental Health NHS Foundation Trust, Manchester, UK
Dawn Allen Patient and public representative
Annmarie Lewis Patient and public representative
Goran Nenadic Department of Computer Science, University of Manchester, Manchester, UK


Labrosse J, Lam T, Sebbag C, Benque M, Abdennebi I, Merckelbagh H, Osdoit M, Priour M, Guerin J, Balezeau T, Grandal B, Coussy F, Bobrie A, Ferrer L, Laas E, Feron JG, Reyal F, Hamy AS. Text Mining in Electronic Medical Records Enables Quick and Efficient Identification of Pregnancy Cases Occurring After Breast Cancer. JCO Clin Cancer Inform 2020;3:1-12. [PMID: 31626565 DOI: 10.1200/cci.19.00031] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/18/2023]

Krsnik I, Glavaš G, Krsnik M, Miletić D, Štajduhar I. Automatic Annotation of Narrative Radiology Reports. Diagnostics (Basel) 2020;10:E196. [PMID: 32244833 PMCID: PMC7235892 DOI: 10.3390/diagnostics10040196] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/26/2020] [Revised: 03/27/2020] [Accepted: 03/27/2020] [Indexed: 12/04/2022]

Abstract

Narrative texts in electronic health records can be used efficiently to build clinical decision support systems only if they are automatically interpreted correctly in accordance with a specified standard. This paper tackles the problem of developing an automated method for labeling free-form radiology reports, as a precursor to building query-capable report databases in hospitals. The analyzed dataset consists of 1295 radiology reports concerning the condition of the knee, retrospectively gathered at the Clinical Hospital Centre Rijeka, Croatia. Reports were manually labeled with one or more labels from a set of the 10 most commonly occurring clinical conditions. After preprocessing of the texts, two sets of text classification methods were compared: (1) traditional classification models, namely Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), and Random Forests (RF), coupled with Bag-of-Words (BoW) features (i.e., a symbolic text representation); and (2) a Convolutional Neural Network (CNN) coupled with dense word vectors (i.e., word embeddings as a semantic text representation) as input features. We used nested 10-fold cross-validation to evaluate the performance of the competing methods in terms of accuracy, precision, recall, and F1 score. The CNN with semantic word representations as input yielded the best overall performance, with a micro-averaged F1 score of 86.7%. The CNN classifier yielded particularly encouraging results for the most represented conditions: degenerative disease (95.9%), arthrosis (93.3%), and injury (89.2%). As a data-hungry deep learning model, however, the CNN performed notably worse than the competing models on underrepresented classes with fewer training instances, such as multicausal disease or metabolic disease. LR, RF, and SVM performed comparably well, with micro-averaged F1 scores of 84.6%, 82.2%, and 82.1%, respectively.
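The abstract reports micro-averaged F1 scores for a multi-label task, since one report can carry several condition labels. As an illustrative sketch only, not the authors' evaluation code, micro-averaging pools true positives, false positives and false negatives over every (report, label) decision before computing precision and recall. The function name `micro_f1` and the toy label sets are hypothetical.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label classification (illustrative sketch)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must describe the same documents")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and present in the gold standard
        fp += len(p - g)  # labels predicted but absent from the gold standard
        fn += len(g - p)  # gold-standard labels the model missed
    # Equivalent to 2PR / (P + R) with P = tp / (tp + fp) and R = tp / (tp + fn)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical gold and predicted label sets for three knee reports
gold = [{"injury"}, {"arthrosis", "degenerative disease"}, {"metabolic disease"}]
pred = [{"injury"}, {"arthrosis"}, {"degenerative disease"}]
print(round(micro_f1(gold, pred), 3))  # prints 0.571
```

Because counts are pooled, frequent conditions dominate the micro-averaged score. A macro-average (the mean of per-label F1 scores) weights rare classes equally, which is one reason a model can score well on micro-F1 while struggling on underrepresented classes.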


Spasic I, Nenadic G. Clinical Text Data in Machine Learning: Systematic Review. JMIR Med Inform 2020;8:e17984. [PMID: 32229465 PMCID: PMC7157505 DOI: 10.2196/17984] [Citation(s) in RCA: 110] [Impact Index Per Article: 27.5] [Received: 01/28/2020] [Revised: 02/24/2020] [Accepted: 02/24/2020] [Indexed: 12/22/2022]

Abstract

Background

Clinical narratives represent the main form of communication within health care, providing a personalized account of patient history and assessments, and offering rich information for clinical decision making. Natural language processing (NLP) has repeatedly demonstrated its feasibility to unlock evidence buried in clinical narratives. Machine learning can facilitate rapid development of NLP tools by leveraging large amounts of text data.

Objective

The main aim of this study was to provide systematic evidence on the properties of text data used to train machine learning approaches to clinical NLP. We also investigated the types of NLP tasks that have been supported by machine learning and how they can be applied in clinical practice.

Methods

Our methodology was based on the guidelines for performing systematic reviews. In August 2018, we used PubMed, a multifaceted interface, to perform a literature search against MEDLINE. We identified 110 relevant studies and extracted information about text data used to support machine learning, NLP tasks supported, and their clinical applications. The data properties considered included their size, provenance, collection methods, annotation, and any relevant statistics.

Results

The majority of datasets used to train machine learning models included only hundreds or thousands of documents. Only 10 studies used tens of thousands of documents, with a handful of studies utilizing more. Relatively small datasets were utilized for training even when much larger datasets were available. The main reason for such poor data utilization is the annotation bottleneck faced by supervised machine learning algorithms. Active learning was explored to iteratively sample a subset of data for manual annotation as a strategy for minimizing the annotation effort while maximizing the predictive performance of the model. Supervised learning was successfully used where clinical codes, integrated with free-text notes in electronic health records, were used as class labels. Similarly, distant supervision was used to exploit an existing knowledge base to automatically annotate raw text. Where manual annotation was unavoidable, crowdsourcing was explored, but it remains unsuitable because of the sensitive nature of the data considered. Besides the small volume, training data were typically sourced from a small number of institutions, thus offering no hard evidence about the transferability of machine learning models. The majority of studies focused on text classification. Most commonly, the classification results were used to support phenotyping, prognosis, care improvement, resource management, and surveillance.

Conclusions

We identified the data annotation bottleneck as one of the key obstacles to machine learning approaches in clinical NLP. Active learning and distant supervision were explored as ways of reducing annotation effort. Future research in this field would benefit from alternatives such as data augmentation and transfer learning, or unsupervised learning, which do not require data annotation.
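The review names active learning as a way to reduce annotation effort but does not prescribe a selection rule. One widely used option is least-confidence uncertainty sampling. After each training round, the documents whose highest predicted class probability is lowest are sent for manual annotation. This is shown here as an assumption, not as the method of any reviewed study. The function name `select_for_annotation`, the note identifiers and the probabilities are hypothetical.

```python
from typing import Dict, List, Sequence


def select_for_annotation(probs: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Least-confidence sampling: return the k documents the model is least sure about.

    probs maps each unlabelled document ID to the model's predicted
    class probabilities for that document (hypothetical input format).
    """
    # A low top-class probability means the model is uncertain, so a human
    # label for that document is expected to be most informative.
    ranked = sorted(probs, key=lambda doc_id: max(probs[doc_id]))
    return ranked[:k]


# Hypothetical probabilities from a binary classifier over four unlabelled notes
unlabelled = {
    "note_1": [0.95, 0.05],
    "note_2": [0.55, 0.45],
    "note_3": [0.70, 0.30],
    "note_4": [0.51, 0.49],
}
print(select_for_annotation(unlabelled, k=2))  # prints ['note_4', 'note_2']
```

In a full loop, an annotator would label the selected notes, which would then be added to the training set, and the model would be retrained before the next selection round. The loop typically stops when the annotation budget is spent or performance plateaus.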


DES-ROD: Exploring Literature to Develop New Links between RNA Oxidation and Human Diseases. Oxid Med Cell Longev 2020;2020:5904315. [PMID: 32308806 PMCID: PMC7142358 DOI: 10.1155/2020/5904315] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/09/2020] [Accepted: 02/21/2020] [Indexed: 12/27/2022]

Abstract

Normal cellular physiology and biochemical processes require undamaged RNA molecules. However, RNAs are frequently subjected to oxidative damage. Overproduction of reactive oxygen species (ROS) leads to RNA oxidation and disturbs redox (oxidation-reduction reaction) homeostasis. When oxidative damage affects RNA carrying protein-coding information, this may result in the synthesis of aberrant proteins as well as a lower efficiency of translation. Both of these, as well as imbalanced redox homeostasis, may lead to numerous human diseases. The number of studies on the effects of RNA oxidative damage in mammals is increasing year by year, owing to the understanding that this oxidation fundamentally leads to numerous human diseases. To enable researchers in this field to explore information relevant to RNA oxidation and its effects on human diseases, we developed DES-ROD, an online knowledgebase that contains processed information from 298,603 relevant documents consisting of PubMed abstracts and PubMed Central full-text articles. The system uses concepts/terms from 38 curated thematic dictionaries mapped to the analyzed documents. Researchers can explore enriched concepts, as well as enriched pairs of putatively associated concepts. In this way, one can explore mutual relationships between any combination of two concepts from the dictionaries used. The dictionaries cover a wide range of biomedical topics, such as human genes and proteins, pathways, Gene Ontology categories, mutations, noncoding RNAs, enzymes, toxins, metabolites, and diseases. This makes it possible to gain insights into different facets of the effects of RNA oxidation and the control of this process. The usefulness of the DES-ROD system is demonstrated by case studies on some known information, as well as potentially novel information involving RNA oxidation and diseases. DES-ROD is the first knowledgebase based on text and data mining that focuses on the exploration of RNA oxidation and human diseases.


Hsiao YW, Lu TP. Text-mining in cancer research may help identify effective treatments. Transl Lung Cancer Res 2020;8:S460-S463. [PMID: 32038938 DOI: 10.21037/tlcr.2019.12.20] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/22/2023]

Karystianis G, Florez-Vargas O, Butler T, Nenadic G. A rule-based approach to identify patient eligibility criteria for clinical trials from narrative longitudinal records. JAMIA Open 2020;2:521-527. [PMID: 32025649 PMCID: PMC6993990 DOI: 10.1093/jamiaopen/ooz041] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/28/2019] [Revised: 07/16/2019] [Accepted: 10/03/2019] [Indexed: 11/13/2022]

The evolution of Health & Place: Text mining papers published between 1995 and 2018. Health Place 2020;61:102207. [DOI: 10.1016/j.healthplace.2019.102207] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/30/2019] [Revised: 09/13/2019] [Accepted: 09/13/2019] [Indexed: 01/26/2023]