1
Ahmad F, Muhmood T. Clinical translation of nanomedicine with integrated digital medicine and machine learning interventions. Colloids Surf B Biointerfaces 2024; 241:114041. [PMID: 38897022 DOI: 10.1016/j.colsurfb.2024.114041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 06/11/2024] [Accepted: 06/13/2024] [Indexed: 06/21/2024]
Abstract
Nanomaterial-based therapeutics are transforming disease prevention, diagnosis, and treatment as nanotechnology grows more sophisticated at a breakneck pace, but very few reach the clinic because of inconsistencies in preclinical studies followed by regulatory hindrances. To tackle this, integrating nanomedicine discovery with digital medicine provides technologies that serve as tools for measuring specific biological activity. Redundancies in nanomedicine discovery can thus be overcome through on-site data acquisition and analytics that integrate intelligent sensors with artificial intelligence (AI) or machine learning (ML). Integrated AI/ML wearable sensors gather clinically relevant biochemical information directly from the subject's body and process the data so that physicians can make the right clinical decision(s) in a time- and cost-effective way. This review summarizes insights and recommends the infusion of sensors enabled with actionable big-data computation into the burgeoning field of nanomedicine in academia, research institutes, and the pharmaceutical industry, with the potential for clinical translation. Furthermore, the many blind spots in modern clinically relevant computation are discussed, including one that could prevent ML-guided, low-cost nanomedicine development from being successfully translated into the clinic.
Affiliation(s)
- Farooq Ahmad
- State Key Laboratory of Chemistry and Utilization of Carbon Based Energy Resources, College of Chemistry, Xinjiang University, Urumqi 830017, China.
- Tahir Muhmood
- International Iberian Nanotechnology Laboratory (INL), Avenida Mestre José Veiga, Braga 4715-330, Portugal.
2
Kapoor R, Sleeman WC, Ghosh P, Palta J. Infrastructure tools to support an effective Radiation Oncology Learning Health System. J Appl Clin Med Phys 2023; 24:e14127. [PMID: 37624227 PMCID: PMC10562037 DOI: 10.1002/acm2.14127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 07/17/2023] [Accepted: 07/19/2023] [Indexed: 08/26/2023] Open
Abstract
PURPOSE The Radiation Oncology Learning Health System (RO-LHS) is a promising approach to improve the quality of care by integrating clinical, dosimetry, treatment delivery, and research data in real time. This paper describes a novel set of tools to support the development of a RO-LHS and the current challenges they can address. METHODS We present a knowledge graph-based approach to map radiotherapy data from clinical databases to an ontology-based data repository using FAIR concepts. This strategy ensures that the data are easily discoverable, accessible, and can be used by other clinical decision support systems. It allows for visualization, presentation, and data analyses of valuable information to identify trends and patterns in patient outcomes. We designed a search engine that uses ontology-based keyword searching and synonym-based term matching, leverages the hierarchical nature of ontologies to retrieve patient records based on parent and child classes, and connects to the BioPortal database to retrieve relevant clinical attributes. To identify similar patients, a method involving text corpus creation and vector embedding models (Word2Vec, Doc2Vec, GloVe, and FastText) is employed, using cosine similarity and distance metrics. RESULTS The data pipeline and tool were tested with 1660 patient clinical and dosimetry records, resulting in 504,180 RDF (Resource Description Framework) tuples, and data relationships were visualized using graph-based representations. Patient similarity analysis using embedding models showed that the Word2Vec model had the highest mean cosine similarity, while the GloVe model exhibited more compact embeddings with lower Euclidean and Manhattan distances. CONCLUSIONS The framework and tools described support the development of a RO-LHS. By integrating diverse data sources and facilitating data discovery and analysis, they contribute to continuous learning and improvement in patient care. 
The tools enhance the quality of care by enabling the identification of cohorts, clinical decision support, and the development of clinical studies and machine learning programs in radiation oncology.
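The similarity measures named in this abstract (cosine similarity, Euclidean distance, and Manhattan distance) can be illustrated with a minimal sketch. The patient identifiers and three-dimensional vectors below are hypothetical stand-ins for embeddings; the abstract does not describe the authors' actual pipeline at this level of detail.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def most_similar(query_id, embeddings):
    """Return (patient_id, similarity) of the nearest other patient by cosine."""
    query = embeddings[query_id]
    scores = [
        (pid, cosine_similarity(query, vec))
        for pid, vec in embeddings.items()
        if pid != query_id
    ]
    return max(scores, key=lambda pair: pair[1])


# Hypothetical toy embeddings; real ones would come from a trained model.
patients = {
    "patient_a": [1.0, 0.0, 1.0],
    "patient_b": [0.9, 0.1, 1.1],
    "patient_c": [0.0, 1.0, 0.0],
}
nearest_id, nearest_score = most_similar("patient_a", patients)
```

Cosine similarity ignores vector length, whereas the two distance metrics do not, which is one reason a model can rank first on one measure and not on the others.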
Affiliation(s)
- Rishabh Kapoor
- Department of Radiation Oncology, Virginia Commonwealth University, Richmond, Virginia, USA
- William C Sleeman
- Department of Radiation Oncology, Virginia Commonwealth University, Richmond, Virginia, USA
- Preetam Ghosh
- Department of Radiation Oncology, Virginia Commonwealth University, Richmond, Virginia, USA
- Jatinder Palta
- Department of Radiation Oncology, Virginia Commonwealth University, Richmond, Virginia, USA
3
Luo N, Zhong X, Su L, Cheng Z, Ma W, Hao P. Artificial intelligence-assisted dermatology diagnosis: From unimodal to multimodal. Comput Biol Med 2023; 165:107413. [PMID: 37703714 DOI: 10.1016/j.compbiomed.2023.107413] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/12/2023] [Revised: 08/02/2023] [Accepted: 08/28/2023] [Indexed: 09/15/2023]
Abstract
Artificial Intelligence (AI) is progressively permeating medicine, notably in the realm of assisted diagnosis. However, the traditional unimodal AI models, reliant on large volumes of accurately labeled data and single data type usage, prove insufficient to assist dermatological diagnosis. Augmenting these models with text data from patient narratives, laboratory reports, and image data from skin lesions, dermoscopy, and pathologies could significantly enhance their diagnostic capacity. Large-scale pre-training multimodal models offer a promising solution, exploiting the burgeoning reservoir of clinical data and amalgamating various data types. This paper delves into unimodal models' methodologies, applications, and shortcomings while exploring how multimodal models can enhance accuracy and reliability. Furthermore, integrating cutting-edge technologies like federated learning and multi-party privacy computing with AI can substantially mitigate patient privacy concerns in dermatological datasets and further fosters a move towards high-precision self-diagnosis. Diagnostic systems underpinned by large-scale pre-training multimodal models can facilitate dermatology physicians in formulating effective diagnostic and treatment strategies and herald a transformative era in healthcare.
Affiliation(s)
- Nan Luo
- Hospital of Chengdu University of Traditional Chinese Medicine, No. 39 Shi-er-qiao Road, Chengdu, 610075, Sichuan, China.
- Xiaojing Zhong
- Hospital of Chengdu University of Traditional Chinese Medicine, No. 39 Shi-er-qiao Road, Chengdu, 610075, Sichuan, China.
- Luxin Su
- Hospital of Chengdu University of Traditional Chinese Medicine, No. 39 Shi-er-qiao Road, Chengdu, 610075, Sichuan, China.
- Zilin Cheng
- Hospital of Chengdu University of Traditional Chinese Medicine, No. 39 Shi-er-qiao Road, Chengdu, 610075, Sichuan, China.
- Wenyi Ma
- Hospital of Chengdu University of Traditional Chinese Medicine, No. 39 Shi-er-qiao Road, Chengdu, 610075, Sichuan, China.
- Pingsheng Hao
- Hospital of Chengdu University of Traditional Chinese Medicine, No. 39 Shi-er-qiao Road, Chengdu, 610075, Sichuan, China.
4
Berloco F, Ciavarella S, Colucci S, Grieco LA, Guarini A, Zaccaria GM. ARGO 2.0: a Hybrid NLP/ML Framework for Diagnosis Standardization. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-4. [PMID: 38083100 DOI: 10.1109/embc40787.2023.10340022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
A relevant problem in medicine is the standardization of the diagnosis associated with a clinical case. Although diagnosis formulation is an intrinsically subjective and uncertain process, its standardization may benefit from digital solutions that automate the routines underlying such a decision. In this work, we propose ARGO 2.0: a framework for the development of decision support systems for diagnosis formulation. The framework can read free-text reports and store their clinically relevant information as personalized electronic Case Report Forms. A hybrid strategy, exploiting the synergy of Natural Language Processing and Machine Learning techniques, is used to automatically suggest a diagnosis in a standardized fashion. ARGO 2.0 has been designed to be template-independent and easily tailored to specific medical fields. We here demonstrate its feasibility in hemo lympho-pathology by detailing its implementation, which is the object of an ongoing validation campaign in a standing medical institute. ARGO 2.0 achieved an average accuracy of 95.07%, an average precision of 94.85%, an average recall of 96.31%, and an F-score of 95.32% on the test set, outperforming both of its embedded components, which are based on Natural Language Processing and Machine Learning.
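For readers less familiar with the four metrics reported here, the following is a minimal sketch of how accuracy, precision, recall, and F-score are computed for a single binary label. The labels are hypothetical, and the abstract reports averages whose averaging scheme is not stated, so this is not a reproduction of the ARGO 2.0 evaluation.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold labels and predictions for eight reports.
gold = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(gold, predicted)
```

For a multi-class task such as diagnosis suggestion, these per-class values would be averaged across classes, which is presumably what "average" refers to in the abstract.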
5
Honghong H, Xin Yi FL, Tianyu GG, Jiangchou MH, Hao Sen AF, Hui San EC, Yen Tze EB, Zhuling ST, Sun Sien HH, Shyi Peng JY, Aixin S, Kheng Sit JL. Natural language processing in urology: Automated extraction of clinical information from histopathology reports of uro-oncology procedures. Heliyon 2023; 9:e14793. [PMID: 37025805 PMCID: PMC10070081 DOI: 10.1016/j.heliyon.2023.e14793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 03/16/2023] [Accepted: 03/17/2023] [Indexed: 03/30/2023] Open
Abstract
Objectives We aimed to automate routine extraction of clinically relevant unstructured information from uro-oncological histopathology reports by applying rule-based and machine learning (ML)/deep learning (DL) methods to develop an oncology focused natural language processing (NLP) algorithm. Methods Our algorithm employs a combination of a rule-based approach and support vector machines/neural networks (BioBert/Clinical BERT), and is optimised for accuracy. We randomly extracted 5772 uro-oncological histology reports from 2008 to 2018 from electronic health records (EHRs) and split the data into training and validation datasets in an 80:20 ratio. The training dataset was annotated by medical professionals and reviewed by cancer registrars. The validation dataset was annotated by cancer registrars and defined as the gold standard with which the algorithm outcomes were compared. The accuracy of NLP-parsed data was matched against these human annotation results. We defined an accuracy rate of >95% as "acceptable" by professional human extraction, as per our cancer registry definition. Results There were 11 extraction variables in 268 free-text reports. We achieved an accuracy rate of between 61.2% and 99.0% using our algorithm. Of the 11 data fields, a total of 8 data fields met the acceptable accuracy standard, while another 3 data fields had an accuracy rate between 61.2% and 89.7%. Noticeably, the rule-based approach was shown to be more effective and robust in extracting variables of interest. On the other hand, ML/DL models had poorer predictive performances due to highly imbalanced data distribution and variable writing styles between different reports and data used for domain-specific pre-trained models. Conclusion We designed an NLP algorithm that can automate clinical information extraction accurately from histopathology reports with an overall average micro accuracy of 93.3%.
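The per-field evaluation described above can be sketched as follows. The field names and values are hypothetical, and "micro accuracy" is assumed here to mean correct extractions pooled over all fields and reports; the abstract does not define it. The 0.95 threshold mirrors the ">95%" acceptability criterion stated in the abstract.

```python
def field_accuracies(gold_reports, predicted_reports):
    """Per-field accuracy and pooled (micro) accuracy against gold annotations.

    Each report is a dict mapping a field name to its extracted value.
    """
    correct, total = {}, {}
    for gold, predicted in zip(gold_reports, predicted_reports):
        for field, value in gold.items():
            total[field] = total.get(field, 0) + 1
            if predicted.get(field) == value:
                correct[field] = correct.get(field, 0) + 1
    per_field = {field: correct.get(field, 0) / total[field] for field in total}
    micro = sum(correct.values()) / sum(total.values())
    return per_field, micro


# Hypothetical annotations for two reports and two fields.
gold = [
    {"site": "bladder", "grade": "high"},
    {"site": "kidney", "grade": "low"},
]
extracted = [
    {"site": "bladder", "grade": "high"},
    {"site": "kidney", "grade": "high"},
]
per_field, micro = field_accuracies(gold, extracted)

# Fields meeting the acceptability criterion (accuracy above 95%).
acceptable = [field for field, acc in per_field.items() if acc > 0.95]
```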
6
Analysis of Risk Factors of Coal Chemical Enterprises Based on Text Mining. J Environ Public Health 2023; 2023:4181159. [PMID: 36747503 PMCID: PMC9899145 DOI: 10.1155/2023/4181159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Revised: 09/20/2022] [Accepted: 10/10/2022] [Indexed: 01/29/2023]
Abstract
Coal chemical enterprises have many risk factors, and the causes of accidents are complex. The traditional risk assessment methods rely on expert experience and previous literature to determine the causes of accidents, which has the problems such as lack of objectivity and low interpretation ability. Analyzing the accident report helps to identify typical accident risk factors and determines the accident evolution rule. However, experts usually judge this work manually, which is subjective and time-consuming. This paper developed an improved approach to identify safety risk factors from a volume of coal chemical accident reports using text mining (TM) technology. Firstly, the accident report was preprocessed, and the Term Frequency Inverse Document Frequency (TF-IDF) was used for feature extraction. Then, the K-means algorithm and apriori algorithm were developed to cluster and for the association rule analysis of the vectorized documents in the TF-IDF matrix, respectively to quickly identify the hidden risk factors and the relationship between risk factors in the accident report and to propose targeted safety management measures. Using the sample data of 505 accidents in a large coal chemical enterprise in Western China in the past seven years, the enterprise accident reports were analyzed by text clustering analysis and association rule analysis methods. Through the analysis, six accident clusters and 13 association rules were obtained, and the main risk factors of each accident cluster were further mined, and the corresponding management suggestions were put forward for the enterprise. This method provides a new idea for coal chemical enterprises to make safety management decisions and helps to prevent safety accidents.
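The TF-IDF feature-extraction step mentioned in this abstract can be sketched in a few lines. This uses the plain textbook weighting (term frequency times the log of inverse document frequency); the exact variant, tokenizer, and preprocessing used in the paper are not given, and the tokenized "reports" below are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append(
            {
                term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


# Hypothetical, already tokenized accident reports.
reports = [
    ["valve", "leak", "fire", "accident"],
    ["valve", "leak", "accident"],
    ["crane", "fall", "accident"],
]
weights = tf_idf(reports)
```

A term that appears in every report, such as "accident" here, receives a weight of zero, while a term unique to one report is weighted most heavily. The resulting vectors are what a clustering step such as K-means would then operate on.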
7
Classification and diagnostic prediction of breast cancer metastasis on clinical data using machine learning algorithms. Sci Rep 2023; 13:485. [PMID: 36627367 PMCID: PMC9831019 DOI: 10.1038/s41598-023-27548-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 02/03/2022] [Accepted: 01/04/2023] [Indexed: 01/12/2023] Open
Abstract
Metastatic Breast Cancer (MBC) is one of the primary causes of cancer-related deaths in women. Despite several limitations, histopathological information about the malignancy is used for the classification of cancer. The objective of our study is to develop a non-invasive breast cancer classification system for the diagnosis of cancer metastases. The anaconda-Jupyter notebook is used to develop various python programming modules for text mining, data processing, and Machine Learning (ML) methods. Utilizing classification model cross-validation criteria, including accuracy, AUC, and ROC, the prediction performance of the ML models is assessed. Welch Unpaired t-test was used to ascertain the statistical significance of the datasets. Text mining framework from the Electronic Medical Records (EMR) made it easier to separate the blood profile data and identify MBC patients. Monocytes revealed a noticeable mean difference between MBC patients as compared to healthy individuals. The accuracy of ML models was dramatically improved by removing outliers from the blood profile data. A Decision Tree (DT) classifier displayed an accuracy of 83% with an AUC of 0.87. Next, we deployed DT classifiers using Flask to create a web application for robust diagnosis of MBC patients. Taken together, we conclude that ML models based on blood profile data may assist physicians in selecting intensive-care MBC patients to enhance the overall survival outcome.
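The Welch unpaired t-test referred to in this abstract compares two group means without assuming equal variances. A minimal sketch of the test statistic and the Welch-Satterthwaite degrees of freedom follows; the two samples are hypothetical and unrelated to the study's blood profile data, and obtaining a p-value would additionally require the t distribution (for example from a statistics library).

```python
import math
import statistics


def welch_t_test(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each mean, from the sample variances.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b

    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(
        se2_a + se2_b
    )
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical measurements for two groups.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t_test(group_a, group_b)
```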
8
Eggermont C, Wakkee M, Bruggink A, Voorham Q, Schreuder K, Louwman M, Mooyaart A, Hollestein L. Development and Validation of an Algorithm to Identify Patients with Advanced Cutaneous Squamous Cell Carcinoma from Pathology Reports. J Invest Dermatol 2023; 143:98-104.e5. [PMID: 35926654 DOI: 10.1016/j.jid.2022.07.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/20/2022] [Revised: 06/22/2022] [Accepted: 07/11/2022] [Indexed: 10/16/2022]
Abstract
To facilitate nationwide epidemiological research on advanced cutaneous squamous cell carcinoma (cSCC), that is, locally advanced, recurrent, or metastatic cSCC, we sought to develop and validate a rule-based algorithm that identifies advanced cSCC from pathology reports. The algorithm was based on both hierarchical histopathological codes and free text from pathology reports recorded in the National Pathology Registry. Medical files from the Erasmus Medical Center of 186 patients with stage III/IV/recurrent cSCC and 184 patients with stage I/II cSCC were selected and served as the gold standard to assess the performance of the algorithm. The rule-based algorithm showed a sensitivity of 91.9% (95% confidence interval = 88.0‒95.9), a specificity of 96.7% (95% confidence interval = 94.2‒99.3), and a positive predictive value of 78.5% (95% confidence interval = 74.2‒82.8) for all advanced cSCC combined. The sensitivity was lower per subgroup: locally advanced (52.3‒86.2%), recurrent cSCC (23.3%), and metastatic cSCC (70.0%). The specificity per subgroup was above 97%, and the positive predictive value was above 78%, with the exception of metastatic cSCC, which had a positive predictive value of 62%. This algorithm can be used to identify patients with advanced cSCC from pathology reports and will facilitate large-scale epidemiological studies of advanced cSCC in the Netherlands and internationally after external validation.
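Sensitivity, specificity, and positive predictive value are all proportions taken from a confusion matrix, and each can be reported with a confidence interval. The sketch below uses the simple normal-approximation (Wald) interval as an assumption, since the abstract does not say which interval method was used; the counts are hypothetical and are not the study's data.

```python
import math


def proportion_with_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation (Wald) 95% interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def diagnostic_summary(tp, fp, tn, fn):
    """Sensitivity, specificity, and PPV, each as (estimate, lower, upper)."""
    return {
        "sensitivity": proportion_with_ci(tp, tp + fn),  # flagged among true cases
        "specificity": proportion_with_ci(tn, tn + fp),  # cleared among non-cases
        "ppv": proportion_with_ci(tp, tp + fp),          # true cases among flagged
    }


# Hypothetical confusion-matrix counts.
summary = diagnostic_summary(tp=90, fp=20, tn=180, fn=10)
```

Unlike sensitivity and specificity, the positive predictive value depends on how common the condition is in the evaluated population, so it can differ markedly between a balanced validation sample and a registry-wide sample.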
Affiliation(s)
- Celeste Eggermont
- Department of Dermatology, Erasmus MC Cancer Institute, University Medical Center, Rotterdam, The Netherlands
- Marlies Wakkee
- Department of Dermatology, Erasmus MC Cancer Institute, University Medical Center, Rotterdam, The Netherlands
- Annette Bruggink
- Nationwide Network and Registry of Histo- and Cytopathology (PALGA), Houten, The Netherlands
- Quirinus Voorham
- Nationwide Network and Registry of Histo- and Cytopathology (PALGA), Houten, The Netherlands
- Kay Schreuder
- Department of Research and Development, Netherlands Comprehensive Cancer Organization (IKNL), Utrecht, The Netherlands
- Marieke Louwman
- Department of Research and Development, Netherlands Comprehensive Cancer Organization (IKNL), Utrecht, The Netherlands
- Antien Mooyaart
- Department of Pathology, Erasmus MC Cancer Institute, University Medical Center, Rotterdam, The Netherlands
- Loes Hollestein
- Department of Dermatology, Erasmus MC Cancer Institute, University Medical Center, Rotterdam, The Netherlands; Department of Research and Development, Netherlands Comprehensive Cancer Organization (IKNL), Utrecht, The Netherlands.
9
Laurent G, Craynest F, Thobois M, Hajjaji N. Automatic Classification of Tumor Response From Radiology Reports With Rule-Based Natural Language Processing Integrated Into the Clinical Oncology Workflow. JCO Clin Cancer Inform 2023; 7:e2200139. [PMID: 36780606 DOI: 10.1200/cci.22.00139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023] Open
Abstract
PURPOSE Imaging reports in oncology provide critical information about the disease evolution that should be timely shared to tailor the clinical decision making and care coordination of patients with advanced cancer. However, tumor response stays unstructured in free-text and underexploited. Natural language processing (NLP) methods can help provide this critical information into the electronic health records (EHR) in real time to assist health care workers. METHODS A rule-based algorithm was developed using SAS tools to automatically extract and categorize tumor response within progression or no progression categories. 2,970 magnetic resonance imaging, computed tomography scan, and positron emission tomography French reports were extracted from the EHR of a large comprehensive cancer center to build a 2,637-document training set and a 603-document validation set. The model was also tested on 189 imaging reports from 46 different radiology centers. A tumor dashboard was created in the EHR using the Timeline tool of the vis.js javascript library. RESULTS An NLP methodology was applied to create an ontology of radiographic terms defining tumor response, mapping text to five main concepts, and application decision rules on the basis of clinical practice RECIST guidelines. The model achieved an overall accuracy of 0.88 (ranging from 0.87 to 0.94), with similar performance on both progression and no progression classification. The overall accuracy was 0.82 on reports from different radiology centers. Data were visualized and organized in a dynamic tumor response timeline. This tool was deployed successfully at our institution both retrospectively and prospectively as part of an automatic pipeline to screen reports and classify tumor response in real time for all metastatic patients. CONCLUSION Our approach provides an NLP-based framework to structure and classify tumor response from the EHR and integrate tumor response classification into the clinical oncology workflow.
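To give a feel for what a rule-based progression / no progression classifier does, here is a deliberately small sketch. It is not the authors' method: their system was built with SAS tools on French reports and an ontology of five concepts, whereas the English patterns, the negation handling, and the function name below are all hypothetical choices made for illustration.

```python
import re

# Hypothetical negated phrases that must not count as evidence of progression.
NEGATED = re.compile(
    r"\bno (?:evidence of |sign of )?(?:disease )?progression\b"
    r"|\bno new lesions?\b"
)

# Hypothetical phrases taken as evidence of progression.
PROGRESSION = re.compile(
    r"\bprogress(?:ion|ive|ed)\b"
    r"|\bnew lesions?\b"
    r"|\bincreas(?:e|ed|ing) in size\b"
)


def classify_tumor_response(report):
    """Label a free-text report as 'progression' or 'no progression'."""
    text = report.lower()
    # Remove negated mentions first so they cannot trigger a positive rule.
    text = NEGATED.sub(" ", text)
    if PROGRESSION.search(text):
        return "progression"
    # Simplification: anything without a positive match falls through here.
    return "no progression"


label = classify_tumor_response("Stable disease. No evidence of progression.")
```

Defaulting to "no progression" when no rule fires is a simplification; a production system would more likely flag such reports for review.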
Affiliation(s)
- Gery Laurent
- Department of Information Systems, Oscar Lambret Cancer Center, Lille, France
- Franck Craynest
- Department of Information Systems, Oscar Lambret Cancer Center, Lille, France
- Maxime Thobois
- Department of Information Systems, Oscar Lambret Cancer Center, Lille, France
- Nawale Hajjaji
- Department of Medical Oncology, Oscar Lambret Cancer Center, Lille, France; Inserm, U1192, Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), University of Lille, Lille, France
10
Nundloll V, Smail R, Stevens C, Blair G. Automating the extraction of information from a historical text and building a linked data model for the domain of ecology and conservation science. Heliyon 2022; 8:e10710. [PMID: 36262290 PMCID: PMC9573881 DOI: 10.1016/j.heliyon.2022.e10710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 08/05/2022] [Accepted: 09/15/2022] [Indexed: 11/06/2022] Open
Abstract
Data heterogeneity is a pressing issue and is further compounded if we have to deal with data from textual documents. The unstructured nature of such documents implies that collating, comparing and analysing the information contained therein can be a challenging task. Automating these processes can help to unleash insightful knowledge that otherwise remains buried in them. Moreover, integrating the extracted information from the documents with other related information can help to make more information-rich queries. In this context, the paper presents a comprehensive review of text extraction and data integration techniques to enable this automation process in an ecological context. The paper investigates into extracting valuable floristic information from a historical Botany journal. The purpose behind this extraction is to bring to light relevant pieces of information contained within the document. In addition, the paper also explores the need to integrate the extracted information together with other related information from disparate sources. All the information is then rendered into a query-able form in order to make unified queries. Hence, the paper makes use of a combination of Machine Learning, Natural Language Processing and Semantic Web techniques to achieve this. The proposed approach is demonstrated through the information extracted from the journal and the information-rich queries made through the integration process. The paper shows that the approach has a merit in extracting relevant information from the journal, discusses how the machine learning models have been designed to classify complex information and also gives a measure of their performance. The paper also shows that the approach has a merit in query time in regard to querying floristic information from a multi-source linked data model.
Affiliation(s)
- Vatsala Nundloll
- School of Computing and Communications, Lancaster University, Lancaster, UK (corresponding author)
- Robert Smail
- Lancaster Environment Centre, Lancaster University, UK (Robert Smail worked at this organisation.)
- Carly Stevens
- Lancaster Environment Centre, Lancaster University, UK
- Gordon Blair
- School of Computing and Communications, Lancaster University, Lancaster, UK
11
Abstract
In recent years, the evolution of technology has led to an increase in text data obtained from many sources. In the biomedical domain, text information has also evidenced this accelerated growth, and automatic text summarization systems play an essential role in optimizing physicians’ time resources and identifying relevant information. In this paper, we present a systematic review in recent research of text summarization for biomedical textual data, focusing mainly on the methods employed, type of input data text, areas of application, and evaluation metrics used to assess systems. The survey was limited to the period between 1st January 2014 and 15th March 2022. The data collected was obtained from WoS, IEEE, and ACM digital libraries, while the search strategies were developed with the help of experts in NLP techniques and previous systematic reviews. The four phases of a systematic review by PRISMA methodology were conducted, and five summarization factors were determined to assess the studies included: Input, Purpose, Output, Method, and Evaluation metric. Results showed that 3.5% of 801 studies met the inclusion criteria. Moreover, Single-document, Biomedical Literature, Generic, and Extractive summarization proved to be the most common approaches employed, while techniques based on Machine Learning were performed in 16 studies and Rouge (Recall-Oriented Understudy for Gisting Evaluation) was reported as the evaluation metric in 26 studies. This review found that in recent years, more transformer-based methodologies for summarization purposes have been implemented compared to a previous survey. Additionally, there are still some challenges in text summarization in different domains, especially in the biomedical field in terms of demand for further research.
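ROUGE, the evaluation metric most often reported in the reviewed studies, scores a generated summary by its n-gram overlap with a reference summary. The sketch below implements only the unigram case (ROUGE-1) with whitespace tokenization, which is a simplifying assumption; published evaluations typically rely on an established ROUGE package with stemming and other options. The two sentences are hypothetical.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """ROUGE-1 recall, precision, and F1 from clipped unigram overlap."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand_counts & ref_counts).values())

    recall = overlap / sum(ref_counts.values()) if ref_counts else 0.0
    precision = overlap / sum(cand_counts.values()) if cand_counts else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical reference summary and system output.
reference_summary = "the model extracts key findings from clinical notes"
system_summary = "the model extracts findings from notes"
rouge = rouge_1(system_summary, reference_summary)
```

Recall is the headline figure (the "R" in ROUGE): it measures how much of the reference summary the system output covers.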
12
Automated medical chart review for breast cancer outcomes research: a novel natural language processing extraction system. BMC Med Res Methodol 2022; 22:136. [PMID: 35549854 PMCID: PMC9101856 DOI: 10.1186/s12874-022-01583-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2021] [Accepted: 03/15/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Manually extracted data points from health records are collated on an institutional, provincial, and national level to facilitate clinical research. However, the labour-intensive clinical chart review process puts an increasing burden on healthcare system budgets. Therefore, an automated information extraction system is needed to ensure the timeliness and scalability of research data. METHODS We used a dataset of 100 synoptic operative and 100 pathology reports, evenly split into 50 reports in training and test sets for each report type. The training set guided our development of a Natural Language Processing (NLP) extraction pipeline system, which accepts scanned images of operative and pathology reports. The system uses a combination of rule-based and transfer learning methods to extract numeric encodings from text. We also developed visualization tools to compare the manual and automated extractions. The code for this paper was made available on GitHub. RESULTS A test set of 50 operative and 50 pathology reports were used to evaluate the extraction accuracies of the NLP pipeline. Gold standard, defined as manual extraction by expert reviewers, yielded accuracies of 90.5% for operative reports and 96.0% for pathology reports, while the NLP system achieved overall 91.9% (operative) and 95.4% (pathology) accuracy. The pipeline successfully extracted outcomes data pertinent to breast cancer tumor characteristics (e.g. presence of invasive carcinoma, size, histologic type), prognostic factors (e.g. number of lymph nodes with micro-metastases and macro-metastases, pathologic stage), and treatment-related variables (e.g. margins, neo-adjuvant treatment, surgical indication) with high accuracy. Out of the 48 variables across operative and pathology codebooks, NLP yielded 43 variables with F-scores of at least 0.90; in comparison, a trained human annotator yielded 44 variables with F-scores of at least 0.90. 
CONCLUSIONS The NLP system achieves near-human-level accuracy in both operative and pathology reports using a minimal curated dataset. This system uniquely provides a robust solution for transparent, adaptable, and scalable automation of data extraction from patient health records. It may serve to advance breast cancer clinical research by facilitating collection of vast amounts of valuable health data at a population level.
13
Yoo S, Yoon E, Boo D, Kim B, Kim S, Paeng JC, Yoo IR, Choi IY, Kim K, Ryoo HG, Lee SJ, Song E, Joo YH, Kim J, Lee HY. Transforming Thyroid Cancer Diagnosis and Staging Information from Unstructured Reports to the Observational Medical Outcome Partnership Common Data Model. Appl Clin Inform 2022; 13:521-531. [PMID: 35705182 PMCID: PMC9200482 DOI: 10.1055/s-0042-1748144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022] Open
Abstract
BACKGROUND Cancer staging information is an essential component of cancer research. However, the information is primarily stored as either a full or semistructured free-text clinical document which is limiting the data use. By transforming the cancer-specific data to the Observational Medical Outcome Partnership Common Data Model (OMOP CDM), the information can contribute to establish multicenter observational cancer studies. To the best of our knowledge, there have been no studies on OMOP CDM transformation and natural language processing (NLP) for thyroid cancer to date. OBJECTIVE We aimed to demonstrate the applicability of the OMOP CDM oncology extension module for thyroid cancer diagnosis and cancer stage information by processing free-text medical reports. METHODS Thyroid cancer diagnosis and stage-related modifiers were extracted with rule-based NLP from 63,795 thyroid cancer pathology reports and 56,239 Iodine whole-body scan reports from three medical institutions in the Observational Health Data Sciences and Informatics data network. The data were converted into the OMOP CDM v6.0 according to the OMOP CDM oncology extension module. The cancer staging group was derived and populated using the transformed CDM data. RESULTS The extracted thyroid cancer data were completely converted into the OMOP CDM. The distributions of histopathological types of thyroid cancer were approximately 95.3 to 98.8% of papillary carcinoma, 0.9 to 3.7% of follicular carcinoma, 0.04 to 0.54% of adenocarcinoma, 0.17 to 0.81% of medullary carcinoma, and 0 to 0.3% of anaplastic carcinoma. Regarding cancer staging, stage-I thyroid cancer accounted for 55 to 64% of the cases, while stage III accounted for 24 to 26% of the cases. Stage-II and -IV thyroid cancers were detected at a low rate of 2 to 6%. 
CONCLUSION As a first study on OMOP CDM transformation and NLP for thyroid cancer, this study will help other institutions to standardize thyroid cancer-specific data for retrospective observational research and participate in multicenter studies.
Affiliation(s)
- Sooyoung Yoo
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
| | - Eunsil Yoon
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
| | - Dachung Boo
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
| | - Borham Kim
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
| | - Seok Kim
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
| | - Jin Chul Paeng
- Department of Nuclear Medicine, Seoul National University, College of Medicine, Seoul, South Korea
| | - Ie Ryung Yoo
- Division of Nuclear Medicine, Department of Radiology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, South Korea
| | - In Young Choi
- Department of Medical Informatics, The Catholic University of Korea, College of Medicine, Seoul, South Korea.,Department of Biomedicine and Health Sciences, The Catholic University of Korea, College of Medicine, Seoul, South Korea
| | - Kwangsoo Kim
- Transdisciplinary Department of Medicine and Advanced Technology, Seoul National University Hospital, Seoul, South Korea
| | - Hyun Gee Ryoo
- Department of Nuclear Medicine, Seoul National University Hospital, Seoul, South Korea.,Department of Nuclear Medicine, Seoul National University Bundang Hospital, Seongnam, South Korea
| | - Sun Jung Lee
- Department of Medical Informatics, The Catholic University of Korea, College of Medicine, Seoul, South Korea.,Department of Biomedicine and Health Sciences, The Catholic University of Korea, College of Medicine, Seoul, South Korea
| | - Eunhye Song
- Department of Data Science Research, Innovative Medical Technology Research Institute, Seoul National University Hospital, Seoul, South Korea
| | - Young-Hwan Joo
- Biomedical Research Institute, Seoul National University Hospital, Seoul, South Korea
| | - Junmo Kim
- Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, South Korea
| | - Ho-Young Lee
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea.,Department of Nuclear Medicine, Seoul National University, College of Medicine, Seoul, South Korea
|
14
|
Dipnall JF, Lu J, Gabbe BJ, Cosic F, Edwards E, Page R, Du L. Comparison of state-of-the-art machine and deep learning algorithms to classify proximal humeral fractures using radiology text. Eur J Radiol 2022; 153:110366. [DOI: 10.1016/j.ejrad.2022.110366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Revised: 04/08/2022] [Accepted: 05/16/2022] [Indexed: 12/01/2022]
|
15
|
Viscosi C, Fidelbo P, Benedetto A, Varvarà M, Ferrante M. Selection of diagnosis with oncologic relevance information from histopathology free text reports: A machine learning approach. Int J Med Inform 2022; 160:104714. [DOI: 10.1016/j.ijmedinf.2022.104714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2021] [Revised: 01/22/2022] [Accepted: 02/03/2022] [Indexed: 10/19/2022]
|
16
|
Rule-Based Information Extraction from Free-Text Pathology Reports Reveals Trends in South African Female Breast Cancer Molecular Subtypes and Ki67 Expression. Biomed Res Int 2022; 2022:6157861. [PMID: 35355821 PMCID: PMC8960023 DOI: 10.1155/2022/6157861] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/08/2021] [Accepted: 12/29/2021] [Indexed: 12/23/2022]
Abstract
Clinical information on molecular subtypes and the Ki67 index is critical for breast cancer (BC) prognosis and personalised treatment plan. Extracting such information into structured data is essential for research, auditing, and cancer incidence reporting and underpins the potential for automated decision support. Herewith, we developed a rule-based natural language processing algorithm that retrieved and extracted important BC parameters from free-text pathology reports towards exploring molecular subtypes and Ki67-proliferation trends. We considered malignant BC pathology reports with different free-text narrative attributes from the South African National Health Laboratory Service. The reports were preprocessed and parsed through the algorithm. Parameters extracted by the algorithm were validated against manually extracted parameters. For all parameters extracted, we obtained accurate annotations of 83-100%, 93-100%, 91-100%, and 92-100% precision, recall, F1-score, and kappa, respectively. There was a significant trend in the proportion of each molecular subtype by patient age, histologic type, grade, Ki67, and race. The findings also showed significant association in the Ki67 trend with hormone receptors, human epidermal growth factors, age, grade, and race. Our approach bridges the gap between data availability and actionable knowledge and provides a framework that could be adapted and reused in other cancers and beyond cancer studies. Information extracted from these reports showed interesting trends that may be exploited for BC screening and treatment resources in South Africa. Finally, this study strongly encourages the implementation of a synoptic style pathology report in South Africa.
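The abstract above describes a rule-based extractor validated against manual annotation with precision, recall, and F1-score. The following is a minimal sketch of that general idea, not the authors' algorithm: the regular expression, the function names `extract_ki67` and `precision_recall_f1`, and the sample report strings are all assumptions made for illustration.

```python
import re

# Hypothetical rule: "Ki67", "Ki-67" or "Ki 67", then a percentage within a short window.
KI67_PATTERN = re.compile(r"ki[\s-]?67\D{0,40}?(\d{1,3})\s*%", re.IGNORECASE)


def extract_ki67(report):
    """Return the Ki67 index as an integer percentage, or None if no rule matches."""
    match = KI67_PATTERN.search(report)
    if match is None:
        return None
    value = int(match.group(1))
    # Reject values that cannot be a percentage.
    return value if 0 <= value <= 100 else None


def precision_recall_f1(predicted, reference):
    """Score extracted values against manual annotations (None = nothing extracted)."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p is not None and p == r)
    fp = sum(1 for p, r in pairs if p is not None and p != r)
    fn = sum(1 for p, r in pairs if r is not None and p != r)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    reports = [
        "Ki-67 proliferation index: 25%",
        "Ki67 is approximately 8 %",
        "ER positive, PR negative",
    ]
    print([extract_ki67(r) for r in reports])
```

A real pipeline would need many more rules (ranges such as "10-15%", qualitative wording, negation), which is where most of the validation effort reported in such studies tends to go.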
|
17
|
Chinese named-entity recognition via self-attention mechanism and position-aware influence propagation embedding. Data Knowl Eng 2022. [DOI: 10.1016/j.datak.2022.101983] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/20/2022]
|
18
|
Musa IH, Afolabi LO, Zamit I, Musa TH, Musa HH, Tassang A, Akintunde TY, Li W. Artificial Intelligence and Machine Learning in Cancer Research: A Systematic and Thematic Analysis of the Top 100 Cited Articles Indexed in Scopus Database. Cancer Control 2022; 29:10732748221095946. [PMID: 35688650 PMCID: PMC9189515 DOI: 10.1177/10732748221095946] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 01/03/2023]
Abstract
INTRODUCTION Cancer is a major public health problem and a leading cause of death worldwide, and its screening, diagnosis, prediction, survival estimation, treatment, and control remain major challenges. The rise of Artificial Intelligence (AI) and Machine Learning (ML) techniques and their applications in various fields has brought immense value in providing insights that support cancer control. METHODS A systematic and thematic analysis was performed on the Scopus database to identify the top 100 cited articles in cancer research. Data were analyzed using RStudio and VOSviewer version 1.6.6. RESULTS The top 100 articles on AI and ML in cancer received a total of 33 920 citations, ranging from 108 to 5758 per article. Doi Kunio from the USA was the most cited author (total number of citations, TNC = 663). Across 43 contributing countries, 30% of the top 100 cited articles originated from the USA and 10% from China. Among the 57 peer-reviewed journals, "Expert Systems with Application" published 8% of the articles. The results highlight technological advancement through AI and ML via the widespread use of Artificial Neural Networks (ANNs), Deep Learning or machine learning techniques, mammography-based models, Convolutional Neural Networks (SC-CNN), and text mining techniques in the prediction, diagnosis, and prevention of various types of cancer. CONCLUSIONS This bibliometric study provides a detailed overview of the most cited empirical evidence on AI and ML adoption in cancer research, which could help in designing future research. The innovations guarantee greater speed in the detection and control of cancer through AI and ML, improving patient experience.
Affiliation(s)
- Ibrahim H. Musa
- Department of Software Engineering, School of Computer Science and Engineering, Southeast University, Nanjing, China
- Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing, China
| | - Lukman O. Afolabi
- Guangdong Immune Cell Therapy Engineering and Technology Research Center, Center for Protein and Cell-Based Drugs, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Ibrahim Zamit
- University of Chinese Academy of Sciences, Beijing, China
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
| | - Taha H. Musa
- Biomedical Research Institute, Darfur University College, Nyala, South Darfur, Sudan
- Key Laboratory of Environmental Medicine Engineering, Ministry of Education, Department of Epidemiology and Health Statistics, School of Public Health, Southeast University, Nanjing, Jiangsu Province, China
| | - Hassan H. Musa
- Faculty of Medical Laboratory Sciences, University of Khartoum, Khartoum, Sudan
| | - Andrew Tassang
- Faculty of Health Sciences, University of Buea, Cameroon
- Buea Regional Hospital, Annex, Cameroon
| | - Tosin Y. Akintunde
- Department of Sociology, School of Public Administration, Hohai University, Nanjing, China
| | - Wei Li
- Department of quality management, Children’s hospital of Nanjing Medical University, Nanjing, China
|
19
|
Zaccaria GM, Colella V, Colucci S, Clemente F, Pavone F, Vegliante MC, Esposito F, Opinto G, Scattone A, Loseto G, Minoia C, Rossini B, Quinto AM, Angiulli V, Grieco LA, Fama A, Ferrero S, Moia R, Di Rocco A, Quaglia FM, Tabanelli V, Guarini A, Ciavarella S. Electronic case report forms generation from pathology reports by ARGO, automatic record generator for onco-hematology. Sci Rep 2021; 11:23823. [PMID: 34893665 PMCID: PMC8664934 DOI: 10.1038/s41598-021-03204-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/01/2021] [Accepted: 11/23/2021] [Indexed: 12/04/2022]
Abstract
The unstructured nature of Real-World (RW) data from onco-hematological patients and the scarce accessibility to integrated systems restrain the use of RW information for research purposes. Natural Language Processing (NLP) might help in transposing unstructured reports into standardized electronic health records. We exploited NLP to develop an automated tool, named ARGO (Automatic Record Generator for Onco-hematology) to recognize information from pathology reports and populate electronic case report forms (eCRFs) pre-implemented by REDCap. ARGO was applied to hemo-lymphopathology reports of diffuse large B-cell, follicular, and mantle cell lymphomas, and assessed for accuracy (A), precision (P), recall (R) and F1-score (F) on internal (n = 239) and external (n = 93) report series. 326 (98.2%) reports were converted into corresponding eCRFs. Overall, ARGO showed high performance in capturing (1) identification report number (all metrics > 90%), (2) biopsy date (all metrics > 90% in both series), (3) specimen type (86.6% and 91.4% of A, 98.5% and 100.0% of P, 92.5% and 95.5% of F, and 87.2% and 91.4% of R for internal and external series, respectively), (4) diagnosis (100% of P with A, R and F of 90% in both series). We developed and validated a generalizable tool that generates structured eCRFs from real-life pathology reports.
Affiliation(s)
- Gian Maria Zaccaria
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy.
| | - Vito Colella
- Department of Electrical and Information Engineering, Politecnico of Bari, Bari, Italy
| | - Simona Colucci
- Department of Electrical and Information Engineering, Politecnico of Bari, Bari, Italy
| | - Felice Clemente
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Fabio Pavone
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Maria Carmela Vegliante
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Flavia Esposito
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy.,Department of Mathematics, University of Bari Aldo Moro, Bari, Italy
| | - Giuseppina Opinto
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Anna Scattone
- Pathology Department, IRCCS Istituto Tumori 'Giovanni Paolo II', Bari, Italy
| | - Giacomo Loseto
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Carla Minoia
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Bernardo Rossini
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Angela Maria Quinto
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Vito Angiulli
- Clinical Engineering Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Bari, Italy
| | - Luigi Alfredo Grieco
- Department of Electrical and Information Engineering, Politecnico of Bari, Bari, Italy
| | - Angelo Fama
- Hematology, Azienda USL - IRCCS Di Reggio Emilia, Reggio Emilia, Italy
| | - Simone Ferrero
- Division of Hematology 1, AOU "Città Della Salute e Della Scienza di Torino", Torino, Italy.,Department of Molecular Biotechnologies and Health Sciences, University of Torino, Torino, Italy
| | - Riccardo Moia
- Division of Hematology, Azienda Ospedaliero-Universitaria Maggiore Della Carità Di Novara, Novara, Italy
| | - Alice Di Rocco
- Unit of Hematology, Azienda Ospedaliero-Universitaria Policlinico Umberto I, Roma, Italy
| | | | - Valentina Tabanelli
- Division of Diagnostic Haematopathology, European Institute of Oncology, IRCCS, Milano, Italy
| | - Attilio Guarini
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Sabino Ciavarella
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
|
20
|
Cassim N, Mapundu M, Olago V, Celik T, George JA, Glencross DK. Using text mining techniques to extract prostate cancer predictive information (Gleason score) from semi-structured narrative laboratory reports in the Gauteng province, South Africa. BMC Med Inform Decis Mak 2021; 21:330. [PMID: 34823522 PMCID: PMC8614040 DOI: 10.1186/s12911-021-01697-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2021] [Accepted: 11/18/2021] [Indexed: 12/24/2022]
Abstract
Background Prostate cancer (PCa) is the leading male neoplasm in South Africa with an age-standardised incidence rate of 68.0 per 100,000 population in 2018. The Gleason score (GS) is the strongest predictive factor for PCa treatment and is embedded within semi-structured prostate biopsy narrative reports. The manual extraction of the GS is labour-intensive. The objective of our study was to explore the use of text mining techniques to automate the extraction of the GS from irregularly reported text-intensive patient reports. Methods We used the associated Systematized Nomenclature of Medicine clinical terms morphology and topography codes to identify prostate biopsies with a PCa diagnosis for men aged > 30 years between 2006 and 2016 in the Gauteng Province, South Africa. We developed a text mining algorithm to extract the GS from 1000 biopsy reports with a PCa diagnosis from the National Health Laboratory Service database and validated the algorithm using 1000 biopsies from the private sector. The logical steps for the algorithm were data acquisition, pre-processing, feature extraction, feature value representation, feature selection, information extraction, classification, and discovered knowledge. We evaluated the algorithm using precision, recall and F-score. The GS was manually coded by two experts for both datasets. The top five GS were reported, with the remaining scores categorised as “Other” for both datasets. The percentage of biopsies with a high-risk GS (≥ 8) was also reported. Results The first output reported an F-score of 0.99 that improved to 1.00 after the algorithm was amended (the GS reported in clinical history was ignored). For the validation dataset, an F-score of 0.99 was reported. The most commonly reported GS were 5 + 4 = 9 (17.6%), 3 + 3 = 6 (17.5%), 4 + 3 = 7 (16.4%), 3 + 4 = 7 (14.7%) and 4 + 4 = 8 (14.2%). 
For the validation dataset, the most commonly reported GS were: (i) 3 + 3 = 6 (37.7%), (ii) 3 + 4 = 7 (19.4%), (iii) 4 + 3 = 7 (14.9%), (iv) 4 + 4 = 8 (10.0%) and (v) 4 + 5 = 9 (7.4%). A high-risk GS was reported for 31.8% compared to 17.4% for the validation dataset. Conclusions We demonstrated reliable extraction of information about GS from narrative text-based patient reports using an in-house developed text mining algorithm. A secondary outcome was that late presentation could be assessed.
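Gleason scores in the abstract above are written as "primary + secondary = total", with a total of 8 or more treated as high risk. A minimal sketch of how such a score could be pulled from narrative text follows; the pattern, the function names `extract_gleason` and `is_high_risk`, and the sample sentences are assumptions for illustration and do not reproduce the authors' text mining algorithm (for example, the step that ignores scores quoted in the clinical history is not modelled).

```python
import re

# Hypothetical rule: the word "Gleason", a short non-digit gap, then "a + b" with an optional "= total".
GLEASON_PATTERN = re.compile(
    r"gleason[^\d]{0,30}([1-5])\s*\+\s*([1-5])(?:\s*=\s*(\d{1,2}))?",
    re.IGNORECASE,
)


def extract_gleason(report):
    """Return (primary, secondary, total), or None if absent or internally inconsistent."""
    match = GLEASON_PATTERN.search(report)
    if match is None:
        return None
    primary, secondary = int(match.group(1)), int(match.group(2))
    total = primary + secondary
    # If the report states a total, it must agree with the two patterns.
    if match.group(3) is not None and int(match.group(3)) != total:
        return None
    return primary, secondary, total


def is_high_risk(total):
    """High-risk threshold as stated in the abstract (GS >= 8)."""
    return total >= 8


if __name__ == "__main__":
    score = extract_gleason("Adenocarcinoma, Gleason grade 5+4=9.")
    print(score, is_high_risk(score[2]))
```

Keeping the primary and secondary patterns separate matters because 3 + 4 and 4 + 3 share a total of 7 but are reported as distinct categories in the abstract.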
Affiliation(s)
- Naseem Cassim
- Department of Molecular Medicine and Haematology, Faculty of Health Sciences, University of Witwatersrand and National Health Laboratory Service (NHLS), 7 York Road, Parktown, Johannesburg, South Africa.
| | - Michael Mapundu
- School of Public Health, Faculty of Health Sciences, University of Witwatersrand, 7 York Road, Parktown, Johannesburg, South Africa
| | - Victor Olago
- National Health Laboratory Service (NHLS), National Cancer Registry (NCR), 1 Modderfontein Road, Sandringham, Johannesburg, South Africa
| | - Turgay Celik
- School of Electrical & Information Engineering and Wits Institute of Data Science, University of Witwatersrand, 1 Jan Smuts Avenue, Braamfontein, Johannesburg, South Africa
| | - Jaya Anna George
- Department of Chemical Pathology, Faculty of Health Sciences, University of Witwatersrand and National Health Laboratory Service (NHLS), 7 York Road, Parktown, Johannesburg, South Africa
| | - Deborah Kim Glencross
- Department of Molecular Medicine and Haematology, Faculty of Health Sciences, University of Witwatersrand and National Health Laboratory Service (NHLS), 7 York Road, Parktown, Johannesburg, South Africa
|
21
|
Xiao Y, Zheng X, Song W, Tong F, Mao Y, Liu S, Zhao D. CIDO-COVID-19: An Ontology for COVID-19 Based on CIDO. Annu Int Conf IEEE Eng Med Biol Soc 2021; 2021:2119-2122. [PMID: 34891707 DOI: 10.1109/embc46164.2021.9629555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
To realize the integration, organization and reusability of knowledge related to COVID-19, an ontology for COVID-19 (CIDO-COVID-19) was constructed, extending the Coronavirus Infectious Disease Ontology (CIDO) with COVID-19 terms related to symptoms, prevention, drugs and clinical domains. First, terms from existing ontologies, literature, clinical guidelines and other resources about COVID-19 were merged. Then, the Stanford seven-step approach was used to define and organize the acquired terms. Finally, CIDO-COVID-19 was built from these terms using Protégé. CIDO-COVID-19 is a more comprehensive ontology for COVID-19, covering multiple areas in the domain, including disease, diagnosis, etiology, virus, transmission, symptom, treatment, drug and prevention. Clinical Relevance: CIDO-COVID-19 covers multiple areas related to COVID-19, including diseases, diagnosis, etiology, virus, transmission, symptoms, treatment, drugs and prevention. Compared with CIDO, it is expanded to cover drugs, prevention, and the clinical domain. The definitions of terms in CIDO-COVID-19 draw on biomedical ontologies, clinical glossaries and clinical guidelines for COVID-19, which can provide clinicians with standard terminology in the clinical domain.
|
22
|
Lin C, Lee YT, Wu FJ, Lin SA, Hsu CJ, Lee CC, Tsai DJ, Fang WH. The Application of Projection Word Embeddings on Medical Records Scoring System. Healthcare (Basel) 2021; 9:healthcare9101298. [PMID: 34682978 PMCID: PMC8544381 DOI: 10.3390/healthcare9101298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Revised: 09/24/2021] [Accepted: 09/28/2021] [Indexed: 11/16/2022]
Abstract
Medical records scoring is important in a health care system. Artificial intelligence (AI) with projection word embeddings has been validated for its performance in disease coding tasks, where it maintains both the vocabulary diversity of open internet databases and the medical terminology understanding of electronic health records (EHRs). We considered that an AI-enhanced system might also be applied to automatically score medical records. This study aimed to develop a series of deep learning models (DLMs) and validate their performance in the medical records scoring task. We also analyzed the practical value of the best model. We used the admission medical records from the Tri-Service General Hospital from January 2016 to May 2020, which were scored by visiting staff of different levels from different departments. The medical records were scored on a scale of 0 to 10. All samples were divided by time into a training set (n = 74,959) and a testing set (n = 152,730), which were used to train and validate the DLMs, respectively. The mean absolute error (MAE) was used to evaluate the performance of each DLM. In the original AI medical record scoring, the score predicted by the BERT architecture was closer to the actual reviewer score than that of the projection word embedding with LSTM architecture. The original MAE was 0.84 ± 0.27 using the BERT model and 1.00 ± 0.32 using the LSTM model. A linear mixed model can be used to improve model performance, and the adjusted predicted score was closer to the reviewer score than the original prediction. However, the projection word embedding with the LSTM model (0.66 ± 0.39) performed better than BERT (0.70 ± 0.33) after linear mixed model enhancement (p < 0.001). In addition to comparing different architectures for scoring medical records, this study uses a linear mixed model to adjust the AI medical record score and bring it closer to the actual physician's score.
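The models in the abstract above are compared by mean absolute error (MAE) between predicted and reviewer scores on a 0 to 10 scale. As a small worked illustration of that metric only (the function name and the sample scores are assumptions, and the linear mixed model adjustment is not sketched here):

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and reviewer scores."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    # Made-up scores on the 0-10 scale, for illustration only.
    model_scores = [7.0, 8.5, 6.0]
    reviewer_scores = [8.0, 8.0, 6.0]
    print(mean_absolute_error(model_scores, reviewer_scores))  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

Because MAE is in the same units as the score, a value of 0.84 can be read directly as "off by a little under one point on average".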
Affiliation(s)
- Chin Lin
- School of Medicine, National Defense Medical Center, Taipei 114, Taiwan;
- School of Public Health, National Defense Medical Center, Taipei 114, Taiwan
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei 114, Taiwan
- Artificial Intelligence of Things Center, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
| | - Yung-Tsai Lee
- Division of Cardiovascular Surgery, Cheng Hsin Rehabilitation and Medical Center, Taipei 112, Taiwan;
| | - Feng-Jen Wu
- Department of Informatics, Taoyuan Armed Forces General Hospital, Taoyuan 325, Taiwan;
| | - Shing-An Lin
- Department of Medical Informatics, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan; (S.-A.L.); (C.-J.H.); (C.-C.L.)
| | - Chia-Jung Hsu
- Department of Medical Informatics, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan; (S.-A.L.); (C.-J.H.); (C.-C.L.)
| | - Chia-Cheng Lee
- Department of Medical Informatics, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan; (S.-A.L.); (C.-J.H.); (C.-C.L.)
- Division of Colorectal Surgery, Department of Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
| | - Dung-Jang Tsai
- School of Public Health, National Defense Medical Center, Taipei 114, Taiwan
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei 114, Taiwan
- Artificial Intelligence of Things Center, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Correspondence: (D.-J.T.); (W.-H.F.); Tel.: +886-2-8792-3100 (ext. #18305) (D.-J.T.); +886-2-8792-3100 (ext. #12322) (W.-H.F.); Fax: +886-2-8792-3147 (D.-J.T. & W.-H.F.)
| | - Wen-Hui Fang
- Artificial Intelligence of Things Center, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Department of Family and Community Medicine, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Correspondence: (D.-J.T.); (W.-H.F.); Tel.: +886-2-8792-3100 (ext. #18305) (D.-J.T.); +886-2-8792-3100 (ext. #12322) (W.-H.F.); Fax: +886-2-8792-3147 (D.-J.T. & W.-H.F.)
|
23
|
Conceição SIR, Couto FM. Text Mining for Building Biomedical Networks Using Cancer as a Case Study. Biomolecules 2021; 11:biom11101430. [PMID: 34680062 PMCID: PMC8533101 DOI: 10.3390/biom11101430] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/09/2021] [Revised: 09/24/2021] [Accepted: 09/27/2021] [Indexed: 12/15/2022]
Abstract
In the assembly of biological networks, it is important to provide reliable interactions in order to obtain the most accurate possible representation of real-life systems. Commonly, the data used to build a network come from diverse high-throughput assays; however, most of the interaction data are available only through the scientific literature. This has become a challenge with the notable increase in published literature, as it is hard for human curators to track all recent discoveries without efficient tools that help them identify these interactions automatically. This can be overcome by using text mining approaches, which are capable of extracting knowledge from scientific documents. One of the most important tasks in text mining for biological network building is relation extraction, which identifies relations between the entities of interest. Many interaction databases already use text mining systems, and the development of these tools will lead to more reliable networks, as well as the possibility of personalizing the networks by selecting the desired relations. This review focuses on different approaches to automatic information extraction from biomedical text that can be used to enhance existing networks or create new ones, such as state-of-the-art deep learning approaches, using cancer as a case study.
|
24
|
Rossi KR, Echeverria D, Carroll A, Luse T, Rennix C. Development and Evaluation of Perl-Based Algorithms to Classify Neoplasms From Pathology Records in Synoptic Report Format. JCO Clin Cancer Inform 2021; 5:295-303. [PMID: 33760628 DOI: 10.1200/cci.20.00152] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
Abstract
PURPOSE Synoptic reporting provides a mechanism for uniform and structured pathology diagnostics. This paper demonstrates the functionality of Perl alternation and grouping expressions to classify electronic pathology reports generated from military treatment facilities. Eight Perl-based algorithms are validated to classify malignant melanoma, Hodgkin lymphoma, non-Hodgkin lymphoma, leukemia, and malignant neoplasms of the breast, ovary, testis, and thyroid. METHODS Case finding cohorts were developed using diagnostic codes for neoplasm groups and matched by unique identifiers to obtain pathology records. Preprocessing techniques and Perl-based algorithms were applied to classify records as malignant, in situ, suspect, or nonapplicable, followed by a hand-review process to determine the accuracy of the algorithm classifications. Interrater reliability, sensitivity, specificity, positive predictive values, and negative predictive values were computed following abstractor adjudication. RESULTS The specificity of the Perl-based algorithms was consistently high, over 98%. Very few benign results were classified as malignant or in situ by the Perl-based algorithms; the leukemia algorithm classification was the only group to demonstrate a positive predictive value below 95%, at 91.9%. Three algorithm classification groups demonstrated a sensitivity of < 80%, including malignant neoplasm of the ovary (33.3%), leukemia (52.8%), and non-Hodgkin lymphoma (62.9%). The pathology records for these results included substantial linguistic variation. CONCLUSION This paper contextualizes the utility and value of an algorithm logic built around synoptic reporting to identify neoplasms from electronic pathology results. The major strength includes the application of Perl-based coding in SAS, an accessible software application, to develop highly specific algorithms across institutional variation in diagnostic documentation.
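The abstract above relies on regular-expression alternation (`a|b`) and grouping (`(...)`) to sort records into malignant, in situ, suspect, or nonapplicable, and then reports sensitivity, specificity, and predictive values. The sketch below illustrates both ideas using Python's `re` module, whose alternation and grouping syntax is similar to Perl's; the rule list, the term choices, and the function names are assumptions, not the published algorithms (which were implemented with Perl expressions in SAS).

```python
import re

# Hypothetical rules, checked in order so that "carcinoma in situ" is not labelled malignant.
RULES = [
    ("in situ", re.compile(r"\b(?:carcinoma|melanoma)\s+in\s+situ\b", re.IGNORECASE)),
    ("suspect", re.compile(r"\b(?:suspicious|suspected|cannot\s+exclude)\b", re.IGNORECASE)),
    ("malignant", re.compile(r"\b(?:malignant|carcinoma|melanoma|lymphoma|leukemia)\b", re.IGNORECASE)),
]


def classify(text):
    """Return the label of the first matching rule, else 'nonapplicable'."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "nonapplicable"


def _ratio(numerator, denominator):
    # Undefined ratios are reported as None rather than raising.
    return numerator / denominator if denominator else None


def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    print(classify("Ductal carcinoma in situ, left breast"))
    print(diagnostic_metrics(tp=8, fp=2, tn=88, fn=2))
```

This sketch does not handle negation ("no evidence of carcinoma") or spelling variants, the kind of linguistic variation the abstract links to lower sensitivity in some groups.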
Affiliation(s)
| | | | - Anna Carroll
- EpiData Center Department, Navy and Marine Corps Public Health Center, Portsmouth, VA
| | - Tina Luse
- EpiData Center Department, Navy and Marine Corps Public Health Center, Portsmouth, VA
|
25
|
Thompson BS, Hardy S, Pandeya N, Dusingize JC, Green AC, Millane A, Bourke D, Grande R, Bean CD, Olsen CM, Whiteman DC. Web Application for the Automated Extraction of Diagnosis and Site From Pathology Reports for Keratinocyte Cancers. JCO Clin Cancer Inform 2021; 4:711-723. [PMID: 32755460 PMCID: PMC7469600 DOI: 10.1200/cci.19.00152] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/24/2022]
Abstract
PURPOSE Keratinocyte cancers are exceedingly common in high-risk populations, but accurate measures of incidence are seldom derived because the burden of manually reviewing pathology reports to extract relevant diagnostic information is excessive. Thus, we sought to develop supervised learning algorithms for classifying basal and squamous cell carcinomas and other diagnoses, as well as disease site, and incorporate these into a Web application capable of processing large numbers of pathology reports. METHODS Participants in the QSkin study were recruited in 2011 and comprised men and women age 40-69 years at baseline (N = 43,794) who were randomly selected from a population register in Queensland, Australia. Histologic data were manually extracted from free-text pathology reports for participants with histologically confirmed keratinocyte cancers for whom a pathology report was available (n = 25,786 reports). This provided a training data set for the development of algorithms capable of deriving diagnosis and site from free-text pathology reports. We calculated agreement statistics between algorithm-derived classifications and 3 independent validation data sets of manually abstracted pathology reports. RESULTS The agreement for classifications of basal cell carcinoma (κ = 0.97 and κ = 0.96) and squamous cell carcinoma (κ = 0.93 for both) was almost perfect in 2 validation data sets but was slightly lower for a third (κ = 0.82 and κ = 0.90, respectively). Agreement for total counts of specific diagnoses was also high (κ > 0.8). Similar levels of agreement between algorithm-derived and manually extracted data were observed for classifications of keratoacanthoma and intraepidermal carcinoma. CONCLUSION Supervised learning methods were used to develop a Web application capable of accurately and rapidly classifying large numbers of pathology reports for keratinocyte cancers and related diagnoses. 
Such tools may provide the means to accurately measure subtype-specific skin cancer incidence.
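The agreement statistic reported above (κ) can be illustrated with a minimal sketch of Cohen's kappa for two raters. This is not the authors' code; the function name `cohens_kappa` and the example labels are hypothetical, chosen only to show how observed agreement is corrected for chance agreement.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: manual abstraction vs. algorithm output.
manual = ["BCC", "SCC", "BCC", "other", "BCC", "SCC"]
algorithm = ["BCC", "SCC", "BCC", "other", "SCC", "SCC"]
kappa = cohens_kappa(manual, algorithm)
print(round(kappa, 3))
```

Values above 0.8 are conventionally read as almost perfect agreement, which matches the wording used in the abstract.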
Affiliation(s)
- Bridie S Thompson
  - Department of Population Health, QIMR Berghofer Medical Research Institute, Brisbane Queensland, Australia
- Sam Hardy
  - Otso, Brisbane, Queensland, Australia
- Nirmala Pandeya
  - Department of Population Health, QIMR Berghofer Medical Research Institute, Brisbane Queensland, Australia
  - School of Public Health, University of Queensland, Brisbane, Queensland, Australia
- Jean Claude Dusingize
  - Department of Population Health, QIMR Berghofer Medical Research Institute, Brisbane Queensland, Australia
- Adele C Green
  - Department of Population Health, QIMR Berghofer Medical Research Institute, Brisbane Queensland, Australia
  - Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom
- Athon Millane
  - School of Public Health, University of Queensland, Brisbane, Queensland, Australia
- Catherine M Olsen
  - Department of Population Health, QIMR Berghofer Medical Research Institute, Brisbane Queensland, Australia
  - Faculty of Medicine, University of Queensland, Brisbane, Queensland, Australia
- David C Whiteman
  - Department of Population Health, QIMR Berghofer Medical Research Institute, Brisbane Queensland, Australia
  - Faculty of Medicine, University of Queensland, Brisbane, Queensland, Australia
|
26
|
Torous VF, Simpson RW, Balani JP, Baras AS, Berman MA, Birdsong GG, Giannico GA, Paner GP, Pettus JR, Sessions Z, Sirintrapun SJ, Srigley JR, Spencer S. College of American Pathologists Cancer Protocols: From Optimizing Cancer Patient Care to Facilitating Interoperable Reporting and Downstream Data Use. JCO Clin Cancer Inform 2021; 5:47-55. [PMID: 33439728 PMCID: PMC8140812 DOI: 10.1200/cci.20.00104] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 01/21/2023] Open
Abstract
The College of American Pathologists Cancer Protocols have offered guidance to pathologists for standard cancer pathology reporting for more than 35 years. The adoption of computer readable versions of these protocols by electronic health record and laboratory information system (LIS) vendors has provided a mechanism for pathologists to report within their LIS workflow, in addition to enabling standardized structured data capture and reporting to downstream consumers of these data such as the cancer surveillance community. This paper reviews the history of the Cancer Protocols and electronic Cancer Checklists, outlines the current use of these critically important cancer case reporting tools, and examines future directions, including plans to help improve the integration of the Cancer Protocols into clinical, public health, research, and other workflows.
Affiliation(s)
- Jyoti P Balani
  - University of Texas Southwestern Medical Center, Dallas, TX
- Michael A Berman
  - Jefferson Hospital, Allegheny Health Network, Jefferson Hills, PA
|
27
|
Jung E, Jain H, Sinha AP, Gaudioso C. Building a specialized lexicon for breast cancer clinical trial subject eligibility analysis. Health Informatics J 2021; 27:1460458221989392. [PMID: 33535885 DOI: 10.1177/1460458221989392] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
Abstract
A natural language processing (NLP) application requires sophisticated lexical resources to support its processing goals. Different solutions, such as dictionary lookup and MetaMap, have been proposed in the healthcare informatics literature to identify disease terms with more than one word (multi-gram disease named entities). Although a lot of work has been done in the identification of protein- and gene-named entities in the biomedical field, not much research has been done on the recognition and resolution of terminologies in the clinical trial subject eligibility analysis. In this study, we develop a specialized lexicon for improving NLP and text mining analysis in the breast cancer domain, and evaluate it by comparing it with the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). We use a hybrid methodology, which combines the knowledge of domain experts, terms from multiple online dictionaries, and the mining of text from sample clinical trials. Use of our methodology introduces 4243 unique lexicon items, which increase bigram entity match by 38.6% and trigram entity match by 41%. Our lexicon, which adds a significant number of new terms, is very useful for matching patients to clinical trials automatically based on eligibility matching. Beyond clinical trial matching, the specialized lexicon developed in this study could serve as a foundation for future healthcare text mining applications.
Affiliation(s)
- Euisung Jung
  - Information Operations and Technology Management, John B. and Lillian E. Neff College of Business and Innovation, The University of Toledo, USA
- Hemant Jain
  - Gary W. Rollins College of Business, The University of Tennessee at Chattanooga, USA
- Atish P Sinha
  - Lubar School of Business, University of Wisconsin-Milwaukee, USA
|
28
|
Deshmukh PR, Phalnikar R. Prognostic elements extraction from documents to detect prognostic stage. Comput Methods Biomech Biomed Engin 2021; 25:371-386. [PMID: 34319178 DOI: 10.1080/10255842.2021.1955359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
For cancer prediction, the prognostic stage is the main factor that helps medical experts to decide the optimal treatment for a patient. The main objective of this study is to predict prognostic stage from the medical records of various health institutions. A total of 465 pathological and clinical reports of people living with breast cancer were collected from reputed treatment institutions in India. Different anatomic and biologic factors are extracted from unstructured medical records using a novel combination of natural language processing (NLP) and fuzzy decision tree (FDT) for prognostic stage detection. This study has extracted the anatomic and biologic factors from medical reports with high accuracy. The average accuracy of prognostic stage prediction was 93% and 83% in rural and urban regions, respectively. A generalized method for cancer staging with high accuracy across medical institutions from dissimilar regional areas suggests that the proposed research improves the prognosis of breast cancer.
Affiliation(s)
- Pratiksha R Deshmukh
  - School of Computer Engineering and Technology, MIT World Peace University, Pune, India
  - Department of Computer Science and Information Technology, College of Engineering Pune, Pune, India
- Rashmi Phalnikar
  - School of Computer Engineering and Technology, MIT World Peace University, Pune, India
|
29
|
Deshmukh PR, Phalnikar R. Information extraction for prognostic stage prediction from breast cancer medical records using NLP and ML. Med Biol Eng Comput 2021; 59:1751-1772. [PMID: 34297300 DOI: 10.1007/s11517-021-02399-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/01/2020] [Accepted: 07/01/2021] [Indexed: 11/24/2022]
Abstract
For cancer prediction, the prognostic stage is the main factor that helps medical experts to decide the optimal treatment for a patient. Specialists study prognostic stage information from medical reports, which are often unstructured, and this takes considerable review time. The main objective of this study is to suggest a generic clinical decision-unifying staging method to extract the most reliable prognostic stage information of breast cancer from medical records of various health institutions. Additional prognostic elements should be extracted from medical reports to identify the cancer stage for getting an exact measure of cancer and improving care quality. This study collected 465 pathological and clinical reports of breast cancer patients from reputed medical institutions in India. The unstructured records were found to differ between institutes. Anatomic and biologic factors are extracted from medical records using natural language processing, machine learning, and rule-based methods for prognostic stage detection. This study has extracted anatomic stage, grade, estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) from medical reports with high accuracy and predicted prognostic stage for both regions. The average accuracy of prognostic stage prediction was 92% and 82% in rural and urban areas, respectively. It was essential to combine biological and anatomical elements under a single prognostic staging method. A generic clinical decision-unifying staging method for prognostic stage detection with high accuracy in various institutions of different regional areas suggests that the proposed research improves the prognosis of breast cancer.
Affiliation(s)
- Pratiksha R Deshmukh
  - School of Computer Engineering and Technology, MIT World Peace University, Pune, India, 411029
  - Department of Computer Engineering and Information Technology, College of Engineering, Pune, 411005, India
- Rashmi Phalnikar
  - School of Computer Engineering and Technology, MIT World Peace University, Pune, India, 411029
|
30
|
Kapoor R, Sleeman WC, Nalluri JJ, Turner P, Bose P, Cherevko A, Srinivasan S, Syed K, Ghosh P, Hagan M, Palta JR. Automated data abstraction for quality surveillance and outcome assessment in radiation oncology. J Appl Clin Med Phys 2021; 22:177-187. [PMID: 34101349 PMCID: PMC8292697 DOI: 10.1002/acm2.13308] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/03/2021] [Revised: 04/22/2021] [Accepted: 05/10/2021] [Indexed: 11/24/2022] Open
Abstract
Rigorous radiotherapy quality surveillance and comprehensive outcome assessment require electronic capture and automatic abstraction of clinical, radiation treatment planning, and delivery data. We present the design and implementation framework of an integrated data abstraction, aggregation, storage, curation, and analytics software: the Health Information Gateway and Exchange (HINGE), which collates data for cancer patients receiving radiotherapy. The HINGE software abstracts structured DICOM‐RT data from the treatment planning system (TPS), treatment data from the treatment management system (TMS), and clinical data from the electronic health records (EHRs). HINGE software has disease site‐specific “Smart” templates that facilitate the entry of relevant clinical information by physicians and clinical staff in a discrete manner as part of the routine clinical documentation. Radiotherapy data abstracted from these disparate sources and the smart templates are processed for quality and outcome assessment. The predictive data analyses are done using well‐defined clinical and dosimetry quality measures defined by disease site experts in radiation oncology. HINGE application software connects seamlessly to the local IT/medical infrastructure via interfaces and cloud services and performs data extraction and aggregation functions without human intervention. It provides tools to assess variations in radiation oncology practices and outcomes and determines gaps in radiotherapy quality delivered by each provider.
Affiliation(s)
- Rishabh Kapoor
  - Department of Radiation Oncology, Virginia Commonwealth University, Richmond, VA, USA
  - National Radiation Oncology Program, US Veterans Healthcare Administration, Richmond, VA, USA
- William C Sleeman
  - Department of Radiation Oncology, Virginia Commonwealth University, Richmond, VA, USA
  - National Radiation Oncology Program, US Veterans Healthcare Administration, Richmond, VA, USA
- Joseph J Nalluri
  - Department of Radiation Oncology, Virginia Commonwealth University, Richmond, VA, USA
  - National Radiation Oncology Program, US Veterans Healthcare Administration, Richmond, VA, USA
- Paul Turner
  - Department of Radiation Oncology, Virginia Commonwealth University, Richmond, VA, USA
  - National Radiation Oncology Program, US Veterans Healthcare Administration, Richmond, VA, USA
- Priyankar Bose
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Andrii Cherevko
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Sriram Srinivasan
  - Department of Radiation Oncology, Virginia Commonwealth University, Richmond, VA, USA
  - National Radiation Oncology Program, US Veterans Healthcare Administration, Richmond, VA, USA
- Khajamoinuddin Syed
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Preetam Ghosh
  - Department of Radiation Oncology, Virginia Commonwealth University, Richmond, VA, USA
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Michael Hagan
  - Department of Radiation Oncology, Virginia Commonwealth University, Richmond, VA, USA
  - National Radiation Oncology Program, US Veterans Healthcare Administration, Richmond, VA, USA
- Jatinder R Palta
  - Department of Radiation Oncology, Virginia Commonwealth University, Richmond, VA, USA
  - National Radiation Oncology Program, US Veterans Healthcare Administration, Richmond, VA, USA
|
31
|
Withall A, Karystianis G, Duncan D, Hwang YI, Hagos Kidane A, Butler T. Domestic Violence in Residential Care Facilities in New South Wales, Australia: A Text Mining Study. THE GERONTOLOGIST 2021; 62:223-231. [PMID: 34023902 DOI: 10.1093/geront/gnab068] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/20/2020] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND AND OBJECTIVES The police are often the first to attend domestic violence events in New South Wales (NSW), Australia, recording related details as structured information (e.g., date of the event, type of incident, premises type) and as text narratives which contain important information (e.g., mental health status, abuse types) for victims and perpetrators. This study examined the characteristics of victims and persons of interest (POIs) suspected and/or charged with perpetrating a domestic violence related crime in residential care facilities. RESEARCH DESIGN AND METHODS The study employed a text mining method that extracted key information from 700 police recorded domestic violence events in NSW residential care facilities. RESULTS Victims were mostly female (65.4%) and older adults (median age 80.3). POIs were predominantly male (67.0%) and were younger than the victims (median age 57.0). While low rates of mental illnesses were recorded (29.1% in victims; 17.4% in POIs), 'dementia' was the most common condition among POIs (55.7%) and victims (73.0%). 'Physical abuse' was the most common abuse type (80.2%) with 'bruising' the most common injury (36.8%). The most common relationship between perpetrator and victim was 'carer' (76.6%). DISCUSSION AND IMPLICATIONS These findings highlight the opportunity provided by police text-based data to provide insights into elder abuse within residential care facilities.
Affiliation(s)
- Adrienne Withall
  - School of Population Health, Faculty of Medicine, University of New South Wales, Kensington, Sydney, New South Wales, Australia
- George Karystianis
  - School of Population Health, Faculty of Medicine, University of New South Wales, Kensington, Sydney, New South Wales, Australia
- Dayna Duncan
  - School of Population Health, Faculty of Medicine, University of New South Wales, Kensington, Sydney, New South Wales, Australia
- Ye In Hwang
  - School of Population Health, Faculty of Medicine, University of New South Wales, Kensington, Sydney, New South Wales, Australia
- Amanuel Hagos Kidane
  - School of Population Health, Faculty of Medicine, University of New South Wales, Kensington, Sydney, New South Wales, Australia
- Tony Butler
  - School of Population Health, Faculty of Medicine, University of New South Wales, Kensington, Sydney, New South Wales, Australia
|
32
|
Turchin A, Florez Builes LF. Using Natural Language Processing to Measure and Improve Quality of Diabetes Care: A Systematic Review. J Diabetes Sci Technol 2021; 15:553-560. [PMID: 33736486 PMCID: PMC8120048 DOI: 10.1177/19322968211000831] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/14/2022]
Abstract
BACKGROUND Real-world evidence research plays an increasingly important role in diabetes care. However, a large fraction of real-world data are "locked" in narrative format. Natural language processing (NLP) technology offers a solution for analysis of narrative electronic data. METHODS We conducted a systematic review of studies of NLP technology focused on diabetes. Articles published prior to June 2020 were included. RESULTS We included 38 studies in the analysis. The majority (24; 63.2%) described only development of NLP tools; the remainder used NLP tools to conduct clinical research. A large fraction (17; 44.7%) of studies focused on identification of patients with diabetes; the rest covered a broad range of subjects that included hypoglycemia, lifestyle counseling, diabetic kidney disease, insulin therapy and others. The mean F1 score for all studies where it was available was 0.882. It tended to be lower (0.817) in studies of more linguistically complex concepts. Seven studies reported findings with potential implications for improving delivery of diabetes care. CONCLUSION Research in NLP technology to study diabetes is growing quickly, although challenges (e.g. in analysis of more linguistically complex concepts) remain. Its potential to deliver evidence on treatment and improving quality of diabetes care is demonstrated by a number of studies. Further growth in this area would be aided by deeper collaboration between developers and end-users of natural language processing tools as well as by broader sharing of the tools themselves and related resources.
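The F1 score used above as the summary performance measure is the harmonic mean of precision and recall. A minimal sketch follows; the function name `f1_score` and the counts are hypothetical and serve only to show the arithmetic.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 score (harmonic mean of precision and recall) from raw counts."""
    if true_positives == 0:
        return 0.0  # no correct extractions: precision and recall are both 0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for an NLP tool that flags patients with diabetes.
score = f1_score(true_positives=80, false_positives=10, false_negatives=20)
print(round(score, 3))
```

Because F1 ignores true negatives, it suits extraction tasks in which the concept of interest is rare in the narrative text.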
Affiliation(s)
- Alexander Turchin
  - Brigham and Women’s Hospital, Boston, MA, USA
  - Alexander Turchin, MD, MS, Brigham and Women’s Hospital, 221 Longwood Avenue, Boston, MA 02115, USA.
|
33
|
Alawad M, Gao S, Qiu JX, Yoon HJ, Blair Christian J, Penberthy L, Mumphrey B, Wu XC, Coyle L, Tourassi G. Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks. J Am Med Inform Assoc 2021; 27:89-98. [PMID: 31710668 PMCID: PMC7489089 DOI: 10.1093/jamia/ocz153] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 04/26/2019] [Revised: 07/09/2019] [Accepted: 07/22/2019] [Indexed: 11/13/2022] Open
Abstract
OBJECTIVE We implement 2 different multitask learning (MTL) techniques, hard parameter sharing and cross-stitch, to train a word-level convolutional neural network (CNN) specifically designed for automatic extraction of cancer data from unstructured text in pathology reports. We show the importance of learning related information extraction (IE) tasks leveraging shared representations across the tasks to achieve state-of-the-art performance in classification accuracy and computational efficiency. MATERIALS AND METHODS Multitask CNN (MTCNN) attempts to tackle document information extraction by learning to extract multiple key cancer characteristics simultaneously. We trained our MTCNN to perform 5 information extraction tasks: (1) primary cancer site (65 classes), (2) laterality (4 classes), (3) behavior (3 classes), (4) histological type (63 classes), and (5) histological grade (5 classes). We evaluated the performance on a corpus of 95 231 pathology documents (71 223 unique tumors) obtained from the Louisiana Tumor Registry. We compared the performance of the MTCNN models against single-task CNN models and 2 traditional machine learning approaches, namely support vector machine (SVM) and random forest classifier (RFC). RESULTS MTCNNs offered superior performance across all 5 tasks in terms of classification accuracy as compared with the other machine learning models. Based on retrospective evaluation, the hard parameter sharing and cross-stitch MTCNN models correctly classified 59.04% and 57.93% of the pathology reports respectively across all 5 tasks. The baseline models achieved 53.68% (CNN), 46.37% (RFC), and 36.75% (SVM). Based on prospective evaluation, the percentages of correctly classified cases across the 5 tasks were 60.11% (hard parameter sharing), 58.13% (cross-stitch), 51.30% (single-task CNN), 42.07% (RFC), and 35.16% (SVM). 
Moreover, hard parameter sharing MTCNNs outperformed the other models in computational efficiency by using about the same number of trainable parameters as a single-task CNN. CONCLUSIONS The hard parameter sharing MTCNN offers superior classification accuracy for automated coding support of pathology documents across a wide range of cancers and multiple information extraction tasks while maintaining similar training and inference time as those of a single task-specific model.
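The percentages "correctly classified ... across all 5 tasks" can be read as a joint measure: a report counts as correct only if every task label is right. The sketch below assumes that reading, which the abstract does not spell out. The function name `all_tasks_correct_rate`, the task keys, and the labels are hypothetical.

```python
def all_tasks_correct_rate(predictions, references):
    """Fraction of documents whose predicted labels match on every task."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(
        # A document is a hit only if all of its task labels agree.
        all(pred.get(task) == label for task, label in ref.items())
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)


# Hypothetical reference and predicted labels for three reports, three tasks.
references = [
    {"site": "C50", "laterality": "left", "behavior": "malignant"},
    {"site": "C34", "laterality": "right", "behavior": "malignant"},
    {"site": "C18", "laterality": "not paired", "behavior": "in situ"},
]
predictions = [
    {"site": "C50", "laterality": "left", "behavior": "malignant"},
    {"site": "C34", "laterality": "left", "behavior": "malignant"},
    {"site": "C18", "laterality": "not paired", "behavior": "in situ"},
]
rate = all_tasks_correct_rate(predictions, references)
print(round(rate, 3))
```

A joint measure of this kind is much stricter than per-task accuracy, which would explain why the reported figures sit near 60% even for the best model.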
Affiliation(s)
- Mohammed Alawad
  - Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Shang Gao
  - Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- John X Qiu
  - Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Hong Jun Yoon
  - Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- J Blair Christian
  - Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Lynne Penberthy
  - Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, Maryland, USA
- Brent Mumphrey
  - Louisiana Tumor Registry, Louisiana State University Health Sciences Center School of Public Health, New Orleans, Louisiana, USA
- Xiao-Cheng Wu
  - Louisiana Tumor Registry, Louisiana State University Health Sciences Center School of Public Health, New Orleans, Louisiana, USA
- Linda Coyle
  - Information Management Services Inc, Calverton, Maryland, USA
- Georgia Tourassi
  - Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
|
34
|
Abstract
Machine learning (ML) has been slowly entering every aspect of our lives and its positive impact has been astonishing. To accelerate embedding ML in more applications and incorporating it in real-world scenarios, automated machine learning (AutoML) is emerging. The main purpose of AutoML is to provide seamless integration of ML in various industries, which will facilitate better outcomes in everyday tasks. In healthcare, AutoML has been already applied to easier settings with structured data such as tabular lab data. However, there is still a need for applying AutoML for interpreting medical text, which is being generated at a tremendous rate. For this to happen, a promising method is AutoML for clinical notes analysis, which is an unexplored research area representing a gap in ML research. The main objective of this paper is to fill this gap and provide a comprehensive survey and analytical study towards AutoML for clinical notes. To that end, we first introduce the AutoML technology and review its various tools and techniques. We then survey the literature of AutoML in the healthcare industry and discuss the developments specific to clinical settings, as well as those using general AutoML tools for healthcare applications. With this background, we then discuss challenges of working with clinical notes and highlight the benefits of developing AutoML for medical notes processing. Next, we survey relevant ML research for clinical notes and analyze the literature and the field of AutoML in the healthcare industry. Furthermore, we propose future research directions and shed light on the challenges and opportunities this emerging field holds. With this, we aim to assist the community with the implementation of an AutoML platform for medical notes, which if realized can revolutionize patient outcomes.
|
35
|
Integrating Speculation Detection and Deep Learning to Extract Lung Cancer Diagnosis from Clinical Notes. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11020865] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/24/2022]
Abstract
Despite efforts to develop models for extracting medical concepts from clinical notes, challenges remain, in particular in relating concepts to dates. The high number of clinical notes written for each patient, together with the use of negation, speculation, and different date formats, causes ambiguity that has to be resolved to reconstruct the patient’s natural history. In this paper, we concentrate on extracting from clinical narratives the cancer diagnosis and relating it to the diagnosis date. To address this challenge, a hybrid approach that combines deep learning-based and rule-based methods is proposed. The approach integrates three steps: (i) lung cancer named entity recognition, (ii) negation and speculation detection, and (iii) relating the cancer diagnosis to a valid date. In particular, we apply the proposed approach to extract the lung cancer diagnosis and its diagnosis date from clinical narratives written in Spanish. Results show an F-score of 90% in the named entity recognition task and an F-score of 89% in the task of relating the cancer diagnosis to the diagnosis date. Our findings suggest that speculation detection, together with negation detection, is a key component in properly extracting cancer diagnoses from clinical notes.
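Step (ii) above, negation and speculation detection, is often approached with cue phrases that precede the concept mention. The sketch below illustrates that general idea only; it is not the method of the cited paper, which works on Spanish narratives. The cue lists, the function name `classify_mention`, and the window size are all hypothetical.

```python
import re

# Hypothetical English cue phrases; a real system would need curated lists.
NEGATION_CUES = ("no evidence of", "negative for", "ruled out", "without")
SPECULATION_CUES = ("possible", "suspected", "cannot exclude", "suggestive of")


def classify_mention(sentence, concept, window=6):
    """Label a concept mention as 'negated', 'speculated', or 'affirmed'.

    Returns None when the concept does not occur in the sentence.
    Only cues among the `window` words before the mention are considered.
    """
    text = sentence.lower()
    position = text.find(concept.lower())
    if position == -1:
        return None
    words_before = re.findall(r"[a-z]+", text[:position])[-window:]
    context = " ".join(words_before)
    if any(cue in context for cue in NEGATION_CUES):
        return "negated"
    if any(cue in context for cue in SPECULATION_CUES):
        return "speculated"
    return "affirmed"


print(classify_mention("No evidence of lung cancer on CT.", "lung cancer"))
print(classify_mention("Suspected lung cancer in the right lobe.", "lung cancer"))
print(classify_mention("Biopsy confirmed lung cancer.", "lung cancer"))
```

This sketch ignores cues that follow the mention (for example "lung cancer was ruled out"), so it would label such sentences as affirmed; handling them requires a trailing window or a learned model.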
|
36
|
Karystianis G, Simpson A, Adily A, Schofield P, Greenberg D, Wand H, Nenadic G, Butler T. Prevalence of Mental Illnesses in Domestic Violence Police Records: Text Mining Study. J Med Internet Res 2020; 22:e23725. [PMID: 33361056 PMCID: PMC7790609 DOI: 10.2196/23725] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/24/2020] [Revised: 09/17/2020] [Accepted: 11/23/2020] [Indexed: 01/22/2023] Open
Abstract
Background The New South Wales Police Force (NSWPF) records details of significant numbers of domestic violence (DV) events they attend each year as both structured quantitative data and unstructured free text. Accessing information contained in the free text such as the victim’s and persons of interest (POI's) mental health status could be useful in the better management of DV events attended by the police and thus improve health, justice, and social outcomes. Objective The aim of this study is to present the prevalence of extracted mental illness mentions for POIs and victims in police-recorded DV events. Methods We applied a knowledge-driven text mining method to recognize mental illness mentions for victims and POIs from police-recorded DV events. Results In 416,441 police-recorded DV events with single POIs and single victims, we identified 64,587 events (15.51%) with at least one mental illness mention versus 4295 (1.03%) recorded in the structured fixed fields. Two-thirds (67,582/85,880, 78.69%) of mental illnesses were associated with POIs versus 21.30% (18,298/85,880) with victims; depression was the most common condition in both victims (2822/12,589, 22.42%) and POIs (7496/39,269, 19.01%). Mental illnesses were most common among POIs aged 0-14 years (623/1612, 38.65%) and in victims aged over 65 years (1227/22,873, 5.36%). Conclusions A wealth of mental illness information exists within police-recorded DV events that can be extracted using text mining. The results showed mood-related illnesses were the most common in both victims and POIs. Further investigation is required to determine the reliability of the mental illness mentions against sources of diagnostic information.
Affiliation(s)
- George Karystianis
  - School of Population Health, University of New South Wales, Sydney, Australia
- Armita Adily
  - School of Population Health, University of New South Wales, Sydney, Australia
- Peter Schofield
  - Neuropsychiatry Service, Hunter New England Health, Newcastle, Australia
- David Greenberg
  - School of Psychiatry, University of New South Wales, Sydney, Australia
- Handan Wand
  - Kirby Institute, University of New South Wales, Sydney, Australia
- Goran Nenadic
  - School of Computer Science, University of Manchester, Manchester, United Kingdom
- Tony Butler
  - School of Population Health, University of New South Wales, Sydney, Australia
|
37
|
Esmaeili M, Ayyoubzadeh SM, Ahmadinejad N, Ghazisaeedi M, Nahvijou A, Maghooli K. A decision support system for mammography reports interpretation. Health Inf Sci Syst 2020; 8:17. [PMID: 32257128 PMCID: PMC7113352 DOI: 10.1007/s13755-020-00109-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/01/2019] [Accepted: 03/30/2020] [Indexed: 02/06/2023] Open
Abstract
PURPOSE Mammography plays a key role in the diagnosis of breast cancer; however, decision-making based on mammography reports is still challenging. This paper aims to address the challenges regarding decision-making based on mammography reports and proposes a Clinical Decision Support System (CDSS) using data mining methods to help clinicians interpret mammography reports. METHODS For this purpose, 2441 mammography reports were collected from Imam Khomeini Hospital from March 21, 2018, to March 20, 2019. In the first step, these mammography reports are analyzed and program code is developed to transform the reports into a dataset. Then, the weight of every feature of the dataset is calculated. Random Forest, Naïve Bayes, K-nearest neighbor (K-NN), and Deep Learning classifiers are applied to the dataset to build a model capable of predicting the need for referral to biopsy. Afterward, the models are evaluated using cross-validation, measuring Area Under Curve (AUC), accuracy, sensitivity, and specificity indices. RESULTS The mammography type (diagnostic or screening), mass and calcification features mentioned in the reports are the most important features for decision-making. Results reveal that the K-NN model is the most accurate and specific classifier with the accuracy and specificity values of 84.06% and 84.72% respectively. The Random Forest classifier has the best sensitivity and AUC with the sensitivity and AUC values of 87.74% and 0.905 respectively. CONCLUSIONS Accordingly, data mining approaches prove to be a helpful tool for making the final decision as to whether patients should be referred to biopsy or not based on mammography reports. The developed CDSS may also be helpful especially for less experienced radiologists.
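The accuracy, sensitivity, and specificity indices named above all derive from the four cells of a binary confusion matrix. A minimal sketch follows, assuming labels coded as 1 (refer to biopsy) and 0 (no referral); the function name `binary_metrics` and the example labels are hypothetical and unrelated to the study data.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # referred, correctly flagged
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # not referred, correctly cleared
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # flagged unnecessarily
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # missed referral
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical ground truth and classifier output for eight reports.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity and specificity can diverge from accuracy when the classes are imbalanced, which is why the abstract reports different best models for different indices.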
Affiliation(s)
- Marzieh Esmaeili
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, 3rd Floor, No #17, Farredanesh Alley, Ghods St, Enghelab Ave, Tehran, Iran
- Scientific Research Center, Tehran University of Medical Sciences, Tehran, Iran
| | - Seyed Mohammad Ayyoubzadeh
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, 3rd Floor, No #17, Farredanesh Alley, Ghods St, Enghelab Ave, Tehran, Iran
- Scientific Research Center, Tehran University of Medical Sciences, Tehran, Iran
| | - Nasrin Ahmadinejad
- Medical Imaging Cancer, Imam Khomeini Hospital, Cancer Research Institute, Tehran, Iran
- Advanced Diagnostic and Interventional Radiology Research Cancer (ADIR), Tehran University of Medical Sciences, Tehran, Iran
| | - Marjan Ghazisaeedi
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, 3rd Floor, No #17, Farredanesh Alley, Ghods St, Enghelab Ave, Tehran, Iran
| | - Azin Nahvijou
- Cancer Research Center, Cancer Institute of Iran, Tehran University of Medical Sciences, Tehran, Iran
| | - Keivan Maghooli
- Department of Biomedical Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
|
38
|
Spasic I, Button K. Patient Triage by Topic Modeling of Referral Letters: Feasibility Study. JMIR Med Inform 2020; 8:e21252. [PMID: 33155985 PMCID: PMC7679210 DOI: 10.2196/21252] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/15/2020] [Revised: 09/17/2020] [Accepted: 10/05/2020] [Indexed: 01/22/2023] Open
Abstract
Background Musculoskeletal conditions are managed within primary care, but patients can be referred to secondary care if a specialist opinion is required. The ever-increasing demand for health care resources emphasizes the need to streamline care pathways with the ultimate aim of ensuring that patients receive timely and optimal care. Information contained in referral letters underpins the referral decision-making process but is yet to be explored systematically for the purposes of treatment prioritization for musculoskeletal conditions. Objective This study aims to explore the feasibility of using natural language processing and machine learning to automate the triage of patients with musculoskeletal conditions by analyzing information from referral letters. Specifically, we aim to determine whether referral letters can be automatically assorted into latent topics that are clinically relevant, that is, considered relevant when prescribing treatments. Here, clinical relevance is assessed by posing 2 research questions. Can latent topics be used to automatically predict treatment? Can clinicians interpret latent topics as cohorts of patients who share common characteristics or experiences such as medical history, demographics, and possible treatments? Methods We used latent Dirichlet allocation to model each referral letter as a finite mixture over an underlying set of topics and model each topic as an infinite mixture over an underlying set of topic probabilities. The topic model was evaluated in the context of automating patient triage. Given a set of treatment outcomes, a binary classifier was trained for each outcome using previously extracted topics as the input features of the machine learning algorithm. In addition, a qualitative evaluation was performed to assess the human interpretability of topics. 
Results The prediction accuracy of binary classifiers outperformed the stratified random classifier by a large margin, indicating that topic modeling could be used to predict the treatment, thus effectively supporting patient triage. The qualitative evaluation confirmed the high clinical interpretability of the topic model. Conclusions The results established the feasibility of using natural language processing and machine learning to automate triage of patients with knee or hip pain by analyzing information from their referral letters.
Affiliation(s)
- Irena Spasic
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
| | - Kate Button
- School of Healthcare Sciences, Cardiff University, Cardiff, United Kingdom
|
39
|
Quiroz JC, Laranjo L, Tufanaru C, Kocaballi AB, Rezazadegan D, Berkovsky S, Coiera E. Empirical analysis of Zipf's law, power law, and lognormal distributions in medical discharge reports. Int J Med Inform 2020; 145:104324. [PMID: 33181446 DOI: 10.1016/j.ijmedinf.2020.104324] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/14/2020] [Revised: 10/04/2020] [Accepted: 10/29/2020] [Indexed: 11/15/2022]
Abstract
BACKGROUND Bayesian modelling and statistical text analysis rely on informed probability priors to encourage good solutions. OBJECTIVE This paper empirically analyses whether text in medical discharge reports follows Zipf's law, a commonly assumed statistical property of language where word frequency follows a discrete power-law distribution. METHOD We examined 20,000 medical discharge reports from the MIMIC-III dataset. Methods included splitting the discharge reports into tokens, counting token frequency, fitting power-law distributions to the data, and testing whether alternative distributions (lognormal, exponential, stretched exponential, and truncated power-law) provided superior fits to the data. RESULT Discharge reports are best fit by the truncated power-law and lognormal distributions. Discharge reports appear to be near-Zipfian, with the truncated power-law providing a superior fit over a pure power-law. CONCLUSION Our findings suggest that Bayesian modelling and statistical text analysis of discharge report text would benefit from using truncated power-law and lognormal probability priors and non-parametric models that capture power-law behavior.
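The first two steps named in the abstract, tokenising and counting token frequency, lead to a rank-frequency table; under Zipf's law, log frequency falls roughly linearly with log rank with a slope near -1. The sketch below illustrates only that rough diagnostic. It uses an ordinary least-squares line on the log-log values, which is an assumption of this example and not the distribution-fitting procedure of the study; the function names and the regular-expression tokeniser are likewise hypothetical.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return token frequencies sorted from most to least frequent."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())  # crude tokeniser (assumption)
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    A value near -1 is what a pure Zipfian distribution would give.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic frequencies that follow f(r) = 1000 / r exactly.
ideal = [1000 / rank for rank in range(1, 51)]
slope = loglog_slope(ideal)
```

A straight-line fit of this kind cannot distinguish a pure power law from a truncated power law or a lognormal; comparing those alternatives requires fitting each distribution and comparing the fits, as the abstract describes.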
Affiliation(s)
- Juan C Quiroz
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia; Centre for Big Data Research in Health, UNSW, Sydney, Australia.
| | - Liliana Laranjo
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia; Westmead Applied Research Centre, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
| | - Catalin Tufanaru
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
| | - Ahmet Baki Kocaballi
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia; Faculty of Engineering and IT, University of Technology Sydney, Australia
| | - Dana Rezazadegan
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia; Swinburne University of Technology, Department of Computer Science and Software Engineering, Melbourne, Australia
| | - Shlomo Berkovsky
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
| | - Enrico Coiera
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
|
40
|
Deshmukh PR, Phalnikar R. Anatomic stage extraction from medical reports of breast Cancer patients using natural language processing. HEALTH AND TECHNOLOGY 2020. [DOI: 10.1007/s12553-020-00479-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
|
41
|
Bozkurt S, Paul R, Coquet J, Sun R, Banerjee I, Brooks JD, Hernandez-Boussard T. Phenotyping severity of patient-centered outcomes using clinical notes: A prostate cancer use case. Learn Health Syst 2020; 4:e10237. [PMID: 33083539 PMCID: PMC7556418 DOI: 10.1002/lrh2.10237] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/15/2020] [Revised: 06/15/2020] [Accepted: 06/23/2020] [Indexed: 01/12/2023] Open
Abstract
Introduction A learning health system (LHS) must improve care in ways that are meaningful to patients, integrating patient‐centered outcomes (PCOs) into core infrastructure. PCOs are common following cancer treatment, such as urinary incontinence (UI) following prostatectomy. However, PCOs are not systematically recorded because they can only be described by the patient, are subjective and captured as unstructured text in the electronic health record (EHR). Therefore, PCOs pose significant challenges for phenotyping patients. Here, we present a natural language processing (NLP) approach for phenotyping patients with UI to classify their disease into severity subtypes, which can increase opportunities to provide precision‐based therapy and promote a value‐based delivery system. Methods Patients undergoing prostate cancer treatment from 2008 to 2018 were identified at an academic medical center. Using a hybrid NLP pipeline that combines rule‐based and deep learning methodologies, we classified positive UI cases as mild, moderate, and severe by mining clinical notes. Results The rule‐based model accurately classified UI into disease severity categories (accuracy: 0.86), which outperformed the deep learning model (accuracy: 0.73). In the deep learning model, the recall rates for mild and moderate group were higher than the precision rate (0.78 and 0.79, respectively). A hybrid model that combined both methods did not improve the accuracy of the rule‐based model but did outperform the deep learning model (accuracy: 0.75). Conclusion Phenotyping patients based on indication and severity of PCOs is essential to advance a patient centered LHS. EHRs contain valuable information on PCOs and by using NLP methods, it is feasible to accurately and efficiently phenotype PCO severity. Phenotyping must extend beyond the identification of disease to provide classification of disease severity that can be used to guide treatment and inform shared decision‐making. 
Our methods demonstrate a path to a patient centered LHS that could advance precision medicine.
Affiliation(s)
- Selen Bozkurt
- Department of Medicine, Biomedical Informatics Research Stanford University Stanford California USA
| | - Rohan Paul
- Department of Biomedical Data Sciences Stanford University Stanford California USA
| | - Jean Coquet
- Department of Medicine, Biomedical Informatics Research Stanford University Stanford California USA
| | - Ran Sun
- Department of Medicine, Biomedical Informatics Research Stanford University Stanford California USA
| | - Imon Banerjee
- Department of Biomedical Data Sciences Stanford University Stanford California USA.,Department of Radiology Stanford University Stanford California USA
| | - James D Brooks
- Department of Urology Stanford University Stanford California USA
| | - Tina Hernandez-Boussard
- Department of Medicine, Biomedical Informatics Research Stanford University Stanford California USA.,Department of Biomedical Data Sciences Stanford University Stanford California USA.,Department of Surgery Stanford University Stanford California USA
|
42
|
Identification of Malignancies from Free-Text Histopathology Reports Using a Multi-Model Supervised Machine Learning Approach. INFORMATION 2020. [DOI: 10.3390/info11090455] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/29/2022] Open
Abstract
We explored various Machine Learning (ML) models to evaluate how each model performs in the task of classifying histopathology reports. We trained, optimized, and performed classification with Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), Adaptive Boosting (AB), Decision Trees (DT), Gaussian Naïve Bayes (GNB), Logistic Regression (LR), and Dummy classifier. We started with 60,083 histopathology reports, which reduced to 60,069 after pre-processing. The F1-scores for SVM, SGD, KNN, RF, DT, LR, AB, and GNB were 97%, 96%, 96%, 96%, 92%, 96%, 84%, and 88%, respectively, while the misclassification rates were 3.31%, 5.25%, 4.39%, 1.75%, 3.5%, 4.26%, 23.9%, and 19.94%, respectively. The approximate run times were 2 h, 20 min, 40 min, 8 h, 40 min, 10 min, 50 min, and 4 min, respectively. RF had the longest run time but the lowest misclassification rate on the labeled data. Our study demonstrated the possibility of applying ML techniques to the processing of free-text pathology reports in cancer registries, in support of cancer incidence reporting in a Sub-Saharan Africa setting. This is an important consideration for resource-constrained environments that wish to leverage ML techniques to reduce workloads and improve the timeliness of reporting of cancer statistics.
|
43
|
Sanders C, Nahar P, Small N, Hodgson D, Ong BN, Dehghan A, Sharp CA, Dixon WG, Lewis S, Kontopantelis E, Daker-White G, Bower P, Davies L, Kayesh H, Spencer R, McAvoy A, Boaden R, Lovell K, Ainsworth J, Nowakowska M, Shepherd A, Cahoon P, Hopkins R, Allen D, Lewis A, Nenadic G. Digital methods to enhance the usefulness of patient experience data in services for long-term conditions: the DEPEND mixed-methods study. HEALTH SERVICES AND DELIVERY RESEARCH 2020. [DOI: 10.3310/hsdr08280] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022] Open
Abstract
Background
Collecting NHS patient experience data is critical to ensure the delivery of high-quality services. Data are obtained from multiple sources, including service-specific surveys and widely used generic surveys. There are concerns about the timeliness of feedback, that some groups of patients and carers do not give feedback and that free-text feedback may be useful but is difficult to analyse.
Objective
To understand how to improve the collection and usefulness of patient experience data in services for people with long-term conditions using digital data capture and improved analysis of comments.
Design
The DEPEND study is a mixed-methods study with four parts: qualitative research to explore the perspectives of patients, carers and staff; use of computer science text-analytics methods to analyse comments; co-design of new tools to improve data collection and usefulness; and implementation and process evaluation to assess use of the tools and any impacts.
Setting
Services for people with severe mental illness and musculoskeletal conditions at four sites, chosen as exemplars to reflect both mental health and physical long-term conditions: an acute trust (site A), a mental health trust (site B) and two general practices (sites C1 and C2).
Participants
A total of 100 staff members with diverse roles in patient experience management, clinical practice and information technology; 59 patients and 21 carers participated in the qualitative research components.
Interventions
The tools comprised a digital survey completed using a tablet device (kiosk) or a pen and paper/online version; guidance and information for patients, carers and staff; text-mining programs; reporting templates; and a process for eliciting and recording verbal feedback in community mental health services.
Results
We found a lack of understanding and experience of the process of giving feedback. People wanted more meaningful and informal feedback to suit local contexts. Text mining enabled systematic analysis, although challenges remained, and qualitative analysis provided additional insights. All sites managed to collect feedback digitally; however, there was a perceived need for additional resources, and engagement varied. Observation indicated that patients were apprehensive about using kiosks but often would participate with support. The process for collecting and recording verbal feedback in mental health services made sense to participants, but was not successfully adopted, with staff workload and technical problems often highlighted as barriers. Staff thought that new methods were insightful, but observation did not reveal changes in services during the testing period.
Conclusions
The use of digital methods can produce some improvements in the collection and usefulness of feedback. Context and flexibility are important, and digital methods need to be complemented with alternative methods. Text mining can provide useful analysis for reporting on large data sets within large organisations, but qualitative analysis may be more useful for small data sets and in small organisations.
Limitations
New practices need time and support to be adopted and this study had limited resources and a limited testing time.
Future work
Further research is needed to improve text-analysis methods for routine use in services and to evaluate the impact of methods (digital and non-digital) on service improvement in varied contexts and among diverse patients and carers.
Funding
This project was funded by the NIHR Health Services and Delivery Research programme and will be published in full in Health Services and Delivery Research; Vol. 8, No. 28. See the NIHR Journals Library website for further project information.
Affiliation(s)
- Caroline Sanders
- National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
| | - Papreen Nahar
- National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
| | - Nicola Small
- National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
| | - Damian Hodgson
- Alliance Manchester Business School, University of Manchester, Manchester, UK
| | - Bie Nio Ong
- National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
| | - Azad Dehghan
- Department of Computer Science, University of Manchester, Manchester, UK
| | - Charlotte A Sharp
- National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
| | - William G Dixon
- Centre for Epidemiology Versus Arthritis, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK
| | - Shôn Lewis
- Division of Psychology and Mental Health, University of Manchester, Manchester, UK
| | - Evangelos Kontopantelis
- National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
| | - Gavin Daker-White
- National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
| | - Peter Bower
- National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
| | - Linda Davies
- Centre for Health Economics, University of Manchester, Manchester, UK
| | - Humayun Kayesh
- Department of Computer Science, University of Manchester, Manchester, UK
| | - Rebecca Spencer
- National Institute for Health Research Collaboration for Leadership in Applied Health Research and Care Greater Manchester, Salford Royal NHS Foundation Trust, Salford, UK
| | - Aneela McAvoy
- National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
- National Institute for Health Research Collaboration for Leadership in Applied Health Research and Care Greater Manchester, Salford Royal NHS Foundation Trust, Salford, UK
| | - Ruth Boaden
- Alliance Manchester Business School, University of Manchester, Manchester, UK
- National Institute for Health Research Collaboration for Leadership in Applied Health Research and Care Greater Manchester, Salford Royal NHS Foundation Trust, Salford, UK
| | - Karina Lovell
- National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
- Division of Nursing, Midwifery and Social Work, University of Manchester, Manchester, UK
| | - John Ainsworth
- Centre for Health Informatics, University of Manchester, Manchester, UK
| | - Magdalena Nowakowska
- National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
| | - Andrew Shepherd
- National Institute for Health Research School for Primary Care Research, University of Manchester, Manchester, UK
| | - Patrick Cahoon
- Greater Manchester Mental Health NHS Foundation Trust, Manchester, UK
| | - Richard Hopkins
- Greater Manchester Mental Health NHS Foundation Trust, Manchester, UK
| | | | | | - Goran Nenadic
- Department of Computer Science, University of Manchester, Manchester, UK
|
44
|
Labrosse J, Lam T, Sebbag C, Benque M, Abdennebi I, Merckelbagh H, Osdoit M, Priour M, Guerin J, Balezeau T, Grandal B, Coussy F, Bobrie A, Ferrer L, Laas E, Feron JG, Reyal F, Hamy AS. Text Mining in Electronic Medical Records Enables Quick and Efficient Identification of Pregnancy Cases Occurring After Breast Cancer. JCO Clin Cancer Inform 2020; 3:1-12. [PMID: 31626565 DOI: 10.1200/cci.19.00031] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/18/2023] Open
Abstract
PURPOSE To apply text mining (TM) technology to electronic medical records (EMRs) of patients with breast cancer (BC) to retrieve the occurrence of a pregnancy after BC diagnosis and compare its performance with manual curation. MATERIALS AND METHODS The training cohort (Cohort A) comprised 344 patients with BC aged ≤ 40 years treated at Institut Curie between 2005 and 2007. Manual curation consisted of manually reviewing each EMR to retrieve pregnancies. TM consisted of first applying a keyword filter ("accouch*" or "enceinte," French terms for "deliver*" and "pregnant," respectively) to select a subset of EMRs, and, second, manually checking the selected EMRs to confirm the pregnancy. Then, we applied our TM algorithm to an independent cohort of patients with BC treated between 2008 and 2012 (Cohort B). RESULTS In Cohort A, 36 pregnancies were identified among 344 patients (10.5%; 2,829 person-years of EMR). Thirty were identified by manual review versus 35 by TM. TM resulted in a lower percentage of manual checking (26.7% v 100%, respectively) and substantial time gains (time to identify a pregnancy: 13 minutes for TM v 244 minutes for manual curation, respectively). Presence of either of the two TM filters showed excellent sensitivity (97%) and negative predictive value (100%). In Cohort B, 67 pregnancies were identified among 1,226 patients (5.5%; 7,349 person-years of EMR). Similarly, for Cohort B, TM spared 904 (73.7%) EMRs from manual review and quickly generated a cohort of 67 pregnancies after BC. Incidence rate of pregnancy after BC was 0.01 pregnancy per person-year of EMR in both cohorts. CONCLUSION TM is highly efficient at quickly identifying rare events and is a promising tool to improve the speed, efficiency, and cost of medical research.
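The two-step procedure described here (a keyword filter followed by manual confirmation) can be illustrated with a short regular-expression filter. This is a sketch under stated assumptions, not the authors' implementation: the pattern reads "accouch*" as a prefix wildcard and "enceinte" as an exact word, and the record identifiers and French snippets are invented for the example.

```python
import re

# "accouch*" is interpreted as a prefix match, "enceinte" as a whole word.
PREGNANCY_PATTERN = re.compile(r"\b(?:accouch\w*|enceinte)\b", re.IGNORECASE)


def flag_for_review(records):
    """Return the sorted ids of records whose text matches the keyword filter.

    `records` maps a (hypothetical) record id to its free text. Flagged
    records would still need manual checking to confirm a pregnancy.
    """
    return sorted(
        record_id
        for record_id, text in records.items()
        if PREGNANCY_PATTERN.search(text)
    )


# Invented example records, not taken from any real EMR.
records = {
    "p1": "Patiente enceinte de 12 semaines.",
    "p2": "Accouchement par voie basse en 2010.",
    "p3": "Suivi annuel, mammographie normale.",
}
flagged = flag_for_review(records)
share_spared = 1 - len(flagged) / len(records)  # fraction not sent to manual review
```

The design trades a small risk of missed cases for a large reduction in manual work: only flagged records are read by a person, which is how the study spared most EMRs from review.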
Affiliation(s)
| | - Thanh Lam
- Geneva University Hospitals, Geneva, Switzerland
| | | | | | | | | | | | | | | | | | | | | | | | - Loïc Ferrer
- Institut Curie, U900, Hôpital René Huguenin, Saint-Cloud, France
| | | | | | - Fabien Reyal
- Paris 5 Research University, INSERM U932, Institut Curie, Paris, France
| | - Anne-Sophie Hamy
- Paris 5 Research University, INSERM U932, Institut Curie, Paris, France
|
45
|
Krsnik I, Glavaš G, Krsnik M, Miletić D, Štajduhar I. Automatic Annotation of Narrative Radiology Reports. Diagnostics (Basel) 2020; 10:E196. [PMID: 32244833 PMCID: PMC7235892 DOI: 10.3390/diagnostics10040196] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/26/2020] [Revised: 03/27/2020] [Accepted: 03/27/2020] [Indexed: 12/04/2022] Open
Abstract
Narrative texts in electronic health records can be efficiently utilized for building decision support systems in the clinic, only if they are correctly interpreted automatically in accordance with a specified standard. This paper tackles the problem of developing an automated method of labeling free-form radiology reports, as a precursor for building query-capable report databases in hospitals. The analyzed dataset consists of 1295 radiology reports concerning the condition of a knee, retrospectively gathered at the Clinical Hospital Centre Rijeka, Croatia. Reports were manually labeled with one or more labels from a set of 10 most commonly occurring clinical conditions. After primary preprocessing of the texts, two sets of text classification methods were compared: (1) traditional classification models, namely Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), and Random Forests (RF), coupled with Bag-of-Words (BoW) features (i.e., symbolic text representation) and (2) Convolutional Neural Network (CNN) coupled with dense word vectors (i.e., word embeddings as a semantic text representation) as input features. We resorted to nested 10-fold cross-validation to evaluate the performance of competing methods using accuracy, precision, recall, and F1 score. The CNN with semantic word representations as input yielded the overall best performance, having a micro-averaged F1 score of 86.7%. The CNN classifier yielded particularly encouraging results for the most represented conditions: degenerative disease (95.9%), arthrosis (93.3%), and injury (89.2%). As a data-hungry deep learning model, the CNN, however, performed notably worse than the competing models on underrepresented classes with fewer training instances such as multicausal disease or metabolic disease. LR, RF, and SVM performed comparably well, with the obtained micro-averaged F1 scores of 84.6%, 82.2%, and 82.1%, respectively.
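Because each report can carry more than one label, the micro-averaged F1 score quoted above pools true positives, false positives, and false negatives over all reports and labels before computing a single score. The sketch below shows that calculation under assumptions: the function name `micro_f1` and the toy label sets are invented, with label names borrowed from the conditions mentioned in the abstract.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 for multi-label predictions.

    Both arguments are sequences of label collections, one per report.
    Counts are pooled across all reports before the score is computed.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # labels correctly assigned
        fp += len(pred - truth)   # labels assigned but not in the reference
        fn += len(truth - pred)   # reference labels that were missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Toy annotations, invented for illustration only.
truth = [{"injury", "arthrosis"}, {"degenerative disease"}, {"injury"}]
predicted = [{"injury"}, {"degenerative disease", "arthrosis"}, {"injury"}]
score = micro_f1(truth, predicted)
```

Micro-averaging weights every label decision equally, so frequent conditions dominate the score; this is consistent with the abstract's observation that weak performance on underrepresented classes can coexist with a high overall figure.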
Affiliation(s)
- Ivan Krsnik
- Department of Computer Engineering, Faculty of Engineering, University of Rijeka, Vukovarska 58, 51000 Rijeka, Croatia;
| | - Goran Glavaš
- School of Business Informatics and Mathematics, University of Mannheim, 68159 Mannheim, Germany;
| | - Marina Krsnik
- Faculty of Veterinary Medicine, University of Zagreb, Heinzelova 55, 10000 Zagreb, Croatia;
| | - Damir Miletić
- Clinical Hospital Centre Rijeka, University of Rijeka, Krešimirova 42, 51000 Rijeka, Croatia;
| | - Ivan Štajduhar
- Department of Computer Engineering, Faculty of Engineering, University of Rijeka, Vukovarska 58, 51000 Rijeka, Croatia;
- Center for Artificial Intelligence and Cybersecurity, University of Rijeka, Radmile Matejčić 2, 51000 Rijeka, Croatia
|
46
|
Spasic I, Nenadic G. Clinical Text Data in Machine Learning: Systematic Review. JMIR Med Inform 2020; 8:e17984. [PMID: 32229465 PMCID: PMC7157505 DOI: 10.2196/17984] [Citation(s) in RCA: 110] [Impact Index Per Article: 27.5] [Received: 01/28/2020] [Revised: 02/24/2020] [Accepted: 02/24/2020] [Indexed: 12/22/2022] Open
Abstract
Background Clinical narratives represent the main form of communication within health care, providing a personalized account of patient history and assessments, and offering rich information for clinical decision making. Natural language processing (NLP) has repeatedly demonstrated its feasibility to unlock evidence buried in clinical narratives. Machine learning can facilitate rapid development of NLP tools by leveraging large amounts of text data. Objective The main aim of this study was to provide systematic evidence on the properties of text data used to train machine learning approaches to clinical NLP. We also investigated the types of NLP tasks that have been supported by machine learning and how they can be applied in clinical practice. Methods Our methodology was based on the guidelines for performing systematic reviews. In August 2018, we used PubMed, a multifaceted interface, to perform a literature search against MEDLINE. We identified 110 relevant studies and extracted information about text data used to support machine learning, NLP tasks supported, and their clinical applications. The data properties considered included their size, provenance, collection methods, annotation, and any relevant statistics. Results The majority of datasets used to train machine learning models included only hundreds or thousands of documents. Only 10 studies used tens of thousands of documents, with a handful of studies utilizing more. Relatively small datasets were utilized for training even when much larger datasets were available. The main reason for such poor data utilization is the annotation bottleneck faced by supervised machine learning algorithms. Active learning was explored to iteratively sample a subset of data for manual annotation as a strategy for minimizing the annotation effort while maximizing the predictive performance of the model. 
Supervised learning was successfully used where clinical codes integrated with free-text notes into electronic health records were utilized as class labels. Similarly, distant supervision was used to utilize an existing knowledge base to automatically annotate raw text. Where manual annotation was unavoidable, crowdsourcing was explored, but it remains unsuitable because of the sensitive nature of data considered. Besides the small volume, training data were typically sourced from a small number of institutions, thus offering no hard evidence about the transferability of machine learning models. The majority of studies focused on text classification. Most commonly, the classification results were used to support phenotyping, prognosis, care improvement, resource management, and surveillance. Conclusions We identified the data annotation bottleneck as one of the key obstacles to machine learning approaches in clinical NLP. Active learning and distant supervision were explored as a way of saving the annotation efforts. Future research in this field would benefit from alternatives such as data augmentation and transfer learning, or unsupervised learning, which do not require data annotation.
Affiliation(s)
- Irena Spasic
- School of Computer Science and Informatics, Cardiff University, Cardiff, United Kingdom
| | - Goran Nenadic
- Department of Computer Science, University of Manchester, Manchester, United Kingdom
|
47
|
DES-ROD: Exploring Literature to Develop New Links between RNA Oxidation and Human Diseases. OXIDATIVE MEDICINE AND CELLULAR LONGEVITY 2020; 2020:5904315. [PMID: 32308806 PMCID: PMC7142358 DOI: 10.1155/2020/5904315] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/09/2020] [Accepted: 02/21/2020] [Indexed: 12/27/2022]
Abstract
Normal cellular physiology and biochemical processes require undamaged RNA molecules. However, RNAs are frequently subjected to oxidative damage. Overproduction of reactive oxygen species (ROS) leads to RNA oxidation and disturbs redox (oxidation-reduction reaction) homeostasis. When oxidation damage affects RNA carrying protein-coding information, this may result in the synthesis of aberrant proteins as well as a lower efficiency of translation. Both of these, as well as imbalanced redox homeostasis, may lead to numerous human diseases. The number of studies on the effects of RNA oxidative damage in mammals is increasing year by year, owing to the understanding that this oxidation fundamentally leads to numerous human diseases. To enable researchers in this field to explore information relevant to RNA oxidation and its effects on human diseases, we developed DES-ROD, an online knowledgebase that contains processed information from 298,603 relevant documents that consist of PubMed abstracts and PubMed Central full-text articles. The system utilizes concepts/terms from 38 curated thematic dictionaries mapped to the analyzed documents. Researchers can explore enriched concepts, as well as enriched pairs of putatively associated concepts. In this way, one can explore mutual relationships between any combination of two concepts from the dictionaries used. Dictionaries cover a wide range of biomedical topics, such as human genes and proteins, pathways, Gene Ontology categories, mutations, noncoding RNAs, enzymes, toxins, metabolites, and diseases. This makes it possible to gain insights into different facets of the effects of RNA oxidation and the control of this process. The usefulness of the DES-ROD system is demonstrated by case studies on some known information, as well as potentially novel information involving RNA oxidation and diseases. DES-ROD is the first knowledgebase based on text and data mining that focuses on the exploration of RNA oxidation and human diseases.
|
48
|
Hsiao YW, Lu TP. Text-mining in cancer research may help identify effective treatments. Transl Lung Cancer Res 2020; 8:S460-S463. [PMID: 32038938 DOI: 10.21037/tlcr.2019.12.20] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/22/2023]
Affiliation(s)
- Yi-Wen Hsiao
- Institute of Epidemiology and Preventive Medicine, Department of Public Health, College of Public Health, National Taiwan University, Taipei
- Tzu-Pin Lu
- Institute of Epidemiology and Preventive Medicine, Department of Public Health, College of Public Health, National Taiwan University, Taipei
|
49
|
Karystianis G, Florez-Vargas O, Butler T, Nenadic G. A rule-based approach to identify patient eligibility criteria for clinical trials from narrative longitudinal records. JAMIA Open 2020; 2:521-527. [PMID: 32025649 PMCID: PMC6993990 DOI: 10.1093/jamiaopen/ooz041] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/28/2019] [Revised: 07/16/2019] [Accepted: 10/03/2019] [Indexed: 11/13/2022]
Abstract
Objective: Achieving unbiased recognition of eligible patients for clinical trials from their narrative longitudinal clinical records can be time consuming. We describe and evaluate a knowledge-driven method that identifies whether a patient meets a selected set of 13 clinical trial eligibility criteria from their longitudinal clinical records, which was one of the tasks of the 2018 National NLP Clinical Challenges. Materials and Methods: The approach uses rules combined with manually crafted dictionaries that characterize the domain. The rules are based on common syntactical patterns observed in text that explicitly indicate or describe a criterion. Certain criteria were classified as "met" only when they occurred within a designated time period prior to the most recent narrative of a patient record, and were handled through their position in the text. Results: The system was applied to an evaluation set of 86 unseen clinical records and achieved a microaverage F1-score of 89.1% (with a micro F1-score of 87.0% and 91.2% for the patients that met and did not meet the criteria, respectively). Most criteria returned reliable results (drug abuse, 92.5%; HbA1c, 91.3%), while a few (eg, advanced coronary artery disease, 72.0%; myocardial infarction within 6 months of the most recent narrative, 47.5%) proved challenging. Conclusion: Overall, the results are encouraging and indicate that automated text mining methods can be used to process clinical records to recognize whether a patient meets a set of clinical trial criteria, and could be leveraged to reduce the workload of humans screening patients for trials.
Affiliation(s)
- Oscar Florez-Vargas
- Laboratory of Translational Genomics, Department of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, USA
- Tony Butler
- Kirby Institute, University of New South Wales, Sydney, Australia
- Goran Nenadic
- School of Computer Science, University of Manchester, Manchester, UK
|
50
|
The evolution of Health & Place: Text mining papers published between 1995 and 2018. Health Place 2020; 61:102207. [DOI: 10.1016/j.healthplace.2019.102207] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/30/2019] [Revised: 09/13/2019] [Accepted: 09/13/2019] [Indexed: 01/26/2023]
|