1
Hicks-Courant K, Ko EM, Matsuo K, Melamed A, Nasioudis D, Rauh-Hain JA, Uppal S, Wright JD, Ramirez PT. Secondary databases in gynecologic cancer research. Int J Gynecol Cancer 2024; 34:1619-1629. [PMID: 39043573 DOI: 10.1136/ijgc-2024-005677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/25/2024] Open
Abstract
Observational and cohort studies using large databases have made important contributions to gynecologic oncology. Knowledge of the advantages and potential limitations of commonly used databases benefits both readers and reviewers. In this review, researchers familiar with National Cancer Database (NCDB), Surveillance, Epidemiology, and End Results Program (SEER), SEER-Medicare, MarketScan, Healthcare Cost and Utilization Project (HCUP), National Surgical Quality Improvement Program (NSQIP), and Premier, describe each database, its included data, access, management, storage, highlights, and limitations. A better understanding of these commonly used datasets can help readers, reviewers, and researchers to more effectively interpret and apply study results, evaluate new research studies, and develop compelling and practice-changing research.
Affiliation(s)
- Katherine Hicks-Courant
  - Ann B. Barshinger Cancer Institute, Lancaster General Health, Lancaster, Pennsylvania, USA
  - Division of Gynecologic Oncology, University of Pennsylvania Health System, Philadelphia, Pennsylvania, USA
- Emily Meichun Ko
  - Division of Gynecologic Oncology, University of Pennsylvania Health System, Philadelphia, Pennsylvania, USA
  - Leonard Davis Institute of Health Economics, Philadelphia, Pennsylvania, USA
  - Penn Center for Cancer Care Innovation, Abramson Cancer Center, University of Pennsylvania, Philadelphia, PA, USA
- Koji Matsuo
  - Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, University of Southern California, Los Angeles, California, USA
  - USC Norris Comprehensive Cancer Center, Los Angeles, California, USA
- Alexander Melamed
  - Vincent Department of Obstetrics and Gynecology, Massachusetts General Hospital, Boston, Massachusetts, USA
  - Department of Obstetrics Gynecology and Reproductive Biology, Harvard Medical School, Boston, Massachusetts, USA
- Dimitrios Nasioudis
  - Division of Gynecologic Oncology, University of Pennsylvania Health System, Philadelphia, Pennsylvania, USA
- Jose Alejandro Rauh-Hain
  - Department of Gynecologic Oncology and Reproductive Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Shitanshu Uppal
  - Department of Obstetrics and Gynecology, University of Michigan, Ann Arbor, Michigan, USA
- Jason D Wright
  - Department of Obstetrics and Gynecology, Columbia University, New York, New York, USA
- Pedro T Ramirez
  - Department of Obstetrics and Gynecology, Houston Methodist Hospital, Houston, Texas, USA
2
Hamdoune M, Jounaidi K, Ammari N, Gantare A. Digital health for cancer symptom management in palliative medicine: systematic review. BMJ Support Palliat Care 2024:spcare-2024-005107. [PMID: 39317426 DOI: 10.1136/spcare-2024-005107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Accepted: 09/08/2024] [Indexed: 09/26/2024]
Abstract
BACKGROUND Digital health technologies (DHTs) play a crucial role in symptom management, particularly in palliative care, by providing patients with accessible tools to monitor and manage their symptoms effectively. The aim of this systematic review was to examine and synthesise the scientific literature on DHTs for symptom management in palliative oncology care. METHODS A systematic review was conducted from 2 June to 20 June 2024 in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Databases including Scopus, Web of Science, ScienceDirect, PubMed and the Cochrane Library were searched. Data were extracted using a standardised form based on the PICOTT (Population, Intervention, Comparison, Outcome, Type and Technology) framework. The quality of the included studies was assessed using the Appraisal of Guidelines for Research & Evaluation (AGREE) II tool during the selection process. RESULTS The systematic review included seven articles describing six DHTs from five countries: the UK, Kenya, Tanzania, the Netherlands and the USA. The findings of this comprehensive literature review elucidate four principal themes: the specific types of DHTs used for symptom management in palliative cancer care, their roles and advantages, as well as the factors that limit or promote their adoption by patients and healthcare professionals. CONCLUSION The findings of this review give valuable insights into the ongoing discourse on integrating digital health solutions into palliative care practices, highlighting their potential role in enhancing symptom management within palliative cancer care and showcasing their possible benefits while also identifying key factors influencing their adoption among patients and healthcare professionals.
Affiliation(s)
- Meryem Hamdoune
  - Hassan First University of Settat, Higher Institute of Health Sciences, Laboratory of Health Sciences and Technologies, Settat, Morocco
- Khaoula Jounaidi
  - Hassan First University of Settat, Higher Institute of Health Sciences, Laboratory of Health Sciences and Technologies, Settat, Morocco
- Nada Ammari
  - Hassan First University of Settat, Higher Institute of Health Sciences, Laboratory of Health Sciences and Technologies, Settat, Morocco
- Abdellah Gantare
  - Hassan First University of Settat, Higher Institute of Health Sciences, Laboratory of Health Sciences and Technologies, Settat, Morocco
3
Mapundu MT, Kabudula CW, Musenge E, Olago V, Celik T. Text mining of verbal autopsy narratives to extract mortality causes and most prevalent diseases using natural language processing. PLoS One 2024; 19:e0308452. [PMID: 39298425 DOI: 10.1371/journal.pone.0308452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Accepted: 07/24/2024] [Indexed: 09/21/2024] Open
Abstract
Verbal autopsy (VA) narratives play a crucial role in understanding and documenting the causes of mortality, especially in regions lacking robust medical infrastructure. In this study, we propose a comprehensive approach to extract mortality causes and identify prevalent diseases from VA narratives utilizing advanced text mining techniques, so as to better understand the underlying health issues leading to mortality. Our methodology integrates n-gram-based language processing, Latent Dirichlet Allocation (LDA), and BERTopic, offering a multi-faceted analysis to enhance the accuracy and depth of information extraction. This is a retrospective study that uses secondary data analysis. We used data from the Agincourt Health and Demographic Surveillance Site (HDSS), which had 16,338 observations collected between 1993 and 2015. Our text mining steps entailed data acquisition, pre-processing, feature extraction, topic segmentation, and knowledge discovery. The results suggest that the HDSS population may have died from mortality causes such as vomiting, chest/stomach pain, fever, coughing, loss of weight, low energy, headache. Additionally, we discovered that the most prevalent diseases entailed human immunodeficiency virus (HIV), tuberculosis (TB), diarrhoea, cancer, neurological disorders, malaria, diabetes, high blood pressure, chronic ailments (kidney, heart, lung, liver), maternal and accident related deaths. This study is relevant in that it avails valuable insights regarding mortality causes and most prevalent diseases using novel text mining approaches. These results can be integrated in the diagnosis pipeline for ease of human annotation and interpretation. As such, this will help with effective informed intervention programmes that can improve primary health care systems and chronic based delivery, thus increasing life expectancy.
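The n-gram step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the tokenizer, the `ngrams` helper, and the two example narratives are assumptions invented here for illustration, and only the Python standard library is used.

```python
import re
from collections import Counter


def ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Return word-level n-grams from lowercased text (hypothetical helper)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical narratives, invented for illustration only.
narratives = [
    "She had chest pain and fever for two weeks",
    "He complained of chest pain and loss of weight",
]

# Count bigrams across all narratives; frequent ones hint at recurring symptoms.
bigram_counts = Counter(g for text in narratives for g in ngrams(text, 2))
top_bigrams = bigram_counts.most_common(3)
```

In a setting like the one described, counts of this kind would typically feed later steps such as topic modelling rather than being interpreted on their own.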
Affiliation(s)
- Michael Tonderai Mapundu
  - Department of Epidemiology and Biostatistics, School of Public Health, University of the Witwatersrand, Johannesburg, South Africa
- Chodziwadziwa Whiteson Kabudula
  - Department of Epidemiology and Biostatistics, School of Public Health, University of the Witwatersrand, Johannesburg, South Africa
  - MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), Johannesburg, South Africa
- Eustasius Musenge
  - Department of Epidemiology and Biostatistics, School of Public Health, University of the Witwatersrand, Johannesburg, South Africa
- Victor Olago
  - National Health Laboratory Service (NHLS), National Cancer Registry, Johannesburg, South Africa
- Turgay Celik
  - Wits Institute of Data Science, University of The Witwatersrand, Johannesburg, South Africa
  - School of Electrical and Information Engineering, University of The Witwatersrand, Johannesburg, South Africa
4
Martín-Noguerol T, López-Úbeda P, Pons-Escoda A, Luna A. Natural language processing deep learning models for the differential between high-grade gliomas and metastasis: what if the key is how we report them? Eur Radiol 2024; 34:2113-2120. [PMID: 37665389 DOI: 10.1007/s00330-023-10202-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 07/10/2023] [Accepted: 07/20/2023] [Indexed: 09/05/2023]
Abstract
OBJECTIVES The differential between high-grade glioma (HGG) and metastasis remains challenging in common radiological practice. We compare different natural language processing (NLP)-based deep learning models to assist radiologists based on data contained in radiology reports. METHODS This retrospective study included 185 MRI reports between 2010 and 2022 from two different institutions. A total of 117 reports were used for the training and 21 were reserved for the validation set, while the rest were used as a test set. A comparison of the performance of different deep learning models for HGG and metastasis classification has been carried out. Specifically, Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM), a hybrid version of BiLSTM and CNN, and a radiology-specific Bidirectional Encoder Representations from Transformers (RadBERT) model were used. RESULTS For the classification of MRI reports, the CNN network provided the best results among all tested, showing a macro-avg precision of 87.32%, a sensitivity of 87.45%, and an F1 score of 87.23%. In addition, our NLP algorithm detected keywords such as tumor, temporal, and lobe to positively classify a radiological report as HGG or metastasis group. CONCLUSIONS A deep learning model based on CNN enables radiologists to discriminate between HGG and metastasis based on MRI reports with high-precision values. This approach should be considered an additional tool in diagnosing these central nervous system lesions. CLINICAL RELEVANCE STATEMENT The use of our NLP model enables radiologists to differentiate between patients with high-grade glioma and metastasis based on their MRI reports and can be used as an additional tool to the conventional image-based approach for this challenging task.
KEY POINTS
• Differential between high-grade glioma and metastasis is still challenging in common radiological practice.
• Natural language processing (NLP)-based deep learning models can assist radiologists based on data contained in radiology reports.
• We have developed and tested a natural language processing model for discriminating between high-grade glioma and metastasis based on MRI reports that show high precision for this task.
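For readers unfamiliar with the macro-averaged metrics reported above, the following sketch shows how they are usually computed for a two-class problem: precision, recall (sensitivity), and F1 are calculated per class and then averaged with equal weight. The label lists are invented for illustration and are not the study's data; `macro_scores` is a hypothetical helper. The last line derives the test-set size implied by the split the abstract describes.

```python
def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labels: "HGG" = high-grade glioma, "MET" = metastasis.
y_true = ["HGG", "HGG", "HGG", "MET", "MET", "MET"]
y_pred = ["HGG", "HGG", "MET", "MET", "MET", "HGG"]
precision, recall, f1 = macro_scores(y_true, y_pred)

# Test-set size implied by the abstract: 185 reports, 117 training, 21 validation.
test_size = 185 - 117 - 21
```

Macro averaging weights both classes equally regardless of how many reports each contains, which is one reason it is often preferred when class sizes differ.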
Affiliation(s)
- Albert Pons-Escoda
  - Radiology Department, Hospital Universitari de Bellvitge, Barcelona, Spain
- Antonio Luna
  - Radiology Department, MRI Unit, HT Medica, Carmelo Torres 2, 23007, Jaén, Spain
5
Yang E, Li MD, Raghavan S, Deng F, Lang M, Succi MD, Huang AJ, Kalpathy-Cramer J. Transformer versus traditional natural language processing: how much data is enough for automated radiology report classification? Br J Radiol 2023; 96:20220769. [PMID: 37162253 PMCID: PMC10461267 DOI: 10.1259/bjr.20220769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 04/21/2023] [Accepted: 04/26/2023] [Indexed: 05/11/2023] Open
Abstract
OBJECTIVES Current state-of-the-art natural language processing (NLP) techniques use transformer deep-learning architectures, which depend on large training datasets. We hypothesized that traditional NLP techniques may outperform transformers for smaller radiology report datasets. METHODS We compared the performance of BioBERT, a deep-learning-based transformer model pre-trained on biomedical text, and three traditional machine-learning models (gradient boosted tree, random forest, and logistic regression) on seven classification tasks given free-text radiology reports. Tasks included detection of appendicitis, diverticulitis, bowel obstruction, and enteritis/colitis on abdomen/pelvis CT reports, ischemic infarct on brain CT/MRI reports, and medial and lateral meniscus tears on knee MRI reports (7,204 total annotated reports). The performance of NLP models on held-out test sets was compared after training using the full training set, and 2.5%, 10%, 25%, 50%, and 75% random subsets of the training data. RESULTS In all tested classification tasks, BioBERT performed poorly at smaller training sample sizes compared to non-deep-learning NLP models. Specifically, BioBERT required training on approximately 1,000 reports to perform similarly to or better than non-deep-learning models. At around 1,250 to 1,500 training samples, the testing performance for all models began to plateau, where additional training data yielded minimal performance gain. CONCLUSIONS With larger sample sizes, transformer NLP models achieved superior performance in radiology report binary classification tasks. However, with smaller sizes (<1,000) and more imbalanced training data, traditional NLP techniques performed better. ADVANCES IN KNOWLEDGE Our benchmarks can help guide clinical NLP researchers in selecting machine-learning models according to their dataset characteristics.
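The learning-curve design described above (training on random subsets of increasing size, then evaluating on a fixed held-out test set) can be sketched as follows. This is an illustration under assumptions, not the authors' code: the `training_subsets` helper, the seed, and the training-set size of 2,000 are hypothetical, and the abstract does not say whether the subsets were drawn independently (as here) or nested.

```python
import random


def training_subsets(train_ids, fractions, seed=0):
    """Draw one random subset of the training IDs per fraction (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    subsets = {}
    for fraction in fractions:
        k = max(1, round(len(train_ids) * fraction))
        subsets[fraction] = rng.sample(train_ids, k)  # sampling without replacement
    return subsets


# Fractions named in the abstract, plus the full training set.
fractions = [0.025, 0.10, 0.25, 0.50, 0.75, 1.0]
train_ids = list(range(2000))  # assumed training-set size, for illustration only
subsets = training_subsets(train_ids, fractions)
sizes = {fraction: len(ids) for fraction, ids in subsets.items()}
```

Each subset would then be used to train every model under comparison, with all models scored on the same untouched test set so that differences reflect training size rather than evaluation data.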
Affiliation(s)
- Matthew D Li
  - Department of Radiology and Diagnostic Imaging, University of Alberta, Edmonton, Alberta, Canada
- Shruti Raghavan
  - Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Francis Deng
  - Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Min Lang
  - Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Marc D Succi
  - Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Ambrose J Huang
  - Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
6
Laurent G, Craynest F, Thobois M, Hajjaji N. Automatic Classification of Tumor Response From Radiology Reports With Rule-Based Natural Language Processing Integrated Into the Clinical Oncology Workflow. JCO Clin Cancer Inform 2023; 7:e2200139. [PMID: 36780606 DOI: 10.1200/cci.22.00139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023] Open
Abstract
PURPOSE Imaging reports in oncology provide critical information about the disease evolution that should be shared in a timely manner to tailor clinical decision making and care coordination for patients with advanced cancer. However, tumor response remains unstructured in free text and is underexploited. Natural language processing (NLP) methods can help bring this critical information into the electronic health record (EHR) in real time to assist health care workers. METHODS A rule-based algorithm was developed using SAS tools to automatically extract and categorize tumor response as progression or no progression. 2,970 magnetic resonance imaging, computed tomography scan, and positron emission tomography French reports were extracted from the EHR of a large comprehensive cancer center to build a 2,637-document training set and a 603-document validation set. The model was also tested on 189 imaging reports from 46 different radiology centers. A tumor dashboard was created in the EHR using the Timeline tool of the vis.js JavaScript library. RESULTS An NLP methodology was applied to create an ontology of radiographic terms defining tumor response, mapping text to five main concepts, and applying decision rules on the basis of clinical practice RECIST guidelines. The model achieved an overall accuracy of 0.88 (ranging from 0.87 to 0.94), with similar performance on both progression and no progression classification. The overall accuracy was 0.82 on reports from different radiology centers. Data were visualized and organized in a dynamic tumor response timeline. This tool was deployed successfully at our institution both retrospectively and prospectively as part of an automatic pipeline to screen reports and classify tumor response in real time for all metastatic patients. CONCLUSION Our approach provides an NLP-based framework to structure and classify tumor response from the EHR and integrate tumor response classification into the clinical oncology workflow.
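The general shape of a rule-based classifier of this kind can be shown with a small sketch. It is not the authors' system, which was built with SAS tools on French reports and an ontology mapped to five concepts; the English patterns, the `classify_response` function, and the "unclassified" fallback below are all assumptions chosen for illustration.

```python
import re

# Hypothetical term lists; a real ontology would be far larger and clinically curated.
NEGATED_PROGRESSION = re.compile(r"\bno (?:evidence of |sign of )?progression\b")
PROGRESSION = re.compile(
    r"\bprogress(?:ion|ive|ed)\b"
    r"|\bnew (?:lesions?|metastas[ei]s)\b"
    r"|\bincreas(?:e|ed|ing) in size\b"
)
NO_PROGRESSION = re.compile(
    r"\bstable\b"
    r"|\bdecreas(?:e|ed|ing) in size\b"
    r"|\b(?:partial|complete) response\b"
)


def classify_response(report: str) -> str:
    """Classify a report as 'progression', 'no progression', or 'unclassified'."""
    text = report.lower()
    # Remove negated mentions first so "no progression" does not trigger the
    # progression rule; remember whether any negation was seen.
    text, negated = NEGATED_PROGRESSION.subn(" ", text)
    if PROGRESSION.search(text):
        return "progression"  # any remaining progression cue takes priority
    if negated or NO_PROGRESSION.search(text):
        return "no progression"
    return "unclassified"
```

For example, `classify_response("Stable disease with one new lesion.")` returns "progression" because a progression cue outranks a stability cue in this sketch; whether that priority is clinically appropriate is a design decision a real system would need to validate.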
Affiliation(s)
- Gery Laurent
  - Department of Information Systems, Oscar Lambret Cancer Center, Lille, France
- Franck Craynest
  - Department of Information Systems, Oscar Lambret Cancer Center, Lille, France
- Maxime Thobois
  - Department of Information Systems, Oscar Lambret Cancer Center, Lille, France
- Nawale Hajjaji
  - Department of Medical Oncology, Oscar Lambret Cancer Center, Lille, France
  - Inserm, U1192, Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), University of Lille, Lille, France
7
Binsfeld Gonçalves L, Nesic I, Obradovic M, Stieltjes B, Weikert T, Bremerich J. Natural Language Processing and Graph Theory: Making Sense of Imaging Records in a Novel Representation Frame. JMIR Med Inform 2022; 10:e40534. [PMID: 36542426 PMCID: PMC9813822 DOI: 10.2196/40534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Revised: 09/13/2022] [Accepted: 11/30/2022] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND A concise visualization framework of related reports would increase readability and improve patient management. To this end, temporal referrals to prior comparative exams are an essential connection to previous exams in written reports. Due to unstructured narrative texts' variable structure and content, their extraction is hampered by poor computer readability. Natural language processing (NLP) permits the extraction of structured information from unstructured texts automatically and can serve as an essential input for such a novel visualization framework. OBJECTIVE This study proposes and evaluates an NLP-based algorithm capable of extracting the temporal referrals in written radiology reports, applies it to all the radiology reports generated for 10 years, introduces a graphical representation of imaging reports, and investigates its benefits for clinical and research purposes. METHODS In this single-center, university hospital, retrospective study, we developed a convolutional neural network capable of extracting the date of referrals from imaging reports. The model's performance was assessed by calculating precision, recall, and F1-score using an independent test set of 149 reports. Next, the algorithm was applied to our department's radiology reports generated from 2011 to 2021. Finally, the reports and their metadata were represented in a modulable graph. RESULTS For extracting the date of referrals, the named-entity recognition (NER) model had a high precision of 0.93, a recall of 0.95, and an F1-score of 0.94. A total of 1,684,635 reports were included in the analysis. Temporal reference was mentioned in 53.3% (656,852/1,684,635), explicitly stated as not available in 21.0% (258,386/1,684,635), and omitted in 25.7% (317,059/1,684,635) of the reports. Imaging records can be visualized in a directed and modulable graph, in which the referring links represent the connecting arrows. 
CONCLUSIONS Automatically extracting the date of referrals from unstructured radiology reports using deep learning NLP algorithms is feasible. Graphs refined the selection of distinct pathology pathways, facilitated the revelation of missing comparisons, and enabled the query of specific referring exam sequences. Further work is needed to evaluate its benefits in clinics, research, and resource planning.
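The directed-graph representation described above, in which each report points to the prior exam it cites, can be sketched with plain dictionaries. The report records, the `build_referral_graph` and `referral_chain` helpers, and the simplification that each exam date identifies exactly one report are assumptions made for illustration, not details from the study.

```python
from collections import defaultdict

# Hypothetical records: (report_id, exam_date, referenced_prior_date or None).
reports = [
    ("R1", "2019-03-01", None),
    ("R2", "2019-09-15", "2019-03-01"),
    ("R3", "2020-02-10", "2019-09-15"),
    ("R4", "2020-08-20", "2019-03-01"),
]


def build_referral_graph(records):
    """Map each report ID to the list of prior report IDs it refers to."""
    by_date = {exam_date: report_id for report_id, exam_date, _ in records}
    edges = defaultdict(list)
    for report_id, _, referenced in records:
        prior = by_date.get(referenced)  # None when no or unknown reference
        if prior is not None:
            edges[report_id].append(prior)
    return dict(edges)


def referral_chain(graph, start):
    """Follow the first referral link backwards until no prior exam is cited."""
    chain = [start]
    while graph.get(chain[-1]) and graph[chain[-1]][0] not in chain:
        chain.append(graph[chain[-1]][0])
    return chain


graph = build_referral_graph(reports)
chain_from_r3 = referral_chain(graph, "R3")
```

Reports absent from the graph's keys are those with no extracted reference, which is the kind of gap the abstract describes as a missing comparison.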
Affiliation(s)
- Laurent Binsfeld Gonçalves
  - Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Ivan Nesic
  - Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Marko Obradovic
  - Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Bram Stieltjes
  - Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Thomas Weikert
  - Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Jens Bremerich
  - Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
8
Nandish S, R J P, N M N. Natural Language Processing Approaches for Automated Multilevel and Multiclass Classification of Breast Lesions on Free-Text Cytopathology Reports. JCO Clin Cancer Inform 2022; 6:e2200036. [PMID: 36103641 DOI: 10.1200/cci.22.00036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
PURPOSE The extensive growth and use of electronic health records (EHRs) and extending medical literature have led to huge opportunities to automate the extraction of relevant clinical information that helps in concise and effective clinical decision support. However, processing such information has traditionally been dependent on labor-intensive processes with human errors such as fatigue, oversight, and interobserver variability. Hence, this study aims at the processing of EHRs and performing multilevel and multiclass classification by fetching dominant characteristic features that are sufficient to detect and differentiate various types of breast lesions. PATIENTS AND METHODS In this study, unstructured EHRs on breast lesions obtained through fine-needle aspiration cytology technique are considered. The raw text was normalized into structured tabular form and converted to scores by performing sentiment analysis that helps to decide the total polarity or class label of the EHR. Supervised machine learning approaches, namely random forest and feed-forward neural network trained using Levenberg-Marquardt training function, are used for classification of the collected EHR data set containing 2,879 records that are split in the ratio of 80:20 as training and testing data sets, respectively. RESULTS Random forest and feed-forward neural network classifiers gave the best performance with an accuracy of 99.36%, an overall receiver operating characteristic-area under the curve of 99.2%, a correlation with ground truth of 98.3%, and a histopathologic correlation of 98.6%. CONCLUSION Natural language processing has huge potential to automate the extraction of clinical features from breast lesions. The proposed multilevel and multiclass classification approach is used to classify 13 different types of breast lesions with 20 different labels into five classes to decide the type of treatment that should be given to patients by a physician or oncologist.
Affiliation(s)
- Sonali Nandish
  - Department of Computer Science and Engineering, JSS Science and Technology University, Mysuru, Karnataka, India
- Prathibha R J
  - Department of Information Science and Engineering, JSS Science and Technology University, Mysuru, Karnataka, India
- Nandini N M
  - Department of Pathology, JSS Academy of Higher Education and Research, Mysuru, Karnataka, India
9
Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz IB, Wang N, Yang P, Xu H, Warner JL, Liu H. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. JCO Clin Cancer Inform 2022; 6:e2200006. [PMID: 35917480 PMCID: PMC9470142 DOI: 10.1200/cci.22.00006] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 01/19/2022] [Revised: 03/18/2022] [Accepted: 06/15/2022] [Indexed: 11/20/2022] Open
Abstract
PURPOSE The advancement of natural language processing (NLP) has promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and to facilitate patient care. In this review, we aim to assess EHRs for cancer research and patient care by using the Minimal Common Oncology Data Elements (mCODE), which is a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we aim to assess the alignment of NLP-extracted data elements with mCODE and review existing NLP methodologies for extracting said data elements. METHODS Main literature databases were searched to retrieve cancer-related NLP articles that were written in English and published between January 2010 and September 2020. After the retrieval, articles with EHRs as the data source were manually identified. A charting form was developed for relevant study analysis and used to categorize data under four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards. RESULTS A total of 123 publications were finally selected and included in our analysis. We found that cancer research and patient care require some data elements beyond mCODE, as expected. Transparency and reproducibility are not sufficient in NLP methods, and inconsistency in NLP evaluation exists. CONCLUSION We conducted a comprehensive review of cancer NLP for research and patient care using EHR data. Issues and barriers for wide adoption of cancer NLP were identified and discussed.
Affiliation(s)
- Liwei Wang
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sunyang Fu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Andrew Wen
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Xiaoyang Ruan
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Huan He
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sijia Liu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sungrim Moon
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Michelle Mai
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Irbaz B. Riaz
  - Department of Hematology/Oncology, Mayo Clinic, Scottsdale, AZ
- Nan Wang
  - Department of Computer Science and Engineering, College of Science and Engineering, University of Minnesota, Minneapolis, MN
- Ping Yang
  - Department of Quantitative Health Sciences, Mayo Clinic, Scottsdale, AZ
- Hua Xu
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX
- Jeremy L. Warner
  - Departments of Medicine (Hematology/Oncology), Vanderbilt University, Nashville, TN
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
- Hongfang Liu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
10
Batch KE, Yue J, Darcovich A, Lupton K, Liu CC, Woodlock DP, El Amine MAK, Causa-Andrieu PI, Gazit L, Nguyen GH, Zulkernine F, Do RKG, Simpson AL. Developing a Cancer Digital Twin: Supervised Metastases Detection From Consecutive Structured Radiology Reports. Front Artif Intell 2022; 5:826402. [PMID: 35310959 PMCID: PMC8924403 DOI: 10.3389/frai.2022.826402] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/30/2021] [Accepted: 01/27/2022] [Indexed: 11/13/2022] Open
Abstract
The development of digital cancer twins relies on the capture of high-resolution representations of individual cancer patients throughout the course of their treatment. Our research aims to improve the detection of metastatic disease over time from structured radiology reports by exposing prediction models to historical information. We demonstrate that natural language processing (NLP) can generate better weak labels for semi-supervised classification of computed tomography (CT) reports when it is exposed to consecutive reports through a patient's treatment history. A total of 714,454 structured radiology reports from Memorial Sloan Kettering Cancer Center adhering to a standardized departmental structured template were used for model development with a subset of the reports included for validation. To develop the models, a subset of the reports was curated for ground-truth: 7,732 total reports in the lung metastases dataset from 867 individual patients; 2,777 reports in the liver metastases dataset from 315 patients; and 4,107 reports in the adrenal metastases dataset from 404 patients. We use NLP to extract and encode important features from the structured text reports, which are then used to develop, train, and validate models. Three models—a simple convolutional neural network (CNN), a CNN augmented with an attention layer, and a recurrent neural network (RNN)—were developed to classify the type of metastatic disease and validated against the ground truth labels. The models use features from consecutive structured text radiology reports of a patient to predict the presence of metastatic disease in the reports. A single-report model, previously developed to analyze one report instead of multiple past reports, is included and the results from all four models are compared based on accuracy, precision, recall, and F1-score. The best model is used to label all 714,454 reports to generate metastases maps.
Our results suggest that NLP models can extract cancer progression patterns from multiple consecutive reports and predict the presence of metastatic disease in multiple organs with higher performance when compared with a single-report-based prediction. It demonstrates a promising automated approach to label large numbers of radiology reports without involving human experts in a time- and cost-effective manner and enables tracking of cancer progression over time.
Affiliation(s)
- Karen E. Batch
- School of Computing, Queen's University, Kingston, ON, Canada
- *Correspondence: Karen E. Batch
| | - Jianwei Yue
- School of Computing, Queen's University, Kingston, ON, Canada
| | - Alex Darcovich
- School of Computing, Queen's University, Kingston, ON, Canada
| | - Kaelan Lupton
- School of Computing, Queen's University, Kingston, ON, Canada
| | - Corinne C. Liu
- Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY, United States
| | - David P. Woodlock
- Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY, United States
| | - Mohammad Ali K. El Amine
- Department of Graduate Medical Education, Memorial Sloan Kettering Cancer Center, New York, NY, United States
| | - Pamela I. Causa-Andrieu
- Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY, United States
| | - Lior Gazit
- Department of Strategy and Innovation, Memorial Sloan Kettering Cancer Center, New York, NY, United States
| | - Gary H. Nguyen
- Department of Strategy and Innovation, Memorial Sloan Kettering Cancer Center, New York, NY, United States
| | | | - Richard K. G. Do
- Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY, United States
| | - Amber L. Simpson
- School of Computing, Queen's University, Kingston, ON, Canada
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, Canada
| |
|
11
|
Lin S, Lin Y, Wu K, Wang Y, Feng Z, Duan M, Liu S, Fan Y, Huang L, Zhou F. FeCO3, constructing the network biomarkers using the inter-feature correlation coefficients and its application in detecting high-order breast cancer biomarkers. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220124123303]
Abstract
Aims:
This study aims to formulate the inter-feature correlation as the engineered features.
Background:
Modern biotechnologies tend to generate a huge number of characteristics for each sample, while an OMIC dataset usually has only a few dozen or a few hundred samples because of the high cost of generating OMIC data. Many bio-OMIC studies have therefore assumed inter-feature independence and selected features with a high phenotype association.
Objective:
However, many features are closely associated with each other due to their physical or functional interactions, which may be utilized as a new view of features.
Method:
This study proposed a feature engineering algorithm based on the correlation coefficients (FeCO3) by utilizing the correlations between a given sample and a few reference samples. A comprehensive evaluation was carried out for the proposed FeCO3 network features using 24 bio-OMIC datasets.
Result:
The experimental data suggested that the newly calculated FeCO3 network features tended to achieve better classification performance than the original features when the same popular feature selection and classification algorithms were used. The FeCO3 network features were also consistently supported by the literature. FeCO3 was used to investigate the high-order engineered biomarkers of breast cancer, and it detected the PBX2 gene (Pre-B-Cell Leukemia Transcription Factor 2) as one of the candidate breast cancer biomarkers. Although the two methylated residues cg14851325 (P value = 8.06e-2) and cg16602460 (P value = 1.19e-1) within PBX2 did not have a statistically significant association with breast cancer, the high-order inter-feature correlations showed a significant association with breast cancer.
Conclusion:
The proposed FeCO3 network features calculated the high-order inter-feature correlations as novel features, and may facilitate the investigations of complex diseases from this new perspective. The source code is available in FigShare at 10.6084/m9.figshare.13550051 or the web site http://www.healthinformaticslab.org/supp/ .
Affiliation(s)
- Shenggeng Lin
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Yuqi Lin
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
| | - Kexin Wu
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
| | - Yueying Wang
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, Jilin Province, China
| | - Zixuan Feng
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
| | - Meiyu Duan
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
| | - Shuai Liu
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
| | - Yusi Fan
- College of Software, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
| | - Lan Huang
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
| | - Fengfeng Zhou
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
| |
|
12
|
Foundations of Machine Learning-Based Clinical Prediction Modeling: Part V-A Practical Approach to Regression Problems. ACTA NEUROCHIRURGICA. SUPPLEMENT 2021; 134:43-50. [PMID: 34862526 DOI: 10.1007/978-3-030-85292-4_6]
Abstract
This chapter goes through the steps required to train and validate a simple, machine learning-based clinical prediction model for any continuous outcome. We supply fully structured code for the readers to download and execute in parallel to this section, as well as a simulated database of 10,000 glioblastoma patients who underwent microsurgery, and predict survival from diagnosis in months. We walk the reader through each step, including import, checking, and splitting of data. In terms of pre-processing, we focus on how to practically implement imputation using a k-nearest neighbor algorithm. We also illustrate how to select features based on recursive feature elimination and how to use k-fold cross validation. We demonstrate a generalized linear model, a generalized additive model, a random forest, a ridge regressor, and a Least Absolute Shrinkage and Selection Operator (LASSO) regressor. Specifically for regression, we discuss how to evaluate root mean square error (RMSE), mean absolute error (MAE), and the R² statistic, as well as how a quantile-quantile plot can be used to assess the performance of the regressor along the spectrum of the outcome variable, similarly to calibration when dealing with binary outcomes. Finally, we explain how to arrive at a measure of variable importance using a universal, nonparametric method.
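The regression metrics named in this abstract can be illustrated with a short sketch. The Python below (chosen here for illustration) computes RMSE, MAE (taken as the mean absolute error), and the R² statistic from first principles. The function name and the four survival values are hypothetical and do not come from the chapter's simulated database.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired observed and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(r * r for r in residuals)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)                    # root mean square error
    mae = sum(abs(r) for r in residuals) / n        # mean absolute error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination
    return rmse, mae, r2


# Hypothetical observed and predicted survival times in months.
y_true = [10.0, 14.0, 18.0, 22.0]
y_pred = [12.0, 14.0, 16.0, 22.0]
rmse, mae, r2 = regression_metrics(y_true, y_pred)
print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  R2={r2:.3f}")
```

RMSE penalizes large errors more heavily than MAE because residuals are squared before averaging, so reporting both gives a rough sense of whether a few large misses dominate the error.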
|
13
|
Foundations of Machine Learning-Based Clinical Prediction Modeling: Part IV-A Practical Approach to Binary Classification Problems. ACTA NEUROCHIRURGICA. SUPPLEMENT 2021; 134:33-41. [PMID: 34862525 DOI: 10.1007/978-3-030-85292-4_5]
Abstract
We illustrate the steps required to train and validate a simple, machine learning-based clinical prediction model for any binary outcome, such as the occurrence of a complication, in the statistical programming language R. To illustrate the methods applied, we supply a simulated database of 10,000 glioblastoma patients who underwent microsurgery, and predict the occurrence of 12-month survival. We walk the reader through each step, including import, checking, and splitting of datasets. In terms of pre-processing, we focus on how to practically implement imputation using a k-nearest neighbor algorithm, and how to perform feature selection using recursive feature elimination. When it comes to training models, we apply the theory discussed in Parts I-III. We show how to implement bootstrapping and how to evaluate and select models based on out-of-sample error. Specifically for classification, we discuss how to counteract class imbalance by using upsampling techniques. We discuss how the reporting of a minimum of accuracy, area under the curve (AUC), sensitivity, and specificity for discrimination, as well as slope and intercept for calibration (if possible alongside a calibration plot), is paramount. Finally, we explain how to arrive at a measure of variable importance using a universal, AUC-based method. We provide the full, structured code, as well as the complete glioblastoma survival database for the readers to download and execute in parallel to this section.
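The upsampling step mentioned in this abstract can be sketched briefly. The chapter itself works in R; the Python below is only an illustrative stand-in, and the function name, the `outcome` key, and the toy rows are all hypothetical. It resamples the minority class with replacement until both classes are the same size.

```python
import random


def upsample_minority(rows, label_key="outcome", seed=0):
    """Randomly resample smaller classes (with replacement) up to the largest class size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Group rows by their class label.
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)

    target = max(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw extra copies only for classes below the target size.
        balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced


# Hypothetical training rows: 8 negatives and 2 positives.
train_rows = [{"id": i, "outcome": 0} for i in range(8)]
train_rows += [{"id": 100 + i, "outcome": 1} for i in range(2)]

balanced_rows = upsample_minority(train_rows)
counts = {label: sum(1 for r in balanced_rows if r["outcome"] == label)
          for label in (0, 1)}
print(counts)
```

Upsampling is normally applied to the training split only. Duplicating minority cases before splitting would leak copies of the same patient into the evaluation data and inflate out-of-sample performance.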
|
14
|
Feghali J, Jimenez AE, Schilling AT, Azad TD. Overview of Algorithms for Natural Language Processing and Time Series Analyses. ACTA NEUROCHIRURGICA. SUPPLEMENT 2021; 134:221-242. [PMID: 34862546 DOI: 10.1007/978-3-030-85292-4_26]
Abstract
A host of machine learning algorithms have been used to perform several different tasks in NLP and TSA. Prior to implementing these algorithms, some degree of data preprocessing is required. Deep learning approaches utilizing multilayer perceptrons, recurrent neural networks (RNNs), and convolutional neural networks (CNNs) represent commonly used techniques. In supervised learning applications, all these models map inputs into a predicted output and then model the discrepancy between predicted values and the real output according to a loss function. The parameters of the mapping function are then optimized through the process of gradient descent and backward propagation in order to minimize this loss. This is the main premise behind many supervised learning algorithms. As experience with these algorithms grows, increased applications in the fields of medicine and neuroscience are anticipated.
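The supervised-learning loop described in this abstract (map inputs to a prediction, measure the discrepancy with a loss function, then adjust parameters by gradient descent) can be shown on the smallest possible case. The sketch below fits a one-variable linear model under a mean squared error loss. The model, learning rate, epoch count, and toy data are assumptions chosen for illustration, not anything specified in the chapter.

```python
def fit_linear_gd(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Forward pass: map inputs to predicted outputs.
        preds = [w * x + b for x in xs]
        errors = [p - y for p, y in zip(preds, ys)]

        # Gradients of the loss L = (1/n) * sum((w*x + b - y)^2).
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)

        # Parameter update: step against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, noise-free).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_linear_gd(xs, ys)
print(f"w={w:.4f}  b={b:.4f}")
```

In multilayer networks the same update rule applies to every weight, with backward propagation supplying the gradients layer by layer through the chain rule.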
Affiliation(s)
- James Feghali
- Department of Neurosurgery, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Adrian E Jimenez
- Department of Neurosurgery, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Andrew T Schilling
- Department of Neurosurgery, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Tej D Azad
- Department of Neurosurgery, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| |
|
15
|
Staartjes VE, Regli L, Serra C. Machine Intelligence in Clinical Neuroscience: Taming the Unchained Prometheus. ACTA NEUROCHIRURGICA. SUPPLEMENT 2021; 134:1-4. [PMID: 34862521 DOI: 10.1007/978-3-030-85292-4_1]
Abstract
The democratization of machine learning (ML) through the availability of open-source learning libraries, the availability of datasets in the "big data" era, increasing computing power even on mobile devices, and online training resources has led to an explosion in applications and publications of ML in the clinical neurosciences, but it has also enabled a dangerous amount of flawed analyses and cardinal methodological errors committed by benevolent authors. While powerful ML methods are nowadays available to almost anyone and can be applied after just a few minutes of familiarizing oneself with these methods, that does not imply that one has mastered these techniques. This textbook for clinicians aims to demystify ML by illustrating its methodological foundations, as well as some specific applications throughout clinical neuroscience, and its limitations. While our mind can recognize, abstract, and deal with the many uncertainties in clinical practice, algorithms cannot. Algorithms must remain tools of our own mind, tools that we should be able to master, control, and apply to our advantage in an adjunctive manner. Our hope is that this book inspires and instructs physician-scientists to continue to develop the seeds that have been planted for machine intelligence in clinical neuroscience, not forgetting their inherent limitations.
Affiliation(s)
- Victor E Staartjes
- Machine Intelligence in Clinical Neuroscience (MICN) Laboratory, Department of Neurosurgery, Clinical Neuroscience Center, University Hospital Zurich, University of Zurich, Zurich, Switzerland
| | - Luca Regli
- Machine Intelligence in Clinical Neuroscience (MICN) Laboratory, Department of Neurosurgery, Clinical Neuroscience Center, University Hospital Zurich, University of Zurich, Zurich, Switzerland
| | - Carlo Serra
- Machine Intelligence in Clinical Neuroscience (MICN) Laboratory, Department of Neurosurgery, Clinical Neuroscience Center, University Hospital Zurich, University of Zurich, Zurich, Switzerland
| |
|
16
|
Do RKG, Lupton K, Causa Andrieu PI, Luthra A, Taya M, Batch K, Nguyen H, Rahurkar P, Gazit L, Nicholas K, Fong CJ, Gangai N, Schultz N, Zulkernine F, Sevilimedu V, Juluru K, Simpson A, Hricak H. Patterns of Metastatic Disease in Patients with Cancer Derived from Natural Language Processing of Structured CT Radiology Reports over a 10-year Period. Radiology 2021; 301:115-122. [PMID: 34342503 DOI: 10.1148/radiol.2021210043]
Abstract
Background Patterns of metastasis in cancer are increasingly relevant to prognostication and treatment planning but have historically been documented by means of autopsy series. Purpose To show the feasibility of using natural language processing (NLP) to gather accurate data from radiology reports for assessing spatial and temporal patterns of metastatic spread in a large patient cohort. Materials and Methods In this retrospective longitudinal study, consecutive patients who underwent CT from July 2009 to April 2019 and whose CT reports followed a departmental structured template were included. Three radiologists manually curated a sample of 2219 reports for the presence or absence of metastases across 13 organs; these manually curated reports were used to develop three NLP models with an 80%-20% split for training and test sets. A separate random sample of 448 manually curated reports was used for validation. Model performance was measured by accuracy, precision, and recall for each organ. The best-performing NLP model was used to generate a final database of metastatic disease across all patients. For each cancer type, statistical descriptive reports were provided by analyzing the frequencies of metastatic disease at the report and patient levels. Results In 91 665 patients (mean age ± standard deviation, 61 years ± 15; 46 939 women), 387 359 reports were labeled. The best-performing NLP model achieved accuracies from 90% to 99% across all organs. Metastases were most frequently reported in abdominopelvic (23.6% of all reports) and thoracic (17.6%) nodes, followed by lungs (14.7%), liver (13.7%), and bones (9.9%). Metastatic disease tropism is distinct among common cancers, with the most common first site being bones in prostate and breast cancers and liver among pancreatic and colorectal cancers. Conclusion Natural language processing may be applied to cancer patients' CT reports to generate a large database of metastatic phenotypes. 
Such a database could be combined with genomic studies and used to explore prognostic imaging phenotypes with relevance to treatment planning. © RSNA, 2021 Online supplemental material is available for this article.
Affiliation(s)
- Richard K G Do
- From the Department of Radiology (R.K.G.D., P.I.C.A., M.T., N.G., K.J., H.H.), Human Pathology and Pathogenesis Program, Center for Molecular Oncology (A.L.), Department of Strategy and Innovation (H.N., P.R., L.G., K.N.), and Biostatistics Service, Department of Epidemiology and Biostatistics (C.J.F., N.S., V.S.), Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY 10065; and School of Computing, Queen's University, Kingston, Canada (K.L., K.B., F.Z., A.S.)
| | - Kaelan Lupton
| | - Pamela I Causa Andrieu
| | - Anisha Luthra
| | - Michio Taya
| | - Karen Batch
| | - Huy Nguyen
| | - Prachi Rahurkar
| | - Lior Gazit
| | - Kevin Nicholas
| | - Christopher J Fong
| | - Natalie Gangai
| | - Nikolaus Schultz
| | - Farhana Zulkernine
| | - Varadan Sevilimedu
| | - Krishna Juluru
| | - Amber Simpson
| | - Hedvig Hricak
| |
|
17
|
Wood DA, Kafiabadi S, Al Busaidi A, Guilhem EL, Lynch J, Townend MK, Montvila A, Kiik M, Siddiqui J, Gadapa N, Benger MD, Mazumder A, Barker G, Ourselin S, Cole JH, Booth TC. Deep learning to automate the labelling of head MRI datasets for computer vision applications. Eur Radiol 2021; 32:725-736. [PMID: 34286375 PMCID: PMC8660736 DOI: 10.1007/s00330-021-08132-0] [Received: 03/10/2021] [Revised: 06/02/2021] [Accepted: 06/14/2021]
Abstract
Objectives The purpose of this study was to build a deep learning model to derive labels from neuroradiology reports and assign these to the corresponding examinations, overcoming a bottleneck to computer vision model development. Methods Reference-standard labels were generated by a team of neuroradiologists for model training and evaluation. Three thousand examinations were labelled for the presence or absence of any abnormality by manually scrutinising the corresponding radiology reports (‘reference-standard report labels’); a subset of these examinations (n = 250) were assigned ‘reference-standard image labels’ by interrogating the actual images. Separately, 2000 reports were labelled for the presence or absence of 7 specialised categories of abnormality (acute stroke, mass, atrophy, vascular abnormality, small vessel disease, white matter inflammation, encephalomalacia), with a subset of these examinations (n = 700) also assigned reference-standard image labels. A deep learning model was trained using labelled reports and validated in two ways: comparing predicted labels to (i) reference-standard report labels and (ii) reference-standard image labels. The area under the receiver operating characteristic curve (AUC-ROC) was used to quantify model performance. Accuracy, sensitivity, specificity, and F1 score were also calculated. Results Accurate classification (AUC-ROC > 0.95) was achieved for all categories when tested against reference-standard report labels. A drop in performance (ΔAUC-ROC > 0.02) was seen for three categories (atrophy, encephalomalacia, vascular) when tested against reference-standard image labels, highlighting discrepancies in the original reports. Once trained, the model assigned labels to 121,556 examinations in under 30 min. Conclusions Our model accurately classifies head MRI examinations, enabling automated dataset labelling for downstream computer vision applications. 
Key Points
• Deep learning is poised to revolutionise image recognition tasks in radiology; however, a barrier to clinical adoption is the difficulty of obtaining large labelled datasets for model training.
• We demonstrate a deep learning model which can derive labels from neuroradiology reports and assign these to the corresponding examinations at scale, facilitating the development of downstream computer vision models.
• We rigorously tested our model by comparing labels predicted on the basis of neuroradiology reports with two sets of reference-standard labels: (1) labels derived by manually scrutinising each radiology report and (2) labels derived by interrogating the actual images.
Supplementary Information The online version contains supplementary material available at 10.1007/s00330-021-08132-0.
Affiliation(s)
- David A Wood
- School of Biomedical Engineering & Imaging Sciences, King's College London, Rayne Institute, 4th Floor, Lambeth Wing, London, SE1 7EH, UK
| | - Sina Kafiabadi
- Department of Neuroradiology, Ruskin Wing, King's College Hospital NHS Foundation Trust, London, SE5 9RS, UK
| | - Aisha Al Busaidi
- Department of Neuroradiology, Ruskin Wing, King's College Hospital NHS Foundation Trust, London, SE5 9RS, UK
| | - Emily L Guilhem
- Department of Neuroradiology, Ruskin Wing, King's College Hospital NHS Foundation Trust, London, SE5 9RS, UK
| | - Jeremy Lynch
- Department of Neuroradiology, Ruskin Wing, King's College Hospital NHS Foundation Trust, London, SE5 9RS, UK
| | | | - Antanas Montvila
- Department of Neuroradiology, Ruskin Wing, King's College Hospital NHS Foundation Trust, London, SE5 9RS, UK; Hospital of Lithuanian University of Health Sciences, Kaunas Clinics, Kaunas, Lithuania
| | - Martin Kiik
- School of Biomedical Engineering & Imaging Sciences, King's College London, Rayne Institute, 4th Floor, Lambeth Wing, London, SE1 7EH, UK
| | - Juveria Siddiqui
- Department of Neuroradiology, Ruskin Wing, King's College Hospital NHS Foundation Trust, London, SE5 9RS, UK
| | - Naveen Gadapa
- Department of Neurology, Ruskin Wing, King's College Hospital NHS Foundation Trust, London, SE5 9RS, UK
| | - Matthew D Benger
- Department of Neuroradiology, Ruskin Wing, King's College Hospital NHS Foundation Trust, London, SE5 9RS, UK
| | - Asif Mazumder
- Guy's and St Thomas' NHS Foundation Trust, Westminster Bridge Road, London, SE1 7EH, UK
| | - Gareth Barker
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, SE5 8AF, UK
| | - Sebastian Ourselin
- School of Biomedical Engineering & Imaging Sciences, King's College London, Rayne Institute, 4th Floor, Lambeth Wing, London, SE1 7EH, UK
| | - James H Cole
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, SE5 8AF, UK; Centre for Medical Image Computing, Department of Computer Science, University College London, London, WC1V 6LJ, UK; Dementia Research Centre, University College London, London, WC1N 3BG, UK
| | - Thomas C Booth
- School of Biomedical Engineering & Imaging Sciences, King's College London, Rayne Institute, 4th Floor, Lambeth Wing, London, SE1 7EH, UK; Department of Neuroradiology, Ruskin Wing, King's College Hospital NHS Foundation Trust, London, SE5 9RS, UK
| |
|
18
|
Senders JT, Cho LD, Calvachi P, McNulty JJ, Ashby JL, Schulte IS, Almekkawi AK, Mehrtash A, Gormley WB, Smith TR, Broekman MLD, Arnaout O. Automating Clinical Chart Review: An Open-Source Natural Language Processing Pipeline Developed on Free-Text Radiology Reports From Patients With Glioblastoma. JCO Clin Cancer Inform 2021; 4:25-34. [PMID: 31977252 DOI: 10.1200/cci.19.00060] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/05/2023]
Abstract
PURPOSE The aim of this study was to develop an open-source natural language processing (NLP) pipeline for text mining of medical information from clinical reports. We also aimed to provide insight into why certain variables or reports are more suitable for clinical text mining than others. MATERIALS AND METHODS Various NLP models were developed to extract 15 radiologic characteristics from free-text radiology reports for patients with glioblastoma. Ten-fold cross-validation was used to optimize the hyperparameter settings and estimate model performance. We examined how model performance was associated with quantitative attributes of the radiologic characteristics and reports. RESULTS In total, 562 unique brain magnetic resonance imaging reports were retrieved. NLP extracted 15 radiologic characteristics with high to excellent discrimination (area under the curve, 0.82 to 0.98) and accuracy (78.6% to 96.6%). Model performance was correlated with the inter-rater agreement of the manually provided labels (ρ = 0.904; P < .001) but not with the frequency distribution of the variables of interest (ρ = 0.179; P = .52). All variables labeled with a near perfect inter-rater agreement were classified with excellent performance (area under the curve > 0.95). Excellent performance could be achieved for variables with only 50 to 100 observations in the minority group and class imbalances up to a 9:1 ratio. Report-level classification accuracy was not associated with the number of words or the vocabulary size in the distinct text documents. CONCLUSION This study provides an open-source NLP pipeline that allows for text mining of narratively written clinical reports. Small sample sizes and class imbalance should not be considered as absolute contraindications for text mining in clinical research. 
However, future studies should report measures of inter-rater agreement whenever ground truth is based on a consensus label and use this measure to identify clinical variables eligible for text mining.
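The ten-fold cross-validation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `k_fold_indices`, the shuffling seed, and the round-robin fold assignment are assumptions made for illustration, and only the report count (562) and the number of folds (10) come from the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the cited study.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Training set is the union of all folds except the held-out one.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# 562 reports split into 10 folds, as in the abstract.
splits = list(k_fold_indices(562, k=10))
```

Each report is held out exactly once, so a model is trained ten times and every report contributes to the performance estimate without being scored by a model that saw it during training.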
Affiliation(s)
- Joeky T Senders
- Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA.,Department of Neurosurgery, Leiden University Medical Center, Leiden, the Netherlands
| | - Logan D Cho
- Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA.,Department of Neuroscience, Brown University, Providence, RI
| | - Paola Calvachi
- Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
| | - John J McNulty
- Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA.,Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
| | - Joanna L Ashby
- Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
| | - Isabelle S Schulte
- Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
| | - Ahmad Kareem Almekkawi
- Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
| | - Alireza Mehrtash
- Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
| | - William B Gormley
- Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
| | - Timothy R Smith
- Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
| | - Marike L D Broekman
- Department of Neurosurgery, Leiden University Medical Center, Leiden, the Netherlands.,Department of Neurosurgery, Haaglanden Medical Center, The Hague, the Netherlands
| | - Omar Arnaout
- Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
| |
|
19
|
Decker BM, Hill CE, Baldassano SN, Khankhanian P. Can antiepileptic efficacy and epilepsy variables be studied from electronic health records? A review of current approaches. Seizure 2021; 85:138-144. [PMID: 33461032 DOI: 10.1016/j.seizure.2020.11.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/30/2020] [Revised: 11/16/2020] [Accepted: 11/17/2020] [Indexed: 12/16/2022]
Abstract
As automated data extraction and natural language processing (NLP) are rapidly evolving, improving healthcare delivery by harnessing large data is garnering great interest. Assessing antiepileptic drug (AED) efficacy and other epilepsy variables pertinent to healthcare delivery remains a critical barrier to improving patient care. In this systematic review, we examined automatic electronic health record (EHR) extraction methodologies pertinent to epilepsy. We also reviewed more generalizable NLP pipelines to extract other critical patient variables. Our review found varying reports of performance measures. Whereas automated data extraction pipelines are a crucial advancement, this review calls attention to standardizing NLP methodology and accuracy reporting for greater generalizability. Moreover, the use of crowdsourcing competitions to spur innovative NLP pipelines would further advance this field.
Affiliation(s)
- Barbara M Decker
- Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States.
| | - Chloé E Hill
- Department of Neurology, University of Michigan, 1500 East Medical Center Drive, Ann Arbor, MI, 48109, United States
| | - Steven N Baldassano
- Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States
| | - Pouya Khankhanian
- Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States
| |
|
20
|
Heo TS, Kim YS, Choi JM, Jeong YS, Seo SY, Lee JH, Jeon JP, Kim C. Prediction of Stroke Outcome Using Natural Language Processing-Based Machine Learning of Radiology Report of Brain MRI. J Pers Med 2020; 10:jpm10040286. [PMID: 33339385 PMCID: PMC7766032 DOI: 10.3390/jpm10040286] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 11/27/2020] [Revised: 12/09/2020] [Accepted: 12/15/2020] [Indexed: 01/28/2023]
Abstract
Brain magnetic resonance imaging (MRI) is useful for predicting the outcome of patients with acute ischemic stroke (AIS). Although deep learning (DL) using brain MRI with certain image biomarkers has shown satisfactory results in predicting poor outcomes, no study has assessed the usefulness of natural language processing (NLP)-based machine learning (ML) algorithms using brain MRI free-text reports of AIS patients. Therefore, we aimed to assess whether NLP-based ML algorithms using brain MRI text reports could predict poor outcomes in AIS patients. This study included only English text reports of brain MRIs examined during admission of AIS patients. Poor outcome was defined as a modified Rankin Scale score of 3-6, and the data were captured by trained nurses and physicians. We included only the text report of the first MRI scan during the admission. The text dataset was randomly divided into training and test datasets at a 7:3 ratio. Text was vectorized at the word, sentence, and document levels. In the word-level approach, which did not consider the sequence of words, the "bag-of-words" model was used to reflect the number of repetitions of each text token. The "sent2vec" method was used in the sentence-level approach, which considers the sequence of words, and word embedding was used in the document-level approach. In addition to conventional ML algorithms, DL algorithms such as the convolutional neural network (CNN), long short-term memory, and multilayer perceptron were used to predict poor outcomes using 5-fold cross-validation and grid search techniques. The performance of each ML classifier was compared using the area under the receiver operating characteristic curve (AUROC). Among 1840 subjects with AIS, 645 patients (35.1%) had a poor outcome 3 months after stroke onset. Random forest was the best classifier using the word-level approach (AUROC, 0.782).
Overall, the document-level approach exhibited better performance than did the word- or sentence-level approaches. Among all the ML classifiers, the multi-CNN algorithm demonstrated the best classification performance (0.805), followed by the CNN (0.799) algorithm. When predicting future clinical outcomes using NLP-based ML of radiology free-text reports of brain MRI, DL algorithms showed superior performance over the other ML algorithms. In particular, the prediction of poor outcomes in document-level NLP DL was improved more by multi-CNN and CNN than by recurrent neural network-based algorithms. NLP-based DL algorithms can be used as an important digital marker for unstructured electronic health record data DL prediction.
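Two ideas in this abstract, word-level "bag-of-words" vectorization and comparison of classifiers by AUROC, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names, the whitespace tokenizer, the toy vocabulary, and the example labels and scores are all assumptions.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs, ignoring word order.

    Hypothetical helper; tokenization is a simple lowercase whitespace split.
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def auroc(labels, scores):
    """AUROC as the probability that a random positive case scores higher
    than a random negative case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy report text and vocabulary (illustrative only).
vector = bag_of_words("Acute infarct no acute hemorrhage", ["infarct", "no", "acute"])

# Toy labels (1 = poor outcome) and predicted scores (illustrative only).
score = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Because the count vector discards word order, "no acute infarct" and "acute infarct, no hemorrhage" can look alike at the word level, which is one reason sequence-aware representations are compared in the study.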
Affiliation(s)
- Tak Sung Heo
- Department of Convergence Software, Hallym University, Chuncheon 24252, Korea; (T.S.H.); (Y.S.K.); (J.M.C.); (Y.S.J.); (S.Y.S.)
| | - Yu Seop Kim
- Department of Convergence Software, Hallym University, Chuncheon 24252, Korea; (T.S.H.); (Y.S.K.); (J.M.C.); (Y.S.J.); (S.Y.S.)
| | - Jeong Myeong Choi
- Department of Convergence Software, Hallym University, Chuncheon 24252, Korea; (T.S.H.); (Y.S.K.); (J.M.C.); (Y.S.J.); (S.Y.S.)
| | - Yeong Seok Jeong
- Department of Convergence Software, Hallym University, Chuncheon 24252, Korea; (T.S.H.); (Y.S.K.); (J.M.C.); (Y.S.J.); (S.Y.S.)
| | - Soo Young Seo
- Department of Convergence Software, Hallym University, Chuncheon 24252, Korea; (T.S.H.); (Y.S.K.); (J.M.C.); (Y.S.J.); (S.Y.S.)
| | - Jun Ho Lee
- Department of Otorhinolaryngology and Head and Neck Surgery, Chuncheon Sacred Heart Hospital, Chuncheon 24253, Korea;
| | - Jin Pyeong Jeon
- Department of Neurosurgery, Chuncheon Sacred Heart Hospital, Chuncheon 24253, Korea;
| | - Chulho Kim
- Department of Neurology, Chuncheon Sacred Heart Hospital, Chuncheon 24253, Korea
- Correspondence: Tel.: +82-332-405-255; Fax: +82-332-5562-44
| |
|
21
|
Abstract
PURPOSE OF REVIEW To discuss recent applications of artificial intelligence within the field of neuro-oncology and highlight emerging challenges in integrating artificial intelligence within clinical practice. RECENT FINDINGS In the field of image analysis, artificial intelligence has shown promise in aiding clinicians with incorporating an increasing amount of data in genomics, detection, diagnosis, classification, risk stratification, prognosis, and treatment response. Artificial intelligence has also been applied in epigenetics, pathology, and natural language processing. SUMMARY Although nascent, applications of artificial intelligence within neuro-oncology show significant promise. Artificial intelligence algorithms will likely improve our understanding of brain tumors and help drive future innovations in neuro-oncology.
|
22
|
Tsai CC, Lin YC, Ng SH, Chen YL, Cheng JS, Lu CS, Weng YH, Lin SH, Chen PY, Wu YM, Wang JJ. A Method for the Prediction of Clinical Outcome Using Diffusion Magnetic Resonance Imaging: Application on Parkinson's Disease. J Clin Med 2020; 9:jcm9030647. [PMID: 32121190 PMCID: PMC7141247 DOI: 10.3390/jcm9030647] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/06/2020] [Revised: 02/10/2020] [Accepted: 02/18/2020] [Indexed: 01/06/2023]
Abstract
Robust early prediction of clinical outcomes in Parkinson's disease (PD) is paramount for implementing appropriate management interventions. We propose a method that uses the baseline MRI, measuring diffusion parameters from multiple parcellated brain regions, to predict the 2-year clinical outcome in Parkinson's disease. Diffusion tensor imaging was obtained from 82 patients (males/females = 45/37, mean age: 60.9 ± 7.3 years, baseline and after 23.7 ± 0.7 months) using a 3T MR scanner, which was normalized and parcellated according to the Automated Anatomical Labelling template. All patients were diagnosed with probable Parkinson's disease by the National Institute of Neurological Disorders and Stroke criteria. Clinical outcome was graded using disease severity (Unified Parkinson's Disease Rating Scale and Modified Hoehn and Yahr staging), drug administration (levodopa equivalent daily dose), and quality of life (39-item PD Questionnaire). Selection and regularization of diffusion parameters, the mean diffusivity and fractional anisotropy, were performed using least absolute shrinkage and selection operator (LASSO) between baseline diffusion index and clinical outcome over 2 years. Identified features were entered into a stepwise multivariate regression model, followed by a leave-one-out/5-fold cross validation and additional blind validation using an independent dataset. The predicted Unified Parkinson's Disease Rating Scale for each individual was consistent with the observed values at blind validation (adjusted R² = 0.76) by using 13 features, such as mean diffusivity in lingual, nodule lobule of cerebellum vermis and fractional anisotropy in rolandic operculum, and quadrangular lobule of cerebellum. We conclude that baseline diffusion MRI is potentially capable of predicting 2-year clinical outcomes in patients with Parkinson's disease on an individual basis.
Affiliation(s)
- Chih-Chien Tsai
- Healthy Aging Research Center, Chang Gung University, Taoyuan 33302, Taiwan;
| | - Yu-Chun Lin
- Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital, Linkou, Taoyuan 33375, Taiwan; (Y.-C.L.); (S.-H.N.); (Y.-L.C.); (Y.-M.W.)
- Department of Medical Imaging and Radiological Sciences, Chang Gung University, Taoyuan 33302, Taiwan; (S.-H.L.); (P.-Y.C.)
| | - Shu-Hang Ng
- Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital, Linkou, Taoyuan 33375, Taiwan; (Y.-C.L.); (S.-H.N.); (Y.-L.C.); (Y.-M.W.)
- Department of Medical Imaging and Radiological Sciences, Chang Gung University, Taoyuan 33302, Taiwan; (S.-H.L.); (P.-Y.C.)
| | - Yao-Liang Chen
- Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital, Linkou, Taoyuan 33375, Taiwan; (Y.-C.L.); (S.-H.N.); (Y.-L.C.); (Y.-M.W.)
- Department of Diagnostic Radiology, Chang Gung Memorial Hospital, Keelung City 20401, Taiwan
| | - Jur-Shan Cheng
- Clinical Informatics and Medical Statistics Research Center, College of Medicine, Chang Gung University, Taoyuan 33302, Taiwan;
- Department of Emergency Medicine, Chang Gung Memorial Hospital, Keelung City 20401, Taiwan
| | - Chin-Song Lu
- Professor Lu Neurological Clinic, Taoyuan 33375, Taiwan;
- Division of Movement Disorders, Department of Neurology, Chang Gung Memorial Hospital, Linkou, Taoyuan 33375, Taiwan;
- Neuroscience Research Center, Chang Gung Memorial Hospital, Linkou, Taoyuan 33375, Taiwan
| | - Yi-Hsin Weng
- Division of Movement Disorders, Department of Neurology, Chang Gung Memorial Hospital, Linkou, Taoyuan 33375, Taiwan;
- Neuroscience Research Center, Chang Gung Memorial Hospital, Linkou, Taoyuan 33375, Taiwan
- School of Medicine, Chang Gung University, Taoyuan 33302, Taiwan
| | - Sung-Han Lin
- Department of Medical Imaging and Radiological Sciences, Chang Gung University, Taoyuan 33302, Taiwan; (S.-H.L.); (P.-Y.C.)
| | - Po-Yuan Chen
- Department of Medical Imaging and Radiological Sciences, Chang Gung University, Taoyuan 33302, Taiwan; (S.-H.L.); (P.-Y.C.)
| | - Yi-Ming Wu
- Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital, Linkou, Taoyuan 33375, Taiwan; (Y.-C.L.); (S.-H.N.); (Y.-L.C.); (Y.-M.W.)
- Department of Medical Imaging and Radiological Sciences, Chang Gung University, Taoyuan 33302, Taiwan; (S.-H.L.); (P.-Y.C.)
| | - Jiun-Jie Wang
- Healthy Aging Research Center, Chang Gung University, Taoyuan 33302, Taiwan;
- Department of Medical Imaging and Radiological Sciences, Chang Gung University, Taoyuan 33302, Taiwan; (S.-H.L.); (P.-Y.C.)
- Department of Diagnostic Radiology, Chang Gung Memorial Hospital, Keelung City 20401, Taiwan
- Medical Imaging Research Center, Institute for Radiological Research, Chang Gung University/Chang Gung Memorial Hospital, Linkou 33375, Taoyuan, Taiwan
- Correspondence: Tel.: +886-3-211-8800 (ext. 5391); Fax: +886-3-397-1936
| |
|
23
|
Chen Z, Pang M, Zhao Z, Li S, Miao R, Zhang Y, Feng X, Feng X, Zhang Y, Duan M, Huang L, Zhou F. Feature selection may improve deep neural networks for the bioinformatics problems. Bioinformatics 2019; 36:1542-1552. [DOI: 10.1093/bioinformatics/btz763] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 05/24/2019] [Revised: 09/03/2019] [Accepted: 10/02/2019] [Indexed: 12/22/2022]
Abstract
Motivation
Deep neural network (DNN) algorithms have recently been used to predict various biomedical phenotypes and have demonstrated very good prediction performance without feature selection. This study proposed the hypothesis that DNN models may be further improved by feature selection algorithms.
Results
A comprehensive comparative study was carried out by evaluating 11 feature selection algorithms on three conventional DNN algorithms, i.e. convolutional neural network (CNN), deep belief network (DBN) and recurrent neural network (RNN), and three recent DNNs, i.e. MobilenetV2, ShufflenetV2 and Squeezenet. Five binary classification methylomic datasets were chosen to calculate the prediction performances of CNN/DBN/RNN models using features selected by the 11 feature selection algorithms. Seventeen binary classification transcriptome datasets and two multi-class transcriptome datasets were also utilized to evaluate how the hypothesis may generalize to different data types. The experimental data supported our hypothesis that feature selection algorithms may improve DNN models, and the DBN models using features selected by SVM-RFE usually achieved the best prediction accuracies on the five methylomic datasets.
Availability and implementation
All the algorithms were implemented and tested under the programming environment Python version 3.6.6.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zheng Chen
- BioKnow Health Informatics Lab, College of Computer Science and Technology
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China
| | - Meng Pang
- BioKnow Health Informatics Lab, College of Computer Science and Technology
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China
| | - Zixin Zhao
- BioKnow Health Informatics Lab, College of Computer Science and Technology
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China
| | - Shuainan Li
- BioKnow Health Informatics Lab, College of Computer Science and Technology
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China
| | - Rui Miao
- BioKnow Health Informatics Lab, College of Computer Science and Technology
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China
| | - Yifan Zhang
- BioKnow Health Informatics Lab, College of Computer Science and Technology
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China
| | - Xiaoyue Feng
- BioKnow Health Informatics Lab, College of Computer Science and Technology
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China
| | - Xin Feng
- BioKnow Health Informatics Lab, College of Computer Science and Technology
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China
| | - Yexian Zhang
- BioKnow Health Informatics Lab, College of Computer Science and Technology
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China
| | - Meiyu Duan
- BioKnow Health Informatics Lab, College of Computer Science and Technology
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China
| | - Lan Huang
- BioKnow Health Informatics Lab, College of Computer Science and Technology
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China
| | - Fengfeng Zhou
- BioKnow Health Informatics Lab, College of Computer Science and Technology
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China
| |
|