1
Sushil M, Zack T, Mandair D, Zheng Z, Wali A, Yu YN, Quan Y, Lituiev D, Butte AJ. A comparative study of large language model-based zero-shot inference and task-specific supervised classification of breast cancer pathology reports. J Am Med Inform Assoc 2024; 31:2315-2327. [PMID: 38900207 DOI: 10.1093/jamia/ocae146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 05/27/2024] [Accepted: 06/03/2024] [Indexed: 06/21/2024] Open
Abstract
OBJECTIVE Although supervised machine learning is popular for information extraction from clinical notes, creating large annotated datasets requires extensive domain expertise and is time-consuming. Meanwhile, large language models (LLMs) have demonstrated promising transfer learning capability. In this study, we explored whether recent LLMs could reduce the need for large-scale data annotations. MATERIALS AND METHODS We curated a dataset of 769 breast cancer pathology reports, manually labeled with 12 categories, to compare zero-shot classification capability of the following LLMs: GPT-4, GPT-3.5, Starling, and ClinicalCamel, with task-specific supervised classification performance of 3 models: random forests, long short-term memory networks with attention (LSTM-Att), and the UCSF-BERT model. RESULTS Across all 12 tasks, the GPT-4 model performed either significantly better than or as well as the best supervised model, LSTM-Att (average macro F1-score of 0.86 vs 0.75), with advantage on tasks with high label imbalance. Other LLMs demonstrated poor performance. Frequent GPT-4 error categories included incorrect inferences from multiple samples and from history, and complex task design, and several LSTM-Att errors were related to poor generalization to the test set. DISCUSSION On tasks where large annotated datasets cannot be easily collected, LLMs can reduce the burden of data labeling. However, if the use of LLMs is prohibitive, the use of simpler models with large annotated datasets can provide comparable results. CONCLUSIONS GPT-4 demonstrated the potential to speed up the execution of clinical NLP studies by reducing the need for large annotated datasets. This may increase the utilization of NLP-based variables and outcomes in clinical studies.
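The macro F1-score reported in this abstract is the unweighted mean of per-class F1 scores, which is why it is sensitive to performance on rare labels. The following is a minimal sketch of that metric in plain Python; the function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the study.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy, imbalanced labels (made up for illustration, not study data).
y_true = ["neg"] * 8 + ["pos"] * 2
y_pred = ["neg"] * 10  # a classifier that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```

In this toy case the majority-class predictor keeps a high accuracy while its macro F1 is much lower, because the rare class contributes an F1 of zero to the average.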
Affiliation(s)
- Madhumita Sushil
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
- Travis Zack
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94158, United States
- Divneet Mandair
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94158, United States
- Zhiwei Zheng
  - University of California, Berkeley, Berkeley, CA 94720, United States
- Ahmed Wali
  - University of California, Berkeley, Berkeley, CA 94720, United States
- Yan-Ning Yu
  - University of California, Berkeley, Berkeley, CA 94720, United States
- Yuwei Quan
  - University of California, Berkeley, Berkeley, CA 94720, United States
- Dmytro Lituiev
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
- Atul J Butte
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94158, United States
  - Center for Data-driven Insights and Innovation, University of California, Office of the President, Oakland, CA 94607, United States
  - Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94158, United States
2
Maeda-Minami A, Yoshino T, Yumoto T, Sato K, Sagara A, Inaba K, Kominato H, Kimura T, Takishita T, Watanabe G, Nakamura T, Mano Y, Horiba Y, Watanabe K, Kamei J. Development of a novel drug information provision system for Kampo medicine using natural language processing technology. BMC Med Inform Decis Mak 2023; 23:119. [PMID: 37442993 PMCID: PMC10347708 DOI: 10.1186/s12911-023-02230-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Accepted: 07/07/2023] [Indexed: 07/15/2023] Open
Abstract
BACKGROUND Kampo medicine is widely used in Japan; however, most physicians and pharmacists have insufficient knowledge and experience in it. Although a chatbot-style system using machine learning and natural language processing has been used in some clinical settings and proven useful, the system developed specifically for the Japanese language using this method has not been validated by research. The purpose of this study is to develop a novel drug information provision system for Kampo medicines using a natural language classifier® (NLC®) based on IBM Watson. METHODS The target Kampo formulas were 33 formulas listed in the 17th revision of the Japanese Pharmacopoeia. The information included in the system comes from the package inserts of Kampo medicines, Manuals for Management of Individual Serious Adverse Drug Reactions, and data on off-label usage. The system developed in this study classifies questions about the drug information of Kampo formulas input by natural language into preset questions and outputs preset answers for the questions. The system uses morphological analysis, synonym conversion by thesaurus, and NLC®. We fine-tuned the information registered into NLC® and increased the thesaurus. To validate the system, 900 validation questions were provided by six pharmacists who were classified into high or low levels of knowledge and experience of Kampo medicines and three pharmacy students. RESULTS The precision, recall, and F-measure of the system performance were 0.986, 0.915, and 0.949, respectively. The results were stable even with differences in the amount of expertise of the question authors. CONCLUSIONS We developed a system using natural language classification that can give appropriate answers to most of the validation questions.
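The three figures in the RESULTS are tied together by the standard definition of the F-measure as the harmonic mean of precision and recall. The sketch below applies that standard formula to the reported precision and recall; the function name `f_measure` and the `beta` parameter are illustrative, and the abstract does not state how the authors computed their value.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0

# Precision and recall as reported in the abstract above.
f1 = f_measure(0.986, 0.915)
print(round(f1, 3))  # 0.949
```

With the reported precision of 0.986 and recall of 0.915, the formula gives 0.949 to three decimals, which agrees with the F-measure stated in the abstract.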
Affiliation(s)
- Ayako Maeda-Minami
  - Faculty of Pharmaceutical Sciences, Tokyo University of Science, Noda, Yamazaki, Chiba, 2641, Japan
  - Center for Kampo Medicine, Keio University School of Medicine, 35, Shinanomachi, Shinjuku-ku, Tokyo, Japan
  - Hoshi University, 2-4-41 Ebara, Shinagawa-ku, Tokyo, Japan
- Tetsuhiro Yoshino
  - Center for Kampo Medicine, Keio University School of Medicine, 35, Shinanomachi, Shinjuku-ku, Tokyo, Japan
- Tetsuro Yumoto
  - Hoshi University, 2-4-41 Ebara, Shinagawa-ku, Tokyo, Japan
- Kayoko Sato
  - Hoshi University, 2-4-41 Ebara, Shinagawa-ku, Tokyo, Japan
- Kenjiro Inaba
  - Department of Pharmacy, General Sagami Kosei Hospital, Oyama, Chuou-ku, Sagami, Kanagawa, 3429, Japan
- Takao Kimura
  - Kimura Information Technology Co. Ltd, 6-1 Oroshihonmachi, Saga, Saga, Japan
- Tetsuya Takishita
  - Kimura Information Technology Co. Ltd, 6-1 Oroshihonmachi, Saga, Saga, Japan
- Gen Watanabe
  - Kimura Information Technology Co. Ltd, 6-1 Oroshihonmachi, Saga, Saga, Japan
- Tomonori Nakamura
  - Division of Pharmaceutical Care Sciences, Center for Social Pharmacy and Pharmaceutical Care Science, Faculty of Pharmacy, Keio University, 1-5-30, Shibakoen, Minato-ku, Tokyo, Japan
- Yasunari Mano
  - Faculty of Pharmaceutical Sciences, Tokyo University of Science, Noda, Yamazaki, Chiba, 2641, Japan
- Yuko Horiba
  - Center for Kampo Medicine, Keio University School of Medicine, 35, Shinanomachi, Shinjuku-ku, Tokyo, Japan
- Kenji Watanabe
  - Center for Kampo Medicine, Keio University School of Medicine, 35, Shinanomachi, Shinjuku-ku, Tokyo, Japan
- Junzo Kamei
  - Juntendo Advanced Research Institute for Health Science, Juntendo University, 2-1-1, Hongou, Bunkyo-ku, Tokyo, Japan
3
Zhang J, Mazurowski MA, Allen BC, Wildman-Tobriner B. Multistep Automated Data Labelling Procedure (MADLaP) for thyroid nodules on ultrasound: An artificial intelligence approach for automating image annotation. Artif Intell Med 2023; 141:102553. [PMID: 37295897 DOI: 10.1016/j.artmed.2023.102553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Revised: 02/14/2023] [Accepted: 04/11/2023] [Indexed: 06/12/2023]
Abstract
Machine learning (ML) for diagnosis of thyroid nodules on ultrasound is an active area of research. However, ML tools require large, well-labeled datasets, the curation of which is time-consuming and labor-intensive. The purpose of our study was to develop and test a deep-learning-based tool to facilitate and automate the data annotation process for thyroid nodules; we named our tool Multistep Automated Data Labelling Procedure (MADLaP). MADLaP was designed to take multiple inputs including pathology reports, ultrasound images, and radiology reports. Using multiple step-wise 'modules' including rule-based natural language processing, deep-learning-based imaging segmentation, and optical character recognition, MADLaP automatically identified images of a specific thyroid nodule and correctly assigned a pathology label. The model was developed using a training set of 378 patients across our health system and tested on a separate set of 93 patients. Ground truths for both sets were selected by an experienced radiologist. Performance metrics including yield (how many labeled images the model produced) and accuracy (percentage correct) were measured using the test set. MADLaP achieved a yield of 63 % and an accuracy of 83 %. The yield progressively increased as the input data moved through each module, while accuracy peaked part way through. Error analysis showed that inputs from certain examination sites had lower accuracy (40 %) than the other sites (90 %, 100 %). MADLaP successfully created curated datasets of labeled ultrasound images of thyroid nodules. While accurate, the relatively suboptimal yield of MADLaP exposed some challenges when trying to automatically label radiology images from heterogeneous sources. The complex task of image curation and annotation could be automated, allowing for enrichment of larger datasets for use in machine learning development.
Affiliation(s)
- Jikai Zhang
  - Department of Electrical and Computer Engineering, Duke University, Room 10070, 2424 Erwin Rd, Durham, NC 27705, United States
- Maciej A Mazurowski
  - Department of Radiology, Duke University Medical Center, Durham, NC, United States
  - Department of Electrical and Computer Engineering, Department of Biostatistics and Bioinformatics, Department of Computer Science, Duke University, Room 9044, 2424 Erwin Rd, Durham, NC 27705, United States
- Brian C Allen
  - Department of Radiology, Duke University Medical Center, Duke University, Dept of Radiology, Box 3808, Durham, NC 27710, United States
- Benjamin Wildman-Tobriner
  - Department of Radiology, Duke University Medical Center, Duke University, Dept of Radiology, Box 3808, Durham, NC 27710, United States
4
Yadav N, Dass R, Virmani J. Assessment of encoder-decoder-based segmentation models for thyroid ultrasound images. Med Biol Eng Comput 2023:10.1007/s11517-023-02849-4. [PMID: 37353695 DOI: 10.1007/s11517-023-02849-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/29/2022] [Accepted: 05/17/2023] [Indexed: 06/25/2023]
Abstract
Encoder-decoder-based semantic segmentation models classify image pixels into the corresponding class, such as the ROI (region of interest) or background. In the present study, simple, dilated-convolution, series, and directed acyclic graph (DAG)-based encoder-decoder semantic segmentation models, i.e., SegNet (VGG16), SegNet (VGG19), U-Net, MobileNetV2, ResNet18, ResNet50, Xception, and Inception networks, were implemented to segment TTUS (thyroid tumor ultrasound) images. Transfer learning was used to train these segmentation networks on original and despeckled TTUS images. Network performance was measured using the mIoU and mDC metrics. In exhaustive experiments, the ResNet50-based segmentation model obtained the best objective results, with values of 0.87 for mIoU and 0.94 for mDC, and was also rated best by radiologist opinion on the shape, margin, and echogenicity characteristics of the segmented lesions. The ResNet50 model therefore provides better segmentation on both objective and subjective assessment. It may be used in the healthcare system to identify thyroid nodules accurately in real time.
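The mIoU and mDC metrics named in this abstract are usually understood as the mean intersection over union and the mean Dice coefficient. A minimal sketch of the per-image quantities for binary masks follows; the function name `iou_and_dice` and the toy masks are assumptions for illustration, and the abstract does not say how the authors averaged across images or classes.

```python
def iou_and_dice(pred, truth):
    """Intersection over union and Dice coefficient for two binary masks.

    Masks are equally sized nested lists of 0/1 values (hypothetical format).
    """
    inter = sum(p * t for p_row, t_row in zip(pred, truth)
                for p, t in zip(p_row, t_row))
    pred_area = sum(map(sum, pred))
    truth_area = sum(map(sum, truth))
    union = pred_area + truth_area - inter
    # Two empty masks agree perfectly, so both scores are taken as 1.0.
    iou = inter / union if union else 1.0
    total = pred_area + truth_area
    dice = 2 * inter / total if total else 1.0
    return iou, dice

# Toy 3x4 masks (made up for illustration, not study data).
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]

iou, dice = iou_and_dice(pred, truth)
```

For a single pair of masks the Dice coefficient is never lower than the IoU, which is consistent with the mDC in the abstract being higher than the mIoU.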
Affiliation(s)
- Niranjan Yadav
  - Department of Electronics and Communication Engineering, Deenbandhu Chhotu Ram University of Science and Technology Murthal, Sonepat, 131039, India
- Rajeshwar Dass
  - Department of Electronics and Communication Engineering, Deenbandhu Chhotu Ram University of Science and Technology Murthal, Sonepat, 131039, India
- Jitendra Virmani
  - Central Scientific Instruments Organization, Council of Scientific and Industrial Research, Chandigarh, 160030, India
5
Meng M, Li H, Zhang M, He G, Wang L, Shen D. Reducing the number of unnecessary biopsies for mammographic BI-RADS 4 lesions through a deep transfer learning method. BMC Med Imaging 2023; 23:82. [PMID: 37312026 DOI: 10.1186/s12880-023-01023-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Accepted: 05/23/2023] [Indexed: 06/15/2023] Open
Abstract
BACKGROUND In clinical practice, reducing unnecessary biopsies for mammographic BI-RADS 4 lesions is crucial. The objective of this study was to explore the potential value of deep transfer learning (DTL) based on different fine-tuning strategies for Inception V3 to reduce the number of unnecessary biopsies that residents need to perform for mammographic BI-RADS 4 lesions. METHODS A total of 1980 patients with breast lesions were included, comprising 1473 benign lesions (185 women with bilateral breast lesions) and 692 malignant lesions, collected and confirmed by clinical pathology or biopsy. The breast mammography images were randomly divided into three subsets, a training set, testing set, and validation set 1, at a ratio of 8:1:1. We constructed a DTL model for the classification of breast lesions based on Inception V3 and attempted to improve its performance with 11 fine-tuning strategies. The mammography images from 362 patients with pathologically confirmed BI-RADS 4 breast lesions were employed as validation set 2. Two images from each lesion were tested, and a trial was categorized as correct if the judgement for at least one image (≥ 1 image) was correct. We used precision (Pr), recall rate (Rc), F1 score (F1), and the area under the receiver operating characteristic curve (AUROC) as the performance metrics of the DTL model with validation set 2. RESULTS The S5 model achieved the best fit for the data. The Pr, Rc, F1, and AUROC of S5 were 0.90, 0.90, 0.90, and 0.86, respectively, for Category 4. The proportions of lesions downgraded by S5 were 90.73%, 84.76%, and 80.19% for categories 4A, 4B, and 4C, respectively. The overall proportion of BI-RADS 4 lesions downgraded by S5 was 85.91%. There was no significant difference between the classification results of the S5 model and pathological diagnosis (P = 0.110). CONCLUSION The S5 model we proposed here can be used as an effective approach for reducing the number of unnecessary biopsies that residents need to conduct for mammographic BI-RADS 4 lesions and may have other important clinical uses.
Affiliation(s)
- Mingzhu Meng
  - Department of Radiology, The Affiliated Changzhou No 2 People's Hospital of Nanjing Medical University, Changzhou, 213164, Jiangsu Province, P. R. China
- Hong Li
  - Department of Radiology, The Second Affiliated Hospital of Soochow University, Suzhou, 215004, Jiangsu Province, P. R. China
- Ming Zhang
  - Department of Radiology, The Affiliated Changzhou No 2 People's Hospital of Nanjing Medical University, Changzhou, 213164, Jiangsu Province, P. R. China
- Guangyuan He
  - Department of Radiology, The Affiliated Changzhou No 2 People's Hospital of Nanjing Medical University, Changzhou, 213164, Jiangsu Province, P. R. China
- Long Wang
  - Department of Radiology, The Affiliated Changzhou No 2 People's Hospital of Nanjing Medical University, Changzhou, 213164, Jiangsu Province, P. R. China
- Dong Shen
  - Department of Radiology, The Affiliated Changzhou No 2 People's Hospital of Nanjing Medical University, Changzhou, 213164, Jiangsu Province, P. R. China
6
Eysenbach G, Kleib M, Norris C, O'Rourke HM, Montgomery C, Douma M. The Use and Structure of Emergency Nurses' Triage Narrative Data: Scoping Review. JMIR Nurs 2023; 6:e41331. [PMID: 36637881 PMCID: PMC9883744 DOI: 10.2196/41331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Revised: 11/24/2022] [Accepted: 11/28/2022] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Emergency departments use triage to ensure that patients with the highest level of acuity receive care quickly and safely. Triage is typically a nursing process that is documented as structured and unstructured (free text) data. Free-text triage narratives have been studied for specific conditions but never reviewed in a comprehensive manner. OBJECTIVE The objective of this paper was to identify and map the academic literature that examines triage narratives. The paper described the types of research conducted, identified gaps in the research, and determined where additional review may be warranted. METHODS We conducted a scoping review of unstructured triage narratives. We mapped the literature, described the use of triage narrative data, examined the information available on the form and structure of narratives, highlighted similarities among publications, and identified opportunities for future research. RESULTS We screened 18,074 studies published between 1990 and 2022 in CINAHL, MEDLINE, Embase, Cochrane, and ProQuest Central. We identified 0.53% (96/18,074) of studies that directly examined the use of triage nurses' narratives. More than 12 million visits were made to 2438 emergency departments included in the review. In total, 82% (79/96) of these studies were conducted in the United States (43/96, 45%), Australia (31/96, 32%), or Canada (5/96, 5%). Triage narratives were used for research and case identification, as input variables for predictive modeling, and for quality improvement. Overall, 31% (30/96) of the studies offered a description of the triage narrative, including a list of the keywords used (27/96, 28%) or more fulsome descriptions (such as word counts, character counts, abbreviation, etc; 7/96, 7%). We found limited use of reporting guidelines (8/96, 8%). CONCLUSIONS The breadth of the identified studies suggests that there is widespread routine collection and research use of triage narrative data. 
Despite the use of triage narratives as a source of data in studies, the narratives and nurses who generate them are poorly described in the literature, and data reporting is inconsistent. Additional research is needed to describe the structure of triage narratives, determine the best use of triage narratives, and improve the consistent use of triage-specific data reporting guidelines. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) RR2-10.1136/bmjopen-2021-055132.
Affiliation(s)
- Manal Kleib
  - Faculty of Nursing, University of Alberta, Edmonton, AB, Canada
- Colleen Norris
  - Faculty of Nursing, University of Alberta, Edmonton, AB, Canada
- Matthew Douma
  - School of Nursing, Midwifery and Health Systems, University College Dublin, Dublin, Ireland
7
Cheng J. Neural Network Assisted Pathology Case Identification. J Pathol Inform 2022; 13:100008. [PMID: 35242447 PMCID: PMC8860736 DOI: 10.1016/j.jpi.2022.100008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Accepted: 01/03/2022] [Indexed: 12/02/2022] Open
Abstract
Background Traditionally, cases for cohort selection and quality assurance purposes are identified through structured query language (SQL) searches matching specific keywords. Recently, several neural network-based natural language processing (NLP) pipelines have emerged as an accurate alternative or complementary method for case retrieval. Methods The diagnosis sections of 1000 pathology reports containing the terms “colon” and “carcinoma” were retrieved from our laboratory information system through a SQL query. Each report was labeled as either positive or negative; a case was considered positive if it was a primary adenocarcinoma of the colon. Negative cases comprised adenocarcinoma from other sites, metastatic adenocarcinomas, benign conditions, rectal cancers, and other cases that did not fit in the primary colonic adenocarcinoma category. The 1000 cases were randomly separated into training, validation, and holdout sets. A convolutional neural network (CNN) model built using Keras (a neural network library) was trained to identify positive cases, and the model was applied to the holdout set to predict the category for each case. Results The CNN model correctly classified 141 out of 149 primary colonic adenocarcinoma cases and 43 out of 51 negative cases, achieving an accuracy of 92% and an area under the ROC curve (AUC) of 0.957. Conclusion Trained convolutional neural network models, by themselves or as an adjunct to keyword and pattern-based text extraction methods, may be used to search for pathology cases of interest with high accuracy.
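The reported accuracy can be recovered from the holdout counts given in the Results. The sketch below does that arithmetic and also derives sensitivity and specificity, which the abstract does not report but which follow from the same counts; the function name `summarize` and its argument names are illustrative assumptions.

```python
def summarize(true_pos, pos_total, true_neg, neg_total):
    """Accuracy, sensitivity, and specificity from per-class correct counts."""
    accuracy = (true_pos + true_neg) / (pos_total + neg_total)
    sensitivity = true_pos / pos_total   # share of positive cases found
    specificity = true_neg / neg_total   # share of negative cases rejected
    return accuracy, sensitivity, specificity

# Holdout counts as stated in the abstract: 141/149 positive, 43/51 negative.
accuracy, sensitivity, specificity = summarize(141, 149, 43, 51)
print(f"{accuracy:.0%}")  # 92%
```

The accuracy of 184/200 matches the 92% in the abstract. The derived values are a sensitivity of about 0.946 and a specificity of about 0.843, so most of the errors as a proportion fall on the smaller negative class.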
Affiliation(s)
- Jerome Cheng
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
8
Empowering study of breast cancer data with application of artificial intelligence technology: promises, challenges, and use cases. Clin Exp Metastasis 2022; 39:249-254. [PMID: 34697751 PMCID: PMC8967766 DOI: 10.1007/s10585-021-10125-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/14/2021] [Accepted: 09/25/2021] [Indexed: 12/15/2022]
Abstract
In healthcare, artificial intelligence (AI) technologies have the potential to create significant value by improving time-sensitive outcomes while lowering error rates for each patient. Diagnostic images, clinical notes, and reports are increasingly generated and stored in electronic medical records. These heterogeneous data are by nature highly complex and present challenges in data analytics and reusability, thereby necessitating novel ways to store, manage, process, and reuse big data. This presents an urgent need to develop new, scalable, and expandable AI infrastructure and analytical methods that can enable healthcare providers to access knowledge for individual patients, yielding better decisions and outcomes. In this review article, we briefly discuss the nature of data in breast cancer study and the role of AI in generating "smart data" that offer actionable information supporting better decisions in personalized medicine for individual patients. In our view, the biggest challenge is to create a system that makes data robust and smart for healthcare providers and patients, leading to more effective clinical decision-making, improved health outcomes, and ultimately, better management of healthcare outcomes and costs. We highlight some of the challenges in using breast cancer data and propose the need for an AI-driven environment to address them. We illustrate our vision with practical use cases and discuss a path for empowering the study of breast cancer databases with the application of AI and future directions.
9
Senders JT, Cho LD, Calvachi P, McNulty JJ, Ashby JL, Schulte IS, Almekkawi AK, Mehrtash A, Gormley WB, Smith TR, Broekman MLD, Arnaout O. Automating Clinical Chart Review: An Open-Source Natural Language Processing Pipeline Developed on Free-Text Radiology Reports From Patients With Glioblastoma. JCO Clin Cancer Inform 2021; 4:25-34. [PMID: 31977252 DOI: 10.1200/cci.19.00060] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/05/2023] Open
Abstract
PURPOSE The aim of this study was to develop an open-source natural language processing (NLP) pipeline for text mining of medical information from clinical reports. We also aimed to provide insight into why certain variables or reports are more suitable for clinical text mining than others. MATERIALS AND METHODS Various NLP models were developed to extract 15 radiologic characteristics from free-text radiology reports for patients with glioblastoma. Ten-fold cross-validation was used to optimize the hyperparameter settings and estimate model performance. We examined how model performance was associated with quantitative attributes of the radiologic characteristics and reports. RESULTS In total, 562 unique brain magnetic resonance imaging reports were retrieved. NLP extracted 15 radiologic characteristics with high to excellent discrimination (area under the curve, 0.82 to 0.98) and accuracy (78.6% to 96.6%). Model performance was correlated with the inter-rater agreement of the manually provided labels (ρ = 0.904; P < .001) but not with the frequency distribution of the variables of interest (ρ = 0.179; P = .52). All variables labeled with a near perfect inter-rater agreement were classified with excellent performance (area under the curve > 0.95). Excellent performance could be achieved for variables with only 50 to 100 observations in the minority group and class imbalances up to a 9:1 ratio. Report-level classification accuracy was not associated with the number of words or the vocabulary size in the distinct text documents. CONCLUSION This study provides an open-source NLP pipeline that allows for text mining of narratively written clinical reports. Small sample sizes and class imbalance should not be considered as absolute contraindications for text mining in clinical research. 
However, future studies should report measures of inter-rater agreement whenever ground truth is based on a consensus label and use this measure to identify clinical variables eligible for text mining.
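The ten-fold cross-validation mentioned in the MATERIALS AND METHODS partitions the reports into ten folds and holds each fold out once for evaluation. The sketch below shows only the index-splitting step for 562 reports using the standard library; the function name `k_fold_indices`, the seed, and the absence of stratification are assumptions, and the abstract does not describe how the authors built their folds.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)         # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]    # k near-equal, disjoint folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 562 is the number of reports stated in the abstract.
splits = list(k_fold_indices(562))
fold_sizes = sorted(len(test) for _, test in splits)
```

Each report lands in exactly one test fold, so every report is scored once by a model that never saw it during training. With class imbalance of the kind the abstract discusses, a stratified split that preserves label proportions per fold is a common alternative.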
Affiliation(s)
- Joeky T Senders
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
  - Department of Neurosurgery, Leiden University Medical Center, Leiden, the Netherlands
- Logan D Cho
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
  - Department of Neuroscience, Brown University, Providence, RI
- Paola Calvachi
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- John J McNulty
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
- Joanna L Ashby
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Isabelle S Schulte
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Ahmad Kareem Almekkawi
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Alireza Mehrtash
  - Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- William B Gormley
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Timothy R Smith
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Marike L D Broekman
  - Department of Neurosurgery, Leiden University Medical Center, Leiden, the Netherlands
  - Department of Neurosurgery, Haaglanden Medical Center, The Hague, the Netherlands
- Omar Arnaout
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
10
Manco L, Maffei N, Strolin S, Vichi S, Bottazzi L, Strigari L. Basic of machine learning and deep learning in imaging for medical physicists. Phys Med 2021; 83:194-205. [DOI: 10.1016/j.ejmp.2021.03.026] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 12/02/2020] [Revised: 03/07/2021] [Accepted: 03/16/2021] [Indexed: 02/08/2023] Open
11
Abstract
Machine learning (ML) has been slowly entering every aspect of our lives and its positive impact has been astonishing. To accelerate embedding ML in more applications and incorporating it in real-world scenarios, automated machine learning (AutoML) is emerging. The main purpose of AutoML is to provide seamless integration of ML in various industries, which will facilitate better outcomes in everyday tasks. In healthcare, AutoML has been already applied to easier settings with structured data such as tabular lab data. However, there is still a need for applying AutoML for interpreting medical text, which is being generated at a tremendous rate. For this to happen, a promising method is AutoML for clinical notes analysis, which is an unexplored research area representing a gap in ML research. The main objective of this paper is to fill this gap and provide a comprehensive survey and analytical study towards AutoML for clinical notes. To that end, we first introduce the AutoML technology and review its various tools and techniques. We then survey the literature of AutoML in the healthcare industry and discuss the developments specific to clinical settings, as well as those using general AutoML tools for healthcare applications. With this background, we then discuss challenges of working with clinical notes and highlight the benefits of developing AutoML for medical notes processing. Next, we survey relevant ML research for clinical notes and analyze the literature and the field of AutoML in the healthcare industry. Furthermore, we propose future research directions and shed light on the challenges and opportunities this emerging field holds. With this, we aim to assist the community with the implementation of an AutoML platform for medical notes, which if realized can revolutionize patient outcomes.
|
12
|
Issa NT, Stathias V, Schürer S, Dakshanamurthy S. Machine and deep learning approaches for cancer drug repurposing. Semin Cancer Biol 2021; 68:132-142. [PMID: 31904426 PMCID: PMC7723306 DOI: 10.1016/j.semcancer.2019.12.011] [Citation(s) in RCA: 103] [Impact Index Per Article: 34.3] [Received: 07/18/2019] [Revised: 10/31/2019] [Accepted: 12/15/2019] [Indexed: 02/07/2023]
Abstract
Knowledge of the underpinnings of cancer initiation, progression and metastasis has increased exponentially in recent years. Advanced "omics" coupled with machine learning and artificial intelligence (deep learning) methods have helped elucidate targets and pathways critical to those processes that may be amenable to pharmacologic modulation. However, the current anti-cancer therapeutic armamentarium continues to lag behind. As the cost of developing a new drug remains prohibitively expensive, repurposing of existing approved and investigational drugs is sought after given known safety profiles and reduction in the cost barrier. Notably, successes in oncologic drug repurposing have been infrequent. Computational in-silico strategies have been developed to aid in modeling biological processes to find new disease-relevant targets and discovering novel drug-target and drug-phenotype associations. Machine and deep learning methods have especially enabled leaps in those successes. This review will discuss these methods as they pertain to cancer biology as well as immunomodulation for drug repurposing opportunities in oncologic diseases.
Affiliation(s)
- Naiem T Issa
- Dr. Phillip Frost Department of Dermatology and Cutaneous Surgery, University of Miami School of Medicine, Miami, FL, USA
- Vasileios Stathias
- Department of Molecular and Cellular Pharmacology, University of Miami School of Medicine, Miami, FL, USA
- Stephan Schürer
- Department of Molecular and Cellular Pharmacology, University of Miami School of Medicine, Miami, FL, USA
- Sivanesan Dakshanamurthy
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA
|
13
|
Bozkurt S, Paul R, Coquet J, Sun R, Banerjee I, Brooks JD, Hernandez-Boussard T. Phenotyping severity of patient-centered outcomes using clinical notes: A prostate cancer use case. Learn Health Syst 2020; 4:e10237. [PMID: 33083539 PMCID: PMC7556418 DOI: 10.1002/lrh2.10237] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/15/2020] [Revised: 06/15/2020] [Accepted: 06/23/2020] [Indexed: 01/12/2023] Open
Abstract
Introduction: A learning health system (LHS) must improve care in ways that are meaningful to patients, integrating patient‐centered outcomes (PCOs) into core infrastructure. PCOs are common following cancer treatment, such as urinary incontinence (UI) following prostatectomy. However, PCOs are not systematically recorded because they can only be described by the patient, are subjective, and are captured as unstructured text in the electronic health record (EHR). Therefore, PCOs pose significant challenges for phenotyping patients. Here, we present a natural language processing (NLP) approach for phenotyping patients with UI to classify their disease into severity subtypes, which can increase opportunities to provide precision‐based therapy and promote a value‐based delivery system. Methods: Patients undergoing prostate cancer treatment from 2008 to 2018 were identified at an academic medical center. Using a hybrid NLP pipeline that combines rule‐based and deep learning methodologies, we classified positive UI cases as mild, moderate, and severe by mining clinical notes. Results: The rule‐based model accurately classified UI into disease severity categories (accuracy: 0.86), which outperformed the deep learning model (accuracy: 0.73). In the deep learning model, the recall rates for the mild and moderate groups were higher than the precision rate (0.78 and 0.79, respectively). A hybrid model that combined both methods did not improve the accuracy of the rule‐based model but did outperform the deep learning model (accuracy: 0.75). Conclusion: Phenotyping patients based on indication and severity of PCOs is essential to advance a patient‐centered LHS. EHRs contain valuable information on PCOs, and by using NLP methods it is feasible to accurately and efficiently phenotype PCO severity. Phenotyping must extend beyond the identification of disease to provide classification of disease severity that can be used to guide treatment and inform shared decision‐making. Our methods demonstrate a path to a patient‐centered LHS that could advance precision medicine.
Affiliation(s)
- Selen Bozkurt
- Department of Medicine, Biomedical Informatics Research, Stanford University, Stanford, California, USA
- Rohan Paul
- Department of Biomedical Data Sciences, Stanford University, Stanford, California, USA
- Jean Coquet
- Department of Medicine, Biomedical Informatics Research, Stanford University, Stanford, California, USA
- Ran Sun
- Department of Medicine, Biomedical Informatics Research, Stanford University, Stanford, California, USA
- Imon Banerjee
- Department of Biomedical Data Sciences, Stanford University, Stanford, California, USA; Department of Radiology, Stanford University, Stanford, California, USA
- James D Brooks
- Department of Urology, Stanford University, Stanford, California, USA
- Tina Hernandez-Boussard
- Department of Medicine, Biomedical Informatics Research, Stanford University, Stanford, California, USA; Department of Biomedical Data Sciences, Stanford University, Stanford, California, USA; Department of Surgery, Stanford University, Stanford, California, USA
|
14
|
Samala RK, Chan HP, Hadjiiski LM, Helvie MA, Richter CD. Generalization error analysis for deep convolutional neural network with transfer learning in breast cancer diagnosis. Phys Med Biol 2020; 65:105002. [PMID: 32208369 DOI: 10.1088/1361-6560/ab82e8] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/19/2022]
Abstract
Deep convolutional neural network (DCNN), now popularly called artificial intelligence (AI), has shown the potential to improve over previous computer-assisted tools in medical imaging developed in the past decades. A DCNN has millions of free parameters that need to be trained, but the training sample set is limited in size for most medical imaging tasks so that transfer learning is typically used. Automatic data mining may be an efficient way to enlarge the collected data set but the data can be noisy such as incorrect labels or even a wrong type of image. In this work we studied the generalization error of DCNN with transfer learning in medical imaging for the task of classifying malignant and benign masses on mammograms. With a finite available data set, we simulated a training set containing corrupted data or noisy labels. The balance between learning and memorization of the DCNN was manipulated by varying the proportion of corrupted data in the training set. The generalization error of DCNN was analyzed by the area under the receiver operating characteristic curve for the training and test sets and the weight changes after transfer learning. The study demonstrates that the transfer learning strategy of DCNN for such tasks needs to be designed properly, taking into consideration the constraints of the available training set having limited size and quality for the classification task at hand, to minimize memorization and improve generalizability.
Affiliation(s)
- Ravi K Samala
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109-5842, United States of America
|