1. Lam BD, Chrysafi P, Chiasakul T, Khosla H, Karagkouni D, McNichol M, Adamski A, Reyes N, Abe K, Mantha S, Vlachos IS, Zwicker JI, Patell R. Machine learning natural language processing for identifying venous thromboembolism: systematic review and meta-analysis. Blood Adv 2024;8:2991-3000. [PMID: 38522096; PMCID: PMC11215191; DOI: 10.1182/bloodadvances.2023012200]
Abstract
Venous thromboembolism (VTE) is a leading cause of preventable in-hospital mortality. Monitoring VTE cases is limited by the challenges of manual medical record review and diagnosis code interpretation. Natural language processing (NLP) can automate the process. Rule-based NLP methods are effective but time-consuming. Machine learning (ML)-NLP methods present a promising solution. We conducted a systematic review and meta-analysis of studies published before May 2023 that use ML-NLP to identify VTE diagnoses in electronic health records. Four reviewers screened all manuscripts, excluding studies that only used a rule-based method. A meta-analysis pooled the performance of each study's best-performing model for identifying pulmonary embolism and/or deep vein thrombosis. Pooled sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) with confidence intervals (CI) were calculated by the DerSimonian and Laird method using a random-effects model. Study quality was assessed using an adapted TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) tool. Thirteen studies were included in the systematic review and 8 had data available for meta-analysis. Pooled sensitivity was 0.931 (95% CI, 0.881-0.962), specificity 0.984 (95% CI, 0.967-0.992), PPV 0.910 (95% CI, 0.865-0.941), and NPV 0.985 (95% CI, 0.977-0.990). All studies met at least 13 of the 21 NLP-modified TRIPOD items, demonstrating fair quality. The highest-performing models used vectorization rather than bag-of-words and deep-learning techniques such as convolutional neural networks. There was significant heterogeneity in the studies, and only 4 validated their model on an external data set. Further standardization of ML studies can help progress this novel technology toward real-world implementation.
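The abstract pools estimates with the DerSimonian and Laird random-effects method. As a rough illustration of that method (not the authors' code; in practice pooling is done on a transformed scale, e.g. logit-transformed sensitivities, and the inputs below are made up):

```python
def dersimonian_laird(estimates, variances):
    """DerSimonian-Laird random-effects pooling of per-study estimates."""
    k = len(estimates)
    # Fixed-effect (inverse-variance) weights and pooled mean
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q quantifies between-study heterogeneity
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    # Method-of-moments estimate of between-study variance tau^2
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights shrink each study's influence by tau^2
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, tau2
```

With equal study variances the pooled estimate reduces to the simple mean; with unequal variances, weight shifts toward the more precise studies.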
Affiliation(s)
- Barbara D. Lam: Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA; Division of Clinical Informatics, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Pavlina Chrysafi: Department of Medicine, Mount Auburn Hospital, Harvard Medical School, Boston, MA
- Thita Chiasakul: Center of Excellence in Translational Hematology, Division of Hematology, Department of Medicine, Faculty of Medicine, Chulalongkorn University and King Chulalongkorn Memorial Hospital, Thai Red Cross Society, Bangkok, Thailand
- Harshit Khosla: Department of Medicine, Saint Vincent Hospital, Worcester, MA
- Dimitra Karagkouni: Department of Pathology, Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Megan McNichol: Library Sciences, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Alys Adamski: Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA
- Nimia Reyes: Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA
- Karon Abe: Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA
- Simon Mantha: Division of Hematology, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
- Ioannis S. Vlachos: Department of Pathology, Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Jeffrey I. Zwicker: Division of Hematology, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
- Rushad Patell: Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
2. dos Santos DP, Kotter E, Mildenberger P, Martí-Bonmatí L. ESR paper on structured reporting in radiology-update 2023. Insights Imaging 2023;14:199. [PMID: 37995019; PMCID: PMC10667169; DOI: 10.1186/s13244-023-01560-0]
Abstract
Structured reporting in radiology continues to hold substantial potential to improve the quality of service provided to patients and referring physicians. Despite many physicians' preference for structured reports and various efforts by radiological societies and some vendors, structured reporting has still not been widely adopted in clinical routine. While national radiological societies in many countries have launched initiatives to further promote structured reporting, cross-institutional applications of report templates and incentives for the use of structured reporting are lacking. Various legislative measures have been taken in the USA and the European Union to promote interoperable data formats such as Fast Healthcare Interoperability Resources (FHIR) in the context of the EU Health Data Space (EHDS), which will certainly be relevant for the future of structured reporting. Lastly, recent advances in artificial intelligence and large language models may provide innovative and efficient approaches to integrate structured reporting more seamlessly into the radiologists' workflow. The ESR will remain committed to advancing structured reporting as a key component of more value-based radiology. Practical solutions for structured reporting need to be provided by vendors, and policy makers should incentivize the use of structured radiological reporting, especially in cross-institutional settings.
Critical relevance statement: Over the past years, the benefits of structured reporting in radiology have been widely discussed and agreed upon; however, implementation in clinical routine is lacking. Policy makers should incentivize the use of structured radiological reporting, especially in cross-institutional settings.
Key points:
1. Various national societies have established initiatives for structured reporting in radiology.
2. Almost no monetary or structural incentives exist that favor structured reporting.
3. A consensus on technical standards for structured reporting is still missing.
4. The application of large language models may help structure radiological reports.
5. Policy makers should incentivize the use of structured radiological reporting.
3. Yang E, Li MD, Raghavan S, Deng F, Lang M, Succi MD, Huang AJ, Kalpathy-Cramer J. Transformer versus traditional natural language processing: how much data is enough for automated radiology report classification? Br J Radiol 2023;96:20220769. [PMID: 37162253; PMCID: PMC10461267; DOI: 10.1259/bjr.20220769]
Abstract
OBJECTIVES Current state-of-the-art natural language processing (NLP) techniques use transformer deep-learning architectures, which depend on large training datasets. We hypothesized that traditional NLP techniques may outperform transformers for smaller radiology report datasets. METHODS We compared the performance of BioBERT, a deep-learning-based transformer model pre-trained on biomedical text, and three traditional machine-learning models (gradient boosted tree, random forest, and logistic regression) on seven classification tasks given free-text radiology reports. Tasks included detection of appendicitis, diverticulitis, bowel obstruction, and enteritis/colitis on abdomen/pelvis CT reports, ischemic infarct on brain CT/MRI reports, and medial and lateral meniscus tears on knee MRI reports (7,204 total annotated reports). The performance of NLP models on held-out test sets was compared after training using the full training set, and 2.5%, 10%, 25%, 50%, and 75% random subsets of the training data. RESULTS In all tested classification tasks, BioBERT performed poorly at smaller training sample sizes compared to non-deep-learning NLP models. Specifically, BioBERT required training on approximately 1,000 reports to perform similarly or better than non-deep-learning models. At around 1,250 to 1,500 training samples, the testing performance for all models began to plateau, where additional training data yielded minimal performance gain. CONCLUSIONS With larger sample sizes, transformer NLP models achieved superior performance in radiology report binary classification tasks. However, with smaller sizes (<1000) and more imbalanced training data, traditional NLP techniques performed better. ADVANCES IN KNOWLEDGE Our benchmarks can help guide clinical NLP researchers in selecting machine-learning models according to their dataset characteristics.
Affiliation(s)
- Matthew D Li: Department of Radiology and Diagnostic Imaging, University of Alberta, Edmonton, Alberta, Canada
- Shruti Raghavan: Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Francis Deng: Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Min Lang: Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Marc D Succi: Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Ambrose J Huang: Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
4. Diab KM, Deng J, Wu Y, Yesha Y, Collado-Mesa F, Nguyen P. Natural Language Processing for Breast Imaging: A Systematic Review. Diagnostics (Basel) 2023;13:1420. [PMID: 37189521; DOI: 10.3390/diagnostics13081420]
Abstract
Natural Language Processing (NLP) has gained prominence in diagnostic radiology, offering a promising tool for improving breast imaging triage, diagnosis, lesion characterization, and treatment management in breast cancer and other breast diseases. This review provides a comprehensive overview of recent advances in NLP for breast imaging, covering the main techniques and applications in this field. Specifically, we discuss various NLP methods used to extract relevant information from clinical notes, radiology reports, and pathology reports, and their potential impact on the accuracy and efficiency of breast imaging. In addition, we review the state of the art in NLP-based decision support systems for breast imaging, highlighting the challenges and opportunities of NLP applications for breast imaging in the future. Overall, this review underscores the potential of NLP in enhancing breast imaging care and offers insights for clinicians and researchers interested in this exciting and rapidly evolving field.
Affiliation(s)
- Kareem Mahmoud Diab: Institute for Data Science and Computing, University of Miami, Miami, FL 33146, USA
- Jamie Deng: Department of Computer Science, University of Miami, Miami, FL 33146, USA
- Yusen Wu: Institute for Data Science and Computing, University of Miami, Miami, FL 33146, USA
- Yelena Yesha: Institute for Data Science and Computing; Department of Computer Science; Department of Radiology, Miller School of Medicine, University of Miami, Miami, FL 33146, USA
- Fernando Collado-Mesa: Department of Radiology, Miller School of Medicine, University of Miami, Miami, FL 33146, USA
- Phuong Nguyen: Institute for Data Science and Computing; Department of Computer Science, University of Miami, Miami, FL 33146, USA; OpenKnect Inc., Halethorpe, MD 21227, USA
5. Jujjavarapu C, Suri P, Pejaver V, Friedly J, Gold LS, Meier E, Cohen T, Mooney SD, Heagerty PJ, Jarvik JG. Predicting decompression surgery by applying multimodal deep learning to patients' structured and unstructured health data. BMC Med Inform Decis Mak 2023;23:2. [PMID: 36609379; PMCID: PMC9824905; DOI: 10.1186/s12911-022-02096-x]
Abstract
BACKGROUND Low back pain (LBP) is a common condition comprising a variety of anatomic and clinical subtypes. Lumbar disc herniation (LDH) and lumbar spinal stenosis (LSS) are two subtypes highly associated with LBP. Patients with LDH/LSS often start with non-surgical treatments and, if those are not effective, go on to have decompression surgery. However, recommending surgery is complicated because the outcome may depend on the patient's health characteristics. We developed a deep learning (DL) model to predict decompression surgery for patients with LDH/LSS. MATERIALS AND METHODS We used datasets of 8387 and 8620 patients from a prospective study that collected data from four healthcare systems to predict early surgery (within 2 months) and late surgery (within 12 months after a 2-month gap), respectively. We developed a DL model that uses patients' demographics, diagnosis and procedure codes, drug names, and diagnostic imaging reports to predict surgery. For each prediction task, we evaluated the model's performance using classical and generalizability evaluation. For classical evaluation, we split the data into training (80%) and testing (20%) sets. For generalizability evaluation, we split the data by healthcare system. We used the area under the curve (AUC) to assess performance for each evaluation and compared results to a benchmark model (LASSO logistic regression). RESULTS For classical performance, the DL model outperformed the benchmark model for early surgery with an AUC of 0.725 compared to 0.597. For late surgery, the DL model outperformed the benchmark model with an AUC of 0.655 compared to 0.635. For generalizability performance, the DL model outperformed the benchmark model for early surgery; for late surgery, the benchmark model outperformed the DL model. CONCLUSIONS For early surgery, the DL model was preferred in both classical and generalizability evaluation. However, for late surgery, the benchmark and DL models had comparable performance. Depending on the prediction task, the balance of performance may shift between DL and a conventional ML method. As a result, thorough assessment is needed to quantify the value of DL, a relatively computationally expensive, time-consuming, and less interpretable method.
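Both evaluations in this study summarize discrimination with the area under the curve (AUC). A minimal stdlib sketch of AUC computed from its rank interpretation (illustrative only; the labels and scores below are made up, not study data):

```python
def auc(labels, scores):
    """AUC = probability a random positive case is scored above a random negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Each concordant positive/negative pair counts 1; ties count 1/2.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking, 1.0 to perfect separation of surgical from non-surgical cases.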
Affiliation(s)
- Chethan Jujjavarapu: Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Box 358047, Seattle, WA 98195, USA
- Pradeep Suri: Clinical Learning, Evidence and Research Center, University of Washington, 4333 Brooklyn Ave NE, Seattle, WA 98105, USA; Department of Rehabilitation Medicine, University of Washington, 1959 NE Pacific St, Seattle, WA 98195, USA
- Vikas Pejaver: Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Janna Friedly: Clinical Learning, Evidence and Research Center, University of Washington, 4333 Brooklyn Ave NE, Seattle, WA 98105, USA; Department of Rehabilitation Medicine, University of Washington, 1959 NE Pacific St, Seattle, WA 98195, USA
- Laura S Gold: Clinical Learning, Evidence and Research Center, University of Washington, 4333 Brooklyn Ave NE, Seattle, WA 98105, USA; Department of Radiology, University of Washington, 1959 NE Pacific Street, Seattle, WA 98195, USA
- Eric Meier: Clinical Learning, Evidence and Research Center, University of Washington, 4333 Brooklyn Ave NE, Seattle, WA 98105, USA; Department of Biostatistics, University of Washington, Box 357232, Seattle, WA 98195-7232, USA; Center for Biomedical Statistics, University of Washington, Seattle, WA, USA
- Trevor Cohen: Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Box 358047, Seattle, WA 98195, USA
- Sean D Mooney: Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Box 358047, Seattle, WA 98195, USA
- Patrick J Heagerty: Department of Biostatistics, University of Washington, Box 357232, Seattle, WA 98195-7232, USA; Center for Biomedical Statistics, University of Washington, Seattle, WA, USA
- Jeffrey G Jarvik: Clinical Learning, Evidence and Research Center, University of Washington, 4333 Brooklyn Ave NE, Seattle, WA 98105, USA; Department of Radiology, University of Washington, 1959 NE Pacific Street, Seattle, WA 98195, USA; Department of Neurological Surgery, University of Washington, 1959 NE Pacific Street, Seattle, WA 98195, USA; Department of Health Services, University of Washington, Box 357660, Seattle, WA 98195-7660, USA
6. Natural Language Processing in Radiology: Update on Clinical Applications. J Am Coll Radiol 2022;19:1271-1285. [PMID: 36029890; DOI: 10.1016/j.jacr.2022.06.016]
Abstract
Radiological reports are a valuable source of information used to guide clinical care and support research. Organizing and managing this content, however, frequently requires manual curation because most reports are unstructured, and manual review of reports for clinical knowledge extraction is costly and time-consuming. Natural language processing (NLP) is a set of methods developed to extract structured meaning from a body of text and can be used to optimize the workflow of health care professionals. Specifically, NLP methods can help radiologists through decision support systems and improved management of patients' medical data. In this study, we highlight the opportunities offered by NLP in the field of radiology. We present a comprehensive review of the NLP methods most commonly used to extract information from radiological reports and of tools developed to improve radiological workflow using this information. Finally, we review the important limitations of these tools and discuss relevant observations and trends in the application of NLP to radiology that could benefit the field in the future.
7. Linna N, Kahn CE. Applications of Natural Language Processing in Radiology: A Systematic Review. Int J Med Inform 2022;163:104779. [DOI: 10.1016/j.ijmedinf.2022.104779]
8. Cheng J. Neural Network Assisted Pathology Case Identification. J Pathol Inform 2022;13:100008. [PMID: 35242447; PMCID: PMC8860736; DOI: 10.1016/j.jpi.2022.100008]
Abstract
Background Traditionally, cases for cohort selection and quality assurance purposes are identified through structured query language (SQL) searches matching specific keywords. Recently, several neural network-based natural language processing (NLP) pipelines have emerged as an accurate alternative or complementary method for case retrieval. Methods The diagnosis sections of 1000 pathology reports containing the terms "colon" and "carcinoma" were retrieved from our laboratory information system through a SQL query. Each report was labeled as either positive or negative, where a case is considered positive if it was a primary adenocarcinoma of the colon. Negative cases comprised adenocarcinomas from other sites, metastatic adenocarcinomas, benign conditions, rectal cancers, and other cases that do not fit the primary colonic adenocarcinoma category. The 1000 cases were randomly separated into training, validation, and holdout sets. A convolutional neural network (CNN) model built using Keras (a neural network library) was trained to identify positive cases, and the model was applied to the holdout set to predict the category of each case. Results The CNN model correctly classified 141 of 149 primary colonic adenocarcinoma cases and 43 of 51 negative cases, achieving an accuracy of 92% and an area under the ROC curve (AUC) of 0.957. Conclusion Trained convolutional neural network models, by themselves or as an adjunct to keyword- and pattern-based text extraction methods, may be used to search for pathology cases of interest with high accuracy.
Affiliation(s)
- Jerome Cheng: Department of Pathology, University of Michigan, Ann Arbor, MI, USA
9. Jujjavarapu C, Pejaver V, Cohen TA, Mooney SD, Heagerty PJ, Jarvik JG. A Comparison of Natural Language Processing Methods for the Classification of Lumbar Spine Imaging Findings Related to Lower Back Pain. Acad Radiol 2022;29 Suppl 3:S188-S200. [PMID: 34862122; PMCID: PMC8917985; DOI: 10.1016/j.acra.2021.09.005]
Abstract
RATIONALE AND OBJECTIVES The use of natural language processing (NLP) in radiology provides an opportunity to assist clinicians with phenotyping patients. However, the performance and generalizability of NLP across healthcare systems are uncertain. We assessed the performance within, and generalizability across, four healthcare systems of different NLP representational methods, coupled with elastic-net logistic regression, to classify lower back pain-related findings from lumbar spine imaging reports. MATERIALS AND METHODS We used a dataset of 871 X-ray and magnetic resonance imaging reports sampled from a prospective study across four healthcare systems between October 2013 and September 2016. We annotated each report for 26 findings potentially related to lower back pain. Our framework applied four different NLP methods to convert text into feature sets (representations). For each representation, the framework used an elastic-net logistic regression model for each finding (i.e., 26 binary or "one-vs.-rest" classification models). For performance evaluation, we split data into training (80%, 697/871) and testing (20%, 174/871) sets. In the training set, we used cross-validation to identify the optimal hyperparameter value and then retrained on the full training set. We then assessed performance based on area under the curve (AUC) for the test set. We repeated this process 25 times, each repeat using a different random train/test split, so that we could estimate 95% confidence intervals and assess significant differences in performance between representations. For generalizability evaluation, we trained models on data from three healthcare systems with cross-validation, tested on the fourth, repeated this process for each system, and calculated the mean and standard deviation (SD) of AUC across the systems. RESULTS Among individual representations, n-grams had the best average performance across all 26 findings (AUC: 0.960). For generalizability, document embeddings had the most consistent average performance across systems (SD: 0.010). Of these 26 findings, we considered eight potentially clinically important (any stenosis, central stenosis, lateral stenosis, foraminal stenosis, disc extrusion, nerve root displacement compression, endplate edema, and listhesis grade 2), since they have a relatively greater association with a history of lower back pain than the remaining 18 classes. We found a similar pattern for these eight, in which n-grams and document embeddings had the best average performance (AUC: 0.954) and generalizability (SD: 0.007), respectively. CONCLUSION Based on performance assessment, the n-gram representation is preferred if classifier development and deployment occur in the same system. However, for deployment at multiple systems outside of the development system, or potentially if physician behavior changes within a system, one should consider document embeddings, since embeddings appear to have the most consistent performance across systems.
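The n-gram representation that performed best within-system can be sketched in a few lines. This is an illustrative bag-of-n-grams counter (not the study's implementation); each report's counts would form one row of the feature matrix passed to the elastic-net models:

```python
from collections import Counter

def ngram_features(text, n_values=(1, 2)):
    """Bag of word n-gram counts for one report (unigrams and bigrams by default)."""
    tokens = text.lower().split()
    feats = Counter()
    for n in n_values:
        # Slide a window of length n across the token sequence
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

ngram_features("mild foraminal stenosis")["foraminal stenosis"]  # 1
```

Because the vocabulary of n-grams is tied to the wording of the reports it was fit on, this representation tends to be system-specific, which is consistent with the generalizability findings above.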
Affiliation(s)
- Chethan Jujjavarapu: Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, Washington
- Vikas Pejaver: Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, Washington
- Trevor A. Cohen: Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, Washington
- Sean D. Mooney: Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, Washington
- Patrick J. Heagerty: Department of Biostatistics, University of Washington, Seattle, Washington; Center for Biomedical Statistics, University of Washington, Seattle, Washington
- Jeffrey G. Jarvik: Department of Radiology, University of Washington, 1959 NE Pacific Street, Seattle, WA 98195; Department of Neurological Surgery, University of Washington, Seattle, Washington; Department of Health Services, University of Washington, Seattle, Washington; Clinical Learning, Evidence And Research Center, University of Washington, Seattle, Washington
11. Ryan L, Maharjan J, Mataraso S, Barnes G, Hoffman J, Mao Q, Calvert J, Das R. Predicting pulmonary embolism among hospitalized patients with machine learning algorithms. Pulm Circ 2022;12:e12013. [PMID: 35506114; PMCID: PMC9052977; DOI: 10.1002/pul2.12013]
12. Natural language processing of head CT reports to identify intracranial mass effect: CTIME algorithm. Am J Emerg Med 2021;51:388-392. [PMID: 34839182; DOI: 10.1016/j.ajem.2021.11.001]
Abstract
BACKGROUND The Mortality Probability Model (MPM) is used in research and quality improvement to adjust for severity of illness and can also inform triage decisions. However, a limitation for its automated use is that it includes the variable "intracranial mass effect" (IME), which requires human engagement with the electronic health record (EHR). We developed and tested a natural language processing (NLP) algorithm to identify IME from head CT reports. METHODS We obtained initial head CT reports from adult patients who were admitted to the ICU from our ED between 10/2013 and 9/2016. Each head CT report was labeled yes/no for IME by at least two of five independent labelers. The reports were then randomly divided 80/20 into training and test sets. All reports were preprocessed to remove linguistic and style variability, and a dictionary was created to map similar common terms. We tested three vectorization strategies, Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, and the Universal Sentence Encoder, to convert the report text to a numerical vector. This vector served as the input to a classification-tree-based ensemble machine learning algorithm (XGBoost). After training, model performance was assessed in the test set using the area under the receiver operating characteristic curve (AUROC). We also divided the continuous range of scores into positive/inconclusive/negative categories for IME. RESULTS Of the 1202 CT reports in the training set, 308 (25.6%) were manually labeled "yes" for IME. Of the 355 reports in the test set, 108 (30.4%) were labeled "yes" for IME. The TF-IDF vectorization strategy as input to the XGBoost model had the best AUROC: 0.9625 (95% CI 0.9443-0.9807). Score categories for the TF-IDF model were defined with the following likelihood ratios: "positive" (score > 0.5), LR = 24.59; "inconclusive" (score 0.05-0.5), LR = 0.99; and "negative" (score < 0.05), LR = 0.05. 82% of reports were classified as either "positive" or "negative". In the test set, only 4 of 199 (2.0%) reports with a "negative" classification were false negatives, and only 8 of 93 (8.6%) reports classified as "positive" were false positives. CONCLUSION NLP can accurately identify IME from free-text head CT reports in approximately 80% of records, adequate to allow automatic calculation of MPM from EHR data for many applications.
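Of the three vectorization strategies tested, TF-IDF performed best. A minimal stdlib sketch of smoothed TF-IDF weighting (illustrative only; the study's preprocessing, dictionary mapping, and XGBoost classifier are not reproduced, and the example texts below are made up):

```python
import math
from collections import Counter

def tfidf(docs):
    """Smoothed TF-IDF weight vectors for a small corpus of report texts."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: how many reports contain each term
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Term frequency scaled by smoothed inverse document frequency
        vectors.append({
            t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors
```

Terms shared by every report (e.g. boilerplate phrasing) are down-weighted relative to terms that appear in only a few reports, which is what makes the resulting vectors useful as classifier input.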
13. Bizzo BC, Almeida RR, Alkasab TK. Artificial Intelligence Enabling Radiology Reporting. Radiol Clin North Am 2021;59:1045-1052. [PMID: 34689872; DOI: 10.1016/j.rcl.2021.07.004]
Abstract
The radiology reporting process is beginning to incorporate structured, semantically labeled data. Tools based on artificial intelligence technologies in a structured reporting context can assist with internal report consistency and longitudinal tracking. To-do lists of relevant issues could be assembled by artificial intelligence tools, incorporating components of the patient's history. Radiologists will review and select artificial intelligence-generated and other data to be transmitted to the electronic health record and generate feedback for ongoing improvement of artificial intelligence tools. These technologies should make reports more valuable by making them more accessible and better able to integrate into care pathways.
Affiliation(s)
- Bernardo C Bizzo
- Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Founders 210, Boston, MA 02114, USA
- Renata R Almeida
- Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, 75 Francis St, Boston, MA 02115, USA
- Tarik K Alkasab
- Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Founders 210, Boston, MA 02114, USA
14
Steinkamp J, Cook TS. Basic Artificial Intelligence Techniques: Natural Language Processing of Radiology Reports. Radiol Clin North Am 2021; 59:919-931. [PMID: 34689877 DOI: 10.1016/j.rcl.2021.06.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Abstract
Natural language processing (NLP) is a subfield of computer science and linguistics that can be applied to extract meaningful information from radiology reports. Symbolic NLP is rule based and well suited to problems that can be explicitly defined by a set of rules. Statistical NLP is better suited to problems that cannot be well defined and requires annotated or labeled examples from which machine learning algorithms can infer the rules. Both symbolic and statistical NLP have found success in a variety of radiology use cases. More recently, deep learning approaches, including transformers, have gained traction and demonstrated good performance.
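A toy illustration of the symbolic approach described above: an explicit rule set (here, two regular expressions) that asserts a finding unless it falls inside a negated scope. The trigger phrases and reports are invented for illustration, not taken from any cited system.

```python
import re

# Finding pattern: "pulmonary embolism/emboli" or the abbreviation "PE".
FINDING = re.compile(r"\b(pulmonary embol\w*|\bPE\b)", re.IGNORECASE)
# Negation pattern: a negation trigger followed, within the sentence, by the finding.
NEGATION = re.compile(
    r"\b(no evidence of|negative for|without)\b[^.]*?(pulmonary embol\w*|\bPE\b)",
    re.IGNORECASE,
)

def rule_based_pe(report: str) -> bool:
    """True if the report mentions PE outside a negated scope."""
    if NEGATION.search(report):
        return False
    return bool(FINDING.search(report))

print(rule_based_pe("Filling defect consistent with pulmonary embolism."))  # True
print(rule_based_pe("No evidence of pulmonary embolism."))                  # False
```

Real symbolic pipelines use far richer rule sets (scope termination, hedging, section awareness), but the shape is the same: behavior is fully determined by rules a human can read.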
Affiliation(s)
- Jackson Steinkamp
- Department of Medicine, Hospital of the University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA 19104, USA
- Tessa S Cook
- Perelman School of Medicine at the University of Pennsylvania, 3400 Spruce Street, 1 Silverstein Radiology, Philadelphia, PA 19104, USA.
15
Paul A, Shen TC, Lee S, Balachandar N, Peng Y, Lu Z, Summers RM. Generalized Zero-Shot Chest X-Ray Diagnosis Through Trait-Guided Multi-View Semantic Embedding With Self-Training. IEEE TRANSACTIONS ON MEDICAL IMAGING 2021; 40:2642-2655. [PMID: 33523805 PMCID: PMC8591713 DOI: 10.1109/tmi.2021.3054817] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Zero-shot learning (ZSL) is one of the most promising avenues of annotation-efficient machine learning. In the era of deep learning, ZSL techniques have achieved unprecedented success. However, the development of ZSL methods has taken place mostly for natural images; ZSL for medical images has remained largely unexplored. We design a novel strategy for generalized zero-shot diagnosis of chest radiographs. In doing so, we leverage the potential of multi-view semantic embedding, a useful yet less-explored direction for ZSL. Our design also incorporates a self-training phase to tackle the problem of noisy labels alongside improving the performance for classes not seen during training. Through rigorous experiments, we show that our model trained on one dataset can produce consistent performance across test datasets from different sources, including those of very different quality. Comparisons with a number of state-of-the-art techniques show the superiority of the proposed method for generalized zero-shot chest x-ray diagnosis.
16
Abstract
Electronic health records (EHRs) are becoming a vital source of data for healthcare quality improvement, research, and operations. However, much of the most valuable information contained in EHRs remains buried in unstructured text. The field of clinical text mining has advanced rapidly in recent years, transitioning from rule-based approaches to machine learning and, more recently, deep learning. With new methods come new challenges, however, especially for those new to the field. This review provides an overview of clinical text mining for those who are encountering it for the first time (e.g., physician researchers, operational analytics teams, machine learning scientists from other domains). While not a comprehensive survey, this review describes the state of the art, with a particular focus on new tasks and methods developed over the past few years. It also identifies key barriers between these remarkable technical advances and the practical realities of implementation in health systems and in industry.
Affiliation(s)
- Bethany Percha
- Department of Medicine and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10025, USA;
17
Casey A, Davidson E, Poon M, Dong H, Duma D, Grivas A, Grover C, Suárez-Paniagua V, Tobin R, Whiteley W, Wu H, Alex B. A systematic review of natural language processing applied to radiology reports. BMC Med Inform Decis Mak 2021; 21:179. [PMID: 34082729 PMCID: PMC8176715 DOI: 10.1186/s12911-021-01533-7] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2021] [Accepted: 05/17/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Natural language processing (NLP) has a significant role in advancing healthcare and has been found to be key in extracting structured information from radiology reports. Understanding recent developments in NLP as applied to radiology is of significance, but recent reviews on this are limited. This study systematically assesses and quantifies recent literature in NLP applied to radiology reports. METHODS We conduct an automated literature search yielding 4836 results, using automated filtering, metadata-enrichment steps, and citation search combined with manual review. Our analysis is based on 21 variables including radiology characteristics, NLP methodology, performance, study, and clinical application characteristics. RESULTS We present a comprehensive analysis of the 164 publications retrieved, with publications in 2019 almost triple those in 2015. Each publication is categorised into one of 6 clinical application categories. Deep learning use increases over the period, but conventional machine learning approaches are still prevalent. Deep learning remains challenged when data is scarce, and there is little evidence of adoption into clinical practice. Despite 17% of studies reporting F1 scores greater than 0.85, it is hard to evaluate these approaches comparatively given that most of them use different datasets. Only 14 studies made their data available and 15 their code, with 10 externally validating results. CONCLUSIONS Automated understanding of the clinical narratives in radiology reports has the potential to enhance the healthcare process, and we show that research in this field continues to grow. Reproducibility and explainability of models are important if the domain is to move applications into clinical use. More could be done to share code, enabling validation of methods on different institutional data, and to reduce heterogeneity in the reporting of study properties, allowing inter-study comparisons.
Our results have significance for researchers in the field, providing a systematic synthesis of existing work to build on, identifying gaps and opportunities for collaboration, and avoiding duplication.
Affiliation(s)
- Arlene Casey
- School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland
- Emma Davidson
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland
- Michael Poon
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland
- Hang Dong
- Centre for Medical Informatics, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland
- Health Data Research UK, London, UK
- Daniel Duma
- School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland
- Andreas Grivas
- Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, Scotland
- Claire Grover
- Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, Scotland
- Víctor Suárez-Paniagua
- Centre for Medical Informatics, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland
- Health Data Research UK, London, UK
- Richard Tobin
- Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, Scotland
- William Whiteley
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Honghan Wu
- Health Data Research UK, London, UK
- Institute of Health Informatics, University College London, London, UK
- Beatrice Alex
- School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland
- Edinburgh Futures Institute, University of Edinburgh, Edinburgh, Scotland
18
Maros ME, Cho CG, Junge AG, Kämpgen B, Saase V, Siegel F, Trinkmann F, Ganslandt T, Groden C, Wenz H. Comparative analysis of machine learning algorithms for computer-assisted reporting based on fully automated cross-lingual RadLex mappings. Sci Rep 2021; 11:5529. [PMID: 33750857 PMCID: PMC7970897 DOI: 10.1038/s41598-021-85016-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2020] [Accepted: 02/23/2021] [Indexed: 02/03/2023] Open
Abstract
Computer-assisted reporting (CAR) tools have been suggested to improve radiology report quality by context-sensitively recommending key imaging biomarkers. However, studies evaluating machine learning (ML) algorithms on cross-lingual ontological (RadLex) mappings for developing embedded CAR algorithms are lacking. Therefore, we compared ML algorithms developed on human expert-annotated features against those developed on fully automated cross-lingual (German to English) RadLex mappings using 206 CT reports of suspected stroke. The target label was whether the Alberta Stroke Programme Early CT Score (ASPECTS) should have been provided (yes/no: 154/52). We focused on the probabilistic outputs of ML algorithms including tree-based methods, elastic net, support vector machines (SVMs), and fastText (a linear classifier), which were evaluated in the same 5 × 5-fold nested cross-validation framework. This allowed for model stacking and classifier rankings. Performance was evaluated using calibration metrics (AUC, Brier score, log loss) and calibration plots. Contextual ML-based assistance recommending ASPECTS was feasible. SVMs showed the highest accuracies both on human-extracted (87%) and RadLex features (findings: 82.5%; impressions: 85.4%). FastText achieved the highest accuracy (89.3%) and AUC (92%) on impressions. Boosted trees fitted on findings had the best calibration profile. Our approach provides guidance for choosing ML classifiers for CAR tools in a fully automated and language-agnostic fashion using bag-of-RadLex terms on limited expert-labelled training data.
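The calibration metrics named above are straightforward to compute from predicted probabilities. A minimal stdlib sketch with invented toy values (not data from the study):

```python
import math

# Toy predicted probabilities and binary labels, for illustration only.
probs  = [0.9, 0.2, 0.8, 0.1, 0.6]
labels = [1,   0,   1,   0,   0]

def brier_score(p, y):
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)

def log_loss(p, y, eps=1e-15):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for pi, yi in zip(p, y):
        pi = min(max(pi, eps), 1 - eps)  # clip to avoid log(0)
        total += -(yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
    return total / len(y)

print(round(brier_score(probs, labels), 3))  # 0.092
print(round(log_loss(probs, labels), 3))     # 0.315
```

Both metrics reward probabilities that are close to the observed outcomes, which is why they are used (alongside calibration plots) to compare how well classifier scores can be read as probabilities.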
Affiliation(s)
- Máté E Maros
- Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany.
- Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.
- Chang Gyu Cho
- Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany
- Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Andreas G Junge
- Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany
- Victor Saase
- Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany
- Fabian Siegel
- Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Frederik Trinkmann
- Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Thomas Ganslandt
- Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Christoph Groden
- Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany
- Holger Wenz
- Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany
19
Machine Learning and Deep Neural Network Applications in the Thorax: Pulmonary Embolism, Chronic Thromboembolic Pulmonary Hypertension, Aorta, and Chronic Obstructive Pulmonary Disease. J Thorac Imaging 2021; 35 Suppl 1:S40-S48. [PMID: 32271281 DOI: 10.1097/rti.0000000000000492] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Abstract
The radiologic community is rapidly integrating a technological revolution that has not yet fully entered daily practice. It necessitates close collaboration between computer scientists and radiologists to move from concepts to practical applications. This article reviews the current literature on machine learning and deep neural network applications in the fields of pulmonary embolism, chronic thromboembolic pulmonary hypertension, the aorta, and chronic obstructive pulmonary disease.
20
Wu S, Roberts K, Datta S, Du J, Ji Z, Si Y, Soni S, Wang Q, Wei Q, Xiang Y, Zhao B, Xu H. Deep learning in clinical natural language processing: a methodical review. J Am Med Inform Assoc 2021; 27:457-470. [PMID: 31794016 DOI: 10.1093/jamia/ocz200] [Citation(s) in RCA: 167] [Impact Index Per Article: 55.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2019] [Revised: 10/15/2019] [Accepted: 11/09/2019] [Indexed: 02/07/2023] Open
Abstract
OBJECTIVE This article methodically reviews the literature on deep learning (DL) for natural language processing (NLP) in the clinical domain, providing quantitative analysis to answer 3 research questions concerning the methods, scope, and context of current research. MATERIALS AND METHODS We searched MEDLINE, EMBASE, Scopus, the Association for Computing Machinery Digital Library, and the Association for Computational Linguistics Anthology for articles using DL-based approaches to NLP problems in electronic health records. After screening 1,737 articles, we collected data on 25 variables across 212 papers. RESULTS DL in clinical NLP publications more than doubled each year through 2018. Recurrent neural networks (60.8%) and word2vec embeddings (74.1%) were the most popular methods; the information extraction tasks of text classification, named entity recognition, and relation extraction were dominant (89.2%). However, there was a "long tail" of other methods and specific tasks. Most contributions were methodological variants or applications, but 20.8% were new methods of some kind. The earliest adopters were in the NLP community, but the medical informatics community was the most prolific. DISCUSSION Our analysis shows growing acceptance of deep learning as a baseline for NLP research, and of DL-based NLP in the medical community. A number of common associations were substantiated (eg, the preference for recurrent neural networks in sequence-labeling named entity recognition), while others were surprisingly nuanced (eg, the scarcity of French-language clinical NLP with deep learning). CONCLUSION Deep learning has not yet fully penetrated clinical NLP, but it is growing rapidly. This review highlighted both the popular and unique trends in this active field.
Affiliation(s)
- Stephen Wu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Kirk Roberts
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Surabhi Datta
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Jingcheng Du
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Zongcheng Ji
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Yuqi Si
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Sarvesh Soni
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Qiong Wang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Qiang Wei
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Yang Xiang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Bo Zhao
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
21
Chen TL, Emerling M, Chaudhari GR, Chillakuru YR, Seo Y, Vu TH, Sohn JH. Domain specific word embeddings for natural language processing in radiology. J Biomed Inform 2021; 113:103665. [PMID: 33333323 PMCID: PMC7856086 DOI: 10.1016/j.jbi.2020.103665] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2020] [Revised: 11/03/2020] [Accepted: 12/10/2020] [Indexed: 11/25/2022]
Abstract
BACKGROUND There has been increasing interest in machine learning-based natural language processing (NLP) methods in radiology; however, models have often used word embeddings trained on general web corpora due to the lack of a radiology-specific corpus. PURPOSE We examined the potential of Radiopaedia to serve as a general radiology corpus to produce radiology-specific word embeddings that could be used to enhance performance on an NLP task on radiological text. MATERIALS AND METHODS Embeddings of dimension 50, 100, 200, and 300 were trained on articles collected from Radiopaedia using the GloVe algorithm and evaluated on analogy completion. A shallow neural network using input from either our trained embeddings or pre-trained Wikipedia 2014 + Gigaword 5 (WG) embeddings was used to label the Radiopaedia articles. Labeling performance was evaluated based on exact match accuracy and Hamming loss. McNemar's test with continuity correction and the Benjamini-Hochberg correction and a 5×2 cross-validation paired two-tailed t-test were used to assess statistical significance. RESULTS For accuracy in the analogy task, 50-dimensional (50-D) Radiopaedia embeddings outperformed WG embeddings on tumor origin analogies (p < 0.05) and organ adjectives (p < 0.01), whereas WG embeddings tended to outperform on inflammation location and bone vs. muscle analogies (p < 0.01). The two embeddings had comparable performance on other subcategories. In the labeling task, the Radiopaedia-based model outperformed the WG-based model at 50, 100, 200, and 300-D for exact match accuracy (p < 0.001, p < 0.001, p < 0.01, and p < 0.05, respectively) and Hamming loss (p < 0.001, p < 0.001, p < 0.01, and p < 0.05, respectively). CONCLUSION We have developed a set of word embeddings from Radiopaedia and shown that they can preserve relevant medical semantics and augment performance on a radiology NLP task.
Our results suggest that the cultivation of a radiology-specific corpus can benefit radiology NLP models in the future.
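Analogy completion, used above to evaluate the embeddings, answers "a is to b as c is to ?" by finding the vocabulary vector closest to b - a + c under cosine similarity. A tiny sketch with hand-made stand-in vectors (not real GloVe output; the words and values are invented):

```python
import math

# Toy 3-D "embeddings": organ nouns and their adjectives, hand-constructed so
# that the adjective direction is shared across organs.
vocab = {
    "kidney":  [1.0, 0.1, 0.0],
    "renal":   [1.0, 0.2, 0.1],
    "liver":   [0.1, 1.0, 0.0],
    "hepatic": [0.1, 1.1, 0.1],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c):
    """Return the word whose vector is most similar to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vocab[a], vocab[b], vocab[c])]
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

print(analogy("kidney", "renal", "liver"))  # hepatic
```

The same nearest-neighbor query, run over a full embedding matrix, is what the organ-adjective analogy subcategory above measures.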
Affiliation(s)
- Timothy L Chen
- University of California San Francisco (UCSF), Radiology and Biomedical Imaging, 505 Parnassus Ave, San Francisco, CA 94143, USA; University of Illinois College of Medicine, 1853 W Polk St, Chicago, IL 60612, USA
- Max Emerling
- University of California San Francisco (UCSF), Radiology and Biomedical Imaging, 505 Parnassus Ave, San Francisco, CA 94143, USA; University of California Berkeley, 2626 Hearst Ave, Berkeley, CA 94720, USA
- Gunvant R Chaudhari
- University of California San Francisco (UCSF), Radiology and Biomedical Imaging, 505 Parnassus Ave, San Francisco, CA 94143, USA
- Yeshwant R Chillakuru
- University of California San Francisco (UCSF), Radiology and Biomedical Imaging, 505 Parnassus Ave, San Francisco, CA 94143, USA; George Washington School of Medicine and Health Sciences, 2300 I St NW, Washington, DC 20052, USA
- Youngho Seo
- University of California San Francisco (UCSF), Radiology and Biomedical Imaging, 505 Parnassus Ave, San Francisco, CA 94143, USA
- Thienkhai H Vu
- University of California San Francisco (UCSF), Radiology and Biomedical Imaging, 505 Parnassus Ave, San Francisco, CA 94143, USA
- Jae Ho Sohn
- University of California San Francisco (UCSF), Radiology and Biomedical Imaging, 505 Parnassus Ave, San Francisco, CA 94143, USA
22
Huang SC, Pareek A, Zamanian R, Banerjee I, Lungren MP. Multimodal fusion with deep neural networks for leveraging CT imaging and electronic health record: a case-study in pulmonary embolism detection. Sci Rep 2020; 10:22147. [PMID: 33335111 PMCID: PMC7746687 DOI: 10.1038/s41598-020-78888-w] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2020] [Accepted: 11/25/2020] [Indexed: 12/12/2022] Open
Abstract
Recent advancements in deep learning have led to a resurgence of medical imaging and Electronic Medical Record (EMR) models for a variety of applications, including clinical decision support, automated workflow triage, clinical prediction, and more. However, very few models have been developed to integrate both clinical and imaging data, even though in routine practice clinicians rely on the EMR for context when interpreting medical imaging. In this study, we developed and compared different multimodal fusion model architectures that are capable of utilizing both pixel data from volumetric Computed Tomography Pulmonary Angiography scans and clinical patient data from the EMR to automatically classify Pulmonary Embolism (PE) cases. The best performing multimodal model is a late fusion model that achieves an AUROC of 0.947 [95% CI: 0.946–0.948] on the entire held-out test set, outperforming imaging-only and EMR-only single-modality models.
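A minimal sketch of the late-fusion idea the study found best: each modality is scored by its own model, and the per-modality probabilities are combined at the decision stage. The stub models, fixed outputs, and equal weights below are illustrative, not the published architecture.

```python
# Stand-ins for trained per-modality models; in a real system these would be
# a CNN over the CT volume and a model over structured EMR features.
def imaging_model(ct_volume):
    return 0.90  # hypothetical PE probability from pixel data

def emr_model(features):
    return 0.40  # hypothetical PE probability from EMR features

def late_fusion(ct_volume, emr_features, w_img=0.5, w_emr=0.5):
    """Weighted average of the per-modality PE probabilities."""
    return w_img * imaging_model(ct_volume) + w_emr * emr_model(emr_features)

print(late_fusion(None, None))  # 0.65
```

The contrast with early fusion is where the combination happens: early fusion concatenates modality features before a single model, whereas late fusion trains the modalities separately and merges only their outputs.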
Affiliation(s)
- Shih-Cheng Huang
- Department of Biomedical Data Science, Stanford University, Stanford, USA
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Stanford, USA
- Anuj Pareek
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Stanford, USA
- Department of Radiology, Stanford University, Stanford, USA
- Roham Zamanian
- Department of Pulmonary Critical Care Medicine, Stanford University, Stanford, USA
- Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford University School of Medicine, Stanford, USA
- Imon Banerjee
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Stanford, USA
- Department of Biomedical Informatics, Emory University, Atlanta, USA
- Matthew P Lungren
- Department of Biomedical Data Science, Stanford University, Stanford, USA
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Stanford, USA
- Department of Radiology, Stanford University, Stanford, USA
23
Zhang Y, Walecki R, Winter JR, Bragman FJS, Lourenco S, Hart C, Baker A, Perov Y, Johri S. Applying Artificial Intelligence Methods for the Estimation of Disease Incidence: The Utility of Language Models. Front Digit Health 2020; 2:569261. [PMID: 34713043 PMCID: PMC8521977 DOI: 10.3389/fdgth.2020.569261] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2020] [Accepted: 10/13/2020] [Indexed: 12/03/2022] Open
Abstract
Background: AI-driven digital health tools often rely on estimates of disease incidence or prevalence, but obtaining these estimates is costly and time-consuming. We explored the use of machine learning models that leverage contextual information about diseases from unstructured text to estimate disease incidence. Methods: We used a class of machine learning models, called language models, to extract contextual information relating to disease incidence. We evaluated three different language models: BioBERT, Global Vectors for Word Representation (GloVe), and the Universal Sentence Encoder (USE), as well as an approach that uses all three jointly. The output of these models is a mathematical representation of the underlying data, known as "embeddings." We used these to train neural network models to predict disease incidence. The neural networks were trained and validated using data from the Global Burden of Disease study, and tested using independent data sourced from the epidemiological literature. Findings: A variety of language models can be used to encode contextual information about diseases. We found that, on average, BioBERT embeddings were the best for disease names across multiple tasks. In particular, BioBERT was the best performing model when predicting specific disease-country pairs, whilst a fusion model combining BioBERT, GloVe, and USE performed best on average when predicting disease incidence in unseen countries. We also found that GloVe embeddings performed better than BioBERT embeddings when applied to country names. However, the models were limited when predicting previously unseen diseases. Further limitations were observed in substantial variation across age groups and notably lower performance for diseases that are highly dependent on location and climate. Interpretation: We demonstrate that context-aware machine learning models can be used for estimating disease incidence.
This method is quicker to implement than traditional epidemiological approaches. We therefore suggest it as a complement to existing modeling efforts where data are required more rapidly or at larger scale. This may particularly benefit AI-driven digital health products in which the data will undergo further processing and a validated approximation of the disease incidence is adequate.
24
Short RG, Bralich J, Bogaty D, Befera NT. Comprehensive Word-Level Classification of Screening Mammography Reports Using a Neural Network Sequence Labeling Approach. J Digit Imaging 2020; 32:685-692. [PMID: 30338478 DOI: 10.1007/s10278-018-0141-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023] Open
Abstract
Radiology reports contain a large amount of potentially valuable unstructured data. Recently, neural networks have been employed to perform classification of radiology reports over a few classes at the document level. The success of neural networks in sequence-labeling problems such as named entity recognition and part-of-speech tagging suggests that they could be used to classify radiology report text with greater granularity. We employed a neural network architecture to comprehensively classify mammography report text at the word level using a sequence labeling approach. Two radiologists devised a comprehensive classification system for screening mammography reports. Each word in each report was manually categorized by a radiologist into one of 33 categories according to the classification system. Tagged words referencing the same finding were grouped into unique sets. We pre-labeled reports with a rule-based algorithm and then manually edited these annotations for 6705 screening mammography reports (25.1%, 66.8%, and 8.1% BI-RADS 0, 1, and 2, respectively). A combined convolutional and recurrent neural network model was used to label words in each sentence of the individual reports. A siamese recurrent neural network was then used to group findings into sets. Performance of the neural network-based method was compared to a rule-based algorithm and a conditional random field (CRF) model. Global accuracy (percentage of documents where all word tags were predicted correctly) and keyword accuracy (percentage of all words that were labeled correctly, excluding words tagged as unimportant) were calculated on an unseen 519-report test set. Two-tailed t tests were used to assess differences between algorithm performance, and p < 0.05 was used to determine statistical significance. The neural network-based approach showed significantly higher global accuracy than both the rule-based algorithm (88.3% vs. 57.0%, p < 0.001) and the CRF model (88.3% vs. 75.8%, p < 0.001). The neural network also showed significantly higher keyword-level accuracy than the rule-based algorithm (95.5% vs. 80.9%, p < 0.001) and the CRF model (95.5% vs. 76.9%, p < 0.001). We demonstrate the potential of neural networks to accurately perform word-level multilabel classification of free-text radiology reports across 33 classes, thus showing the utility of a sequence-labeling approach to NLP of radiology reports. We found that a neural network classifier outperforms a rule-based algorithm and a CRF classifier for comprehensive multilabel classification of free-text screening mammography reports at the word level. By approaching radiology report classification as a sequence-labeling problem, we demonstrate the ability of neural networks to extract data from free-text radiology reports at a level of granularity not previously reported.
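The two evaluation metrics defined in the abstract (global accuracy over whole documents, keyword accuracy over individual words) can be sketched directly. The tag names and toy documents below are invented; "O" stands in for the "unimportant" tag.

```python
# Each document is a list of (predicted_tag, true_tag) pairs, one per word.
docs = [
    [("MASS", "MASS"), ("O", "O"), ("SIZE", "SIZE")],  # fully correct document
    [("MASS", "CALC"), ("O", "O"), ("SIZE", "SIZE")],  # one word mislabeled
]

def global_accuracy(docs):
    """Fraction of documents whose every word tag is predicted correctly."""
    return sum(all(p == t for p, t in d) for d in docs) / len(docs)

def keyword_accuracy(docs):
    """Fraction of correctly labeled words, excluding 'unimportant' words."""
    words = [(p, t) for d in docs for p, t in d if t != "O"]
    return sum(p == t for p, t in words) / len(words)

print(global_accuracy(docs))   # 0.5
print(keyword_accuracy(docs))  # 0.75
```

The gap between the two metrics mirrors the results above: a single wrong word fails the whole document for global accuracy while costing only one word of keyword accuracy.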
Affiliation(s)
- Ryan G Short
- Department of Radiology, Duke University Medical Center, 2301 Erwin Road, Box 3808, Durham, NC, 27710, USA.
- Nicholas T Befera
- Department of Radiology, Duke University Medical Center, 2301 Erwin Road, Box 3808, Durham, NC, 27710, USA
25
Bozkurt S, Alkim E, Banerjee I, Rubin DL. Automated Detection of Measurements and Their Descriptors in Radiology Reports Using a Hybrid Natural Language Processing Algorithm. J Digit Imaging 2020; 32:544-553. [PMID: 31222557 PMCID: PMC6646482 DOI: 10.1007/s10278-019-00237-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022] Open
Abstract
Radiological measurements are reported in free-text reports, and it is challenging to extract them for treatment-planning applications such as lesion summarization and cancer response assessment. The purpose of this work is to develop and evaluate a natural language processing (NLP) pipeline that can extract measurements and their core descriptors, such as temporality, anatomical entity, imaging observation, RadLex descriptors, series number, image number, and segment, from a wide variety of radiology reports (MR, CT, and mammogram). We created a hybrid NLP pipeline that integrates rule-based feature extraction modules and a conditional random field (CRF) model to extract measurements from radiology reports and link them with clinically relevant features such as anatomical entities or imaging observations. The pipeline was trained on 1117 CT/MR reports, and performance was evaluated on an independent set of 100 expert-annotated CT/MR reports and also tested on 25 mammography reports. The system detected 813 out of 806 measurements in the CT/MR reports; 784 were true positives, 29 were false positives, and 0 were false negatives. Similarly, from the mammography reports, 96% of the measurements with their modifiers were extracted correctly. Our approach could enable the development of computerized applications that can utilize summarized lesion measurements from radiology reports of varying modalities and improve practice by tracking the same lesions across multiple radiologic encounters.
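The measurement-detection step lends itself to a small rule-based sketch. The pattern below is illustrative and far simpler than the published hybrid pipeline, which also links measurements to anatomy and other descriptors via a CRF; the example report text is invented.

```python
import re

# Match dimension strings like "1.2 x 0.8" or "7" followed by a unit.
MEASUREMENT = re.compile(
    r"(?P<dims>\d+(?:\.\d+)?(?:\s*x\s*\d+(?:\.\d+)?)*)\s*(?P<unit>mm|cm)",
    re.IGNORECASE,
)

def extract_measurements(report: str):
    """Return (dimensions, unit) pairs found in a free-text report."""
    return [(m.group("dims"), m.group("unit")) for m in MEASUREMENT.finditer(report)]

text = "Stable 1.2 x 0.8 cm nodule in the right lobe; new 7 mm lesion."
print(extract_measurements(text))  # [('1.2 x 0.8', 'cm'), ('7', 'mm')]
```

In a hybrid pipeline, matches like these become candidate spans whose surrounding context (anatomy, temporality, series/image numbers) is then labeled by the statistical model.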
Affiliation(s)
- Selen Bozkurt
- Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), Room X-335, MC 5464, 1265 Welch Road, Stanford, CA, 94305-5479, USA
- Emel Alkim
- Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), Room X-335, MC 5464, 1265 Welch Road, Stanford, CA, 94305-5479, USA
- Imon Banerjee
- Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), Room X-335, MC 5464, 1265 Welch Road, Stanford, CA, 94305-5479, USA; Department of Radiology, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Daniel L Rubin
- Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), Room X-335, MC 5464, 1265 Welch Road, Stanford, CA, 94305-5479, USA; Department of Radiology, Stanford University School of Medicine, Stanford, CA, 94305, USA
26
Ong CJ, Orfanoudaki A, Zhang R, Caprasse FPM, Hutch M, Ma L, Fard D, Balogun O, Miller MI, Minnig M, Saglam H, Prescott B, Greer DM, Smirnakis S, Bertsimas D. Machine learning and natural language processing methods to identify ischemic stroke, acuity and location from radiology reports. PLoS One 2020; 15:e0234908. [PMID: 32559211 PMCID: PMC7304623 DOI: 10.1371/journal.pone.0234908] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2019] [Accepted: 06/04/2020] [Indexed: 12/20/2022] Open
Abstract
Accurate, automated extraction of clinical stroke information from unstructured text has several important applications. ICD-9/10 codes can misclassify ischemic stroke events and do not distinguish acuity or location. Expeditious, accurate data extraction could provide considerable improvement in identifying stroke in large datasets, triaging critical clinical reports, and quality improvement efforts. In this study, we developed and report a comprehensive framework studying the performance of simple and complex stroke-specific Natural Language Processing (NLP) and Machine Learning (ML) methods to determine presence, location, and acuity of ischemic stroke from radiographic text. We collected 60,564 Computed Tomography and Magnetic Resonance Imaging Radiology reports from 17,864 patients from two large academic medical centers. We used standard techniques to featurize unstructured text and developed neurovascular specific word GloVe embeddings. We trained various binary classification algorithms to identify stroke presence, location, and acuity using 75% of 1,359 expert-labeled reports. We validated our methods internally on the remaining 25% of reports and externally on 500 radiology reports from an entirely separate academic institution. In our internal population, GloVe word embeddings paired with deep learning (Recurrent Neural Networks) had the best discrimination of all methods for our three tasks (AUCs of 0.96, 0.98, 0.93 respectively). Simpler NLP approaches (Bag of Words) performed best with interpretable algorithms (Logistic Regression) for identifying ischemic stroke (AUC of 0.95), MCA location (AUC 0.96), and acuity (AUC of 0.90). Similarly, GloVe and Recurrent Neural Networks (AUC 0.92, 0.89, 0.93) generalized better in our external test set than BOW and Logistic Regression for stroke presence, location and acuity, respectively (AUC 0.89, 0.86, 0.80). Our study demonstrates a comprehensive assessment of NLP techniques for unstructured radiographic text. 
Our findings suggest that NLP/ML methods can be used to discriminate stroke features from large data cohorts for both clinical and research-related investigations.
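The "Bag of Words" featurization behind the study's simpler, interpretable models can be sketched in a few lines; the vocabulary and whitespace tokenization here are illustrative assumptions, not the study's preprocessing.

```python
from collections import Counter

# Minimal bag-of-words featurizer: map a report to term counts over a
# fixed vocabulary. A classifier (e.g., logistic regression) would then
# consume these vectors; vocabulary terms below are invented examples.
def bow_vector(text, vocabulary):
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocabulary]

vocab = ["acute", "infarct", "chronic", "hemorrhage"]
report = "Acute infarct in the left MCA territory without hemorrhage"
print(bow_vector(report, vocab))
```

GloVe embeddings, by contrast, replace these sparse count vectors with dense vectors learned from co-occurrence statistics, which is what allowed the recurrent models to generalize better externally.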
Affiliation(s)
- Charlene Jennifer Ong
- Boston University School of Medicine, Boston, Massachusetts, United States of America
- Boston Medical Center, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
- Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Agni Orfanoudaki
- Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Rebecca Zhang
- Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Francois Pierre M. Caprasse
- Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Meghan Hutch
- Boston University School of Medicine, Boston, Massachusetts, United States of America
- Boston Medical Center, Boston, Massachusetts, United States of America
- Liang Ma
- Boston University School of Medicine, Boston, Massachusetts, United States of America
- Darian Fard
- Boston University School of Medicine, Boston, Massachusetts, United States of America
- Oluwafemi Balogun
- Boston University School of Medicine, Boston, Massachusetts, United States of America
- Boston Medical Center, Boston, Massachusetts, United States of America
- Matthew I. Miller
- Boston University School of Medicine, Boston, Massachusetts, United States of America
- Margaret Minnig
- Boston University School of Medicine, Boston, Massachusetts, United States of America
- Hanife Saglam
- Harvard Medical School, Boston, Massachusetts, United States of America
- Brenton Prescott
- Boston Medical Center, Boston, Massachusetts, United States of America
- David M. Greer
- Boston University School of Medicine, Boston, Massachusetts, United States of America
- Boston Medical Center, Boston, Massachusetts, United States of America
- Stelios Smirnakis
- Harvard Medical School, Boston, Massachusetts, United States of America
- Dimitris Bertsimas
- Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
27
A Scalable Natural Language Processing for Inferring BT-RADS Categorization from Unstructured Brain Magnetic Resonance Reports. J Digit Imaging 2020; 33:1393-1400. [PMID: 32495125 DOI: 10.1007/s10278-020-00350-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022] Open
Abstract
The aim of this study is to develop an automated classification method for Brain Tumor Reporting and Data System (BT-RADS) categories from unstructured and structured brain magnetic resonance imaging (MR) reports. This retrospective study included 1410 BT-RADS structured reports dated from January 2014 to December 2017 and a test set of 109 unstructured brain MR reports dated from January 2010 to December 2014. Text vector representations and semantic word embeddings were generated from individual report sections (i.e., "History," "Findings," etc.) using Tf-idf statistics and a fine-tuned word2vec model, respectively. Section-wise ensemble models were trained using gradient boosting (XGBoost), elastic net regularization, and random forests, and classification accuracy was evaluated on an independent test set of unstructured brain MR reports and a validation set of BT-RADS structured reports. Section-wise ensemble models using XGBoost and word2vec semantic word embeddings were more accurate than those using Tf-idf statistics when classifying unstructured reports, with an f1 score of 0.72. In contrast, models using traditional Tf-idf statistics outperformed the word2vec semantic approach for categorization from structured reports, with an f1 score of 0.98. The proposed natural language processing pipeline can infer BT-RADS report scores from unstructured reports after training on structured report data. Our study provides a detailed account of the experimentation process and may guide the development of RADS-focused information extraction (IE) applications for structured and unstructured radiology reports.
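The Tf-idf statistic behind the section-wise text vectors can be written out directly. The smoothing convention below (adding 1 to the document frequency and to the log) is one common variant and an assumption here, as implementations differ.

```python
import math

# Tf-idf sketch: term frequency within a document, scaled by inverse
# document frequency across a corpus of tokenized report sections.
def tf_idf(term, doc_tokens, corpus):
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for d in corpus if term in d)          # document frequency
    idf = math.log(len(corpus) / (1 + df)) + 1        # smoothed idf
    return tf * idf

corpus = [["stable", "disease"], ["worsening", "edema"], ["stable", "edema"]]
print(tf_idf("stable", ["stable", "disease"], corpus))
```

Stacking these weights over a fixed vocabulary yields the sparse section vectors that the gradient-boosted and random-forest ensembles consume.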
28
Meng X, Ganoe CH, Sieberg RT, Cheung YY, Hassanpour S. Self-Supervised Contextual Language Representation of Radiology Reports to Improve the Identification of Communication Urgency. AMIA Jt Summits Transl Sci Proc 2020; 2020:413-421. [PMID: 32477662 PMCID: PMC7233055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Machine learning methods have recently achieved high performance in biomedical text analysis. However, a major bottleneck in the widespread application of these methods is obtaining the required large amounts of annotated training data, which is resource intensive and time consuming. Recent progress in self-supervised learning has shown promise in leveraging large text corpora without explicit annotations. In this work, we built a self-supervised contextual language representation model using BERT, a deep bidirectional transformer architecture, to identify radiology reports requiring prompt communication to the referring physicians. We pre-trained the BERT model on a large unlabeled corpus of radiology reports and used the resulting contextual representations in a final text classifier for communication urgency. Our model achieved a precision of 97.0%, recall of 93.3%, and F-measure of 95.1% on an independent test set in identifying radiology reports for prompt communication, and significantly outperformed the previous state-of-the-art model based on word2vec representations.
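The reported F-measure is the harmonic mean of the reported precision and recall, which is easy to verify:

```python
# F-measure (F1) as the harmonic mean of precision and recall,
# checked against the values reported in the abstract above.
def f_measure(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(0.970, 0.933), 3))  # -> 0.951
```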
Affiliation(s)
- Xing Meng
- Computer Science Department, Dartmouth College, Hanover, NH 03755, USA
- Craig H Ganoe
- Biomedical Data Science Department, Dartmouth College, Hanover, NH 03755, USA
- Ryan T Sieberg
- Radiology Department, Dartmouth-Hitchcock Medical Center, Lebanon, NH 03756, USA
- Yvonne Y Cheung
- Radiology Department, Dartmouth-Hitchcock Medical Center, Lebanon, NH 03756, USA
- Saeed Hassanpour
- Computer Science Department, Dartmouth College, Hanover, NH 03755, USA
- Biomedical Data Science Department, Dartmouth College, Hanover, NH 03755, USA
- Epidemiology Department, Dartmouth College, Hanover, NH 03755, USA
29
Yagahara A, Sato T. [Evaluation of the Automatic Full Form Retrieval Method from Abbreviation Using Word2vec for Terminology Expansion]. Nihon Hoshasen Gijutsu Gakkai Zasshi 2020; 76:1118-1124. [PMID: 33229841 DOI: 10.6009/jjrt.2020_jsrt_76.11.1118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
PURPOSES The purposes of this study were to automatically extract full forms from abbreviations by using Word2vec for terminology expansion and to determine the optimal parameters that ensure the highest accuracy. METHODS Approximately 300,000 English abstracts on "image diagnosis" were collected using PubMed from January 1994 to December 2018. As preprocessing, all uppercase letters in the collected data were converted to lowercase letters, and symbols were deleted. In addition, compound word recognition was performed using RadLex, published by the Radiological Society of North America, and the abbreviation collection published by the Japanese Society of Radiological Technology. Next, distributed representations were generated by two algorithms, continuous bag-of-words (CBOW) and Skip-gram, using the following parameters: iteration numbers (3-85) and dimensions of word vectors (50-1000). Abbreviations were input to the generated distributed representations, and full forms with the highest cosine similarities to the abbreviations were identified. Then, correct-answer rates were calculated by comparing the predicted full forms to 214 gold standards extracted from the abbreviation collection. RESULTS The highest correct-answer rate was 74.3%, achieved by Skip-gram with 200 dimensions and 10 iterations. This rate was higher for Skip-gram than for CBOW under all tested conditions. CONCLUSION The accuracy of extracting full forms with Word2vec is 74.3%, and this result contributes to the consistency of terminology and the efficiency of terminology expansion.
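Ranking full-form candidates by cosine similarity with the abbreviation's embedding reduces to the standard formula; the toy vectors below are illustrative stand-ins for learned Word2vec vectors.

```python
import math

# Cosine similarity: dot product of two vectors divided by the product
# of their Euclidean norms. Candidate full forms would be ranked by
# this score against the abbreviation's embedding.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

print(cosine([1.0, 2.0, 0.0], [2.0, 4.0, 0.0]))  # parallel vectors, similarity ~1.0
```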
Affiliation(s)
- Ayako Yagahara
- Faculty of Health Sciences, Hokkaido University of Science
- Faculty of Health Sciences, Hokkaido University
- Tetta Sato
- Faculty of Health Sciences, Hokkaido University of Science (current address: Otaru Ekisaikai Hospital)
30
SECNLP: A survey of embeddings in clinical natural language processing. J Biomed Inform 2020; 101:103323. [DOI: 10.1016/j.jbi.2019.103323] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2019] [Revised: 09/12/2019] [Accepted: 10/27/2019] [Indexed: 12/11/2022]
31
Hassanzadeh H, Nguyen A, Verspoor K. Quantifying semantic similarity of clinical evidence in the biomedical literature to facilitate related evidence synthesis. J Biomed Inform 2019; 100:103321. [PMID: 31676460 DOI: 10.1016/j.jbi.2019.103321] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2019] [Revised: 09/28/2019] [Accepted: 10/25/2019] [Indexed: 10/25/2022]
Abstract
OBJECTIVE Published clinical trials and high quality peer reviewed medical publications are considered as the main sources of evidence used for synthesizing systematic reviews or practicing Evidence Based Medicine (EBM). Finding all relevant published evidence for a particular medical case is a time and labour intensive task, given the breadth of the biomedical literature. Automatic quantification of conceptual relationships between key clinical evidence within and across publications, despite variations in the expression of clinically-relevant concepts, can help to facilitate synthesis of evidence. In this study, we aim to provide an approach towards expediting evidence synthesis by quantifying semantic similarity of key evidence as expressed in the form of individual sentences. Such semantic textual similarity can be applied as a key approach for supporting selection of related studies. MATERIAL AND METHODS We propose a generalisable approach for quantifying semantic similarity of clinical evidence in the biomedical literature, specifically considering the similarity of sentences corresponding to a given type of evidence, such as clinical interventions, population information, clinical findings, etc. We develop three sets of generic, ontology-based, and vector-space models of similarity measures that make use of a variety of lexical, conceptual, and contextual information to quantify the similarity of full sentences containing clinical evidence. To understand the impact of different similarity measures on the overall evidence semantic similarity quantification, we provide a comparative analysis of these measures when used as input to an unsupervised linear interpolation and a supervised regression ensemble. In order to provide a reliable test-bed for this experiment, we generate a dataset of 1000 pairs of sentences from biomedical publications that are annotated by ten human experts. 
We also extend the experiments on an external dataset for further generalisability testing. RESULTS The combination of all diverse similarity measures showed stronger correlations with the gold standard similarity scores in the dataset than any individual kind of measure. Our approach reached near 0.80 average Pearson correlation across different clinical evidence types using the devised similarity measures. Although they were more effective when combined together, individual generic and vector-space measures also resulted in strong similarity quantification when used in both unsupervised and supervised models. On the external dataset, our similarity measures were highly competitive with the state-of-the-art approaches developed and trained specifically on that dataset for predicting semantic similarity. CONCLUSION Experimental results showed that the proposed semantic similarity quantification approach can effectively identify related clinical evidence that is reported in the literature. The comparison with a state-of-the-art method demonstrated the effectiveness of the approach, and experiments with an external dataset support its generalisability.
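The Pearson correlation used to score predicted similarities against the expert gold standard can be written out directly; the data values below are illustrative, not the study's annotations.

```python
import math

# Pearson correlation coefficient between two equal-length sequences:
# covariance of the deviations divided by the product of their norms.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, correlation ~1.0
```

A reported average near 0.80 on this scale indicates that the combined measures track the human similarity judgments closely but not perfectly.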
Affiliation(s)
- Hamed Hassanzadeh
- The Australian e-Health Research Centre, CSIRO, Brisbane, QLD, Australia
- Anthony Nguyen
- The Australian e-Health Research Centre, CSIRO, Brisbane, QLD, Australia
- Karin Verspoor
- School of Computing and Information Systems, University of Melbourne, Melbourne, VIC, Australia
32
Banerjee I, Sofela M, Yang J, Chen JH, Shah NH, Ball R, Mushlin AI, Desai M, Bledsoe J, Amrhein T, Rubin DL, Zamanian R, Lungren MP. Development and Performance of the Pulmonary Embolism Result Forecast Model (PERFORM) for Computed Tomography Clinical Decision Support. JAMA Netw Open 2019; 2:e198719. [PMID: 31390040 PMCID: PMC6686780 DOI: 10.1001/jamanetworkopen.2019.8719] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open
Abstract
IMPORTANCE Pulmonary embolism (PE) is a life-threatening clinical problem, and computed tomographic imaging is the standard for diagnosis. Clinical decision support rules based on PE risk-scoring models have been developed to compute pretest probability but are underused and tend to underperform in practice, leading to persistent overuse of CT imaging for PE. OBJECTIVE To develop a machine learning model to generate a patient-specific risk score for PE by analyzing longitudinal clinical data as clinical decision support for patients referred for CT imaging for PE. DESIGN, SETTING, AND PARTICIPANTS In this diagnostic study, the proposed workflow for the machine learning model, the Pulmonary Embolism Result Forecast Model (PERFORM), transforms raw electronic medical record (EMR) data into temporal feature vectors and develops a decision analytical model targeted toward adult patients referred for CT imaging for PE. The model was tested on holdout patient EMR data from 2 large, academic medical practices. A total of 3397 annotated CT imaging examinations for PE from 3214 unique patients seen at Stanford University hospitals and clinics were used for training and validation. The models were externally validated on 240 unique patients seen at Duke University Medical Center. The comparison with clinical scoring systems was done on randomly selected 100 outpatient samples from Stanford University hospitals and clinics and 101 outpatient samples from Duke University Medical Center. MAIN OUTCOMES AND MEASURES Prediction performance of diagnosing acute PE was evaluated using ElasticNet, artificial neural networks, and other machine learning approaches on holdout data sets from both institutions, and performance of models was measured by area under the receiver operating characteristic curve (AUROC). RESULTS Of the 3214 patients included in the study, 1704 (53.0%) were women from Stanford University hospitals and clinics; mean (SD) age was 60.53 (19.43) years. 
The 240 patients from Duke University Medical Center used for validation included 132 women (55.0%); mean (SD) age was 70.2 (14.2) years. In the samples for clinical scoring system comparisons, the 100 outpatients from Stanford University hospitals and clinics included 67 women (67.0%); mean (SD) age was 57.74 (19.87) years, and the 101 patients from Duke University Medical Center included 59 women (58.4%); mean (SD) age was 73.06 (15.3) years. The best-performing model achieved an AUROC performance of predicting a positive PE study of 0.90 (95% CI, 0.87-0.91) on intrainstitutional holdout data with an AUROC of 0.71 (95% CI, 0.69-0.72) on an external data set from Duke University Medical Center; superior AUROC performance and cross-institutional generalization of the model of 0.81 (95% CI, 0.77-0.87) and 0.81 (95% CI, 0.73-0.82), respectively, were noted on holdout outpatient populations from both intrainstitutional and extrainstitutional data. CONCLUSIONS AND RELEVANCE The machine learning model, PERFORM, may consider multitudes of applicable patient-specific risk factors and dependencies to arrive at a PE risk prediction that generalizes to new population distributions. This approach might be used as an automated clinical decision-support tool for patients referred for CT PE imaging to improve CT use.
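The AUROC metric used throughout can be computed without any curve plotting, as the probability that a randomly chosen positive case is scored above a randomly chosen negative one (the Mann-Whitney formulation). The scores and labels below are toy values, not PERFORM outputs.

```python
# AUROC via pairwise comparison: fraction of (positive, negative) pairs
# where the positive case receives the higher score; ties count as 0.5.
def auroc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auroc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0]))  # perfectly separated -> 1.0
```

The drop from 0.90 internally to 0.71 externally in the abstract above is exactly a drop in this pairwise ranking probability on the new population.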
Affiliation(s)
- Imon Banerjee
- Department of Biomedical Data Science, Stanford University, Stanford, California
- Department of Radiology, Stanford University, Stanford, California
- Miji Sofela
- Duke University Health System, Duke University School of Medicine, Durham, North Carolina
- Jaden Yang
- Quantitative Science Unit, Stanford University, Stanford, California
- Jonathan H. Chen
- Department of Medicine (Biomedical Informatics), Stanford University, Stanford, California
- Nigam H. Shah
- Department of Medicine (Biomedical Informatics), Stanford University, Stanford, California
- Robyn Ball
- Quantitative Science Unit, Stanford University, Stanford, California
- Alvin I. Mushlin
- Department of Medicine, Weill Cornell Medical College, Cornell University, Ithaca, New York
- Manisha Desai
- Quantitative Science Unit, Stanford University, Stanford, California
- Joseph Bledsoe
- Department of Emergency Medicine, Intermountain Medical Center, Salt Lake City, Utah
- Timothy Amrhein
- Department of Radiology, Duke University School of Medicine, Durham, North Carolina
- Daniel L. Rubin
- Department of Biomedical Data Science, Stanford University, Stanford, California
- Department of Radiology, Stanford University, Stanford, California
- Roham Zamanian
- Department of Medicine, Med/Pulmonary, and Critical Care Medicine, Stanford University, Stanford, California
33
Bozkurt S, Kan KM, Ferrari MK, Rubin DL, Blayney DW, Hernandez-Boussard T, Brooks JD. Is it possible to automatically assess pretreatment digital rectal examination documentation using natural language processing? A single-centre retrospective study. BMJ Open 2019; 9:e027182. [PMID: 31324681 PMCID: PMC6661600 DOI: 10.1136/bmjopen-2018-027182] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/30/2023] Open
Abstract
OBJECTIVES To develop and test a method for automatic assessment of a quality metric, provider-documented pretreatment digital rectal examination (DRE), using the outputs of a natural language processing (NLP) framework. SETTING An electronic health records (EHR)-based prostate cancer data warehouse was used to identify patients and associated clinical notes from 1 January 2005 to 31 December 2017. Using a previously developed natural language processing pipeline, we classified DRE assessment as documented (currently or historically performed), deferred (or suggested as a future examination), or refused. PRIMARY AND SECONDARY OUTCOME MEASURES We investigated the quality metric performance, documentation 6 months before treatment, and identified patient and clinical factors associated with metric performance. RESULTS The cohort included 7215 patients with prostate cancer and 426 227 unique clinical notes associated with pretreatment encounters. DREs of 5958 (82.6%) patients were documented and 1257 (17.4%) of patients did not have a DRE documented in the EHR. A total of 3742 (51.9%) patient DREs were documented within 6 months prior to treatment, meeting the quality metric. Patients with private insurance had a higher rate of DRE within 6 months prior to starting treatment compared with Medicaid-based or Medicare-based payors (77.3% vs 69.5%, p=0.001). Patients undergoing chemotherapy, radiation therapy, or surgery as the first line of treatment were more likely to have a documented DRE 6 months prior to treatment. CONCLUSION EHRs contain valuable unstructured information, and with NLP it is feasible to accurately and efficiently identify quality metrics within the current clinician documentation workflow.
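A toy rule layer in the spirit of the documented/deferred/refused classification can be sketched as keyword matching over note text; the cue phrases and status labels below are invented for illustration and are not the study's lexicon or pipeline.

```python
# Hypothetical cue-phrase classifier for DRE documentation status.
# Order matters: more specific statuses are checked before "documented".
CUES = {
    "refused": ["refused", "declined"],
    "deferred": ["deferred", "will be performed", "at next visit"],
    "documented": ["dre performed", "rectal exam performed", "prostate firm"],
}

def classify_dre(note):
    text = note.lower()
    for status, phrases in CUES.items():
        if any(p in text for p in phrases):
            return status
    return "not documented"

print(classify_dre("Patient declined DRE today."))  # -> refused
```

Real systems layer negation and temporality handling on top of such cues, which is where NLP frameworks earn their keep over plain keyword search.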
Affiliation(s)
- Selen Bozkurt
- Biomedical Data Science, Stanford University, Stanford, CA, USA
- Medicine (Biomedical Informatics), Stanford University, Stanford, CA, USA
- Kathleen M Kan
- Urology, Stanford Lucile Salter Packard Children's Hospital, Stanford, CA, USA
- Daniel L Rubin
- Biomedical Data Science, Stanford University, Stanford, CA, USA
- Radiology, Stanford University, Stanford, CA, USA
- Tina Hernandez-Boussard
- Biomedical Data Science, Stanford University, Stanford, CA, USA
- Medicine (Biomedical Informatics), Stanford University, Stanford, CA, USA
- Surgery, Stanford University, Stanford, CA, USA
34
Langlotz CP, Allen B, Erickson BJ, Kalpathy-Cramer J, Bigelow K, Cook TS, Flanders AE, Lungren MP, Mendelson DS, Rudie JD, Wang G, Kandarpa K. A Roadmap for Foundational Research on Artificial Intelligence in Medical Imaging: From the 2018 NIH/RSNA/ACR/The Academy Workshop. Radiology 2019; 291:781-791. [PMID: 30990384 PMCID: PMC6542624 DOI: 10.1148/radiol.2019190613] [Citation(s) in RCA: 175] [Impact Index Per Article: 35.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2019] [Revised: 03/24/2019] [Accepted: 03/25/2019] [Indexed: 01/08/2023]
Abstract
Imaging research laboratories are rapidly creating machine learning systems that achieve expert human performance using open-source methods and tools. These artificial intelligence systems are being developed to improve medical image reconstruction, noise reduction, quality assurance, triage, segmentation, computer-aided detection, computer-aided classification, and radiogenomics. In August 2018, a meeting was held in Bethesda, Maryland, at the National Institutes of Health to discuss the current state of the art and knowledge gaps and to develop a roadmap for future research initiatives. Key research priorities include (1) new image reconstruction methods that efficiently produce images suitable for human interpretation from source data; (2) automated image labeling and annotation methods, including information extraction from the imaging report, electronic phenotyping, and prospective structured image reporting; (3) new machine learning methods for clinical imaging data, such as tailored, pretrained model architectures and federated machine learning methods; (4) machine learning methods that can explain the advice they provide to human users (so-called explainable artificial intelligence); and (5) validated methods for image de-identification and data sharing to facilitate wide availability of clinical imaging data sets. This research roadmap is intended to identify and prioritize these needs for academic research laboratories, funding agencies, professional societies, and industry.
Collapse
Affiliation(s)
- Curtis P. Langlotz
- From the Department of Radiology, Stanford University, Stanford, CA 94305 (C.P.L., M.P.L.); Department of Radiology, Grandview Medical Center, Birmingham, Ala (B.A.); Department of Radiology, Mayo Clinic, Rochester, Minn (B.J.E.); Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Mass (J.K.C.); GE Healthcare, Chicago, Ill (K.B.); Department of Radiology, Hospital of the University of Pennsylvania, Philadelphia, Pa (T.S.C., J.D.R.); Department of Radiology, Thomas Jefferson University Hospital, Philadelphia, Pa (A.E.F.); Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY (D.S.M.); Biomedical Imaging Center, Rensselaer Polytechnic Institute, Troy, NY (G.W.); and National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, Washington, DC (K.K.)
| | - Bibb Allen
- From the Department of Radiology, Stanford University, Stanford, CA 94305 (C.P.L., M.P.L.); Department of Radiology, Grandview Medical Center, Birmingham, Ala (B.A.); Department of Radiology, Mayo Clinic, Rochester, Minn (B.J.E.); Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Mass (J.K.C.); GE Healthcare, Chicago, Ill (K.B.); Department of Radiology, Hospital of the University of Pennsylvania, Philadelphia, Pa (T.S.C., J.D.R.); Department of Radiology, Thomas Jefferson University Hospital, Philadelphia, Pa (A.E.F.); Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY (D.S.M.); Biomedical Imaging Center, Rensselaer Polytechnic Institute, Troy, NY (G.W.); and National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, Washington, DC (K.K.)
| | - Bradley J. Erickson
- From the Department of Radiology, Stanford University, Stanford, CA 94305 (C.P.L., M.P.L.); Department of Radiology, Grandview Medical Center, Birmingham, Ala (B.A.); Department of Radiology, Mayo Clinic, Rochester, Minn (B.J.E.); Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Mass (J.K.C.); GE Healthcare, Chicago, Ill (K.B.); Department of Radiology, Hospital of the University of Pennsylvania, Philadelphia, Pa (T.S.C., J.D.R.); Department of Radiology, Thomas Jefferson University Hospital, Philadelphia, Pa (A.E.F.); Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY (D.S.M.); Biomedical Imaging Center, Rensselaer Polytechnic Institute, Troy, NY (G.W.); and National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, Washington, DC (K.K.)
- Jayashree Kalpathy-Cramer
- Keith Bigelow
- Tessa S. Cook
- Adam E. Flanders
- Matthew P. Lungren
- David S. Mendelson
- Jeffrey D. Rudie
- Ge Wang
- Krishna Kandarpa
35
Fócil-Arias C, Sidorov G, Gelbukh A. Medical events extraction to analyze clinical records with conditional random fields. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2019. [DOI: 10.3233/jifs-179014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Affiliation(s)
- Grigori Sidorov
- Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico
- Alexander Gelbukh
- Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico
36
Meng X, Ganoe CH, Sieberg RT, Cheung YY, Hassanpour S. Assisting radiologists with reporting urgent findings to referring physicians: A machine learning approach to identify cases for prompt communication. J Biomed Inform 2019; 93:103169. [PMID: 30959206 DOI: 10.1016/j.jbi.2019.103169] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2018] [Revised: 03/15/2019] [Accepted: 04/04/2019] [Indexed: 10/27/2022]
Abstract
Radiologists are expected to expediently communicate critical and unexpected findings to referring clinicians to prevent delays in patient diagnosis and treatment. However, competing demands, such as heavy workloads and a lack of administrative support, have led to communication failures that accounted for 7% of malpractice payments made in the United States from 2004 to 2008. To address this problem, we developed a novel machine learning method that automatically and accurately identifies cases requiring prompt communication to referring physicians by analyzing the associated radiology reports. This semi-supervised learning approach requires a minimal amount of manual annotation and was trained on a large multi-institutional radiology report repository from three major external healthcare organizations. To test our approach, we created a corpus of 480 radiology reports from our own institution, in which two radiologists double-annotated the cases that required prompt communication. Our evaluation on this test corpus achieved an F-score of 74.5% and a recall of 90.0% in identifying cases for prompt communication. Implemented as part of an online decision support system, the proposed approach can assist radiologists in identifying cases for prompt communication to referring physicians, helping to avoid or minimize potential harm to patients.
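The F-score and recall reported in this abstract are standard binary-classification metrics. As a point of reference only, a minimal stdlib-Python sketch (with illustrative toy labels, not the study's corpus) shows how they are computed from predictions:

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = case needing prompt communication)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example (hypothetical labels, not the study's data):
p, r, f = precision_recall_f1([1, 1, 1, 0, 0, 1, 0, 1], [1, 1, 0, 0, 1, 1, 0, 1])
```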
Affiliation(s)
- Xing Meng
- Computer Science Department, Dartmouth College, Hanover, NH 03755, USA
- Craig H Ganoe
- Biomedical Data Science Department, Dartmouth College, Hanover, NH 03755, USA
- Ryan T Sieberg
- Radiology Department, Dartmouth-Hitchcock Medical Center, Lebanon, NH 03756, USA
- Yvonne Y Cheung
- Radiology Department, Dartmouth-Hitchcock Medical Center, Lebanon, NH 03756, USA
- Saeed Hassanpour
- Computer Science Department, Dartmouth College, Hanover, NH 03755, USA; Biomedical Data Science Department, Dartmouth College, Hanover, NH 03755, USA; Epidemiology Department, Dartmouth College, Hanover, NH 03755, USA.
37
Banerjee I, Choi HH, Desser T, Rubin DL. A Scalable Machine Learning Approach for Inferring Probabilistic US-LI-RADS Categorization. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018; 2018:215-224. [PMID: 30815059 PMCID: PMC6371287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
We propose a scalable computerized approach for large-scale inference of Liver Imaging Reporting and Data System (LI-RADS) final assessment categories in narrative ultrasound (US) reports. Although our model was trained on reports created with a LI-RADS template, it was also able to infer LI-RADS scores for unstructured reports created before the LI-RADS guidelines were established. No human-labeled data were required at any step of this study: for training, LI-RADS scores were automatically extracted from the reports that contained structured scores, and the model translated the derived knowledge to reasoning over unstructured radiology reports. By providing automated LI-RADS categorization, our approach may help standardize screening recommendations and treatment planning for patients at risk for hepatocellular carcinoma, and it may facilitate AI-based healthcare research with US images by offering large-scale text mining and data-gathering opportunities from standard hospital clinical data repositories.
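The training labels here come "for free" from templated reports that state an explicit LI-RADS category. A hedged sketch of that weak-labeling idea — the pattern and report wording below are assumptions for illustration, not the study's actual template:

```python
import re

# Pull an explicitly stated LI-RADS category out of report text, so that
# templated reports can auto-label themselves as classifier training data.
LIRADS_RE = re.compile(r"LI-RADS(?:\s+category)?[:\s]+(LR-?[1-5M]|[1-5])", re.IGNORECASE)

def extract_lirads(report_text):
    """Return the LI-RADS category string if the report states one, else None."""
    m = LIRADS_RE.search(report_text)
    return m.group(1).upper() if m else None

# Hypothetical report snippet (not from the study):
label = extract_lirads("Impression: 1.2 cm observation. LI-RADS category: LR-3.")
```

Reports where this returns None (e.g., pre-guideline unstructured reports) are the ones the trained model must then score on its own.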
Affiliation(s)
- Imon Banerjee
- Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building, Stanford, CA 94305-5479
- Hailye H Choi
- Department of Radiology, Stanford University School of Medicine, Stanford, CA 94305-5479
- Terry Desser
- Department of Radiology, Stanford University School of Medicine, Stanford, CA 94305-5479
- Daniel L Rubin
- Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building, Stanford, CA 94305-5479
- Department of Radiology, Stanford University School of Medicine, Stanford, CA 94305-5479
38
Bozkurt S, Park JI, Kan KM, Ferrari M, Rubin DL, Brooks JD, Hernandez-Boussard T. An Automated Feature Engineering for Digital Rectal Examination Documentation using Natural Language Processing. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018; 2018:288-294. [PMID: 30815067 PMCID: PMC6371344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Digital rectal examination (DRE) is considered a quality metric for prostate cancer care. However, much of the rich DRE-related information is documented as free text in clinical narratives. We therefore aimed to develop a natural language processing (NLP) pipeline that automatically identifies DRE documentation in clinical notes, using a domain-specific dictionary created by clinical experts together with an extended version of that dictionary learned from clinical notes with distributional semantics algorithms. The proposed pipeline was compared with a baseline NLP algorithm and was found superior in terms of precision (0.95) and recall (0.90) for identifying DRE documentation. We believe a rule-based NLP pipeline enriched with terms learned from the whole corpus can provide accurate and efficient identification of this quality metric.
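Extending an expert dictionary with distributional semantics typically means adding vocabulary terms whose embedding vectors lie close to a seed term's vector. A minimal stdlib-Python sketch of that idea, with toy 2-dimensional vectors as a stand-in for corpus-trained embeddings (the terms and threshold are illustrative assumptions, not the study's actual dictionary):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is the zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def expand_dictionary(seed_terms, embeddings, threshold=0.8):
    """Add any vocabulary term whose vector is close enough to some seed term's vector."""
    expanded = set(seed_terms)
    for term, vec in embeddings.items():
        for s in seed_terms:
            if s in embeddings and cosine(vec, embeddings[s]) >= threshold:
                expanded.add(term)
                break
    return expanded

# Toy embeddings (illustrative only; the study learned vectors from its clinical corpus):
emb = {"dre": [1.0, 0.1], "rectal": [0.9, 0.2], "mri": [0.0, 1.0]}
terms = expand_dictionary({"dre"}, emb)
```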
Affiliation(s)
- Selen Bozkurt
- Department of Medicine, Center for Biomedical Informatics Research, Stanford University, Stanford, CA
- Department of Biomedical Data Science, Stanford University, Stanford, CA
- Jung In Park
- Department of Medicine, Center for Biomedical Informatics Research, Stanford University, Stanford, CA
- Kathleen Mary Kan
- Department of Urology, Stanford University School of Medicine, Stanford, CA
- Michelle Ferrari
- Department of Urology, Stanford University School of Medicine, Stanford, CA
- Daniel L Rubin
- Department of Medicine, Center for Biomedical Informatics Research, Stanford University, Stanford, CA
- Department of Biomedical Data Science, Stanford University, Stanford, CA
- Department of Radiology, Stanford University School of Medicine, Stanford, CA
- James D Brooks
- Department of Urology, Stanford University School of Medicine, Stanford, CA
- Tina Hernandez-Boussard
- Department of Medicine, Center for Biomedical Informatics Research, Stanford University, Stanford, CA
- Department of Biomedical Data Science, Stanford University, Stanford, CA
39
Hassanzadeh H, Nguyen A, Karimi S, Chu K. Transferability of artificial neural networks for clinical document classification across hospitals: A case study on abnormality detection from radiology reports. J Biomed Inform 2018; 85:68-79. [PMID: 30026067 DOI: 10.1016/j.jbi.2018.07.017] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2018] [Revised: 06/25/2018] [Accepted: 07/14/2018] [Indexed: 10/28/2022]
Abstract
OBJECTIVE The application of machine learning techniques for automatic and reliable classification of clinical documents has shown promising results. However, machine learning models require abundant training data specific to each target hospital and may not be able to benefit from labeled data available at other hospitals because of data variations. Such training data limitations present one of the major obstacles to maximising the potential application of machine learning approaches in the healthcare domain. To cope with these limitations, we investigated the transferability of artificial neural network models across hospitals serving different age demographic groups (i.e., children, adults, and mixed). MATERIALS AND METHODS We explored the transferability of artificial neural networks for clinical document classification. Our case study was to detect abnormalities in limb X-ray reports obtained from the emergency departments (ED) of three hospitals in different domains. Different transfer learning scenarios were investigated in order to employ a source hospital's trained model to address a target hospital's abnormality detection problem. RESULTS A convolutional neural network (CNN) model was the most effective of the networks compared when it employed an embedding model trained on a large corpus of clinical documents. Furthermore, CNN models derived from a source hospital outperformed a conventional machine learning approach based on support vector machines (SVM) when applied to a different (target) hospital. These models were further improved by leveraging available training data in the target hospitals, outperforming models that used only the target hospital's data, with F1-scores of 0.92-0.96 across the three hospitals. DISCUSSION Our transfer learning model used only simple vector representations of documents, without any task-specific feature engineering. Transferring the CNN model significantly improved (by approximately 10% in F1-score) on the state-of-the-art approach for clinical document classification based on a trivially transferred model. In addition, the results showed that transfer learning techniques can further improve a CNN model trained on only a source or target hospital's data. CONCLUSION Transferring a pre-trained CNN model generated at one hospital to another facilitates the application of machine learning approaches while alleviating both hospital-specific feature engineering and training data requirements.
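The "simple vector representations of documents without any task-specific feature engineering" mentioned in the discussion usually amount to composing pre-trained word embeddings into a fixed-length document vector, which is what makes a model portable between hospitals. A hedged stdlib-Python sketch of the simplest such composition, mean pooling, using toy 3-dimensional vectors rather than the clinical-corpus embeddings the study trained:

```python
def document_vector(tokens, embeddings, dim):
    """Average pre-trained word vectors into a fixed-length document representation.
    Out-of-vocabulary tokens are skipped; an empty document maps to the zero vector."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Toy embeddings (illustrative only): "noted" is out of vocabulary and is skipped.
emb = {"fracture": [1.0, 0.0, 0.0], "distal": [0.0, 1.0, 0.0], "radius": [0.0, 0.0, 1.0]}
doc = document_vector(["fracture", "distal", "radius", "noted"], emb, 3)
```

Because the representation depends only on the shared embedding space, not on hospital-specific features, a classifier trained on such vectors at a source hospital can be applied or fine-tuned at a target hospital.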
Affiliation(s)
- Hamed Hassanzadeh
- The Australian e-Health Research Centre, CSIRO, Brisbane, Australia.
- Anthony Nguyen
- The Australian e-Health Research Centre, CSIRO, Brisbane, Australia.
- Sarvnaz Karimi
- Kevin Chu
- Royal Brisbane and Women's Hospital, Queensland Health, Brisbane, Australia.
40
Banerjee I, Gensheimer MF, Wood DJ, Henry S, Aggarwal S, Chang DT, Rubin DL. Probabilistic Prognostic Estimates of Survival in Metastatic Cancer Patients (PPES-Met) Utilizing Free-Text Clinical Narratives. Sci Rep 2018; 8:10037. [PMID: 29968730 PMCID: PMC6030075 DOI: 10.1038/s41598-018-27946-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2017] [Accepted: 06/12/2018] [Indexed: 02/07/2023] Open
Abstract
We propose a deep learning model, Probabilistic Prognostic Estimates of Survival in Metastatic Cancer Patients (PPES-Met), for estimating the short-term life expectancy (>3 months) of patients by analyzing free-text clinical notes in the electronic medical record while maintaining the temporal visit sequence. In a single framework, we integrated semantic data mapping and a neural embedding technique to produce a text processing method that extracts relevant information from heterogeneous types of clinical notes in an unsupervised manner, and we designed a recurrent neural network to model the temporal dependency of patient visits. The model was trained on a large dataset (10,293 patients) and validated on a separate dataset (1,818 patients). Our method achieved an area under the ROC curve (AUC) of 0.89. To provide explainability, we developed an interactive graphical tool that may improve physician understanding of the basis for the model's predictions. The high accuracy and explainability of PPES-Met may enable its use as a decision support tool to personalize metastatic cancer treatment, providing valuable assistance to physicians.
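The AUC of 0.89 reported here measures ranking quality: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal stdlib-Python sketch of AUC via this rank (Mann-Whitney) formulation, using toy labels and scores rather than the study's data:

```python
def auc_roc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a random positive outscores a random negative
    (ties count half). O(n^2) pairwise form, fine for small examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical labels and model scores):
auc = auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2])
```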
Affiliation(s)
- Imon Banerjee
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
- Michael F Gensheimer
- Douglas J Wood
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Solomon Henry
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Sonya Aggarwal
- Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Daniel T Chang
- Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Daniel L Rubin
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Biomedical Data Science, Radiology, and Medicine (BMIR), Stanford University, Stanford, CA, USA