1
Finkelstein J, Smiley A, Echeverria C, Mooney K. AI-Driven Prediction of Symptom Trajectories in Cancer Care: A Deep Learning Approach for Chemotherapy Management. Bioengineering (Basel) 2024; 11:1172. [PMID: 39593830 PMCID: PMC11592055 DOI: 10.3390/bioengineering11111172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Revised: 11/14/2024] [Accepted: 11/17/2024] [Indexed: 11/28/2024] Open
Abstract
This study presents an advanced method for predicting symptom escalation in chemotherapy patients using Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs). The accurate prediction of symptom escalation is critical in cancer care to enable timely interventions and improve symptom management to enhance patients' quality of life during treatment. The analytical dataset consists of daily self-reported symptom logs from chemotherapy patients, including a wide range of symptoms, such as nausea, fatigue, and pain. The original dataset was highly imbalanced, with approximately 84% of the data containing no symptom escalation. The data were resampled into varying interval lengths to address this imbalance and improve the model's ability to detect symptom escalation (n = 3 to n = 7 days). This allowed the model to predict significant changes in symptom severity across these intervals. The results indicate that shorter intervals (n = 3 days) yielded the highest overall performance, with the CNN model achieving an accuracy of 81%, precision of 87%, recall of 80%, and an F1 score of 83%. This was an improvement over the LSTM model, which had an accuracy of 79%, precision of 85%, recall of 79%, and an F1 score of 82%. The model's accuracy and recall declined as the interval length increased, though precision remained relatively stable. The findings demonstrate that both CNN's temporospatial feature extraction and LSTM's temporal modeling effectively capture escalation patterns in symptom progression. By integrating these predictive models into digital health systems, healthcare providers can offer more personalized and proactive care, enabling earlier interventions that may reduce symptom burden and improve treatment adherence. Ultimately, this approach has the potential to significantly enhance the overall quality of life for chemotherapy patients by providing real-time insights into symptom trajectories and guiding clinical decision making.
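As a quick consistency check on the figures quoted in this abstract, the F1 score can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f1_score` is an assumption of this example, not the authors' code, and the inputs are the rounded percentages from the abstract, so agreement can only be expected to two decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract for the 3-day interval.
reported = {
    "CNN": {"precision": 0.87, "recall": 0.80, "f1": 0.83},
    "LSTM": {"precision": 0.85, "recall": 0.79, "f1": 0.82},
}

for name, m in reported.items():
    computed = f1_score(m["precision"], m["recall"])
    print(f"{name}: computed F1 = {computed:.3f}, reported F1 = {m['f1']:.2f}")
```

Under these assumptions, both recomputed values round to the F1 scores stated in the abstract.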
Affiliation(s)
- Joseph Finkelstein
  - Department of Biomedical Informatics, The University of Utah, Salt Lake City, UT 84108, USA
- Aref Smiley
  - Department of Biomedical Informatics, The University of Utah, Salt Lake City, UT 84108, USA
- Christina Echeverria
  - College of Nursing, The University of Utah, Salt Lake City, UT 84112, USA
- Kathi Mooney
  - College of Nursing, The University of Utah, Salt Lake City, UT 84112, USA
2
Fu YV, Ramachandran GK, Halwani A, McInnes BT, Xia F, Lybarger K, Yetisgen M, Uzuner Ö. CACER: Clinical concept Annotations for Cancer Events and Relations. J Am Med Inform Assoc 2024; 31:2583-2594. [PMID: 39225779 PMCID: PMC11491616 DOI: 10.1093/jamia/ocae231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Revised: 08/08/2024] [Accepted: 08/12/2024] [Indexed: 09/04/2024] Open
Abstract
OBJECTIVE Clinical notes contain unstructured representations of patient histories, including the relationships between medical problems and prescription drugs. To investigate the relationship between cancer drugs and their associated symptom burden, we extract structured, semantic representations of medical problem and drug information from the clinical narratives of oncology notes. MATERIALS AND METHODS We present Clinical concept Annotations for Cancer Events and Relations (CACER), a novel corpus with fine-grained annotations for over 48 000 medical problems and drug events and 10 000 drug-problem and problem-problem relations. Leveraging CACER, we develop and evaluate transformer-based information extraction models such as Bidirectional Encoder Representations from Transformers (BERT), Fine-tuned Language Net Text-To-Text Transfer Transformer (Flan-T5), Large Language Model Meta AI (Llama3), and Generative Pre-trained Transformers-4 (GPT-4) using fine-tuning and in-context learning (ICL). RESULTS In event extraction, the fine-tuned BERT and Llama3 models achieved the highest performance at 88.2-88.0 F1, which is comparable to the inter-annotator agreement (IAA) of 88.4 F1. In relation extraction, the fine-tuned BERT, Flan-T5, and Llama3 achieved the highest performance at 61.8-65.3 F1. GPT-4 with ICL achieved the worst performance across both tasks. DISCUSSION The fine-tuned models significantly outperformed GPT-4 in ICL, highlighting the importance of annotated training data and model optimization. Furthermore, the BERT models performed similarly to Llama3. For our task, large language models offer no performance advantage over the smaller BERT models. CONCLUSIONS We introduce CACER, a novel corpus with fine-grained annotations for medical problems, drugs, and their relationships in clinical narratives of oncology notes. State-of-the-art transformer models achieved performance comparable to IAA for several extraction tasks.
Affiliation(s)
- Yujuan Velvin Fu
  - Department of Biomedical Informatics & Medical Education, University of Washington, Seattle, WA 98195, United States
- Ahmad Halwani
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
- Bridget T McInnes
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
- Fei Xia
  - Department of Linguistics, University of Washington, Seattle, WA 98195, United States
- Kevin Lybarger
  - Department of Information Sciences and Technology, George Mason University, Fairfax, VA 22030, United States
- Meliha Yetisgen
  - Department of Biomedical Informatics & Medical Education, University of Washington, Seattle, WA 98195, United States
- Özlem Uzuner
  - Department of Information Sciences and Technology, George Mason University, Fairfax, VA 22030, United States
3
Bryant AK, Zamora‐Resendiz R, Dai X, Morrow D, Lin Y, Jungles KM, Rae JM, Tate A, Pearson AN, Jiang R, Fritsche L, Lawrence TS, Zou W, Schipper M, Ramnath N, Yoo S, Crivelli S, Green MD. Artificial intelligence to unlock real-world evidence in clinical oncology: A primer on recent advances. Cancer Med 2024; 13:e7253. [PMID: 38899720 PMCID: PMC11187737 DOI: 10.1002/cam4.7253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 02/05/2024] [Accepted: 04/28/2024] [Indexed: 06/21/2024] Open
Abstract
PURPOSE Real world evidence is crucial to understanding the diffusion of new oncologic therapies, monitoring cancer outcomes, and detecting unexpected toxicities. In practice, real world evidence is challenging to collect rapidly and comprehensively, often requiring expensive and time-consuming manual case-finding and annotation of clinical text. In this Review, we summarise recent developments in the use of artificial intelligence to collect and analyze real world evidence in oncology. METHODS We performed a narrative review of the major current trends and recent literature in artificial intelligence applications in oncology. RESULTS Artificial intelligence (AI) approaches are increasingly used to efficiently phenotype patients and tumors at large scale. These tools also may provide novel biological insights and improve risk prediction through multimodal integration of radiographic, pathological, and genomic datasets. Custom language processing pipelines and large language models hold great promise for clinical prediction and phenotyping. CONCLUSIONS Despite rapid advances, continued progress in computation, generalizability, interpretability, and reliability as well as prospective validation are needed to integrate AI approaches into routine clinical care and real-time monitoring of novel therapies.
Affiliation(s)
- Alex K. Bryant
  - Department of Radiation Oncology, University of Michigan School of Medicine, Ann Arbor, Michigan, USA
  - Department of Radiation Oncology, Veterans Affairs Ann Arbor Healthcare System, Ann Arbor, Michigan, USA
- Rafael Zamora‐Resendiz
  - Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Xin Dai
  - Computational Science Initiative, Brookhaven National Laboratory, Upton, New York, USA
- Destinee Morrow
  - Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Yuewei Lin
  - Computational Science Initiative, Brookhaven National Laboratory, Upton, New York, USA
- Kassidy M. Jungles
  - Department of Pharmacology, University of Michigan School of Medicine, Ann Arbor, Michigan, USA
- James M. Rae
  - Department of Pharmacology, University of Michigan School of Medicine, Ann Arbor, Michigan, USA
  - Department of Internal Medicine, University of Michigan School of Medicine, Ann Arbor, Michigan, USA
- Akshay Tate
  - Department of Radiation Oncology, University of Michigan School of Medicine, Ann Arbor, Michigan, USA
- Ashley N. Pearson
  - Department of Radiation Oncology, University of Michigan School of Medicine, Ann Arbor, Michigan, USA
- Ralph Jiang
  - Department of Radiation Oncology, University of Michigan School of Medicine, Ann Arbor, Michigan, USA
  - Department of Statistics, University of Michigan, Ann Arbor, Michigan, USA
- Lars Fritsche
  - Department of Statistics, University of Michigan, Ann Arbor, Michigan, USA
- Theodore S. Lawrence
  - Department of Radiation Oncology, University of Michigan School of Medicine, Ann Arbor, Michigan, USA
- Weiping Zou
  - Department of Statistics, University of Michigan, Ann Arbor, Michigan, USA
  - Center of Excellence for Cancer Immunology and Immunotherapy, University of Michigan Rogel Cancer Center, Ann Arbor, Michigan, USA
  - Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA
  - Graduate Program in Immunology, University of Michigan, Ann Arbor, Michigan, USA
- Matthew Schipper
  - Department of Radiation Oncology, University of Michigan School of Medicine, Ann Arbor, Michigan, USA
  - Department of Pharmacology, University of Michigan School of Medicine, Ann Arbor, Michigan, USA
- Nithya Ramnath
  - Division of Hematology Oncology, Department of Medicine, University of Michigan School of Medicine, Ann Arbor, Michigan, USA
  - Division of Hematology Oncology, Department of Medicine, Veterans Affairs Ann Arbor Healthcare System, Ann Arbor, Michigan, USA
- Shinjae Yoo
  - Computational Science Initiative, Brookhaven National Laboratory, Upton, New York, USA
- Silvia Crivelli
  - Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Michael D. Green
  - Department of Radiation Oncology, University of Michigan School of Medicine, Ann Arbor, Michigan, USA
  - Department of Radiation Oncology, Veterans Affairs Ann Arbor Healthcare System, Ann Arbor, Michigan, USA
  - Graduate Program in Immunology, University of Michigan, Ann Arbor, Michigan, USA
  - Graduate Program in Cancer Biology, University of Michigan, Ann Arbor, Michigan, USA
  - Department of Microbiology and Immunology, University of Michigan School of Medicine, Ann Arbor, Michigan, USA
4
Sim JA, Huang X, Horan MR, Baker JN, Huang IC. Using natural language processing to analyze unstructured patient-reported outcomes data derived from electronic health records for cancer populations: a systematic review. Expert Rev Pharmacoecon Outcomes Res 2024; 24:467-475. [PMID: 38383308 PMCID: PMC11001514 DOI: 10.1080/14737167.2024.2322664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Accepted: 02/20/2024] [Indexed: 02/23/2024]
Abstract
INTRODUCTION Patient-reported outcomes (PROs; symptoms, functional status, quality-of-life) expressed in the 'free-text' or 'unstructured' format within clinical notes from electronic health records (EHRs) offer valuable insights beyond biological and clinical data for medical decision-making. However, a comprehensive assessment of utilizing natural language processing (NLP) coupled with machine learning (ML) methods to analyze unstructured PROs and their clinical implementation for individuals affected by cancer remains lacking. AREAS COVERED This study aimed to systematically review published studies that used NLP techniques to extract and analyze PROs in clinical narratives from EHRs for cancer populations. We examined the types of NLP (with and without ML) techniques and platforms for data processing, analysis, and clinical applications. EXPERT OPINION Utilizing NLP methods offers a valuable approach for processing and analyzing unstructured PROs among cancer patients and survivors. These techniques encompass a broad range of applications, such as extracting or recognizing PROs, categorizing, characterizing, or grouping PROs, predicting or stratifying risk for unfavorable clinical results, and evaluating connections between PROs and adverse clinical outcomes. The employment of NLP techniques is advantageous in converting substantial volumes of unstructured PRO data within EHRs into practical clinical utilities for individuals with cancer.
Affiliation(s)
- Jin-ah Sim
  - Department of Epidemiology and Cancer Control, St. Jude Children’s Research Hospital, Memphis, TN, USA
  - Department of AI Convergence, Hallym University, Chuncheon, Republic of Korea
- Xiaolei Huang
  - Department of Computer Science, University of Memphis, Memphis, Tennessee, United States
- Madeline R. Horan
  - Department of Epidemiology and Cancer Control, St. Jude Children’s Research Hospital, Memphis, TN, USA
- Justin N. Baker
  - Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN, USA
- I-Chan Huang
  - Department of Epidemiology and Cancer Control, St. Jude Children’s Research Hospital, Memphis, TN, USA
5
Lin H, Ni L, Phuong C, Hong JC. Natural Language Processing for Radiation Oncology: Personalizing Treatment Pathways. Pharmgenomics Pers Med 2024; 17:65-76. [PMID: 38370334 PMCID: PMC10874185 DOI: 10.2147/pgpm.s396971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Accepted: 01/29/2024] [Indexed: 02/20/2024] Open
Abstract
Natural language processing (NLP), a technology that translates human language into machine-readable data, is revolutionizing numerous sectors, including cancer care. This review outlines the evolution of NLP and its potential for crafting personalized treatment pathways for cancer patients. Leveraging NLP's ability to transform unstructured medical data into structured learnable formats, researchers can tap into the potential of big data for clinical and research applications. Significant advancements in NLP have spurred interest in developing tools that automate information extraction from clinical text, potentially transforming medical research and clinical practices in radiation oncology. Applications discussed include symptom and toxicity monitoring, identification of social determinants of health, improving patient-physician communication, patient education, and predictive modeling. However, several challenges impede the full realization of NLP's benefits, such as privacy and security concerns, biases in NLP models, and the interpretability and generalizability of these models. Overcoming these challenges necessitates a collaborative effort between computer scientists and the radiation oncology community. This paper serves as a comprehensive guide to understanding the intricacies of NLP algorithms, their performance assessment, past research contributions, and the future of NLP in radiation oncology research and clinics.
Affiliation(s)
- Hui Lin
  - Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, USA
  - UC Berkeley-UCSF Graduate Program in Bioengineering, University of California, Berkeley and San Francisco, San Francisco, CA, USA
- Lisa Ni
  - Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, USA
- Christina Phuong
  - Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, USA
- Julian C Hong
  - Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, USA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
  - Joint Program in Computational Precision Health, University of California, Berkeley and San Francisco, Berkeley, CA, USA
6
Pappot H, Björnsson BP, Krause O, Bæksted C, Bidstrup PE, Dalton SO, Johansen C, Knoop A, Vogelius I, Holländer-Mieritz C. Machine learning applied in patient-reported outcome research-exploring symptoms in adjuvant treatment of breast cancer. Breast Cancer 2024; 31:148-153. [PMID: 37940813 DOI: 10.1007/s12282-023-01515-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 10/15/2023] [Indexed: 11/10/2023]
Abstract
BACKGROUND Patient-reported outcome (PRO) data may help us better understand the life of breast cancer patients. We have previously collected PRO data in a national Danish breast cancer study in patients undergoing adjuvant chemotherapy. The aim of the present post-hoc explorative study is to apply Machine Learning (ML) algorithms using permutation importance to explore how specific PRO symptoms influence nonadherence to six cycles of planned adjuvant chemotherapy in breast cancer patients. METHODS We here investigate ePRO-data from the 347 patients. The ePRO presented 42 PROCTCAE questions on 25 symptoms. Patients completed the ePRO before each cycle of chemotherapy. Number of patients with completion of the scheduled six cycles of chemotherapy were registered. Two ML models were applied. One aimed at discovering the individual relative importance of the different questions in the dataset while the second aimed at discovering the relationships between the questions. Permutation importance was used. RESULTS Out of 347 patients 238 patients remained in the final dataset, 15 patients dropped out. Two symptoms: aching joints and numbness/tingling, were the most important for dropout in the final dataset, each with an importance value of about 0.04. Model's average ROC-AUC-score being 0.706. In the second model a low performance score made the results very unreliable. CONCLUSION In conclusion, this explorative data analysis using ML methodologies in an ePRO dataset from a population of women with breast cancer treated with adjuvant chemotherapy unravels that the symptoms aching joints and numbness/tingling could be important for drop out of planned adjuvant chemotherapy.
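The permutation importance technique named in this abstract can be sketched as follows: shuffle one feature column at a time and measure how much a fitted model's score drops. This is a minimal illustration, not the study's pipeline. The toy data, the threshold "model", and the use of accuracy (the abstract reports ROC-AUC) are all assumptions made to keep the example self-contained.

```python
import random
from typing import Callable, List

Row = List[float]
Model = Callable[[Row], int]


def accuracy(model: Model, X: List[Row], y: List[int]) -> float:
    """Fraction of rows the model labels correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(
    model: Model, X: List[Row], y: List[int], n_repeats: int = 10, seed: int = 0
) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: feature 0 determines the label, feature 1 is noise.
X = [[float(i % 2), float(i % 3)] for i in range(20)]
y = [int(row[0]) for row in X]


def toy_model(row: Row) -> int:
    """Stand-in for a fitted classifier; it looks only at feature 0."""
    return int(row[0] > 0.5)


importances = permutation_importance(toy_model, X, y)
print(importances)
```

In this toy setup the ignored feature gets an importance of exactly zero, while shuffling the informative feature lowers accuracy. Importance values as small as the roughly 0.04 reported in the abstract would correspond to a much weaker dependence than in this example.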
Affiliation(s)
- Helle Pappot
  - Department of Oncology, Rigshospitalet Section 5073, University Hospital of Copenhagen, Blegdamsvej 9, 2100, Copenhagen, Denmark
  - Institute of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Benóný P Björnsson
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Oswin Krause
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Pernille E Bidstrup
  - Danish Cancer Society Research Center, Copenhagen, Denmark
  - Institute of Psychology, University of Copenhagen, Copenhagen, Denmark
- Susanne O Dalton
  - Institute of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
  - Danish Cancer Society Research Center, Copenhagen, Denmark
- Christoffer Johansen
  - Department of Oncology, Rigshospitalet Section 5073, University Hospital of Copenhagen, Blegdamsvej 9, 2100, Copenhagen, Denmark
  - Institute of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Ann Knoop
  - Department of Oncology, Rigshospitalet Section 5073, University Hospital of Copenhagen, Blegdamsvej 9, 2100, Copenhagen, Denmark
- Ivan Vogelius
  - Department of Oncology, Rigshospitalet Section 5073, University Hospital of Copenhagen, Blegdamsvej 9, 2100, Copenhagen, Denmark
  - Institute of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Cecilie Holländer-Mieritz
  - Department of Oncology, Rigshospitalet Section 5073, University Hospital of Copenhagen, Blegdamsvej 9, 2100, Copenhagen, Denmark
7
Chen S, Guevara M, Ramirez N, Murray A, Warner JL, Aerts HJWL, Miller TA, Savova GK, Mak RH, Bitterman DS. Natural Language Processing to Automatically Extract the Presence and Severity of Esophagitis in Notes of Patients Undergoing Radiotherapy. JCO Clin Cancer Inform 2023; 7:e2300048. [PMID: 37506330 DOI: 10.1200/cci.23.00048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/28/2023] [Revised: 05/05/2023] [Accepted: 05/26/2023] [Indexed: 07/30/2023] Open
Abstract
PURPOSE Radiotherapy (RT) toxicities can impair survival and quality of life, yet remain understudied. Real-world evidence holds potential to improve our understanding of toxicities, but toxicity information is often only in clinical notes. We developed natural language processing (NLP) models to identify the presence and severity of esophagitis from notes of patients treated with thoracic RT. METHODS Our corpus consisted of a gold-labeled data set of 1,524 clinical notes from 124 patients with lung cancer treated with RT, manually annotated for Common Terminology Criteria for Adverse Events (CTCAE) v5.0 esophagitis grade, and a silver-labeled data set of 2,420 notes from 1,832 patients from whom toxicity grades had been collected as structured data during clinical care. We fine-tuned statistical and pretrained Bidirectional Encoder Representations from Transformers-based models for three esophagitis classification tasks: task 1, no esophagitis versus grade 1-3; task 2, grade ≤1 versus >1; and task 3, no esophagitis versus grade 1 versus grade 2-3. Transferability was tested on 345 notes from patients with esophageal cancer undergoing RT. RESULTS Fine-tuning of PubMedBERT yielded the best performance. The best macro-F1 was 0.92, 0.82, and 0.74 for tasks 1, 2, and 3, respectively. Selecting the most informative note sections during fine-tuning improved macro-F1 by ≥2% for all tasks. Silver-labeled data improved the macro-F1 by ≥3% across all tasks. For the esophageal cancer notes, the best macro-F1 was 0.73, 0.74, and 0.65 for tasks 1, 2, and 3, respectively, without additional fine-tuning. CONCLUSION To our knowledge, this is the first effort to automatically extract esophagitis toxicity severity according to CTCAE guidelines from clinical notes. This provides proof of concept for NLP-based automated detailed toxicity monitoring in expanded domains.
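The three classification tasks and the macro-F1 metric described in this abstract can be illustrated with a short sketch. The grade-to-label mapping follows the task definitions in the abstract; the function names and the toy grades are hypothetical, and nothing here reproduces the authors' models or data.

```python
from typing import Dict, List


def task_labels(grade: int) -> Dict[str, int]:
    """Map a CTCAE esophagitis grade (0-3) to the three label schemes."""
    return {
        "task1": 0 if grade == 0 else 1,  # no esophagitis vs grade 1-3
        "task2": 0 if grade <= 1 else 1,  # grade <=1 vs grade >1
        "task3": min(grade, 2),           # none vs grade 1 vs grade 2-3
    }


def macro_f1(y_true: List[int], y_pred: List[int]) -> float:
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical gold and predicted grades for six notes.
grades_true = [0, 0, 1, 1, 2, 3]
grades_pred = [0, 0, 1, 2, 2, 3]

for task in ("task1", "task2", "task3"):
    t = [task_labels(g)[task] for g in grades_true]
    p = [task_labels(g)[task] for g in grades_pred]
    print(task, round(macro_f1(t, p), 3))
```

Because macro-F1 weights every class equally, a single misgraded note costs more in the three-class task than in the binary ones, which is consistent with the lower task 3 scores reported in the abstract.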
Affiliation(s)
- Shan Chen
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
- Marco Guevara
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
- Nicolas Ramirez
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
- Arpi Murray
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
- Jeremy L Warner
  - Population Sciences Program, Legorreta Cancer Center, Brown University, Providence, RI
  - Lifespan Cancer Institute, Providence, RI
- Hugo J W L Aerts
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
  - Radiology and Nuclear Medicine, GROW & CARIM, Maastricht University, Maastricht, the Netherlands
- Timothy A Miller
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA
- Guergana K Savova
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA
- Raymond H Mak
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
- Danielle S Bitterman
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
8
Scoarta S, Küçükosmanoglu A, Bindt F, Pouwer M, Westerman BA. Review: A Roadmap to Use Nonstructured Data to Discover Multitarget Cancer Therapies. JCO Clin Cancer Inform 2023; 7:e2200096. [PMID: 37116097 PMCID: PMC10281332 DOI: 10.1200/cci.22.00096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 12/29/2022] [Accepted: 03/01/2023] [Indexed: 04/30/2023] Open
Abstract
Therapy resistance to single agents has led to the realization that combination therapies could become the cornerstone of cancer treatment. To operationalize the selection of effective and safe multitarget therapies, we propose to integrate chemical and preclinical therapeutic information with clinical efficacy and toxicity data, allowing a new perspective on the drug target landscape. To assess the feasibility of this approach, we evaluated the publicly available chemical, preclinical, and clinical therapeutic data, and we addressed some potential limitations while integrating the data. First, by mapping available structured data from the main biomedical resources, we noticed that there is only a 1.7% overlap between drugs in chemical, preclinical, or clinical databases. Especially, the limited amount of structured data in the clinical domain hinders linking drugs to clinical aspects such as efficacy and side effects. Second, to overcome the abovementioned knowledge gap between the chemical, preclinical, and clinical domain, we suggest information extraction from scientific literature and other unstructured resources through natural language processing models, where BioBERT and PubMedBERT are the current state-of-the-art approaches. Finally, we propose that knowledge graphs can be used to link structured data, scientific literature, and electronic health records, to come to meaningful interpretations. Together, we expect this richer knowledge will lower barriers toward clinical application of personalized combination therapies with high efficacy and limited adverse events.
Affiliation(s)
- Silvia Scoarta
  - Department of Neurosurgery, Brain Tumor Center Amsterdam, Amsterdam University Medical Center, Cancer Center Amsterdam, Amsterdam, the Netherlands
  - The WINDOW Consortium, a collaboration between Amsterdam UMC, University of Birmingham, Birmingham, UK, and IOTA Pharmaceuticals, St Johns Innovation Centre, Cambridge, UK
- Asli Küçükosmanoglu
  - Department of Neurosurgery, Brain Tumor Center Amsterdam, Amsterdam University Medical Center, Cancer Center Amsterdam, Amsterdam, the Netherlands
  - The Toxicity-Atlas Consortium, a collaboration between Amsterdam UMC and Medstone, supported by the IKNL (Integrative Cancer-Center the Netherlands), Eindhoven, the Netherlands
- Felix Bindt
  - Department of Pharmaceutical Sciences, Faculty of Science, Utrecht University, Utrecht, the Netherlands
- Marianne Pouwer
  - The WINDOW Consortium, a collaboration between Amsterdam UMC, University of Birmingham, Birmingham, UK, and IOTA Pharmaceuticals, St Johns Innovation Centre, Cambridge, UK
  - Medstone Science, Almere, the Netherlands
- Bart A. Westerman
  - Department of Neurosurgery, Brain Tumor Center Amsterdam, Amsterdam University Medical Center, Cancer Center Amsterdam, Amsterdam, the Netherlands
  - The WINDOW Consortium, a collaboration between Amsterdam UMC, University of Birmingham, Birmingham, UK, and IOTA Pharmaceuticals, St Johns Innovation Centre, Cambridge, UK
9
Petch J, Kempainnen J, Pettengell C, Aviv S, Butler B, Pond G, Saha A, Bogach J, Allard-Coutu A, Sztur P, Ranisau J, Levine M. Developing a Data and Analytics Platform to Enable a Breast Cancer Learning Health System at a Regional Cancer Center. JCO Clin Cancer Inform 2023; 7:e2200182. [PMID: 37001040 PMCID: PMC10281330 DOI: 10.1200/cci.22.00182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Accepted: 02/10/2023] [Indexed: 04/03/2023] Open
Abstract
PURPOSE This study documents the creation of automated, longitudinal, and prospective data and analytics platform for breast cancer at a regional cancer center. This platform combines principles of data warehousing with natural language processing (NLP) to provide the integrated, timely, meaningful, high-quality, and actionable data required to establish a learning health system. METHODS Data from six hospital information systems and one external data source were integrated on a nightly basis by automated extract/transform/load jobs. Free-text clinical documentation was processed using a commercial NLP engine. RESULTS The platform contains 141 data elements of 7,019 patients with newly diagnosed breast cancer who received care at our regional cancer center from January 1, 2014, to June 3, 2022. Daily updating of the database takes an average of 56 minutes. Evaluation of the tuning of NLP jobs found overall high performance, with an F1 of 1.0 for 19 variables, with a further 16 variables with an F1 of > 0.95. CONCLUSION This study describes how data warehousing combined with NLP can be used to create a prospective data and analytics platform to enable a learning health system. Although upfront time investment required to create the platform was considerable, now that it has been developed, daily data processing is completed automatically in less than an hour.
Affiliation(s)
- Jeremy Petch
  - Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, Canada
  - Institute for Health Policy Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
  - Division of Cardiology, Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, Canada
  - Population Health Research Institute, Hamilton Health Sciences, Hamilton, Canada
- Joel Kempainnen
  - Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, Canada
- Greg Pond
  - Escarpment Cancer Research Institute, Hamilton Health Sciences, Hamilton, Canada
- Ashirbani Saha
  - Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, Canada
  - Escarpment Cancer Research Institute, Hamilton Health Sciences, Hamilton, Canada
  - Department of Oncology, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Jessica Bogach
  - Department of Surgery, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Peter Sztur
  - Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, Canada
- Jonathan Ranisau
  - Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, Canada
- Mark Levine
  - Hamilton Health Sciences, Hamilton, Canada
  - Escarpment Cancer Research Institute, Hamilton Health Sciences, Hamilton, Canada
|
10
|
Durieux BN, Zverev SR, Tarbi EC, Kwok A, Sciacca K, Pollak KI, Tulsky JA, Lindvall C. Development of a keyword library for capturing PRO-CTCAE-focused "symptom talk" in oncology conversations. JAMIA Open 2023; 6:ooad009. [PMID: 36789287 PMCID: PMC9912707 DOI: 10.1093/jamiaopen/ooad009] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/10/2022] [Revised: 01/18/2023] [Accepted: 02/02/2023] [Indexed: 02/12/2023] Open
Abstract
Objectives As computational methods for detecting symptoms can help us better attend to patient suffering, the objectives of this study were to develop and evaluate the performance of a natural language processing keyword library for detecting symptom talk, and to describe symptom communication within our dataset to generate insights for future model building. Materials and Methods This was a secondary analysis of 121 transcribed outpatient oncology conversations from the Communication in Oncologist-Patient Encounters trial. Through an iterative process of identifying symptom expressions via inductive and deductive techniques, we generated a library of keywords relevant to the Patient-Reported Outcome version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE) framework from 90 conversations, and tested the library on 31 additional transcripts. To contextualize symptom expressions and the nature of misclassifications, we qualitatively analyzed 450 mislabeled and properly labeled symptom-positive turns. Results The final library, comprising 1320 terms, identified symptom talk among conversation turns with an F1 of 0.82 against a PRO-CTCAE-focused gold standard, and an F1 of 0.61 against a broad gold standard. Qualitative observations suggest that physical symptoms are more easily detected than psychological symptoms (eg, anxiety), and ambiguity persists throughout symptom communication. Discussion This rudimentary keyword library captures most PRO-CTCAE-focused symptom talk, but the ambiguity of symptom speech limits the utility of rule-based methods alone, and limits to generalizability must be considered. Conclusion Our findings highlight opportunities for more advanced computational models to detect symptom expressions from transcribed clinical conversations. Future improvements in speech-to-text could enable real-time detection at scale.
Affiliation(s)
- Brigitte N Durieux
  - Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Samuel R Zverev
  - Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - NYU School of Medicine, New York University, New York, New York, USA
- Elise C Tarbi
  - Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Department of Nursing, University of Vermont, Burlington, Vermont, USA
- Anne Kwok
  - Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Kate Sciacca
  - Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Department of Palliative Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, USA
- Kathryn I Pollak
  - Department of Population Health Sciences, Duke University School of Medicine, Duke University, Durham, North Carolina, USA
  - Cancer Prevention and Control Program, Duke Cancer Institute, Duke University, Durham, North Carolina, USA
- James A Tulsky
  - Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, USA
- Charlotta Lindvall
  - Corresponding Author: Charlotta Lindvall, MD, PhD, Department of Psychosocial Oncology & Palliative Care, Dana-Farber Cancer Institute, 450 Brookline Ave, LW670, Boston, MA 02215, USA;
|