Gao M, Varshney A, Chen S, Goddla V, Gallifant J, Doyle P, Novack C, Dillon-Martin M, Perkins T, Correia X, Duhaime E, Isenstein H, Sharon E, Lehmann LS, Kozono D, Anthony B, Dligach D, Bitterman DS. The use of large language models to enhance cancer clinical trial educational materials. JNCI Cancer Spectr 2025;9:pkaf021. [PMID: 39921887 DOI: 10.1093/jncics/pkaf021]
Abstract
BACKGROUND
Adequate patient awareness and understanding of cancer clinical trials are essential for trial recruitment, informed decision making, and protocol adherence. Although large language models (LLMs) have shown promise for patient education, their role in enhancing patient awareness of clinical trials remains unexplored. This study evaluated the performance and risks of LLMs in generating trial-specific educational content for potential participants.
METHODS
Generative Pretrained Transformer 4 (GPT4) was prompted to generate short clinical trial summaries and multiple-choice question-answer pairs from informed consent forms obtained from ClinicalTrials.gov. Zero-shot learning was used for summaries, with two prompting approaches: direct summarization, and sequential extraction followed by summarization. One-shot learning was used for question-answer pair development. We evaluated performance through patient surveys of summary effectiveness and crowdsourced annotation of question-answer pair accuracy, using held-out cancer trial informed consent forms not used in prompt development.
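To make the two prompting regimes concrete, a minimal sketch is given below, assuming the OpenAI Python SDK and its Chat Completions API. The prompt wording, the one-shot example, and the helper names summarize_consent_form and generate_qa_pairs are illustrative assumptions, not the authors' actual prompts or materials.

```python
# Minimal sketch of zero-shot summarization and one-shot question-answer
# generation, assuming the OpenAI Python SDK (openai>=1.0). All prompt
# text below is hypothetical, not taken from the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def summarize_consent_form(consent_text: str) -> str:
    """Zero-shot direct summarization: no worked example in the prompt."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You write short, patient-friendly summaries of "
                        "cancer clinical trial informed consent forms."},
            {"role": "user", "content": consent_text},
        ],
    )
    return response.choices[0].message.content


def generate_qa_pairs(consent_text: str) -> str:
    """One-shot generation: a single worked example of the desired
    multiple-choice format is included in the prompt."""
    example = (
        "Example consent excerpt: 'Participants will receive the study "
        "drug every 3 weeks.'\n"
        "Q: How often is the study drug given?\n"
        "A) Daily  B) Weekly  C) Every 3 weeks  D) Monthly\n"
        "Answer: C"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Generate multiple-choice question-answer pairs "
                        "that test comprehension of a consent form. "
                        "Follow the format of the example."},
            {"role": "user",
             "content": f"{example}\n\nConsent form:\n{consent_text}"},
        ],
    )
    return response.choices[0].message.content
```

The sequential extraction-and-summarization variant would simply chain two such calls: one prompt extracting the core trial details, and a second summarizing the extracted text.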
RESULTS
For summaries, both prompting approaches achieved comparable results for readability and core content. Patients found the summaries understandable and reported that they improved clinical trial comprehension and interest in learning more about trials. The generated multiple-choice questions achieved high accuracy and agreement with crowdsourced annotators. For both summaries and multiple-choice questions, GPT4 was most likely to include inaccurate information when prompted to provide information that was not adequately described in the informed consent forms.
CONCLUSIONS
LLMs such as GPT4 show promise in generating patient-friendly educational content for clinical trials with minimal trial-specific engineering. The findings serve as a proof of concept for the role of LLMs in improving patient education and engagement in clinical trials, and they underscore the need for ongoing human oversight.