AutoCriteria: a generalizable
clinical trial eligibility criteria extraction system powered by large language models.
J Am Med Inform Assoc 2024;
31:375-385. [PMID:
37952206 PMCID:
PMC10797270 DOI:
10.1093/jamia/ocad218]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2023] [Revised: 09/19/2023] [Accepted: 11/08/2023] [Indexed: 11/14/2023] Open
Abstract
OBJECTIVES
We aim to build a generalizable information extraction system leveraging large language models to extract granular eligibility criteria information for diverse diseases from free text clinical trial protocol documents. We investigate the model's capability to extract criteria entities along with contextual attributes including values, temporality, and modifiers and present the strengths and limitations of this system.
MATERIALS AND METHODS
The clinical trial data were acquired from https://ClinicalTrials.gov/. We developed a system, AutoCriteria, which comprises the following modules: preprocessing, knowledge ingestion, prompt modeling based on GPT, postprocessing, and interim evaluation. The final system evaluation was performed, both quantitatively and qualitatively, on 180 manually annotated trials encompassing 9 diseases.
RESULTS
AutoCriteria achieves an overall F1 score of 89.42 across all 9 diseases in extracting the criteria entities, with the highest being 95.44 for nonalcoholic steatohepatitis and the lowest of 84.10 for breast cancer. Its overall accuracy is 78.95% in identifying all contextual information across all diseases. Our thematic analysis indicated accurate logic interpretation of criteria as one of the strengths and overlooking/neglecting the main criteria as one of the weaknesses of AutoCriteria.
DISCUSSION
AutoCriteria demonstrates strong potential to extract granular eligibility criteria information from trial documents without requiring manual annotations. The prompts developed for AutoCriteria generalize well across different disease areas. Our evaluation suggests that the system handles complex scenarios including multiple arm conditions and logics.
CONCLUSION
AutoCriteria currently encompasses a diverse range of diseases and holds potential to extend to more in the future. This signifies a generalizable and scalable solution, poised to address the complexities of clinical trial application in real-world settings.
Collapse