1
Reason T, Benbow E, Langham J, Gimblett A, Klijn SL, Malcolm B. Artificial Intelligence to Automate Network Meta-Analyses: Four Case Studies to Evaluate the Potential Application of Large Language Models. Pharmacoecon Open 2024; 8:205-220. [PMID: 38340277 PMCID: PMC10884375 DOI: 10.1007/s41669-024-00476-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/01/2024] [Indexed: 02/12/2024]
Abstract
BACKGROUND The emergence of artificial intelligence, capable of human-level performance on some tasks, presents an opportunity to revolutionise the development of systematic reviews and network meta-analyses (NMAs). In this pilot study, we aim to assess the use of a large language model (LLM, Generative Pre-trained Transformer 4 [GPT-4]) to automatically extract data from publications, write an R script to conduct an NMA and interpret the results. METHODS We considered four case studies involving binary and time-to-event outcomes in two disease areas, for which an NMA had previously been conducted manually. For each case study, a Python script was developed that communicated with the LLM via application programming interface (API) calls. The LLM was prompted to extract relevant data from publications, to create an R script to be used to run the NMA and then to produce a small report describing the analysis. RESULTS The LLM had a > 99% success rate of accurately extracting data across 20 runs for each case study and could generate R scripts that could be run end-to-end without human input. It also produced good quality reports describing the disease area, analysis conducted, results obtained and a correct interpretation of the results. CONCLUSIONS This study provides a promising indication of the feasibility of using current generation LLMs to automate data extraction, code generation and NMA result interpretation, which could result in significant time savings and reduce human error. This is provided that routine technical checks are performed, as recommended for human-conducted analyses. Whilst not currently 100% consistent, LLMs are likely to improve with time.
Affiliation(s)
- Tim Reason
  - Estima Scientific, Mediaworks, 191 Wood Lane, London, W12 7FP, UK
- Emma Benbow
  - Estima Scientific, Mediaworks, 191 Wood Lane, London, W12 7FP, UK
- Julia Langham
  - Estima Scientific, Mediaworks, 191 Wood Lane, London, W12 7FP, UK
- Andy Gimblett
  - Estima Scientific, Mediaworks, 191 Wood Lane, London, W12 7FP, UK
2
Reason T, Rawlinson W, Langham J, Gimblett A, Malcolm B, Klijn S. Artificial Intelligence to Automate Health Economic Modelling: A Case Study to Evaluate the Potential Application of Large Language Models. Pharmacoecon Open 2024; 8:191-203. [PMID: 38340276 PMCID: PMC10884386 DOI: 10.1007/s41669-024-00477-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/01/2024] [Indexed: 02/12/2024]
Abstract
BACKGROUND Current generation large language models (LLMs) such as Generative Pre-Trained Transformer 4 (GPT-4) have achieved human-level performance on many tasks including the generation of computer code based on textual input. This study aimed to assess whether GPT-4 could be used to automatically programme two published health economic analyses. METHODS The two analyses were partitioned survival models evaluating interventions in non-small cell lung cancer (NSCLC) and renal cell carcinoma (RCC). We developed prompts which instructed GPT-4 to programme the NSCLC and RCC models in R, and which provided descriptions of each model's methods, assumptions and parameter values. The results of the generated scripts were compared to the published values from the original, human-programmed models. The models were replicated 15 times to capture variability in GPT-4's output. RESULTS GPT-4 fully replicated the NSCLC model with high accuracy: 100% (15/15) of the artificial intelligence (AI)-generated NSCLC models were error-free or contained a single minor error, and 93% (14/15) were completely error-free. GPT-4 closely replicated the RCC model, although human intervention was required to simplify an element of the model design (one of the model's fifteen input calculations) because it used too many sequential steps to be implemented in a single prompt. With this simplification, 87% (13/15) of the AI-generated RCC models were error-free or contained a single minor error, and 60% (9/15) were completely error-free. Error-free model scripts replicated the published incremental cost-effectiveness ratios to within 1%. CONCLUSION This study provides a promising indication that GPT-4 can have practical applications in the automation of health economic model construction. Potential benefits include accelerated model development timelines and reduced costs of development. 
Further research is necessary to explore the generalisability of LLM-based automation across a larger sample of models.
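The partitioned survival structure named in this abstract can be illustrated with a minimal sketch. The study's models were generated in R; Python is used here only for illustration. Every input below (exponential hazards, annual costs, utilities, the 3.5% discount rate, the 10-year horizon) is a hypothetical placeholder and not a value from the NSCLC or RCC models. The helper `within_tolerance` mirrors the kind of "within 1%" comparison against a published ICER described in the results.

```python
import math


def psm_totals(haz_pfs, haz_os, cost_pf, cost_pd, util_pf, util_pd,
               cycles=120, cycle_years=1 / 12, disc=0.035):
    """Discounted cost and QALYs from a three-state partitioned survival model.

    Assumes exponential PFS and OS curves (a simplification chosen for
    illustration), annual costs and utilities per state, and no
    half-cycle correction.
    """
    total_cost = 0.0
    total_qaly = 0.0
    for k in range(cycles):
        t = k * cycle_years
        s_pfs = math.exp(-haz_pfs * t)
        s_os = math.exp(-haz_os * t)
        progression_free = min(s_pfs, s_os)   # PFS cannot exceed OS
        progressed = s_os - progression_free  # area between the two curves
        discount = 1.0 / (1.0 + disc) ** t
        total_cost += discount * cycle_years * (
            progression_free * cost_pf + progressed * cost_pd)
        total_qaly += discount * cycle_years * (
            progression_free * util_pf + progressed * util_pd)
    return total_cost, total_qaly


def icer(new, old):
    """Incremental cost per QALY gained; inputs are (cost, qaly) tuples."""
    delta_qaly = new[1] - old[1]
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when incremental QALYs are zero")
    return (new[0] - old[0]) / delta_qaly


def within_tolerance(replicated, published, tol=0.01):
    """True if the replicated value is within a relative tolerance of the published one."""
    return abs(replicated - published) <= tol * abs(published)


# Hypothetical inputs for two treatment arms (not taken from the paper).
comparator = psm_totals(0.9, 0.5, cost_pf=20_000, cost_pd=10_000,
                        util_pf=0.75, util_pd=0.55)
intervention = psm_totals(0.6, 0.4, cost_pf=60_000, cost_pd=10_000,
                          util_pf=0.75, util_pd=0.55)
print(f"ICER: {icer(intervention, comparator):,.0f} per QALY")
```

In a replication check of the kind the abstract describes, the ICER from each generated script would be passed to `within_tolerance` together with the published value.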
Affiliation(s)
- Tim Reason
  - Estima Scientific, Mediaworks, 191 Wood Ln, London, W12 7FP, UK
- Julia Langham
  - Estima Scientific, Mediaworks, 191 Wood Ln, London, W12 7FP, UK
- Andy Gimblett
  - Estima Scientific, Mediaworks, 191 Wood Ln, London, W12 7FP, UK
- Sven Klijn
  - Bristol Myers Squibb, Princeton, NJ, USA
3
Reason T, McCrea C, Hettle R, Ghate S, Poehlein CH, Olmos D. Indirect treatment comparison of the efficacy of olaparib 300 mg tablets BID and cabazitaxel 25 mg/m² every 3 weeks plus daily prednisolone and granulocyte colony-stimulating factor in the treatment of patients with metastatic castration-resistant prostate cancer (mCRPC). J Clin Oncol 2021. [DOI: 10.1200/jco.2021.39.15_suppl.5051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
Abstract
5051 Background: In PROfound, olaparib demonstrated improved radiological PFS (rPFS) and overall survival (OS) versus new hormonal agent (NHA) in patients with homologous recombination repair mutated (HRRm) mCRPC that had progressed on prior NHA. This efficacy was observed across prespecified subgroups including patients treated with prior taxane therapy and for whom intravenous cabazitaxel is an alternative treatment option. The relative efficacy of olaparib versus cabazitaxel has not been assessed in head-to-head studies. An indirect treatment comparison (ITC) was performed to simulate the comparative efficacy of olaparib and cabazitaxel in patients with HRRm mCRPC after prior taxane and NHA. Methods: Fixed-effects frequentist ITCs were conducted using efficacy data from the prior taxane subgroup of PROfound (NCT02987543) and published data from the Phase IV CARD study of cabazitaxel versus NHA after prior NHA and taxane treatment (NCT02485691). Baseline variables feasible for comparison across studies were assessed for effect modification. Efficacy analyses were performed on the hazard ratios (HR) of rPFS by independent central review and OS. The OS analysis was performed using the final PROfound OS results, which included switching from NHA to olaparib after progression, and using results that were adjusted for switching. In the absence of biomarker subgroup data, the efficacy results of the overall population in CARD were assumed generalizable to the HRRm biomarker population of PROfound, such that mutation status is not a modifier of relative treatment effect for cabazitaxel versus NHA. Results were presented for the comparison of olaparib with cabazitaxel in the BRCA1-/BRCA2-mutated (BRCAm) and BRCAm/ATM populations. Results: The ITC HR for rPFS was 0.36 (95% confidence interval 0.20–0.64) in BRCAm and 0.51 (0.31–0.84) for the BRCAm/ATM population. 
Without adjustment for switching in PROfound, the ITC HRs for OS in the BRCAm population and BRCAm/ATM population were 0.99 (0.55–1.78) and 0.88 (0.52–1.47), respectively; after switch adjustment, the OS HRs were 0.47 (0.12–1.79) and 0.44 (0.17–1.10), respectively. Conclusions: The ITC results suggest that olaparib is associated with significantly improved rPFS versus cabazitaxel in the treatment of BRCAm and BRCAm/ATM patients who have progressed on taxane and NHA therapy. After removing the effect of switching from NHA to olaparib in PROfound, olaparib appears associated with a non-significant OS improvement versus cabazitaxel in both populations. The results require confirmation in comparative studies. Analysis limitations include uncertainty over the efficacy of cabazitaxel versus NHA in HRRm mCRPC patients, and heterogeneity in prior taxane and NHA therapy. Clinical trial information: NCT02987543.
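An anchored indirect comparison of this kind is commonly computed with the Bucher method, in which the two trial hazard ratios against the shared comparator (here NHA) are differenced on the log scale and their variances are summed. The abstract does not state which formula was used, so the sketch below is an assumption, and the input hazard ratios and confidence intervals are hypothetical placeholders, not values from PROfound or CARD.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% normal quantile


def se_from_ci(lower, upper):
    """Standard error of a log hazard ratio recovered from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def bucher_itc(hr_a_vs_c, ci_a, hr_b_vs_c, ci_b):
    """Indirect HR of A vs B through the common comparator C.

    Returns (hr, lower, upper). Assumes the two trials are independent
    and that relative effects are transferable between their populations.
    """
    log_hr = math.log(hr_a_vs_c) - math.log(hr_b_vs_c)
    se = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    return (math.exp(log_hr),
            math.exp(log_hr - Z95 * se),
            math.exp(log_hr + Z95 * se))


# Hypothetical trial-level inputs, each versus the common comparator.
hr, lower, upper = bucher_itc(0.50, (0.30, 0.83), 0.80, (0.60, 1.07))
print(f"Indirect HR {hr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the variances add, the indirect interval is always wider than either trial's interval, which is consistent with the wide OS intervals reported above.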
Affiliation(s)
- Tim Reason
  - Estima Scientific, London, United Kingdom
- David Olmos
  - Centro Nacional de Investigaciones Oncológicas, Madrid, Spain
4
Reason T, Gill K, Longshaw C, McCool R, Wilson K, Lopes S. 1578. Treatments for complicated urinary tract infections (cUTI) caused by multidrug resistant (MDR) Gram-negative (GN) pathogens- a systematic review and network meta-analysis (NMA). Open Forum Infect Dis 2020. [PMCID: PMC7777770 DOI: 10.1093/ofid/ofaa439.1758] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
Background Antimicrobial resistance is a major and growing threat to global public health. Cefiderocol (CFDC) is a new siderophore cephalosporin with a wide activity spectrum covering all aerobic GN pathogens, including all WHO critical priority pathogens, and was recently approved by the FDA for the treatment of GN cUTI in susceptible organisms. We aim to understand the relative efficacy and safety of current treatment options for cUTI caused by MDR GN pathogens. Methods We conducted a systematic review to identify all relevant trials that investigated the efficacy and safety of antimicrobial regimens for the treatment of GN pathogens in cUTI. Outcomes of interest included clinical cure and microbiological eradication (ME) at time of cure (TOC) and sustained follow up (SFU), and safety. Evidence networks were constructed using data for outcomes of interest, and analyses were conducted in a frequentist framework using NMA methods outlined by the NICE Decision Support Unit, implemented with the netmeta package in R. Results A total of 5 studies, 6 interventions and 2,349 randomised patients were included in the final analysis. Interventions included CFDC, imipenem-cilastatin (IPM-CIL), ceftazidime-avibactam (CAZ/AVI), doripenem (DOR), levofloxacin and ceftolozane-tazobactam (CEF/TAZ). Trials included predominantly Enterobacterales and Pseudomonas aeruginosa, and very few Acinetobacter baumannii. The patient populations presented some clinical differences across trials, which were not adjusted for in the NMA. Overall, there were numerical differences (especially in endpoints at SFU, favouring CFDC), but all treatments showed similar efficacy and safety, with the exception of a higher ME rate at TOC for CFDC vs IPM-CIL (Table 1), also observed at SFU, consistent with the data from the individual clinical trial.
Table 1. Results for microbiological eradication
Conclusion This NMA showed superiority of CFDC vs IPM-CIL in ME at TOC and SFU, and similar efficacy and safety vs all other comparators, with numeric differences favouring CFDC for outcomes at SFU. These traditional NMA methodologies are only valid when the pathogen pool and population are similar across trials, and may not reflect the full value of the breadth of coverage that new therapeutic options bring for the treatment of MDR GN pathogens.
Disclosures Tim Reason, PhD, Shionogi (Consultant); Karan Gill, MSc, Shionogi BV (Employee); Christopher Longshaw, PhD, Shionogi B.V. (Employee); Rachael McCool, PhD, York Health Economics Consortium (Employee, YHEC was commissioned by Shionogi to conduct the systematic review); Katy Wilson, PhD, York Health Economics Consortium (Employee, Shionogi commissioned YHEC to conduct the systematic review); Sara Lopes, PharmD, Shionogi BV (Employee)
Affiliation(s)
- Tim Reason
  - Estima Scientific, South Ruislip, England, United Kingdom
- Karan Gill
  - Shionogi BV, London, England, United Kingdom
- Rachael McCool
  - York Health Economics Consortium, York, England, United Kingdom
- Katy Wilson
  - York Health Economics Consortium, York, England, United Kingdom
- Sara Lopes
  - Shionogi BV, London, England, United Kingdom
5
McNeill AM, Davies G, Kruger E, Kowal S, Reason T, Ejzykowicz F, Hannachi H, Cater N, McLeod E. Ertugliflozin Compared to Other Anti-hyperglycemic Agents as Monotherapy and Add-on Therapy in Type 2 Diabetes: A Systematic Literature Review and Network Meta-Analysis. Diabetes Ther 2019; 10:473-491. [PMID: 30689140 PMCID: PMC6437246 DOI: 10.1007/s13300-019-0566-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 09/19/2018] [Indexed: 12/17/2022]
Abstract
INTRODUCTION Ertugliflozin is a new sodium-glucose co-transporter-2 inhibitor (SGLT2i) for the treatment of type 2 diabetes mellitus. As there are no head-to-head trials comparing the efficacy of SGLT2is, the primary objective of this analysis was to indirectly compare ertugliflozin to other SGLT2is in patient populations with inadequately controlled glycated hemoglobin (HbA1c > 7.0%) and previously treated with either diet/exercise, metformin alone or metformin plus a dipeptidyl peptidase-4 inhibitor (DPP4i). METHODS A systematic literature review (SLR) identified randomized controlled trials (RCTs) reporting outcomes at 24-26 weeks of treatment. Comparators to ertugliflozin were the SGLT2is canagliflozin, dapagliflozin and empagliflozin, with non-SGLT2i comparators also evaluated third-line [insulin and glucagon-like peptide-1 receptor agonists (GLP-1 RAs)]. Outcomes were change from baseline in HbA1c, weight and systolic blood pressure (SBP) as well as HbA1c < 7% and key safety events. Bayesian network meta-analysis was used to synthesize evidence. Results are presented as the median of the mean difference (MD) or as odds ratios with 95% credible intervals (CrI). RESULTS In patients uncontrolled on diet/exercise, the efficacy of ertugliflozin 5 mg monotherapy was not significantly different from that of other low-dose SGLT2is in terms of HbA1c reduction, while ertugliflozin 15 mg was more effective than dapagliflozin 10 mg (MD -0.36%, CrI -0.65, -0.08) and empagliflozin 25 mg (MD -0.31%, CrI -0.58, -0.04). As add-on therapy to metformin, ertugliflozin 5 mg was more effective in lowering HbA1c than dapagliflozin 5 mg (MD -0.22%, CrI -0.42, -0.02), and ertugliflozin 15 mg was more effective than dapagliflozin 10 mg (MD -0.26%, CrI -0.46, -0.06) and empagliflozin 25 mg (MD -0.23%, CrI -0.44, -0.03). Among patients uncontrolled on combination therapy with metformin plus a DPP4i, no relevant RCTs with insulin were identified from the SLR.
One study with a GLP-1 RA was included in a sensitivity analysis due to limited data. There were no differences between ertugliflozin 5 or 15 mg and other SGLT2is, with the exception of dapagliflozin 10 mg, which was significantly less effective when added to sitagliptin and metformin. Overall, there were no other significant differences for remaining efficacy and safety outcomes except for a lower SBP for canagliflozin 300 mg compared to ertugliflozin 15 mg in the diet/exercise population. CONCLUSIONS Indirect comparisons for HbA1c reduction found that ertugliflozin 5 mg was more effective than dapagliflozin 5 mg when added to metformin monotherapy, whereas ertugliflozin 15 mg was more effective than dapagliflozin 10 mg and empagliflozin 25 mg when added to diet/exercise and to metformin monotherapy. The HbA1c reduction associated with ertugliflozin was no different than that associated with canagliflozin across all populations. FUNDING Merck Sharp & Dohme Corp., a subsidiary of Merck & Co., Inc., Kenilworth, NJ, USA, and Pfizer Inc., New York, NY, USA.
6
Muston D, Korobelnik JF, Reason T, Hawkins N, Chatzitheofilou I, Ryan F, Kaiser PK. An efficacy comparison of anti-vascular growth factor agents and laser photocoagulation in diabetic macular edema: a network meta-analysis incorporating individual patient-level data. BMC Ophthalmol 2018; 18:340. [PMID: 30591022 PMCID: PMC6307247 DOI: 10.1186/s12886-018-1006-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/22/2018] [Accepted: 12/10/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND This was an updated network meta-analysis (NMA) of anti-vascular endothelial growth factor (VEGF) agents and laser photocoagulation in patients with diabetic macular edema (DME). Unlike the previous NMA, which used meta-regression to account for potential confounding by systematic variation in treatment effect modifiers across studies, this update incorporated individual patient-level data (IPD) regression to provide more robust adjustment. METHODS An updated review was conducted to identify randomised controlled trials for inclusion in a Bayesian NMA. The network included intravitreal aflibercept (IVT-AFL) 2 mg bimonthly (2q8) after 5 initial doses, ranibizumab 0.5 mg as-needed (PRN), ranibizumab 0.5 mg treat-and-extend (T&E), and laser photocoagulation. Outcomes included in the analysis were change in best-corrected visual acuity (BCVA), measured using an Early Treatment Diabetic Retinopathy Study (ETDRS) chart, and patients with ≥10 and ≥15 ETDRS letter gains/losses at 12 months. Analyses were performed using networks restricted to IPD only and networks combining IPD and aggregate data, with (i) no covariable adjustment, (ii) covariable adjustment for baseline BCVA assuming common interaction effects (against reference treatment), and (iii) covariable adjustments specific to each treatment comparison (restricted to the IPD-only network). RESULTS Thirteen trials were included in the analysis. IVT-AFL 2q8 was superior to laser in all analyses. IVT-AFL 2q8 showed strong evidence of superiority (95% credible interval [CrI] did not cross null) versus ranibizumab 0.5 mg PRN for mean change in BCVA (mean difference 5.20, 95% CrI 1.90-8.52 ETDRS letters), ≥15 ETDRS letter gain (odds ratio [OR] 2.30, 95% CrI 1.12-4.20), and ≥10 ETDRS letter loss (OR 0.25, 95% CrI 0.05-0.74) (IPD and aggregate random-effects model with baseline BCVA adjustment).
IVT-AFL 2q8 was not superior to ranibizumab 0.5 mg T&E for mean change in BCVA (mean difference 5.15, 95% CrI -0.26 to 10.61 ETDRS letters) (IPD and aggregate random-effects model). CONCLUSIONS This NMA, which incorporated IPD to improve analytic robustness, showed evidence of superiority of IVT-AFL 2q8 to laser and ranibizumab 0.5 mg PRN. These results were irrespective of adjustment for baseline BCVA.
Affiliation(s)
- Jean-Francois Korobelnik
  - Service d’ophtalmologie CHU, Bordeaux, France
  - University of Bordeaux, Inserm, Bordeaux Population Health Research Center, team LEHA, Bordeaux, France
7
Schlueter M, Gonzalez-Rojas N, Baldwin M, Groenke L, Voss F, Reason T. Comparative efficacy of fixed-dose combinations of long-acting muscarinic antagonists and long-acting β2-agonists: a systematic review and network meta-analysis. Ther Adv Respir Dis 2016; 10:89-104. [PMID: 26746383 PMCID: PMC5933564 DOI: 10.1177/1753465815624612] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Indexed: 11/17/2022]
Abstract
BACKGROUND A number of long-acting muscarinic antagonist (LAMA)/long-acting β2-agonist (LABA) fixed-dose combinations (FDCs) for treatment of moderate-to-very severe chronic obstructive pulmonary disease (COPD) have recently become available, but none have been directly compared in head-to-head randomized controlled trials (RCTs). The purpose of this study was to assess the relative clinical benefit of all currently available LAMA/LABA FDCs using a Bayesian network meta-analysis (NMA). METHODS A systematic literature review identified RCTs investigating the efficacy, safety and quality of life associated with licensed LAMA/LABA FDCs for the treatment of moderate-to-very severe COPD. RCTs were screened for inclusion in the NMA using prespecified eligibility criteria. Data were extracted for outcomes of interest, including change in trough forced expiratory volume in 1 second (tFEV1) from baseline, St. George Respiratory Questionnaire (SGRQ) percentage of responders, Transition Dyspnea Index (TDI) percentage of responders, change in SGRQ score from baseline, change in TDI focal score from baseline, moderate-to-severe exacerbations, all-cause discontinuation, and discontinuation due to adverse events. RESULTS Following screening, a total of 27 trials from 26 publications with 30,361 subjects were eligible for inclusion in the NMA. Nonsignificant results were seen in most analyses comparing efficacy, exacerbations and discontinuation rates of included LAMA/LABA FDCs (i.e. aclidinium/formoterol 400/12 µg, glycopyrronium/indacaterol 110/50 µg, tiotropium + olodaterol 5/5 µg, umeclidinium/vilanterol 62.5/25 µg). Meta-regressions controlling for post-bronchodilator percentage of tFEV1 predicted at baseline and for concomitant use of inhaled corticosteroids at baseline were performed to assess the magnitude of effect modification, and produced results similar to those observed in the base case analysis.
CONCLUSION All LAMA/LABA FDCs were found to have similar efficacy and safety. Definitive assessment of the relative efficacy of different treatments can only be performed through direct comparison in head-to-head RCTs. In the absence of such data, this indirect comparison may be of value in clinical and health economic decision-making.
Affiliation(s)
- Lars Groenke
  - Boehringer Ingelheim GmbH, Ingelheim am Rhein, Germany
- Florian Voss
  - Boehringer Ingelheim GmbH, Ingelheim am Rhein, Germany
8
Reason T, Dias S, Welton N. Dose-Response Network Meta-Analysis To Address Dose Heterogeneity In A Cost-Effectiveness Analysis In Acute Migraine. Value Health 2014; 17:A563. [PMID: 27201865 DOI: 10.1016/j.jval.2014.08.1864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Affiliation(s)
- S Dias
  - University of Bristol, Bristol, UK
- N Welton
  - University of Bristol, Bristol, UK
9
Reason T, Dias S, Welton N. Multi-Level Network Meta-Analysis To Account for Dose-Response and Class Effects. Value Health 2014; 17:A578. [PMID: 27201947 DOI: 10.1016/j.jval.2014.08.1950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Affiliation(s)
- S Dias
  - University of Bristol, Bristol, UK
- N Welton
  - University of Bristol, Bristol, UK
10
Affiliation(s)
- Serena Carville
  - National Clinical Guideline Centre, Royal College of Physicians, London NW1 4LE, UK