Predicting long-term effectiveness with a model-averaging approach

Context

Randomised controlled trials (RCTs) of new medicines in oncology and other disease areas are often undertaken over relatively short time periods (for example, 2 years). These trials are primarily designed to meet the regulatory requirements for marketing authorisation, with primary endpoints based on physiological response to the interventions. Health technology assessment (HTA) agencies are likely to be more interested in ‘final’ endpoints such as survival (or patient-focused endpoints such as quality of life). Data on final endpoints may be too ‘immature’ (with too few events) to draw conclusions at the time of trial reporting, and further data collection may be frustrated by cross-over between study groups.

What is it?

An extrapolation technique is proposed, using real-world data (RWD) to support the generation of estimates of treatment effect, especially survival and quality-adjusted survival, over time periods beyond those measured in an RCT.

A GetReal case study explored this method using RCT data (Scagliotti et al., 2008) and RWD for pemetrexed in stage IIIB/IV non-small-cell lung cancer (NSCLC). Data were made available to GetReal by Eli Lilly and Co., a GetReal partner. Although NSCLC was chosen as a suitable disease for these methods, they can be applied more generally to other disease areas.

Parametric survival functions (curves) were fitted to survival data from the RCT (the JMDB trial) alone. The curves were then adjusted to fit survival estimates from RWD (the UK cancer registry). The different survival curves were compared using the area under the curve (AUC) calculated over 6 years.

In the case study, trial data on overall survival were available for 2.5 years of follow-up, but the associated UK submission extrapolated survival to 6 years (NICE TA181, 2009).

A variety of parametric distributions (exponential, Weibull and lognormal) were fitted to the trial data, and the AUC (life-years) was calculated for each, with model selection based on a summary goodness-of-fit statistic, the deviance information criterion (DIC).
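
The AUC calculation can be illustrated with a minimal Python sketch. The parameter values below are hypothetical placeholders, not estimates from the JMDB trial, and the trapezoidal integration is only a stand-in for however the case study computed AUC within its Bayesian analysis. The sketch evaluates the three survival functions named above and integrates each from 0 to 72 months.

```python
import math

def surv_exponential(t, rate):
    """Exponential survival: S(t) = exp(-rate * t)."""
    return math.exp(-rate * t)

def surv_weibull(t, shape, scale):
    """Weibull survival: S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def surv_lognormal(t, mu, sigma):
    """Lognormal survival: S(t) = 1 - Phi((ln t - mu) / sigma), with S(0) = 1."""
    if t <= 0:
        return 1.0
    z = (math.log(t) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def auc(surv, horizon, steps=7200):
    """Trapezoidal area under surv(t) on [0, horizon]."""
    h = horizon / steps
    total = 0.5 * (surv(0.0) + surv(horizon))
    for i in range(1, steps):
        total += surv(i * h)
    return total * h

HORIZON_MONTHS = 72.0  # 6-year horizon, as in the case study

# Hypothetical parameters (time in months), for illustration only.
models = {
    "exponential": lambda t: surv_exponential(t, rate=0.07),
    "weibull": lambda t: surv_weibull(t, shape=1.2, scale=14.0),
    "lognormal": lambda t: surv_lognormal(t, mu=2.3, sigma=1.0),
}

aucs = {name: auc(s, HORIZON_MONTHS) for name, s in models.items()}
for name, value in aucs.items():
    print(f"{name}: AUC over {HORIZON_MONTHS:.0f} months = {value:.2f}")
```

Because the AUC is truncated at the chosen horizon, distributions with heavier tails (such as the lognormal) tend to yield larger values, which is why the choice of distribution matters for extrapolation.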

AUCs were then averaged to account for model uncertainty using weights derived from the 5-year overall survival estimates from the UK cancer registry. From this a single averaged AUC and associated uncertainty were derived.
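
The averaging step can be sketched as follows. This is a moment-based approximation that treats the averaged AUC as a weighted mixture of the per-model estimates; the case study itself obtained its results by MCMC, so the function below (`model_average`, a hypothetical name) illustrates the idea rather than reproducing that analysis. The input values are invented for illustration.

```python
import math

def model_average(estimates, std_errors, weights):
    """Weighted mixture mean and standard error across models.

    The variance combines the within-model variance (weighted mean of SE^2)
    and the between-model variance (weighted spread of the estimates).
    """
    if not (len(estimates) == len(std_errors) == len(weights)):
        raise ValueError("inputs must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    w = [x / total for x in weights]  # normalise so the weights sum to 1
    mean = sum(wi * m for wi, m in zip(w, estimates))
    within = sum(wi * se ** 2 for wi, se in zip(w, std_errors))
    between = sum(wi * (m - mean) ** 2 for wi, m in zip(w, estimates))
    return mean, math.sqrt(within + between)

# Hypothetical AUCs, standard errors and weights for three models.
avg, se = model_average(
    estimates=[15.0, 14.0, 17.0],
    std_errors=[0.7, 0.6, 0.9],
    weights=[0.2, 0.3, 0.5],
)
print(f"model-averaged AUC = {avg:.2f} (SE {se:.2f})")
```

When the models disagree, the between-model term inflates the standard error beyond that of any single model, which is the expected cost of not committing to one distribution.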

Different weighting methods were used:

  • (i) uniform weights, assuming that each parametric distribution is equally plausible a priori
  • (ii) Gaussian weights derived from UK cancer registry data, where each parametric distribution was assessed against 5-year survival from the UK cancer registry, and plausibility weights derived.
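
The two weighting schemes can be sketched in Python. The exact form of the Gaussian weights is not given here, so the sketch assumes one plausible construction: each model's predicted 5-year survival is scored with a normal density centred on the registry estimate, with the registry standard error as its spread, and the scores are normalised to sum to 1. All numbers and function names are hypothetical.

```python
import math

def uniform_weights(n_models):
    """Equal prior plausibility for every model."""
    return [1.0 / n_models] * n_models

def gaussian_weights(predicted, observed, observed_se):
    """Weights proportional to a normal kernel around the external estimate.

    predicted   -- each model's predicted 5-year survival probability
    observed    -- external (registry) 5-year survival estimate
    observed_se -- standard error of the external estimate
    Assumed construction; not taken from the case study.
    """
    raw = [math.exp(-0.5 * ((p - observed) / observed_se) ** 2) for p in predicted]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("all models are too far from the external estimate")
    return [r / total for r in raw]

# Hypothetical 5-year survival predictions: exponential, Weibull, lognormal.
predicted_5y = [0.015, 0.005, 0.040]
weights = gaussian_weights(predicted_5y, observed=0.035, observed_se=0.010)
for name, w in zip(["exponential", "Weibull", "lognormal"], weights):
    print(f"{name}: weight = {w:.3f}")
```

Under this construction, the model whose extrapolation lies closest to the registry estimate receives the most weight, and a less precise registry estimate (larger standard error) moves the weights back towards uniform.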

A Bayesian model averaging approach was used (Jackson et al., 2009), with analyses undertaken using Markov Chain Monte Carlo (MCMC) methods in WinBUGS version 1.4.3.

What were the results?

When only RCT data were used, a Weibull distribution appeared to provide the best fit for overall survival. The model-averaged results using the UK cancer registry data were drawn towards those of the lognormal distribution, but with substantially increased uncertainty. A model-averaging approach needs to account not only for the uncertainty associated with each model but also for between-model uncertainty.

Table. Area under curve (AUC) and deviance information criterion (DIC) for parametric survival models fitted separately to the gemcitabine and pemetrexed arms of the JMDB trial

Overall survival (72 months)  Gemcitabine              Pemetrexed
                              AUC (SE)      DIC        AUC (SE)      DIC
Exponential                   12.95 (0.62)  3337.93    16.04 (0.73)  3167.50
Weibull                       12.85 (0.48)  3301.66    14.80 (0.59)  3142.93
Lognormal                     14.98 (0.78)  3369.55    17.84 (0.87)  3160.73
Uniform weights               12.85 (0.48)             14.80 (0.48)
UK cancer registry weights    12.95 (0.62)             16.55 (1.65)
Abbreviations: AUC, area under the curve; DIC, deviance information criterion; SE, standard error
Results are for 72 months, based on original JMDB RCT (unweighted)

Effectiveness challenge addressed by the method

This analytical method allows a real-world treatment effect in the long term to be predicted based on surrogate outcomes measured in the trial. The efficacy-effectiveness gap here is due to the trial duration being too short to collect long-term outcomes, which are considered to be most important by HTA agencies.

When is it useful?

  • For many disease areas, especially chronic conditions: this method can in principle be applied to medicines in a variety of disease areas. It is most suited to chronic conditions, where long-term outcomes are ‘immature’ (with few events reported) or unmeasured in RCTs.
  • The extrapolation method can be used to inform the choice of most appropriate extrapolation functions, preferably applied to an untreated (or best standard care) cohort to reflect the situation for a medicine prior to launch.
  • After phase 3 trials: its most obvious use is after pivotal phase 3 trials have concluded, as supplementary analyses for HTA submissions. Estimated long-term outcomes such as overall survival will then be used as inputs to model-based cost-effectiveness analyses.
  • When there are different opinions on the method for extrapolation: its greatest value may be when there are differences of opinion as to which survival model to use when extrapolating long-term outcomes (especially if there is a wide range of possible model-based extrapolations) – the use of RWD could help resolve this.
  • Within adaptive pathways or patient access schemes: the method may be useful in these settings, but is probably less applicable in traditional regulatory settings.
  • At re-assessment (for HTA): an assessment can be made of the accuracy of the initial (projected) estimates.
  • To support research and development (R&D): this approach may be used within pharmaceutical R&D, for example to support design of trials, interim analyses and long-term follow-up. In particular, an understanding of uncertainty in the projected effectiveness estimates may inform decisions about trial duration or trial extensions.

What are the limitations?

  • Availability of long-term data sources: it is important to identify as many long-term data sources as possible (to serve as the basis for extrapolation), to understand likely variation in the projected endpoint.
  • Quality of RWD: the quality and relevance of the RWD are critical to the acceptability of this method. Historical data are less likely to be acceptable, due to changing treatment practices.
  • Where possible, apply conventional methods using the RCT data only, to exclude some possible models before introducing RWD.
  • The use of long-term RWD (for example, from cancer registries) may help to fit extrapolation models when the pattern of outcome events over time suggests that a single survival curve is inappropriate. A variety of extrapolated curves can be constructed by choosing different time points at which to introduce RWD.
  • Planning and investment in RWD sources: early planning is important. If RWD sources are likely to be required there may need to be investment in their quality (and access) so the data from these sources can readily be used when the trial reaches its conclusion.

What do stakeholders say?

Stakeholder views on this analysis were sought at a GetReal workshop on NSCLC held in Frankfurt (10 September 2015). For further information, see a summary of the GetReal case study.

Key contributors

Michael Happich and Mark Belger, Lilly
Prof. Keith Abrams, University of Leicester
Mike Chambers, GSK