Propensity weighting and extrapolation: a case study in non-small-cell lung cancer


A key objective of Work Package 1 of GetReal was to develop a framework for incorporating real-world data (RWD) into decision-making. Case studies were constructed to explore the different ways that RWD may be used to help demonstrate the relative effectiveness of new medicines.

Randomised controlled trials (RCTs) of new medicines in oncology and other disease areas are often undertaken in study populations drawn from many centres, satisfying strict inclusion and exclusion criteria. These trials are primarily designed to meet the regulatory requirements for marketing authorisation, with tight protocols and high internal validity. However, generalisability of trial results to specific national or local ‘reimbursable’ populations may be questioned, in particular when results are presented for health technology assessment (HTA). In addition to the ‘effectiveness challenge’ that study participants (or settings) do not correspond to the local target population, a further challenge may be that the effectiveness endpoint of interest (often overall survival in oncology) is too ‘immature’ (with too few events) to draw conclusions at the time the trial results are reported.

External real-world data (RWD) may help to address these challenges, when used in analytical techniques to reweight data available from RCTs or in models to project RCT data over longer time periods. In this case study the use of these techniques was explored using RCT data (JMDB trial) and RWD (FRAME study) for pemetrexed in stage IIIB/IV non-small-cell lung cancer (NSCLC). Data were made available to GetReal by Eli Lilly and Co. (GetReal partner). Although NSCLC was chosen as a suitable disease for using these methods, the same methods can be applied generally to other disease areas.

What was examined in this case study?

The following methods were used to generate estimates of the relative effectiveness of pemetrexed in stage IIIB/IV NSCLC.

  • Propensity weighting, in which observational RWD are used to reweight data available from an RCT to generate estimates of relative effectiveness that are generalisable to a target population suitable for reimbursement decision-making (and HTA). (For more information about these methods, see here.)
  • Extrapolation techniques, which use RWD to support generation of estimates of treatment effect, especially survival and quality-adjusted survival, over time periods beyond that measured in the source RCT. (For more information about these methods, see here.)

In the propensity weighting analysis, estimates of the relative effectiveness of pemetrexed + cisplatin vs. gemcitabine + cisplatin in stage IIIB/IV NSCLC were generated using data from the JMDB study (RCT data) and the FRAME study (RWD). Estimates were generated for the population described by the FRAME study, assuming this to represent a target population for reimbursement.
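
The reweighting idea described above can be illustrated with a minimal sketch. This is not the analysis performed in the case study: it assumes a logistic model for membership of the trial versus the target population, and converts the fitted probabilities into inverse-odds weights for trial participants. The single binary covariate, the function names and the toy data are all hypothetical, and the actual JMDB/FRAME analysis may have used a different weighting scheme and covariate set.

```python
import math

def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Fit a logistic regression by batch gradient ascent.
    Returns [intercept, coef_1, ..., coef_k]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            z = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            resid = yi - 1.0 / (1.0 + math.exp(-z))
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict_proba(beta, x):
    """Fitted probability of being a trial participant given covariates x."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def inverse_odds_weights(trial_X, target_X):
    """Weight each trial participant by (1 - p) / p, where p is the fitted
    probability of trial membership, then rescale the weights so that they
    sum to the trial sample size."""
    X = trial_X + target_X
    y = [1] * len(trial_X) + [0] * len(target_X)
    beta = fit_logistic(X, y)
    raw = [(1.0 - predict_proba(beta, x)) / predict_proba(beta, x) for x in trial_X]
    total = sum(raw)
    return [w * len(trial_X) / total for w in raw]

# Toy data: one hypothetical binary covariate (e.g. poorer performance status).
trial_X = [[0.0]] * 80 + [[1.0]] * 20    # 20% of trial participants have it
target_X = [[0.0]] * 50 + [[1.0]] * 50   # 50% of the target population have it
weights = inverse_odds_weights(trial_X, target_X)
weighted_share = sum(w * x[0] for w, x in zip(weights, trial_X)) / sum(weights)
print(round(weighted_share, 3))
```

In this toy example the weighted share of trial participants with the covariate moves from 0.20 to approximately 0.50, the share in the target population. In a real analysis the weights would then be carried into the outcome model (for instance a weighted survival analysis), which this sketch does not attempt.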

In the extrapolation analysis, parametric survival functions (curves) were fitted to overall survival data from the JMDB trial, both unweighted and reweighted (using the FRAME study). These survival curves were then adjusted to fit estimates of long-term survival from the UK cancer registry. Goodness of fit of the different survival curves was assessed using the area under the curve (AUC) calculated over 6 years.
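
As a rough illustration of the AUC comparison described above, the sketch below evaluates two parametric survival curves over a 6-year horizon and reports which has an AUC closer to an external reference value. All parameter values and the reference AUC are hypothetical and are not estimates from JMDB, FRAME or the UK cancer registry; the selection rule (smallest absolute difference in AUC) is likewise an assumption made for illustration.

```python
import math

def weibull_survival(t, shape, scale):
    """Weibull survival function: S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def lognormal_survival(t, mu, sigma):
    """Lognormal survival function: S(t) = 1 - Phi((ln t - mu) / sigma)."""
    if t <= 0:
        return 1.0
    z = (math.log(t) - mu) / sigma
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

def auc(survival, horizon=6.0, steps=6000):
    """Area under a survival curve from 0 to `horizon` (trapezoidal rule)."""
    h = horizon / steps
    total = 0.5 * (survival(0.0) + survival(horizon))
    total += sum(survival(i * h) for i in range(1, steps))
    return total * h

# Hypothetical parameters (time in years); NOT fitted to any trial data.
curves = {
    "weibull": lambda t: weibull_survival(t, shape=1.2, scale=1.1),
    "lognormal": lambda t: lognormal_survival(t, mu=-0.2, sigma=1.0),
}
registry_auc = 1.2  # hypothetical 6-year AUC from an external registry curve

aucs = {name: auc(s) for name, s in curves.items()}
closest = min(aucs, key=lambda name: abs(aucs[name] - registry_auc))
print(aucs, closest)
```

The AUC over a fixed horizon is the restricted mean survival time, so the comparison is in units of life-years over the first 6 years.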

Outputs of analyses were discussed at a GetReal workshop held in Frankfurt (10 Sept 2015) during which stakeholder views of the utility and applicability of the methods tested were obtained. Workshop outputs are reported here. The following questions were discussed:

  • Could you envisage using these approaches in your decision-making process?
  • What issues might stand in the way of adopting this approach?
  • Are there situations where these approaches would be particularly useful (or not at all useful)?
  • How can we communicate the implications of these approaches to engage a broad range of stakeholders?

What were the findings and conclusions?

Propensity weighting
The reweighted analysis of the RCT yielded a hazard ratio (HR) closer to 1 and with greater uncertainty (HR: 0.86, 95% CI: 0.59 to 1.30) than the original analysis in a similar population in the clinical trial (HR: 0.81, 95% CI: 0.70 to 0.94). Sensitivity analyses to both the methods of reweighting and the inclusion of baseline covariates gave broadly similar results.
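
To give a sense of how much wider the reweighted interval is, the short calculation below backs out the approximate standard error of log(HR) implied by each reported confidence interval. It assumes that both intervals are Wald intervals, symmetric on the log scale, which the source does not state; the function name is hypothetical.

```python
import math

def log_hr_se(lower, upper, z=1.96):
    """Approximate SE of log(HR) implied by a 95% CI, assuming the interval
    is symmetric on the log scale (an assumption, not stated in the source)."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)

se_original = log_hr_se(0.70, 0.94)    # original trial analysis
se_reweighted = log_hr_se(0.59, 1.30)  # reweighted analysis
ratio = se_reweighted / se_original
print(round(se_original, 3), round(se_reweighted, 3), round(ratio, 1))
```

Under that assumption the implied standard errors are roughly 0.075 and 0.20, so the reweighted estimate is about 2.7 times less precise on the log scale.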

Extrapolation
When only trial data were used, a Weibull distribution appeared to provide the best fit for overall survival. However, when cancer registry data were included, a lognormal distribution was better supported.

What do stakeholders say?

Propensity weighting

  • Although there was substantial interest in the propensity weighting method, it was not yet considered ready for use in regulatory or reimbursement decision-making: a wider range of case studies, coupled with a technical review of the methods, validation of outputs and education of decision makers, is needed.
  • However, it could be useful for pharmaceutical research and development to help understand or explore the benefit‑risk profile of medicines in development and help design phase 3 trials.
  • The method will most likely have utility in disease areas where there is known to be a large ‘efficacy‑effectiveness gap’ and possibly in the context of adaptive pathways or managed access schemes.
  • There was a call for earlier planning of real-world evidence (RWE) studies and involvement of RWE experts in trial designs, to enable such analyses to be performed more readily in future.


Extrapolation

  • Use of RWE in extrapolations (of overall survival) was more familiar to workshop participants and is currently of most interest to NICE in the UK, where it is used extensively.
  • The method was thought to have most utility for chronic conditions with long-term (unmeasured) outcomes, and possibly in the context of adaptive pathways.
  • The main value may be to help reconcile differences of opinion about which survival model to use for such extrapolations, but attention needs to be paid to whether the source trial and RWE populations are well matched.
  • More evidence of the robustness of the method and case studies was called for.

More details from the stakeholder workshop are found here.

Key contributors

Michael Happich and Mark Belger, Lilly
Prof. Keith Abrams, University of Leicester
Mike Chambers, GSK