Studies that use non-randomised methods (for example, clinician preference and patient suitability) to determine who will receive different treatments may result in there being systematic differences between participants in different treatment arms. When these differences, whether known or unknown, are also related to the outcome they are considered to be confounding factors. For example, if participants in one arm have more severe disease, they may respond differently to the treatment: the reported difference between treatment arms may be a result of the difference in severity as well as the impact of the treatment itself. Results from studies with confounding are less reliable and are considered to be biased (this is called selection bias).
By design, well-conducted randomised studies with an adequate study size should eliminate both known and unknown differences between treatment arms which may influence the outcome (i.e. have low risk of selection bias) due to the randomised nature of treatment selection.
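The disease-severity example can be illustrated with a small simulation. This is a minimal sketch on invented data, not an analysis from any GetReal study: the effect sizes, the treatment probabilities and the function name `naive_difference` are all assumptions chosen for illustration. It compares the unadjusted difference in mean outcome when treatment depends on severity with the same difference when treatment is randomised.

```python
import random
from statistics import mean

TRUE_EFFECT = 1.0       # assumed benefit of treatment (illustrative)
SEVERITY_EFFECT = -2.0  # assumed: severe disease worsens the outcome

def naive_difference(n, randomised, seed=0):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        severe = rng.random() < 0.5
        if randomised:
            p_treat = 0.5                      # allocation ignores severity
        else:
            p_treat = 0.8 if severe else 0.2   # clinicians favour severe cases
        treat = rng.random() < p_treat
        outcome = TRUE_EFFECT * treat + SEVERITY_EFFECT * severe + rng.gauss(0, 1)
        (treated if treat else untreated).append(outcome)
    return mean(treated) - mean(untreated)

preference_diff = naive_difference(20000, randomised=False)
randomised_diff = naive_difference(20000, randomised=True)

print(f"true effect:                 {TRUE_EFFECT:.2f}")
print(f"non-randomised (confounded): {preference_diff:.2f}")
print(f"randomised:                  {randomised_diff:.2f}")
```

Under these assumptions the non-randomised comparison mixes the effect of severity with the effect of treatment, while the randomised comparison should land close to the true effect.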
In the absence of randomisation, modifications to study design, based on patient stratification or matching, may be used to control for some known differences between treatment arms, which should produce less biased results. However, use of these methods is not always possible, and they cannot be used to control for unknown differences.
In studies that do not use randomisation to control for confounding, statistical methods can be used to adjust the results and provide a less biased and more accurate estimate of the treatment effect. These methods can normally be categorised into those that adjust for known confounding factors and those that adjust for unknown confounding factors. The table below provides some of the more commonly known methods.
Table. Summary of methods to adjust for either known or unknown confounding
Methods that adjust for known confounding | |
Regression adjustment (for example, logistic regression on prognostic factors)* | Regression models represent quantitatively how covariates (such as prognostic factors) predict the outcome of interest. Models are fitted to data for both treated and untreated populations, and the estimated treatment effects are then based on the differences between the predictions of the two models (Faria et al., 2015). |
Inverse probability weighting (IPW)* | This method aims to make the groups more comparable by using a propensity score function to ‘weight’ data from different study subjects depending on chosen covariates or prognostic factors. A propensity score is the probability of receiving the treatment, attached to each study subject based on their mix of characteristics. The inverse of the propensity score is used as a weight when mean values are calculated for each study group (Faria et al., 2015). |
Doubly robust methods | This method combines regression adjustment and IPW. Regression adjustment models the outcome but not treatment selection, whereas IPW models the probability of receiving treatment but not the outcome. Combining the two gives an estimate that remains valid if either of the two models, but not necessarily both, is correctly specified (Faria et al., 2015). |
Regression based on propensity score* | This method uses the propensity score to control for correlation between treatment and covariates. Parametric regression is frequently used for the outcome variable (Faria et al., 2015). This method may only be sufficient for controlling for known confounders when there are relatively few confounders (Schmidt et al., 2016). |
Regression based on disease risk score* | This method uses the disease risk score to control for correlations between treatment and covariates. This method may only be sufficient (and less biased) when there are relatively few confounders (Schmidt et al., 2016). |
Matching | While matching can be done at the study design stage, analytical methods can also be used to ‘match’ control individuals who are similar to treated patients in one or more characteristics. This may be achieved using propensity score matching (Faria et al., 2015). |
Parametric regression on a matched sample | This approach combines regression adjustment with matching, using the regression to control for any factors not adjusted for with matching (Faria et al., 2015). |
Methods that adjust for unknown confounding | |
Instrumental variable methods | This is the most commonly used method to deal with unknown confounding. The approach aims to find a variable (or instrument) that is correlated with the treatment, but not directly correlated to the outcome (except through the treatment) (Faria et al., 2015). |
Panel data models | This approach uses individuals as their own controls, at different time-points (Faria et al., 2015). |
*Method examined in a GetReal simulation study (Schmidt et al., 2016). |
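As a minimal sketch of two methods from the table, the example below applies inverse probability weighting and regression adjustment to simulated data with a single known binary confounder (disease severity). Everything in it is an assumption made for illustration: the data are invented, the propensity score is estimated as the observed proportion treated within each severity stratum rather than by logistic regression, and the "outcome model" is simply the mean outcome in each severity-by-treatment cell.

```python
import random
from collections import defaultdict
from statistics import mean

rng = random.Random(1)
TRUE_EFFECT = 1.0  # assumed treatment effect (illustrative)

# Simulated records: (severe, treated, outcome); severity drives both
# treatment choice and outcome, so it is a confounder.
data = []
for _ in range(20000):
    severe = int(rng.random() < 0.5)
    treated = int(rng.random() < (0.8 if severe else 0.2))
    outcome = TRUE_EFFECT * treated - 2.0 * severe + rng.gauss(0, 1)
    data.append((severe, treated, outcome))

# Unadjusted comparison, for reference.
naive_estimate = (mean(y for _, t, y in data if t == 1)
                  - mean(y for _, t, y in data if t == 0))

# Propensity score: estimated probability of treatment given severity.
by_stratum = defaultdict(list)
for s, t, _ in data:
    by_stratum[s].append(t)
propensity = {s: mean(ts) for s, ts in by_stratum.items()}

def weighted_mean(pairs):
    """Mean of the values in (weight, value) pairs, normalised by total weight."""
    return sum(w * y for w, y in pairs) / sum(w for w, _ in pairs)

# IPW: weight treated subjects by 1/ps and untreated subjects by 1/(1 - ps).
treated_pairs = [(1 / propensity[s], y) for s, t, y in data if t == 1]
control_pairs = [(1 / (1 - propensity[s]), y) for s, t, y in data if t == 0]
ipw_estimate = weighted_mean(treated_pairs) - weighted_mean(control_pairs)

# Regression adjustment: predict each subject's outcome with and without
# treatment from the cell means, then average the predicted differences.
cells = defaultdict(list)
for s, t, y in data:
    cells[(s, t)].append(y)
cell_mean = {k: mean(v) for k, v in cells.items()}
regression_estimate = mean(cell_mean[(s, 1)] - cell_mean[(s, 0)]
                           for s, _, _ in data)

print(f"naive:                 {naive_estimate:.2f}")
print(f"IPW:                   {ipw_estimate:.2f}")
print(f"regression adjustment: {regression_estimate:.2f}")
```

Both adjusted estimates should fall close to the assumed true effect here only because the single confounder is known and measured; neither approach would correct for a confounder that is missing from the data.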
Methods may also be categorised by their purpose in supporting the overall analysis of the study:
- Make the groups more comparable (for example, matching, inverse probability weighting)
- Control for the effect of the confounding factors (for example, regression adjustment, instrumental variable methods)
- Make use of natural experiments that mimic randomisation (for example, difference-in-differences and regression discontinuity) (Faria et al., 2015).
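The instrumental variable idea can be sketched with simulated data. This is an illustration under strong assumptions rather than a recommended analysis: the data are invented, the binary instrument `z` is hypothetical (it could stand for something like a prescriber's habitual preference), it is constructed to affect the outcome only through treatment, and the treatment effect is assumed to be the same for everyone. The estimate uses the simple ratio (Wald) form: the difference in mean outcome between instrument groups divided by the difference in the proportion treated.

```python
import random
from statistics import mean

rng = random.Random(2)
TRUE_EFFECT = 1.0  # assumed constant treatment effect (illustrative)

# Simulated records: (z, treated, outcome). The confounder u is generated
# but deliberately NOT stored, to mimic unknown confounding.
records = []
for _ in range(100000):
    u = rng.gauss(0, 1)                    # unmeasured confounder
    z = int(rng.random() < 0.5)            # hypothetical instrument
    treated = int(2.0 * z + u + rng.gauss(0, 1) > 1.0)
    outcome = TRUE_EFFECT * treated + 1.0 * u + rng.gauss(0, 1)
    records.append((z, treated, outcome))

# Unadjusted comparison of treated and untreated subjects.
naive_estimate = (mean(y for _, t, y in records if t == 1)
                  - mean(y for _, t, y in records if t == 0))

# Wald estimator: effect of z on the outcome divided by effect of z on treatment.
outcome_diff = (mean(y for z, _, y in records if z == 1)
                - mean(y for z, _, y in records if z == 0))
uptake_diff = (mean(t for z, t, _ in records if z == 1)
               - mean(t for z, t, _ in records if z == 0))
iv_estimate = outcome_diff / uptake_diff

print(f"naive: {naive_estimate:.2f}")
print(f"IV:    {iv_estimate:.2f}")
```

In this constructed setting the instrument is valid by design. In real data, whether a variable influences the outcome only through treatment cannot be fully verified and has to be argued from subject knowledge.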
A NICE Decision Support Unit technical support document (Faria et al., 2015) has been produced ‘to help improve the quality of analysis, reporting, critical appraisal and interpretation of estimates of treatment effect from non-RCT studies’. This document summarises commonly available methods to analyse comparative individual participant data (IPD) from non-RCTs to estimate a treatment effect. The document also provides various tools, including an algorithm to help users to select the appropriate method for use in an analysis.
Many of the methods described here on adjusting for bias in pragmatic trials have been tested in GetReal simulations (see Analysing Pragmatic Trials).
GetReal simulation on adjusting for confounding
As part of the GetReal study, a simulation was performed to examine different methods to adjust for confounding in post-launch settings (Schmidt et al., 2016). This simulation reported that methods that use disease risk scores may be a good alternative to logistic regression when there are low event rates or low numbers of participants in the treatment arms, but these methods remain far from perfect.