The defining feature of a randomised controlled trial (RCT), the random assignment of participants to treatment groups, helps ensure that, in a well-conducted trial, the characteristics of participants are similar across the groups being compared. This is most important for characteristics that also directly influence the effect of a medicine, such as the severity of the disease (often called confounding variables or treatment effect modifiers).
While there are methods other than randomisation that can be used to balance these factors between groups (such as matching), random allocation is particularly important because it also tends to balance characteristics that influence the treatment effect but are not known or measured.
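To illustrate why random allocation balances prognostic characteristics without any matching, the following is a minimal hypothetical simulation (not from the source): participants with a simulated "severity" score are randomised 1:1, and the arms end up with very similar average severity.

```python
import random

random.seed(42)

# Simulate a prognostic characteristic (e.g. disease severity on a 0-100
# scale) for 10,000 trial participants, then randomise them 1:1 to two arms.
n = 10_000
severity = [random.uniform(0, 100) for _ in range(n)]
arm = [random.choice(("treatment", "control")) for _ in range(n)]

def mean_severity(which):
    """Average severity among participants allocated to one arm."""
    values = [s for s, a in zip(severity, arm) if a == which]
    return sum(values) / len(values)

treatment_mean = mean_severity("treatment")
control_mean = mean_severity("control")

# With random allocation the arms have similar average severity, even
# though severity was never used in the allocation.
print(f"treatment: {treatment_mean:.1f}, control: {control_mean:.1f}")
print(f"difference: {abs(treatment_mean - control_mean):.2f}")
```

The same balancing applies, on average, to characteristics that were never measured at all, which is what matching alone cannot guarantee.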
Although other factors may influence the internal validity of a study, including adherence to treatment protocols and the measurement of outcomes, the internal validity of well-conducted RCTs is likely to be high, providing more reliable estimates of a medicine’s effect. However, traditional RCTs are less likely to reflect real-world practice in the populations included, the way that interventions are administered, or other factors (i.e. they may have lower external validity).
The use of data collected outside RCTs (real-world data [RWD]) may offer better external validity. However, the potential lack of internal validity and the potential for bias (the ‘robustness’ of the data) cause the most uncertainty when using these data as a source of evidence on relative effectiveness. More information on the potential limitations of different RWD sources or study designs to inform relative effectiveness is found here and here.
Determining whether the effectiveness estimates reported in a study are credible and can be relied on for decision-making depends on multiple aspects relating to the quality of the study. Checklists to help assess a study for quality and credibility are discussed below.
Checklists for quality assessment
One of the key concerns about the use of evidence collected outside RCTs is the quality of studies used.
In the field of evidence-based medicine, checklists are often used to assess the quality of different study designs, aiming to ensure consistency across quality assessors. A number of existing checklists focus on methodological quality, but some also incorporate broader elements such as those relevant to cost-effectiveness analyses considered by payers or health technology assessment agencies.
A NICE Decision Support Unit technical support document (Faria et al 2015) has been produced ‘to help improve the quality of analysis, reporting, critical appraisal and interpretation of estimates of treatment effect from non-RCT studies’. This document includes a review and assessment of a number of existing checklists for quality assessment of the analysis of non-randomised studies.
The table below includes a list of commonly used checklists, organised by study design, some of which were reviewed by Faria et al 2015.
Table: Commonly used quality checklists by study design
| Study design^a | Quality checklists |
|---|---|
| Randomised controlled trials (RCTs) | Cochrane risk of bias tool |
| Non-randomised study designs, controlled cohort, controlled before-and-after studies | In the context of cost-effectiveness analyses: NICE DSU QuEENS checklist (for use on its own or to complement other checklists) |
| Cohort and cross-sectional | STROBE checklists for cohort and cross-sectional studies |
| Case-control | STROBE checklist for case-control studies |
^a A difficulty in choosing the appropriate checklist is determining the classification of a study, particularly for observational studies. A checklist of design features is covered in the Cochrane handbook for systematic reviews of interventions (see tables 13.2a and 13.2b). Furthermore, Box 13.4a of the Cochrane handbook for systematic reviews of interventions provides useful notes for completing the appropriate checklist. The appropriate checklist for a pragmatic trial will depend on whether or not randomisation was used as a feature of the study.
^b Included in the review and assessment by Faria et al 2015.
Abbreviations: CASP, Critical Appraisal Skills Programme; GRACE, Good Research for Comparative Effectiveness; ISPOR, International Society for Pharmacoeconomics and Outcomes Research; NICE DSU, National Institute for Health and Care Excellence Decision Support Unit; QuEENS, Quality of Effectiveness Estimates from Non-randomised studies; ROBINS-I, The Risk of Bias in Non-randomized Studies – of Interventions; STROBE, Strengthening the Reporting of Observational Studies in Epidemiology.
In addition to the checklists above, Grading of Recommendations Assessment, Development and Evaluation (GRADE) is an approach that guides users to assess the quality or certainty of evidence in terms of the directness of the evidence to the decision, the precision of the effect estimates, and the heterogeneity of the results in addition to the risk of bias. This system is used to assess evidence to inform the strength of recommendations in the context of a clinical guideline.
It may be possible to adjust or control for some of the bias in non-randomised and observational studies – for methods on controlling for confounding bias, see here.
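The intuition behind such adjustment can be sketched with a hypothetical simulation (illustrative only, not a method prescribed by the source): when treatment is preferentially given to more severe patients (confounding by indication), the crude between-group comparison is biased, but comparing within severity strata and averaging recovers the true effect.

```python
import random

random.seed(1)

# Hypothetical simulation of confounding by indication: severe patients
# are more likely to receive treatment, and severity also worsens outcome.
n = 50_000
records = []
for _ in range(n):
    severe = random.random() < 0.5
    # Non-random allocation: 80% of severe vs 20% of non-severe patients treated.
    treated = random.random() < (0.8 if severe else 0.2)
    # True treatment effect: +10 points; severity costs 30 points.
    outcome = 50 + 10 * treated - 30 * severe + random.gauss(0, 5)
    records.append((severe, treated, outcome))

def mean_outcome(treated_flag, severe_flag=None):
    """Average outcome, optionally restricted to one severity stratum."""
    vals = [o for s, t, o in records
            if t == treated_flag and (severe_flag is None or s == severe_flag)]
    return sum(vals) / len(vals)

# Crude estimate mixes the treatment effect with severity: treatment
# appears harmful because treated patients are sicker.
crude = mean_outcome(True) - mean_outcome(False)

# Stratified estimate: compare within each severity stratum, then average.
stratified = sum(
    mean_outcome(True, s) - mean_outcome(False, s) for s in (True, False)
) / 2

print(f"crude: {crude:.1f}, stratified: {stratified:.1f}")
```

Stratification only removes bias from confounders that are measured; unmeasured confounding, which randomisation addresses by design, remains a key limitation of non-randomised studies.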