Assuring quality and credibility of RWE

The random assignment of participants to treatment groups is the defining feature of a randomised controlled trial (RCT). If the trial is well conducted, this ensures that the characteristics of participants, both observed and unobserved, are similar in the groups being compared. This is most important when those characteristics may also have a direct impact on the effect of a medicine. Typical examples of such characteristics, often referred to as confounding variables or treatment effect modifiers, are age of participants and severity of disease.

Although there are methods other than randomisation (such as matching) that can be used to ensure equal distribution of these characteristics between groups, random allocation is particularly important as there may be characteristics that influence a treatment effect which are not known.
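The balancing property of randomisation can be illustrated with a small simulation. The following is a minimal sketch rather than a method taken from any of the guidance cited here: the "severity" variable, the sample size and the 80%/20% treatment probabilities are all hypothetical values chosen for illustration. It compares how an unobserved characteristic is distributed between groups under random allocation and under an allocation in which sicker patients are more likely to be treated.

```python
import random
import statistics


def simulate(n=10_000, seed=0):
    """Return the treated-minus-control difference in mean severity
    under (1) random allocation and (2) severity-driven allocation."""
    rng = random.Random(seed)

    # Hypothetical unobserved characteristic (e.g. disease severity),
    # drawn from a standard normal distribution.
    severity = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Random allocation: a fair coin flip, independent of severity.
    randomised = [rng.random() < 0.5 for _ in range(n)]

    # Non-random allocation (assumed for illustration): patients with
    # above-average severity are treated with probability 0.8, others 0.2.
    confounded = [rng.random() < (0.8 if s > 0 else 0.2) for s in severity]

    def mean_difference(treated_flags):
        treated = [s for s, t in zip(severity, treated_flags) if t]
        control = [s for s, t in zip(severity, treated_flags) if not t]
        return statistics.mean(treated) - statistics.mean(control)

    return mean_difference(randomised), mean_difference(confounded)


rand_diff, conf_diff = simulate()
print(f"Severity imbalance, random allocation:     {rand_diff:+.3f}")
print(f"Severity imbalance, non-random allocation: {conf_diff:+.3f}")
```

Under random allocation the difference in mean severity between groups should be close to zero, even though severity was never used to form the groups. Under the non-random allocation the treated group is systematically sicker, so any naive comparison of outcomes would mix the effect of the medicine with the effect of severity.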

Although other factors, such as adherence to treatment protocols and measurement of outcomes, may influence the internal validity of a study, a well-conducted RCT is likely to have high internal validity, providing reliable estimates of a medicine’s effect. However, traditional ‘explanatory’ RCTs are less likely to reflect the real world with respect to the population included, the administration of interventions or other factors (i.e. they may have lower external validity). Data collected outside RCTs (real-world data [RWD]) may have better external validity, but the potential lack of internal validity and the potential for bias (the ‘robustness’ of the data) in such sources result in more uncertainty when the data are used as evidence for relative effectiveness.

For more information on the potential limitations of different RWD sources or study designs to inform relative effectiveness, see Sources of Real-World Data and Generating Real-World Evidence.

Determining whether the effectiveness estimates reported in a study are credible and can be relied on for decision-making depends on a number of aspects relating to the quality of the study. Checklists to help assess a study for quality and credibility are discussed below.

Checklists for quality assessment

One of the key concerns about the use of evidence collected outside RCTs is the quality of studies used.

In the field of evidence-based medicine, checklists are often used to assess the quality of different study designs, aiming to ensure consistency across quality assessors. Many existing checklists focus on methodological quality of different types of study. Some also incorporate broader elements such as relevance of the study to cost-effectiveness analyses required by payers or health technology assessment agencies.

A NICE Decision Support Unit technical support document (Faria et al., 2015) has been produced ‘to help improve the quality of analysis, reporting, critical appraisal and interpretation of estimates of treatment effect from non-RCT studies’. This document includes a review and assessment of a number of existing checklists for quality assessment of the analysis of non-randomised studies.

The table below lists commonly used checklists, organised by study design, some of which were reviewed by Faria et al. (2015).

Table: Commonly used quality checklists by study design

Study design^a	Quality checklists
Randomised controlled trials (RCTs)	Cochrane risk of bias tool (Cochrane Handbook); CASP randomised controlled trial checklist (CASP Checklist)
Non-randomised study designs, controlled cohort, controlled before-and-after studies	In the context of cost-effectiveness analyses: ISPOR checklist for prospective observational studies^b; ISPOR checklist for retrospective database studies^b; Checklist for statistical methods to address selection bias in estimating incremental costs, effectiveness and cost-effectiveness (Kreif et al., 2013)^b; NICE DSU QuEENS checklist (for use on its own or to complement other checklists). In general: GRACE checklist^a (Grace Initiative, 2014); STROBE combined checklist for cohort, case-control, and cross-sectional studies^b (Strobe Group, 2009); ROBINS-I assessment tool
Cohort and cross-sectional	STROBE checklists for cohort and cross-sectional studies (Strobe Group, 2009); ROBINS-I assessment tool; CASP cohort study checklist (CASP Checklist); Newcastle-Ottawa scale (Wells et al., 2019)
Case-control	STROBE checklist for case control studies (Strobe Group, 2009); CASP case control checklist (CASP Checklist); Newcastle-Ottawa scale (Wells et al., 2019)
^a A difficulty in choosing the appropriate checklist is determining the classification of a study, particularly for observational studies. A checklist of design features is covered in the Cochrane handbook for systematic reviews of interventions (Cochrane Training; see Tables 13.2a and 13.2b). Box 13.4a of the same handbook provides useful notes for completing the appropriate checklist.
^b Included in the review and assessment by the NICE DSU (Faria et al., 2015).
Abbreviations: CASP, Critical Appraisal Skills Programme; GRACE, Good Research for Comparative Effectiveness; ISPOR, International Society for Pharmacoeconomics and Outcomes Research; NICE DSU, National Institute for Health and Care Excellence Decision Support Unit; QuEENS, Quality of Effectiveness Estimates from Non-randomised studies; ROBINS-I, Risk of Bias in Non-randomized Studies – of Interventions; STROBE, Strengthening the Reporting of Observational Studies in Epidemiology.

In addition to the checklists above, Grading of Recommendations Assessment, Development and Evaluation (GRADE) (GRADE Working Group) is an approach that guides users to assess the quality or certainty of evidence to support the strength of recommendations in clinical guidelines. This includes assessment of the directness of the evidence to the decision, the precision of the effect estimates, and the heterogeneity of the results in addition to the risk of bias.

It may be possible to adjust or control for some of the bias in non-randomised and observational studies. For methods of controlling for confounding bias, see Adjusting for Bias.