Analysing existing data

Analysing existing data from observational studies or randomised controlled trials (RCTs) can be a valuable method for identifying drivers of effectiveness.

What data should be analysed?


The information analysed should contain the following:

  • exposure to medicines (i.e. which medicine each participant received)
  • outcome (i.e. the medicine’s effect in patients)
  • key characteristics related to potential drivers of effectiveness (such as characteristics related to the actual use of medicines, the patients or disease, or the healthcare system).


A variety of data sources, such as electronic health records, observational studies and RCTs, can be used to explore drivers of effectiveness.

However, the following considerations need to be made when choosing different data sources:

  • Some sources, such as disease registries, may not contain information on exposure or key outcomes.
  • RCTs may only collect and report data on a few covariates for patients (such as age and gender) or disease (such as duration of illness and baseline risk), and usually lack information on the healthcare system (such as the type of healthcare provider).
  • The use of a medicine in an RCT is usually standardised (for dose, duration and patient adherence) and does not necessarily represent the use of the medicine in real life. More heterogeneous data offer more information on the variability of patients, disease characteristics and use of medicines in routine practice; however, larger datasets are needed when using more heterogeneous data, so that the analyses have enough statistical power.

For general descriptions of different data sources and study types see here and here.

Level of detail

Individual participant data (IPD) will have the optimal level of granularity (i.e. detail and precision of the information). Aggregate data are a summary of patient-level data from different sources and so lack precision.

What type of analyses should be run?

To determine the appropriate analyses to run, it is important to first consider the conceptual model that outlines the exposure to a medicine, the outcome and any possible contextual factors (which may be drivers of effectiveness).

Figure. Contextual factors or ‘drivers of effectiveness’ interact in the association between a medicine and the medicine’s effect or outcome


Specifying both the exposure to a medicine and the outcome is important, as each can be defined in different ways:

  • Exposure to a medicine: a medicine can be considered alone (no comparison) and its absolute effect measured, or compared with one or more other medicines and its relative effect measured. This choice is important because identifying drivers of absolute effectiveness is different from identifying drivers of relative effectiveness. The choice will depend on the question that needs to be answered.
  • Outcome: ideally, several outcomes should be considered. The choice of the outcome is important since different drivers of effectiveness may be identified by looking at different outcome measures. Outcomes may be continuous (such as evolution of symptoms and biological parameters) or dichotomous (such as death and hospitalisation).

Literature reviews or expert interviews should be used to form hypotheses about potential drivers of effectiveness before conducting data analysis. This helps to avoid chance findings and purely data-driven results, and reduces the need to conduct multiple analyses.

When designing analyses, consider the possibility that the association between exposure and outcome may be stronger (or run in a different direction) at different levels of the driver of effectiveness.
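
An effect that varies across levels of a driver can be sketched with a simple stratified comparison. The Python example below is a minimal illustration, not a recommended analysis: the function name stratified_effects, the record layout and the toy values are all hypothetical, and the sketch ignores confounding and sampling uncertainty. It computes the mean outcome difference between exposed and comparator participants within each level of a candidate driver, then the difference between those stratum-specific effects.

```python
from collections import defaultdict
from statistics import mean


def stratified_effects(records):
    """Mean outcome difference (exposed minus comparator) per driver level.

    records: iterable of (exposed, driver_level, outcome) tuples,
    where exposed is True for the medicine of interest.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for exposed, level, outcome in records:
        groups[level][bool(exposed)].append(outcome)

    effects = {}
    for level, arms in groups.items():
        # A stratum can only be compared if both arms are present.
        if arms[True] and arms[False]:
            effects[level] = mean(arms[True]) - mean(arms[False])
    return effects


# Hypothetical toy data: (exposed, adherence level, change in symptom score).
# Negative values mean symptoms improved. These numbers are invented.
records = [
    (True, "high", -10), (True, "high", -12),
    (False, "high", -4), (False, "high", -6),
    (True, "low", -5), (True, "low", -3),
    (False, "low", -4), (False, "low", -2),
]

effects = stratified_effects(records)
# Difference between stratum-specific effects (an interaction contrast).
interaction = effects["high"] - effects["low"]
print(effects, interaction)
```

A non-zero difference between strata suggests, but does not establish, that the driver modifies the effect. In practice a regression model with an interaction term, adjustment for confounders and confidence intervals would usually be expected.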

The figure below uses the example of medicines for schizophrenia to help visualise the conceptual model.

Figure. A conceptual model for schizophrenia


    In this example, the exposure is to drug A (vs. drug B – relative effectiveness), the outcome is the evolution of schizophrenia symptoms (time frame to be defined) and two potential drivers of effectiveness are analysed: adherence to medication and cannabis use. The figure also shows that adherence and cannabis use may be correlated.

How should data be analysed?

The observational studies or RCTs identified during the literature review may report aggregate data or IPD.

Aggregate data may be used to compare the effect of a medicine reported in RCTs with that reported in observational studies, to determine whether there is a gap between efficacy and effectiveness. The study population characteristics may then be analysed to see whether they explain the gap (i.e. whether they are the drivers of effectiveness).
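
As a rough illustration of this comparison, the Python sketch below takes one aggregate summary from an RCT and one from an observational study, computes the gap between the reported effects, and lists how the study populations differ on the characteristics that both report. The function name compare_aggregates, the dictionary layout and all values are hypothetical, chosen only to show the shape of the calculation.

```python
def compare_aggregates(rct, observational):
    """Return the efficacy-effectiveness gap and population differences.

    Each argument is a dict with an "effect" value and a "characteristics"
    dict of study-level summaries (hypothetical layout).
    """
    gap = rct["effect"] - observational["effect"]

    # Only characteristics reported by both studies can be compared.
    shared = rct["characteristics"].keys() & observational["characteristics"].keys()
    differences = {
        name: observational["characteristics"][name] - rct["characteristics"][name]
        for name in sorted(shared)
    }
    return gap, differences


# Invented summaries: effect is the mean reduction in a symptom score.
rct_summary = {
    "effect": 8.0,
    "characteristics": {"mean_age": 38.0, "proportion_adherent": 0.9},
}
observational_summary = {
    "effect": 5.5,
    "characteristics": {
        "mean_age": 45.0,
        "proportion_adherent": 0.6,
        "proportion_cannabis_use": 0.3,  # not reported by the RCT, so skipped
    },
}

gap, differences = compare_aggregates(rct_summary, observational_summary)
print(gap, differences)
```

Differences in population characteristics only point to candidate drivers. With a single pair of studies they cannot show which characteristic, if any, explains the gap.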

IPD may be used to test whether the medicine’s effect differs according to the potential drivers of effectiveness identified through expert interviews or literature reviews, and to simulate how excluding patients with specific characteristics changes the medicine’s effect.
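
The exclusion simulation can be sketched as follows. This Python example is a minimal illustration under stated assumptions: the patient records, field names (treated, cannabis_use, outcome) and helper functions are hypothetical, the effect is taken as a plain difference in mean outcomes, and no adjustment for confounding is made.

```python
from statistics import mean


def mean_difference(patients):
    """Mean outcome in treated patients minus mean outcome in comparators."""
    treated = [p["outcome"] for p in patients if p["treated"]]
    comparator = [p["outcome"] for p in patients if not p["treated"]]
    return mean(treated) - mean(comparator)


def effect_after_exclusion(patients, exclude):
    """Re-estimate the effect after dropping patients matched by exclude(p)."""
    kept = [p for p in patients if not exclude(p)]
    return mean_difference(kept)


# Invented individual participant data: outcome is the change in a
# symptom score (negative means improvement).
patients = [
    {"treated": True, "cannabis_use": False, "outcome": -10},
    {"treated": True, "cannabis_use": False, "outcome": -12},
    {"treated": True, "cannabis_use": True, "outcome": -2},
    {"treated": True, "cannabis_use": True, "outcome": -4},
    {"treated": False, "cannabis_use": False, "outcome": -4},
    {"treated": False, "cannabis_use": False, "outcome": -6},
    {"treated": False, "cannabis_use": True, "outcome": -3},
    {"treated": False, "cannabis_use": True, "outcome": -5},
]

overall = mean_difference(patients)
without_cannabis_users = effect_after_exclusion(
    patients, lambda p: p["cannabis_use"]
)
print(overall, without_cannabis_users)
```

In this toy dataset the estimated effect is larger once cannabis users are excluded, which is the kind of shift that would flag cannabis use as a candidate driver for further analysis.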

The data extracted and the analyses that can be performed will be based on the level of detail available. For more detail about how data should be analysed for IPD or aggregate data, see the research briefing here [INSERT LINK TO RESEARCH BRIEFING AND D2.2].

GetReal case studies using data analyses to identify drivers of effectiveness

Key contributors

Clementine Nordon, LASER
Robert Olivares, Sanofi
Mikkel Z Ankarfeldt, Novo Nordisk