Report: Why do clinical trials fail? 

Overview

Clinical trials are time consuming, costly, and often onerous for patients, and they can fail for many reasons.[1] This report examines many of these reasons and presents insights into opportunities for improving the likelihood of designing and executing successful clinical trials.

Clinical trials for pharmaceuticals and medical devices offer numerous opportunities for failure. Failures can arise from a lack of efficacy, a shortfall in funding, or safety issues,[1] as well as from failing to maintain good manufacturing practice, not following MHRA guidance, or problems with patient recruitment, enrolment, and retention.[2] Generating accurate and sufficient results to determine whether there is value in continuing is an important part of the clinical trial process. The investment of resources, time, and funding grows with each successive stage, from pre-clinical through phase 3.[3] If a phase 3 trial fails, the financial loss therefore extends to all of the previous trials, as well as to the time spent investigating alternatives. This report describes some of the issues that have come to light around the failure of clinical trials.

Factors associated with clinical trials that fail:

Eligibility criteria: Exclusion and Inclusion criteria

Having too many exclusion criteria is problematic: they limit the number of eligible patients, have a negative impact on drug approval, and prevent the sponsor from “gaining knowledge in important patient populations”[1] before registration. In some cases this narrowing, particularly in phase II of clinical development, is justified as a way of reducing variability;[2] on the other hand, there is no published evidence that broadening the patient population with additional criteria inevitably increases the variability of the primary endpoint.[3]

The inclusion and exclusion criteria should produce a study population that matches the general patient population, and they must also be chosen with their effect on recruitment in mind.[4] However, inclusion criteria may vary across studies, leaving sponsors with little guidance, and overly specific criteria can make it harder to find suitable participants. Inclusion criteria that are too narrow can also lengthen recruitment: 16% of protocol amendments are reported to be due to changes in inclusion or exclusion criteria,[5] which can lead to differences between the patient populations enrolled before and after the amendments.

Failing to demonstrate efficacy or safety

One way in which trials are poorly designed is captured by the concepts of efficacy, effectiveness, and safety. Efficacy describes how well a drug works under controlled, model conditions, while effectiveness describes how well it works in real patients. Lack of efficacy has been one of the primary sources of trial failure.[6] In an analysis of 640 phase 3 trials of novel therapeutics, 57% of the failures were attributed to inadequate efficacy.[7]

Study design

A poor study design can lead to trial failure: selecting the wrong patients or the wrong endpoint, not to mention poor-quality data, can all cause problems in a trial.[1] Appropriate data sources can help sponsors ensure that the right patients are recruited and that suitable sites and countries are chosen, enhancing the likelihood of success.

Another common cause of failure in clinical research is an inability to meet criteria predetermined by the MHRA. It is also important to recognise that a sponsor is necessary to move a drug or device forward through the clinical trial process. If a study is rushed into phase 3 after a successful phase 2, there may be too little time to reflect on how to address safety in phase 3.[2]

Financial impact

It has been noted that of the phase 3 studies that failed, 22% did so due to a lack of funding.[3] This financial burden also raises ethical issues for the patients involved, who participate under the impression that their involvement will lead to the trial's advancement and successful completion.[4] Underfunded trials are therefore likely to lack the enrolment needed to demonstrate efficacy.

Financial risks

There are risks at all stages of development, and the cost of having to repeat studies, or of delays, escalates the total cost further. Taking steps to identify and address risks early in the development process is therefore key. Yet many companies do not monitor carefully for risks, and sometimes do not identify problems until much later, when they are difficult to address cost-effectively.[5] This often stems from companies' hesitation to terminate a project prematurely. A study of 842 molecules and 637 development programme failures found that companies that took the time to identify problems early and stop development of an imperilled programme had a much better likelihood of bringing their drug to market.[6]

Other factors that can result in trial failure include misspent funding, an incorrect study design, and insufficient funding allocated from the outset, which in turn reflects inaccurately calculated costs.[7] Dropout rates also affect the financial stability of trials, and difficulties with treatment adherence, such as side effects or missed follow-ups, further contribute to the financial impact of clinical trials.

Patient recruitment

There are various reasons why patient recruitment causes clinical trials to fail. Firstly, too many companies use the same preferred trial sites and end up chasing too few subjects. The targeted disease may also be rare, so the pool of eligible subjects is small to begin with. The failure to enrol a sufficient number of patients is a long-standing problem: a UK study of 114 trials indicated that only 31% met their enrolment goals.[1]

Patient difficulties

Another aspect of this is that patients who are ill cannot travel easily to designated hospitals. Some companies have tried to address this by bringing the trial to people's homes, although this can present further issues.[2] In addition, some studies offer remuneration to patients to cover expenses in the hope of improving recruitment. However, there is no evidence that paying patients to participate generates better recruitment,[3] although financial incentives have been reported to increase participants' response to trial questionnaires.[4]

Additional costs

There are additional costs associated with patient recruitment, which can be difficult to estimate and are highly variable. Marketing strategies such as advertising can play an important role in the financial viability of a trial.[5] Healthcare providers can also significantly affect patient recruitment: recruitment and retention can suffer when staff are unavailable, or merely perceived to be, or when a constant rotation of new staff prevents any relationship from developing between staff and patient. Establishing this communication and trust may lead to better participation.

All of these recruitment problems affect the trial and in some cases cause massive delays. Only 6% of clinical trials are completed within the given time frame, and a further 80% of trials are delayed by at least a month.[6] These delays increase study costs and push back subsequent sales, creating the potential for large losses. Given how high these costs are, there are substantial gains to be made by improving the rate of recruitment and retention.[7]

Unethically designed trials

It cannot be assumed that everyone understands the value of honesty, or is sensitive to it. Breaches do occur, and ethical issues carry a high risk of trial failure, severely damaging the reputation of all parties involved: the pharmaceutical company, the CRO, and the pharmaceutical physicians.[1] Too many industry cases illustrate that alleged short-term gains can rapidly turn into long-term losses.[2]

Patient risk

The general ethical problems with clinical trials stem from the fact that participants bear the risk and burden. Participation in a clinical trial carries an increased level of risk because of exposure to the effects of a new treatment. These risks are not “offset by a prospective clinical benefit”,[3] because the goal of the trial is not to treat the trial participants but to produce generalised medical knowledge.


References:

Saberwal, Gayatri. “Biobusiness in Brief: What Ails Clinical Trials?” Current Science, vol. 115, no. 9, Current Science Association, 2018, pp. 1648–52, https://www.jstor.org/stable/26978474.

Pharmafile. “Clinical trials and their patients.” 2016, https://www.pharmafile.com/news/511225/clinical-trials-and-their-patients-rising-costs-and-how-stem-loss.

Plaford, Chris. “Why Do Most Clinical Trials Fail?” 2015, https://www.clinicalleader.com/doc/why-do-most-clinical-trials-fail-0001#_ftn1.

  Hwang T.J., Carpenter D., Lauffenburger J.C., Wang B., Franklin J.M., Kesselheim A.S. Failure of investigational drugs in late-stage clinical development and publication of trial results. JAMA Intern. Med. 2016;176:1826–1833.

 Tukey, John W. “Use of Many Covariates in Clinical Trials.” International Statistical Review / Revue Internationale de Statistique, vol. 59, no. 2, [Wiley, International Statistical Institute (ISI)], 1991, pp. 123–37, https://doi.org/10.2307/1403439.

Worrall, John. “<em>What</Em> Evidence in Evidence‐Based Medicine?” Philosophy of Science, vol. 69, no. S3, [The University of Chicago Press, Philosophy of Science Association], 2002, pp. S316–30, https://doi.org/10.1086/341855.

Altman, Douglas G. “Size Of Clinical Trials.” British Medical Journal (Clinical Research Edition), vol. 286, no. 6381, BMJ, 1983, pp. 1842–43, https://www.jstor.org/stable/29511193.

Fogel DB. Factors associated with clinical trials that fail and opportunities for improving the likelihood of success: A review. Contemp Clin Trials Commun. 2018;11:156-164. Published 2018 Aug 7. doi:10.1016/j.conctc.2018.08.001

Jansen, Lynn A. “The Problem with Optimism in Clinical Trials.” IRB: Ethics & Human Research, vol. 28, no. 4, Hastings Center, 2006, pp. 13–19, https://www.jstor.org/stable/30033204.

Plaford, Chris. “Why Do Most Clinical Trials Fail?” 2015, https://www.clinicalleader.com/doc/why-do-most-clinical-trials-fail-0001#_ftn1.

Kobak, Kenneth. “Why Do Clinical Trials Fail?” Journal of Clinical Psychopharmacology, vol. 27, no. 1, February 2007, pp. 1-5, doi:10.1097/JCP.0b013e31802eb4b7.

Emerging use-cases for AI in clinical trials

Overview

Clinical trials are becoming more expensive: according to a Deloitte report1, the average cost of bringing a drug to market in the USA was $1.188 billion in 2010 and $1.981 billion in 2019. This increase reflects the difficulties associated with current linear clinical trial designs. Clinical trials can take a long time because of the difficulty of finding suitable, eligible patients for each study and the growing amount of data used to plan or inform a trial. Current clinical trials are still evolving to make use of the rapidly developing technologies, scientific methods, and data availability of recent years.

One possible way to improve and transform the clinical trial process as we know it is through the implementation of Artificial Intelligence (AI). AI encompasses all intelligence demonstrated by machines and includes important fields such as machine learning and natural language processing. It is already widely used in modern technology, such as smartphones and online search, and has more recently been used to innovate the drug discovery process. AI could benefit diverse tasks in the planning, execution, and analysis stages of clinical trials, improving cost effectiveness, study time, treatment efficacy, and data quality2.

Many emerging developments in artificial intelligence (AI) have the potential to benefit the clinical trials landscape.

AI in Adaptive clinical trial design

Many clinical trials are designed linearly, however adaptive designs are being used to allow predetermined changes during a study in response to ongoing trial data. Adaptive designs provide the flexibility to optimise resource allocation, end an unproductive trial early, and better characterise a treatment’s efficacy and safety through multiple endpoints. AI can help to inform and optimise adaptive designs through the analysis of healthcare data to select optimal endpoints, determine and monitor parameters for early stopping, and identify appropriate protocols for the trial3. These study design changes can increase the efficiency of a study, resulting in a trial which is more cost and time effective while maintaining high quality data collection and analysis.
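
To make the early-stopping idea concrete, the sketch below estimates conditional power at an interim analysis by simulating the unobserved remainder of the trial under the effect observed so far; a very low value might trigger a pre-specified futility stop. This is a minimal illustration only, not a method from the cited report, and all names, numbers, and thresholds are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def conditional_power(trt, ctl, n_final_per_arm, alpha=0.05, n_sims=2000):
    """Simulate the remaining patients under the interim estimates and return
    the fraction of simulated trials ending significant in favour of treatment."""
    n_remaining = n_final_per_arm - len(trt)
    sd = np.sqrt((trt.var(ddof=1) + ctl.var(ddof=1)) / 2)  # pooled SD estimate
    wins = 0
    for _ in range(n_sims):
        full_trt = np.concatenate([trt, rng.normal(trt.mean(), sd, n_remaining)])
        full_ctl = np.concatenate([ctl, rng.normal(ctl.mean(), sd, n_remaining)])
        t_stat, p_val = stats.ttest_ind(full_trt, full_ctl)
        wins += int(p_val < alpha and t_stat > 0)
    return wins / n_sims

# Interim data: 40 of a planned 100 patients per arm (simulated here).
interim_trt = rng.normal(0.3, 1.0, 40)
interim_ctl = rng.normal(0.0, 1.0, 40)
cp = conditional_power(interim_trt, interim_ctl, n_final_per_arm=100)
print(f"Conditional power: {cp:.2f}")  # e.g. flag for futility review if below 0.20
```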

Meta-analysis

AI can also enhance the exploratory analysis of data from previous trials. AI-enabled technology can collect, organise, and analyse increasing amounts of data, which could be applied to data from previous trials. This is normally performed manually by a biostatistician as part of a meta-analysis, but AI could help to gather and perform initial analyses before a more in-depth statistical approach is taken. This could highlight potentially important patterns in collected evidence which would then be used in informing trial design.
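
As a concrete illustration of the statistical core that such an AI-assisted pipeline would feed into, the short sketch below pools hypothetical effect estimates from previous trials using standard fixed-effect inverse-variance weighting; the numbers are invented for the example.

```python
import numpy as np

# Hypothetical treatment-effect estimates and standard errors from prior trials.
effects = np.array([0.42, 0.31, 0.55, 0.18])
std_errs = np.array([0.15, 0.20, 0.25, 0.12])

weights = 1.0 / std_errs**2                       # inverse-variance weights
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

ci_low, ci_high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled effect: {pooled:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```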

Synthetic control arms

Current clinical trials typically compare an experimental treatment to placebo and established treatments, assigning enrolled patients to either a treatment or control group. Synthetic control arms are an AI-driven solution for having a control group in a single-arm trial, which usually have only a treatment group. Synthetic control arms use data from previous studies to simulate the control treatment in patients. This would allow for all patients in a trial to receive active treatment to provide more evidence for the treatment efficacy and safety at each clinical trial stage. This may also have an impact on patient enrolment as patients generally show less interest in enrolling on placebo-controlled trials4,5.

As synthetic control arms are relatively novel, comparisons should be made with traditional control groups. In a typical blinded trial, patients are unaware whether they are receiving the experimental active treatment or a placebo; this is to test that the new treatment is the cause of any clinically meaningful response. Synthetic control arms could lead to more single-arm clinical trials, and the fact that patients know they are receiving active treatment would need to be considered when interpreting any clinically meaningful results.

Site selection

Identification of suitable sites and investigators to perform a trial is an important factor in study efficiency and feasibility. A study site should be amply equipped to carry out the study, must be of a suitable size to process the needs of study participants, and should be located in an area accessible to potential participants and investigators. Identification of target locations and investigators can be optimised through AI implementation, which also enables real-time monitoring of site performance once the trial has started. Study sites can be evaluated and compared through the development of a points-based algorithm which could factor in location, site size, and equipment.
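
The points-based idea could be prototyped as simply as the sketch below; the criteria, weights, and site data are all made up for illustration, and a real system would learn or calibrate the weights from historical site performance.

```python
# Hypothetical weights for site-selection criteria (higher = more important).
WEIGHTS = {"catchment_population": 0.4, "equipment_score": 0.3,
           "staff_experience": 0.2, "past_enrolment_rate": 0.1}

sites = [
    {"name": "Site A", "catchment_population": 0.9, "equipment_score": 0.7,
     "staff_experience": 0.8, "past_enrolment_rate": 0.6},
    {"name": "Site B", "catchment_population": 0.5, "equipment_score": 0.9,
     "staff_experience": 0.6, "past_enrolment_rate": 0.8},
]

def site_score(site):
    """Weighted sum of normalised (0-1) criteria for one candidate site."""
    return sum(WEIGHTS[k] * site[k] for k in WEIGHTS)

# Rank candidate sites from highest to lowest score.
for site in sorted(sites, key=site_score, reverse=True):
    print(f"{site['name']}: {site_score(site):.2f}")
```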

Patient enrolment/recruitment

Most patients enrol on a clinical trial if they have not responded to existing treatments in a clinically meaningful way. However, there are often strict eligibility criteria required for patients to enrol for a trial, including diagnostic tests, biomarker profiles, and demographics. Currently, patients find out about clinical trials either through manually searching online databases or, on occasion, through a clinician’s recommendation. This puts a lot of responsibility on patients to search for potential trials, just to be faced with trying to understand eligibility criteria full of medical jargon. This can lead to low patient recruitment.

AI and natural language processing offer a potential solution to this issue. Natural language processing could be used to match patients with trials based on eligibility criteria and patient electronic health records. Potentially suitable trials could then be suggested to patients or their clinicians, making it easier for patients to find suitable trials and for trials to recruit patients. While this would improve on the current recruitment method, natural language processing may initially struggle with clinical notes because of the heavy use of acronyms and medical jargon and the need to decipher hand-written notes. These problems are not specific to AI, though, as patients looking for suitable trials currently face the same difficulties.
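
A full natural-language pipeline is beyond a short example, but the rule-based sketch below shows the matching step conceptually: structured criteria (which an upstream NLP model, or a human, would extract from the protocol) are checked against fields pulled from a patient record. All field names, criteria, and values here are hypothetical.

```python
# Structured eligibility criteria, e.g. produced upstream by an NLP model.
trial_criteria = {"min_age": 18, "max_age": 65,
                  "diagnosis": "type 2 diabetes",
                  "excluded_medications": {"insulin"}}

# Structured fields extracted from a (hypothetical) electronic health record.
patient_record = {"age": 54,
                  "diagnoses": ["type 2 diabetes", "hypertension"],
                  "medications": {"metformin"}}

def is_eligible(patient, criteria):
    """Return True only if the patient record meets every criterion."""
    if not criteria["min_age"] <= patient["age"] <= criteria["max_age"]:
        return False
    if criteria["diagnosis"] not in patient["diagnoses"]:
        return False
    if patient["medications"] & criteria["excluded_medications"]:
        return False
    return True

print(is_eligible(patient_record, trial_criteria))  # True for this example
```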

AI-driven patient recruitment could also be used to reduce population heterogeneity and use prognostic and predictive enrichment to increase study power. While reduced heterogeneity can be beneficial (e.g. in testing the safety of drugs for patients unable to enrol on early clinical trials), caution should be taken with limiting patient diversity. Selectively enrolling patients who are more likely to respond well to treatment could lead to advancing a treatment that may not be as well tolerated in the post-market patient population.

AI could also be used to balance diversity in clinical trials. It is important to test safety and efficacy across differing demographics for a treatment to be properly characterised, which may include different ethnic groups, body types (height, weight, BMI), ages, and sexes. For example, a drug intended for female patients should have its dosage programme and any female-specific adverse effects established before approval.

Patient diagnostics

Trials often use diagnostic tests to measure a patient's response to treatment. AI can be used to improve diagnostic accuracy and objectivity during trials, reduce the potential for bias, and help with blinding a trial. AI programs have already been shown to provide more accurate diagnoses compared to clinicians6, which could be extended to use in clinical trials. AI can also be used to integrate multiple biomarkers or large data sets (e.g. bioinformatics data) to better monitor and understand a patient's response, and to make any needed changes in dosing.

Another application of AI in clinical trials would be in patient management. Wearable devices or apps could be used to provide real time safety and effectiveness data to both clinicians and patients, leading to better data quality and higher patient retention.

Patient monitoring, retainment, & medication adherence

AI could be used to monitor patients through automatic data capture and digital clinical assessments. Automated monitoring with AI can allow for personalised adherence alerts, and wearable devices could provide real-time safety and efficacy data shared with both patient and clinician to increase retainment and adherence rates. Video consultations with clinicians could improve retainment by reducing the travel required of patients, but there would still be a risk of dropout due to travel, as some diagnostic tests would need to be done at an appropriate site and may incur additional costs if tests for research purposes are not covered by a patient's health insurance or national health service.

Currently, medication adherence is mostly dependent on each patient's diary, record keeping, or memory, which is then discussed with clinicians during routine appointments. This can make it difficult to track adherence accurately. Digitising this through a website or app would allow more accurate adherence data to be obtained, in addition to providing patients with notifications, educational content, and adherence records. Other medical devices such as timed medication bottles could also be used to ensure medication is taken at appropriate intervals, and smart bottles could synchronise this with a smartphone app if applicable.

Data Cleaning

Data cleaning for clinical trials is typically performed by trained biostatisticians and is essential to ensure that collected data is consistently formatted and free from inputting errors. However, the data cleaning process can be time-consuming, especially with large datasets collected during clinical trials. AI could be implemented through machine learning methods to identify and correct errors found in clinical trial datasets7, leading to better quality data which is optimised for analysis. An AI approach may also reduce the amount of time spent on data cleaning.
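
One possible sketch of machine-assisted data cleaning is shown below (the column names, plausibility ranges, and contamination level are invented): simple rule-based range checks are combined with an unsupervised outlier detector, and suspect records are flagged for human review rather than silently "corrected".

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical trial dataset containing a few implausible entries.
df = pd.DataFrame({
    "age":         [54, 61, 47, 530, 38],     # 530 is a likely typo
    "systolic_bp": [128, 141, 119, 133, 12],  # 12 is a likely unit/entry error
})

# Rule-based checks: values outside physiologically plausible ranges.
rule_flags = (~df["age"].between(18, 110)) | (~df["systolic_bp"].between(60, 260))

# Unsupervised outlier detection as a second, model-based pass.
model_flags = IsolationForest(contamination=0.2, random_state=0).fit_predict(df) == -1

df["flag_for_review"] = rule_flags | model_flags
print(df)
```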

AI implementation and clinical trial digitisation

AI implementation is relevant to several aspects of clinical trials, be it study design, patient diagnostics, or trial management. Several tech giants (including Apple and Google) have invested in developing solutions to process electronic health records, monitor patients remotely, and integrate healthcare data into devices. By improving the cost and time effectiveness of clinical trials, both patients and pharma-tech companies benefit: more affordable treatment for patients and a greater return on investment for companies.

However, for clinical trials to implement AI successfully, many aspects of clinical trials would first require digitisation. Many trials still use paper documents instead of digital alternatives, which results in lost documents and slowed trial progression. Integrating electronic health records, digital copies of clinicians’ notes, and digital patient monitoring alone would help in designing and managing a trial. There is a concern that a switch to digital may be difficult for patients unfamiliar with technology, or for those who might prefer to keep paper diaries. However, digital solutions would allow for the development and implementation of AI-based solutions which would modernise and streamline the clinical trial process.

References

  1. Taylor K, Properzi F, Cruz M, Ronte H, Haughey J. Intelligent clinical trials [Internet]. www2.deloitte.com. 2020 [cited 16 March 2022]. Available from: https://www2.deloitte.com/content/dam/insights/us/articles/22934_intelligent-clinical-trials/DI_Intelligent-clinical-trials.pdf

  2. Glass L, Shorter G, Patil R. AI IN CLINICAL DEVELOPMENT [Internet]. IQVIA. 2019 [cited 16 March 2022]. Available from: https://www.iqvia.com/-/media/iqvia/pdfs/library/white-papers/ai-in-clinical-development.pdf
  3. Bhatt A. Artificial intelligence in managing clinical trial design and conduct: Man and machine still on the learning curve? Perspectives in Clinical Research. 2021;12(1):1-3. Available from: https://doi.org/10.4103/picr.PICR_312_20

4 common study designs for clinical trials

Clinical trial design is an important aspect of interventional trials that serves to optimise, streamline, and economise the conduct of the trial. The goals of a clinical trial, whether medtech or pharma, can encompass assessment of safety, dosage optimisation, evaluation of efficacy or accuracy, and comparison to existing treatments or diagnostics. This of course varies with the phase of the trial: for phase III or IV trials the goal is most often to determine superiority, non-inferiority, or equivalence of the novel therapeutic or device against one in standard use. A well-conducted study that achieves regulatory approval for the asset in an efficient way depends upon the design that informs it. An optimal design, from a statistical and data collection perspective, ensures accurate evaluation of efficacy and safety, as well as getting the product to market sooner. Knowing which study designs best suit your research will improve the chances of success, enable the best method for sample size estimation and re-estimation, save time, and reduce unnecessary costs (Evans, 2010). While many clinical study designs exist, this article focuses on perhaps the most rudimentary and frequently used designs:

  • Parallel group design
  • Crossover design
  • Factorial design
  • Randomised withdrawal design

1. Parallel group study design

A commonly used study design is the parallel arm design. In this design, subjects are randomised and allocated to one of two or more study arms, with each arm assigned a different intervention. Once study subjects have been randomised and allocated to a study arm, they cannot be allocated to another arm during the study (a short randomisation sketch follows the steps listed below).

Advantages of parallel group trial study design

A key advantage of the parallel group design is that it can be applied to many different diseases and allows multiple comparisons to be conducted simultaneously between many groups. A further advantage is that these different groups need not be recruited from the same site.

Note: Once patients have been randomised and assigned to a specific arm, the arms are mutually exclusive. This means that unplanned co-interventions or cross-overs between different treatments cannot be introduced.

Steps involved in a parallel arm trial design:

1. Eligibility of study subject assessed

2. Recruitment into study after consent

 3. Randomisation

4. Allocation to either treatment or control arm
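
The sketch below illustrates steps 3 and 4 with simple permuted-block randomisation to two parallel arms, a common way of keeping group sizes balanced as recruitment proceeds; the block size, arm labels, and seed are arbitrary choices for the example.

```python
import random

def block_randomise(n_patients, arms=("treatment", "control"), block_size=4,
                    seed=2024):
    """Assign patients to parallel arms in shuffled blocks so that group
    sizes stay balanced throughout recruitment."""
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_patients:
        block = list(arms) * (block_size // len(arms))  # one balanced block
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_patients]

print(block_randomise(10))
```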
 

2. Cross-over study design

There are some ethical limitations to the use of placebo controls that can be partially overcome by using a cross-over design. In this design, every patient taking part in the clinical trial receives both treatment and placebo, given in a randomised order (Evans, 2010). A cross-over design can also be used in the absence of placebo where the intention is to compare the new treatment to the standard one.

Advantages of cross-over design

One advantage of the cross-over design is that each patient acts as their own control, which balances covariates between the treatment and control conditions.

Another major advantage of cross over design is the fact that it requires a smaller sample size (Nair, 2019).

Note: When a cross-over design is applicable and chosen for the study, some patients start the trial on intervention A and then switch to intervention B (an AB sequence), whereas other patients start on intervention B and later switch to intervention A (a BA sequence).

! There needs to be an adequate washout period before the crossover in order to eliminate carry-over effects from the initially assigned and administered intervention. After all data have been collected, the results are compared within the same subject, assessing the effect of intervention A versus the effect of intervention B (Nair, 2019).
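
Once both periods are complete, a minimal within-subject comparison can be sketched as a paired test on each patient's two measurements, as below. This deliberately ignores period and carry-over effects, which a full crossover analysis would model, and the numbers are invented.

```python
import numpy as np
from scipy import stats

# Hypothetical outcome for the same 8 patients on intervention A and B.
outcome_a = np.array([7.1, 6.4, 8.0, 5.9, 7.5, 6.8, 7.2, 6.1])
outcome_b = np.array([6.2, 6.0, 7.1, 5.4, 6.9, 6.3, 6.5, 5.8])

# Paired t-test on the within-patient differences.
t_stat, p_value = stats.ttest_rel(outcome_a, outcome_b)
print(f"Mean within-patient difference: {np.mean(outcome_a - outcome_b):.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```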

Variations of cross-over design

(i) Switch back design (ABA vs BAB arms) –

1. Drug A -> Drug B -> Drug A

2. Drug B -> Drug A -> Drug B

The switch back and multiple switchback designs are of emerging relevance with the advent of biosimilars where switchability and interchangeability of a biosimilar to a bio-originator molecule can only be confirmed by such trial designs.

(ii) N of 1 design – N of 1 trials, also called “single-subject” or “structured within-patient randomised controlled multi-crossover” trials

This type of cross-over design is used to evaluate all interventions in a single patient. A typical N of 1 trial consists of repeating experimental and control treatment periods a number of times, with the interventions assigned randomly within each period pair. This design has gained popularity because, in most cases, the aim is to determine which treatment works best for the individual patient.

3. Factorial design

A factorial design is most suited to studies looking at two or more interventions in various combinations within one study setting. This design supports the study of interaction effects that result from combining different interventions (Nair, 2019).

Advantages of Factorial design

A key advantage of factorial design is that it can help answer multiple research questions in a clinical trial instead of conducting multiple trials.  This helps to optimise resources, thereby reducing costs and speeding up research pipelines.

2 × 2 factorial design with placebo

In a 2 × 2 factorial design with placebo, patients are randomized into four groups:

i) treatment A plus placebo
 ii) treatment B plus placebo
 iii) both treatments A and B
 iv) neither of them, placebo only.
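
The structure of this allocation can be sketched as below, randomising each hypothetical patient to one of the four cells; the code uses simple (unbalanced) randomisation purely to illustrate the design, not a production allocation scheme.

```python
import itertools
import random

rng = random.Random(7)

# The four cells of a 2 x 2 factorial design with placebo.
cells = list(itertools.product(["A", "placebo A"], ["B", "placebo B"]))

patients = [f"patient_{i:02d}" for i in range(1, 9)]  # hypothetical IDs
assignments = {p: rng.choice(cells) for p in patients}

for patient, (arm_a, arm_b) in assignments.items():
    print(patient, arm_a, arm_b)
```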

Limitations of the factorial design

The main limitations of using a factorial design for clinical trials are:

○  Increased complexity of the trial overall

○  Greater difficulty in meeting inclusion criteria

○  Inability to combine multiple incompatible interventions

○  More complex protocols

○  More complex statistical analysis

4. Randomised withdrawal design (EERW)

The aim of randomised withdrawal design is to evaluate the optimal duration of the treatment for patients that are responsive to the intervention.

After the initial enrichment period (an open-label period), whose main purpose is to assign all subjects to the intervention, subjects who do not respond are removed (dropped) from the study, and subjects who did respond are randomised to receive either the intervention or placebo during the second phase of the clinical trial (Nair, 2019).

Note: This means that only subjects that have responded are carried forward to the second stage of the study and randomised.

Statistical analysis of randomised withdrawal design

When using a randomised withdrawal design, the analysis is conducted using only data from the withdrawal phase. The outcome is usually defined as relapse of symptoms. The aim of the enrichment phase is to increase the statistical power for the estimated sample size.

Advantages of EERW

A main advantage of a randomised withdrawal design is that it can reduce the time patients spend on placebo. Only patients who respond to the intervention are randomised to placebo, which confers an ethical advantage.
A further advantage of this study design is that it can help to determine whether the treatment should be stopped or continued (Nair, 2019).

Conclusion

One of the key stages of planning a clinical trial is deciding on the appropriate study design, which helps to ensure the success of the research, guides the choice of method for sample size estimation and re-estimation, saves time, and reduces unnecessary costs.

The most commonly used study designs are:

  • Parallel group study design
  • Cross-over study design
  • Factorial study design
  • Randomised withdrawal study design (EERW)


A well-conducted study with an optimal design, one that incorporates a robust hypothesis evolved from clinical practice, goes a long way towards facilitating the regulatory approval process: evaluating efficacy and safety, and getting the product to market. When undertaking a clinical trial, close attention should therefore be paid to ensuring that the study design forms a solid foundation upon which to conduct the trial phases.

References
Evans, S., 2010. Fundamentals of clinical trial design. [online] PubMed Central (PMC). Available at: <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3083073/>.
Expert, T., 2022. Clinical Trial Designs & Clinical Trial Phases | Credevo Articles. [online] Credevo Articles. Available at: <https://credevo.com/articles/2021/02/05/the-phase-of-study-clinical-trial-design/>.
Nair, B., 2019. Clinical Trial Designs. [online] PubMed Central (PMC). Available at: <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6434767/>.
The BMJ, n.d. 13. Study design and choosing a statistical test. [online] Available at: <https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/13-study-design-and-choosing-statisti>.

Bayesian approach for sample size estimation and re-adjustment in clinical trials

Accurate sample size calculation plays an important role in clinical research. Sample size in this context simply refers to the number of human participants, whether healthy or diseased, taking part in the study. Clinical studies conducted with an insufficient sample size can lack the statistical power to adequately evaluate the treatment of interest, whereas a superfluous sample size wastes limited resources unnecessarily.

Various methods can be applied to determine the optimal sample size for a specific clinical study, and methods also exist for any re-adjustments required during the study. These methods vary widely, from straightforward tests and formulas to complex, time-consuming procedures, depending on the type of study and the information available from which to make the estimate. The most commonly used sample size calculation procedures are developed from a frequentist perspective.

Importance of knowing your study parameters

Accurate sample size calculation requires information on several key study and research parameters. These usually include an estimate of the effect size, corresponding to a clinically meaningful difference, and an estimate of variability. In practice these parameters are generally unknown and must be estimated from the existing literature or from pilot studies.
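
To anchor the idea, the standard frequentist calculation for comparing two means (using the normal approximation) needs exactly these inputs: an assumed standard deviation, a clinically meaningful difference, a significance level, and a target power. The sketch below uses placeholder numbers.

```python
from math import ceil
from scipy.stats import norm

def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Sample size per arm for a two-sided, two-sample comparison of means:
    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

# e.g. detect a 5-point difference, SD of 12, 5% significance, 80% power.
print(n_per_arm(delta=5, sd=12))   # about 91 patients per arm
```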

The Bayesian Framework in sample size estimations and re-adjustments

The Bayesian Framework has gradually become one of the most frequently mentioned methods when it comes to randomised clinical trial sample size estimations and re-adjustments.

Within this framework, sample size calculation can be treated explicitly as a decision problem that employs a loss or utility function.

The Bayesian approach involves three key stages:

  • 1. Prior estimate

A researcher has a prior estimate of the treatment effect (and other study parameters), derived from a meta-analysis of existing research, from pilot studies, or, in their absence, from expert opinion.

  • 2. Likelihood

Data are simulated (or collected) to provide a likelihood for the parameters of interest.

  • 3. Posterior estimate

Based on the insights obtained, the prior estimates from the first stage are updated to give a more precise final (posterior) estimate.
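
A minimal sketch of the prior-to-posterior step is shown below, using a conjugate normal model for the treatment effect: a prior from earlier evidence is combined with the likelihood of (here, simulated) trial data to give an updated estimate. All numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stage 1: prior for the treatment effect, e.g. from a meta-analysis.
prior_mean, prior_sd = 0.30, 0.20

# Stage 2: likelihood from (simulated) trial data with known outcome SD.
outcome_sd, n = 1.0, 50
data = rng.normal(0.45, outcome_sd, n)
data_mean, data_se = data.mean(), outcome_sd / np.sqrt(n)

# Stage 3: conjugate normal-normal update (precision = 1 / variance).
prior_prec, data_prec = 1 / prior_sd**2, 1 / data_se**2
post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
post_sd = np.sqrt(1 / post_prec)

print(f"Posterior effect estimate: {post_mean:.2f} (SD {post_sd:.2f})")
```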

         

A challenge of this approach is knowing when to stop the cycle, once enough evidence has been gathered, while avoiding bias (Dreibe, 2021). Peeking at the data in order to make a stopping decision is called “optional stopping”. In general, an optional stopping rule is cautioned against because it can inflate Type I error rates (de Heide & Grunewald, 2021).

How to decide when to stop the simulation cycle ?

There are two approaches one could take.

  • 1. Posterior probability

Calculate the posterior probability that the mean difference between the treatment and control arms is equal to or greater than the estimated effect of the intervention. Depending on whether the calculated probability is low or high, the cycle can be stopped without any further need to gather more data.

  • 2. PPOS ( predictive probability of success)

Calculating the predictive probability of achieving a successful result at the end of the study is another commonly used approach, and is particularly helpful for judging the likely success or failure of a study. As with the posterior probability, a decision to stop or continue the study can be made based on the level of this probability.
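
Both quantities can be approximated from the current posterior, as in the sketch below (all numbers hypothetical): (1) the posterior probability that the effect exceeds a target value, and (2) a rough predictive probability of success obtained by drawing an effect from the posterior and simulating the remaining patients. To keep the sketch short, the simulated final test uses only the future patients, whereas a full PPOS calculation would combine them with the data already collected.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

# Current posterior for the treatment effect (e.g. from an interim update).
post_mean, post_sd = 0.38, 0.15
target_effect = 0.25

# 1. Posterior probability that the true effect is at least the target.
prob_effect = 1 - norm.cdf(target_effect, loc=post_mean, scale=post_sd)

# 2. Predictive probability of success for the remaining patients.
n_remaining, outcome_sd, alpha, n_sims = 60, 1.0, 0.05, 5000
successes = 0
for _ in range(n_sims):
    effect = rng.normal(post_mean, post_sd)          # draw a plausible effect
    trt = rng.normal(effect, outcome_sd, n_remaining)
    ctl = rng.normal(0.0, outcome_sd, n_remaining)
    se = outcome_sd * np.sqrt(2 / n_remaining)
    z = (trt.mean() - ctl.mean()) / se
    successes += int(z > norm.ppf(1 - alpha / 2))    # significant in favour
ppos = successes / n_sims

print(f"P(effect >= {target_effect}): {prob_effect:.2f}")
print(f"Predictive probability of success: {ppos:.2f}")
```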

How to plan a Bayesian sample size calculation for a clinical trial

The key elements to consider when planning a Bayesian clinical trial are the same as for a frequentist clinical trial.

Key planning stages:

  • Determine the objective of the clinical study
  • Determine and set endpoints
  • Decide on the appropriate study design
  • Run a meta-analysis or review of existing evidence related to your research objective
  • Choose the statistical test and prepare the statistical analysis plan (SAP)

Even though the key planning stages are the same for both approaches, this does not mean they can be mixed throughout the study. If you have chosen one approach, you cannot switch to the other once the calculations have been generated and the research has started.

Bayesian approach vs Frequentist approach for sample size calculations

Bayesian:
  • Uses a prior and a posterior (gives the probability of a hypothesis given the data)
  • Sample size depends on the prior and the likelihood
  • Requires finding or deciding on a prior in order to estimate the sample size
  • Computationally intensive due to integration over many parameters

Frequentist:
  • No prior or posterior (never gives the probability of a hypothesis)
  • Sample size depends on the likelihood only
  • Does not require a prior to estimate the sample size
  • Less computationally intensive

Frequentist measures such as p-values and confidence intervals continue to predominate across life sciences research; however, the use of the Bayesian approach for sample size estimation and re-estimation in RCTs has been increasing over time.

Bayesian approach for sample size calculations in medical device clinical trial

In recent years the Bayesian approach has gained popularity as a method used in clinical trials, including medical device studies. One reason is that, if good prior information about the specific therapeutic or device is available, the Bayesian approach allows this information to be incorporated into the statistical analysis of the trial. In some cases, the available prior information for a device of interest may justify a smaller sample size and a shorter pivotal trial (Chen et al., 2011).

Computational algorithms and growing popularity of Bayesian approach

Bayesian statistical analysis can be computationally intense. However, breakthroughs in computational algorithms and increases in computing speed have made it much easier to calculate and build realistic Bayesian models, further contributing to the popularity of the Bayesian approach (FDA, 2010).

Markov Chain Monte Carlo (MCMC) method

One of the basic computational tools used is the Markov Chain Monte Carlo (MCMC) method, which generates a large number of simulated draws from the distributions of the random quantities of interest.

Why MCMC?

MCMC helps to deal with the computational difficulties often faced when using the Bayesian approach for sample size estimation. MCMC is an advanced random variable generation technique that allows one to simulate samples from sophisticated probability distributions.
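
For illustration only, the random-walk Metropolis sampler below (one of the simplest MCMC algorithms) draws from the posterior of a mean under a normal prior and normal likelihood; in practice, established tools such as Stan or PyMC would be used, and all numbers here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(0.4, 1.0, 30)          # simulated trial outcomes

def log_posterior(theta, prior_mean=0.0, prior_sd=1.0, sigma=1.0):
    """Unnormalised log posterior: normal prior times normal likelihood."""
    log_prior = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
    log_lik = -0.5 * np.sum(((data - theta) / sigma) ** 2)
    return log_prior + log_lik

samples, theta = [], 0.0
for _ in range(10_000):
    proposal = theta + rng.normal(0, 0.3)            # random-walk proposal
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal                              # accept the proposal
    samples.append(theta)

posterior = np.array(samples[2_000:])                 # discard burn-in draws
print(f"Posterior mean: {posterior.mean():.2f}")
```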

Conclusion

Sample size calculation plays an important role in clinical research. If the sample size is underestimated, statistical power for the detection of a clinically meaningful difference will likely be insufficient; if it is overestimated, resources are wasted unnecessarily.

The Bayesian framework has become a popular approach for sample size estimation. It has clear advantages, but it has also attracted criticism as a sample size estimation and re-adjustment method because the prior is subjective: different researchers may select different priors, leading to different posteriors and final conclusions.

In reality, both the Bayesian and frequentist approaches to sample size calculation involve deriving the relevant input parameters from the literature or from clinical expertise, and both can differ between analysts owing to variation in expert opinion about which studies to include or exclude in this process.

The Bayesian approach is more computationally intensive than traditional frequentist approaches. When selecting a method for sample size estimation, it should therefore be chosen carefully to best fit the particular study design, based on advice from statistical professionals with expertise in clinical trials.

References:

Bokai WANG, C., 2017. Comparisons of Superiority, Non-inferiority, and Equivalence Trials. [online] PubMed Central (PMC). Available at: <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5925592/> [Accessed 28 February 2022].

Chen, M., Ibrahim, J., Lam, P., Yu, A. and Zhang, Y., 2011. Bayesian Design of Noninferiority Trials for Medical Devices Using Historical Data. Biometrics, 67(3), pp.1163-1170.

E, L., 2008. Superiority, equivalence, and non-inferiority trials. [online] PubMed. Available at: <https://pubmed.ncbi.nlm.nih.gov/18537788/> [Accessed 28 February 2022].

Gubbiotti, S., 2008. Bayesian Methods for Sample Size Determination and their use in Clinical Trials. [online] Core.ac.uk. Available at: <https://core.ac.uk/download/pdf/74322247.pdf> [Accessed 28 February 2022].

U.S. Food and Drug Administration. 2010. Guidance for the Use of Bayesian Statistics in Medical Device Clinical. [online] Available at: <https://www.fda.gov/regulatory-information/search-fda-guidance-documents/guidance-use-bayesian-statistics-medical-device-clinical-trials> [Accessed 28 February 2022].

van Ravenzwaaij, D., Monden, R., Tendeiro, J. and Ioannidis, J., 2019. Bayes factors for superiority, non-inferiority, and equivalence designs. BMC Medical Research Methodology, 19(1).

de Heide. R, Grunewald, P.D, 2021, Why optional stopping can be a problem for Bayesians; Psychonomic Bulletin & Review, 21(2), 201-208.

Clinical Trial Phases in Drug Development


The development of new drugs starts far before they are ever seen in clinical trials. The discovery of multiple candidate drugs occurs early in the development process, often as a result of new information about how a disease functions, large-scale screening of small molecules, or the release of a new technology.

After a promising drug has been found, pre-clinical studies can be performed. A pre-clinical study for a new drug is used to determine important information about toxicity and suitable dosage amounts. These studies can be in vitro (in cell culture) and/or in vivo (in animal models) and determine whether a treatment will continue to the clinical trials stage.

Clinical trials test whether these experimental treatments are safe for use in humans and whether they are more effective at treating or preventing a disease than existing treatments. Clinical trials consist of several stages, called phases, with each phase focused on answering a different clinical question. Progression of a treatment to the next phase requires the study to meet several parameters ensuring the treatment's safety or efficacy:

  • Phase 0: Is the new treatment safe to use in humans in small doses?
  • Phase I: Is the new treatment safe to use in humans in therapeutic doses?
  • Phase II: Is the new treatment effective in humans?
  • Phase III: Is the new treatment more effective than existing treatments?
  • Phase IV: Does the new treatment remain safe and effective post-market?
Key phases of a pharmaceutical clinical trial

Phase 0: Small dose safety

Phase 0 studies can help to streamline the other clinical trial phases. Phase 0 consists of giving a few patients small, sub-therapeutic doses of the new treatment. This is to make sure that the new treatment behaves as expected by researchers and isn’t harmful to humans prior to using higher doses in phase I trials.

Phase I: Therapeutic dose safety

Phase I studies evaluate the safety of various doses of the new treatment in humans. This takes several months with typically around 20-80 healthy volunteers. In some cases, such as in anti-cancer drug trials, the study participants are patients with the targeted cancer type. A treatment may not pass phase I if the treatment leads to any serious adverse events.

Initial dosages in phase I studies can be informed based on data obtained during pre-clinical animal studies, and adjustments can be made to investigate the treatment’s side effect profile and develop an optimal dosing program. This could also include comparing different methods of giving a drug to patients (e.g., oral, intravenous etc.).

Phase II: Treatment efficacy

After passing phase I trials and having proven safety in humans, a new treatment advances to phase II studies designed to assess whether it may prevent or treat a disease. This phase can take from several months to 2 years, testing the new treatment in up to several hundred patients with the disease. Using a larger number of patients over a longer time period provides researchers with additional safety and effectiveness data, which is essential for the design of phase III trials.

To further test safety and efficacy, it is common to have a control group that receives either a placebo (a harmless pill or injection without the new treatment) or other current treatment (in trials where the disease is fatal unless treated e.g., cancer).

Phase III: Comparing to current treatments

Phase III studies are the last stage of a clinical trial before a new treatment can be approved for market use. The primary focus of a phase III study is to compare the safety and efficacy of a new treatment with current, existing treatments in patients with the target disease. Anywhere from several hundred to 3,000 patients may be included in a phase III study for between 1 to 4 years. Due to the scale of this phase, long-term or rare side effects are more likely to be uncovered.

Phase III studies are often randomised control trials, where patients will be randomly designated to different treatment groups. These groups may receive placebo, a current treatment (control group), the new treatment, or variations of the new treatment (e.g., different drug combinations). Randomised control trials are often double-blinded, where both the patient and the clinician administering their treatment do not know which treatment group they are assigned to.

A new treatment may continue to market and phase IV trials if the results prove it is as safe and effective as an existing treatment.

Phase IV: Post-market surveillance

If a new treatment passes phase III and is approved by the MHRA, FDA, or other national regulatory agency, it can be put to market. Phase IV is carried out in the post-market surveillance of the new treatment to keep updated on any emerging or long-term safety and efficacy concerns. This may include rare or long-term adverse side effects that were not yet discovered, or long-term analyses to see if the new treatment improves the life expectancy of a patient after recovery from disease.

Summary

Clinical trials are ultimately designed to mitigate risk. This includes the risk to the safety of trial participants by limiting the use of potentially unsafe treatments to small doses in a small number of patients before scaling up to testing therapeutic dose safety. Risk mitigation is not only for patient safety but also for preventing financial misspending as a treatment that is deemed unsafe in phase 0 would not proceed to the later, more costly clinical trial phases.

Not all clinical trials are the same, however, as each trial has a different disease and treatment context. Trials for medical devices differ somewhat from pharmaceutical trials (the differences are discussed in the next section). In addition, while sample sizes expand with phase progression, the required sample size for each trial and each phase depends on several factors, including disease context (a rare disease may require a lower sample size), patient availability (the location of the trial), trial budget, and effect size. The sample size values mentioned earlier in this blog are purely indications of what each phase may use (how a biostatistician determines a suitable sample size is discussed in the sample size section above).

References

https://www.fda.gov/patients/drug-development-process/step-3-clinical-research

https://www.cancer.org/treatment/treatments-and-side-effects/clinical-trials/what-you-need-to-know/phases-of-clinicaltrials.html

https://www.healthline.com/health/clinical-trial-phases

https://www.brightfocus.org/clinical-trials/how-clinical-trials-work/phases-clinical-trials

Medical Device Clinical Trials vs Pharmaceutical Clinical Trials – What’s the Difference?


Medical devices and drugs share the same goal: to safely improve the health of patients. Despite this, there are substantial differences between the two. Principally, drugs interact with biochemical pathways in the human body, while medical devices can act through a wide range of different mechanisms, for example heat or radiation (Taylor and Iglesias, 2009). Additionally, medical devices encompass not only therapeutic devices but diagnostic devices as well (Stauffer, 2020).

More specifically, medical device categories include therapeutic and surgical devices, patient monitoring, and diagnostic and medical imaging devices, among others, making it a very heterogeneous area (Stauffer, 2020). As such, medical device research spills over into many different fields of healthcare services and manufacturing. This research is mostly undertaken by SMEs (small to medium enterprises) rather than the larger, well-established companies that predominate in pharmaceutical research. SMEs and start-ups undertake the majority of early-stage device development, particularly where a new class of medical device is concerned, whereas larger firms become involved in the later stages of the testing process (Taylor and Iglesias, 2009).

Classification criteria for medical devices

There are strict regulations that researchers and developers need to follow, including general device classification criteria. These criteria define three classes of medical devices: the higher the class, the stricter the regulatory controls for the device.

  • Class I, typically do not require premarket notifications
  • Class II,  require premarket notifications
  • Class III, require premarket approval

Food and Drug Administration (FDA)

Drug licensing and market access approval by the Food and Drug Administration (FDA) and international equivalents require manufacturers to undertake phase II and III randomised controlled trials in order to provide the regulator with evidence of their drug’s efficacy and safety (Taylor and Iglesias, 2009).

Key stages of medical device clinical trials

In general, medical device clinical trials are smaller than drug trials and usually start with a feasibility study, which provides a limited clinical evaluation of the device. Next, a pivotal trial is conducted to demonstrate that the device in question is safe and effective (Stauffer, 2020).

Overall the medical device trials can be considered to have three stages:

  • Feasibility study,
  • Pivotal study to determine if the device is safe and effective,
  • Post-market study to analyse the long-term effectiveness of the device.

Clinical evaluation for medical devices

Clinical evaluation is an ongoing process conducted throughout the life cycle of a medical device. It is first performed during the development of a medical device, in order to identify the data that need to be generated for regulatory purposes and to inform whether a new device clinical investigation is necessary. It is then repeated periodically as new safety, clinical performance and/or effectiveness information about the medical device is obtained during its use (International Medical Device Regulators Forum, 2019).

During the evaluative process, a distinction must be made between device types – diagnostic or therapeutic. 

The criteria for diagnostic technology evaluations are usually divided into four groups:

  • technical capacity
  • diagnostic accuracy
  • diagnostic and therapeutic impact
  • patient outcome

The importance of evaluation

Evaluations provide important information about a device and can indicate possible risks and complications. The main measures of diagnostic performance are sensitivity and specificity. Based on the results of the clinical investigation, the intervention may be approved for the market. When placing a medical device on the market, the manufacturer must have demonstrated, through appropriate conformity assessment procedures, that the device complies with the Essential Principles of Safety and Performance of Medical Devices (International Medical Device Regulators Forum, 2019). Information on effectiveness can be obtained by conducting experimental or observational studies.
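
As a reminder of how those two measures are computed, the short sketch below derives them from a 2 x 2 comparison of a diagnostic device against a reference standard; the counts are made up for illustration.

```python
# Hypothetical 2x2 results of a diagnostic device vs a reference standard.
true_pos, false_neg = 85, 15    # diseased patients: detected vs missed
true_neg, false_pos = 90, 10    # healthy patients: correctly cleared vs flagged

sensitivity = true_pos / (true_pos + false_neg)   # true-positive rate: 0.85
specificity = true_neg / (true_neg + false_pos)   # true-negative rate: 0.90
print(f"Sensitivity: {sensitivity:.2f}, Specificity: {specificity:.2f}")
```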

Post-market surveillance

Manufacturers are expected to implement and maintain surveillance programs that routinely monitor the safety, clinical performance and/or effectiveness of the medical device as part of their Quality Management System (International Medical Device Regulators Forum, 2019). The scope and nature of such post market surveillance should be appropriate to the medical device and its intended use. Using data generated from such programs (e.g. safety reports, including adverse event reports; results from published literature, any further clinical investigations), a manufacturer should periodically review performance, safety and the benefit-risk assessment for the medical device through a clinical evaluation, and update the clinical evidence accordingly.

The use of databases in medical device clinical trials

The variation in the available evidence base for devices means that, unlike with drugs, establishing the clinical and cost-effectiveness of medical devices typically requires the consideration and analysis of data from observational studies. Modern observational databases are advantageous because they represent continuous monitoring of the device in real-life practice, including its outcomes (Maresova et al., 2020).

Bayesian methods as an alternative framework for evaluation

Bayesian methods for the analysis of trial data have been proposed as an alternative evaluation framework within the FDA's Center for Devices and Radiological Health. Their flexibility may make them particularly well suited to addressing many of the issues associated with assessing clinical and economic evidence on medical devices, for example learning effects and the lack of head-to-head comparisons between different devices.

Use of placebo in medical vs pharmaceutical trials

An additional key difference between drug and medical device trials is that the use of placebo in medical device trials is rare. If a placebo is used in a trial for a surgical or implanted device, it usually takes the form of a sham surgery or implantation of a sham device (Taylor and Iglesias, 2009). Sham procedures are high risk and may be considered unethical. Without this kind of control, however, there is in many cases no sure way of knowing whether the device provides real clinical benefit or whether the benefit experienced is due to the placebo effect.

Conclusion

In conclusion, there are many similarities between medical device and pharmaceutical clinical trials, but there are also some important differences that should not be missed:

  1. In general, medical device clinical trials are smaller than drug trials.
  2. The research is mostly undertaken by SMEs (small to medium enterprises) rather than by large, well-known companies.
  3. Drugs interact with biochemical pathways in the human body, whereas medical devices act through a wide range of mechanisms, for example heat or radiation.
  4. Medical devices can be used not only for diagnostic purposes but for therapeutic purposes as well.
  5. The use of placebo in medical device trials is rare in comparison to pharmaceutical clinical trials.

References:

Wang, B. et al., 2017. Comparisons of Superiority, Non-inferiority, and Equivalence Trials. [online] PubMed Central (PMC). Available at: <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5925592/> [Accessed 28 February 2022].

Chen, M., Ibrahim, J., Lam, P., Yu, A. and Zhang, Y., 2011. Bayesian Design of Noninferiority Trials for Medical Devices Using Historical Data. Biometrics, 67(3), pp.1163-1170.

Lesaffre, E., 2008. Superiority, equivalence, and non-inferiority trials. [online] PubMed. Available at: <https://pubmed.ncbi.nlm.nih.gov/18537788/> [Accessed 28 February 2022].

Gubbiotti, S., 2008. Bayesian Methods for Sample Size Determination and their use in Clinical Trials. [online] Core.ac.uk. Available at: <https://core.ac.uk/download/pdf/74322247.pdf> [Accessed 28 February 2022].

U.S. Food and Drug Administration, 2010. Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials. [online] Available at: <https://www.fda.gov/regulatory-information/search-fda-guidance-documents/guidance-use-bayesian-statistics-medical-device-clinical-trials> [Accessed 28 February 2022].

van Ravenzwaaij, D., Monden, R., Tendeiro, J. and Ioannidis, J., 2019. Bayes factors for superiority, non-inferiority, and equivalence designs. BMC Medical Research Methodology, 19(1).

Complex Adaptive Systems (CAS) Approach to Biomedicine & Public Health

While the majority of biomedical and public health research still takes a linear, reductive approach to arriving at empirical insight, reality is in most cases neither linear nor low dimensional. A complex adaptive systems approach, like reality, is non-linear and high dimensional. There are benefits to taking a linear, cause-and-effect, reductionist approach, in that a complex problem and its solution become simplified into terms that can be understood and predicted. Where this falls short is that such predictions often do not hold up in the real world, where outcomes tend to appear unpredictable.

Genomics, proteomics, transcriptomics and other “omics” techniques have generated an unprecedented amount of molecular genetics data. This data can be combined with larger scale data, from cell processes to environmental influences in disease states, to produce highly sophisticated multi-level models of contributing factors in human health and disease.

An area that is currently evolving towards a more personalised, nuanced approach, albeit one that remains linear, is clinical trials. By introducing a biomarker component to a clinical trial, for example to evaluate drug efficacy, the number of dimensions to the problem is slightly increased in order to arrive at more targeted and accurate solutions. More specifically, the number of patient sub-categories in the trial increases to accommodate the various biomarker groups, which may respond more or less well to different pharmacological approaches to the same disease. Increasing the dimensions of the problem beyond this would, for now, not be feasible or even helpful. On the other hand, understanding the interplay between biomolecular processes and environmental interactions in order to gain insight into disease processes themselves, and thereby which biochemical pathways oncology drugs should target, is something that clearly benefits from a non-linear approach.

Another example of a system that benefits from a non-linear approach is public health service provision and the desire to garner insights into changes that increase prevention, early intervention and treatment effectiveness as well as reduce service cost for the government and patient. Both of the above examples require attention to both macro and micro processes.

Some components of complex adaptive systems: connectivity, self-organisation, emergence, fractal patterns, non-linearity, governance by feedback loops, adaptation, nested systems, stochasticity, simple rules, chaotic behaviour, iteration, sub-optimality, requisite variety, and optimisation at the edge of chaos.

Whether modelling clinical health services networks or biological processes, complex adaptive systems consist of several common characteristics.

Components of complex adaptive systems

Massive interdependencies and elaborate connectivity

The complex adaptive systems approach shifts emphasis away from studying individual parts (as in conventional medical science, which produces notably fragmented results) towards characterising the organisation of these parts in terms of their inherently dynamic interactions. CAS are open rather than closed systems, because it is exogenous elements impacting on the system that cause the disruption required for growth.

Complex adaptive systems can be understood through relations and networks. System processes occur in networks that link component entities or agents. This approach emphasises that structures are dynamic, and that it is the process of becoming, rather than the being itself, that is of empirical interest.

Necessarily transdisciplinary or multi-disciplinary

A complex adaptive systems approach is necessarily transdisciplinary. It requires the collaboration of numerous disparate experts to combine myriad biological, physical and societal sciences into a holistic model. This model should aim to represent the pertinent simultaneous top-down and bottom-up processes that reveal contexts and relationships within the observed system dynamics.

Self-organising, emergent behaviour

Complex adaptive systems are self-organising in the sense that observed patterns are open ended, potentially unfinished, and cannot be predicted in the conventional sense. Rules of cause and effect are context dependent and cannot be applied rigidly.

A self-organising dynamic structure, which can be identified as a pattern, emerges as a result of spontaneous interactions between individual agents or elements. This pattern then impacts the interactions of individuals in a continual top-down, bottom-up symbiosis.

While linear models represent a reductionist, closed conceptualisation of the phenomena under analysis, a complex systems approach embraces high dimensionality true to the myriad real world phenomena composing a system. This requires that the system be treated as open and of uncertain ontology and thus lacking predictive capacity with regards to the outcomes of system dynamics.

As an emergent phenomenon, a complex adaptive system can be understood by interacting with it rather than through analysis or static modelling. This approach is concerned with “state change”, with evaluating “how things are becoming” rather than “how things are”. How did today’s state emerge from yesterday’s trajectories and process dynamics?

Fractal engagement entails that the system as a whole orientates through multiple actions. The same data can produce frameworks at the level of responsibility of every individual agent. Using a public health intervention as an example, individual agents make decisions, based on the data, about what changes they can make tomorrow within their own sphere of competence, rather than overarching changes being dictated in a top-down way or determined by others.

Feedback loops

Feedback loops link individual parts into an overarching dynamic structure. Feedback loops can be positive (self-reinforcing) or negative (self-correcting).

Negative feedback loops are stabilising: they have a dampening effect on oscillations, moving the system or component closer to equilibrium. Positive feedback loops are morphogenic: they increase the frequency and amplitude of oscillations, driving the system away from homeostasis and leading to changes in the underlying structure of the system.

Positive feedback loops, while facilitating growth and adaptation, tend towards chaos and decay, and are therefore crucially counterbalanced by simultaneously operating negative feedback loops. Evolution is thought to occur as a series of phase transitions, back and forth, between ordered and disordered states.
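As a toy illustration only (not drawn from the source), the short sketch below iterates a single state variable under a feedback rule: a negative gain pulls deviations back toward equilibrium, while a positive gain amplifies them away from it.

```python
# Toy illustration: a state variable nudged each step by feedback proportional
# to its deviation from a set point. The gains and step counts are arbitrary.
def simulate(gain, x0=1.0, set_point=0.0, steps=10):
    x, path = x0, []
    for _ in range(steps):
        x = x + gain * (x - set_point)   # feedback acts on the deviation
        path.append(round(x, 3))
    return path

print("negative feedback (damped oscillation):", simulate(gain=-1.5))
print("positive feedback (runaway growth):    ", simulate(gain=+0.5))
```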

Both top-down and bottom-up “causality”

While CAS models describe elements in terms of possibilities and probabilities, rather than cause and effect in the linear sense, there is a clear interplay between top-down and bottom-up causality and influence on the dynamic flows and trajectories of any system. This mirrors real-world systems. One example is the human body, where both conscious thought (top down) and biomolecular processes such as hormonal and neurochemical fluctuations (bottom up) affect mood, which in turn has flow-on effects downstream that shift the system back and forth between health and disease. One such manifestation is stress-induced illness of various kinds. As a social example, many instances of top-down and bottom-up causation can be found in public health and epidemiological settings.

This has been a non-exhaustive description of just some key components of complex adaptive systems. The main purpose is to differentiate the CAS paradigm from the more mainstream biomedical research paradigm and approach. For a deeper dive into the concepts mentioned see the references below.

References:

https://core-cms.prod.aop.cambridge.org/core/services/aop-cambridge-core/content/view/F6F59CA8879515E3178770111717455A/9781108498692c7_100-118.pdf/role_of_the_complex_adaptive_systems_approach.pdf

Carmichael T., Hadžikadić M. (2019) The Fundamentals of Complex Adaptive Systems. In: Carmichael T., Collins A., Hadžikadić M. (eds) Complex Adaptive Systems. Understanding Complex Systems. Springer, Cham. https://doi.org/10.1007/978-3-030-20309-2_1

https://www.health.org.uk/sites/default/files/ComplexAdaptiveSystems.pdf

Milanesi, L., Romano, P., Castellani, G., Remondini, D. and Liò, P. (2009). Trends in modeling Biomedical Complex Systems. BMC Bioinformatics, 10(Suppl 12), I1. https://doi.org/10.1186/1471-2105-10-S12-I1

Sturmberg, J. P. (2021). Health and Disease Are Dynamic Complex-Adaptive States: Implications for Practice and Research. Frontiers in Psychiatry, 12, 595124. https://doi.org/10.3389/fpsyt.2021.595124

Master Protocols for Clinical Trials

Part 1: Basket & Umbrella Trial Designs

Introduction

As the clinical research landscape becomes ever more complex and interdisciplinary alongside an evolving genomic and biomolecular understanding of disease, the statistical design component that underpins this research must adapt to accommodate this. Accuracy of evidence and speed with which novel therapeutics are brought to market remain hurdles to be surmounted.

Efficacy studies and non-inferiority clinical trials in the drug development space have traditionally included broad disease states, usually with patients randomised to one of two arms: the new treatment or an existing standard treatment. Due to patient biomarker heterogeneity, effective treatments could be left unsupported by evidence. Similarly, treatments found effective in a clinical trial do not always translate into real-world effectiveness in a broader range of patients.

Our current ability to assess individual genomic, proteomic and transcriptomic data and other patient biomarkers for disease, as well as immunologic and receptor-site activity, has shown that different patients respond differently to the same treatment, and that the same disease may benefit from different treatments in different patients – the beginnings of precision medicine. In addition, there is the scenario in which a single therapeutic may be effective against a number of different diseases, or subclasses of a disease, based on the agent’s mechanism of action on molecular processes common to the disease states under evaluation.

Master protocols, or complex innovative designs, are designed to pool resources to avoid redundancy and test multiple hypotheses under one clinical trial, rather than multiple clinical trials being carried out separately over a longer period of time.

Because this evolution in the clinical research paradigm is fairly novel, and because each study design has inherent flexibility, there is conflicting information in the published literature about the definition and characterisation of master protocols such as basket and umbrella clinical trials; the terms “basket” and “umbrella” are sometimes used interchangeably or left ill-defined. For this reason, a brief definition and overview of basket and umbrella clinical trials is given in the paragraphs that follow. Drawing on systematic reviews of existing research, it seeks the clarity of consensus before detailing some key statistical and operational elements of each design.

[Diagram: basket trial design under a master protocol for biomarker-based clinical trials.]

Basket trial:

A basket clinical trial design consists of a targeted therapy, such as a drug or treatment device, that is being tested on multiple disease states characterised by a common molecular process that is impacted by the treatment’s mechanism of action. These disease states could also share a common genetic or proteomic alteration that researchers are looking to target.

Basket trials can be either exploratory or confirmatory, and range from fully randomised, controlled, double-blinded designs to single-arm designs, or anything in between. Single-arm designs are an option when feasibility is limited, and are more focused on the early, exploratory stage of determining efficacy, or on whether a particular treatment has clear-cut commercial potential evidenced by a sizable enough retreat in disease symptomatology. Depending on the nuances of the patient populations being evaluated, the final study data may be analysed by pooling disease states or by analysing each disease state separately. Basket trials allow drug development companies to target the lowest-hanging fruit in terms of treatment efficacy, focusing resources on therapeutics with the highest potential for success in terms of real patient outcomes.

[Diagram: umbrella trial design under a master protocol.]

Umbrella trial:

An umbrella clinical trial design consists of multiple targeted treatments of a single disease where patients can be sub-categorised into biomarker subgroups defined by molecular characteristics that may lend themselves to one treatment over another.

Umbrella trials can be randomised, controlled, double-blind studies in which each intervention-and-control pair is analysed independently of the other treatments in the trial, or, where feasibility issues dictate, they can be conducted without a control group, with results analysed together in order to compare the different treatments directly.

Umbrella trials may be useful when a treatment has shown efficacy in some patients and not others. They increase the potential for confirmatory trial success by honing in on the patient sub-populations most likely to benefit on the basis of biomarker characteristics, rather than grouping all patients together as a whole.

Basket & Umbrella trials compared:

Both basket and umbrella trials are typically biomarker guided. The difference is that basket trials aim to evaluate tissue-agnostic treatments across multiple diseases that share common molecular characteristics, whereas umbrella trials aim to evaluate nuanced treatment approaches to the same disease based on differing molecular characteristics between patients.

Biomarker-guided trials have an additional feasibility constraint compared with non-biomarker-guided trials, in that the size of the eligible patient pool is reduced in proportion to the prevalence of the biomarker(s) of interest within that pool. This is why master protocol methodology becomes instrumental in enabling these appropriately complex research questions to be pursued.

Statistical concepts and considerations of basket and umbrella trials

Effect size

Basket and umbrella trials generally require a larger effect size than traditional clinical trials in order to achieve statistical significance. This is in large part due to their smaller sample sizes and the higher variance that comes with them. While patient heterogeneity in terms of genomic or molecular diversity, and thus expected treatment outcome, is reduced by the precision targeting of the trial design, a certain degree of between-patient heterogeneity is to be expected when relying on treatment arms with very small sample sizes.

If resources, including time, are tight, then basket trials enable drug developers to focus on less risky treatments that are more likely to end in profitability. It should be noted that this does not always mean that the treatments rejected by basket trials are truly clinically ineffective. A single-arm exploratory basket trial could end up rejecting a potential new treatment that, if subjected to a standard trial with more drawn-out patient acquisition and a larger sample size, would have been deemed effective at a smaller effect size.

Screening efficiency

If researchers carry out separate clinical studies for each biomarker of interest, then a separate screening sample needs to be recruited for each study. The rarer the biomarker, the larger the recruited screening sample needs to be in order to find enough people with the biomarker to participate in the study, and this requirement multiplies with the number of biomarkers. A benefit of master protocols is that a single sample of people can be screened for multiple biomarkers at once, greatly reducing the required screening sample size.

For example, researchers interested in four different biomarkers could collectively reduce the required screening sample by three quarters compared with conducting separate clinical studies for each biomarker. This maximisation of resources can be particularly helpful when dealing with rare biomarkers or diseases.
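A minimal sketch of that arithmetic, assuming 50 patients are needed per biomarker group and that each of four hypothetical biomarkers has a prevalence of roughly 10% in the screened population:

```python
import math

# Illustrative numbers only (assumed): 50 patients needed per biomarker group,
# each of four biomarkers present in roughly 10% of the screened population.
patients_per_group = 50
prevalences = {"biomarker_A": 0.10, "biomarker_B": 0.10,
               "biomarker_C": 0.10, "biomarker_D": 0.10}

# Four separate studies: each recruits and screens its own sample.
separate = sum(math.ceil(patients_per_group / p) for p in prevalences.values())

# One master protocol: a single sample is screened for all four biomarkers,
# sized by the biomarker that needs the largest screening pool.
shared = max(math.ceil(patients_per_group / p) for p in prevalences.values())

print("screened across separate studies:", separate)      # 2000
print("screened once under a master protocol:", shared)   # 500, a 75% reduction
```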

Patient allocation considerations

If the relevant biomarkers are not mutually exclusive, a patient could fit into multiple biomarker groups for which treatment is being assessed in the study. In this scenario a decision has to be made as to which group the patient will be assigned, and where appropriate the decision may be made at random. If belonging to two overlapping biomarker groups is problematic in terms of introducing bias with small sample sizes, or if several patients share the same overlap, a decision may be made to collapse the two biomarkers into a single group or to eliminate one of the groups. If a rare genetic mutation is a priority focus of the study, feasibility would dictate that such patients be assigned to that biomarker group.
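The sketch below illustrates one way such an allocation rule could look in code; the group names, the priority given to a rare mutation, and the random tie-break are all assumptions for illustration rather than a prescribed method.

```python
import random

# Hypothetical allocation rule: a rare biomarker of special interest takes
# precedence; otherwise overlapping memberships are resolved at random.
PRIORITY = {"rare_mutation"}

def assign_group(patient_biomarkers, rng=random.Random(2024)):
    """Assign a patient with possibly overlapping biomarkers to one group."""
    if not patient_biomarkers:
        return None
    prioritised = [b for b in patient_biomarkers if b in PRIORITY]
    if prioritised:
        return prioritised[0]
    # Otherwise break the tie at random, where that is appropriate for the design.
    return rng.choice(sorted(patient_biomarkers))

print(assign_group({"EGFR_like", "rare_mutation"}))   # -> rare_mutation
print(assign_group({"EGFR_like", "HER2_like"}))       # random choice between the two
```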

Sample Size calculations

Generally speaking, sample size calculation for basket trials should be based on the overall cohort, whereas sample size calculations for umbrella trials are typically undertaken individually for each treatment.

Basket and umbrella trials can be useful in situations where a smaller sample size is more feasible due to the specifics of the patient population under investigation. Statistically designing for this smaller sample size typically comes at the cost of requiring a larger effect size (the difference between treatment and control), and this translates to lower overall study power and a greater chance of type 2 error (a false negative result) when compared with a standard clinical trial design. Despite these limitations, master protocols such as basket or umbrella trials allow evaluation, to the highest possible level of evidence, of treatments that might otherwise be too heterogeneous or rare to evaluate in a traditional phase II or III trial.
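As a rough illustration of that trade-off, the following sketch uses the statsmodels power module to show how the minimum detectable standardised effect size grows as the per-arm sample size shrinks, assuming 80% power and a two-sided alpha of 0.05 for a simple two-arm comparison.

```python
from statsmodels.stats.power import TTestIndPower

# Illustrative only: minimum detectable effect size (Cohen's d) at 80% power
# and two-sided alpha = 0.05, for a range of per-arm sample sizes.
analysis = TTestIndPower()
for n_per_arm in (200, 100, 50, 25):
    d = analysis.solve_power(nobs1=n_per_arm, alpha=0.05, power=0.80,
                             ratio=1.0, alternative="two-sided")
    print(f"n = {n_per_arm:>3} per arm -> minimum detectable d = {d:.2f}")
```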

Randomisation and control

Randomised controlled designs are recommended for confirmatory analysis of an established treatment or target of interest. The control group typically treats patients with the established standard of care for their particular disease or, in the absence of one, placebo.

In master basket trials the established standard of care is likely to differ by disease or disease sub-type. For this reason it may be necessary for randomised controlled basket trials to pair a control group with each disease sub-group, rather than incorporating a single overall control group and potentially pooling results from all diseases into one statistical analysis of treatment success. It is worth considering whether each disease type and its corresponding control pair could be analysed separately, to enhance statistical robustness in a truly randomised controlled methodology.

Single-arm (non-randomised) designs are sometimes necessary for exploratory analysis of potential treatments or targets. These designs often require a greater margin of success (treatment efficacy) to reach statistical significance, as a trade-off for the smaller sample size required.

Blinding

To increase the quality of evidence, all clinical studies should be double blinded where possible. From a statistical perspective, double-blinding is recommended in order to truly evaluate the effectiveness of a treatment without undue bias.

Aside from the increased risk of type 2 error that may be inherent in master protocol designs, there is greater potential for statistical bias to be introduced. Bias can creep in in myriad ways and reduces the quality of evidence a study can produce. Two key sources of bias are lack of randomisation (mentioned above) and lack of blinding.

Single-arm trials do not include a control arm, so patients cannot be randomised between arms, and the double-blinding of patients, practitioners, researchers, data managers and others that would otherwise prevent various types of bias from influencing the study outcomes is not possible. With so many factors at play it is important not to overlook study blinding, and to implement it whenever it is feasible to do so.

If the priority is getting a new treatment or product to market quickly to benefit patients and potentially save lives, accommodating this bias can be a necessary trade-off. It is, after all, typically quite a challenge to have clinical data and patient populations that are homogeneous and well matched to any great degree, and this reality is especially noticeable with rare diseases or rare biomarkers.

Biomarker Assay methodology

The reliability of biological variables included in a clinical trial should be assessed; for example, the established sensitivity and specificity of the particular assays used needs to be taken into account. When patients are allocated by biomarker group, the degree of potential inaccuracy in this allocation can have a significant impact on trial results, particularly when the sample size is small. If the false positive rate of a biomarker assay is too high, the wrong patients will qualify for treatment arms, which in some cases may reduce the statistical power of the study.
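A small worked example of why this matters, using assumed figures: even an assay with 95% sensitivity and 95% specificity yields a poor positive predictive value when the biomarker is rare, so a substantial share of “biomarker-positive” enrolees may not actually carry the biomarker.

```python
# Illustrative calculation with assumed numbers: how assay error rates
# translate into misallocated patients when the biomarker is rare.
sensitivity = 0.95
specificity = 0.95
prevalence = 0.05   # rare biomarker (assumed)

true_pos = sensitivity * prevalence
false_pos = (1 - specificity) * (1 - prevalence)
ppv = true_pos / (true_pos + false_pos)

print(f"Positive predictive value: {ppv:.2f}")
print(f"Share of 'biomarker-positive' enrolees without the biomarker: {1 - ppv:.2f}")
```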

A further consideration of assay methodology pertains to the potential for non-uniform bio-specimen quality at different collection sites which may bias study results. A monitoring framework should be considered in order to mitigate this.

The patient tissue samples required for assays can inhibit feasibility, increase time and cost in the short term, and make study reproducibility more complicated. While this is important to note, these techniques are in many cases necessary for effectively assessing treatments based on our contemporary understanding of many disease states, such as cancer within the modern oncology paradigm. Without incorporating this level of complexity and personalisation into clinical research it will not be possible to develop evidence-based treatments that translate into real-world effectiveness and thus widespread positive outcomes for patients.

Data management and statistical analysis

The ability to statistically analyse multiple research hypotheses at once within a single dataset increases efficiency at the biostatistician’s end and provides frameworks for greater reproducibility of the methodology and final results, compared with the execution and analysis of multiple separate clinical trials testing the same hypotheses. Master protocols also enable increased data sharing and collaboration between sites and stakeholders.

Deloitte research estimated that master protocols can save clinical trials 12-15% in cost and 13-18% in study duration. These savings apply to situations where a master protocol is a good fit for the clinical research context, rather than to the blanket application of these designs across any and all clinical studies. Applying a master protocol design to the wrong clinical study could actually increase the required resources and costs without benefit, so it is important to assess whether a master protocol is indeed the optimal approach for the goals of a particular clinical study or studies.

[Diagram: umbrella trials and master protocols for precision medicine.]

References:

Bitterman DS, Cagney DN, Singer LL, Nguyen PL, Catalano PJ, Mak RH. Master Protocol Trial Design for Efficient and Rational Evaluation of Novel Therapeutic Oncology Devices. J Natl Cancer Inst. 2020 Mar 1;112(3):229-237. doi: 10.1093/jnci/djz167. PMID: 31504680; PMCID: PMC7073911.

Lesser N, Na B. Master protocols: Shifting the drug development paradigm. Deloitte Center for Health Solutions.

Lai TL, Sklar M, Thomas N. Novel clinical trial solutions and statistical methods in the era of precision medicine. Technical Report No. 2020-06, June 2020.

Renfro LA, Sargent DJ. Statistical controversies in clinical research: basket trials, umbrella trials, and other master protocols: a review and examples. Ann Oncol. 2017 Jan 1;28(1):34-43. doi: 10.1093/annonc/mdw413. PMID: 28177494; PMCID: PMC5834138.

Park, J.J.H., Siden, E., Zoratti, M.J. et al. Systematic review of basket trials, umbrella trials, and platform trials: a landscape analysis of master protocols. Trials 20, 572 (2019). https://doi.org/10.1186/s13063-019-3664-1

Distributed Ledger Technology for Clinical & Life Sciences Research: some Use-Cases for Blockchain & Directed Acyclic Graphs

Applications of blockchain and other distributed ledger technology (DLT) such as directed acyclic graphs (DAG) to clinical trials and life sciences research are rapidly emerging.

Distributed ledger technology (DLT) has the potential to solve a myriad of problems that currently affect data collection, management and access in clinical and life sciences research, including clinical trials. DLT offers an innovative approach to operating in environments where trust and integrity are paramount: paradoxically, it removes the need for trust in any individual component while providing full transparency of the platform’s operations as a whole.

Currently the two predominant forms of DLT are blockchain and directed acyclic graphs (DAGs). While quite distinct from one another, the two technologies are in theory intended to serve similar purposes, or were developed to address the same goals. In practice, blockchain and DAGs may have optimal use-cases that differ in nature, or be better equipped to serve different goals – a nuance to be determined on a case-by-case basis.

Bitcoin is the first known example of blockchain, but blockchain goes well beyond bitcoin and cryptocurrency use cases. One of the earliest and currently predominant DAG platforms is IOTA, which has proved itself in a range of use cases beyond what blockchain can currently achieve, particularly within the realm of the internet of things (IoT). In fact, IOTA has operated an industry data marketplace since 2017, making it possible to store data streams, sell them via micro-transactions and access them via a web browser. For the purposes of this article we will focus on DLT applications in general and include use-cases in which blockchain or DAGs can be employed interchangeably. Before we begin, what is distributed ledger technology?

[Image: the IOTA Tangle, an example of directed acyclic graph (DAG) distributed ledger technology, has been implemented in a range of use cases that may translate to clinical and life sciences research; IOTA has operated an industry data marketplace since 2017. Source: iota.org]
DLT is a decentralised digital system which can be used to store data and record transactions in the form of a ledger or smart contract. Smart contracts can be set up to form a pipeline of conditioned (if-then) events, or transactions, much like an escrow in finance, which are shared across nodes on the network. Nodes are used both to store data and to process transactions, with multiple (if not all) nodes accommodating each transaction – hence the decentralisation. Transactions themselves are a form of dynamic data, while a data set is an example of static data. Both blockchain and DAGs employ advanced cryptographic algorithms which make them extremely resistant to tampering. This is a huge benefit in the context of sensitive data collection, such as patient medical records or confidential study data: data can be kept secure, private and unaltered, and shared efficiently with whoever requires access.

Because each interaction or transaction is recorded, the integrity of the data is upheld in what is considered a “trustless” exchange. Because data is shared across multiple nodes for all involved to witness, records become harder to manipulate or change in an underhanded way. This is important in the collection of patient records or experimental data destined for statistical analysis. Any alterations to the data are recorded across the network for all participants to see, enabling true transparency. Transactions can take the form of smart contracts which are time stamped and tied to a participant’s identity via the use of digital signatures.
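As a conceptual sketch only (not modelled on any specific DLT platform), the following Python fragment chains records together with hashes so that any retrospective edit breaks the chain and is detectable; the record fields are invented for illustration.

```python
import hashlib, json, time

# Conceptual sketch: data records chained by hashes so that any retrospective
# edit is detectable. Field names and values are invented for illustration.
def make_block(record, prev_hash):
    block = {"record": record, "prev_hash": prev_hash, "timestamp": time.time()}
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

def verify(chain):
    prev = "0" * 64
    for b in chain:
        body = {k: v for k, v in b.items() if k != "hash"}
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if b["prev_hash"] != prev or recomputed != b["hash"]:
            return False
        prev = b["hash"]
    return True

ledger, prev = [], "0" * 64
for record in [{"patient": "P-001", "visit": 1, "sbp": 142},
               {"patient": "P-001", "visit": 2, "sbp": 131}]:
    block = make_block(record, prev)
    ledger.append(block)
    prev = block["hash"]

print("ledger intact:", verify(ledger))               # True
ledger[0]["record"]["sbp"] = 120                       # a retrospective edit
print("ledger intact after edit:", verify(ledger))     # False
```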

In this sense DLT is able to speed up transactions and processes while reducing cost, due to the removal of a middle-man or central authority overseeing each transaction or transfer of information. DLT can be public or private in nature. A private blockchain, for example, does have a trusted intermediary who decides who has access to the blockchain, who can participate on the network, and which data can be viewed by which participants. In the context of clinical and life sciences research this could be a consortium of interested parties, i.e. the research team, or an industry regulator or governing body. In a private blockchain the transactions themselves remain decentralised, while the blockchain itself has built-in permission layers that allow full or partial visibility of data depending on the stakeholder. This is necessary in the context of sharing anonymised patient data and maintaining blinding in randomised controlled trials.
[Image: Blockchain and Hashgraph, two examples of distributed ledger technology (DLT) with applications that could achieve interoperability across healthcare, medicine, insurance, clinical trials and life sciences research. Source: Hedera Hashgraph whitepaper]
Due to the immutable nature of each ledger transaction, or smart contract, stakeholders are unable to alter or delete study data without consensus across the whole network. If data is altered, an additional transaction is recorded and time-stamped on the blockchain, while the original transaction that recorded the data in its original form remains intact. This property helps to reduce the incidence of human error, such as data entry error, as well as any underhanded alterations with the potential to sway study outcomes.

In a clinical trials context, the job of the data monitoring committee, and any other form of auditing, becomes much more straightforward. DLT also allows complete transparency in all financial transactions associated with the research: funding bodies can see exactly where all funds are allocated and at what time points. In fact, every aspect of the research supply chain, from inventory to event tracking, can be made transparent to the desired entities. Smart contracts operate among participants in the blockchain and also between the trusted intermediary and the DLT developer contracted to build the platform framework, such as a private blockchain. The service contracts will need to be negotiated in advance so that the platform is tailored to individual study needs. Once processes are in place and streamlined, the platform can be replicated in comparable future studies.

DLT can address the problem of duplicate records in study data or patient records, and make longitudinal data collection more consistent and reliable across multiple life cycles. Many disparate stakeholders, from doctor to insurer or researcher, can share the same patient data source while maintaining patient privacy and improving data security. Patients can retain access to their data and decide with whom to share it, which clinical studies to participate in, and when to give or withdraw consent.

DLT, such as blockchain or DAGs, can improve collaboration by making the sharing of technical knowledge easier and by centralising data and medical records, in the sense that they are located on the same platform as every other transaction taking place. This results in easier shared access for key stakeholders, shorter negotiation cycles due to improved coordination, and more consistent and replicable clinical research processes.

From a statistician’s perspective, DLT should result in data of higher integrity, which yields statistical analysis of greater accuracy and produces research with more reliable results that can be better replicated and validated in future work. Clinical studies will be streamlined by the removal of much bureaucracy and will therefore be more time- and cost-effective to implement as a whole. This is particularly important in a micro-environment with many moving parts and disparate stakeholders, such as the clinical trials landscape.


References and further reading:

From Clinical Trials to Highly Trustable Clinical Trials: Blockchain in Clinical Trials, a Game Changer for Improving Transparency?
https://www.frontiersin.org/articles/10.3389/fbloc.2019.00023/full#h4

Clinical Trials of Blockchain
https://www.phusewiki.org/docs/Frankfut%20Connect%202018/TT/Papers/TT18-1-paper-clinical-trials-on-blockhain-v10-19339.pdf

Blockchain technology for improving clinical research quality
https://trialsjournal.biomedcentral.com/articles/10.1186/s13063-017-2035-z

Blockchain to Blockchains in Life Sciences and Health Care
https://www2.deloitte.com/content/dam/Deloitte/us/Documents/life-sciences-health-care/us-lshc-tech-trends2-blockchain.pdf

Simpson’s Paradox: the perils of hidden bias.

How Simpson’s Paradox Confounds Research Findings And Why Knowing Which Groups To Segment By Can Reverse Study Findings By Eliminating Bias.

Introduction
 
The misinterpretation of statistics, or even the “mis”-analysis of data, can occur for a variety of reasons and to a variety of ends. This article focuses on one such phenomenon contributing to the drawing of faulty conclusions from data – Simpson’s Paradox.


At times a situation arises where the outcomes of a clinical research study depict the inverse of the expected (or essentially correct) outcomes. Depending upon the statistical approach, this could affect means, proportions or relational trends, among other statistics.
Some examples of this occurrence are a negative difference when a positive difference was anticipated, or a positive trend when a negative one would have been more intuitive – or vice versa. Another example commonly pertains to the cross tabulation of proportions, where condition A is proportionally greater overall, yet when stratified by a third variable, condition B is greater in all cases. All of these examples can be instances of Simpson’s paradox. Essentially, Simpson’s paradox represents the possibility of supporting opposing hypotheses – with the same data.
Simpson’s paradox can occur due to the effects of confounding, where a confounding variable is characterised by being related to both the independent variable and the outcome variable, and by being unevenly distributed across levels of the independent variable. Simpson’s paradox can also occur without confounding in the context of non-collapsibility.
For more information on the nuances of confounding versus non-collapsability in the context of Simpson’s paradox, see here.

In a sense, Simpson’s paradox is merely an apparent paradox, and can be more accurately described as a form of bias. This bias most often results from a lack of insight into how an unknown, lurking variable is impacting the relationship between two variables of interest. Simpson’s paradox highlights the fact that taking data at face value and using it to inform clinical decision making can be highly misleading. The chances of Simpson’s paradox (or bias) affecting a statistical analysis can be greatly reduced in many cases by a careful approach informed by proper knowledge of the subject matter. This highlights the benefit of close collaboration between researcher and statistician in shaping an optimal statistical methodology that can be adapted on a per-case basis.

The following three part series explores hypothetical clinical research scenarios in which Simpson’s paradox can manifest.

Part 1

Simpson’s Paradox in correlation and linear regression


Scenario and Example

A nutritionist would like to investigate the relationships between diet and negative health outcomes. As higher weight has previously been associated with negative health outcomes, the research sets out to investigate the extent to which increased caloric intake contributes to weight gain. In researching the relationship between calorie intake and weight gain for a particular dietary regime, the nutritionist uncovers a rather unanticipated negative trend: as caloric intake increases, the weight of participants appears to go down. The nutritionist therefore starts recommending higher calorie intake as a way to lose weight.

Weight does appear to go down as calorie intake rises, but if we stratify the data by age group, a positive trend between weight and calorie intake emerges for each group. Overall, the elderly have the lowest calorie intake but the highest weight, while teens have the highest calorie intake but the lowest weight; this accounts for the negative overall trend but does not give an honest picture of the impact of calories on weight. In order to gain an accurate picture of the relationship between weight and calorie intake we have to know which variable to group or stratify the data by, and in this case it is age. Once the data is stratified by five separate age categories, a positive trend between calories and weight emerges in each of the five categories. In general, the answer to which variable to stratify by or control for is not usually this obvious, and in most cases requires some theoretical background and a thorough examination of the available data, including associated variables for which information is at hand.
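The reversal described above is easy to reproduce in a simulation. The sketch below (with invented group means and slopes) generates data in which every age group shows a positive calorie-weight slope, yet the pooled regression slope is negative.

```python
import numpy as np

# Simulated illustration with invented numbers: within every age group weight
# rises with calorie intake, yet the pooled trend is negative because older
# groups eat less overall and weigh more.
rng = np.random.default_rng(1)
group_means = {"teens": (2800, 60), "20s": (2600, 70), "40s": (2400, 80),
               "60s": (2200, 88), "80s": (2000, 95)}   # (mean kcal, mean kg)

calories, weights = [], []
for mean_cal, mean_wt in group_means.values():
    c = rng.normal(mean_cal, 100, 200)
    w = mean_wt + 0.02 * (c - mean_cal) + rng.normal(0, 2, 200)  # positive within-group slope
    calories.append(c)
    weights.append(w)

pooled_slope = np.polyfit(np.concatenate(calories), np.concatenate(weights), 1)[0]
within_slopes = [np.polyfit(c, w, 1)[0] for c, w in zip(calories, weights)]

print(f"pooled slope: {pooled_slope:+.3f} kg per kcal")              # negative
print("within-group slopes:", [f"{s:+.3f}" for s in within_slopes])  # all positive
```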


Remedy

In the above example, age shows a negative relationship to the independent variable, calories, but a positive relationship to the dependent variable, weight. It is for this reason that some data exploration and assumption checking before any hypothesis testing is so essential. Even with these practices in place it is possible to overlook the source of confounding, and caution is always encouraged.
 
Randomisation and Stratification:
In the context of a randomised controlled trial (RCT), subjects should be randomly assigned to treatment groups and stratified by any pertinent demographic and other factors, so that these are evenly distributed across treatment arms (levels of the independent variable). This approach can help to minimise, although not eliminate, the chance of bias occurring in any such statistical context, predictive modelling or otherwise.

Linear Structural Equation Modelling:
If the data at hand is not randomised but observational, a different approach should be taken to detect causal effects in light of potential confounding or non-collapsibility. One such approach is linear structural equation modelling, in which each variable is generated as a linear function of its parents, using a directed acyclic graph (DAG) with weighted edges. This is a more sophisticated approach than simply adjusting for x number of variables, and it is needed in the absence of a randomisation protocol.

Hierarchical regression:
This example illustrated an apparent negative trend in the overall data masking a positive trend in each individual subgroup; in practice, the reverse can also occur.
In order to avoid drawing misguided conclusions from the data, the correct statistical approach must be used: a hierarchical regression controlling for a number of potential confounding factors could avoid drawing the wrong conclusions due to Simpson’s paradox.

 

Article: Sarah Seppelt Baker


Reference:
Hernán, M., Clayton, D. and Keiding, N., 2011. The Simpson’s paradox unraveled. International Journal of Epidemiology.

Part 2

Simpson’s Paradox in 2 x 2 tables and proportions


Scenario and Example

Simpson’s paradox can manifest itself in the analysis of proportional data and two-by-two tables. In the following example two pharmaceutical cancer treatments are compared by a drug company using a randomised controlled clinical trial design. The company wants to test how the new drug (A) compares to the standard drug (B) already in wide clinical use. 1000 patients were randomly allocated to each group. A chi-squared test of remission rates between the two drug treatments is highly statistically significant, indicating that the new drug A is the more effective choice. At first glance this seems reasonable: the sample size is fairly large and equal numbers of patients have been allocated to each group.
Drug treatment         A              B
Remission: Yes         798 (79.8%)    705 (70.5%)
Remission: No          202            295
Total sample size      1000           1000
The chi-square statistic for the difference in remission rates between treatment groups is 23.1569. The p-value is < .00001. The result is significant at p < .05.


When we take a closer look, the picture changes. It turns out that the clinical trial team did not take into account the patients' stage of disease progression at the commencement of treatment. The table below shows that drug A was allocated predominantly to patients with stage II cancer (79.2% of stage II patients received drug A), while drug B was allocated predominantly to patients with stage IV cancer (79.8% of stage IV patients received drug B).

                       Stage II                        Stage IV
Drug treatment         A              B                A              B
Remission: Yes         697 (87.1%)    195 (92.9%)      101 (50.5%)    510 (64.6%)
Remission: No          103            15               99             280
Total sample size      800            210              200            790
The chi-square statistic for the difference in remission rates between treatment groups for patients with stage II disease progression at treatment outset is 5.2969. The p-value is .021364. The result is significant at p < .05.


The chi-square statistic for the difference in remission rates between treatment groups for patients with stage IV disease progression at treatment outset is 13.3473. The p-value is .000259. The result is significant at p < .05.

Unfortunately the analysis of tabulated data is no less prone to bias akin to Simpson’s paradox than continuous data. Given that stage II cancer is easier to treat than stage IV, this allocation gave drug A an unfair advantage and naturally led to a higher overall remission rate for drug A. When the treatment groups are divided by disease progression category and reanalysed, we can see that remission rates are higher for drug B at both stage II and stage IV baseline disease progression. The resulting chi-squared statistics are wildly different from the first and statistically significant in the opposite direction to the first analysis. In causal terms, stage of disease progression affects difficulty of treatment and likelihood of remission: patients at a more advanced stage of disease, i.e. stage IV, will be harder to treat than patients at stage II. For a fair comparison between two treatments, patients' stage of disease progression needs to be taken into account. In addition, some drugs may be more efficacious at one stage or the other, independent of the overall probabilities of achieving remission at either stage.
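The reversal can be verified directly from the tables above, for example with scipy's chi-squared test (using the same counts as in the tables; no continuity correction, to match the quoted statistics):

```python
import numpy as np
from scipy.stats import chi2_contingency

# The tables above: rows are remission yes/no, columns are drug A/B.
overall  = np.array([[798, 705], [202, 295]])
stage_ii = np.array([[697, 195], [103,  15]])
stage_iv = np.array([[101, 510], [ 99, 280]])

for label, table in [("overall", overall), ("stage II", stage_ii), ("stage IV", stage_iv)]:
    chi2, p, _, _ = chi2_contingency(table, correction=False)
    rate_a, rate_b = table[0] / table.sum(axis=0)      # remission rates by drug
    direction = "A > B" if rate_a > rate_b else "B > A"
    print(f"{label:>8}: remission {direction}, chi2 = {chi2:.2f}, p = {p:.4g}")
```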

Remedy

Randomisation and Stratification:
Of course, in this scenario stage of disease progression is not the only variable that needs to be accounted for in order to insure against biased results. Demographic variables such as age, sex, socio-economic status and geographic location are some examples of variables that should be controlled for in any similar analysis. As with the scenario in Part 1, this can be achieved through stratified random allocation of patients to treatment groups at the outset of the study. Using a randomised controlled trial design in which subjects are randomly allocated to each treatment group and stratified by pertinent demographic and diagnostic variables will reduce the chance of inaccurate study results occurring due to bias.
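As an illustration of the idea, the following sketch implements a simple stratified block randomisation; the strata, block size and patient identifiers are invented for the example.

```python
import random
from collections import defaultdict

# Sketch of stratified block randomisation: treatments are assigned in shuffled
# blocks within each stratum so arms stay balanced on the stratification factors.
def stratified_block_randomise(patients, arms=("A", "B"), block_size=4, seed=7):
    rng = random.Random(seed)
    blocks = defaultdict(list)
    allocation = {}
    for pid, stratum in patients:          # stratum e.g. (stage, sex)
        if not blocks[stratum]:
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            blocks[stratum] = block
        allocation[pid] = blocks[stratum].pop()
    return allocation

patients = [("P01", ("stage II", "F")), ("P02", ("stage II", "M")),
            ("P03", ("stage IV", "F")), ("P04", ("stage II", "F"))]
print(stratified_block_randomise(patients))
```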

Further examples of Simpson’s Paradox in 2 x 2 tables

Simpson’s paradox in case control and cohort studies

Case-control and cohort studies also involve analyses which rely on the 2×2 table. The calculation of their corresponding measures of association, the odds ratio and the relative risk respectively, is unsurprisingly not immune to the effect of bias, in much the same way as the chi-squared example above. This time, a reversed odds ratio or a relative risk in the opposite direction can occur if the pertinent bias has not been accounted for and controlled.

Simpson’s paradox in meta-analysis of case control studies

Following on from the example above, this form of bias can pose further problems in the context of meta-analysis. When combining results from numerous case-control studies, the confounders in question may or may not have been identified or controlled for consistently across all studies, and some studies will likely have identified different confounders for the same variable of interest. The odds ratios produced by the different studies can therefore be incompatible and lead to erroneous conclusions. Meta-analysis can thus fall prey to the ecological fallacy as a result of systematic bias, where the odds ratio for the combined studies is in the opposite direction to the odds ratios of the separate studies. Imbalance in treatment arm size has also been found to act as a confounder in meta-analyses of randomised controlled trials. Other methodological differences between studies may also be at play, such as differences in follow-up times or a very low proportion of observed events in some studies, potentially due to a shortened follow-up time.

That is not to say that meta-analysis cannot be performed on these studies; inter-study variation is of course more common than not. As with all other analytical contexts, it is necessary to proceed with a high level of caution and attention to detail. On the whole, an approach of simply pooling study results is not reliable; more sophisticated meta-analytic techniques, such as random-effects models or Bayesian random-effects models that use a Markov chain algorithm for estimating the posterior distributions, are required to mitigate the inherent limitations of the meta-analytic approach. Random-effects models assume the presence of study-specific variance, which is a latent variable to be partitioned. Bayesian random-effects models come in parametric, non-parametric and semi-parametric varieties, referring to the shape of the distributions of study-specific effects.
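For illustration, the sketch below pools hypothetical log odds ratios with a classical DerSimonian-Laird random-effects estimator (a frequentist counterpart to the Bayesian random-effects models mentioned above); the study estimates and variances are invented.

```python
import numpy as np

# DerSimonian-Laird random-effects pooling of log odds ratios from
# hypothetical case-control studies (estimates and variances are assumptions).
log_or = np.array([0.40, 0.10, -0.05, 0.55])     # study log odds ratios
var = np.array([0.04, 0.02, 0.03, 0.06])         # within-study variances

w_fixed = 1 / var
q = np.sum(w_fixed * (log_or - np.average(log_or, weights=w_fixed)) ** 2)
df = len(log_or) - 1
c = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
tau2 = max(0.0, (q - df) / c)                    # between-study variance estimate

w_random = 1 / (var + tau2)
pooled = np.average(log_or, weights=w_random)
se = np.sqrt(1 / np.sum(w_random))
print(f"tau^2 = {tau2:.3f}, pooled OR = {np.exp(pooled):.2f} "
      f"(95% CI {np.exp(pooled - 1.96*se):.2f} to {np.exp(pooled + 1.96*se):.2f})")
```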
For more information on Simpson’s paradox in meta-analysis, see here.

https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-8-34

For more information on how to minimise bias in meta-analysis, see here.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3868184/

https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.4780110202

Part 3

Simpson’s Paradox & Cox Proportional Hazard Models

Time-to-event data is common in clinical science and epidemiology, particularly in the context of survival analysis. Unfortunately the calculation of hazard rates in survival analysis is not immune to Simpson’s paradox, as the mathematics behind Simpson’s paradox is essentially the mathematics of conditional probability. In fact, Simpson’s paradox in this context has the interesting characteristic of holding for some intervals of the time variable (failure time T) but not others. In this case Simpson’s paradox would be observed in the effect of variable Y on the relationship between variable X and time interval T. The proportional hazards model can be seen as an extension of the 2×2 table, given that a similar type of data is used; the difference is that time is typically as much an outcome of interest as its relation to some factor Y. In this context Y could be said to be a covariate of X.

Scenario and example
 
A 2017 paper describes a scenario in which the death rate due to tuberculosis was lower in Richmond than in New York for both African-Americans and Caucasian-Americans, yet lower in New York than in Richmond when the two ethnic groups were combined.
For more details on this example as well as the mathematics behind it see here.
For more examples of Simpson’s paradox in Cox regression see here.


Site specific bias

Factors contributing to bias in survival models can differ from those in more straightforward contexts. Many clinical and epidemiological studies include data from multiple sites, and more often than not there is heterogeneity across sites. This heterogeneity can come in various forms and can result in within- and between-site clustering, or correlation, of observations on site-specific variables. This clustering, if not controlled for, can lead to Simpson’s paradox in the form of hazard rate reversal across some or all of time T, and has been found to be a common explanation of the phenomenon in this context. Site clustering can occur at the patient level, for example due to site-specific selection procedures for the recruitment of patients (led by the principal investigator at each site), or due to differences in site-specific treatment protocols. Site-specific differences can occur intra- or internationally; in the international case they may be due, for example, to differences in national treatment guidelines or in drug availability between countries. Resource availability can also differ between sites, whether intra- or internationally. In any time-to-event analysis involving multiple sites (such as a Cox regression model), a site-level effect should be taken into account and controlled for in order to avoid bias-related inferential errors.
 


Remedy
 

Cox regression model including site as a fixed covariate:
Site should be included as a covariate in order to account for site-specific dependence of observations.

Cox regression model treating site as a stratification variable:
In cases where one or more covariates violate the proportional hazards (PH) assumption, as indicated by a lack of independence of the scaled Schoenfeld residuals from time, stratification may be more appropriate. Another option in this case is to add a time-varying covariate to the model. The choice made in this regard will depend on the sampling nuances of each particular study.

Cox shared frailty model:
In specific conditions the Cox shared frailty model may be more appropriate. This approach involves treating subjects from the same site as sharing the same frailty, and requires that each subject is not clustered across more than one level-two unit. While it is not appropriate for multi-membership multi-level data, it can be useful for more straightforward scenarios.
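A brief sketch of the first two options, using the lifelines library on simulated data (the data, the site effects and the resulting coefficient values are assumptions for illustration):

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Simulated data with a site-level shift in the log hazard (assumed values).
rng = np.random.default_rng(0)
n = 300
site = rng.integers(1, 4, n)                        # three sites
treatment = rng.integers(0, 2, n)
site_shift = np.array([0.0, 0.5, 1.0])[site - 1]    # site-level log-hazard shift
hazard = 0.05 * np.exp(-0.4 * treatment + site_shift)
event_time = rng.exponential(1 / hazard)
censor_time = rng.exponential(30, n)
df = pd.DataFrame({
    "time": np.minimum(event_time, censor_time),
    "event": (event_time <= censor_time).astype(int),
    "treatment": treatment,
    "site": site,
})

# Option 1: site as a fixed covariate (a dummy term per site)
cph_fixed = CoxPHFitter().fit(df, "time", "event", formula="treatment + C(site)")

# Option 2: site as a stratification variable (site-specific baseline hazards)
cph_strat = CoxPHFitter().fit(df, "time", "event", strata=["site"], formula="treatment")

print(cph_fixed.summary.loc["treatment", ["coef", "exp(coef)", "p"]])
print(cph_strat.summary.loc["treatment", ["coef", "exp(coef)", "p"]])
```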

In tailoring the approach to the specifics of the data, appropriate model adjustments should produce hazard ratios that more accurately estimate the true risk.