P Values, Confidence Intervals and Clinical Trials

P values are so ubiquitous in clinical research that it’s easy to take for granted that they are being understood and interpreted correctly. After all, one might say, they are just simple proportions and it’s not brain surgery. At times, however, it’s the simplest of things that are easiest to overlook. In fact, the definitions and interpretations of p values are sufficiently subtle that even a minute pivot from an exact definition can lead to interpretations that are wildly misleading.

In the case of clinical trials, p values have a momentous impact on decision making in terms of whether or not to pursue and invest further in the development and marketing of a given therapeutic. In the context of clinical practice, p values drive treatment decisions for patients, as they essentially comprise the foundational evidence upon which these treatment decisions are made. This is perhaps as it should be, as long as the definition of p values and their interpretation are sound.

A counterpoint to this is the bias towards publishing only studies with a statistically significant p value, as well as the fact that many studies are not sufficiently reproducible or reproduced. This leaves clinicians with the impression that the evidence for a given treatment is stronger than the full picture would suggest. This, however, is a publishing issue rather than an issue with significance tests themselves. This article focuses on interpretation issues only.

As p values apply to the interpretation of both parametric and non-parametric tests in much the same way, this article will focus on parametric examples.

Interpreting p values in superiority/difference study designs

This refers to studies where we are seeking to find a difference between two treatment groups or between a single group measured at two time points. In this case the null hypothesis is that there is no difference between the two treatment groups or no effect of the treatment, as the case may be.

According to the significance testing framework, all p values are calculated on the assumption that the null hypothesis is true. If a study yields a p value of 0.05, this means that, if the study were repeated and there were no true difference, we would expect to see a difference between the two groups at least as extreme as the observed effect 5% of the time. In other words, if there is no true difference between the two treatment groups and we ran the experiment 20 times on 20 independent samples from the same population, we would expect to see a result this extreme once out of the 20 times.

This of course is not a very helpful way of looking at things if our goal is to make a statement about treatment effectiveness. The inverse likely makes more intuitive sense: if we were to run this study 20 times on distinct patient samples from the same population, 19 out of 20 times we would not expect a result this extreme if there were no true effect. Based on the rarity of the observed effect, we conclude that the data are sufficiently improbable under the null hypothesis that we can reject it as the best explanation of the data. Thus our alternative research hypothesis, that there is a difference between the two treatments, is likely to be true. As the p value does not tell us whether the difference is in a positive or negative direction, care should of course be taken to confirm which of the treatments holds the advantage.
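
As a minimal illustration of this definition, the following sketch (assuming a Python environment with NumPy and SciPy) repeatedly draws two groups from the same population, so that the null hypothesis is true by construction, and records how often a two-sample t-test produces p ≤ 0.05:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_repeats, n_per_group = 10_000, 50
    p_values = []
    for _ in range(n_repeats):
        a = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
        b = rng.normal(loc=0.0, scale=1.0, size=n_per_group)  # no true difference
        p_values.append(stats.ttest_ind(a, b).pvalue)

    # Under the null, p values are uniformly distributed, so about 5% fall
    # at or below 0.05: roughly 1 repetition in 20 looks "significant" by chance.
    print(np.mean(np.array(p_values) <= 0.05))

The printed proportion comes out close to 0.05, matching the one-in-twenty intuition above.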

P values in non-inferiority or equivalence studies

In non-inferiority and equivalence studies a non-significant p value can be a welcome result, as can a very low p value where the differences are not clinically significant, or where the new treatment is shown to be superior to the standard treatment. By only requiring the new treatment not to be inferior, more power is retained and a smaller sample size can be used.

The interpretation of the p value is much the same as for superiority studies; the implications, however, are different. In these types of studies it is ideal for the confidence intervals of the individual treatment effects to be narrow, as this provides confidence that the estimates obtained are precise in the absence of a statistically significant difference between the two estimates.

What p values do not tell you

A p value of 0.05 is not the same as saying that there is only a 5% chance that the treatment won’t work. Whether or not the treatment works in an individual is another probability entirely. It is also not the same as saying there is a 5% chance of the null hypothesis being true. The p value is a statistic that assumes the null hypothesis is true and, on that basis, gives the probability of observing a result at least as extreme as the one obtained.

Nor does the p value represent the chance of making a type I error. As each repetition of the same experiment produces a different p value, it does not make sense to characterise the p value as the chance of incorrectly rejecting the null hypothesis, i.e. making a type I error. Instead, an alpha cut-off point of 0.05 should be seen as indicating a result rare enough under the null hypothesis that we are now willing to reject the null as the most likely explanation given the data. Under a critical alpha of 0.05 this decision is expected to be wrong 5% of the time when the null hypothesis is true, regardless of the p value achieved in the statistical test. The relationship between the critical alpha and statistical power is illustrated in the sketch below.
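
The sketch uses the standard normal approximation for a two-sample comparison (effect size in Cohen’s d units; all figures are illustrative) to show that tightening alpha also lowers power:

    from scipy.stats import norm

    def approx_power(effect_size, n_per_group, alpha):
        # Probability of rejecting the null when the true effect equals effect_size.
        z_crit = norm.ppf(1 - alpha / 2)  # two-sided critical value
        noncentrality = effect_size * (n_per_group / 2) ** 0.5
        return norm.cdf(noncentrality - z_crit)

    for alpha in (0.10, 0.05, 0.01):
        # Lowering alpha reduces type I errors but also reduces power,
        # i.e. increases the chance of a type II error.
        print(alpha, round(approx_power(0.5, 64, alpha), 3))

With 64 subjects per arm and a true effect of d = 0.5, power falls from about 0.88 at alpha = 0.10 to about 0.60 at alpha = 0.01.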

Another misconception is that a small p value provides strong support for a given research hypothesis. In reality a small p value does not necessarily translate to a big effect, nor a clinically meaningful one. The p value indicates a statistically significant result, but it says nothing about the magnitude of the effect or whether that result is clinically meaningful in the context of the study. A p value of 0.00001 may appear to be a very satisfactory result, but if the difference observed between the two groups is very small this is not necessarily so. All it may be saying is that “we are really, really sure that there is only a minimal difference between the two treatments”, which in a superiority design is not the desired outcome.
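
To make this concrete, here is a hypothetical sketch (all numbers invented) in which an enormous sample turns a clinically trivial difference into an extremely small p value:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n = 100_000
    control = rng.normal(loc=120.0, scale=15.0, size=n)  # e.g. a blood pressure endpoint
    treated = rng.normal(loc=119.5, scale=15.0, size=n)  # true difference of only 0.5 units

    t_stat, p_value = stats.ttest_ind(treated, control)
    print(p_value)                          # far below 0.05
    print(control.mean() - treated.mean())  # ~0.5 units: likely below any sensible MID

The test is “significant” at almost any threshold, yet the estimated difference may be far too small to matter clinically.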

Minimally important difference (MID)

This is where the importance of pre-defining a minimally important difference (MID) becomes evident. The MID, or clinically meaningful difference, should be defined and quantified at the design stage, before the study is undertaken. In the case of clinical studies this should generally be done in consultation with the clinician or disease expert concerned.

The MID may take different forms depending on whether a study is a superiority design versus an equivalence or non-inferiority design. In a superiority design, or where the goal of the study is to detect a difference, the MID is the minimum difference at which we would be willing to consider the new treatment worth pursuing over the standard treatment or control used as the comparator. In a non-inferiority design the MID is the lower threshold at which we would still consider the new treatment as effective or useful as the standard treatment. Equivalence designs, on the other hand, may rely on an interval around the standard treatment effect.

When interpreting results of clinical studies it is of primary importance to keep a clinically meaningful difference in mind, rather than defaulting to the p value in isolation. In cases where the p value is statistically significant, it is important to ask whether the difference between comparison groups is also as large as the MID or larger.

Confidence Intervals

All statistical tests that involve p values can produce a corresponding confidence interval for the estimates. Unlike p values, confidence intervals do not rely on an assumption that the null hypothesis is true, but rather on the assumption that the sample approximates the population of interest. A common estimate in clinical trials where confidence intervals become important is the treatment effect. Very often this translates to the difference in means of a surrogate endpoint between two groups; however, confidence intervals are also important to consider for individual group means or treatment effects, which are estimates of the population means of the endpoint in those distinct groups or treatment categories.

Confidence interval for the mean

A 95% confidence interval for the estimate of the mean indicates that, if this study were repeated many times and an interval constructed each time, 95% of those intervals would be expected to contain the true population mean. While this estimate is based on the observed mean of the study sample, our interest remains in making inferences about the wider population who might later be subject to this treatment. Thus, inferentially, the observed mean and its confidence interval are both considered estimates of the population values.

In a nutshell, the confidence interval indicates how precise the estimate is: a narrower interval indicates greater precision and a wider interval less. The confidence level (here 95%) indicates how reliable the procedure that generated the interval is, i.e. how often such intervals capture the true value.
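
A minimal sketch of computing such an interval from hypothetical endpoint measurements, assuming SciPy is available:

    import numpy as np
    from scipy import stats

    endpoint = np.array([4.1, 5.3, 4.8, 5.0, 4.6, 5.5, 4.9, 5.2])  # hypothetical data
    mean = endpoint.mean()
    sem = stats.sem(endpoint)  # standard error of the mean
    ci_low, ci_high = stats.t.interval(0.95, df=len(endpoint) - 1, loc=mean, scale=sem)
    print(mean, (ci_low, ci_high))  # point estimate and its 95% confidence interval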

Confidence interval for the mean difference, treatment effects or difference in treatment effects

The mean difference in treatment effect between two groups is an important estimate in any comparison study, from superiority to non-inferiority clinical trial designs. Treatment response is mainly ascertained from repeated measures of surrogate endpoint data at the individual level. One common form is repeated measures data from the same individuals at different time points; these within-individual differences can then be compared between two independent treatment groups. In the context of clinical trials, confidence intervals for the mean difference can therefore relate to an individual’s treatment effect or to group differences in treatment effects.

A 95% confidence interval for the mean difference in treatment effect indicates that, if the study were repeated many times, 95% of the intervals so constructed would be expected to contain the true difference in treatment effect between the groups. A confidence interval containing zero indicates that a statistically significant difference between the two groups has not been found at that level. In other words, if the plausible values for the true difference fall partly above zero and partly below zero, indicating a difference in the opposite direction, we cannot be sure whether one group is higher or lower than the other.
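
The sketch below computes such an interval for two hypothetical independent groups using the Welch approach; an interval that excludes zero corresponds to a two-sided p value below 0.05:

    import numpy as np
    from scipy import stats

    group_a = np.array([12.1, 13.4, 11.8, 12.9, 13.1, 12.5])
    group_b = np.array([11.2, 12.0, 11.5, 11.9, 12.3, 11.1])

    diff = group_a.mean() - group_b.mean()
    var_a = group_a.var(ddof=1) / len(group_a)
    var_b = group_b.var(ddof=1) / len(group_b)
    se = np.sqrt(var_a + var_b)
    # Welch-Satterthwaite degrees of freedom
    df = se**4 / (var_a**2 / (len(group_a) - 1) + var_b**2 / (len(group_b) - 1))
    t_crit = stats.t.ppf(0.975, df)
    print(diff, (diff - t_crit * se, diff + t_crit * se))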

Much has been made of p values in recent years, but they are here to stay. While alternatives to p values exist, such as Bayesian methods, these statistics have limitations of their own and are subject to the same propensity for misuse and misinterpretation as frequentist statistics. Thus it remains important to take caution in interpreting all statistical results.

Sources and further reading:

Gao, P-Values – A chronic conundrum, BMC Medical Research Methodology (2020), 20:167. https://doi.org/10.1186/s12874-020-01051-6

The Royal College of Ophthalmologists, The clinician’s guide to p values, confidence intervals, and magnitude of effects, Eye (2022) 36:341–342; https://doi.org/10.1038/s41433-021-01863-w

The role of Biostatisticians, Bioinformaticians & other Data Experts in Clinical Research

As a medical researcher or a small enterprise in the life sciences industry, you are likely to encounter many experts using statistical and computational techniques to study biological, clinical and other health data. These experts can come from a variety of fields such as biostatistics, bioinformatics, biometrics, clinical data science and epidemiology. Although these fields overlap in certain ways, they differ in purpose, focus, and application. All five fields focus on analysing and interpreting biological, clinical or public health data, but they typically do so in different ways and with different goals in mind. Understanding these differences can help you choose the most appropriate specialists for your research project and get the most out of their expertise. This article will begin with a brief description of these disciplines for the sake of disambiguation, then focus on biostatistics and bioinformatics, with a particular overview of the roles of biostatisticians and bioinformatics scientists in clinical trials.

Biostatisticians

Biostatisticians use advanced biostatistical methods to design and analyse pre-clinical experiments, clinical trials, and observational studies, predominantly in the medical and health sciences. They can also work in ecological or biological fields, which will not be the focus of this article. Biostatisticians tend to work on varied data sets, including a combination of medical, public health and genetic data in the context of clinical studies. Biostatisticians are involved in every stage of a research project, from planning and designing the study, to collecting and analysing the data, to interpreting and communicating the results. They may also be involved in developing new statistical methods and software tools. In the UK the term “medical statistician” has been in common use over the past 40 years to describe a biostatistician, particularly one working in clinical trials, but it is becoming less common due to the global nature of the life sciences industry.

Bioinformaticians

Bioinformaticians use computational and statistical techniques to analyse and interpret large datasets in the life sciences. They often work with multi-omics data such as genomics, proteomics and transcriptomics data, and use tools such as large databases, algorithms, and specialised software programs to analyse and make sense of sequencing and other data. Bioinformaticians develop analysis pipelines and fine-tune methods and tools for analysing biological data to fit the evolving needs of researchers.

Clinical data scientists

Data scientists use statistical and computational modelling methods to make predictions and extract insights from a wide range of data. Often this is real-world big data that might not be practical to analyse using other methods. In a clinical development context, data sources could include medical records, epidemiological or public health data, prior clinical study data, or IoT and IoB sensor data, and data scientists may combine data from multiple sources and types. They make sense of this data using analysis pipelines, machine learning techniques, neural networks, and decision tree analysis. The better the quality of the input data, the more precise and accurate any predictive algorithms can be.

Statistical programmers

Statistical programmers help statisticians to efficiently clean and prepare data sets and mock TFLs (tables, figures and listings) in preparation for analysis. They set up SDTM and ADaM data structures in preparation for clinical studies. Quality control of data and advanced macros for database management are also key skills.

Biometricians

Biometricians use statistical methods to analyse data related to the characteristics of living organisms. They may work on topics such as growth patterns, reproductive success, or the genetic basis of traits. Biometricians may also be involved in developing new statistical methods for analysing data in these areas. Some use the terms biostatistician and biometrician interchangeably; for the purposes of this article they remain distinct.

Epidemiologists

Epidemiologists study the distribution and determinants of diseases in populations. Using descriptive, analytical, or experimental techniques, such as cohort or case-control studies, they identify risk factors for diseases, evaluate the effectiveness of public health interventions, as well as track or model the spread of infectious diseases. Epidemiologists use data from laboratory testing, field studies, and publicly available health data. They can be involved in developing new public health policies and interventions to prevent or control the spread of diseases.

Clinical trials and the role of data experts

Clinical trials involve testing new treatments, interventions, or diagnostic tests in humans. These studies are an important step in the process of developing new medical therapies and understanding the effectiveness and safety of existing treatments.

Biostatisticians are crucial to the proper design and analysis of clinical trials. So that optimal study design can take place, they may first have to conduct extensive meta-analysis of previous clinical studies, or real-world evidence (RWE) generation based on available real-world data sets or R&D results. They may also be responsible for managing the data and ensuring its quality, as well as interpreting and communicating the results of the trial. From developing the statistical analysis plan and contributing to the study protocol, to final analysis and reporting, biostatisticians have a role to play across the project timeline.

During a clinical trial, statistical programmers may prepare data sets to CDISC standards and pre-specified study requirements, maintain the database, as well as develop and implement standard SAS code and algorithms used to describe and analyse the study data.

Bioinformaticians may be involved in the design and analysis stages of clinical trials, particularly if the trial design involves the use of large data sets such as sequencing data for multi-omics analysis. They may be responsible for managing and analysing this data, as well as developing software tools and algorithms to support the analysis.

Data scientists may be involved in designing and analysing clinical trials at the planning stage, as well as in developing new tools and methods. The knowledge gleaned from data science models can be used to improve decision-making across various contexts, including life sciences R&D and clinical trials. Some applications include optimising the patient populations used in clinical trials; feasibility analysis using simulation of site performance, region, recruitment and other variables, to evaluate the impacts of different scenarios on project cost and timeline.

Biometricians and epidemiologists may also contribute to clinical trials, particularly if the trial is focused on a specific population or on understanding the factors that influence the incidence or severity of a disease. They may contribute to the design of the study, collecting and analysing the data, or interpreting the results.

Overall, the role of these experts in clinical trials is to use their varied expertise in statistical analysis, data management, and research design to help understand the safety and effectiveness of new treatments and interventions.

The role of the biostatistician in clinical trials

Biostatisticians may be responsible for developing the study protocol, determining the sample size, producing the randomisation schedule, and selecting the appropriate statistical methods for analysing the data. They may also be responsible for managing the data and ensuring its quality, as well as interpreting and communicating the results of the trial.

SDTM data preparation

The Study Data Tabulation Model (SDTM) is a data standard used to structure and organise clinical study data in a standardised way. Depending on how a CRO is structured, biostatisticians, statistical programmers, or both will be involved in mapping the data collected in a clinical trial to the SDTM data set, which involves defining the structure and format of the data and ensuring that it is consistent with the standard. This helps to ensure that the data is organised in a way that is universally interpretable. The process involves working with the research team to ensure the appropriate variables and categories are defined, before reviewing and verifying the data to ensure that it is accurate, complete and in line with industry standards. Typically the SDTM data set will be established early, at the protocol phase, and populated later once trial data accumulates.
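
As a highly simplified illustration (the raw column names and study identifier below are hypothetical, and a real mapping involves controlled terminology and many more variables), a vital signs (VS) domain might be assembled from collected data like this:

    import pandas as pd

    # Raw collected data, as it might arrive from data capture (hypothetical layout).
    raw = pd.DataFrame({
        "subject": ["001", "001", "002"],
        "visit":   ["BASELINE", "WEEK4", "BASELINE"],
        "sbp":     [132, 127, 141],  # systolic blood pressure, mmHg
    })

    # Map to an SDTM-style VS domain: standard variable names, one row per result.
    vs = pd.DataFrame({
        "STUDYID":  "ABC-101",
        "DOMAIN":   "VS",
        "USUBJID":  "ABC-101-" + raw["subject"],
        "VSTESTCD": "SYSBP",
        "VSTEST":   "Systolic Blood Pressure",
        "VSORRES":  raw["sbp"],
        "VSORRESU": "mmHg",
        "VISIT":    raw["visit"],
    })
    print(vs)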

Creating and analysing the ADaM dataset

In clinical trials, the Analysis Data Model (ADaM) is a data set model used to structure and organise clinical trial data in a standardised way for the purpose of statistical analysis. ADaM data sets store the data that will be analysed as part of the clinical trial, and are typically created from the Study Data Tabulation Model (SDTM) data sets, which contain the raw data collected during the trial. This helps to ensure the reliability and integrity of the data, and makes it easier to analyse and interpret the results of the trial.

Biostatisticians and statistical programmers are responsible for developing ADaM data sets from the SDTM data, which involves selecting the relevant variables and organising them in a way that is appropriate for the particular statistical analyses that will be conducted. While statistical programmers may create derived variables, produce summary statistics and TFLs, and organise the data into appropriate datasets and domains, biostatisticians are responsible for conducting detailed statistical analyses of the data and interpreting the results. This may include tasks such as testing hypotheses, identifying patterns and trends in the data, and developing statistical models to understand the relationships between the data and the research questions the trial seeks to answer.

The role of biostatisticians, specifically, in developing ADaM data sets from SDTM data is to use their expertise in statistical analysis and research design to guide statistical programmers in ensuring that the data is organised, structured, and formatted in a way that is appropriate for the analyses that will be conducted, and to help understand and interpret the results of the trial.
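
For illustration, one of the most common derivations in a BDS-style analysis data set is change from baseline. A minimal sketch, with hypothetical values and ADaM-like variable names:

    import pandas as pd

    # Long-format analysis values (AVAL) per subject and visit.
    bds = pd.DataFrame({
        "USUBJID": ["01", "01", "02", "02"],
        "AVISIT":  ["Baseline", "Week 4", "Baseline", "Week 4"],
        "AVAL":    [132.0, 120.0, 141.0, 138.0],
    })

    # Derive BASE (each subject's baseline value) and CHG (change from baseline).
    base = (bds[bds["AVISIT"] == "Baseline"]
            .set_index("USUBJID")["AVAL"]
            .rename("BASE"))
    bds = bds.join(base, on="USUBJID")
    bds["CHG"] = bds["AVAL"] - bds["BASE"]
    print(bds)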

A Biostatistician’s role in study design & planning

Biostatisticians play a critical role in the design, analysis, and interpretation of clinical trials. The role of the biostatistician in a clinical trial is to use their expertise in statistical analysis and research design to help ensure that the trial is conducted in a scientifically rigorous and unbiased way, and to help understand and interpret the results of the trial. Here is a general overview of the tasks that a biostatistician might be involved in during the different stages of a clinical trial:

Clinical trial design: Biostatisticians may be involved in designing the clinical trial, including determining the study objectives, selecting the appropriate study population, and developing the study protocol. They are responsible for determining the sample size and selecting the appropriate statistical methods for analysing the data. Often in order to carry out these tasks, preparatory analysis will be necessary in the form of detailed meta-analysis or systematic review.

Sample size calculation: Biostatisticians are responsible for determining the required sample size for the clinical trial. This is an important step, as the sample size needs to be large enough to detect a statistically significant difference between the treatment and control groups, but not so large that the trial becomes unnecessarily expensive or time-consuming. Biostatisticians use statistical algorithms to determine the sample size based on the expected effect size, the desired level of precision, and the expected variability of the data. These inputs are informed by expert opinion and by simulation based on data from previous comparable studies.
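
For a two-arm comparison of means, the familiar normal-approximation formula can be sketched as follows (the inputs are purely illustrative, not recommendations):

    from scipy.stats import norm

    def n_per_group(delta, sd, alpha=0.05, power=0.80):
        # delta: minimally important difference; sd: expected standard deviation.
        z_alpha = norm.ppf(1 - alpha / 2)
        z_beta = norm.ppf(power)
        return 2 * ((z_alpha + z_beta) * sd / delta) ** 2

    print(n_per_group(delta=5.0, sd=12.0))  # ~90 subjects per arm, before allowing for drop-out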

Randomisation schedules: Biostatisticians develop the randomisation schedule for the clinical trial, which is a plan for assigning subjects to the treatment and control groups in a random and unbiased way. This helps to ensure that the treatment and control groups are similar in terms of their characteristics, which helps to reduce bias or control for confounding factors that might affect the results of the trial.
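
A minimal sketch of one common approach, permuted-block randomisation with 1:1 allocation, is shown below; in practice schedules are generated under controlled, audited conditions with a documented seed:

    import random

    def block_randomisation(n_subjects, block_size=4, seed=2024):
        rng = random.Random(seed)
        schedule = []
        while len(schedule) < n_subjects:
            block = ["Treatment"] * (block_size // 2) + ["Control"] * (block_size // 2)
            rng.shuffle(block)  # each block stays balanced 1:1
            schedule.extend(block)
        return schedule[:n_subjects]

    print(block_randomisation(10))

Blocking guarantees that the two arms never drift far out of balance at any point during enrolment.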

Protocol development: Biostatisticians are involved in developing the statistical and methodological sections of the clinical trial protocol, which is a detailed plan that outlines the objectives, methods, and procedures of the study. In addition to outlining key research questions and operational procedures the protocol should include information on the study population, the interventions being tested, the outcome measures, and the data collection and analysis methods.

Data analysis: Biostatisticians are responsible for analysing the data from the clinical trial, including conducting interim analyses and making any necessary adjustments to the protocol. They play a crucial role in interpreting the results of the analysis and communicating the findings to the research team and other stakeholders.

Final analysis and reporting: Biostatisticians are responsible for conducting the final analysis of the data and preparing the final report of the clinical trial. This includes summarising the results, discussing the implications of the findings, and making recommendations for future research.

The role of the bioinformatician in biomarker-guided clinical studies

Biomarkers are biological characteristics that can be measured and used to predict the likelihood of a particular outcome, such as the response to a particular treatment. Biomarker-guided clinical trials use biomarkers as a key aspect of the study design and analysis. In biomarker-guided clinical trials where the biomarker is based on genomic sequence data, bioinformaticians may play a particularly important role in managing and analysing the data. Genomic and other omics data are often large and complex, and require specialised software tools and algorithms to analyse and interpret. Bioinformaticians develop and implement these tools and algorithms, as well as managing and analysing the data to identify patterns and relationships relevant to the trial. They use their expertise in computational biology to help understand the relationship between multi-omics data and the outcome of the trial, and to identify potential biomarkers that can be used to guide treatment decisions.

Processing sequencing data is a key skill of bioinformaticians that involves several steps, which may vary depending on the specific goals of the analysis and the type of data being processed. Here is a general overview of the steps that a bioinformatician might take to process sequencing data:

  1. Data pre-processing: Cleaning and formatting the data so that it is ready for analysis. This may include filtering out low-quality data, correcting errors, and standardizing the format of the data.
  2. Mapping: Aligning the sequenced reads to a reference genome or transcriptome in order to determine their genomic location. This can be done using specialized software tools such as Bowtie or BWA.
  3. Quality control: Checking the quality of the data and the alignment, and identifying and correcting any problems that may have occurred during the sequencing or mapping process. This may involve identifying and removing duplicate reads, or identifying and correcting errors in the data.
  4. Data analysis: Using statistical and computational techniques to identify patterns and relationships in the data such as identifying genetic variants, analysing gene expression levels, or identifying pathways or networks that are relevant to the study.
  5. Data visualization: Creating graphs, plots, and other visualizations to help understand and communicate the results of the analysis.

Once omics data has been analysed, the insights obtained can be used for tailoring therapeutic products to patient populations in a personalised medicine approach.

The changing role of data experts in life sciences R&D and clinical research

Driven by the need for better therapies and health solutions, researchers are now defining diseases at more granular levels using multi-omics insights from DNA sequencing data, which allow differentiation between patients in the biomolecular presentation of their disease, their demographic factors, and their response to treatment. As more and more of the resulting therapies reach the market, the health care industry will need to catch up in order to provide these new treatment options to patients.

Even after a product receives regulatory approval, payers can opt not to reimburse patients, so financial benefit should be demonstrated in advance where possible. Patient-reported outcomes and other health outcomes are becoming important sources of data to consider in evidence generation, and evidence provided to payers should aim to demonstrate the financial as well as the clinical benefit of the product.

In this context, regulators are becoming aware of the need for innovation in developing new ways of collecting treatment efficacy and other data used to assess novel products for regulatory approval. The value of observational studies and real-world data sources as a supplement to clinical trial data is being acknowledged as a legitimate and sometimes necessary part of the product approval process. Large-scale digitisation now makes it easier to collect patient-centric data directly from clinical trial participants and users via devices and apps. Sponsors should establish clear evidence expectations with regulatory agencies, then collaborate with external stakeholders, data experts, and service providers to build new evidence-generation approaches.

Expert data governance and quality control is crucial to the success of any new methods to be implemented analytically. Data from different sources, such as IoT sensor data, electronic health records, sequencing data for multi-omics analysis, and other large data sets, has to be combined cautiously and with robust expert standards in place.

From biostatistics, bioinformatics and data science to complex adaptive systems (CAS) and epidemiology for public health or post-market modelling, a bespoke team of integrated data and analytics specialists is now as important to a product development project as the product itself in gaining competitiveness, and therefore success, in the marketplace. Such a team should apply a combination of established data collection methodologies, e.g. clinical trials and systematic review, and innovative methods, such as machine learning models drawing on a variety of real-world data sources, to find a balance between advancing important innovation and mitigating risk.

Medical Device Categorisation, Classification and Regulation in the United Kingdom

Contributor: Sana Shaikh

In this article

  • Overview of medical device categorisations and classifications for regulatory purposes in the United Kingdom
  • Summary of medical device categorisations based on type, usage and risk potential during use, as specified in the MDR and IVDR.
  • The class of a medical device and its purpose determine the criteria required to meet regulatory approval. All medical devices in the UK must have a UKCA or CE marking, depending on the legislation the device has been certified under.
  • Explanation of risk classifications for general medical devices and active implantable devices
  • Explanation of risk classifications for in vitro diagnostics

In the UK and EU, medical devices are regulated under the Medical Devices Regulation (MDR) or the In Vitro Diagnostics Regulation (IVDR), depending upon which category they fall under. In the UK, the Medicines and Healthcare products Regulatory Agency (MHRA) is responsible for new product approval and market surveillance activities related to medical devices and other therapeutics, such as pharmaceuticals, intended for use in patients within the UK. The equivalent regulatory agency in the EU is the European Medicines Agency (EMA). The MHRA also manages the Early Access to Medicines Scheme (EAMS) to give patients access to pre-market therapeutics that are yet to receive regulatory approval where their medical needs are currently unmet by existing options. To qualify for EAMS a medicine must be designated as a Promising Innovative Medicine (PIM) based on early clinical data.

Having a thorough understanding of the categorisation and class of your medical device is vital for it to undergo the appropriate assessment route and be approved and ready for market. While the scope of medical devices is incredibly broad, for regulatory purposes they tend to be classified based on device type, duration of use and level of risk. Which risk class a device falls into is determined in large part by device type and duration of use, as both of these factors influence the level of risk to the patient. All medical devices in the UK must be designated a category and a risk classification in order to undertake the regulatory approval process.

Category (type) of Medical Device

The MHRA categorises medical devices into the following five categories:

  • Non-invasive – Devices which do not enter the body
  • Invasive – Devices which in whole or part are inserted into the body’s orifices (including the external eyeball surface) or through the surface of the body, such as the skin.
  • Surgically invasive – Devices used or inserted surgically that penetrate the body through the surface of the body, such as through the skin.
  • Active – Devices requiring an external source of power, including stand-alone software.
  • Implantable – Devices intended to be totally or partially introduced into the human body (including to replace an epithelial surface or the surface of the eye) by surgical intervention and to remain in place for a period of time.

Duration of use category

Medical devices are then further categorised based upon their intended duration of use under normal circumstances.

  • Transient – intended for less than 60 minutes of continuous use.
  • Short term – intended for between 60 minutes and 30 days of continuous use.
  • Long term – intended for more than 30 days continuous use.

More information to aid accurate medical device categorisation in the UK and EU can be downloaded here: Medical devices: how to comply with the legal requirements in Great Britain – GOV.UK (www.gov.uk)

UKCA Mark & Conformity Assessment

Further to these type, duration and risk categories, the MHRA designates three additional categories for the purposes of UKCA marking and conformity assessment. These categories are:

  • General medical devices – most medical devices fall into this category.
  • Active implantable devices – powered devices that are totally or partially implanted and intended to remain in the human body after the procedure.
  • In vitro diagnostics medical devices (IVDs) – equipment or system used in vitro to examine specimens from the human body.

UKCA marking and conformity assessment, with subsequent labelling, is a crucial procedure for a device to enter the UK market for use by patients. It should be noted that the UKCA mark is not recognised in the EU or Northern Ireland, which instead recognise the CE mark. Great Britain will not recognise the CE mark after 30 June 2023, so it will be important to have both the UKCA and CE marks for widespread distribution of a medical device. These incompatibilities have arisen largely as a result of Brexit.

Risk classification categories for general medical devices and active implantable devices

In the UK and EU there are four official risk-related classes for medical devices. These classes apply to both general medical devices and active implantable devices. As noted previously, the class a device falls into is largely informed by its category and intended duration of use.

  • Class I – includes the subclasses Class Is (sterile, no measuring function), Class Im (measuring function), and Class Ir (devices to be reprocessed or reused). Low risk of illness/injury resulting from use. Only self-assessment is required to meet regulatory approval.
  • Class IIa – Low to medium risk of illness/injury resulting from use. Notified Body approval required.
  • Class IIb – Medium to high risk of illness/injury resulting from use. Notified Body approval required.
  • Class III – High potential risk of illness/injury resulting from use. Notified Body approval required.

More details on these classes can be found below.

In Vitro Diagnostic Medical Devices (IVDs)

The IVDR categorises IVDs into the following categories for the purpose of obtaining regulatory approval in Great Britain. IVDs do not harm patients directly in the same way that other medical devices can, and are thus subject to a different risk assessment.

  • General IVD medical devices
  • IVDs for self-testing – intended to be used by an individual at home.
  • IVDs stated in Part IV of the UK MDR 2002, Annex II List B
  • IVDs stated in Part IV of the UK MDR 2002, Annex II List A

A more detailed explanation of these categories can be found towards the end of this article.

The EU and Northern Ireland have moved away from this list style of classification and have recently implemented the following risk classes; there are four IVDR risk classes, outlined in Annex VIII. It seems likely that Great Britain may follow this in future.

Risk Classes for IVDs

  • Class A – Laboratory devices, instruments and receptacles.
  • Class B – All devices not covered in the other classes.
  • Class C – High risk devices presenting a lower direct risk to the patient population than Class D. Includes diagnostic devices where failure to accurately diagnose could be life-threatening. Covers companion diagnostics, genetic screening and some self-testing.
  • Class D – Devices that pose a high direct risk to the patient population, and in some cases the wider population, relating to life-threatening conditions, transmissible agents in blood, biological materials for transplantation into the human body and other similar materials.

Risk categories for general medical devices and active implantable medical devices in detail

Class I devices

These are generally regarded as low risk devices and pose little risk of illness and injury. Such devices have minimal contact with patients and the lowest impact on patient health outcomes. To self-certify your product, you must confirm that it is a class I device [1, 3]. This may involve carrying out clinical evaluations, notifying the Medicines and Healthcare products Regulatory Agency (MHRA) of proposals to perform clinical investigations, preparing technical documentation and drawing up a declaration of conformity [1]. In cases where the device includes sterile products or measuring functions, approval from a UK Approved Body may still be necessary [3]. Devices in this category include thermometers, stethoscopes, bandages and surgical masks.

Class IIa & IIb devices

Class IIa devices are generally regarded as medium risk devices and pose a moderate risk of illness and injury. Both class IIa and IIb devices must be declared as such by applying to a UK Approved Body and performing a conformity assessment [3, 4]. For class IIa and IIb devices there are several assessments, including examining and testing the product or a homogenous batch of products, auditing the production quality assurance system, auditing the final inspection and testing, or auditing the full quality assurance system [3]. Class IIa devices include dental fillings, surgical clamps and tracheotomy tubes [4]. Class IIb devices include lung ventilators and bone fixation plates [4].

Class III devices

These are considered high risk devices and pose a substantial risk of illness and injury; devices in this category are often essential for sustaining human life. Due to the high risk associated with class III devices, they are subject to the strictest regulations. In addition to the class IIa and IIb assessments, class III devices require a design dossier examination [3]. Examples include pacemakers, ventilators, drug-coated stents and spinal disc cages.

Risk Categories for In Vitro Diagnostics in detail

These include but are not limited to reagents, instruments, software and systems intended for in vitro examination of specimens such as tissue donations and blood [4]. Most IVDs do not require intervention from a UK Approved Body [5]. However, for IVDs that are considered essential to health, involvement of a UK Approved Body is necessary [5]. The specific conformity assessment procedure depends on the category of IVD concerned [5].

General IVDs

These are considered a low risk to patients and include clinical chemistry analysers, specimen receptacles and prepared selective culture media [4]. For general IVDs, involvement from a UK Approved Body is not required [5]. Instead, relevant provisions in the UK MDR 2002 must be met and self-declared prior to adding a UKCA mark to the device [5, 6].

IVDs for self-testing

These represent a low-to-medium risk to patients and include pregnancy self-testing, urine test strips and cholesterol self-testing [4]. In addition to conforming to requirements for general IVDs, applications for IVDs involved in self-testing must be sent to a UK Approved Body [5]. This enables examination of the design of the device, such as how suitable it is for non-professional users [5].

IVDs stated in Part IV of the UK MDR 2002, Annex II List B

These represent medium-to-high risk to patients and include blood glucose self-testing, PSA screening and HLA typing [4]. Applications for devices in this category must be sent to a UK Approved Body [5]. This can enable auditing of technical documentation and the quality management system [6].

IVDs stated in Part IV of the UK MDR 2002, Annex II List A.

These represent the highest risk to patients and include Hepatitis B blood-donor screening, ABO blood grouping and HIV blood diagnostic tests [4]. Due to the high risk associated with IVDs in this category, applications for devices in this category must be sent to a UK Approved Body [5]. By doing so, an audit of the quality management system can be performed as well as a design dossier review [6]. In addition, the UK Approved Body must verify each product or batch of products prior to being placed on the market [5, 6].

Proposed updates to medical device categories in the UK

Due to the quickly evolving state of medical technology, many items that did not previously count as a medical device, such as software and AI, must now be considered as such. New proposals have been put forward as potential amendments to the existing regulations and risk classifications to accommodate newer technologies and devices. Among other proposed changes, the following list of novel devices has been recommended for upgrade to the highest risk classification, Class III.

  • Active implantable medical devices and their accessories
  • In vitro fertilisation (IVF) and assisted reproduction technologies (ART)
  • Surgical meshes
  • Total or partial joint replacements
  • Spinal disc replacements and other medical devices that come into contact with the spinal column
  • Medical devices containing nano-materials
  • Medical devices containing substances that will be introduced to the human body by one of various methods of absorption in order to achieve their intended function
  • Active therapeutic devices with an integrated diagnostic function determining patient management, such as closed loop or automated systems

With the shift to a higher risk classification will come increased demands for clinical evidence and clinical testing, including clinical trials, in order for these devices to meet regulatory approval and reach the market. While this increases the burden on manufacturers, it will benefit patient safety and end-user satisfaction. A full list of the proposed changes, including those outside of Class III, can be found here: Chapter 2: Classification – GOV.UK (www.gov.uk)

Medical devices are incredibly heterogeneous, ranging from therapeutics and surgical tools to diagnostics and medical imaging software, including machine learning and AI. Accordingly, medical device research and development often requires an interdisciplinary approach. During R&D, it is important to consider for whom the device is intended, how it will be used, and under what circumstances. Similarly, it is crucial to understand the risk status of the device. By considering these attributes, the device can be successfully assessed through the appropriate regulatory approval pathway.

References

Factsheet: medical devices overview – GOV.UK (www.gov.uk)

[1] https://www.gov.uk/government/collections/guidance-on-class-1-medical-devices

[2] https://www.gov.uk/guidance/medical-devices-how-to-comply-with-the-legal-requirements

[3] https://www.gov.uk/guidance/medical-devices-conformity-assessment-and-the-ukca-mark

[4] https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/640404/MDR_IVDR_guidance_Print_13.pdf

[5] https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/946260/IVDD_legislation_guidance_-_PDF.pdf

[6] diagnostic medical devices IVD

Complex Adaptive Systems (CAS) Approach to Biomedicine & Public Health

While the majority of biomedical and public health research still maintains a linear, reductive approach to arrive at empirical insight, reality is in most cases neither linear nor reducible in this way. A complex adaptive systems approach, like reality, is non-linear and high dimensional. There are many benefits to taking a linear, cause-and-effect, reductionist approach, in that a complex problem and its solution become simplified into terms that can be understood and predicted. Where this falls short is that the predictions often don’t hold up in real world settings, where outcomes tend to seem unpredictable.

Genomics, proteomics, transcriptomics and other “omics” techniques have generated an unprecedented amount of molecular genetics data. This data can be combined with larger scale data, from cell processes to environmental influences in disease states, to produce highly sophisticated multi-level models of contributing factors in human health and disease.

An area that is currently seeing an evolution into a more personalised, nuanced approach, albeit still a linear one, is clinical trials. By introducing a biomarker component to clinical trials, for example to evaluate drug efficacy, the number of dimensions of the problem is slightly increased in order to arrive at more targeted and accurate solutions. More specifically, the number of patient sub-categories in the trial increases to accommodate various biomarker groups, which may respond more or less well to different pharmacological approaches to the same disease. Increasing the dimensions of the problem beyond this would, for now, not be feasible or even helpful. On the other hand, understanding the interplay between biomolecular processes and environmental interactions in order to gain insight into disease processes themselves, and thereby which biochemical pathways oncology drugs should target, is something that clearly benefits from a non-linear approach.

Another example of a system that benefits from a non-linear approach is public health service provision and the desire to garner insights into changes that increase prevention, early intervention and treatment effectiveness as well as reduce service cost for the government and patient. Both of the above examples require attention to both macro and micro processes.

Some components of complex adaptive systems: connectivity, self-organisation, emergence, fractal patterns, non-linearity, governance by feedback loops, adaptivity, nested systems, stochasticity, simple rules, chaotic behaviour, iteration, sub-optimality, requisite variety, and optimisation at the edge of chaos.

Whether modelling clinical health services networks or biological processes, complex adaptive systems consist of several common characteristics.

Components of complex adaptive systems

Massive interdependencies and elaborate connectivity

The complex adaptive systems approach shifts emphasis away from studying individual parts (such as seen in conventional medical science which produces notably fragmented results) to characterising the organisation of these parts in terms of their inherently dynamic interactions. CAS are open rather than closed systems because it is exogenous elements impacting on the system that cause the disruption required for growth.

Complex adaptive systems can be understood by relations and networks. System processes occur in networks that link component entities or agents. This approach emphasises that structures are dynamic and it is the process of becoming rather than the being itself that is of empirical interest.

Necessarily transdisciplinary or multi-disciplinary

A complex adaptive systems approach is necessarily transdisciplinary: it requires numerous disparate experts to collaborate in combining myriad biological, physical and societal sciences into a holistic model. This model should aim to represent the pertinent simultaneous top-down and bottom-up processes that reveal contexts and relationships within the observed system dynamics.

Self-organising, emergent behaviour

Complex adaptive systems are self-organising in the sense that observed patterns are open ended, potentially unfinished and cannot be predicted in the conventional sense. Rules of cause and effect are context dependent and cannot be applied rigidly.

A self-organising dynamic structure, which can be identified as a pattern, emerges as a result of spontaneous interactions between individual agents or elements. This pattern then impacts the interactions of individuals in a continual top-down, bottom-up symbiosis.

While linear models represent a reductionist, closed conceptualisation of the phenomena under analysis, a complex systems approach embraces high dimensionality true to the myriad real world phenomena composing a system. This requires that the system be treated as open and of uncertain ontology and thus lacking predictive capacity with regards to the outcomes of system dynamics.

As an emergent phenomenon, a complex adaptive system can be understood by interacting with it rather than through analysis or static modelling. This approach is concerned with “state change”, evaluating “how things are becoming” rather than “how things are”: how did today’s state emerge from yesterday’s trajectories and process dynamics?

Fractal engagement

Fractal engagement entails that the system as a whole orientates through multiple actions. The same data can produce frameworks at the level of responsibility of every individual agent. Using public health intervention as an example, individual agents make decisions, based on the data, as to what changes they can make tomorrow within their own sphere of competence, rather than overarching changes being dictated and determined in a top-down way, or by others.

Feedback loops

Feedback loops link individual parts into an overarching dynamic structure. Feedback loops can be positive (self-reinforcing) or negative (self-correcting).

Negative feedback loops are stabilising, in that they have a dampening effect on oscillations that moves the system or component closer to equilibrium. Positive feedback loops are morphogenic, increasing the frequency and amplitude of oscillations, driving the system away from homeostasis and leading to changes in the underlying structure of the system.

Positive feedback loops, while facilitating growth and adaptation, tend towards chaos and decay, and are thus crucially counterbalanced by simultaneously operating negative feedback loops. Evolution is thought to occur as a series of phase transitions, back and forth, from ordered to disordered states.
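
A toy sketch of the two loop types (purely illustrative; a single gain parameter stands in for the strength and sign of the feedback):

    def simulate(feedback_gain, steps=10, x0=1.0):
        # x represents a deviation from equilibrium (equilibrium = 0).
        x, trajectory = x0, []
        for _ in range(steps):
            x = x + feedback_gain * x  # feedback proportional to current deviation
            trajectory.append(round(x, 3))
        return trajectory

    print(simulate(-0.5))  # negative feedback: deviation decays towards equilibrium
    print(simulate(+0.5))  # positive feedback: deviation compounds, structure changes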

Both top-down and bottom-up “causality”

While CAS models describe elements in terms of possibilities and probabilities, rather than cause and effect in the linear sense, there is a clear interplay between top-down and bottom-up causality and influence on the dynamic flows and trajectories of any system. This is very much a mirror of real world systems. One example is the human body, where both conscious thought (top-down) and biomolecular processes such as hormonal and neurochemical fluctuations (bottom-up) affect mood, which in turn has many flow-on effects downstream that shift the system back and forth between health and disease. One such manifestation is stress-induced illness of various kinds. As a social example, we can of course find many instances of top-down and bottom-up causation in a public health or epidemiological setting.

This has been a non-exhaustive description of just some key components of complex adaptive systems. The main purpose is to differentiate the CAS paradigm from the more mainstream biomedical research paradigm and approach. For a deeper dive into the concepts mentioned see the references below.

References:

https://core-cms.prod.aop.cambridge.org/core/services/aop-cambridge-core/content/view/F6F59CA8879515E3178770111717455A/9781108498692c7_100-118.pdf/role_of_the_complex_adaptive_systems_approach.pdf

Carmichael T., Hadžikadić M. (2019) The Fundamentals of Complex Adaptive Systems. In: Carmichael T., Collins A., Hadžikadić M. (eds) Complex Adaptive Systems. Understanding Complex Systems. Springer, Cham. https://doi.org/10.1007/978-3-030-20309-2_1

https://www.health.org.uk/sites/default/files/ComplexAdaptiveSystems.pdf

Milanesi, L., Romano, P., Castellani, G., Remondini, D., & Liò, P. (2009). Trends in modeling Biomedical Complex Systems. BMC Bioinformatics, 10(Suppl 12), I1. https://doi.org/10.1186/1471-2105-10-S12-I1

Sturmberg, J. P. (2021). Health and Disease Are Dynamic Complex-Adaptive States: Implications for Practice and Research. Frontiers in Psychiatry, 12, 595124. https://doi.org/10.3389/fpsyt.2021.595124

Master Protocols for Clinical Trials

Part 1: Basket & Umbrella Trial Designs

Introduction

As the clinical research landscape becomes ever more complex and interdisciplinary alongside an evolving genomic and biomolecular understanding of disease, the statistical design component that underpins this research must adapt to accommodate this. Accuracy of evidence and speed with which novel therapeutics are brought to market remain hurdles to be surmounted.

Traditionally, efficacy studies and non-inferiority clinical trials in the drug development space included broad disease states, usually with patients randomised to a dual arm comparing a new treatment with an existing standard treatment. Due to patient biomarker heterogeneity, effective treatments could be left unsupported by evidence. Similarly, treatments found effective in a clinical trial don’t always translate into real world effectiveness in a broader range of patients.

Our current ability to assess individual genomic, proteomic and transcriptomic data and other patient biomarkers for disease, as well as immunologic and receptor site activity, has shown that different patients respond differently to the same treatment, and that the same disease may benefit from different treatments in different patients – thus the beginnings of precision medicine. In addition to this is the scenario where a single therapeutic may be effective against a number of different diseases, or subclasses of a disease, based on the agent’s mechanism of action on molecular processes common to the disease states under evaluation.

Master protocols, or complex innovative designs, are designed to pool resources to avoid redundancy and test multiple hypotheses under one clinical trial, rather than multiple clinical trials being carried out separately over a longer period of time.

This evolution of the clinical research paradigm is fairly novel, and each study design is inherently flexible. As a result, there is conflicting information in the published literature on the definition and characterisation of master protocols such as basket and umbrella clinical trials, with cases where the terms “basket” and “umbrella” have been used interchangeably or are ill-defined. For this reason, a brief definition and overview of basket and umbrella clinical trials is included in the paragraphs that follow; drawing on systematic reviews of existing research, it seeks the clarity of consensus before detailing some key statistical and operational elements of each design.

Master protocols for biomarker-based clinical trials

[Diagram of a basket trial design]

Basket trial:

A basket clinical trial design consists of a targeted therapy, such as a drug or treatment device, that is being tested on multiple disease states characterised by a common molecular process that is impacted by the treatment’s mechanism of action. These disease states could also share a common genetic or proteomic alteration that researchers are looking to target.

Basket trials can be either exploratory or confirmatory, and range from fully randomised, controlled, double-blinded designs to single-arm designs, or anything in between. Single-arm designs are an option when feasibility is limited and are more suited to the early exploratory stage of determining efficacy, or whether a particular treatment has clear-cut commercial potential evidenced by a sufficiently sizable retreat in disease symptomology. Depending on the nuances of the patient populations being evaluated, final study data may be analysed by pooling disease states or by analysing each disease state separately. Basket trials allow drug development companies to target the lowest-hanging fruit in terms of treatment efficacy, focusing resources on therapeutics with the highest potential for success in terms of real patient outcomes.

[Figure: diagram of an umbrella trial design – a master protocol umbrella trial.]

Umbrella trial:

An umbrella clinical trial design consists of multiple targeted treatments of a single disease where patients can be sub-categorised into biomarker subgroups defined by molecular characteristics that may lend themselves to one treatment over another.

Umbrella trials can be randomised, controlled, double-blind studies in which each intervention and control pair is analysed independently of the other treatments in the trial, or, where feasibility issues dictate, they can be conducted without a control group, with results analysed together in order to compare the different treatments directly.

Umbrella trials may be useful when a treatment has shown efficacy in some patients and not others. They increase the potential for confirmatory trial success by honing in on the patient sub-populations most likely to benefit due to their biomarker characteristics, rather than grouping all patients together as a whole.

Basket & Umbrella trials compared:

Both basket and umbrella trials are typically biomarker-guided. The difference is that basket trials aim to evaluate tissue-agnostic treatments across multiple diseases that share common molecular characteristics, whereas umbrella trials aim to evaluate nuanced treatment approaches to the same disease based on differing molecular characteristics between patients.

Biomarker-guided trials have an additional feasibility constraint compared to non-biomarker-guided trials, in that the size of the eligible patient pool is reduced in proportion to the prevalence of the biomarker(s) of interest within that pool. This is why master protocol methodology becomes instrumental in enabling these appropriately complex research questions to be pursued.

Statistical concepts and considerations of basket and umbrella trials

Effect size

Basket and umbrella trials generally require a larger effect size than traditional clinical trials in order to achieve statistical significance. This is in large part due to their smaller sample sizes and the higher variance that comes with them. While patient heterogeneity in terms of genomic or molecular diversity, and thus expected treatment outcome, is reduced by the precision targeting of the trial design, a certain degree of between-patient heterogeneity is only to be expected when relying on treatment arms with very small sample sizes.

If resources, including time, are tight then basket trials enable drug developers to focus on less risky treatments that are more likely to end in profitability. It should be noted that this does not always mean that treatments rejected by basket trials are truly clinically ineffective. A single-arm exploratory basket trial could end up rejecting a potential new treatment that, if subjected to a standard trial with more drawn-out patient acquisition and a larger sample size, would have been deemed effective at a narrower effect size.

Screening efficiency

If researchers carry out a separate clinical study for each biomarker of interest, then a separate screening sample needs to be recruited for each study. The rarer the biomarker, the larger the screening sample needed to find enough biomarker-positive people to participate in that study; and this burden is multiplied by the number of biomarkers under investigation. A benefit of master protocols is that a single sample of people can be screened for multiple biomarkers at once, greatly reducing the required screening sample size.

For example, researchers interested in 4 different biomarkers of similar prevalence could collectively reduce the required screening sample by roughly three quarters compared to conducting a separate clinical study for each biomarker. This maximisation of resources can be particularly helpful when dealing with rare biomarkers or diseases. A rough back-of-the-envelope calculation is sketched below.
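As an illustration only, the following sketch uses hypothetical prevalences and a hypothetical per-arm recruitment target, and ignores overlap between biomarker groups; with equal prevalences the screening burden falls by a factor equal to the number of biomarkers.

```python
# Illustrative comparison of screening burden: separate trials vs one
# master protocol. Prevalences and targets are hypothetical.
n_required_per_group = 50                  # patients needed per biomarker arm
prevalences = [0.10, 0.08, 0.05, 0.02]     # assumed biomarker prevalences

# Separate studies: each biomarker needs its own screening sample.
separate = sum(n_required_per_group / p for p in prevalences)

# Master protocol: one sample is screened for all biomarkers at once,
# so the screening size is driven by the rarest biomarker.
master = max(n_required_per_group / p for p in prevalences)

print(f"Total screened, separate trials: {separate:.0f}")   # 4625
print(f"Total screened, master protocol: {master:.0f}")     # 2500
```

With equal prevalences across the four biomarkers, the master protocol screen is exactly one quarter the size of the combined separate screens, matching the three-quarters saving described above.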

Patient allocation considerations

If the relevant biomarkers are not mutually exclusive, a patient could fit into multiple biomarker groups for which treatment is being assessed in the study. In this scenario a decision has to be made as to which group the patient will be assigned, and where appropriate this decision may be made at random. If belonging to two overlapping biomarker groups is problematic in terms of introducing bias into small samples, or if several patients share the same overlap, a decision may be made to collapse the two biomarkers into a single group or to eliminate one of the groups. If a rare genetic mutation is a priority focus of the study, then feasibility would dictate that the patient be assigned to that biomarker group. A simple decision rule is sketched below.
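A minimal sketch of such an allocation rule, assuming hypothetical biomarker names and a single priority biomarker; this is one possible formalisation of the logic above, not a prescribed method:

```python
import random

# Hypothetical priority biomarker(s): rare mutations that feasibility
# dictates must win any tie, as described above.
PRIORITY = {"rare_mutation_X"}

def assign_group(eligible_groups, rng=random.Random(42)):
    """Assign a patient eligible for several biomarker groups to one group."""
    priority_hits = [g for g in eligible_groups if g in PRIORITY]
    if priority_hits:
        return priority_hits[0]                 # priority biomarker wins
    return rng.choice(sorted(eligible_groups))  # otherwise assign at random

print(assign_group({"EGFR", "KRAS"}))             # random pick between the two
print(assign_group({"EGFR", "rare_mutation_X"}))  # priority pick
```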

Sample Size calculations

Generally speaking, sample size calculation for basket trials should be based on the overall cohort, whereas sample size calculations for umbrella trials are typically undertaken individually for each treatment.

Basket and umbrella trials can be useful in situations where a smaller sample size is more feasible due to the specifics of the patient population under investigation. Statistically designing for this smaller sample size typically comes at the cost of necessitating a greater effect size (difference between treatment and control); this translates to lower overall study power and a greater chance of type II error (a false negative result) when compared to a standard clinical trial design. Despite these limitations, master protocols such as basket or umbrella trials allow treatments to be evaluated to the highest possible level of evidence in populations that might otherwise be too heterogeneous or rare to evaluate using a traditional phase II or III trial. The trade-off between detectable effect size and required sample size is illustrated below.
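A quick power calculation for a simple two-arm comparison makes the trade-off concrete; the standardised effect sizes, alpha and power below are conventional illustrative choices, not values from any particular trial:

```python
from statsmodels.stats.power import TTestIndPower

# Per-arm sample size for a two-sample t-test at alpha = 0.05 and 80% power,
# across a range of standardised effect sizes (Cohen's d).
analysis = TTestIndPower()
for d in (0.2, 0.5, 0.8):
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"d = {d}: about {n:.0f} patients per arm")
```

Detecting a small effect (d = 0.2) needs roughly 400 patients per arm, while a large effect (d = 0.8) needs under 30; designing around a small recruitable pool therefore forces the trial to bank on a large effect.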

Randomisation and control

Randomised controlled designs are recommended for confirmatory analysis of an established treatment or target of interest. The control group typically treats patients with the established standard of care for their particular disease or, in the absence of one, placebo.

In basket trials the established standard of care is likely to differ by disease or disease sub-type. For this reason it may be necessary for randomised controlled basket trials to pair a control group with each disease sub-group, rather than incorporating a single overall control group and pooling results from all diseases under one statistical analysis of treatment success. It is worth considering whether each disease type and its corresponding control pair could be analysed separately, to enhance statistical robustness in a truly randomised controlled methodology.

Single-arm (non-randomised) designs are sometimes necessary for exploratory analysis of potential treatments or targets. These designs often require a greater margin of success (treatment efficacy) to reach statistical significance, as a trade-off for the smaller sample size required.

Blinding

To increase the quality of evidence and truly evaluate the effectiveness of a treatment without undue bias, all clinical studies should be double-blinded where possible.

Aside from the increased risk of type II error that may be inherent in master protocol designs, there is greater potential for statistical bias to be introduced. Bias can creep in in myriad ways, and it reduces the quality of evidence that a study can produce. Two key sources of bias are lack of randomisation (mentioned above) and lack of blinding.

Single-arm trials do not include a control arm, so patients cannot be randomised to a treatment arm, and the double-blinding of patients, practitioners, researchers, data managers and so on, which prevents various types of bias from influencing study outcomes, is not possible. With so many factors at play it is important not to overlook the importance of study blinding, and to implement it whenever it is feasible to do so.

If the priority is getting a new treatment or product to market fast, to benefit patients and potentially save lives, accommodating this bias can be a necessary trade-off. It is, after all, typically quite a challenge to obtain clinical data and patient populations that are homogeneous and well matched, and this reality is especially noticeable with rare diseases or rare biomarkers.

Biomarker Assay methodology

The reliability of biologic variables included in a clinical trial should be assessed; for example, the established sensitivity and specificity of particular assays needs to be taken into account. When allocating patients to biomarker groups, the degree of potential inaccuracy in this allocation can have a significant impact on trial results, particularly when sample sizes are small. If the false positive rate of a biomarker assay is too high, the wrong patients will qualify for treatment arms, which in some cases may reduce the statistical power of the study, as the simulation below illustrates.
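The following Monte Carlo sketch is a simplified illustration of this mechanism, under stated assumptions: hypothetical prevalence, false positive rate and effect size, assay sensitivity taken as perfect, and biomarker-negative patients admitted in error assumed not to respond, which dilutes the observed effect.

```python
import numpy as np

rng = np.random.default_rng(0)

def power_with_false_positives(fpr, prevalence=0.05, n_per_arm=60,
                               effect=0.5, sims=2000):
    """Rough simulation: false-positive enrolees do not respond to treatment,
    diluting the average effect in the treated arm and eroding power."""
    # Probability an enrolled 'positive' is truly positive (sensitivity = 1).
    ppv = prevalence / (prevalence + fpr * (1 - prevalence))
    hits = 0
    for _ in range(sims):
        truly_pos = rng.random(n_per_arm) < ppv
        treated = rng.normal(effect * truly_pos, 1.0)   # responders only
        control = rng.normal(0.0, 1.0, n_per_arm)
        diff = treated.mean() - control.mean()
        se = np.sqrt(treated.var(ddof=1) / n_per_arm
                     + control.var(ddof=1) / n_per_arm)
        hits += abs(diff / se) > 1.96                   # two-sided z-test
    return hits / sims

for fpr in (0.0, 0.05, 0.20):
    print(f"assay FPR {fpr:.2f}: power ~ {power_with_false_positives(fpr):.2f}")
```

Even a modest false positive rate swamps a rare biomarker: with 5% prevalence, an FPR of 0.05 means only about half of enrolled patients are truly positive, roughly halving the effective effect size.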

A further consideration of assay methodology pertains to the potential for non-uniform bio-specimen quality at different collection sites, which may bias study results. A monitoring framework should be considered in order to mitigate this.

The patient tissue samples required for assays can inhibit feasibility, increase time and cost in the short term, and make study reproducibility more complicated. While this is important to note, these techniques are in many cases necessary for effectively assessing treatments, given our contemporary understanding of many disease states, such as cancer within the modern oncology paradigm. Without incorporating this level of complexity and personalisation into clinical research, it will not be possible to develop evidence-based treatments that translate into real-world effectiveness and thus widespread positive outcomes for patients.

Data management and statistical analysis

The ability to statistically analyse multiple research hypotheses at once within a single dataset increases efficiency at the biostatistician's end and provides frameworks for greater reproducibility of the methodology and final results, compared to the execution and analysis of multiple separate clinical trials testing the same hypotheses. Master protocols also enable increased data sharing and collaboration between sites and stakeholders.

Deloitte research estimated that master protocols can save clinical trials 12-15% in cost and 13-18% in study duration. These savings of course apply to situations where master protocols are a good fit for the clinical research context, rather than to the blanket application of these study designs across any and all clinical studies. Applying a master protocol to the wrong clinical study could actually increase required resources and costs without benefit; it is therefore important to assess whether a master protocol study design is indeed the optimal approach for the goals of a particular clinical study or studies.

[Figure: master protocols, such as umbrella trials, for precision medicine.]

References:

Bitterman DS, Cagney DN, Singer LL, Nguyen PL, Catalano PJ, Mak RH. Master Protocol Trial Design for Efficient and Rational Evaluation of Novel Therapeutic Oncology Devices. J Natl Cancer Inst. 2020 Mar 1;112(3):229-237. doi: 10.1093/jnci/djz167. PMID: 31504680; PMCID: PMC7073911.

Lesser N, Na B. Master protocols: Shifting the drug development paradigm. Deloitte Center for Health Solutions.

Lai TL, Sklar M, Thomas N. Novel clinical trial solutions and statistical methods in the era of precision medicine. Technical Report No. 2020-06, June 2020.

Renfro LA, Sargent DJ. Statistical controversies in clinical research: basket trials, umbrella trials, and other master protocols: a review and examples. Ann Oncol. 2017 Jan 1;28(1):34-43. doi: 10.1093/annonc/mdw413. PMID: 28177494; PMCID: PMC5834138.

Park, J.J.H., Siden, E., Zoratti, M.J. et al. Systematic review of basket trials, umbrella trials, and platform trials: a landscape analysis of master protocols. Trials 20, 572 (2019). https://doi.org/10.1186/s13063-019-3664-1

Distributed Ledger Technology for Clinical & Life Sciences Research: Some Use-Cases for Blockchain & Directed Acyclic Graphs

Applications of blockchain and other distributed ledger technology (DLT) such as directed acyclic graphs (DAG) to clinical trials and life sciences research are rapidly emerging.

[Figure: distributed ledger technology (DLT) such as blockchain has a myriad of use-cases in life sciences and clinical research.]

Distributed ledger technology (DLT) has the potential to solve a myriad of problems that currently plague data collection, management and access processes in clinical and life sciences research, including clinical trials. DLT is an innovative approach to operating in environments where trust and integrity are paramount; paradoxically, it removes the need for trust in any individual component while providing full transparency as to the micro-environment of the platform's operations as a whole.

Currently the two predominating forms of DLT are blockchain and directed acyclic graphs (DAGs). While quite distinct from one another, in theory the two technologies are intended to serve similar purposes, or were developed to address the same goals. In practice, blockchain and DAGs may have optimal use-cases that differ in nature from one another, or be better equipped to serve different goals – the nuance of which is to be determined on a case-by-case basis.

Bitcoin is the first known example of blockchain, but blockchain goes well beyond the realms of bitcoin and cryptocurrency use-cases. One of the earliest and currently predominating DAG DLT platforms is IOTA, which has proved itself in a plethora of use-cases that go well beyond what blockchain can currently achieve, particularly within the realm of the internet of things (IoT). In fact IOTA has been operating an industry data marketplace since 2017, which makes it possible to store, sell (via micro-transactions) and access data streams via a web browser. For the purposes of this article we will focus on DLT applications in general and include use-cases in which blockchain or DAGs can be employed interchangeably. Before we begin, what is distributed ledger technology?

[Figure: IOTA's Tangle is an example of directed acyclic graph (DAG) distributed ledger technology and has been implemented in a plethora of use-cases that may translate beneficially to clinical and life sciences research; IOTA has operated an industry data marketplace since 2017. Source: iota.org]
DLT is a decentralised digital system which can be used to store data and record transactions in the form of a ledger or smart contract. Smart contracts can be set up to form a pipeline of conditional (if-then) events, or transactions, much like an escrow in finance, which are shared across nodes on the network. Nodes are used both to store data and to process transactions, with multiple (if not all) nodes accommodating each transaction – hence the decentralisation. Transactions themselves are a form of dynamic data, while a data set is an example of static data.

Both blockchain and DAGs employ advanced cryptographic algorithms which, as of today, render them highly resistant to tampering. This is a huge benefit in the context of sensitive data collection such as patient medical records or confidential study data. It means that data can be kept secure, private and untampered with, and shared efficiently with whomever requires access. Because each interaction or transaction is recorded, the integrity of the data can be upheld in what is considered a "trustless" exchange. Because data is shared on multiple nodes for all involved to witness across the network, records become harder to manipulate or change in an underhanded way. This is important in the collection of patient records or experimental data destined for statistical analysis. Any alterations to data are recorded across the network for all participants to see, enabling true transparency. Transactions can come in the form of smart contracts which are time-stamped and tied to a participant's identity via the use of digital signatures. The toy example below illustrates why a hash-chained ledger makes silent edits detectable.
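This is a deliberately minimal sketch of the hash-chaining idea only, not any production DLT implementation: each entry commits to the hash of its predecessor, so a retrospective edit breaks every later link.

```python
import hashlib, json, time

def make_block(record, prev_hash):
    """Create a time-stamped ledger entry chained to its predecessor."""
    block = {"record": record, "prev_hash": prev_hash, "ts": time.time()}
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

# Two hypothetical study-data entries for the same patient:
genesis = make_block({"patient": "P001", "sbp": 142}, prev_hash="0" * 64)
update = make_block({"patient": "P001", "sbp": 138}, prev_hash=genesis["hash"])

# Tampering with the first record invalidates every later link in the chain:
genesis["record"]["sbp"] = 120
recomputed = hashlib.sha256(json.dumps(
    {k: genesis[k] for k in ("record", "prev_hash", "ts")},
    sort_keys=True).encode()).hexdigest()
print(recomputed == update["prev_hash"])  # False: the chain exposes the edit
```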

In this sense DLT is able to speed up transactions and processes, while reducing cost, due to the removal of a middle-man or central authority overseeing each transaction or transfer of information. DLT can be public or private in nature. A private blockchain, for example, does have a trusted intermediary who decides who has access to the blockchain, who can participate on the network, and which data can be viewed by which participants. In the context of clinical and life sciences research this could be a consortium of interested parties, i.e. the research team, or an industry regulator or governing body. In a private blockchain the transactions themselves remain decentralised, while the blockchain itself has built-in permission layers that allow full or partial visibility of data depending upon the stakeholder. This is necessary in the context of sharing anonymised patient data and blinding in randomised controlled trials.
[Figure: blockchain and Hashgraph are two examples of distributed ledger technology (DLT), with applications that could achieve interoperability across healthcare, medicine, insurance, clinical trials and life sciences research. Source: Hedera Hashgraph whitepaper]
Due to the immutable nature of each ledger transaction, or smart contract, stakeholders are unable to alter or delete study data without a consensus over the whole network. If data is to be altered, an additional transaction is recorded and time-stamped on the blockchain, while the original transaction, which recorded the data in its original form, remains intact. This property helps to reduce the incidence of human error, such as data entry error, as well as any underhanded alterations with the potential to sway study outcomes.

In a clinical trials context the job of the data monitoring committee, and any other form of auditing, becomes much more straightforward. DLT also allows for complete transparency in all financial transactions associated with the research. Funding bodies can see exactly where all funds are being allocated and at what time points. In fact every aspect of the research supply chain, from inventory to event tracking, can be made transparent to the desired entities. Smart contracts operate among participants in the blockchain, and also between the trusted intermediary and the DLT developer whose services have been contracted to build the platform framework, such as a private blockchain. The service contracts will need to be negotiated in advance so that the platform is tailored to conform adequately to individualised study needs. Once processes are in place and streamlined, the platform can be replicated in comparable future studies.

DLT can address the problem of duplicate records in study data or patient records, and make longitudinal data collection more consistent and reliable across multiple life cycles. Many disparate stakeholders, from doctor to insurer or researcher, can share in the same patient data source while maintaining patient privacy and improving data security. Patients can retain access to their data and decide with whom to share it, which clinical studies to participate in, and when to give or withdraw consent.

DLT, such as blockchain or DAGs, can improve collaboration by making the sharing of technical knowledge easier and by centralising data or medical records, in the sense that they are located on the same platform as every other transaction taking place. This results in easier shared access by key stakeholders, shorter negotiation cycles due to improved coordination, and established clinical research processes that are more consistent and replicable.

From a statistician's perspective, DLT should result in data of higher integrity, which yields statistical analysis of greater accuracy and produces research with more reliable results that can be better replicated and validated in future research. Clinical studies will be streamlined due to the removal of much bureaucracy, and therefore more time- and cost-effective to implement as a whole. This is particularly important in a micro-environment with many moving parts and disparate stakeholders, such as the clinical trials landscape.


References and further reading:

From Clinical Trials to Highly Trustable Clinical Trials: Blockchain in Clinical Trials, a Game Changer for Improving Transparency?
https://www.frontiersin.org/articles/10.3389/fbloc.2019.00023/full#h4

Clinical Trials of Blockchain

Blockchain technology for improving clinical research quality
https://trialsjournal.biomedcentral.com/articles/10.1186/s13063-017-2035-z

Blockchain to Blockchains in Life Sciences and Health Care
https://www2.deloitte.com/content/dam/Deloitte/us/Documents/life-sciences-health-care/us-lshc-tech-trends2-blockchain.pdf

Simpson’s Paradox: the perils of hidden bias.

How Simpson’s Paradox Confounds Research Findings And Why Knowing Which Groups To Segment By Can Reverse Study Findings By Eliminating Bias.

Introduction
 
The misinterpretation of statistics, or even the "mis"-analysis of data, can occur for a variety of reasons and to a variety of ends. This article will focus on one such phenomenon contributing to the drawing of faulty conclusions from data – Simpson's Paradox.


At times a situation arises where the outcomes of a clinical research study depict the inverse of expected (or essentially correct) outcomes. Depending upon the statistical approach, this could affect means, proportions or relational trends, among other statistics. Some examples of this occurrence are a negative difference when a positive difference was anticipated, or a positive trend when a negative one would have been more intuitive – or vice versa. Another example commonly pertains to the cross-tabulation of proportions, where condition A is proportionally greater overall, yet when stratified by a third variable, condition B is greater in all cases. All of these examples can be said to be instances of Simpson's paradox. Essentially, Simpson's paradox represents the possibility of supporting opposing hypotheses with the same data.

Simpson's paradox can be said to occur due to the effects of confounding, where a confounding variable is characterised by being related to both the independent variable and the outcome variable, and unevenly distributed across levels of the independent variable. Simpson's paradox can also occur without confounding in the context of non-collapsibility.
For more information on the nuances of confounding versus non-collapsibility in the context of Simpson's paradox, see here.

In a sense, Simpson's paradox is merely an apparent paradox, and can be more accurately described as a form of bias. This bias most often results from a lack of insight into how an unknown lurking variable, so to speak, is impacting the relationship between two variables of interest. Simpson's paradox highlights the fact that taking data at face value and using it to inform clinical decision making can be highly misleading. The chances of Simpson's paradox (or bias) impacting a statistical analysis can be greatly reduced in many cases by a careful approach informed by proper knowledge of the subject matter. This highlights the benefit of close collaboration between researcher and statistician in developing an optimal statistical methodology that can be adapted on a per-case basis.

The following three part series explores hypothetical clinical research scenarios in which Simpson’s paradox can manifest.

Part 1

Simpson’s Paradox in correlation and linear regression


Scenario and Example

A nutritionist would like to investigate the relationships between diet and negative health outcomes. As higher weight has previously been associated with negative health outcomes, the researcher sets out to investigate the extent to which increased caloric intake contributes to weight gain. In researching the relationship between calorie intake and weight gain for a particular dietary regime, the nutritionist uncovers a rather unanticipated negative trend: as caloric intake increases, the weight of participants appears to go down. The nutritionist therefore starts recommending higher calorie intake as a way to dramatically lose weight. Weight does appear to go down as calorie intake rises, but if we stratify the data by age group, a positive trend between calorie intake and weight emerges within each group. Overall, the elderly have the lowest calorie intake but the highest weight, while teens have the highest calorie intake but the lowest weight; this accounts for the negative overall trend but does not give an honest picture of the impact of calories on weight. To gain an accurate picture of the relationship between weight and calorie intake we have to know which variable to group, or stratify, the data by – in this case, age. Once the data is stratified by five separate age categories, a positive trend between calories and weight emerges in each of the five categories. In general, the answer to which variable to stratify by or control for is not usually this obvious; in most cases it requires some theoretical background and a thorough examination of the available data, including associated variables for which information is at hand. A small simulation below reproduces this pattern.
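The following simulation uses made-up numbers chosen only to reproduce the pattern described above: an overall negative calorie-weight slope that reverses to positive within every age group.

```python
import numpy as np

rng = np.random.default_rng(1)

# Five age groups of 100 people; older groups eat fewer calories but weigh
# more, while within every group extra calories add extra weight.
ages = np.repeat([20, 35, 50, 65, 80], 100)
calories = 3200 - 15 * ages + rng.normal(0, 150, ages.size)
weight = 40 + 0.5 * ages + 0.01 * calories + rng.normal(0, 3, ages.size)

overall_slope = np.polyfit(calories, weight, 1)[0]
within = [np.polyfit(calories[ages == a], weight[ages == a], 1)[0]
          for a in np.unique(ages)]

print(f"overall slope: {overall_slope:+.3f} kg per kcal")   # negative
print("within-group slopes:", np.round(within, 3))          # all positive
```

Fitting one line to the pooled data yields a negative slope, driven entirely by the age differences between groups; fitting within each age group recovers the true positive relationship.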


Remedy

In the above example, age shows a negative relationship to the independent variable, calories, but a positive relationship to the dependent variable, weight. It is for this reason that a bit of data exploration and assumption checking before any hypothesis testing is so essential. Even with these practices in place it is possible to overlook the source of confounding and caution is always encouraged.
 
Randomisation and Stratification:
In the context of a randomised controlled trial (RCT), patients should be randomly assigned to treatment groups and stratified by any pertinent demographic and other factors, so that these are evenly distributed across treatment arms (levels of the independent variable). This approach can help to minimise, although not eliminate, the chances of bias occurring in any such statistical context, predictive modelling or otherwise.

Linear Structural Equation Modelling:
If the data at hand is not randomised but observational, a different approach should be taken to detect causal effects in light of potential confounding or non-collapsibility. One such approach is linear structural equation modelling, in which each variable is generated as a linear function of its parents, using a directed acyclic graph (DAG) with weighted edges. This is a more sophisticated approach than simply adjusting for x number of variables, and is needed in the absence of a randomisation protocol.

Hierarchical regression:
This example illustrated an apparent negative trend in the overall data masking a positive trend in each individual subgroup; in practice, the reverse can also occur. In order to avoid drawing misguided conclusions from the data, the correct statistical approach must be entertained: a hierarchical regression controlling for a number of potential confounding factors could avoid wrong conclusions being drawn due to Simpson's paradox.

 

Article: Sarah Seppelt Baker


Reference:
The Simpson’s paradox unraveled, Hernan, M, Clayton, D, Keiding, N., International Journal of Epidemiology, 2011.

Part 2

Simpson’s Paradox in 2 x 2 tables and proportions


Scenario and Example

Simpson’s paradox can manifest itself in the analysis of proportional data and two by two tables. In the following example two pharmaceutical cancer treatments are compared by a drug company utilising a randomised controlled clinical trial design. The company wants to test how the new drug (A) compares to the standard drug (B) already widely in clinical use.  1000 patients were randomly allocated to each group. A chi squared test of remission rates between the two drug treatments is highly statistically significant, indicating that the new drug A is the more effective choice. At first glance this seems reasonable, the sample size is fairly large and equal number of patients have been allocated to each groups.
Drug treatment        A              B
Remission yes         798 (79.8%)    705 (70.5%)
Remission no          202            295
Total sample size     1000           1000
The chi-square statistic for the difference in remission rates between treatment groups is 23.1569. The p-value is < .00001. The result is significant at p < .05.


When we take a closer look, the picture changes. It turns out the clinical trial team forgot to take into account the patients' stage of disease progression at the commencement of treatment. The table below shows that far more of the stage II patients were allocated drug A (79.2% of them), while far more of the stage IV patients were allocated drug B (79.8% of them).

                      Stage II                       Stage IV
Drug treatment        A              B               A              B
Remission yes         697 (87.1%)    195 (92.9%)     101 (50.5%)    510 (64.6%)
Remission no          103            15              99             280
Total sample size     800            210             200            790
The chi-square statistic for the difference in remission rates between treatment groups for patients with stage II disease progression at treatment outset is 5.2969. The p-value is .021364. The result is significant at p < .05.


The chi-square statistic for the difference in remission rates between treatment groups for patients with stage IV disease progression at treatment outset is 13.3473. The p-value is .000259. The result is significant at p < .05.

Unfortunately the analysis of tabulated data is no less prone to Simpson's paradox than that of continuous data. Given that stage II cancer is easier to treat than stage IV, the allocation imbalance gave drug A an unfair advantage and naturally led to a higher overall remission rate for drug A. When the treatment groups are divided by disease progression category and reanalysed, we can see that remission rates are higher for drug B at both stage II and stage IV baseline disease progression. The resulting chi-squared statistics are wildly different from the first and statistically significant in the opposite direction. In causal terms, stage of disease progression affects difficulty of treatment and likelihood of remission. Patients at a more advanced stage of disease, i.e. stage IV, will be harder to treat than patients at stage II. For a fair comparison between two treatments, patients' stage of disease progression needs to be taken into account. In addition, some drugs may be more efficacious at one stage or the other, independent of the overall probabilities of achieving remission at either stage. These chi-squared analyses can be reproduced in a few lines of code, as shown below.
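The tables above can be checked directly; note that Yates' continuity correction is disabled so that the statistics match those quoted above.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: drug A, drug B; columns: remission yes, remission no.
overall = np.array([[798, 202], [705, 295]])
stage_ii = np.array([[697, 103], [195, 15]])
stage_iv = np.array([[101, 99], [510, 280]])

for name, table in [("overall", overall), ("stage II", stage_ii),
                    ("stage IV", stage_iv)]:
    chi2, p, dof, _ = chi2_contingency(table, correction=False)
    rate_a = table[0, 0] / table[0].sum()
    rate_b = table[1, 0] / table[1].sum()
    print(f"{name}: A {rate_a:.1%} vs B {rate_b:.1%}, "
          f"chi2 = {chi2:.2f}, p = {p:.5f}")
```

Drug A wins overall, yet drug B wins within each stage: the direction of the significant result reverses once the stratifying variable is included.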

Remedy

Randomisation and Stratification:
Of course in this scenario, stage of disease progression is not the only variable that needs to be accounted for in order to guard against biased results. Demographic variables such as age, sex, socio-economic status and geographic location are some examples of variables that should be controlled for in any similar analysis. As with the scenario in part 1, this can be achieved through stratified random allocation of patients to treatment groups at the outset of the study. Using a randomised controlled trial design in which subjects are randomly allocated to each treatment group, stratified by pertinent demographic and diagnostic variables, will reduce the chances of inaccurate study results occurring due to bias.

Further examples of Simpson’s Paradox in 2 x 2 tables

Simpson’s paradox in case control and cohort studies

Case-control and cohort studies also involve analyses which rely on the 2×2 table. The calculation of their corresponding measures of association – the odds ratio and relative risk, respectively – is unsurprisingly not immune to the effects of bias, in much the same way as the chi-squared example above. This time, a reversed odds ratio or a relative risk in the opposite direction can occur if the pertinent bias has not been accounted for and controlled.

Simpson’s paradox in meta-analysis of case control studies

Following on from the example above, this form of bias can pose further problems in the context of meta-analysis. When combining results from numerous case-control studies, the confounders in question may or may not have been identified or controlled for consistently across all studies, and some studies will likely have identified different confounders for the same variable of interest. The odds ratios produced by the different studies can therefore be incompatible and lead to erroneous conclusions. Meta-analysis can thus fall prey to the ecological fallacy as a result of systematic bias, where the odds ratio for the combined studies is in the opposite direction to the odds ratios of the separate studies. Imbalance in treatment arm size has also been found to act as a confounder in the context of meta-analysis of randomised controlled trials. Other methodological differences between studies may also be at play, such as differences in follow-up times or a very low proportion of observed events in some studies, potentially due to a shortened follow-up time.

That’s not to say that meta-analysis cannot be performed on these studies, inter-study variation is of-course more common than not, as with all other analytical contexts it is necessary to proceed with a high level of caution and attention to detail. On the whole an approach of simply pooling study results is not reliable, the use of more sophisticated meta-analytic techniques, such as random effects models or Bayesian random effects models that use a Markov chain algorithm for estimating the posterior distributions, are required to mitigate inherent limitations of the meta-analytic approach. Random-effects models assume the presence of study-specific variance which is a latent variable to be partitioned. Bayesian random-effects models can come in parametric, non-parametric or semi-parametric varieties, referring to the shape of the distributions of study-specific effects.
For more information on Simpson’s paradox in meta-analysis, see here.

https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-8-34

For more information on how to minimise bias in meta-analysis, see here.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3868184/

https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.4780110202

Part 3

Simpson’s Paradox & Cox Proportional Hazard Models

Time-to-event data is common in clinical science and epidemiology, particularly in the context of survival analysis. Unfortunately the calculation of hazard rates in survival analysis is not immune to Simpson's paradox, as the mathematics behind Simpson's paradox is essentially the mathematics of conditional probability. In fact, Simpson's paradox in this context has the interesting characteristic of holding for some intervals of the time variable (failure time T) but not others. In this case Simpson's paradox would be observed in the effect of variable Y on the relationship between variable X and time interval T. The proportional hazards model can be seen as an extension of the 2×2 table, given that a similar type of data is used; the difference is that time itself is typically as much an outcome of interest, in relation to some factor Y. In this context Y could be said to be a covariate to X.

Scenario and example
 
A 2017 paper describes a scenario whereby the death rate due to tuberculosis was lower in Richmond than in New York for both African-Americans and Caucasian-Americans, yet lower in New York than in Richmond when the two ethnic groups were combined.
For more details on this example as well as the mathematics behind it see here.
For more examples of Simpson’s paradox in Cox regression see here.


Site specific bias

Factors contributing to bias in survival models can be different from those in more straightforward contexts. Many clinical and epidemiological studies include data from multiple sites, and more often than not there is heterogeneity across sites. This heterogeneity can come in various forms and can result in within- and between-site clustering, or correlation, of observations on site-specific variables. This clustering, if not controlled for, can lead to Simpson's paradox in the form of hazard rate reversal across some or all of time T, and has been found to be a common explanation of the phenomenon in this context. Site clustering can occur at the patient level, for example due to site-specific procedures for the recruitment of patients (led by each site's principal investigator), or due to differences in site-specific treatment protocols. Site-specific differences can occur intra- or internationally; in the international case they can be due, for example, to differences in national treatment guidelines or differences in drug availability between countries. Resource availability can also differ between sites, whether intra- or internationally. In any time-to-event analysis involving multiple sites (such as a Cox regression model), a site-level effect should be taken into account and controlled for in order to avoid bias-related inferential errors.
 


Remedy
 

Cox regression model including site as a fixed covariate:
Site should be included as a covariate in order to account for site-specific dependence of observations.

Cox regression model treating site as a stratification variable:
In cases where one or more covariates violate the proportional hazards (PH) assumption, as indicated by a lack of independence between the scaled Schoenfeld residuals and time, stratification may be more appropriate. Another option in this case is to add a time-varying covariate to the model. The choice made in this regard will depend on the sampling nuances of each particular study.

Cox shared frailty model:
In specific conditions the Cox shared frailty model may be more appropriate. This approach involves treating subjects from the same site as sharing the same frailty, and requires that each subject is not clustered across more than one level-two unit. While it is not appropriate for multi-membership multi-level data, it can be useful for more straightforward scenarios. A sketch of the first two options appears below.
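A minimal sketch of the first two options using the Python lifelines library; the data is simulated and the column names (time, event, treatment, site) are assumptions for illustration, with site 2 deliberately recruiting higher-risk patients.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(7)
n = 200

# Hypothetical two-site data: site 2 recruits sicker patients, so ignoring
# site risks a Simpson-style distortion of the treatment hazard ratio.
site = rng.integers(1, 3, n)
treatment = rng.integers(0, 2, n)
baseline = np.where(site == 2, 2.0, 1.0)        # site-specific baseline risk
hazard = baseline * np.exp(-0.5 * treatment)    # protective treatment effect
time = rng.exponential(1.0 / hazard)
df = pd.DataFrame({"time": np.minimum(time, 2.0),
                   "event": (time < 2.0).astype(int),  # administrative censoring
                   "treatment": treatment, "site": site})

# Option 1: site as a fixed covariate (all non-duration/event columns are used).
CoxPHFitter().fit(df, duration_col="time", event_col="event").print_summary()

# Option 2: stratify by site, giving each site its own baseline hazard;
# useful when the site effect violates proportional hazards.
# (cph.check_assumptions(df) tests PH via scaled Schoenfeld residuals.)
CoxPHFitter().fit(df, duration_col="time", event_col="event",
                  strata=["site"]).print_summary()
```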

In tailoring the approach to the specifics of the data, appropriate model adjustments should produce hazard ratios that more accurately estimate the true risk.