Sex Differences in Clinical Trial Recruiting

The following article examines several systematic reviews of sex and gender representation in clinical trial patient populations. In these reviews, sex ratios are assessed against factors such as clinical trial phase, the disease under investigation, and the disease burden in the population. Sex differences in the reporting of safety and efficacy outcomes are also examined. In many cases safety and efficacy outcomes are pooled rather than reported separately for each sex, which is problematic when findings are generalised to the wider population. Reporting sex differences in clinical trials is important for getting dosage right across different body compositions, and for avoiding unforeseen outcomes in off-label use or when a novel therapeutic first reaches the market. Given the nuances of different disease types and trial phases, a 50:50 male-to-female ratio is not always ideal, or even appropriate, for every clinical study design. Getting the sex balance right in a trial population improves the efficiency and cost-effectiveness of the study. Based on the collective findings, a set of principles is put forth to guide researchers in determining the appropriate sex ratio for their clinical trial design.

Sex difference by clinical trial phase

  • variation in sex enrolment ratios for clinical trial phases
  • females less likely to participate in early phases, due to increased risk of adverse events
  • under-representation of women in phase III when looking at disease prevalence

It has been argued that female representation in clinical trials is lacking, despite recent efforts to close the gap. US data from 2000-2020 suggest that trial phase shows the greatest variation in enrolment compared with other factors, with median female enrolment of 42.9%, 44.8%, 51.7%, and 51.1% for phases I, I/II to II, II/III to III, and IV respectively (4). Median female enrolment thus increases gradually as trials progress, with a difference of <1% between phases II/III to III and IV. Additional US data on FDA-approved drugs, including trials from as early as 1993, report female participation of 22%, 48%, and 49% for trial phases I, II, and III respectively (2). While the sexes are almost equally represented in phases II and III, women make up only around one fifth of phase I trial populations in this dataset (2). The difference in reported phase I participation between the two datasets could reflect an increase in female participation in more recent years. The aim of a phase I trial is to evaluate safety and dosage, so it comes as no surprise that women, especially those of childbearing age, are often excluded due to potential risks to foetal development.

In theory, women can be included to a greater extent as trial phases progress and the potential risk of severe adverse events decreases. By the time a trial reaches phase III, it should ideally reflect the real-world disease population as closely as possible. European data for phase III trials from 2011-2015 report that 41% of participants were female (1), slightly lower than female enrolment in US-based trials. For 26% of FDA-approved drugs there is a >20% difference between the proportion of women in phase II and III clinical trials and the prevalence of the disease among women in the US (2), and only one of these drugs shows an over-representation of women.

Reporting of safety and efficacy by sex difference

  • Both safety and efficacy results tend to differ by sex.
  • Reporting of these differences is inconsistent and often absent.
  • Higher rates of adverse events in women may stem from lower female involvement, or a lack of sex stratification, in dose-finding and safety studies.
  • There is a need to enforce the analysis and reporting of sex differences in safety and efficacy data.

Sex differences in response to treatment, in both efficacy and safety, have been widely reported. Gender subgroup analyses of efficacy can reveal whether a drug is more or less effective in one sex than the other. Such analyses are available for 71% of FDA-approved drugs; of these, 11% were found to be more efficacious in men and 7% in women (2). By contrast, only 2 of 22 European Medicines Agency approved drugs examined were found to have efficacy differences between the sexes (1). Either way, it is important to study the efficacy of a new drug in all of the population subgroups that may end up taking it.

The safety of a treatment also differs between the sexes, with women having a slightly higher percentage (p<0.001) of reported adverse events (AEs) than men in both treatment and placebo groups in clinical trials (1). Gender subgroup analyses of safety can offer insights into the risks that women are exposed to during treatment. Despite this, gender-specific safety analyses are available for only 45% of FDA-approved drugs, with 53% of these reporting more side effects in women (2). On average, women are at a 34% increased risk of severe toxicity across cancer treatment domains, with the greatest increase seen for immunotherapy (66%). Moreover, the risk of AEs is greater in women across all AE types, including patient-reported symptomatic (female 33.3%, male 27.9%), haematologic (female 45.2%, male 39.1%) and objective non-haematologic (female 30.9%, male 29.0%) AEs (3). These findings highlight the importance of gender-specific safety analyses and the need for more gender subgroup safety reporting. More reporting will increase our understanding of sex-related AEs and could allow for sex-specific interventions in the future.

Sex differences by disease type and burden

  • Several disease categories have recently been associated with lower female enrolment
  • Men are under-represented as often as women when comparing enrolment to disease burden proportions
  • There is a need for trial participants to be recruited on a case-by-case basis, depending on the disease.

Sex differences by disease type

When broken down by disease type, the sex ratio of clinical trial participation shows a more nuanced picture. Several disease categories have recently been associated with lower female enrolment, after accounting for other factors such as trial phase, funding, and blinding (4). Women comprised the smallest proportions of participants in US-based trials between 2000-2020 in cardiology (41.4%), sex-non-specific nephrology and genitourinary (41.7%), and haematology (41.7%) clinical trials (4). Despite women being proportionately represented in European phase III clinical studies between 2011-2015 for depression, epilepsy, thrombosis, and diabetes, they were significantly under-represented for hepatitis C, HIV, schizophrenia, hypercholesterolaemia, and heart failure, and were not over-represented in trials for any of the disease categories examined (1). The gap in gender representation thus persists into later clinical trial phases when measured against disease prevalence, albeit to a lesser extent. Examining disease burden shows that the gap is even bigger than anticipated, and includes the under-representation of both sexes.

Sex differences by disease burden

It is not until the burden of disease is considered that men are shown to be under-represented as often as women. Including burden of disease captures proportionality relative to how differently diseases manifest in men and women. Burden can be measured in disability-adjusted life years (DALYs), which represent the number of healthy years of life lost to the disease. Despite each sex making up approximately half of clinical trial participants overall in US-based trials between 2000-2020, every disease category showed an under-representation of either women or men relative to disease burden, except for infectious disease and dermatologic clinical trials (4). Women were under-represented in 7 of 17 disease categories, with the greatest under-representation in oncology trials, where the difference between the proportion of female trial participants and the corresponding DALYs is 3.6%. Men were under-represented relative to their disease burden in 8 of 17 disease categories, with the greatest difference being 11.3% for musculoskeletal disease and trauma trials (4). Men were found to be under-represented to a similar extent as women, suggesting that the under-representation of either sex could be coincidental. Alternatively, male under-representation could stem from the assumption of female under-representation leading to overcorrection in the opposite direction. These findings would benefit from statistical validation, but they illustrate the need for clinical trial participants to be recruited on a case-by-case basis, depending on the disease.

Takeaways to improve your patient sample in clinical trial recruiting:

  1. Know the disease burden (DALYs) of your demographics for that disease.
  2. Balance the ratio of disease burden to the appropriate demographics for your disease.
  3. Aim to recruit patients in these proportions.
  4. Stratify clinical trial data by the relevant demographics in your analysis. For example, toxicity, efficacy, and adverse events should always be analysed separately for men and women to produce the respective estimates.
  5. Report efficacy and toxicity separately for men and women. Reporting differences by ethnicity is also important, as many diseases differentially affect certain ethnicities, and the corresponding therapeutics can show differing degrees of efficacy and adverse events.
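
As a rough sketch of steps 1-3, recruitment targets can be computed directly from sex-specific burden figures. The DALY numbers and function name below are hypothetical, for illustration only:

```python
# Sketch: derive target enrolment proportions from sex-specific disease burden.
# The DALY figures used below are hypothetical placeholders, not real estimates.
def target_enrolment(dalys_by_group: dict[str, float]) -> dict[str, float]:
    """Return each group's share of total disease burden as a recruitment target."""
    total = sum(dalys_by_group.values())
    return {group: dalys / total for group, dalys in dalys_by_group.items()}

# Example: a disease with a higher burden in men (made-up numbers).
targets = target_enrolment({"female": 1_200_000, "male": 1_800_000})
print(targets)  # {'female': 0.4, 'male': 0.6}
```

The same calculation extends to any demographic split (step 5): pass DALYs keyed by ethnicity, age band, or sex-by-age strata instead.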

The end goal of these principles is that medication can be more personalised, and that any treatment given is more likely to help and less likely to harm the individual patient.

Conclusions

There is room for improvement in the proportional representation of both sexes in clinical trials, and knowing a disease's demographics is vital to planning a representative trial. Assuming that the under-representation is always on the female side rather than the male side may lead to incorrect conclusions and misdirected corrective action. Demographic differences in disease burden need to be taken into account when recruiting trial participants. Trial populations that more accurately reflect real-world populations will allow a therapeutic to be tailored to the patient.

Efficacy and safety findings highlight the need for clinical study data to be stratified by sex so that respective estimates can be determined. This enables more accurate, sex- and age-appropriate dosing that will maximise treatment efficacy and patient safety, and minimise the chance of adverse events. It also reduces the risks associated with later off-label use of drugs, and may help avoid modern-day tragedies resembling thalidomide. Moreover, efficacy and adverse events should always be reported separately for men and women, as the evidence shows distinct differences in their responses to therapeutics.

References:

1. Dekker M, de Vries S, Versantvoort C, Drost-van Velze E, Bhatt M, van Meer P et al. Sex Proportionality in Pre-clinical and Clinical Trials: An Evaluation of 22 Marketing Authorization Application Dossiers Submitted to the European Medicines Agency. Frontiers in Medicine. 2021;8.

2. Labots G, Jones A, de Visser S, Rissmann R, Burggraaf J. Gender differences in clinical registration trials: is there a real problem?. British Journal of Clinical Pharmacology. 2018;84(4):700-707.

3. Unger J, Vaidya R, Albain K, LeBlanc M, Minasian L, Gotay C et al. Sex Differences in Risk of Severe Adverse Events in Patients Receiving Immunotherapy, Targeted Therapy, or Chemotherapy in Cancer Clinical Trials. Journal of Clinical Oncology. 2022;40(13):1474-1486.

4. Steinberg J, Turner B, Weeks B, Magnani C, Wong B, Rodriguez F et al. Analysis of Female Enrollment and Participant Sex by Burden of Disease in US Clinical Trials Between 2000 and 2020. JAMA Network Open. 2021;4(6):e2113749.

Latent Variable Modelling And The Chi Squared Exact Fit Statistic

Latent variable models are statistical models used extensively throughout clinical and experimental research in medicine and the life sciences in general. Psychology and neuroscience are two key sub-disciplines where latent variable models are routinely employed to answer a myriad of research questions, from the impact of personality traits on success metrics in the workplace (1) to measuring the inter-correlated activity of neural populations in the human brain based on neuro-imaging data (2). Through latent variable modelling, dispositions, states or processes which must be inferred rather than directly measured can be linked causally to more concrete measurements.
Latent variable models are exploratory or confirmatory in nature, in the sense that they are designed to uncover causal relationships between observable (manifest) variables and corresponding latent variables in an inter-correlated data set. They use structural equation modelling (SEM), and more specifically factor analysis techniques, to determine these causal relationships, allowing numerous multivariate hypotheses to be tested simultaneously. A key assumption of SEM is that the model is fully and correctly specified. The reason is that one small misspecification can affect all parameter estimates in the model, rendering inaccurate approximations which can combine in unpredictable ways (3).

With any postulated statistical model it is imperative to assess and validate the model fit before concluding in favour of the integrity of the model and interpreting results. The accepted way to do this across all structural equation models is the chi squared (χ²) exact fit statistic.

A statistically significant χ² statistic is indicative of the following:

  • A systematically misspecified model, with the degree of misspecification a function of the χ² value.
  • That the set of parameters specified in the model does not adequately fit the data, and thus that the parameter estimates of the model are inaccurate. As χ² operates on the same statistical principles as the parameter estimation, it follows that to trust the parameter estimates of the model we must also trust the χ², and vice versa.
  • Consequently, a need to investigate where these misspecifications have occurred and potentially readjust the model to improve its accuracy.

While one or more incorrect hypotheses may have caused the model misspecification, the misspecification could equally have resulted from other causes. It is therefore important to investigate the causes of a significant model fit test. To do this properly, the following should be evaluated:

  • Heterogeneity:
      • Does the causal model vary between subgroups of subjects?
      • Are there any intervening within-subject variables?
  • Independence:
      • Are the observations truly independent?
      • Latent variable models involve two key assumptions: that all manifest variables are independent after controlling for any latent variables, and that an individual's position on a manifest variable is the result of that individual's position on the corresponding latent variable (3).
  • Multivariate normality:
      • Is the multivariate normality assumption satisfied?

The study:

A 2015 meta-analysis of 75 latent variable studies drawn from 11 psychology journals has highlighted a tendency in clinical researchers to ignore the χ² exact fit statistic when reporting and interpreting the results of the statistical analysis of latent variable models (4).
97% of papers reported at least one model as appropriate, despite 80% of these models not passing the criteria for model fit, with the χ² exact fit statistic ignored. Only 2% of studies overall concluded that the model did not fit at all, and one of these interpreted the model anyway (4).
The reasons given for ignoring the model fit statistic were that it is overly sensitive to sample size, that it penalises models when the number of variables is high, and a general objection to the logic of the exact fit hypothesis, with a broad consensus of preference for approximate fit indices (AFI).
AFI were instead applied in these papers to justify the models, which typically leads to questionable conclusions. In all, just 41% of studies reported χ² model fit results. Of the studies that failed to report a p value for the reported χ² value, 40% did report the degrees of freedom; when these degrees of freedom were used to cross-check the unreported p values, all of them were in fact significant.
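
This cross-check is straightforward to reproduce: given a reported χ² value and its degrees of freedom, the p value follows from the χ² survival function. A minimal sketch with made-up numbers:

```python
from scipy.stats import chi2

def chi2_p_value(chi2_value: float, df: int) -> float:
    """Recover the exact-fit p value from a reported chi-squared value and df."""
    return chi2.sf(chi2_value, df)

# Hypothetical reported values: chi-squared = 180.4 on 120 degrees of freedom.
p = chi2_p_value(180.4, 120)
print(p)  # well below 0.05, so the exact fit hypothesis would be rejected
```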
The model fit function was usually generated through maximum likelihood methods; however, 43% of studies failed to report which fit function was used.
There was a further tendency to accept the approximate fit hypothesis when there was in fact little or no evidence of approximate fit. This lack of thorough model examination yields empirical evidence of questionable validity. 30% of studies applied custom, laxer cut-off criteria for the approximate fit statistics than is conventionally acceptable, while 53% failed to report cut-off criteria at all.
Assumption testing for univariate normality was reported in only 24% of studies (4).
Further explanation of χ² and model fit:

The larger the data set, the more that increasingly trivial discrepancies are detected as a source of model misspecification. This does not mean that trivial discrepancies become more important to the model fit calculation; it means that the certainty with which these discrepancies can be deemed important has increased. In other words, the statistical power has increased. Model misspecification can result from both theoretically relevant and irrelevant or peripheral causal factors, which need to be addressed equally. A significant model fit statistic indicating misspecification is not trivial just because the causes of the misspecification are trivial; rather, trivial causes are having a significant effect, and so there is a significant need to address them. The χ² model fit test is the most sensitive way to detect misspecification in latent variable models and should be adhered to above other methods, even when the sample size is large. Finally, in the structural equation modelling context of multiple hypotheses, a rejection of model fit does not entail the rejection of each of the model's hypotheses (4).
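
This sample-size sensitivity follows from how the statistic is constructed: the test statistic is T = (N - 1) * F_ML, where F_ML is the minimised fit function value, so a fixed discrepancy yields a T that grows linearly with N. A small sketch with a hypothetical F_ML value:

```python
from scipy.stats import chi2

def exact_fit_test(f_ml: float, n: int, df: int) -> tuple[float, float]:
    """Chi-squared exact fit test: T = (N - 1) * F_ML, compared against chi2(df)."""
    t = (n - 1) * f_ml
    return t, chi2.sf(t, df)

# The same (hypothetical) small misfit F_ML = 0.05 on df = 20, at two sample sizes:
for n in (200, 2000):
    t, p = exact_fit_test(0.05, n, 20)
    print(f"N = {n}: T = {t:.2f}, p = {p:.4g}")
```

The same discrepancy that is comfortably non-significant at N = 200 is decisively rejected at N = 2000; what has changed is the power to detect it, not its importance.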
Problems with AFI:

AFI do provide a conceptually heterogeneous set of fit indices for each hypothesis; however, none of these indices is accompanied by a critical value or significance level, and all except one arise from unknown distributions. The fit indices are a function of χ², but unlike the χ² fit statistic they have no verified statistical basis, nor do they present a statistically rigorous test of model fit. Despite this, satisfactory AFI values across hypotheses are being used to justify dismissing a significant χ² test.
Monte Carlo simulations of AFI concluded that it is not possible to determine universal cut-off criteria for any form of model tested. Using AFI, the probability of correctly rejecting a misspecified model decreased with increasing sample size; this is the inverse of the χ² statistic. Another problem with AFI compared with χ² is that the more severe the model misspecification or correlated errors, the more unpredictable the AFI become. Again, this is the inverse of what happens with the χ² statistic (4).
The takeaway:

Based on the meta-analysis, the following best practice principles are recommended, in addition to adequate attention to the statistical assumptions of heterogeneity, independence and multivariate normality outlined above:

  1. Pay attention to distributional assumptions.
  2. Have a theoretical justification for your model.
  3. Avoid post hoc model modifications such as dropping indicators, allowing cross-loadings and correlated error terms.
  4. Avoid confirmation bias.
  5. Use an adequate estimation method.
  6. Recognise the existence of equivalence models.
  7. Justify causal inferences.
  8. Use clear reporting that is not selective.

Image: Michael Eid, Tanja Kutscher, Stability of Happiness (2014), Chapter 13 – Statistical Models for Analyzing Stability and Change in Happiness.
https://www.sciencedirect.com/science/article/pii/B9780124114784000138

References:
(1) Latent Variables in Psychology and the Social Sciences

(2) Structural equation modelling and its application to network analysis in functional brain imaging
https://onlinelibrary.wiley.com/doi/abs/10.1002/hbm.460020104

(3) Chapter 7: Assumptions in Structural Equation modelling
https://psycnet.apa.org/record/2012-16551-007

(4) A cautionary note on testing latent variable models
https://www.frontiersin.org/articles/10.3389/fpsyg.2015.01715/full

Do I need a Biostatistician?

“…. half of current published peer-reviewed clinical research papers … contain at least one statistical error… When just surgical related papers were analysed, 78% were found to contain statistical errors.”

Peer-reviewed published research is the go-to source for clinicians and researchers to advance their knowledge of the topic at hand. It is also currently the most reliable way available to do this. The rate of change in standard care, and the exponential development and implementation of innovative treatments and styles of patient involvement, make keeping up with the latest research paramount. (1)

Unfortunately, almost half of currently published peer-reviewed clinical research papers have been shown to contain at least one statistical error, likely resulting in incorrect research conclusions being drawn from the results. When surgical papers alone were analysed, 78% were found to contain statistical errors due to incorrect application of statistical methods. (1)

Compared with 20 years ago, all forms of medical research require increasingly complex methodology, acquire increasingly varied forms of data, and demand increasingly sophisticated approaches to statistical analysis. Consequently, the meta-analyses required to synthesise these clinical studies are increasingly advanced. Analytical techniques that would previously have sufficed, and are still widely taught, are no longer sufficient to address these changes. (1)

The number of peer-reviewed clinical research publications has increased over the past 12 years. In parallel, the statistical analyses contained in these papers have grown more complex, as has the sophistication with which they are applied. For example, t tests and descriptive statistics were the go-to statistical methodology for many highly regarded articles published in the 1970s and 80s. Relying on those techniques alone today would be insufficient, both in terms of being scientifically satisfying and, in all likelihood, in meeting current peer-review standards. (1)

Despite this, some concerning research has noted that these basic parametric techniques are still being misunderstood and misapplied reasonably frequently in contemporary research. They are also being increasingly relied upon (in line with the increase in research output) where more sophisticated, modern analytic techniques would be better equipped and more robust in answering the given research questions. (1)

Another contributing factor to statistical errors is, of course, ethical in nature. A recent online survey of consulting biostatisticians in America revealed that inappropriate requests to change or delete data to support a hypothesis were common, as was the desire to mould the interpretation of statistical results to fit in with expectations and established hypotheses, rather than interpreting results impartially. Ignoring violations of statistical assumptions that would deem the chosen statistical test inappropriate, and not reporting missing data that would bias results, were other unethical requests reported. (2)

The use of incorrect statistical methodology and tests leads to incorrect conclusions being widely published in peer-reviewed journals. Because clinical practitioners and researchers rely on these conclusions to inform clinical practice and research directions respectively, the end result is a stunting of knowledge and a proliferation of unhelpful practices which can harm patients. (1)

Often these errors are the result of clinicians performing statistical analyses themselves without first consulting a biostatistician to design the study, assess the data, and perform any analyses in an appropriately nuanced manner. Another problem arises when researchers rely on the statistical techniques of a previously published peer-reviewed paper on the same topic. It is often not immediately apparent whether a statistician was consulted on this established paper, so it is not certain whether that paper took the best approach to begin with. This typically does not stop it becoming a benchmark for future comparable studies or deliberate replications. Furthermore, the statistical methods used may since have been improved upon, with more advanced or more robust methods now available. Small differences in study design or collected data between the established study and the present study can also mean that the techniques used in the established study are not the optimal techniques for the statistical needs of the present study, even if the research question is the same or very similar.

Another common scenario which can lead to non-ideal statistical practices is under-budgeting for biostatisticians in research grant applications. Biostatisticians are often on multiple grants, each with a fairly low amount of funding allocated to the statistical component due to tight or under-budgeting. This limits the statistician's ability to focus substantially on a specific area and make a more meaningful contribution in that domain. The lack of focus prevents them from becoming an expert in a particular niche and engaging in innovation. This in turn can limit the quality of the science as well as the career development of the statistician.

In order to reform and improve the state and quality of clinical and other research today, institutions and individuals must assign more value to the role of statisticians in all stages of the research process. Two ways to do this are increased budgeting for and in turn increased collaboration with statistical professionals.


References:

(1) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6106004/

(2) https://annals.org/aim/article-abstract/2706170/researcher-requests-inappropriate-analysis-reporting-u-s-survey-consulting-biostatisticians