As a medical researcher or a small enterprise in the life sciences industry, you are likely to encounter many experts using statistical and computational techniques to study biological, clinical and other health data. These experts can come from a variety of fields such as biostatistics, bioinformatics, biometrics, clinical data science and epidemiology. Although these fields overlap in certain ways, they differ in purpose, focus, and application. All five fields focus on analysing and interpreting biological, clinical or public health data, but they typically do so in different ways and with different goals in mind. Understanding these differences can help you choose the most appropriate specialists for your research project and get the most out of their expertise. This article begins with a brief description of each discipline for the sake of disambiguation, then focuses on biostatistics and bioinformatics, with particular attention to the roles of biostatisticians and bioinformatics scientists in clinical trials.
Biostatisticians use advanced biostatistical methods to design and analyse pre-clinical experiments, clinical trials, and observational studies predominantly in the medical and health sciences. They can also work in ecological or biological fields which will not be the focus of this article. Biostatisticians tend to work on varied data sets, including a combination of medical, public health and genetic data in the context of clinical studies. Biostatisticians are involved in every stage of a research project, from planning and designing the study, to collecting and analysing the data, to interpreting and communicating the results. They may also be involved in developing new statistical methods and software tools. In the UK the term “medical statistician” has been in common use over the past 40 years to describe a biostatistician, particularly one working in clinical trials, but it is becoming less used due to the global nature of the life sciences industry.
Bioinformaticians use computational and statistical techniques to analyse and interpret large datasets in the life sciences. They often work with multi-omics data such as genomics, proteomics and transcriptomics data, and use tools such as large databases, algorithms, and specialised software programs to analyse and make sense of sequencing and other data. Bioinformaticians develop analysis pipelines and fine-tune methods and tools for analysing biological data to fit the evolving needs of researchers.
Clinical data scientists
Data scientists use statistical and computational modelling methods to make predictions and extract insights from a wide range of data. Often, this is real-world big data that would not be practical to analyse using other methods. In a clinical development context, data sources could include medical records, epidemiological or public health data, prior clinical study data, or IoT and IoB sensor data. Data scientists may combine data from multiple sources and types. Analysis pipelines, machine learning techniques, neural networks, and decision tree analysis are then used to make sense of this data. The better the quality of the input data, the more precise and accurate any predictive algorithms can be.
Statistical programmers help statisticians to efficiently clean and prepare data sets and mock tables, figures and listings (TFLs) in preparation for analysis. They set up SDTM and ADaM data structures in preparation for clinical studies. Quality control of data and advanced macros for database management are also key skills.
Biometricians use statistical methods to analyse data related to the characteristics of living organisms. They may work on topics such as growth patterns, reproductive success, or the genetic basis of traits. Biometricians may also be involved in developing new statistical methods for analysing data in these areas. Some use the terms biostatistician and biometrician interchangeably; however, for the purpose of this article they remain distinct.
Epidemiologists study the distribution and determinants of diseases in populations. Using descriptive, analytical, or experimental techniques, such as cohort or case-control studies, they identify risk factors for diseases, evaluate the effectiveness of public health interventions, as well as track or model the spread of infectious diseases. Epidemiologists use data from laboratory testing, field studies, and publicly available health data. They can be involved in developing new public health policies and interventions to prevent or control the spread of diseases.
Clinical trials and the role of data experts
Clinical trials involve testing new treatments, interventions, or diagnostic tests in humans. These studies are an important step in the process of developing new medical therapies and understanding the effectiveness and safety of existing treatments.
Biostatisticians are crucial to the proper design and analysis of clinical trials. To arrive at an optimal study design, they may first have to conduct extensive meta-analysis of previous clinical studies and generate real-world evidence (RWE) from available real-world data sets or R&D results. They may also be responsible for managing the data and ensuring its quality, as well as interpreting and communicating the results of the trial. From developing the statistical analysis plan and contributing to the study protocol, to final analysis and reporting, biostatisticians have a role to play across the project timeline.
During a clinical trial, statistical programmers may prepare data sets to CDISC standards and pre-specified study requirements, maintain the database, as well as develop and implement standard SAS code and algorithms used to describe and analyse the study data.
Bioinformaticians may be involved in the design and analysis stages of clinical trials, particularly if the trial design involves the use of large data sets such as sequencing data for multi-omics analysis. They may be responsible for managing and analysing this data, as well as developing software tools and algorithms to support the analysis.
Data scientists may be involved in designing and analysing clinical trials at the planning stage, as well as in developing new tools and methods. The knowledge gleaned from data science models can be used to improve decision-making across various contexts, including life sciences R&D and clinical trials. Applications include optimising the patient populations used in clinical trials, and feasibility analysis in which site performance, region, recruitment and other variables are simulated to evaluate the impact of different scenarios on project cost and timeline.
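To illustrate what a recruitment feasibility simulation can look like, here is a minimal Python sketch. It assumes, purely for illustration, that every site is open from the start and enrols patients as a Poisson process with the same constant monthly rate; the function names, site counts and rates are hypothetical and not taken from any real study.

```python
import random
import statistics


def simulate_enrolment_time(n_sites, rate_per_site, target, rng):
    """Simulate the months needed to enrol `target` patients in one trial.

    Assumption: pooled enrolment is a Poisson process, so the gaps between
    consecutive enrolments are exponential with rate n_sites * rate_per_site.
    """
    total_rate = n_sites * rate_per_site
    return sum(rng.expovariate(total_rate) for _ in range(target))


def summarise_scenario(n_sites, rate_per_site, target, n_sims=1000, seed=0):
    """Run many simulated trials and summarise the enrolment duration."""
    rng = random.Random(seed)
    times = [
        simulate_enrolment_time(n_sites, rate_per_site, target, rng)
        for _ in range(n_sims)
    ]
    deciles = statistics.quantiles(times, n=10)  # nine cut points
    return {
        "median_months": statistics.median(times),
        "p90_months": deciles[-1],  # 90th percentile
    }


if __name__ == "__main__":
    # Compare two hypothetical scenarios for a 200-patient trial.
    for sites in (10, 20):
        print(sites, "sites:", summarise_scenario(sites, 2.0, 200))
```

Comparing the median and 90th percentile across scenarios gives a rough view of how adding sites might shorten the timeline. A realistic model would also account for staggered site activation, screening failures and differences between regions.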
Biometricians and epidemiologists may also contribute to clinical trials, particularly if the trial is focused on a specific population or on understanding the factors that influence the incidence or severity of a disease. They may contribute to designing the study, collecting and analysing the data, or interpreting the results.
Overall, the role of these experts in clinical trials is to use their varied expertise in statistical analysis, data management, and research design to help understand the safety and effectiveness of new treatments and interventions.
The role of the biostatistician in clinical trials
Biostatisticians may be responsible for developing the study protocol, determining the sample size, producing the randomisation schedule, and selecting the appropriate statistical methods for analysing the data. They may also be responsible for managing the data and ensuring its quality, as well as interpreting and communicating the results of the trial.
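As a minimal sketch of what producing a randomisation schedule can involve, the Python snippet below generates a permuted-block schedule for two arms. Permuted blocks are one common approach and not necessarily the one a given trial would use; the function name, arm labels, block size and seed are illustrative assumptions.

```python
import random


def block_randomisation(n_subjects, arms=("Treatment", "Control"),
                        block_size=4, seed=2024):
    """Return a list of arm assignments built from shuffled, balanced blocks.

    Each block contains every arm equally often, so group sizes stay
    balanced throughout recruitment. A fixed seed makes the schedule
    reproducible for audit purposes.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_subjects:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order within the block
        schedule.extend(block)
    return schedule[:n_subjects]  # the last block may be truncated


if __name__ == "__main__":
    for subject_no, arm in enumerate(block_randomisation(12), start=1):
        print(f"Subject {subject_no:02d}: {arm}")
```

In practice the schedule would usually be stratified by factors such as site, and kept concealed from the staff who enrol participants.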
SDTM data preparation
The Study Data Tabulation Model (SDTM) is a data standard that is used to structure and organize clinical study data in a standardized way. Depending on how a CRO is structured, either biostatisticians, statistical programmers, or both will be involved in mapping the data collected in a clinical trial to the SDTM data set, which involves defining the structure and format of the data and ensuring that it is consistent with the standard. This helps to ensure that the data is organised in a way that is universally interpretable. This process involves working with the research team to ensure the appropriate variables and categories are defined before reviewing and verifying the data to ensure that it is accurate, complete and in line with industry standards. Typically the SDTM data set will be established early at the protocol phase and populated later once trial data is accumulated.
Creating and analysing the ADaM dataset
In clinical trials, the Analysis Data Model (ADaM) is a data set model used to structure and organize clinical trial data in a standardized way for the purpose of statistical analysis. ADaM data sets are used to store the data that will be analysed as part of the clinical trial, and are typically created from the Study Data Tabulation Model (SDTM) data sets, which contain the raw data collected during the trial. This helps to ensure the reliability and integrity of the data, and makes it easier to analyse and interpret the results of the trial.
Biostatisticians and statistical programmers are responsible for developing ADaM data sets from the SDTM data, which involves selecting the relevant variables and organizing them in a way that is appropriate for the particular statistical analyses that will be conducted. While statistical programmers may create derived variables, produce summary statistics, TFLs, and organise the data into appropriate datasets and domains, biostatisticians are responsible for conducting detailed statistical analyses of the data and interpreting the results. This may include tasks such as testing hypotheses, identifying patterns and trends in the data, and developing statistical models to understand the relationships between the data and the research questions the trial seeks to answer.
The role of biostatisticians, specifically, in developing ADaM data sets from SDTM data is to use their expertise in statistical analysis and research design to guide statistical programmers in ensuring that the data is organised, structured, and formatted in a way that is appropriate for the analyses that will be conducted, and to help understand and interpret the results of the trial.
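The step from SDTM to ADaM can be sketched in a few lines of Python. The example below derives a small vital-signs analysis data set from hypothetical demographics (DM) and vital signs (VS) records. The variable names follow common CDISC conventions, but the records are invented and the baseline rule (the last value on or before the reference start date) is a simplifying assumption. In a real study these derivations are defined in the statistical analysis plan and are usually programmed in SAS.

```python
from datetime import date

# Hypothetical SDTM-like records (invented for illustration).
dm = [
    {"USUBJID": "S-001", "ARM": "Drug A", "RFSTDTC": "2024-01-10"},
    {"USUBJID": "S-002", "ARM": "Placebo", "RFSTDTC": "2024-01-12"},
]
vs = [
    {"USUBJID": "S-001", "VSTESTCD": "SYSBP", "VSSTRESN": 150.0, "VSDTC": "2024-01-10"},
    {"USUBJID": "S-001", "VSTESTCD": "SYSBP", "VSSTRESN": 141.0, "VSDTC": "2024-02-07"},
    {"USUBJID": "S-002", "VSTESTCD": "SYSBP", "VSSTRESN": 148.0, "VSDTC": "2024-01-11"},
    {"USUBJID": "S-002", "VSTESTCD": "SYSBP", "VSSTRESN": 147.0, "VSDTC": "2024-02-09"},
]


def study_day(event, reference):
    """Study day relative to the reference date; there is no day 0."""
    delta = (event - reference).days
    return delta + 1 if delta >= 0 else delta


def derive_advs(dm_rows, vs_rows):
    """Derive analysis records with baseline (BASE) and change (CHG)."""
    subjects = {row["USUBJID"]: row for row in dm_rows}
    ordered = sorted(vs_rows, key=lambda r: (r["USUBJID"], r["VSTESTCD"], r["VSDTC"]))

    # Assumed baseline rule: last record on or before the reference start date.
    baselines = {}
    for row in ordered:
        ref = date.fromisoformat(subjects[row["USUBJID"]]["RFSTDTC"])
        if date.fromisoformat(row["VSDTC"]) <= ref:
            baselines[(row["USUBJID"], row["VSTESTCD"])] = row

    advs = []
    for row in ordered:
        subject = subjects[row["USUBJID"]]
        ref = date.fromisoformat(subject["RFSTDTC"])
        visit_date = date.fromisoformat(row["VSDTC"])
        base_row = baselines.get((row["USUBJID"], row["VSTESTCD"]))
        base = base_row["VSSTRESN"] if base_row else None
        post_baseline = visit_date > ref
        advs.append({
            "USUBJID": row["USUBJID"],
            "TRT01P": subject["ARM"],        # planned treatment
            "PARAMCD": row["VSTESTCD"],
            "AVAL": row["VSSTRESN"],         # analysis value
            "ADY": study_day(visit_date, ref),
            "ABLFL": "Y" if row is base_row else "",
            "BASE": base,
            # Change from baseline is only populated after baseline.
            "CHG": row["VSSTRESN"] - base if post_baseline and base is not None else None,
        })
    return advs


if __name__ == "__main__":
    for record in derive_advs(dm, vs):
        print(record)
```

Each output record carries everything needed for a change-from-baseline analysis, which is the sense in which ADaM data sets are described as analysis-ready.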
A Biostatistician’s role in study design & planning
Biostatisticians play a critical role in the design, analysis, and interpretation of clinical trials. The role of the biostatistician in a clinical trial is to use their expertise in statistical analysis and research design to help ensure that the trial is conducted in a scientifically rigorous and unbiased way, and to help understand and interpret the results of the trial. Here is a general overview of the tasks that a biostatistician might be involved in during the different stages of a clinical trial:
Clinical trial design: Biostatisticians may be involved in designing the clinical trial, including determining the study objectives, selecting the appropriate study population, and developing the study protocol. They are responsible for determining the sample size and selecting the appropriate statistical methods for analysing the data. Often in order to carry out these tasks, preparatory analysis will be necessary in the form of detailed meta-analysis or systematic review.
Sample size calculation: Biostatisticians are responsible for determining the required sample size for the clinical trial. This is an important step, as the sample size needs to be large enough to give the trial adequate statistical power to detect a clinically meaningful difference between the treatment and control groups, but not so large that the trial becomes unnecessarily expensive or time-consuming. Biostatisticians use statistical algorithms to determine the sample size based on the expected effect size, the desired power and significance level, and the expected variability of the data. These inputs are informed by expert opinion and by simulation using data from previous comparable studies.
Randomisation schedules: Biostatisticians develop the randomisation schedule for the clinical trial, which is a plan for assigning subjects to the treatment and control groups in a random and unbiased way. This helps to ensure that the treatment and control groups are similar on average in their characteristics, which reduces bias and guards against confounding factors that might affect the results of the trial.
Protocol development: Biostatisticians are involved in developing the statistical and methodological sections of the clinical trial protocol, which is a detailed plan that outlines the objectives, methods, and procedures of the study. In addition to outlining key research questions and operational procedures the protocol should include information on the study population, the interventions being tested, the outcome measures, and the data collection and analysis methods.
Data analysis: Biostatisticians are responsible for analysing the data from the clinical trial, including conducting interim analyses and making any necessary adjustments to the protocol. They play a crucial role in interpreting the results of the analysis and communicating the findings to the research team and other stakeholders.
Final analysis and reporting: Biostatisticians are responsible for conducting the final analysis of the data and preparing the final report of the clinical trial. This includes summarising the results, discussing the implications of the findings, and making recommendations for future research.
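Returning to the sample size calculation described above, here is a minimal Python sketch for the simplest case: a two-arm parallel trial comparing means with equal allocation, using the standard normal-approximation formula n = 2σ²(z₁₋α/₂ + z₁₋β)² / δ² per arm. The function name and the example inputs are illustrative assumptions. Real trials often need adjustments for dropout, multiple endpoints or more complex designs.

```python
import math
from statistics import NormalDist


def sample_size_two_means(delta, sd, alpha=0.05, power=0.80):
    """Patients needed per arm to detect a mean difference `delta`.

    delta: smallest clinically meaningful difference between arms
    sd:    assumed common standard deviation of the outcome
    alpha: two-sided significance level
    power: desired probability of detecting `delta` if it exists
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * sd ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2
    return math.ceil(n)  # round up to whole patients


if __name__ == "__main__":
    # Hypothetical inputs: detect a 5-unit difference, SD of 10.
    print(sample_size_two_means(delta=5, sd=10))
    print(sample_size_two_means(delta=5, sd=10, power=0.90))
```

With these hypothetical inputs the formula gives 63 patients per arm at 80% power and 85 at 90% power, which shows how quickly the required sample size grows as the power requirement increases.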
The role of the bioinformatician in biomarker-guided clinical studies
Biomarkers are biological characteristics that can be measured and used to predict the likelihood of a particular outcome, such as the response to a particular treatment. Biomarker-guided clinical trials use biomarkers as a key aspect of the study design and analysis. In biomarker-guided clinical trials where the biomarker is based on genomic sequence data, bioinformaticians may play a particularly important role in managing and analysing the data. Genomic and other omics data is often large and complex, and requires specialised software tools and algorithms to analyse and interpret. Bioinformaticians develop and implement these tools and algorithms, and manage and analyse the data to identify patterns and relationships relevant to the trial. They use their expertise in computational biology to help understand the relationship between multi-omics data and the outcome of the trial, and to identify potential biomarkers that can be used to guide treatment decisions.
Processing sequencing data is a key skill of bioinformaticians that involves several steps, which may vary depending on the specific goals of the analysis and the type of data being processed. Here is a general overview of the steps that a bioinformatician might take to process sequencing data:
- Data pre-processing: Cleaning and formatting the data so that it is ready for analysis. This may include filtering out low-quality data, correcting errors, and standardizing the format of the data.
- Mapping: Aligning the sequenced reads to a reference genome or transcriptome in order to determine their genomic location. This can be done using specialized software tools such as Bowtie or BWA.
- Quality control: Checking the quality of the data and the alignment, and identifying and correcting any problems that may have occurred during the sequencing or mapping process. This may involve identifying and removing duplicate reads, or identifying and correcting errors in the data.
- Data analysis: Using statistical and computational techniques to identify patterns and relationships in the data such as identifying genetic variants, analysing gene expression levels, or identifying pathways or networks that are relevant to the study.
- Data visualization: Creating graphs, plots, and other visualizations to help understand and communicate the results of the analysis.
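As a small illustration of the pre-processing step in the list above, the Python sketch below filters out low-quality reads from FASTQ-formatted data using the mean Phred score of each read. It assumes well-formed four-line records with Phred+33 quality encoding; the function names, the threshold and the two example reads are hypothetical. Production pipelines would normally rely on established tools rather than hand-written parsers.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    iterator = iter(lines)
    for header in iterator:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        sequence = next(iterator).strip()
        next(iterator)  # the '+' separator line
        quality = next(iterator).strip()
        yield header[1:], sequence, quality  # drop the leading '@'


def mean_phred(quality, offset=33):
    """Mean base quality, assuming Phred+33 ASCII encoding."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def filter_reads(records, min_mean_quality=20):
    """Keep only reads whose mean Phred score meets the threshold."""
    return [
        (read_id, sequence, quality)
        for read_id, sequence, quality in records
        if mean_phred(quality) >= min_mean_quality
    ]


# Two invented reads: 'I' encodes Phred 40, '#' encodes Phred 2.
EXAMPLE_FASTQ = [
    "@read1", "ACGTACGT", "+", "IIIIIIII",
    "@read2", "ACGTTGCA", "+", "########",
]

if __name__ == "__main__":
    for read_id, sequence, _ in filter_reads(parse_fastq(EXAMPLE_FASTQ)):
        print(read_id, sequence)
```

Only the high-quality read passes the filter here. The surviving reads would then go on to the mapping step with an aligner such as Bowtie or BWA.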
Once omics data has been analysed, the insights obtained can be used for tailoring therapeutic products to patient populations in a personalised medicine approach.
A changing role of data experts in life sciences R&D and clinical research
Due to the need for better therapies and health solutions, researchers are now defining diseases at more granular levels using multi-omics insights from DNA sequencing data. This allows patients to be differentiated by the biomolecular presentation of their disease, by demographic factors, and by their response to treatment. As more and more of the resulting therapies reach the market, the health care industry will need to catch up in order to provide these new treatment options to patients.
Even after a product receives regulatory approval, payers can opt not to reimburse patients, so financial benefit should be demonstrated in advance where possible. Patient-reported outcomes and other health outcomes are becoming important sources of data to consider in evidence generation. Evidence provided to payers should aim to demonstrate the financial as well as the clinical benefit of the product.
In this context, regulators are becoming aware of the need for innovation in developing new ways of collecting treatment efficacy and other data used to assess novel products for regulatory approval. Observational studies and real-world data sources are being acknowledged as a legitimate and sometimes necessary supplement to clinical trial data in the product approval process. Large-scale digitisation now makes it easier to collect patient-centric data directly from clinical trial participants and users via devices and apps. Making use of this data means establishing clear evidence expectations from regulatory agencies and then collaborating with external stakeholders, data product experts, and service providers to help build new evidence-building approaches.
Expert data governance and quality control are crucial to the success of any new analytical methods to be implemented. Data from different sources, such as IoT sensor data, electronic health records, sequencing data for multi-omics analysis, and other large data sets, has to be combined cautiously and with robust expert standards in place.
Drawing on biostatistics, bioinformatics, data science, CAS, and epidemiology for public health or post-market modelling, a bespoke team of integrated data and analytics specialists is now as important to a product development project as the product itself for gaining competitiveness, and therefore success, in the marketplace. Such a team should apply a combination of established data collection methodologies, e.g. clinical trials and systematic review, and innovative methods such as machine learning models that draw upon a variety of real-world data sources, in order to find a balance between advancing important innovation and mitigating risk.