The Devil’s Advocate: Stata for Clinical Study Design, Data Processing, & Statistical Analysis of Clinical Trials.

Stata is a powerful statistical analysis package that offers some advantages for clinical trial and medtech use cases compared to the more widely used SAS software. Stata provides an intuitive and user-friendly interface that facilitates efficient data management, data processing and statistical analysis. Its agile and concise syntax allows for reproducible and transparent analyses, making insights more readily accessible throughout the research process.

Unlike R, whose language derives from S, both Stata and SAS have been built on C since 1985. All three packages can run full Python within their environment for advanced machine learning capabilities, in addition to those available natively. In Stata’s case this is achieved through its built-in Python integration, while the pystata Python package provides the reverse route of calling Stata from Python. Despite this shared C foundation, there are tangible differences between Stata and SAS syntax. Stata generally needs fewer lines of code than SAS to perform the same function and thus tends to be more concise. Stata also offers more flexibility in how you code, as well as more informative error messages, which make debugging quicker and easier, even for beginners.

When it comes to simulations and more advanced modelling, our experience has been that the Basic Edition of Stata (Stata/BE) is faster and uses less memory than Base SAS to perform the same task. Stata/BE has more than enough inbuilt capabilities for the design and analysis of advanced clinical trials and for sophisticated statistical modelling of all types. There is also the additional benefit of thousands of user-written packages, such as those that interface with the popular WinBUGS software, which can be installed instantly as add-ons at no extra cost. Often these packages are designed to make existing Stata functions even more customisable, for greater flexibility and programming efficiency. Both Stata and SAS represent stability and reliability and have enjoyed widespread industry adoption. SAS has been more widely adopted by big pharma, and Stata more so in public health and economic modelling.

It has been nearly a decade since the Biostatistics Collaboration of Australia (BCA), which determines Biostatistics education nationwide, transitioned from teaching SAS and R as part of its Masters of Biostatistics programs to teaching Stata and R. This transition was initially made in anticipation of an industry-wide shift from SAS to Stata. Whether or not that prediction proves accurate, the case for Stata use in clinical trials remains strong.

Stata is almost certainly a superior option for bootstrapped life science start-ups and SMEs. Stata licensing fees are in the low hundreds of pounds, and a licence can be purchased quickly through the Stata website, while SAS licensing fees span the tens to hundreds of thousands and often involve a drawn-out process just to obtain a precise quote.

Working with a CRO that is willing to use Stata means that you can easily re-run any syntax provided from the study analysis to verify or adapt it later. Of course, open-source software such as R is also available. However, Stata has the advantage of a gentler learning curve, being both user-friendly and sufficiently sophisticated.

Stata for clinical trials

1. Industry Adoption:

Stata has gained significant popularity and widespread adoption in the field of clinical research. It is commonly used by researchers, statisticians, and healthcare professionals for the statistical analysis of clinical data.

2. Regulatory Compliance and CDISC standardisation:

Stata provides features and capabilities that support regulatory compliance requirements in clinical trials. While it may not have the same explicit recognition from CDISC as SAS, Stata does lend itself well to CDISC compliance and offers tools for documentation, data tracking, and audit trails to ensure transparency and reproducibility in analyses.

3. Comprehensive Statistical Procedures:

A key advantage of Stata is its extensive suite of built-in statistical functions and commands suited to clinical trial data analysis. Stata offers methods for handling missing data and performing power calculations, along with a wide range of methods for analysing clinical trial data: survival analysis, generalised linear models, mixed-effects models, causal inference, and Bayesian simulation for adaptive designs. Preparatory tasks for clinical trials such as meta-analysis, sample size calculation and randomisation schedules are arguably easier to achieve in Stata than in SAS. These built-in functionalities empower researchers to conduct various analyses within a single software environment.
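To make the sample size calculation mentioned above concrete, here is a minimal sketch of the arithmetic for a two-arm trial comparing means. It is written in Python, which the packages discussed here can interoperate with, and uses only the standard library. It is not Stata's own implementation: it uses the normal approximation, whereas dedicated power routines typically use the t distribution and may return a slightly larger number. The function name and the example effect size and standard deviation are illustrative assumptions.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided, two-sample comparison of means.

    Normal-approximation formula: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2
    delta: smallest difference in means worth detecting (hypothetical input)
    sd:    common standard deviation assumed for both arms (hypothetical input)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the target power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)                            # round up to whole participants


# Illustrative inputs: detect a 5-unit difference when the SD is 10
print(n_per_arm(delta=5.0, sd=10.0))  # 63 per arm under the normal approximation
```

Raising the target power or shrinking the detectable difference increases the required sample size, which is the trade-off a power calculation is meant to expose before recruitment begins.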

4. Efficient Data Management:

Stata excels in delivering agile data management capabilities, enabling efficient data handling, cleaning, and manipulation. Its intuitive data manipulation commands allow researchers to perform complex transformations, merge datasets, handle missing data, and generate derived variables seamlessly.
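As a rough illustration of the kind of task described above (merging data sets, handling missing values and generating a derived variable), here is a small Python sketch using only the standard library. It is a language-neutral illustration of the logic rather than Stata syntax, and the patient IDs, variable names and values are all hypothetical.

```python
# Hypothetical records keyed by patient ID; names and values are illustrative only.
demographics = {
    "P001": {"arm": "treatment", "height_cm": 170.0},
    "P002": {"arm": "control", "height_cm": 165.0},
    "P003": {"arm": "treatment", "height_cm": 180.0},
}
visits = {
    "P001": {"weight_kg": 72.0},
    "P002": {"weight_kg": None},  # visit happened but weight was not recorded
    # P003 has no visit record at all
}


def merge_and_derive(demo, visit):
    """One-to-one merge on patient ID, keeping every demographics row."""
    merged = {}
    for pid, d in demo.items():
        row = dict(d)                      # copy so the source data is not modified
        v = visit.get(pid)
        row["matched"] = v is not None     # flag rows with no matching visit record
        row["weight_kg"] = v["weight_kg"] if v is not None else None
        w, h = row["weight_kg"], row["height_cm"]
        # Derived variable: BMI, left missing when either input is missing
        row["bmi"] = round(w / (h / 100) ** 2, 1) if w is not None and h is not None else None
        merged[pid] = row
    return merged


analysis = merge_and_derive(demographics, visits)
for pid, row in analysis.items():
    print(pid, row)
```

Keeping an explicit match flag and propagating missing values, rather than silently dropping unmatched rows, makes it easier to audit how many participants contribute to each derived variable.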

Perhaps the greatest technical advantage of Stata over SAS in the context of clinical research is usability, in particular the freedom to keep multiple data sets open, each with its own separate analyses, at the same time. While SAS can keep many data sets in memory for a single project, Stata can keep many data sets in siloed memory for simultaneous use in different windows, so that you can view or work on many different projects at once. This can make workflow easier because no data step is required to identify which data set you are referring to. Instead, the relevant sections of any data set can be merged into the active project as needed. Because of the siloing, which works much like tabs in a browser, the log, data and output of one project do not get mixed up with those of another. This is arguably an advantage for biostatisticians and researchers alike, who often need to compare unrelated data sets or the statistical results of separate studies side by side.

5. Interactive and Reproducible Analysis:

Stata provides an interactive programming environment that allows users to perform data analysis in a step-by-step manner. The built-in “do-file” functionality facilitates reproducibility by storing every command in a script that can be re-run, while log files capture the commands together with their results, ensuring transparency and auditability of the analysis process. The results and log windows for each data set print out the respective syntax item by item. This syntax can be pasted into the do-file or the command line to edit or repeat the command with ease. SAS, on the other hand, tends to separate the results from the syntax used to derive them.

6. Graphics and Visualization:

While not traditionally known for this, Stata actually offers a wide range of powerful and customisable graphical capabilities. Researchers can generate high-quality, publication-standard plots and charts of whatever kind is needed to visualise clinical trial results. Common examples include survival curves, forest plots, spaghetti plots and diagnostic plots. Stata also has built-in options for the assumption and model checking needed in statistical model development.

These visualisations facilitate the exploration of complex data patterns, as well as the presentation and communication of findings. There are many user-created add-ons for data visualisation that rival the customisation possible in R.
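To make concrete what a survival curve displays, here is a minimal Python sketch of the Kaplan-Meier product-limit estimate, which is the quantity such plots usually show. It uses only the standard library, is not taken from Stata, and the follow-up times and event indicators are made-up illustrative data.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    times:  follow-up time for each participant
    events: 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, estimated survival probability) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)   # participants still under observation
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t (events and censorings)
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / at_risk   # product-limit step
            curve.append((t, surv))
        at_risk -= removed                 # censored participants leave the risk set too
    return curve


# Illustrative data: eight participants, two of them censored
follow_up = [2, 3, 3, 5, 8, 8, 9, 12]
observed = [1, 1, 0, 1, 1, 1, 0, 1]
for t, s in kaplan_meier(follow_up, observed):
    print(t, round(s, 3))
```

The curve only steps down at times where an event occurs; censored participants reduce the number at risk without producing a step, which is why censoring is often marked separately on the plot.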

One area of Stata that users may find limiting is that, by default, each new graph replaces the previous one in the Graph window. To compare multiple graphs side by side you need either to give each graph its own name so that it is kept open separately, to combine the graphs into a single figure, or to copy graphs into a document as they are produced.

7. Active User Community and Support:

Like SAS, Stata has a vibrant user community comprising researchers, statisticians, and experts who actively contribute to discussions, share knowledge, and provide support. StataCorp, the company behind Stata, offers comprehensive documentation, online resources, and user forums, and its reputable technical support provides prompt assistance with software-related queries or challenges. Often the resources available for Stata are more direct and more easily searchable than those available for SAS when it comes to solving customisation quandaries. This is bolstered by the myriad user-contributed packages that can be installed instantly.

While SAS and Stata have their respective strengths, Stata’s increasing industry adoption, statistical capabilities, data management features, reproducibility, visualisation add-ons, and support community make it a compelling choice for clinical trial data analysis.

As it stands, SAS remains the most widely used software in big pharma for clinical trial data analysis. Stata, however, offers distinct advantages in terms of user-friendliness, tailored statistical functionalities, advanced graphics, and a supportive user community. Consider adopting Stata to streamline your clinical trial analyses and unlock its vast potential for gaining insights from research outcomes. An in-depth overview of Stata 18 can be found here. A summary of its features for biostatisticians can be found here.

Further reading:

Using Stata for Handling CDISC Compliant Data Sets and Outputs (lexjansen.com)
