Dynamic Systems Modelling and Complex Adaptive Systems (CAS) Techniques in Biomedicine and Public Health

Dynamical systems modelling is a mathematical approach to studying the behaviour of systems that change over time. These systems can be physical, biological, economic, or social in nature, and they are typically characterized by a set of variables that evolve according to certain rules or equations.

CAS (Complex Adaptive Systems) models are a specific type of dynamical systems model that are used to study systems that are complex, adaptive, and composed of many interconnected parts. These systems are often found in natural and social systems, and they are characterized by a high degree of uncertainty, nonlinearity, and emergence.

To build a dynamical systems model, one typically starts by identifying the variables that are relevant to the system being studied and the relationships between them. These relationships are usually represented by a set of equations or rules that describe how the variables change over time. The model is then simulated or analysed to understand the system’s behaviour under different conditions and to make predictions about its future evolution.

CAS models are often used to study systems that exhibit emergent behaviour, where the behaviour of the system as a whole is more than the sum of its parts. These models can help us understand how complex systems self-organize, adapt, and evolve over time, and they have applications in fields such as biology, economics, social science, and computer science.

Whatever the approach, a model is intended to represent the real system, but there are limits to the application of models. A model’s reliability often falls short when its parameter boundaries are applied within a real-life context.

The previous article outlined some basic characteristics of complex adaptive systems (CAS). The CAS approach to modelling real-world phenomena differs from the more conventional predictive modelling paradigm. Complex adaptive systems such as ecosystems, biological systems, or social systems require looking at interacting elements and observing the patterns that arise, creating boundary conditions from these patterns, running experiments or simulations, and responding to the outcomes in an adaptive way.

To further delineate the complex systems domain in practical terms we can use the Cynefin framework developed by David Snowden et al. to contrast the Simple, Complicated, Complex and Chaotic domains. For the purpose of this article the Chaotic domain will be ignored.

Enabling constraints of CAS models

In contrast to the complex domain is the “known” or “simple” domain, represented by ordered systems such as a surgical operating theatre or a clinical trials framework. These ordered systems are rigidly constrained and can be planned and designed in advance based upon prior knowledge. In this context best practice can be applied, because an optimal way of doing things is pre-determined.

The intermediary between the simple and complex domains is the “knowable” or “complicated” domain. An example is the biostatistical analysis of basic clinical data. Within a complicated system there is a right answer that we can discover and design for. In this domain we can apply good practice based on expert advice (not best practice), as a right and wrong way of doing things can be determined with analysis.

The complex domain represents a system that is in flux and not predictable in the linear sense. A complex adaptive system can be operating in a state anywhere from equilibrium to the edge of chaos. In order to understand the system state one should perform experiments that probe relationships between entities. Due to the lack of linearity, multiple simultaneous experimental probes should occur in parallel, not in sequence, with the goal of better understanding processes. Emergent practice is determined in line with observed, evolving patterns. Ideally, interpretation of data should be decentralised and distributed to system users themselves rather than determined by a single expert in a centralised fashion.

As opposed to operating from a pre-supposed framework, the CAS structure should be allowed to emerge from the data under investigation. This avoids the confirmation bias that occurs when data are fitted to a predefined framework regardless of whether this framework best represents the data being modelled. Following on from this, model boundaries should also be allowed to emerge from the data itself.

Determining unarticulated needs from clusters of agent anecdotes or data points is a method of determining where improvement needs to occur in service provision systems. The same method has an analogue in biological systems, should an agent-based model (ABM) be applied in a biomolecular context.

In understanding CAS, the focus should be on the dispositionality of system states rather than on linear causality. Rather than presuming an inherent certainty that “if I do A, B will result”, dispositional states arise as a result of A, which may result in B, but the evolution of which cannot be truly predicted.

“The longer you hold things in a state of transition the more options you’ve got.” Rather than linear iterations based on a defined requirement, any ambiguity should be explored rather than eliminated: the opposite of the standard statistical approach.

CAS modelling should include real-time feedback loops over multiple agents to avoid cognitive bias. In CAS modelling every behaviour or interaction will produce unintended consequences. For this reason, David Snowden suggests, small, fast experiments should be run in parallel, so that any bad unintended consequences can be mitigated and the good ones amplified.

Modes of analysis and modelling:

System dynamics models (SDM)

  • A SDM simulates the movements of entities within the system and can be used to investigate macro behaviour of the system.
  • Changes to system state variables over time are modelled using differential equations.
  • SDMs are multi-dimensional and non-linear, and include feedback mechanisms.

  • Visual representations of the model can be produced using stock and flow diagrams to summarise interdependencies between key system variables.
  • Dynamic hypotheses of the system model can be represented in a causal loop diagram.
  • SDM is appropriate for modelling aggregate flows, trends, sub-system behaviour.
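The stock-and-flow idea above can be sketched as a minimal SDM. The SIR epidemic model below is a standard illustrative example, not drawn from this article; the parameter values are assumed:

```python
import numpy as np
from scipy.integrate import odeint

# Stocks: susceptible (S), infected (I), recovered (R), as proportions.
# The flows between stocks are expressed as differential equations.
def sir(y, t, beta, gamma):
    S, I, R = y
    infection = beta * S * I    # flow from S to I
    recovery = gamma * I        # flow from I to R
    return [-infection, infection - recovery, recovery]

t = np.linspace(0, 100, 1001)
result = odeint(sir, [0.99, 0.01, 0.0], t, args=(0.4, 0.1))

peak_infected = result[:, 1].max()   # macro-level behaviour of interest
```

Simulating the equations and inspecting aggregate trajectories such as `peak_infected` is exactly the kind of macro-level, trend-oriented analysis SDM is suited to.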

Agent based models (ABM)

  • ABMs can be used to investigate micro behaviour of the system from more of a bottom-up perspective, through intricate flows of individual-based activity.
  • State changes of individual agents are simulated by ABMs, rather than the broader entities captured by SDMs.
  • Multiple types of agent can operate within the same modelled complex adaptive system.
  • Data within the ABM can be aggregated to infer more macro or top-down system behaviour.

Agents within the ABM can make decisions, engage in behaviour defined by simple rules and attributes, and learn from experience and from feedback from interactions with other agents or the modelled environment. This is as true in models of human systems as it is in molecular-scale systems. In both cases agents can take part in communication on a one-to-one, one-to-many and one-to-location basis. Previously popular techniques such as discrete event simulation (DES) were implemented to model passive agents at a finite time, rather than the active “decision makers” operating over dynamic periods that are a feature of ABMs.
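A minimal sketch of this agent-level behaviour, assuming a simple contagion rule and hypothetical parameter values:

```python
import random

random.seed(42)

class Agent:
    """An individual whose state changes according to a simple rule."""
    def __init__(self):
        self.infected = False

    def contact(self, other, p_transmit=0.3):
        # Behavioural rule: infection may pass from an infected contact.
        if other.infected and random.random() < p_transmit:
            self.infected = True

agents = [Agent() for _ in range(500)]
agents[0].infected = True   # seed the process with one infected agent

for _ in range(5000):                 # repeated random pairwise contacts
    a, b = random.sample(agents, 2)
    a.contact(b)
    b.contact(a)

# Aggregating micro-level agent states recovers macro-level behaviour.
prevalence = sum(a.infected for a in agents) / len(agents)
```

The final aggregation step illustrates the point above: macro behaviour (prevalence) is inferred from the simulated states of individual agents rather than modelled directly.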

Hybrid Models

  • ABM and SDM are complementary techniques for simulating micro- and macro-level behaviour of complex adaptive systems, and therefore for engaging in exploratory-style analysis of such systems.
  • Hybrid models emulate individual agent variability as well as variability in the behaviour of the aggregate entities they compose.
  • Hybrid models simulate macro- and micro-level system behaviour in many areas of investigation, such as health service provision and biomedical science.

Hybrid models have the ability to combine two or more types of simulation within the same model. These models can combine SDMs and ABMs, or other techniques, to address both top-down and bottom-up micro and macro dynamics in a single model that more closely captures whole-system behaviour. This has the potential to alleviate many of the trade-offs of using one simulation type alone.

As software capability develops we are seeing an increased application of hybrid modelling techniques. Previously widespread techniques such as DES and Markov models, which are one-dimensional, uni-directional and linear, are now proving inadequate for modelling the complex, adaptive and dynamic world we inhabit.
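A hybrid structure of this kind can be sketched by coupling an aggregate stock (SDM-style) with individually deciding agents (ABM-style); all names and values below are hypothetical:

```python
import random

random.seed(1)

# Macro (SDM-like) component: an aggregate resource stock with a
# constant replenishment rate, updated once per time step.
resource = 100.0
REPLENISH = 5.0

# Micro (ABM-like) component: agents individually decide how much
# of the shared resource to consume, based on its current level.
class Consumer:
    def __init__(self):
        self.consumed = 0.0

    def demand(self, available):
        # Rule: consume more eagerly when the resource is plentiful.
        return random.uniform(0, 2.0) if available > 50 else random.uniform(0, 0.5)

agents = [Consumer() for _ in range(20)]
history = []

for _ in range(100):
    resource += REPLENISH                 # aggregate dynamics (top-down)
    for agent in agents:                  # individual dynamics (bottom-up)
        take = min(agent.demand(resource), resource)
        agent.consumed += take
        resource -= take
    history.append(resource)
```

The stock's trajectory (`history`) reflects macro dynamics, while each agent's `consumed` total captures individual variability; the two levels feed back into one another, which is the defining feature of a hybrid model.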

Model Validation Techniques

SDMs and ABMs are not fitted to observed data but instead use both qualitative and quantitative real-world data to inform and develop the model and its parameters as a simulation of real-world phenomena. For this reason model validation of SDMs and ABMs should be even more rigorous than for more traditional models fitted by maximum likelihood or least squares methods. Sensitivity analysis and validation tests such as behavioural validity tests can be used to compare model output against real-world data from organisations or experiments, relevant to the scale of the model being validated.

Validation should also examine the structure of the model, for example:

  • Checking how the model behaves when subject to extreme parameter values.
  • Checks of dimensional consistency, boundary adequacy and mass balance.
  • Sensitivity analysis – how sensitive is the model to changes in key parameters?
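These structural checks can be sketched in code. Below, a hypothetical one-stock logistic growth model is subjected to a sensitivity sweep and an extreme-condition test; the model and values are assumed for illustration:

```python
import numpy as np
from scipy.integrate import odeint

# Hypothetical one-stock model: logistic growth towards capacity K = 100,
# with growth rate r as the key parameter under investigation.
def final_stock(r, t_end=10.0, y0=1.0):
    t = np.linspace(0.0, t_end, 200)
    y = odeint(lambda y, t: r * y * (1.0 - y / 100.0), y0, t)
    return float(y[-1, 0])

# Sensitivity analysis: sweep the key parameter and compare outcomes.
for r in (0.25, 0.5, 1.0, 2.0):
    print(f"r = {r}: final stock = {final_stock(r):.1f}")

# Extreme-condition test: with zero growth the stock must stay at y0.
assert abs(final_stock(0.0) - 1.0) < 1e-6
```

The extreme-value check encodes a behaviour the real system must exhibit; if the model violates it, the structure (not just the parameters) is suspect.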

Network Analysis

Data accrual from diverse data sources challenges and limitations

While complex systems theory has its origins in the mathematics of chaos theory, there are many contemporary examples where complex systems theory has been divorced from mathematical and statistical modelling and applied in diverse fields such as business, healthcare and social services provision. Mathematical modelling adds validity to complex systems analysis. Without the empiricism of mathematical modelling, simulation and checking against a variety of real-world data sets, the results of a solely qualitative analysis are difficult to verify.

Latent Variable Modelling And The Chi Squared Exact Fit Statistic


Latent variable models are exploratory statistical models used extensively throughout clinical and experimental research in medicine and the life sciences in general. Psychology and neuroscience are two key sub-disciplines where latent variable models are routinely employed to answer a myriad of research questions, from the impact of personality traits on success metrics in the workplace (1) to measuring inter-correlated activity of neural populations in the human brain based on neuro-imaging data (2). Through latent variable modelling, dispositions, states or processes which must be inferred rather than directly measured can be linked causally to more concrete measurements.
Latent variable models are exploratory or confirmatory in nature, in the sense that they are designed to uncover causal relationships between observable or manifest variables and corresponding latent variables in an inter-correlated data set. They use structural equation modelling (SEM), and more specifically factor analysis techniques, to determine these causal relationships and allow the testing of numerous multivariate hypotheses simultaneously. A key assumption of SEM is that the model is fully and correctly specified. The reason for this is that one small misspecification can affect all parameter estimations in the model, rendering inaccurate approximations which can combine in unpredictable ways (3).

With any postulated statistical model it is imperative to assess and validate the model fit before concluding in favour of the integrity of the model and interpreting results. The accepted way to do this across all structural equation models is the chi squared (χ²) statistic.

A statistically significant χ² statistic is indicative of the following:

  • A systematically misspecified model, with the degree of misspecification a function of the χ² value.
  • That the set of parameters specified in the model do not adequately fit the data, and thus that the parameter estimates of the model are inaccurate. As χ² operates on the same statistical principles as the parameter estimation, it follows that in order to trust the parameter estimates of the model we must also trust the χ², and vice versa.
  • As a consequence, a need to investigate where these misspecifications have occurred and potentially readjust the model to improve its accuracy.

While one or more incorrect hypotheses may have caused the model misspecification, the misspecification could equally have resulted from other causes. It is therefore important to investigate the causes of a significant model fit test. In order to do this properly, the following should be evaluated:

  • Heterogeneity: Does the causal model vary between sub-groups of subjects? Are there any intervening within-subject variables?
  • Independence: Are the observations truly independent? Latent variable models involve two key assumptions: that all manifest variables are independent after controlling for any latent variables, and that an individual’s position on a manifest variable is the result of that individual’s position on the corresponding latent variable (3).
  • Multivariate normality: Is the multivariate normality assumption satisfied?

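The local-independence assumption can be illustrated with a small simulation: manifest variables generated from a single latent factor are correlated, but become nearly independent once the factor is controlled for. The loadings and noise levels below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulate a one-factor latent variable model:
# each manifest variable = loading * latent factor + unique noise.
latent = rng.normal(size=n)
loadings = np.array([0.8, 0.7, 0.6])
manifest = latent[:, None] * loadings + rng.normal(size=(n, 3)) * 0.5

# Manifest variables correlate because they share the latent cause...
r_raw = np.corrcoef(manifest[:, 0], manifest[:, 1])[0, 1]

# ...but are (nearly) independent after controlling for the factor,
# which is the local-independence assumption described above.
resid = manifest - latent[:, None] * loadings
r_resid = np.corrcoef(resid[:, 0], resid[:, 1])[0, 1]
```

When the assumption fails, residual correlations of this kind remain, and the χ² statistic will (correctly) flag the model as misspecified.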
The study:

A 2015 meta-analysis of 75 latent variable studies drawn from 11 psychology journals has highlighted a tendency in clinical researchers to ignore the χ² exact fit statistic when reporting and interpreting the results of the statistical analysis of latent variable models (4).
97% of papers reported at least one appropriate model, despite the fact that 80% of these did not pass the criteria for model fit and the χ² exact fit statistic was ignored. Only 2% of studies concluded that the model did not fit at all, and one of these interpreted the model anyway (4).
Reasons given for ignoring the model fit statistic included: over-sensitivity to sample size, penalising of models when the number of variables is high, and a general objection to the logic of the exact fit hypothesis. Overall there was a broad consensus of preference for approximate fit indices (AFI).
AFI were instead applied in these papers to justify the models, which typically leads to questionable conclusions. In all, just 41% of studies reported χ² model fit results. Of the studies that failed to report a p value for the reported χ² value, 40% did report degrees of freedom. When these degrees of freedom were used to cross-check the unreported p values, all of them were in fact significant.
The model fit function was usually generated through maximum likelihood methods; however, 43% of studies failed to report which fit function was used.
There was a further tendency to accept the approximate fit hypothesis when in fact there was little or no evidence of approximate fit. This lack of thorough model examination renders the empirical evidence of questionable validity. 30% of studies applied custom, more lax cut-off criteria for the approximate fit statistics than is conventionally acceptable, while 53% failed to report on cut-off criteria at all.
Assumption testing for univariate normality was assessed in only 24% of studies (4).
Further explanation of χ² and model fit:

The larger the data set, the more that increasingly trivial discrepancies are detected as a source of model misspecification. This does not mean that trivial discrepancies become more important to the model fit calculation; it means that the level of certainty with which these discrepancies can be considered important has increased. In other words, the statistical power has increased. Model misspecification can be the result of both theoretically relevant and irrelevant or peripheral causal factors, which both need to be equally addressed. A significant model fit statistic indicating model misspecification is not trivial just because the causes of the misspecification are trivial. It is instead the case that trivial causes are having a significant effect, and thus there is a significant need for them to be addressed. The χ² model fit test is the most sensitive way to detect misspecification in latent variable models and should be adhered to above other methods, even when sample size is high. In the structural equation modelling context of multiple hypotheses, a rejection of model fit does not necessitate the rejection of each of the model’s hypotheses (4).
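The relationship between sample size and the χ² test can be sketched numerically. Assuming a fixed, trivially small misspecification (the fit-function value and degrees of freedom below are hypothetical), the same model is rejected ever more decisively as N grows:

```python
from scipy.stats import chi2

# In SEM the test statistic is T = (N - 1) * F_min, where F_min is the
# minimised value of the fit function. For a fixed (even trivial) degree
# of misspecification, T grows linearly with sample size N, so the same
# model is rejected with ever greater certainty as N increases.
F_MIN = 0.01   # hypothetical, small degree of misspecification
DF = 10        # hypothetical model degrees of freedom

for n in (100, 500, 1000, 5000):
    T = (n - 1) * F_MIN
    p = chi2.sf(T, DF)
    print(f"N = {n:5d}  T = {T:6.2f}  p = {p:.4f}")
```

This is increased statistical power, not a flaw of the test: the discrepancy is the same at every N; only the certainty with which it is detected changes.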
Problems with AFI:

The AFI do provide a conceptually heterogeneous set of fit indices for each hypothesis; however, none of these indices is accompanied by a critical value or significance level, and all except one arise from unknown distributions. The fit indices are a function of χ², but unlike the χ² fit statistic they have no verified statistical basis, nor do they present a statistically rigorous test of model fit. Despite this, satisfactory AFI values across hypotheses are being used to argue against the validity of a significant χ² test.
Monte Carlo simulations of AFI concluded that it is not possible to determine universal cut-off criteria for any of the forms of model tested. Using AFI, the probability of correctly rejecting a misspecified model decreased with increasing sample size: the inverse of the χ² statistic. Another problem with AFI compared to χ² is that the more severe the model misspecification or correlated errors, the more unpredictable the AFI become. Again, this is the inverse of what happens with the χ² statistic (4).
The takeaway:

Based on the meta-analysis the following best practice principles are recommended in addition to adequate attention to the statistical assumptions of heterogeneity, independence and multivariate normality outlined above:

  1. Pay attention to distributional assumptions.
  2. Have a theoretical justification for your model.
  3. Avoid post hoc model modifications such as dropping indicators, allowing cross-loadings and correlated error terms.
  4. Avoid confirmation bias.
  5. Use an adequate estimation method.
  6. Recognise the existence of equivalence models.
  7. Justify causal inferences.
  8. Use clear reporting that is not selective.


Michael Eid, Tanja Kutscher, Stability of Happiness (2014), Chapter 13: Statistical Models for Analyzing Stability and Change in Happiness

(1) Latent Variables in Psychology and the Social Sciences

(2) Structural equation modelling and its application to network analysis in functional brain imaging

(3) Chapter 7: Assumptions in Structural Equation modelling

(4) A cautionary note on testing latent variable models

Do I need a Biostatistician?


“…. half of current published peer-reviewed clinical research papers … contain at least one statistical error… When just surgical related papers were analysed, 78% were found to contain statistical errors.”

Peer-reviewed published research is the go-to source for clinicians and researchers to advance their knowledge on the topic at hand. It is also currently the most reliable way available to do this. The rate of change in standard care and the exponential development and implementation of innovative treatments and styles of patient involvement make keeping up with the latest research paramount. (1)

Unfortunately, almost half of current published peer-reviewed clinical research papers have been shown to contain at least one statistical error, likely resulting in incorrect research conclusions being drawn from the results. When just surgical related papers were analysed, 78% were found to contain statistical errors due to incorrect application of statistical methods. (1)

Compared to 20 years ago, all forms of medical research require the application of increasingly complex methodology, acquire increasingly varied forms of data, and require increasingly sophisticated approaches to statistical analysis. Consequently, the meta-analyses required to synthesise these clinical studies are increasingly advanced. Analytical techniques that would previously have sufficed, and are still widely taught, are no longer sufficient to address these changes. (1)

The number of peer-reviewed clinical research publications has increased over the past 12 years. In parallel, the statistical analyses contained in these papers are increasingly complex, as is the sophistication with which they are applied. For example, t tests and descriptive statistics were the go-to statistical methodology for many highly regarded articles published in the 1970s and ’80s. To rely on those techniques today would be insufficient, both scientifically and, in all likelihood, in meeting current peer-review standards. (1)

Despite this, some concerning research has noted that these basic parametric techniques are actually currently still being misunderstood and misapplied reasonably frequently in contemporary research. They are also being increasingly relied upon (in line with the increase in research output) when in fact more sophisticated and modern analytic techniques would be better equipped and more robust in answering given research questions. (1)

Another contributing factor to statistical errors is, of course, ethical in nature. A recent online survey of consulting biostatisticians in America revealed that inappropriate requests to change or delete data to support a hypothesis were common, as was the desire to mould the interpretation of statistical results to fit in with expectations and established hypotheses, rather than interpreting results impartially. Ignoring violations of statistical assumptions that would deem the chosen statistical test inappropriate, and not reporting missing data that would bias results, were other unethical requests reported. (2)

The use of incorrect statistical methodology and tests leads to incorrect conclusions being widely published in peer reviewed journals. Due to the reliance of clinical practitioners and researchers on these conclusions, to inform clinical practice and research directions respectively, the end result is a stunting of knowledge and a proliferation of unhelpful practices which can harm patients. (1)

Often these errors are the result of clinicians performing statistical analyses themselves without first consulting a biostatistician to design the study, assess the data and perform any analyses in an appropriately nuanced manner. Another problem can arise when researchers rely on the statistical techniques of a previously published peer-reviewed paper on the same topic. It is often not immediately apparent whether a statistician was consulted on this established paper, so it is not necessarily certain that the established paper took the best approach to begin with. This typically does not stop it becoming a benchmark for future comparable studies or deliberate replications. Further, it can very often be the case that the statistical methods used have since been improved upon, and more advanced or more robust methods are now available. It can also be the case that small differences in study design or collected data between the established study and the present study mean that the techniques used in the established study are not the optimal techniques for the statistical needs of the present study, even if the research question is the same or very similar.

Another common scenario which can lead to the implementation of non-ideal statistical practices is under-budgeting for biostatisticians in research grant applications. Often biostatisticians are on multiple grants, each with a fairly low amount of funding allocated to the statistical component due to tight or under-budgeting. This limits the statistician’s ability to focus substantially on a specific area and make a more meaningful contribution in that domain. A lack of focus prevents them from becoming an expert in a particular niche and engaging in innovation. This in turn can limit the quality of the science as well as the career development of the statistician.

In order to reform and improve the state and quality of clinical and other research today, institutions and individuals must assign more value to the role of statisticians in all stages of the research process. Two ways to do this are increased budgeting for and in turn increased collaboration with statistical professionals.


(1) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6106004/

(2) https://annals.org/aim/article-abstract/2706170/researcher-requests-inappropriate-analysis-reporting-u-s-survey-consulting-biostatisticians

Transforming Skewed Data: How to choose the right transformation for your distribution

Innumerable statistical tests exist for application in hypothesis testing based on the shape and nature of the pertinent variable’s distribution. If, however, the intention is to perform a parametric test – such as ANOVA, Pearson’s correlation or some types of regression – the results of such a test will be more valid if the distribution of the dependent variable(s) approximates a Gaussian (normal) distribution and the assumption of homoscedasticity is met. In reality data often fail to conform to this standard, particularly in cases where the sample size is not very large. As such, data transformation can serve as a useful tool in readying data for these types of analysis by improving normality, homogeneity of variance or both.

For the purposes of transforming skewed data, the degree of skewness of a skewed distribution can be classified as moderate, high or extreme. Skewed data will also tend to be either positively (right) skewed, with a longer tail to the right, or negatively (left) skewed, with a longer tail to the left. Depending upon the degree and direction of skewness, a different approach to transformation is often required. As a short-cut, uni-modal distributions can be roughly classified into the following transformation categories:

This article explores the transformation of a positively skewed distribution with a high degree of skewness. We will see how four of the most common transformations for skewness – square root, natural log, log to base 10, and inverse transformation – have differing degrees of impact on the distribution at hand. It should be noted that the inverse transformation is also known as the reciprocal transformation. In addition to the transformation methods offered in the table above, the Box-Cox transformation is an option for positively skewed data that is >0. Further, the Yeo-Johnson transformation is an extension of the Box-Cox transformation which does not require the original data values to be positive or >0.
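A sketch of how these transformations compare numerically, using simulated log-normal data in place of the article's sales figures (all values assumed):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)
sales = rng.lognormal(mean=1.0, sigma=0.9, size=2000)   # positively skewed

transforms = {
    "raw":         sales,
    "square root": np.sqrt(sales),
    "natural log": np.log(sales),
    "log base 10": np.log10(sales),
    "inverse":     1.0 / sales,
}

# Compare sample skewness before and after each transformation.
for name, values in transforms.items():
    print(f"{name:12s} skewness = {skew(values):+.2f}")
```

For log-normal data the log transforms remove the skew almost entirely, the square root reduces it only partially, and the inverse leaves a skewed distribution, mirroring the walkthrough that follows.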
The following example takes medical device sales in thousands for a sample of 2000 diverse companies. The histogram below indicates that the original data could be classified as “high(er)” positively skewed.
The skew is in fact quite pronounced – the maximum value on the x axis extends beyond 250 (the frequency of sales volumes beyond 60 is so sparse as to make the extent of the right tail imperceptible) – it is however the highly leptokurtic distribution that lends this variable to be better classified as high rather than extreme. It is in fact log-normal – convenient for the present demonstration. From inspection it appears that the log transformation will be the best fit in terms of normalising the distribution.

Starting with a more conservative option, the square root transformation, a major improvement in the distribution is already achieved. The extreme observations contained in the right tail are now more visible. The right tail has been pulled in considerably and a left tail has been introduced. The kurtosis of the distribution has reduced by more than two thirds.

A natural log transformation proves to be an incremental improvement, yielding the following results:
This is quite a good outcome – the right tail has been reduced considerably while the left tail has extended along the number line to create symmetry. The distribution now roughly approximates a normal distribution. An outlier has emerged at around -4.25, while extreme values of the right tail have been eliminated. The kurtosis has again reduced considerably.

Taking things a step further and applying a log to base 10 transformation yields the following:
In this case the right tail has been pulled in even further and the left tail extended less than in the previous example. Symmetry has improved and the extreme value in the left tail has been brought closer in, to around -2. The log to base 10 transformation has provided an ideal result – successfully transforming the log-normally distributed sales data to normal.

In order to illustrate what happens when a transformation that is too extreme for the data is chosen, an inverse transformation has been applied to the original sales data below.
Here we can see that the right tail of the distribution has been brought in quite considerably, to the extent of increasing the kurtosis. Extreme values have been pulled in slightly but still extend sparsely out towards 100. The results of this transformation are far from desirable overall.

Something to note is that in this case the log transformation has caused data that were previously greater than zero to be located on both sides of the number line. Depending upon the context, data containing zero may become problematic when interpreting or calculating the confidence intervals of un-back-transformed data. As log(1)=0, any data containing values <=1 can be made >0 by adding a constant to the original data so that the minimum raw value becomes >1. Reporting un-back-transformed data can be fraught at the best of times, so back-transformation of transformed data is recommended. Further information on back-transformation can be found here.

Adding a constant to data is not without its impact on the transformation. As the below example illustrates, the effectiveness of the log transformation on the above sales data is diminished in this case by the addition of a constant to the original data.

Depending on the subsequent intentions for analysis, this may be the preferred outcome for your data – it is certainly an adequate improvement, and has rendered the data approximately normal for most parametric testing purposes.

Taking the transformation a step further and applying the inverse transformation to the sales + constant data again leads to a less optimal result for this particular set of data – indicating that the skewness of the original data is not quite extreme enough to benefit from the inverse transformation.

It is interesting to note that the peak of the distribution has been reduced, whereas an increase in leptokurtosis occurred for the inverse transformation of the raw distribution. This serves to illustrate how a small alteration in the data can completely change the outcome of a data transformation, without necessarily changing the shape of the original distribution.
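The effect of adding a constant before a log transformation can also be reproduced numerically; the simulated data and the constant below are assumed for illustration:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
sales = rng.lognormal(mean=1.0, sigma=0.9, size=2000)   # log-normal data

# The log transform normalises log-normal data almost perfectly...
s_log = skew(np.log(sales))

# ...but adding a constant first (as one might to clear zeros) shifts the
# variable away from log-normality, so the same transform is less effective.
CONSTANT = 10.0   # hypothetical constant
s_log_shifted = skew(np.log(sales + CONSTANT))

print(f"skew of log(x)     = {s_log:+.2f}")
print(f"skew of log(x + c) = {s_log_shifted:+.2f}")
```

The residual positive skew after the shifted transform is the numerical counterpart of the diminished effectiveness described above.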

There are many varieties of distribution, with the below diagram depicting only the most frequently observed. If common data transformations have not adequately ameliorated your skewness, it may be more reasonable to select a non-parametric hypothesis test based on an alternative distribution.

Image credit: cloudera.com

Accessing Data Files when Using SAS via Citrix Receiver

A SAS licence can be prohibitively expensive for many use cases. Installing the software can also take up a surprising amount of hard disk space and memory. For this reason many individuals with light or temporary usage needs choose to access a version of SAS which is licenced to their institution and therefore shared across many users. Are you trying unsuccessfully to access SAS remotely via your institution using Citrix Receiver? This step-by-step guide might help.

SAS syntax can differ based on whether a remote versus local server is used. An example of a local server is the computer you are physically using. When you have SAS installed on the PC you are using, you are accessing it locally. A remote server, on the other hand, allows you to access SAS without having SAS installed on your PC.

Client software such as Citrix Receiver allows you to access SAS, and other software, from a remote server. Citrix Receiver is often used by university students and new and/or light users. SAS requires different syntax in order to enable the remote server to access data files on a local computer.
For the purpose of this example we are assuming that the data file we wish to access is located, locally, on a drive of the computer we are using.

It can be difficult to find the syntax for this on Google, where search results deal more with accessing remote data (libraries) using local SAS than the other way around. This can be a source of frustration for new users and SAS Technical Support are not always able to advise on the specifics of using SAS via Citrix Receiver.

The “INFILE” statement and the “PROC IMPORT” procedure are two popular options for reading a data file into SAS. INFILE offers greater flexibility and control, by way of manual specification of data characteristics. PROC IMPORT offers greater automation, in some cases at the risk of formatting error. The INFILE statement must always be used within a DATA step, whereas PROC IMPORT acts as a stand-alone procedure. The document below shows syntax for the INFILE statement and the PROC IMPORT procedure for local SAS compared to access via Citrix Receiver.


If you cannot see the document, please make sure that you are viewing the website in desktop mode.

In SAS University Edition, data file inputting difficulties can occur for a different reason. In order for the LIBNAME statement to run without error, a shared folder must first be defined. If you are using SAS University Edition and experiencing an error when inputting data, the following videos may be helpful:
How to Set LIBNAME File Path (SAS University Edition) 
Accessing Data Files Via Citrix Receiver: for SAS University Edition

Troubleshooting check-list:

  • Was the “libref” appropriately assigned?
  • Was the file location referred to appropriately based on the user context?
  • ​Was the correct data file extension used?

While impractical for larger data sets, if all else fails one can copy and paste the data from a data file into SAS using the DATALINES statement.