Transforming Skewed Data

Innumerable statistical tests exist for hypothesis testing, and the appropriate choice depends on the shape and nature of the relevant variable’s distribution. If, however, the intention is to perform a parametric test – such as ANOVA, Pearson’s correlation or some types of regression – the results will be more valid if the distribution of the dependent variable(s) approximates a Gaussian (normal) distribution and the assumption of homoscedasticity is met. In reality, data often fail to conform to this standard, particularly where the sample size is not very large. Data transformation can therefore serve as a useful tool in readying data for these types of analysis by improving normality, homogeneity of variance or both.

For the purposes of transformation, the degree of skewness of a distribution can be classified as moderate, high or extreme. Depending upon the degree of skewness and whether its direction is positive or negative, a different approach to transformation is often required. As a short-cut, uni-modal distributions can be roughly classified into the following transformation categories:


This article explores the transformation of a positively skewed distribution with a high degree of skewness. The following example takes medical device sales, in thousands, for a sample of 2000 diverse companies. The histogram below indicates that the original data could be classified as having “high(er)” positive skew.
The skew is in fact quite pronounced – the maximum value on the x axis extends beyond 250 (sales volumes beyond 60 are so infrequent as to make the extent of the right tail imperceptible) – it is, however, the highly leptokurtic shape of the distribution that lends this variable to being classified as high rather than extreme. It is in fact log-normal, which is convenient for the present demonstration. From inspection it appears that the log transformation will be the best fit in terms of normalising the distribution.
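The shape described above can be put into numbers rather than judged by eye. The following is a minimal sketch using only the Python standard library. The original sales figures are not reproduced here, so it simulates 2000 log-normal values; the parameters (`mu=1.5`, `sigma=1.2` on the log scale), the seed and the helper name `standardized_moment` are all assumptions made for illustration.

```python
import random
import statistics


def standardized_moment(values, order):
    """Return the population standardized central moment of the given order."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** order for v in values) / len(values)


# Synthetic stand-in for the sales data: mu, sigma and the seed are assumptions.
rng = random.Random(42)
sales = [rng.lognormvariate(1.5, 1.2) for _ in range(2000)]

skewness = standardized_moment(sales, 3)               # 0 for a symmetric distribution
excess_kurtosis = standardized_moment(sales, 4) - 3.0  # 0 for a normal distribution

print(f"maximum:         {max(sales):.1f}")
print(f"skewness:        {skewness:.2f}")
print(f"excess kurtosis: {excess_kurtosis:.2f}")
```

A clearly positive skewness together with a large positive excess kurtosis corresponds to the long right tail and sharp peak seen in the histogram.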

Starting with a more conservative option, the square root transformation, a major improvement in the distribution is already achieved. The extreme observations contained in the right tail are now more visible. The right tail has been pulled in considerably and a left tail has been introduced. The kurtosis of the distribution has been reduced by more than two thirds.
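As a rough illustration of the square root step, the sketch below compares excess kurtosis before and after the transformation. It runs on simulated log-normal data with assumed parameters and an arbitrary seed, not on the article's sales figures, so the size of the reduction will differ from the one reported above; `excess_kurtosis` is a hypothetical helper.

```python
import math
import random
import statistics


def excess_kurtosis(values):
    """Population excess kurtosis (0 for a normal distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 4 for v in values) / len(values) - 3.0


# Synthetic stand-in for the sales data: mu, sigma and the seed are assumptions.
rng = random.Random(42)
sales = [rng.lognormvariate(1.5, 1.2) for _ in range(2000)]

# The square root compresses large values more than small ones.
sqrt_sales = [math.sqrt(v) for v in sales]

raw_kurtosis = excess_kurtosis(sales)
sqrt_kurtosis = excess_kurtosis(sqrt_sales)

print(f"excess kurtosis, raw:         {raw_kurtosis:.2f}")
print(f"excess kurtosis, square root: {sqrt_kurtosis:.2f}")
```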

A natural log transformation proves to be an incremental improvement, yielding the following results:
This is quite a good outcome – the right tail has been reduced considerably while the left tail has extended along the number line to create symmetry. The distribution now roughly approximates a normal distribution. An outlier has emerged at around -4.25, while the extreme values of the right tail have been eliminated. The kurtosis has again been reduced considerably.
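The natural log step can be sketched in the same way. Because the data below are simulated as log-normal (assumed parameters, arbitrary seed, hypothetical helper `standardized_moment`), taking the log should bring skewness and excess kurtosis close to zero; real data will rarely behave this cleanly.

```python
import math
import random
import statistics


def standardized_moment(values, order):
    """Return the population standardized central moment of the given order."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** order for v in values) / len(values)


# Synthetic stand-in for the sales data: mu, sigma and the seed are assumptions.
rng = random.Random(42)
sales = [rng.lognormvariate(1.5, 1.2) for _ in range(2000)]

# The natural log requires strictly positive input.
log_sales = [math.log(v) for v in sales]

log_skewness = standardized_moment(log_sales, 3)
log_excess_kurtosis = standardized_moment(log_sales, 4) - 3.0

print(f"skewness after ln:        {log_skewness:.3f}")
print(f"excess kurtosis after ln: {log_excess_kurtosis:.3f}")
print(f"smallest ln value:        {min(log_sales):.2f}")  # below 0 when a raw value is below 1
```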

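Taking things a step further and applying a log to base 10 transformation yields the following:
In this case the right tail appears to have been pulled in even further and the left tail extended less than in the previous example, and the extreme value in the left tail has been brought closer in, to around -2. Note that log10(x) = ln(x) / ln(10), so the base 10 transformation is a rescaling of the natural log: skewness and kurtosis are identical for the two, and the visual differences reflect the axis scale and histogram binning rather than a change in shape. Either way, the log transformation has provided an ideal result – successfully transforming the log-normally distributed sales data to normal.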
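Since log10(x) = ln(x) / ln(10), the two log transformations differ only by a constant scale factor, and standardized measures of shape are unaffected by rescaling. The sketch below checks this on simulated log-normal data (assumed parameters and seed; `skewness` is a hypothetical helper).

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness (standardized third central moment)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Synthetic stand-in for the sales data: mu, sigma and the seed are assumptions.
rng = random.Random(42)
sales = [rng.lognormvariate(1.5, 1.2) for _ in range(2000)]

ln_sales = [math.log(v) for v in sales]
log10_sales = [math.log10(v) for v in sales]

ln_skew = skewness(ln_sales)
log10_skew = skewness(log10_sales)

print(f"skewness, ln:    {ln_skew:.6f}")
print(f"skewness, log10: {log10_skew:.6f}")
# Every value, including the minimum, is simply divided by ln(10), about 2.303.
print(f"minimum, ln:     {min(ln_sales):.3f}")
print(f"minimum, log10:  {min(log10_sales):.3f}")
```

This is why an extreme value near -4.25 on the natural log scale sits a little below -1.8 on the base 10 scale.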

To illustrate what happens when a transformation that is too extreme for the data is chosen, an inverse transformation has been applied to the original sales data below.
Here we can see that the right tail of the distribution has been brought in considerably, to the extent of increasing the kurtosis. Extreme values have been pulled in slightly but still extend sparsely out towards 100. The results of this transformation are far from desirable overall.
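One way to see why the inverse overshoots here: the reciprocal of a log-normal variable is itself log-normal with the same spread on the log scale, so 1/x turns the smallest raw values into a new long right tail. A minimal sketch on simulated data follows (assumed parameters and seed; `skewness` is a hypothetical helper).

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness (standardized third central moment)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Synthetic stand-in for the sales data: mu, sigma and the seed are assumptions.
rng = random.Random(42)
sales = [rng.lognormvariate(1.5, 1.2) for _ in range(2000)]

# The reciprocal reverses the rank order: the smallest sales become the largest values.
inverse_sales = [1.0 / v for v in sales]
log_sales = [math.log(v) for v in sales]

inverse_skew = skewness(inverse_sales)
log_skew = skewness(log_sales)

print(f"skewness, inverse: {inverse_skew:.2f}")
print(f"skewness, ln:      {log_skew:.2f}")
print(f"largest inverse:   {max(inverse_sales):.1f}")
```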

Something to note is that in this case the log transformation has caused data that were previously all greater than zero to now be located on both sides of the number line. Depending upon the context, transformed data containing zero or negative values may become problematic when interpreting or calculating the confidence intervals of un-back-transformed data. As log(1) = 0, raw values <= 1 produce transformed values <= 0; all transformed values can be made > 0 by adding a constant to the original data so that the minimum raw value becomes > 1. Reporting un-back-transformed data can be fraught at the best of times, so back-transformation of transformed data is recommended. Further information on back-transformation can be found here.
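A minimal sketch of the shift and back-transformation, under several assumptions: the data are simulated (log-normal, arbitrary seed), the constant is 1 (sufficient because every simulated value is positive), and the interval uses a simple normal approximation with 1.96 standard errors. None of these choices come from the article.

```python
import math
import random
import statistics

# Synthetic stand-in for the sales data: mu, sigma and the seed are assumptions.
rng = random.Random(42)
sales = [rng.lognormvariate(1.5, 1.2) for _ in range(2000)]

# Adding 1 makes every shifted value exceed 1, so every log exceeds 0.
constant = 1.0
shifted_logs = [math.log(v + constant) for v in sales]

# Mean and normal-approximation 95% interval on the transformed scale.
mean_log = statistics.fmean(shifted_logs)
se_log = statistics.stdev(shifted_logs) / math.sqrt(len(shifted_logs))
low_log = mean_log - 1.96 * se_log
high_log = mean_log + 1.96 * se_log


def back_transform(value):
    """Undo the log, then undo the shift."""
    return math.exp(value) - constant


centre = back_transform(mean_log)
low = back_transform(low_log)
high = back_transform(high_log)

print(f"back-transformed centre: {centre:.2f}")
print(f"back-transformed 95% CI: ({low:.2f}, {high:.2f})")
print(f"arithmetic mean of raw:  {statistics.fmean(sales):.2f}")
```

The back-transformed centre is a geometric-mean-type summary, so it will sit at or below the arithmetic mean of the raw data, and the back-transformed interval is no longer symmetric around it.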

Adding a constant to data is not without its impact on the transformation. As the example below illustrates, the effectiveness of the log transformation on the above sales data is diminished in this case by the addition of a constant to the original data.

Depending on the subsequent intentions for analysis, this may be the preferred outcome for your data – it is certainly an adequate improvement and has rendered the data approximately normal for most parametric testing purposes.
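The diminishing effect of the constant can be sketched by comparing skewness for the raw data, ln(x) and ln(x + c). The data are simulated, and the constant of 1, the parameters and the seed are assumptions; `skewness` is a hypothetical helper.

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness (standardized third central moment)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Synthetic stand-in for the sales data: mu, sigma and the seed are assumptions.
rng = random.Random(42)
sales = [rng.lognormvariate(1.5, 1.2) for _ in range(2000)]

constant = 1.0
log_raw = [math.log(v) for v in sales]
log_shifted = [math.log(v + constant) for v in sales]

raw_skew = skewness(sales)
log_raw_skew = skewness(log_raw)
log_shifted_skew = skewness(log_shifted)

print(f"skewness, raw:       {raw_skew:.2f}")
print(f"skewness, ln(x):     {log_raw_skew:.2f}")
print(f"skewness, ln(x + 1): {log_shifted_skew:.2f}")
```

The shift compresses the left tail of the logged data, so some positive skew remains: the result lies between the raw data and the unshifted log.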

Taking the transformation a step further and applying the inverse transformation to the sales + constant data, again, leads to a less optimal result for this particular set of data – indicating that the skewness of the original data is not quite extreme enough to benefit from the inverse transformation.

It is interesting to note that the peak of the distribution has been reduced, whereas an increase in leptokurtosis occurred for the inverse transformation of the raw distribution. This serves to illustrate how a small alteration in the data can completely change the outcome of a data transformation without necessarily changing the shape of the original distribution.
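This contrast can be sketched by applying the inverse to the data with and without a constant. With a constant of 1 added, every reciprocal is confined to the interval between 0 and 1, which limits how far a tail can stretch. The data, parameters, seed and constant below are assumptions, and `excess_kurtosis` is a hypothetical helper.

```python
import random
import statistics


def excess_kurtosis(values):
    """Population excess kurtosis (0 for a normal distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 4 for v in values) / len(values) - 3.0


# Synthetic stand-in for the sales data: mu, sigma and the seed are assumptions.
rng = random.Random(42)
sales = [rng.lognormvariate(1.5, 1.2) for _ in range(2000)]

constant = 1.0
inverse_raw = [1.0 / v for v in sales]
inverse_shifted = [1.0 / (v + constant) for v in sales]

raw_result = excess_kurtosis(inverse_raw)
shifted_result = excess_kurtosis(inverse_shifted)

print(f"excess kurtosis, 1/x:       {raw_result:.2f}")
print(f"excess kurtosis, 1/(x + 1): {shifted_result:.2f}")
```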

There are many varieties of distribution; the diagram below depicts only the most frequently observed. If common data transformations have not adequately ameliorated your skewness, it may be more reasonable to select a non-parametric hypothesis test or one that is based on an alternative distribution.

​Image credit: cloudera.com

Article: Sarah Seppelt Baker
