Degrees of Freedom in Statistics


Minitab is the leading global provider of software and services for quality improvement and statistics education. Quality. Analysis. Results. For more information visit Minitab.com.

This podcast is available at KeithBower.com.

Hello, today I’m going to talk about the idea of statistical degrees of freedom [df]. You come across this a lot in the design and analysis of experiments. If you look at a classical definition of df, it will be stated in terms of the number of independent pieces of information that a given dataset provides.

I like to explain it this way: Let’s say that you’ve got 3 datapoints, just the numbers 1, 2 and 3. You want to compute the sample variance (of course, if you take the positive square root of the sample variance you get the sample standard deviation). To compute that, let’s say that I’ve got a column for these values, just the numbers 1, 2 and 3. In the next column I’m going to compute a difference for each observation: I’m going to subtract the sample mean. So here in the next column I’ve got x minus xbar. Of course the mean of the values 1, 2 and 3 is just 2. In the first row I’ve got 1 minus 2 = -1, and [in the next row] I’ve got 2 minus 2 = 0. Now, in the third row I know that the value has to be 1. I know that for a fact because you can show that the sum of the deviations from the mean is zero. There’s a paper on my website called “why divide by n-1?” and I show you the proof of that, mathematically.

So if I know that there’s this restriction in place, I didn’t need all 3 rows of data. I only needed 2 of them. That’s why, when you compute the sample variance, and hence the sample standard deviation, you divide by n-1. You lose 1 df because of that restriction.

When you look at Analysis of Variance [ANOVA] tables, there are many assumptions that go into them as far as these degrees of freedom are concerned. It can be quite tricky in practice to work out the df.
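Going back to the three-datapoint example, here is a minimal sketch of the same arithmetic in Python. The podcast itself uses no code, so the language choice and the variable names (`deviations`, `implied_last`, and so on) are assumptions made purely for illustration.

```python
from statistics import mean, variance

# The three datapoints from the example.
data = [1, 2, 3]

# Sample mean (xbar) and the column of deviations, x minus xbar.
xbar = mean(data)
deviations = [x - xbar for x in data]

# The deviations from the mean always sum to zero, so the last one
# is fixed by the others: it must equal minus the sum of the rest.
implied_last = -sum(deviations[:-1])

# Only n - 1 deviations are free to vary, so divide by n - 1.
n = len(data)
df = n - 1
sample_var = sum(d ** 2 for d in deviations) / df
sample_sd = sample_var ** 0.5

print("deviations:", deviations)
print("last deviation implied by the others:", implied_last)
print("sample variance:", sample_var, "on", df, "df")
print("sample standard deviation:", sample_sd)
```

The standard library's `statistics.variance` also divides by n-1, so it should agree with `sample_var` for this data.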
If you’re treating a factor as a fixed or a random effect, and you’re looking at interaction effects in particular, then these df can be tricky to obtain. It used to be, back in the day, that a homework assignment would be working out these df. Nowadays software packages make it pretty straightforward, provided that the assumptions are correct!

This is an area of concern that’s very important to me, because if you’re not allowed to do many runs in a designed experiment but you’ve got many factors that you want to investigate, you’d have something called a “saturated design”: you wouldn’t have enough degrees of freedom left over to go into the error term for you to look at a signal-to-noise ratio (like when you’re looking at an F-test). In those situations, when I know that I’m going to be limited (my degrees of freedom are going to be small), I will start looking at a Fractional Factorial design, because I know that I wouldn’t otherwise be able to obtain the results to validly look at a signal-to-noise ratio. I wouldn’t have enough df. I refer you to the Podcast that addresses Fractional Factorials.

So I hope this has given you an idea as to what this topic is all about. As I point out, [you’d be advised to] check out the Podcast on Fractional Factorials, and on my website I have a paper called “why divide by n-1?” that goes into more detail as well. I hope this has been useful. Of course, if you have any questions on this or anything else, please feel free to email them to me through my website, KeithBower.com.

For more information on statistical methods for quality improvement, visit KeithBower.com.
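To make the saturated-design point concrete, here is a rough sketch of the df bookkeeping in Python. It rests on assumptions that are not in the podcast: every factor has two levels, the design is a full factorial, and the hypothetical helper `error_df` simply counts 1 df for each main effect and each interaction included in the model.

```python
from math import comb

def error_df(k, replicates=1, max_order=None):
    """Error df left over in a full two-level factorial with k factors.

    Hypothetical helper for illustration only. max_order is the highest
    interaction order fitted (1 = main effects only); by default every
    interaction up to order k is fitted.
    """
    if max_order is None:
        max_order = k
    runs = (2 ** k) * replicates
    total_df = runs - 1
    # Each effect among two-level factors uses 1 df, and there are
    # comb(k, order) effects of a given order.
    model_df = sum(comb(k, order) for order in range(1, max_order + 1))
    return total_df - model_df

# 3 factors, 8 runs, all effects fitted: nothing is left for error.
print(error_df(3))
# The same design run twice leaves df for the error term.
print(error_df(3, replicates=2))
# Fitting main effects only also frees up df for error.
print(error_df(3, max_order=1))
```

With zero error df there is no noise estimate for the denominator of an F-test, which is the situation described above.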
