# Standard Deviation & Degrees of Freedom Explained | Statistics Tutorial #1 | MarinStatsLectures

In this video we’re going to talk about the standard deviation and what it’s actually trying to measure. Often the standard deviation is presented as a formula like this, but that doesn’t really give us much insight into what it’s actually trying to measure. So we’re going to work through this simple example here: we have four observations, and I’ve kept the dataset simple so that we can do all the calculations in our head and focus on the concept. We’ve got these four observations here, and they have a sample mean of 80. Now I’m going to ask you to think up your own formula that helps us capture the following: on average, how far do individual observations deviate from the mean? Again, something that helps us capture, on average, how far an individual is from the mean.

I’d suggest you take a moment, pause this video, think about it, and try to develop a formula for yourself. Chances are that as you thought about this, many of you will have come up with the following formula. First we think: let’s look at how far the first observation is from the mean (we can see that’s 5 below the mean), then how far the second observation is from the mean, how far the third is from the mean, and the fourth, and then we can average these. If you tried working this out, you’d notice the negative numbers and the positive ones are going to cancel each other out. So you quite likely thought: well, let’s take absolute values instead. Let’s take the absolute value of (75 – 80), and all these absolute deviations, and average them.

If we were to write this in notation, we could think of this as being how far X1 is from the mean in absolute value, how far X2 is from the mean, all the way up to how far Xn is from the mean. Generalizing this for n observations, we take their average, or divide by n. Writing it in a bit more notation: the sum, for i going from 1 up to n, of the absolute deviations |Xi – Xbar|, over n. We can see that this formula captures exactly that: here we’re getting the average of the absolute deviations, so let’s write that down here. This is capturing the average absolute deviation. You may notice that the formula here looks pretty similar to the formula for the standard deviation, so we’re going to get to building up the standard deviation in a moment.
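
The average absolute deviation described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the transcript mentions just the first value on the board (75), so the four values below are borrowed from the degrees-of-freedom example later in the video (78, 79, 81, 82), which also have a mean of 80. The function name `average_absolute_deviation` is made up for this sketch.

```python
from statistics import mean

# Assumed data: borrowed from the degrees-of-freedom example later in the
# video, because the transcript does not list all four values on the board.
grades = [78, 79, 81, 82]


def average_absolute_deviation(values):
    """Average of |x_i - x_bar| over all n observations (hypothetical helper)."""
    x_bar = mean(values)
    return sum(abs(x - x_bar) for x in values) / len(values)


x_bar = mean(grades)
raw_deviations = [x - x_bar for x in grades]

print(x_bar)                               # the sample mean
print(sum(raw_deviations))                 # negatives and positives cancel out
print(average_absolute_deviation(grades))  # average absolute deviation
```

Summing the raw deviations shows the cancellation problem directly, which is why the absolute value (or, later, the square) is needed before averaging.
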

The first thing to address is: why don’t we just use this formula? Well, the answer is that we do! This is an alternative measure that exists. We tend to work with the standard deviation more than this average absolute deviation for a few reasons. Other than the standard deviation having nice properties, if you recall making a plot of X versus the absolute value of X, it’s not a smooth function. It takes on this V shape. For those who remember, reaching back into some of your math, non-smooth functions are a bit more difficult to work with. So we need to think of another way to get rid of the effect of having these negative values and positive values cancelling each other out. We can see the answer right here: we can also think about squaring these deviations. So let’s take a look at building up that formula now.

As noted, we can think about using squared deviations. We can look at how far the first observation was from the mean and square that, how far the second is from the mean, squared, how far the next is from the mean, squared, and how far that last observation is from the mean, squared, and we can average these.

Let’s write this out a bit more generally. We can look at how far the first observation is from the mean and square that, how far the second observation is from the mean and square that, all the way up to the final, or nth, observation: how far is that from the mean, and square that. Then we divide by the number of observations that we have. Again, writing it in notation, we’re summing, for i going from 1 up to n, how far Xi is from Xbar, squared, over n. This formula has a name: it gets called the ‘variance’, or the ‘sample variance’. If we look at this, we’re calculating the average of the squared deviations: the average squared deviation.

Now, the problem with this is that the units of the variance are the units of our variable X, squared. So if these here represent class grades as a percentage, this variance is in units of percentage squared. Suppose this came out to be 15.3; what that would tell us is that, on average, an individual’s grade moves about 15.3 percent squared from the mean! That doesn’t really have a meaningful interpretation, so we need to find a way to get our units back to percentages, back to the units of our X variable. One way we can do that is to take the square root. Taking the square root of this is going to get us back to units of X, here percentages. So that’s helped us get rid of the square, and this standard deviation we can think of as being the average deviation.

Mathematically it’s not quite that; it’s actually the square root of the average squared deviation. So rather than call it the average deviation, because it’s not quite that mathematically, we call it the standard deviation. But conceptually it’s fine for you to think of it as being the average deviation, capturing on average how far an individual’s grade moves from the mean.
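
As a rough sketch of the two steps just described (average the squared deviations, then take the square root), here is the calculation in Python. The data values are an assumption: they are the four numbers used in the degrees-of-freedom example later in the video, treated here as grades in percent. At this point in the video the denominator is still n, so the sketch compares against `statistics.pvariance` and `statistics.pstdev`, which also divide by n.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance

# Assumed data (units: percent), borrowed from the later example in the video.
grades = [78, 79, 81, 82]

x_bar = mean(grades)
squared_deviations = [(x - x_bar) ** 2 for x in grades]

# Average squared deviation, dividing by n: units are percent squared.
avg_squared_deviation = sum(squared_deviations) / len(grades)

# Taking the square root brings us back to the original units (percent).
root_avg_squared_deviation = sqrt(avg_squared_deviation)

print(avg_squared_deviation)
print(root_avg_squared_deviation)

# The standard library's "population" versions also divide by n.
print(pvariance(grades), pstdev(grades))
```

The square root of the average squared deviation comes out close to, but not the same as, the average absolute deviation for the same data, which matches the point that it is "not quite" the average deviation mathematically.
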

One other thing you might have noticed when you see this formula is that we actually have to subtract 1 in the denominator, (n-1). So let’s take a moment and talk about why that is, and we’re going to go at it from a more conceptual point of view rather than focusing on the mathematics.

So why do we divide by (n-1) in the calculation of the sample standard deviation? Essentially this has to do with the fact that, having n observations, we can think of it as starting with n pieces of information, or n degrees of freedom, and we lose one degree of freedom every time we use our data to estimate something. First let’s think of a simple example of having just one observation. We have X1 being 76. With just one observation, we use that as our estimate of the mean. Now, if we want to estimate the standard deviation, you’ll notice that we can’t: we’ve used that observation as our estimate of the mean, and we have no data left to try to estimate how far individuals move from the mean. So that’s just a simple, exaggerated case of n=1.
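
The n=1 case can be checked with Python's standard `statistics` module. This is a small illustration, not part of the video: `stdev` divides by (n-1) and `pstdev` divides by n, and the variable name `sample_sd_defined` is invented for the sketch.

```python
from statistics import StatisticsError, pstdev, stdev

single = [76]  # the one observation from the example

# Dividing by n: the only deviation from the mean is zero, so the result is 0.
print(pstdev(single))

# Dividing by (n - 1): the denominator would be 0, so the sample standard
# deviation is undefined and the standard library refuses to compute it.
try:
    stdev(single)
    sample_sd_defined = True
except StatisticsError:
    sample_sd_defined = False

print(sample_sd_defined)
```

A result of 0 from the divide-by-n version would suggest there is no spread at all, while the (n-1) version reflects that one observation carries no information about spread once it has been used to estimate the mean.
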

Let’s think about the context of our example, having four observations. We’ve got X1, X2, X3, X4, and remember we have to use these to estimate the mean: this is dealing with a sample of data, and we don’t know the population mean. So let’s think of having these four observations, and again we’ve used these to estimate the mean of 80. We can think of it as starting with 4 observations, or 4 degrees of freedom. Let’s think of this first observation: we can think of it as being free; it can take on any value that it wants. Let’s suppose that it happens to be 78. The second observation, again, is free; it can take on any value it wants. Let’s suppose it comes out to be 79. Again, I’m using simple numbers so that we can do the calculations in our head and focus on the concept. The third observation is free and can be any value it wants; suppose it ends up being 81. What do you notice about the fourth observation? It is not free to take on any value that it wants. We know that these four numbers have a mean of 80, so it can only take on one value: it has to be 82 in order to get back to that sample mean of 80.

Again, we can think of it this way: we had four pieces of information, or four observations, we used them to estimate the sample mean, and we lost one degree of freedom, leaving only three useful pieces of information, or three degrees of freedom. So these here are our free observations, and here we lost one degree of freedom: (n-1). The four pieces of information we have are here: one, two, three, and here’s the fourth.

So that’s the intuitive, conceptual explanation of it. If you’d like, you can take a look at the mathematical explanation; if you’re very mathematically inclined, you might work your way through that. I always find that this conceptual explanation is usually easier for most to grasp!
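
The degrees-of-freedom argument can be sketched in Python using the numbers from the example: three values are free, and the fourth is forced by the sample mean of 80. The variable names are made up for this sketch; `statistics.stdev` is the standard-library function that divides by (n-1).

```python
from math import sqrt
from statistics import stdev

n = 4
sample_mean = 80
free_values = [78, 79, 81]  # the three observations that were free to vary

# Once the mean is fixed, the last value is forced: the total must be n * mean.
last_value = n * sample_mean - sum(free_values)

sample = free_values + [last_value]
sum_of_squares = sum((x - sample_mean) ** 2 for x in sample)

# Same sum of squared deviations, two different denominators.
sd_divide_by_n = sqrt(sum_of_squares / n)
sd_divide_by_n_minus_1 = sqrt(sum_of_squares / (n - 1))

print(last_value)
print(sd_divide_by_n, sd_divide_by_n_minus_1)
print(stdev(sample))  # statistics.stdev divides by (n - 1)
```

Dividing by (n-1) instead of n gives a slightly larger value, and it agrees with what `stdev` returns for the same four observations.
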

## 18 thoughts on “Standard Deviation & Degrees of Freedom Explained | Statistics Tutorial #1 | MarinStatsLectures”

• James Lucas says:

LOL. Loved the end voice.

• Hemakumar Gantepalli says:

Awesome

Yay! Mike finally shows his face 🙂 Thanks for this video Mike

• Gaurav Gurjar says:

Thank you so much Mike! and tell that lil boy i love stat too…sweetu

#edited

• Green Cat says:

I always memorized the formula and never really thought about it, until now.

• nygreenguy says:

This video does a good job of explaining degrees of freedom as well!

• Doc P's Statistics Videos says:

Wonderful video. I remember hearing a version of this presentation in my first statistics class back in 1972 or so. I'll encourage my students to watch this after they fail to understand my presentation on the topic. (By the way, can you explain the software/video technique that puts you behind the board?)

• BuddhiDayananda says:

As usual Great work…

• Usama Laiq says:

Excellent demonstration of the S.D concept Bravo!!!!

• Usama Laiq says:

Or in other words this "n-1" factor is the unbiased estimator of "population variance" or "sigma squared" since, the sample has a capacity to under estimate or undermine the population, so the factor n-1 is used in calculation of sample standard deviation or variances to maximize the overall result by decreasing the value of denominator. Therefore the denominator in the S.D or variance of sample is written as n-1 and not just n.

• arpita behura says:

Hey Mike! Wonderful description in brief. Now, I feel I can use these stats videos to make my dream come true (jumping into data science).
Hoping to learn a lot of stats to make my foundations strong…. SALUTE

• flamboyant person says:

You are the best statistician. Thanks a lot.

• flamboyant person says:

Yes it is great to see your face. 🙂

• Kim Mosher says:

okay, so I understand losing a degree of freedom, and that makes sense for multiple values. But how do you explain one variable. The deviation, any type, should be zero, since it is itself the mean. However, when using n-1, you'd get zero, which implies that the standard deviation does not exist for one variable, is that correct?

• dickyrock1 says:

Yep very good indeed thanks Mike

• MarinStatsLectures- R Programming & Statistics says:

👋🏼 Hi There! In this video we answer these questions: what is the Standard Deviation? What does the standard deviation measures? why do you divide by (n-1) (Degrees of Freedom) and more with examples. If you like to support us, you can Donate (https://bit.ly/2CWxnP2), Share our Videos, Leave us a Comment and Give us a Like 👍🏼! Either way We Thank You!

• YUJU SHIH says:

Thank you very much for the explanation. As an English learner, I could understand what you expalin and aslo be interested at your video.

• Ian Petrus Tan says:

if he's on the other side of the glass is he writing everything laterally inverted so we can read them normally?