Why do we use n-1 for Samples?
This is one of those concepts that slipped by me when I was younger and while I was at engineering school. The bad habit of only ever working on one kind of problem (dealing only with samples) meant I always used a single formula to find the variance and its square root, the standard deviation.
The difference I mean is this: when calculating population parameters we divide by “n”, and when calculating sample statistics we divide by “n-1”.
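For reference, these are the two definitions in standard notation. Here σ² and μ are the population variance and mean, s² and x̄ are the sample variance and mean, and n is the number of values in whichever set is being summed. The symbols are the usual textbook ones and are assumed here rather than copied from the original figures.

```latex
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2
\qquad\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2
```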
As you can see in the two formulas above, on the left we have the population parameter: the population variance, calculated exactly by dividing by n, the number of values in the population. On the right we have the variance of a sample drawn from that population. Because a sample only lets us estimate the variance, we divide by n-1, which gives us what is called an unbiased estimate. The other difference is that the left-hand equation uses the true population mean (a parameter of the system), whereas the right-hand equation uses the mean of the sample.
N.B. A sample is simply a subset of the population; for example, measuring the number of cars passing a junction for one week is a sample. Using n-1 in the calculation of the sample variance and sample standard deviation is referred to as Bessel’s correction.
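To make the two formulas concrete, here is a minimal sketch in Python that computes both versions by hand on hypothetical data (seven made-up daily car counts at a junction) and checks them against the standard library's `statistics.pvariance`, `statistics.variance` and `statistics.stdev`. The helper `variance` and its `ddof` ("delta degrees of freedom") parameter are our own illustration; the parameter name is borrowed from NumPy's convention.

```python
import math
import statistics

# Hypothetical daily car counts at a junction over one week (made-up numbers for illustration).
counts = [120, 135, 128, 140, 118, 95, 102]

def variance(data, ddof):
    """Sum of squared deviations from the data's own mean, divided by len(data) - ddof."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / (len(data) - ddof)

pop_var = variance(counts, ddof=0)     # treat the data as the whole population: divide by n
sample_var = variance(counts, ddof=1)  # treat the data as a sample: divide by n-1 (Bessel's correction)

# Compare the hand-rolled versions with the standard library equivalents.
print(pop_var, statistics.pvariance(counts))
print(sample_var, statistics.variance(counts))
print(math.sqrt(sample_var), statistics.stdev(counts))  # sample standard deviation
```

The two variances differ only by the factor n/(n-1), here 7/6, which is exactly the effect of Bessel's correction.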
Intuitively
When we know every single value in the population, we can calculate the variance exactly, and this forms a parameter of the system. When we take a sample, we only get part of the information: a small glimpse of the total spread of the full population. With fewer data points we tend to see less spread away from the mean, and by taking only a subset of the points we are very unlikely to capture the true variance of the system.
For example, suppose, as illustrated below, that we have a population of 20 points and we sample 8 of them. Most of the population is grouped around the mean, so a random sample is more likely to pick points close to the mean and less likely to pick the points that deviate most from it. The plot below shows a couple of different realisations of random sampling. In each one, the sample points are more closely grouped around their own sample mean. The sample mean always sits in the middle of the sample, but it can be quite different from the true mean; you may even get samples where the population mean falls outside the range of the sampled points. This matters because the sample mean is exactly the value that minimises the sum of squared deviations of the sample points, so measuring spread around it can never give a larger sum than measuring around the true mean.
So by taking a sample and measuring spread around the sample mean, we tend to capture less of the variance, and the variance we compute is likely to be smaller than the true variance of the system.
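The last point can be checked numerically with a small simulation. This is a minimal sketch, assuming a hypothetical population of 20 normally distributed values (not the data behind the plot) from which 8 points are drawn at random without replacement. For each sample it compares the sum of squared deviations about the sample mean with the sum about the true population mean.

```python
import random
import statistics

# Hypothetical population of 20 values (illustrative only, not the data in the original plot).
random.seed(0)
population = [random.gauss(50, 10) for _ in range(20)]
mu = statistics.fmean(population)  # true population mean

def sum_sq(points, centre):
    """Sum of squared deviations of the points from a given centre."""
    return sum((x - centre) ** 2 for x in points)

for trial in range(5):
    sample = random.sample(population, 8)  # draw 8 of the 20 points without replacement
    x_bar = statistics.fmean(sample)        # sample mean
    about_sample_mean = sum_sq(sample, x_bar)
    about_true_mean = sum_sq(sample, mu)
    # The sample mean minimises the sum of squares, so the first value is never the larger one.
    print(f"trial {trial}: about x_bar = {about_sample_mean:8.2f}, about mu = {about_true_mean:8.2f}")
```

Whatever the seed, the first number is never larger than the second: the gap is exactly 8 times the squared distance between the sample mean and the true mean.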
Therefore, if we divide by n to get the variance of the sample, we get a biased estimate: on average it underestimates the true variance of the population, by a factor of (n-1)/n. To correct for this we divide by n-1 instead. The smaller denominator increases the estimate just enough that, on average, the sample variance equals the true variance of the population.
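The underestimation can be checked exactly rather than by simulation. The sketch below assumes a hypothetical four-value population and independent draws with replacement (the setting in which the n-1 estimator is exactly unbiased). It enumerates every possible sample of size 3 and averages both versions of the sample variance using exact fractions, so no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Hypothetical tiny population; samples are drawn independently, with replacement.
population = [Fraction(x) for x in (2, 4, 6, 9)]
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N  # true population variance (divide by N)

n = 3  # sample size
samples = list(product(population, repeat=n))  # every ordered sample, each equally likely
biased_total = Fraction(0)
unbiased_total = Fraction(0)
for sample in samples:
    x_bar = sum(sample) / n
    ss = sum((x - x_bar) ** 2 for x in sample)
    biased_total += ss / n          # divide by n
    unbiased_total += ss / (n - 1)  # divide by n-1 (Bessel's correction)

mean_biased = biased_total / len(samples)
mean_unbiased = unbiased_total / len(samples)
print("true variance:         ", sigma2)
print("average, divide by n:  ", mean_biased)
print("average, divide by n-1:", mean_unbiased)
```

The divide-by-(n-1) average matches the population variance exactly, while the divide-by-n average is smaller by the factor (n-1)/n = 2/3.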
I have also posted a nice little video, available through Khan Academy, and a small tool that gives an intuitive visualisation of why using n-1 gives us an unbiased estimate of the population variance.
More Mathematically
When we have