MiC Quality

FREE MODULE Introduction to Statistics

Welcome to MiC Quality
Online Course Instructor
Glen Netherwood

Understanding Variance


The variance is a measure of the process variation. The greater the scatter of the data values, the larger the variance. Loosely speaking, the variance describes how far the data values typically lie from the mean.

More precisely, the variance is the mean of the squares of the distances of the data values from the mean:

i          $x_i - \bar{x}$    $(x_i - \bar{x})^2$
1          -3                 9
2          -2                 4
3          -1                 1
4          +2                 4
5          +4                 16
Sum         0                 34
Variance                      8.5*

* I'll explain why we divide by 'n-1' shortly.
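
As a quick check of the arithmetic in the table, here is a minimal Python sketch. It works directly from the deviations listed in the table (the original data values are not shown), and the variable names are my own, chosen for illustration only:

```python
# Deviations from the mean, copied from the table above.
deviations = [-3, -2, -1, 2, 4]

n = len(deviations)
sum_of_deviations = sum(deviations)               # deviations from the mean sum to 0
sum_of_squares = sum(d ** 2 for d in deviations)  # 9 + 4 + 1 + 4 + 16
variance = sum_of_squares / (n - 1)               # divide by n - 1, not n

print(sum_of_deviations, sum_of_squares, variance)  # prints: 0 34 8.5
```

Dividing the sum of squares, 34, by n - 1 = 4 gives the 8.5 shown in the table.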

The values are squared because the square of any value is never negative. Notice that if we used the unsquared deviations:

$$\sum_{i=1}^{n} (x_i - \bar{x})$$

some of the values would be negative and others positive, and their sum would always be zero. Squaring makes every term non-negative and is a convenient way of overcoming this.
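
A short derivation, added here as a sketch, shows why the unsquared deviations always cancel. It assumes $\bar{x}$ denotes the sample mean of the $n$ data values $x_1, \dots, x_n$ (notation introduced for this illustration):

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad \text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```

In the table above the deviations are -3, -2, -1, +2 and +4, which sum to 0 as expected.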

To explain why we divide by 'n-1': the natural formula for the variance is:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$

However, the parameter $\mu$ is not known, and so the statistic $\bar{x}$ is substituted. This would bias the value of $s^2$, making it too small; dividing by 'n-1' exactly compensates for the bias:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

The bias occurs because $\bar{x}$ was calculated using the selfsame data values used to calculate $s^2$. Recycling the data values in this way introduces the bias. The mathematics of this are complicated, but they are given here if you want to see them.
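
The bias can also be seen by simulation rather than by algebra. The sketch below is only an illustration under stated assumptions: it draws many samples of size 5 from a normal process with mean 0 and variance 1 (a process I have made up for the purpose), and averages the two candidate estimates. The function names `sum_sq_dev` and `average_estimates`, the seed and the number of trials are all hypothetical choices:

```python
import random

def sum_sq_dev(sample):
    """Sum of squared deviations of the sample from its own mean."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample)

def average_estimates(n=5, trials=20000, seed=0):
    """Average the two variance estimates over many simulated samples.

    Samples come from a normal process with mean 0 and variance 1
    (an assumption made purely for this illustration).
    """
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ss = sum_sq_dev(sample)
        total_div_n += ss / n                # the 'natural' divisor
        total_div_n_minus_1 += ss / (n - 1)  # the corrected divisor
    return total_div_n / trials, total_div_n_minus_1 / trials

biased, unbiased = average_estimates()
print(f"divide by n:     {biased:.3f}")    # should land near (n - 1) / n = 0.8
print(f"divide by n - 1: {unbiased:.3f}")  # should land near the true variance, 1.0
```

With these settings the average of the 'divide by n' estimates should come out noticeably below the true variance of 1, while the 'divide by n-1' average should sit close to it.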

The value 'n-1' is also known as the number of 'degrees of freedom'. This is the number of independent data values in the formula. Suppose that the sample contains 10 values (n = 10); then if you know any 9 of the 10 values and the value of $\bar{x}$, you can calculate the remaining data value. There are only 9 'independent values':


The average of the five values is 7. Find the missing value, X5:

4    8    12    2    X5
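
One way to work the exercise, sketched in Python: because the mean fixes the total of all five values, the fifth value is whatever is left once the four known values are subtracted. The variable names are illustrative only:

```python
known_values = [4, 8, 12, 2]
n = 5
mean = 7

# The mean fixes the total: the five values must sum to n * mean = 35.
x5 = n * mean - sum(known_values)
print(x5)  # prints: 9
```

Once the mean is known, only four of the five values are free to vary, which is the sense in which this sample has n - 1 = 4 degrees of freedom.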


Mathematically, the value so calculated is the value that minimizes $s^2$. The resulting value of the variance is lower than if an actual value taken at random from the process (an 'independent value') were used, unless the independent value happened to equal the calculated value.

I've spent some time introducing the number of 'degrees of freedom' because it is an important, and somewhat puzzling, concept that often crops up in statistics.


 
Copyright 1998-2008 MiC Quality