MiC Quality

FREE MODULE Introduction to Statistics

Welcome to MiC Quality
Online Course Instructor
Glen Netherwood

Frequency Histograms


We will start by looking at a graphical method for studying variation, known as the 'Frequency Histogram'.

To create a frequency histogram, group the data into ‘bins’, each bin containing a range of values. The data below show the test results for 25 students:

Results                      Bin        Midpoint   Frequency
38   10   60   90   88       >0-20      10         7
96    1   41   86   14       >20-40     30         8
25    5    3   16   22       >40-60     50         5
 2   29   34   55   36       >60-80     70         0
37   36   91   47   43       >80-100    90         5

I've grouped them into 5 bins of equal size. The first bin contains the frequency (or number) of results that are greater than zero and up to and including 20. The second bin contains the frequency of values greater than 20 and up to and including 40. There are eight such values (38, 25, 37, 29, 36, 34, 22 and 36).
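The counting described above can be sketched in Python. This is only an illustration, not part of the original course material; the function name `bin_frequencies` and its parameters are my own assumptions.

```python
import math

# The 25 test results from the table
results = [38, 10, 60, 90, 88,
           96, 1, 41, 86, 14,
           25, 5, 3, 16, 22,
           2, 29, 34, 55, 36,
           37, 36, 91, 47, 43]

def bin_frequencies(values, low=0, high=100, span=20):
    """Count values per bin, where each bin is 'greater than lower, up to and including upper'."""
    n_bins = (high - low) // span
    counts = [0] * n_bins
    for v in values:
        if not (low < v <= high):
            raise ValueError(f"{v} is outside the range >{low}-{high}")
        # ceil() places a value equal to an upper boundary in the lower bin
        index = math.ceil((v - low) / span) - 1
        counts[index] += 1
    return counts

frequencies = bin_frequencies(results)
for i, count in enumerate(frequencies):
    print(f">{i * 20}-{(i + 1) * 20}: {count}")
```

The printed counts should match the Frequency column of the table, and they should add up to the 25 results.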

Now I can create a histogram of the results. The vertical axis represents the frequency of observations in each range:

[Figure: frequency histogram of the 25 test results]

There are two conventions for showing the bin values on the horizontal axis of the histogram:

1. show the midpoint of the bin range
2. show the upper limit of each bin range, the 'cutpoint'

The histogram for this example uses the midpoint convention, labelling the bars 10, 30, 50, 70 and 90. Under the alternative 'cutpoint' convention the same bars would be labelled 20, 40, 60, 80 and 100.
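As a rough illustration of the two labelling conventions, the sketch below prints a simple text histogram of the example data under each one. The helper `text_histogram` is a hypothetical name of my own, and the bars run sideways, so here the frequency is shown along the horizontal direction rather than the vertical axis.

```python
cutpoints = [20, 40, 60, 80, 100]   # upper limit of each bin
frequencies = [7, 8, 5, 0, 5]       # frequencies from the example table
span = 20                           # width of each bin

# The midpoint sits half a span below the upper limit of the bin
midpoints = [c - span // 2 for c in cutpoints]

def text_histogram(labels, counts):
    """Return one line per bin: the label, then one '#' per observation."""
    return [f"{label:>4} | {'#' * count}" for label, count in zip(labels, counts)]

print("Midpoint convention:")
print("\n".join(text_histogram(midpoints, frequencies)))
print("Cutpoint convention:")
print("\n".join(text_histogram(cutpoints, frequencies)))
```

Only the labels change between the two versions; the bars themselves are identical.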

The reason for creating a histogram is to see the 'pattern' of the data. The number of bins you use will affect how easy it is to see the pattern. If you use too many bins you will have too few values in each and the pattern will be 'ragged'. If you use too few bins you may miss important details.

There are various ways of calculating the optimum number of bins. I find that using the square root of the number of data values is as satisfactory as the more complicated methods. The result is usually on the low side, but you will probably want to adjust it anyway to avoid awkwardly sized bins.

In the example there are 25 data values. The square root rule gives 5 bins. The smallest data value is 1, the largest is 96. A scale stretching from 0 through 100 will contain all the values; this conveniently gives 5 bins of span 20.

If there were 50 values, the calculation would suggest 7 bins with a span of about 14 each. That is an awkward span, so I'd probably use 10 bins with a span of 10, which is a nice round number; bins with a span of 15 would also be satisfactory.
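The arithmetic behind the square root rule can be sketched as follows. The function names are hypothetical, and rounding the square root to the nearest whole number is my assumption; the text only says that 25 values give 5 bins and 50 values give 7.

```python
import math

def square_root_rule(n_values, low, high):
    """Return (suggested number of bins, unrounded span of each bin)."""
    n_bins = round(math.sqrt(n_values))   # assumption: round to nearest whole number
    raw_span = (high - low) / n_bins
    return n_bins, raw_span

def bins_needed(low, high, span):
    """Number of bins of a chosen 'round' span needed to cover low..high."""
    return math.ceil((high - low) / span)

print(square_root_rule(25, 0, 100))   # 5 bins with a span of 20
print(square_root_rule(50, 0, 100))   # 7 bins with a span of a little over 14
print(bins_needed(0, 100, 10))        # a span of 10 needs 10 bins
print(bins_needed(0, 100, 15))        # a span of 15 needs 7 bins
```

Note that 7 bins with a span of 15 cover 0 through 105, slightly more than the scale requires.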

When you allocate the data into the bins you must decide on how to handle values that fall on the boundary between two bins. In the example I've included values greater than the lower boundary up to and including the upper boundary; this is consistent with the convention used by Excel.
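To see why the boundary rule matters, here is a small sketch using the result of 60 from the example, which falls exactly on a boundary. It uses Python's standard `bisect` module; the two function names are my own, and the second function shows the opposite convention purely for comparison.

```python
from bisect import bisect_left, bisect_right

cutpoints = [20, 40, 60, 80, 100]   # upper boundary of each bin

def bin_upper_inclusive(value):
    """Bin index when a bin holds values greater than lower, up to and including upper."""
    return bisect_left(cutpoints, value)

def bin_lower_inclusive(value):
    """Bin index under the opposite rule: lower boundary included, upper excluded.
    Under this rule a value of 100 would fall outside the five bins."""
    return bisect_right(cutpoints, value)

for value in (60, 41):
    print(value, bin_upper_inclusive(value), bin_lower_inclusive(value))
```

With the convention used in the example, 60 lands in the third bin (index 2, >40-60). Under the opposite rule it would move to the fourth bin (index 3), while a value such as 41, which is not on a boundary, stays in the same bin either way.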


 
Copyright 1998-2009 MiC Quality