We will start by looking at a graphical
method for studying variation, known
as the 'Frequency Histogram'.
To create a frequency histogram, group
the data into ‘bins’, each bin containing
a range of values. The vertical axis represents
the number of observations in each range,
known as the 'frequency'.
The data below show the test results
for 25 students:
| Results | | | | | Bin | Midpoint | Frequency |
|---|---|---|---|---|---|---|---|
| **38** | 10 | 60 | 90 | 88 | >0-20 | 10 | 7 |
| 96 | 1 | 41 | 86 | 14 | >20-40 | 30 | 8 |
| **25** | 5 | 3 | 16 | **22** | >40-60 | 50 | 5 |
| 2 | **29** | **34** | 55 | **36** | >60-80 | 70 | 0 |
| **37** | **36** | 91 | 47 | 43 | >80-100 | 90 | 5 |
There are various ways of calculating
the number of bins. I find that using
the square root of the number of data
values gives as good a result as the more
complicated methods. The value is usually
on the low side, but you can adjust it
upwards to get convenient bin boundaries.
Treat the calculated number of bins as
a starting point, and adjust it as necessary
to give the result you prefer.
In the example there are 25 data values.
The square root rule gives 5 bins. The
smallest data value is 1, the largest
is 96. A scale stretching from 0 through
100 will contain all the values; this
conveniently gives 5 bins of span 20.
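The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the variable names are my own, and rounding the square root to the nearest whole number is an assumption, since the rule only gives a starting point that you then adjust.

```python
import math

# The 25 test results from the table.
results = [38, 10, 60, 90, 88, 96, 1, 41, 86, 14, 25, 5, 3, 16, 22,
           2, 29, 34, 55, 36, 37, 36, 91, 47, 43]

# Square root rule: a starting point for the number of bins.
# Rounding to the nearest integer is an assumption of this sketch.
n_bins = round(math.sqrt(len(results)))

# Convenient scale chosen by eye so that it contains min (1) and max (96).
low, high = 0, 100
width = (high - low) // n_bins

# Bin boundaries and midpoints.
edges = [low + i * width for i in range(n_bins + 1)]
midpoints = [(lower + upper) // 2 for lower, upper in zip(edges, edges[1:])]

print(n_bins, width)   # 5 20
print(edges)           # [0, 20, 40, 60, 80, 100]
print(midpoints)       # [10, 30, 50, 70, 90]
```

With a different data set the scale limits would need choosing again by hand; the sketch does not try to automate that judgment.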
Now we group the data into the bins.
You must decide how to handle values
that fall on the boundary between two
bins. Various conventions are used; I
include values greater than the lower
boundary, up to and including the upper
boundary. This is consistent with the
convention used by Excel.
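As a rough sketch of this boundary convention in Python, the hypothetical helper `bin_index` below uses the standard `bisect` module to find the bin for a value, assuming the bin edges 0, 20, 40, 60, 80 and 100 from the example. A value that sits exactly on a boundary goes into the bin below it.

```python
from bisect import bisect_left

# Bin edges from the example: 5 bins of span 20 covering 0 to 100.
EDGES = [0, 20, 40, 60, 80, 100]


def bin_index(value, edges=EDGES):
    """Return the 0-based bin for value, using lower < value <= upper.

    Hypothetical helper for illustration. Returns None when the value
    lies outside the scale (at or below the first edge, or above the last).
    """
    index = bisect_left(edges, value) - 1
    if 0 <= index < len(edges) - 1:
        return index
    return None


print(bin_index(20))   # 0, the >0-20 bin (upper boundary included)
print(bin_index(21))   # 1, the >20-40 bin
print(bin_index(40))   # 1, still the >20-40 bin
print(bin_index(0))    # None, not greater than the lowest boundary
```

Other tools may use the opposite convention (lower boundary included, upper excluded), so it is worth checking which one applies before comparing results.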
For example, there are 8 data values
greater than 20 and up to and including
40. I’ve highlighted them in the table
to make them easy to count. After calculating
all the frequencies we can create the
histogram:
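For readers who want to check the counts, here is a minimal Python sketch that calculates the frequencies with the same boundary convention and prints a simple text version of the histogram. The bar character and layout are my own choices for illustration, not a reproduction of the original chart.

```python
# The 25 test results and the bin boundaries from the example.
results = [38, 10, 60, 90, 88, 96, 1, 41, 86, 14, 25, 5, 3, 16, 22,
           2, 29, 34, 55, 36, 37, 36, 91, 47, 43]
edges = [0, 20, 40, 60, 80, 100]

# Count values greater than the lower boundary,
# up to and including the upper boundary.
frequencies = [
    sum(1 for value in results if lower < value <= upper)
    for lower, upper in zip(edges, edges[1:])
]
# frequencies is [7, 8, 5, 0, 5], matching the table.

# Print one bar per bin; each '#' is one observation.
for lower, upper, count in zip(edges, edges[1:], frequencies):
    label = f">{lower}-{upper}"
    print(f"{label:<8} {'#' * count} ({count})")
```

The empty >60-80 bin shows up as a gap between two groups of results, which is the kind of feature a histogram makes easy to spot.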