The number of bins you use will affect
how easy it is to interpret the histogram.
If you use too many bins you will have
too few values in each and the pattern
will be 'ragged'. If you use too few bins
you may miss important details.
There are various ways of calculating
the optimum number of bins. I find that
using the square root of the number of
data values works as well as more complicated
methods. The result is usually on the
low side, but you will often want to adjust
the bin sizes anyway to get intervals
that are easy to interpret.
In the example there are 25 data values.
The square root rule gives 5 bins. The
smallest data value is 1, the largest
is 96. A scale stretching from 0 through
100 will contain all the values; this
conveniently gives 5 bins of span 20.
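The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function names `sqrt_bin_count` and `bin_edges` are made up here, and rounding the square root to the nearest whole number is an assumption about how the rule is applied.

```python
import math

def sqrt_bin_count(n_values):
    """Square-root rule: use about sqrt(n) bins (rounded to the nearest whole number)."""
    return max(1, round(math.sqrt(n_values)))

def bin_edges(low, high, n_bins):
    """Edges of n_bins equal-width bins covering the scale from low to high."""
    span = (high - low) / n_bins
    return [low + i * span for i in range(n_bins + 1)]

# 25 data values on a scale from 0 through 100, as in the example above
n_bins = sqrt_bin_count(25)
edges = bin_edges(0, 100, n_bins)
print(n_bins, edges)
```

Each pair of neighbouring edges marks one bin, so the span is simply the difference between them.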
Suppose there were 50 values on the same
0 to 100 scale. The square root of 50 is
just over 7, so you could use seven bins
of span about 14 each. That's an awkward
number, so I'd use bins of span 15. You
could even use 10 bins of span 10.
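One way to mimic the adjustment to a convenient span is to round the raw span up to the next multiple of 5. The sketch below assumes the same 0 to 100 scale; the helper name `nice_span` and the choice of 5 as the rounding step are illustrative assumptions, not a fixed rule.

```python
import math

def nice_span(low, high, n_bins, step=5):
    """Round the raw bin span up to the next multiple of `step` (an assumed convention)."""
    raw = (high - low) / n_bins
    return math.ceil(raw / step) * step

raw_bins = round(math.sqrt(50))       # square-root rule for 50 values
span = nice_span(0, 100, raw_bins)    # raw span of 100 / 7 rounded up to a multiple of 5
bins_needed = math.ceil(100 / span)   # bins of that span needed to cover 0 to 100
print(raw_bins, span, bins_needed)
```

Rounding up rather than down keeps the whole scale covered without adding extra bins.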
The data show the wait times,
in minutes, for 50 admissions
into the casualty department of
a hospital:
24  22  24  30  24  16  18  32  27  69
26  36  41  27  43  29  26  21  39  44
25  32  30  28  26  34  21  30  30  31
32  37  64  26  68  20  32  43  31  24
20  27  30  33  39  40  22  31  29  43
Draw a histogram and comment
on the results. What action would
you suggest?
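If you would rather tally the values by program than by hand, the following sketch counts the wait times into bins and prints a rough text histogram. The bins used here (span 10, starting at 10 minutes) are just one possible choice made for illustration; try other spans and compare the pictures before commenting on the results.

```python
# Wait times in minutes, copied from the table above
wait_times = [
    24, 22, 24, 30, 24, 16, 18, 32, 27, 69,
    26, 36, 41, 27, 43, 29, 26, 21, 39, 44,
    25, 32, 30, 28, 26, 34, 21, 30, 30, 31,
    32, 37, 64, 26, 68, 20, 32, 43, 31, 24,
    20, 27, 30, 33, 39, 40, 22, 31, 29, 43,
]

# Assumed binning: six bins of span 10 covering 10 up to (but not including) 70
low, span, n_bins = 10, 10, 6

counts = [0] * n_bins
for t in wait_times:
    counts[(t - low) // span] += 1   # index of the bin that contains t

# One row per bin, one '#' per admission
for i, c in enumerate(counts):
    start = low + i * span
    print(f"{start:2d}-{start + span - 1:2d} | {'#' * c} ({c})")
```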