See Efficiency of Estimators
Suppose you:
- take a large number of samples from
a population that does not conform to
a normal distribution
- calculate the mean of each of those samples
- find the shape of the distribution
formed by these sample means
You will find that the distribution of
the sample means will resemble a normal
distribution. The larger the number
of items in each sample, the better the
approximation.
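
The steps above can be tried out with a short simulation. This is a minimal sketch, not a definitive demonstration: the exponential population, the sample sizes and the helper names `sample_means` and `skewness` are all assumptions chosen for illustration. Skewness is 0 for a normal distribution, so it should shrink toward 0 as the sample size grows.

```python
import random
import statistics


def sample_means(sample_size, num_samples, rng):
    """Draw samples from a skewed (exponential) population; return their means."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]


def skewness(values):
    """Moment-based skewness: 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])


rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (1, 5, 30):
    means = sample_means(n, 20_000, rng)
    print(f"sample size {n:>2}: skewness of sample means = {skewness(means):.2f}")
```

With a sample size of 1 the "means" are just the raw, strongly skewed population values; larger samples should give a visibly more symmetric distribution of means.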

The Central Limit Theorem is of considerable
practical importance because many methods
used in inferential statistics rely on the
samples being taken from a population that
conforms to a normal distribution. Many
populations do not conform to a normal distribution,
but this can be overcome by using the means
of samples taken from the population. Control
charts are a good example of this.
A random sample taken from
a population is used to estimate the population
mean. The sample mean is a point estimate,
and is unlikely to exactly equal the true
population mean.
The confidence interval defines a band
around the sample mean within which the
true population mean is expected to lie,
to some degree of confidence.

For example, a 95% confidence interval is
constructed so that, over repeated sampling,
about 95% of such intervals contain the true
population mean. The method used to calculate the confidence
interval will vary, but usually involves
the normal distribution for large samples,
or the t-distribution for small samples.
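The 100(1-α)%
confidence interval for the mean of a small
sample (t distribution) is:

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

where x-bar is the sample mean, 's' is the
sample standard deviation, 'n' is the sample
size, and t is the critical value of the t
distribution with 'n-1' degrees of freedom.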
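
A minimal Python sketch of a small-sample interval of this kind: the sample mean plus or minus a t critical value times s divided by the square root of n. The measurements are hypothetical, and the critical value 2.262 (two-sided 95%, 9 degrees of freedom) is taken from a standard t table rather than computed.

```python
import math
import statistics

# Hypothetical measurements (illustrative values only).
data = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0, 10.2, 9.9]

n = len(data)
x_bar = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation (divides by n-1)

# Two-sided 95% t critical value for n-1 = 9 degrees of freedom,
# assumed from a t table; it must be changed if n or the level changes.
t_crit = 2.262

half_width = t_crit * s / math.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width
print(f"mean = {x_bar:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```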

Degrees of freedom: the number of independent
data values that are used in estimating the
value of a population parameter.
The number of degrees of freedom in the
standard deviation formula is 'n-1':

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

If 'n' were used, instead
of 'n-1', the value of 's' would be biased;
the standard deviation calculated from small
samples would underestimate the population
standard deviation.
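
The effect of the divisor can be checked with a simulation. This is a sketch under stated assumptions: a normal population with a standard deviation of 2 (variance 4), samples of 5 items, and a comparison of variances (the square of 's'), where the bias is easiest to see.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
sigma = 2.0             # assumed true population standard deviation
n = 5                   # small sample size
trials = 20_000

var_n, var_n_minus_1 = [], []
for _ in range(trials):
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    var_n.append(statistics.pvariance(sample))         # divides by n
    var_n_minus_1.append(statistics.variance(sample))  # divides by n-1

mean_var_n = statistics.fmean(var_n)
mean_var_n_minus_1 = statistics.fmean(var_n_minus_1)
print(f"true variance          = {sigma ** 2:.2f}")
print(f"average using 'n'      = {mean_var_n:.2f}")
print(f"average using 'n-1'    = {mean_var_n_minus_1:.2f}")
```

On average the 'n' version should come out low by a factor of about (n-1)/n, while the 'n-1' version should be close to the true variance.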
The number of degrees of freedom
is 'n-1' because only 'n-1' of the data
values 'xi' are independent;
if any 'n-1' are known then the other can
be calculated (using x-bar).
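
A small illustration of this dependence, using made-up data values: once x-bar and any 'n-1' of the values are known, the remaining value is fixed.

```python
import statistics

data = [4.0, 7.0, 5.0, 9.0, 5.0]  # hypothetical data values
n = len(data)
x_bar = statistics.fmean(data)

known = data[:-1]                   # any n-1 of the values
recovered = n * x_bar - sum(known)  # the remaining value follows from x-bar

# Equivalently, the deviations from x-bar always sum to zero.
deviation_sum = sum(x - x_bar for x in data)
print(recovered, deviation_sum)
```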
The mean x-bar is an estimate
of the true population mean and was calculated
using the same xi values that
are being used in the standard deviation
calculation. It can be shown that, because
of this, errors between the estimate x-bar
and the true population mean tend to bias
the value of 's'.
An estimator is a statistic
that represents the properties of a population.
Several estimators may be available to represent
a particular property.
When selecting the one you
prefer you might consider the efficiency
and bias of the alternatives.
The most efficient estimator
is the one that gives the lowest expected
variance of the error. Technically, the efficiency
is the lowest possible
variance from any estimator divided by the
expected variance of the selected estimator.
The bias is the expected (average)
difference between the estimator value and
the actual population value.
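
As a rough illustration, the sample mean and the sample median can be compared as estimators of the centre of a normal population. This is a sketch with assumed values (population mean 10, standard deviation 1, samples of 25). For a normal population the sample mean is known to achieve the lowest possible variance among unbiased estimators, so the variance ratio below approximates the efficiency of the median.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the run is repeatable
true_mean = 10.0        # assumed population mean
n = 25
trials = 10_000

mean_estimates, median_estimates = [], []
for _ in range(trials):
    sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
    mean_estimates.append(statistics.fmean(sample))
    median_estimates.append(statistics.median(sample))

# Bias: average difference between the estimate and the true value.
bias_mean = statistics.fmean(mean_estimates) - true_mean
bias_median = statistics.fmean(median_estimates) - true_mean

# Efficiency of the median relative to the mean: ratio of variances.
var_mean = statistics.pvariance(mean_estimates)
var_median = statistics.pvariance(median_estimates)
relative_efficiency = var_mean / var_median

print(f"bias: mean {bias_mean:+.4f}, median {bias_median:+.4f}")
print(f"variance: mean {var_mean:.4f}, median {var_median:.4f}")
print(f"efficiency of median relative to mean = {relative_efficiency:.2f}")
```

Both estimators should show a bias close to zero here, but the median should have the larger variance, which makes it the less efficient choice for this population.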
Population: the entire collection of the items under
study. In inferential statistics the population
under study might be the hypothetical future
output of a process, given certain parameter
values.
The confidence interval is used to predict
the interval within which the population
mean falls. The prediction interval is used
to predict the interval within which a single
future observation will fall.
The 100(1-α)%
prediction interval for a small sample (t
distribution) is:

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,s\,\sqrt{1 + \frac{1}{n}}$$
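
A minimal Python sketch comparing the two intervals on the same hypothetical data. It assumes the usual small-sample forms: a half-width of t times s divided by the square root of n for the confidence interval, and t times s times the square root of (1 + 1/n) for the prediction interval. The critical value 2.262 (two-sided 95%, 9 degrees of freedom) is taken from a standard t table.

```python
import math
import statistics

# Hypothetical measurements (illustrative values only).
data = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0, 10.2, 9.9]

n = len(data)
x_bar = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation (divides by n-1)
t_crit = 2.262              # assumed from a t table: 95%, 9 degrees of freedom

ci_half = t_crit * s / math.sqrt(n)      # interval for the population mean
pi_half = t_crit * s * math.sqrt(1 + 1 / n)  # interval for one future observation

print(f"95% confidence interval: {x_bar - ci_half:.3f} to {x_bar + ci_half:.3f}")
print(f"95% prediction interval: {x_bar - pi_half:.3f} to {x_bar + pi_half:.3f}")
```

The prediction interval is the wider of the two because it has to allow for the scatter of an individual observation as well as the uncertainty in the estimated mean.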
Standard error: the standard deviation of the mean of a
sample. If you:
- take a large number of samples, of
equal size, from a population
- calculate the mean of each sample
- calculate the standard deviation of
the sample means
you will have found the standard error.
The standard error is related to the population
(process) standard deviation by:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

where σ is the population standard deviation
and 'n' is the sample size.
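
The procedure described above can be simulated and compared with the population standard deviation divided by the square root of the sample size. This is a sketch with assumed values (a normal population with mean 50 and standard deviation 4, samples of 16), so the predicted standard error is 4 / 4 = 1.

```python
import math
import random
import statistics

rng = random.Random(3)  # fixed seed so the run is repeatable
sigma = 4.0             # assumed population (process) standard deviation
n = 16                  # sample size
num_samples = 10_000

# Take many samples of equal size and record the mean of each one.
means = [
    statistics.fmean([rng.gauss(50.0, sigma) for _ in range(n)])
    for _ in range(num_samples)
]

observed_se = statistics.stdev(means)  # standard deviation of the sample means
predicted_se = sigma / math.sqrt(n)    # population sd / sqrt(sample size)
print(f"observed = {observed_se:.3f}, predicted = {predicted_se:.3f}")
```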