SAMPLING DISTRIBUTION
Sampling Distribution of the Sample Mean $\bar{X}$
Suppose that students' commuting time at a certain college, represented by the variable X, has mean $\mu = 29$ minutes and standard deviation $\sigma = 9$ minutes. There are 6000 students at the college, and we are about to take a random sample of 40 students. Can we know in advance what value $\bar{X}$, the mean commuting time for the students in the sample, will take? Of course not. By chance, our sample may contain a disproportionate number of students who live far away and thus have longer commuting times, resulting in a value of $\bar{X}$ that is larger than the population mean of 29 minutes. Or, also by chance, $\bar{X}$ may be smaller than $\mu$. $\bar{X}$ is a random variable with many possible values.
The word 'random,' however, does not mean haphazard. The sample mean $\bar{X}$ varies according to the laws of probability, and its long-run behaviour can be known in advance. This information is described in the sampling distribution of $\bar{X}$, which may be defined as follows:
The Sampling Distribution of the Sample Mean is the distribution of values taken by $\bar{X}$ over all possible samples of the same size from the same population.
To help us understand this concept, let us imagine that instead of taking just one sample, we repeat the sampling process infinitely many times. From each repetition, we calculate and record the value of $\bar{X}$, so that we have many, many observations of $\bar{X}$. We can then calculate the mean and standard deviation of all these $\bar{X}$'s. We can also draw a histogram to see the shape of their distribution. In other words, we can describe the sampling distribution of $\bar{X}$ using three of its most important characteristics: its mean, its standard deviation, and its shape.
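The thought experiment above can be imitated on a computer with a large but finite number of repetitions. The sketch below is only an illustration: the population is hypothetical (6000 values drawn from a gamma distribution whose parameters were picked to give a mean near 29 and a standard deviation near 9), since the text does not say how commuting times are actually distributed.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of 6000 commuting times (minutes). The gamma
# shape and scale are assumptions chosen only so that the mean is near 29
# and the standard deviation is near 9.
POP_SIZE = 6000
SCALE = 81 / 29       # variance / mean
SHAPE = 29 / SCALE    # mean / scale
population = [random.gammavariate(SHAPE, SCALE) for _ in range(POP_SIZE)]

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# Repeat the sampling process many times (a finite stand-in for
# "infinitely many") and record the sample mean from each repetition.
SAMPLE_SIZE = 40
REPETITIONS = 5000
sample_means = [
    statistics.fmean(random.sample(population, SAMPLE_SIZE))  # without replacement
    for _ in range(REPETITIONS)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print(f"population mean      : {pop_mean:.2f}")
print(f"mean of sample means : {mean_of_means:.2f}")
print(f"population SD        : {pop_sd:.2f}")
print(f"SD of sample means   : {sd_of_means:.2f}")
```

In a run like this, the mean of the recorded sample means should sit close to the population mean, while their standard deviation should be much smaller than the population standard deviation. A histogram of `sample_means` would show the shape.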
Although no one can carry out this process in actuality, because no one can repeat a sampling process infinitely many times, the result can be obtained through the application of probability theory. In particular, theoretical statisticians have derived the following results for the sample mean. (The derivation, however, is beyond the scope of our discussion.)
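The Mean and Standard Deviation of $\bar{X}$:
If we take a random sample of size n from a population with mean $\mu$ and standard deviation $\sigma$, the sample mean $\bar{X}$ will have mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$ if the population size is infinite, or $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}}$ if the population is finite of size N.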
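As a minimal sketch of the two standard deviation formulas, the helper below computes the standard deviation of the sample mean for an infinite population or, when a population size is supplied, for a finite one. The function name `standard_error` and its parameter names are illustrative choices, not part of the original text.

```python
import math


def standard_error(sigma, n, N=None):
    """Standard deviation of the sample mean.

    sigma: population standard deviation
    n:     sample size
    N:     population size; None means the population is treated as infinite
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        # Finite population: multiply by sqrt((N - n) / (N - 1)).
        se *= math.sqrt((N - n) / (N - 1))
    return se


# Commuting-time figures from the text: sigma = 9, n = 40, N = 6000.
se_infinite = standard_error(9, 40)
se_finite = standard_error(9, 40, N=6000)
print(f"infinite population: {se_infinite:.4f}")
print(f"finite population  : {se_finite:.4f}")
```

With n = 40 and N = 6000 the two values differ only in the third decimal place, because the sample is a small fraction of the population.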
The Shape of the Distribution of $\bar{X}$:
If the population from which the sample is taken is normal, $\bar{X}$ will have a normal distribution (regardless of the sample size n). If the population is not normal, $\bar{X}$ will have an approximately normal distribution as long as n is 'large.'
These results are summarized in the Central Limit Theorem (CLT), which says:
If we take a random sample of size n from a population with mean $\mu$ and standard deviation $\sigma$, and n is 'large,' then the sample mean $\bar{X}$ will have an at least approximately normal distribution with mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$ if the population size is infinite, or $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}}$ if the population is finite of size N.
Note:
The mean of $\bar{X}$ is also known as the Expected Value of $\bar{X}$, which is represented in notational form as $E(\bar{X})$ or $\mu_{\bar{X}}$. The standard deviation of $\bar{X}$ is also known as the standard error (SE), expressed in notational form as $\sigma_{\bar{X}}$.
The distribution of $\bar{X}$ approaches normality as n approaches infinity. As a rule of thumb, the normal approximation is considered to be acceptable when $n \ge 30$.
The two expressions for the standard deviation of $\bar{X}$ differ only by the factor $\sqrt{\dfrac{N-n}{N-1}}$, where N is the population size and n is the sample size. This factor is called the Finite Population Correction Factor (FPCF). If N is considerably larger than n, this factor is close to 1, and in such cases it is often ignored. As a rule of thumb, the FPCF can be dropped if n is less than 5% of N.
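In practice, the population standard deviation $\sigma$ is often unknown. If n is 'large' we can substitute the sample standard deviation s in its place. The standard error will then be estimated by $s_{\bar{X}} = \dfrac{s}{\sqrt{n}}$ or $s_{\bar{X}} = \dfrac{s}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}}$.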
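The substitution of s for the unknown population standard deviation might look as follows in practice. This is a sketch under stated assumptions: the 40 sample values are generated artificially (they are not data from the text), and N = 6000 is borrowed from the commuting-time example.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical sample of 40 commuting times (minutes), generated only for
# illustration; these are not observations from the original text.
sample = [random.gauss(29, 9) for _ in range(40)]

n = len(sample)
N = 6000  # population size, as in the commuting-time example

s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)

# Estimated standard error, without and with the finite population correction.
se_hat = s / math.sqrt(n)
se_hat_fpcf = se_hat * math.sqrt((N - n) / (N - 1))

print(f"s                      : {s:.3f}")
print(f"estimated SE           : {se_hat:.3f}")
print(f"estimated SE with FPCF : {se_hat_fpcf:.3f}")
```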
With this knowledge of the sampling distribution of $\bar{X}$, we are now ready to calculate probabilities concerning its possible values.
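Such probability calculations rest on standardizing the sample mean. As a sketch, assuming n is large enough for the normal approximation, and writing Z for a standard normal variable and $\Phi$ for its cumulative distribution function (the symbols a, b, and $\Phi$ are introduced here for illustration):

```latex
Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}},
\qquad
P(a \le \bar{X} \le b) \approx
\Phi\!\left(\frac{b - \mu}{\sigma_{\bar{X}}}\right)
- \Phi\!\left(\frac{a - \mu}{\sigma_{\bar{X}}}\right)
```

The values of $\Phi$ are what a Z table lists.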
Example 1
Let us return to the Commuting Time example at the beginning of this section, where we take a random sample of 40 students from a student population of 6000. The average commuting time for the entire student population is 29 minutes, and the standard deviation is 9 minutes.
(a) Calculate the probability that the sample mean will be within 2 minutes of the population mean.
(b) Is the above answer exact or approximate? Explain your answer.
(c) Recalculate the probability in (a) using a sample size of 100 instead of 40.
(d) Recalculate the probability in (c) without using the FPCF. Comment on your answer.
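Solution:
(a) Here, we have $\mu = 29$ minutes, $\sigma = 9$ minutes, N = 6000 and n = 40. Since $n = 40 \ge 30$, we can apply the Central Limit Theorem (CLT) and state that the sample mean $\bar{X}$ has an at least approximately normal distribution with mean $\mu_{\bar{X}} = \mu = 29$ and standard deviation $\sigma_{\bar{X}} = \dfrac{9}{\sqrt{40}}\sqrt{\dfrac{6000-40}{6000-1}} \approx 1.418$.
The desired probability can then be expressed as $P(27 \le \bar{X} \le 31) = P\left(\dfrac{27-29}{1.418} \le Z \le \dfrac{31-29}{1.418}\right) = P(-1.41 \le Z \le 1.41) \approx 0.8414$, or about 84%. (Obtained from a table of Standard Normal Probabilities, or Z table.) Recall that the probability of a normal random variable falling within a specified interval can be represented as an area under the standard normal curve. In this example we can display the probability as follows:
[Figure: standard normal curve with the area between z = -1.41 and z = 1.41 shaded]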
(b) The above answer is exact if the distribution of commuting time is normal; otherwise it is approximate.
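(c) Changing n from 40 to 100 changes the SE to $\sigma_{\bar{X}} = \dfrac{9}{\sqrt{100}}\sqrt{\dfrac{6000-100}{6000-1}} \approx 0.893$, and the desired probability can be obtained by $P(27 \le \bar{X} \le 31) = P(-2.24 \le Z \le 2.24) \approx 0.9750$, or about 97.5%.
Note that the probability of observing $\bar{X}$ within 2 minutes of $\mu$ increases as n increases. In other words, the distribution of the sample mean gets more concentrated around the population mean as the sample size increases.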
(d) Without the FPCF, $\sigma_{\bar{X}} = \dfrac{9}{\sqrt{100}} = 0.9$, and the probability is obtained by $P(27 \le \bar{X} \le 31) = P(-2.22 \le Z \le 2.22) \approx 0.9736$. This is very close to the answer we obtained in (c). As we mentioned before, when N is considerably larger than n, we can drop the FPCF without a significant loss of accuracy. Here, as n (= 100) is less than 5% of N (= 6000), we can apply the rule of thumb to drop the FPCF.
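The three probabilities in this example can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of a Z table; the helper name `prob_within` is an illustrative choice. Because the z-values are not rounded to two decimals here, the results may differ from table-based answers in the fourth decimal place.

```python
import math
from statistics import NormalDist

MU, SIGMA, N = 29, 9, 6000  # population mean, SD, and size from the example


def prob_within(margin, n, use_fpcf=True):
    """P(MU - margin <= sample mean <= MU + margin) under the normal approximation."""
    se = SIGMA / math.sqrt(n)
    if use_fpcf:
        se *= math.sqrt((N - n) / (N - 1))
    dist = NormalDist(mu=MU, sigma=se)
    return dist.cdf(MU + margin) - dist.cdf(MU - margin)


p_a = prob_within(2, 40)                   # part (a): n = 40, with FPCF
p_c = prob_within(2, 100)                  # part (c): n = 100, with FPCF
p_d = prob_within(2, 100, use_fpcf=False)  # part (d): n = 100, no FPCF

print(f"(a) {p_a:.4f}")
print(f"(c) {p_c:.4f}")
print(f"(d) {p_d:.4f}")
```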
The Sampling Distribution of the Sample Proportion $\hat{p}$
So far, we have discussed the sampling distribution of the sample mean $\bar{X}$. But the sample mean is not the only statistic with a sampling distribution. In fact, all statistics, such as the sample median, the sample standard deviation, and the sample proportion, have sampling distributions. Of these, we will now explore the sampling distribution of the sample proportion $\hat{p}$.
Suppose we take a random sample
of 100 students from a large college where 40% of the students work at
least part time. What percentage of the students in the sample can we expect to
have at least a part-time job? How much can the sample proportion be
expected to vary from the population proportion? And what is the probability of
getting a sample where this proportion is greater than 50%? These are some
of the questions that can be answered by looking at the sampling distribution of
the sample proportion.
It can be shown that the Central Limit Theorem, which gave us the sampling distribution of the sample mean, can also be applied to the sample proportion $\hat{p}$. The content of the theorem is summarized below:
The Central Limit Theorem for the Sample Proportion $\hat{p}$:
If we take a random sample of size n from a population with proportion p, then the sample proportion $\hat{p}$ will have mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$ if the population is infinite, and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}\sqrt{\dfrac{N-n}{N-1}}$ if the population is finite with size N. And $\hat{p}$ will have an approximately normal distribution if n is 'large.'
As before, this is an approximate distribution: the larger n is, the better the approximation. But the rule of thumb for deciding what value of n is large enough for the approximation to be considered acceptable is different from the one we used for the sampling distribution of $\bar{X}$ (where n is considered large enough if $n \ge 30$). Here, n should be large enough so that both np and n(1-p) are at least 5.
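The standard deviation formulas and the rule of thumb for the sample proportion can be sketched as two small helpers. The function names `proportion_se` and `normal_approx_ok` are illustrative, not taken from the text.

```python
import math


def proportion_se(p, n, N=None):
    """Standard deviation of the sample proportion.

    p: population proportion
    n: sample size
    N: population size; None means the population is treated as infinite
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        # Finite population: apply the finite population correction factor.
        se *= math.sqrt((N - n) / (N - 1))
    return se


def normal_approx_ok(p, n):
    """Rule of thumb from the text: both np and n(1 - p) are at least 5."""
    return n * p >= 5 and n * (1 - p) >= 5


print(f"{proportion_se(0.4, 100):.4f}")  # p = 0.4, n = 100, infinite population
print(normal_approx_ok(0.4, 100))        # np = 40 and n(1 - p) = 60
print(normal_approx_ok(0.01, 100))       # np = 1, too small for the approximation
```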
Note:
The two expressions for the standard deviation of $\hat{p}$ differ only by the Finite Population Correction Factor which, as before, can be safely ignored if $n/N < 0.05$.
We have presented the approximate distribution of $\hat{p}$. Its exact distribution can be obtained from the Binomial Distribution (not covered in this module).
Example 2
Let us return to the 'Part-time Job' example and answer the questions we have posed. Recall that we are planning to select a random sample of 100 students from a college where 40% of the students work at least part time. And we wish to calculate:
(a) What percentage of the students in the sample can be expected to have at least a part-time job.
(b) How much we can expect the sample proportion to vary from the population proportion.
(c) The probability of observing a sample proportion that is greater than 0.5.
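Answers:
(a) The desired quantity is the expected value of $\hat{p}$: $E(\hat{p}) = p = 0.4$. We can therefore expect 40% of the students in the sample to have at least a part-time job.
(b) The amount by which 'we expect the sample proportion to vary from the population proportion' is just another way of describing the Standard Deviation of $\hat{p}$ (or the Standard Error): $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{0.4(0.6)}{100}} \approx 0.0490$. Note that we do not use the FPCF even though the population is finite. Since N is not given, we have assumed that it is considerably larger than n.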
(c) Here we have p = 0.4, n = 100, and we wish to calculate $P(\hat{p} > 0.5)$. Since np = 100(0.4) = 40 and n(1-p) = 100(0.6) = 60 are both greater than 5, we can say that the sample proportion $\hat{p}$ has an approximately normal distribution, with mean 0.4 and standard deviation 0.0490, as calculated above. The desired probability can be approximated as follows:
$P(\hat{p} > 0.5) \approx P\left(Z > \dfrac{0.5 - 0.4}{0.0490}\right) = P(Z > 2.04) = 1 - 0.9793 = 0.0207$
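The answers to Example 2 can be reproduced numerically. This sketch again uses `statistics.NormalDist` in place of a Z table and assumes, as the example does, that the population is large enough to ignore the FPCF. Since the z-value is not rounded to two decimals, the probability may differ slightly from a table-based answer.

```python
import math
from statistics import NormalDist

p, n = 0.4, 100  # population proportion and sample size from Example 2

expected_value = p                # expected value of the sample proportion
se = math.sqrt(p * (1 - p) / n)   # standard error, FPCF ignored

# Normal approximation to P(sample proportion > 0.5).
prob_over_half = 1 - NormalDist(mu=p, sigma=se).cdf(0.5)

print(f"expected value : {expected_value}")
print(f"standard error : {se:.4f}")
print(f"P(p-hat > 0.5) : {prob_over_half:.4f}")
```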
REFERENCES:
Statistics for Business and Economics, 7th ed. (by Anderson, Sweeney, and Williams)
Section 7.4 pp. 259-261
Section 7.5 pp. 262-272
Section 7.6 pp. 273-278
Essentials of Statistics for Business and Economics, 2nd ed. (by Anderson, Sweeney, and Williams)
Section 7.4 pp. 257-260
Section 7.5 pp. 261-270
Section 7.6 pp. 271-276
The Basic Practice of Statistics, 2nd ed. (by David S. Moore)
Section 4.1 pp. 215-219
Section 4.3 pp. 236-249
Section 8.1 pp. 431-433