SAMPLING DISTRIBUTION

 

Sampling Distribution of the Sample Mean

Suppose that students' commuting time at a certain college, represented by the variable X, has mean $\mu = 29$ minutes and standard deviation $\sigma = 9$ minutes. There are 6000 students at the college and we are about to take a random sample of 40 students. Can we know in advance what value $\bar{X}$, the mean commuting time for the students in the sample, will take?

 

Of course not. By chance, our sample may contain a disproportionate number of students who live far away and thus have longer commuting times, resulting in a value of $\bar{X}$ that is larger than the population mean of 29 minutes. Or, also by chance, $\bar{X}$ may be smaller than $\mu$. The sample mean $\bar{X}$ is a random variable with many possible values.

 

The word 'random,' however, does not mean haphazard. The sample mean varies according to the laws of probability, and its long-run behaviour can be known in advance. This information is described in the sampling distribution of $\bar{X}$, which may be defined as follows:
  

The Sampling Distribution of the Sample Mean is the distribution of values taken by $\bar{X}$ over all possible samples of the same size from the same population.

 

 

To help us understand this concept, let us imagine that instead of taking just one sample, we repeat the sampling process infinitely many times. From each repetition, we calculate and record the value of $\bar{X}$, so that we have a great many observations of $\bar{X}$. We can then calculate the mean and standard deviation of all these $\bar{X}$'s. We can also draw a histogram to see the shape of their distribution. In other words, we can describe the sampling distribution of $\bar{X}$ using three of its most important characteristics: its mean, its standard deviation, and its shape.

 

Although no one can carry out this process in actuality, because no one can repeat a sampling process infinitely many times, the result can be obtained through the application of probability theory. In particular, theoretical statisticians have derived the following results for the sample mean (the derivation is beyond the scope of our discussion):
 

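The Mean and Standard Deviation of $\bar{X}$:
If we take a random sample of size n from a population with mean $\mu$ and standard deviation $\sigma$, the sample mean $\bar{X}$ will have mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ if the population size is infinite, or $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$ if the population is finite of size N.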
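
The thought experiment of repeated sampling can be imitated on a computer, with a few thousand repetitions standing in for "infinitely many." The sketch below is only an illustration. The gamma-shaped population of 6000 commuting times is an assumption (the text does not say how commuting times are distributed), and the names `population`, `sample_means`, and `theory_sd` are made up for this example. It compares the simulated mean and standard deviation of the sample mean with the finite-population result $\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population of N = 6000 commuting times. The gamma shape is an
# assumption made only for illustration; its parameters target mean 29, sd 9.
N, n = 6000, 40
shape = (29 / 9) ** 2   # gamma shape parameter: (mean / sd)^2
scale = 9 ** 2 / 29     # gamma scale parameter: sd^2 / mean
population = [random.gammavariate(shape, scale) for _ in range(N)]

mu = statistics.fmean(population)      # actual mean of the generated population
sigma = statistics.pstdev(population)  # actual standard deviation of the population

# Draw many samples without replacement and record each sample mean.
reps = 5000
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(reps)
]

sim_mean = statistics.fmean(sample_means)
sim_sd = statistics.stdev(sample_means)

# Theoretical standard deviation of the sample mean for a finite population.
theory_sd = sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))

print(f"population mean {mu:.3f}, mean of sample means {sim_mean:.3f}")
print(f"theoretical SD {theory_sd:.3f}, simulated SD {sim_sd:.3f}")
```

With a finite number of repetitions the simulated values only approximate the theoretical ones, and the agreement should tighten as `reps` grows.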

The Shape of the Distribution of $\bar{X}$:
If the population from which the sample is taken is normal, $\bar{X}$ will have a normal distribution (regardless of the sample size n). If the population is not normal, $\bar{X}$ will have an approximately normal distribution as long as n is 'large.'

These results are summarized in the Central Limit Theorem (CLT), which says:

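If we take a random sample of size n from a population with mean $\mu$ and standard deviation $\sigma$, and n is 'large,' then the sample mean $\bar{X}$ will have an at least approximately normal distribution with mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ if the population size is infinite, or $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$ if the population is finite of size N.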
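
The claim about shape can also be checked informally by simulation. The sketch below assumes a strongly right-skewed population (exponential with mean 10, chosen purely for illustration and not taken from the text) and counts how often the sample mean of n = 40 observations falls within one and two standard errors of the population mean. For a normal curve those proportions are about 0.683 and 0.954.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

# Assumed population: exponential with mean 10 (strongly right-skewed).
# For an exponential distribution the standard deviation equals the mean.
mu = sigma = 10.0
n = 40
reps = 4000


def sample_mean():
    """Mean of one random sample of size n from the assumed population."""
    return statistics.fmean([random.expovariate(1 / mu) for _ in range(n)])


means = [sample_mean() for _ in range(reps)]
se = sigma / math.sqrt(n)  # standard error for an infinite population

# Share of sample means within 1 and 2 standard errors of the population mean.
within_1 = sum(abs(m - mu) <= se for m in means) / reps
within_2 = sum(abs(m - mu) <= 2 * se for m in means) / reps

print(f"within 1 SE: {within_1:.3f} (normal curve: about 0.683)")
print(f"within 2 SE: {within_2:.3f} (normal curve: about 0.954)")
```

Rerunning the sketch with a much smaller n (say 3) should show a visibly poorer match, which is the sense in which n has to be 'large.'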

 

 

Note:

  1. The mean of $\bar{X}$ is also known as the Expected Value of $\bar{X}$, which is represented in notational form as $E(\bar{X})$ or $\mu_{\bar{X}}$. The standard deviation of $\bar{X}$ is also known as the standard error (SE), expressed in notational form as $\sigma_{\bar{X}}$.

  2. The distribution of $\bar{X}$ approaches normality as n approaches infinity. As a rule of thumb, the normal approximation is considered to be acceptable when $n \ge 30$.

  3. The two expressions for the standard deviation of $\bar{X}$ differ only by the factor $\sqrt{\frac{N - n}{N - 1}}$, where N is the population size and n is the sample size. This factor is called the Finite Population Correction Factor (FPCF). If N is considerably larger than n, this factor is close to 1. In such cases it is often ignored. As a rule of thumb, the FPCF can be dropped if n is less than 5% of N.

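  4. In practice, the population standard deviation $\sigma$ is often unknown. If n is 'large' we can substitute the sample standard deviation s in its place. The standard error will then be estimated by $\frac{s}{\sqrt{n}}$ for an infinite population, or $\frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$ for a finite population of size N.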
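
To get a feel for how little the FPCF matters when n is small relative to N, the following sketch tabulates the factor $\sqrt{\frac{N - n}{N - 1}}$ and the resulting standard error for several sample sizes. The helper names `fpcf` and `standard_error` are hypothetical, introduced only for this illustration. The figures $\sigma = 9$ and N = 6000 come from the commuting-time example, and the sample sizes 300 and 1500 are added here for comparison.

```python
import math


def fpcf(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))


def standard_error(sd, n, N=None):
    """Standard error of the sample mean.

    sd is the population standard deviation (or the sample value s used as
    an estimate). N=None means the population is treated as infinite.
    """
    se = sd / math.sqrt(n)
    return se if N is None else se * fpcf(N, n)


# Commuting-time figures from the text: sigma = 9, N = 6000.
print("n, n/N, FPCF, SE with FPCF, SE without FPCF")
for n in (40, 100, 300, 1500):
    print(
        n,
        round(n / 6000, 3),
        round(fpcf(6000, n), 4),
        round(standard_error(9, n, 6000), 4),
        round(standard_error(9, n), 4),
    )
```

At n = 300, which is exactly 5% of N = 6000, the factor is still above 0.97, so dropping it changes the standard error by less than 3%.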

With this knowledge of the sampling distribution of $\bar{X}$, we are now ready to calculate probabilities concerning its possible values.

 

Example 1:

Let us return to the Commuting Time example at the beginning of this section, where we take a random sample of 40 students from a student population of 6000. The average commuting time for the entire student population  is 29 minutes, and the standard deviation is 9 minutes. 

  1. Calculate the probability that the sample mean $\bar{X}$ will be within 2 minutes of the population mean.

  2. Is the above answer exact or approximate? Explain your answer.

  3. Recalculate the probability in part 1 using a sample size of 100 instead of 40.

  4. Recalculate the probability in part 3 without using the FPCF. Comment on your answer.

 

Solution:

  1. Here, we have $\mu = 29$ minutes, $\sigma = 9$ minutes, N = 6000 and n = 40. Since $n \ge 30$, we can apply the Central Limit Theorem (CLT) and state that the sample mean has an at least approximately normal distribution with mean $\mu_{\bar{X}} = \mu = 29$ and standard deviation $\sigma_{\bar{X}} = \frac{9}{\sqrt{40}} \sqrt{\frac{6000 - 40}{6000 - 1}} \approx 1.418$. The desired probability can then be expressed as $P(27 \le \bar{X} \le 31) = P\left(\frac{27 - 29}{1.418} \le Z \le \frac{31 - 29}{1.418}\right) = P(-1.41 \le Z \le 1.41) = 0.8414$, or about 84% (obtained from a table of Standard Normal Probabilities, or Z table). Recall that the probability of a normal random variable falling within a specified interval can be represented as area under the standard normal curve. In this example we can display the probability as follows:

     [Figure: standard normal curve with the area between z = -1.41 and z = 1.41 shaded]

  2. The above answer is exact if the distribution of commuting time is normal; otherwise it is approximate.

  3. Changing n from 40 to 100 changes the SE to $\sigma_{\bar{X}} = \frac{9}{\sqrt{100}} \sqrt{\frac{6000 - 100}{6000 - 1}} \approx 0.8925$, and the desired probability can be obtained by $P(27 \le \bar{X} \le 31) = P(-2.24 \le Z \le 2.24) = 0.9750$, or about 97.5%.

     Note that the probability of observing $\bar{X}$ within 2 minutes of $\mu$ increases as n increases. In other words, the distribution of the sample mean gets more concentrated around the population mean as the sample size increases.

  4. Without the FPCF, $\sigma_{\bar{X}} = \frac{9}{\sqrt{100}} = 0.9$, and the probability is obtained by $P(27 \le \bar{X} \le 31) = P(-2.22 \le Z \le 2.22) = 0.9736$, or about 97%.

     This is very close to the answer we obtained in part 3. As we mentioned before, when N is considerably larger than n, we can drop the FPCF without a significant loss of accuracy. Here, as n (= 100) is less than 5% of N (= 6000), we can apply the rule of thumb to drop the FPCF.
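
The three probabilities in this example can be cross-checked with a few lines of code instead of a Z table. The sketch below is one possible way to do it. The function name `prob_within` is hypothetical, and the finite-population standard error is taken to be $\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$. Because the code does not round z to two decimals, its results may differ from table-based answers in the third or fourth decimal place.

```python
import math
from statistics import NormalDist


def prob_within(mu, sigma, n, d, N=None):
    """P(|sample mean - mu| <= d) under the normal approximation."""
    se = sigma / math.sqrt(n)
    if N is not None:  # apply the finite population correction factor
        se *= math.sqrt((N - n) / (N - 1))
    dist = NormalDist(mu, se)
    return dist.cdf(mu + d) - dist.cdf(mu - d)


p40 = prob_within(29, 9, 40, 2, N=6000)     # part 1: n = 40, with FPCF
p100 = prob_within(29, 9, 100, 2, N=6000)   # part 3: n = 100, with FPCF
p100_no_fpcf = prob_within(29, 9, 100, 2)   # part 4: n = 100, FPCF dropped

print(f"n = 40 with FPCF:     {p40:.4f}")
print(f"n = 100 with FPCF:    {p100:.4f}")
print(f"n = 100 without FPCF: {p100_no_fpcf:.4f}")
```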

The Sampling Distribution of the Sample Proportion $\hat{p}$

 

So far, we have discussed the sampling distribution of the sample mean $\bar{X}$. But the sample mean is not the only statistic with a sampling distribution. In fact, all statistics, such as the sample median, the sample standard deviation, and the sample proportion, have sampling distributions. Of these, we will now explore the sampling distribution of the sample proportion $\hat{p}$.

Suppose we take a random sample of 100 students from a large college where 40% of the students work at least part time. What percentage of the students in the sample can we expect to have at least a part-time job? How much can the sample proportion $\hat{p}$ be expected to vary from the population proportion? And what is the probability of getting a sample where this proportion is greater than 50%? These are some of the questions that can be answered by looking at the sampling distribution of the sample proportion.

It can be shown that the Central Limit Theorem, which gave us the sampling distribution of the sample mean, can also be applied to the sample proportion $\hat{p}$. The content of the theorem is summarized below:

 

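The Central Limit Theorem for the Sample Proportion $\hat{p}$:

If we take a random sample of size n from a population with proportion p, then the sample proportion $\hat{p}$ will have mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$ if the population is infinite, and $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}}$ if the population is finite with size N.

In addition, $\hat{p}$ will have an approximately normal distribution if n is 'large.'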

 

As before, this is an approximate distribution: the larger n is, the better the approximation. But the rule of thumb for deciding what value of n is large enough for the approximation to be considered acceptable is different from the one we used for the sampling distribution of $\bar{X}$ (where n is considered large enough if $n \ge 30$). Here, n should be large enough so that both np and n(1-p) are at least 5.

 

Note:

  1. The two expressions for the standard deviation of $\hat{p}$ differ only by the Finite Population Correction Factor which, as before, can be safely ignored if n is no more than 5% of N ($n/N \le 0.05$).

  2. We have presented the approximate distribution of $\hat{p}$. Its exact distribution can be obtained from the Binomial Distribution (not covered in this module).
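
As with the sample mean, the behaviour of the sample proportion can be explored by simulation. The sketch below uses p = 0.4 and n = 100 from the part-time job example and assumes the population is large enough that each draw can be treated as independent, so no FPCF is applied. It compares the simulated mean and standard deviation of the sample proportion with p and $\sqrt{\frac{p(1 - p)}{n}}$. The function name `sample_proportion` is made up for this illustration.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

p, n = 0.4, 100  # figures from the part-time job example
reps = 5000


def sample_proportion():
    """Proportion of 'successes' in one sample of n independent draws."""
    return sum(random.random() < p for _ in range(n)) / n


props = [sample_proportion() for _ in range(reps)]

sim_mean = statistics.fmean(props)
sim_sd = statistics.stdev(props)
theory_sd = math.sqrt(p * (1 - p) / n)

print(f"mean of sample proportions {sim_mean:.4f} (theory {p})")
print(f"SD of sample proportions {sim_sd:.4f} (theory {theory_sd:.4f})")

# Rule of thumb for the normal approximation: np and n(1-p) both at least 5.
print("rule of thumb satisfied:", n * p >= 5 and n * (1 - p) >= 5)
```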

Example 2

Let us return to the 'Part-time Job' example and answer the questions we have posed. Recall that we are planning to select a random sample of 100 students from a college where 40% of the students work at least part time. And we wish to calculate:

  1. What percentage of the students in the sample can be expected to have at least a part time job. 

  2. How much we can expect the sample proportion to vary from the population proportion.

  3. The probability of observing a sample proportion that is greater than 0.5.

Answers:

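  1. The desired quantity is the expected value of $\hat{p}$: $E(\hat{p}) = p = 0.40$. That is, we can expect 40% of the students in the sample to have at least a part-time job.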

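  2. The amount by which 'we expect the sample proportion to vary from the population proportion' is just another way of describing the Standard Deviation of $\hat{p}$ (or the Standard Error): $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.4(0.6)}{100}} \approx 0.0490$.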

    Note that we do not use the FPCF even though the population is finite. Since N is not given, we have assumed that it is considerably larger than n.

  3. Here we have p = 0.4, n = 100, and we wish to calculate $P(\hat{p} > 0.5)$. Since np = 100(0.4) = 40 and n(1-p) = 100(0.6) = 60 are both greater than 5, we can say that the sample proportion has an approximately normal distribution, with mean 0.4 and standard deviation $\sqrt{\frac{0.4(0.6)}{100}} \approx 0.0490$. The desired probability can be approximated as follows:

     $P(\hat{p} > 0.5) = P\left(Z > \frac{0.5 - 0.4}{0.0490}\right) = P(Z > 2.04) = 1 - 0.9793 = 0.0207$, or about 2%.
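
The calculation in this example can be reproduced in code. The sketch below computes the standard error $\sqrt{\frac{p(1 - p)}{n}}$ and the normal approximation to the probability that the sample proportion exceeds 0.5. As an extra check that goes slightly beyond this module, it also evaluates the exact binomial probability of more than 50 successes in 100 draws, assuming independent draws from a very large population. The variable names are illustrative only.

```python
import math
from statistics import NormalDist

p, n = 0.4, 100
se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion

# Normal approximation to P(sample proportion > 0.5).
approx = 1 - NormalDist(p, se).cdf(0.5)

# Exact binomial probability of more than 50 successes in n = 100 draws.
exact = sum(
    math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(51, n + 1)
)

print(f"standard error       = {se:.4f}")
print(f"normal approximation = {approx:.4f}")
print(f"exact binomial       = {exact:.4f}")
```

The two probabilities need not agree closely this far out in the tail. Both are small, which is the practical conclusion of the example.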

 

 

 




REFERENCES:

Statistics for Business and Economics, 7th ed. (by Anderson, Sweeney, and Williams)

Section 7.4 pp. 259-261

Section 7.5 pp. 262-272   

Section 7.6 pp. 273-278

Essentials of Statistics for Business and Economics, 2nd ed. (by Anderson, Sweeney, and Williams)

Section 7.4 pp. 257-260

Section 7.5 pp. 261-270

Section 7.6 pp. 271-276

The Basic Practice of Statistics, 2nd ed. (by David S. Moore)

Section 4.1 pp. 215-219

Section 4.3 pp. 236-249

Section 8.1 pp. 431-433