Sampling Distribution Theory II

Sampling Distribution of Proportion:

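  1. The sample proportion p, whose distribution over all possible samples is the sampling distribution of proportion, is defined as:

$$p = \frac{x}{n}$$

Where x is the number of successes (values with a specified characteristic) in a sample of size n.

  2. If the sampling procedure is simple random with replacement, x is a binomial random variable with parameters n and π, where π is the probability of success. π can also be interpreted as the population proportion, since:

$$\pi = \frac{X}{N}$$

where X is the number of successes in a population of size N.

  3. To determine the mean and variance of p:

Infinite Population with Replacement:

$$\mu_p = \pi, \qquad \sigma_p^2 = \frac{\pi(1-\pi)}{n}$$

or alternatively

$$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$$

Finite Population without Replacement:

$$\mu_p = \pi, \qquad \sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} \cdot \sqrt{\frac{N-n}{N-1}}$$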
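The standard formulas for the mean and standard error of p can be sketched in a few lines of Python. The function name `proportion_mean_se` and the numbers used (π = 0.4, n = 100, N = 500) are illustrative assumptions, not values from these notes.

```python
import math

def proportion_mean_se(pi, n, N=None):
    """Return (mean, standard error) of the sample proportion p.

    pi : population proportion
    n  : sample size
    N  : population size; when given, sampling is assumed to be without
         replacement and the finite population correction is applied.
    """
    se = math.sqrt(pi * (1 - pi) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))  # finite population correction
    return pi, se

# Illustrative values only (not from the notes).
mean_inf, se_inf = proportion_mean_se(0.4, 100)
mean_fin, se_fin = proportion_mean_se(0.4, 100, N=500)
print(round(se_inf, 4), round(se_fin, 4))
```

The correction factor is always below 1 when n > 1, so the standard error without replacement is smaller than the one with replacement.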

Example:

A coordination team consists of seven members. The education of each member is as follows (G = Graduate, PG = Post Graduate):

Member      1    2    3    4    5    6    7
Education   G    PG   PG   PG   PG   G    G

(i) Determine the proportion of post-graduates in the population.

(ii) Select all possible samples of two members from the population without replacement, and compute the proportion of post-graduate members in each sample.

(iii) Compute the mean ($\mu_p$) and the SD ($\sigma_p$) of the sample proportions computed in (ii).

Solution:

(i) Proportion of PG in the population:

N = 7

No. of PG = 4

π = 4/7 = 0.571

(ii) No. of possible samples (without replacement) = $\binom{N}{n} = \binom{7}{2} = 21$ samples.

1,2   1,3   1,4   1,5   1,6   1,7
      2,3   2,4   2,5   2,6   2,7
            3,4   3,5   3,6   3,7
                  4,5   4,6   4,7
                        5,6   5,7
                              6,7

The corresponding sample proportions are:

0.5   0.5   0.5   0.5   0     0
      1     1     1     0.5   0.5
            1     1     0.5   0.5
                  1     0.5   0.5
                        0.5   0.5
                              0

Sampling Distribution of Proportion

p       Tally Marks       f     P(p)
0       |||               3     3/21 = 1/7 = 0.143
0.5     ||||| ||||| ||    12    12/21 = 4/7 = 0.571
1       ||||| |           6     6/21 = 2/7 = 0.286
Total                     21    1

 

p       P(p)     p·P(p)    p - μp     (p - μp)²   (p - μp)²·P(p)   p²·P(p)
0       0.143    0         -0.5715    0.32661     0.04671          0
0.5     0.571    0.2855    -0.0715    0.00511     0.00292          0.14275
1       0.286    0.286     0.4285     0.18361     0.05251          0.286
Total            0.5715                           0.10214          0.42875

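(iii) Mean ($\mu_p$) and SD ($\sigma_p$) of the sampling distribution of the proportion:

$$\mu_p = \sum p \, P(p) = 0.5715$$

$$\sigma_p = \sqrt{\sum (p - \mu_p)^2 \, P(p)} = \sqrt{0.10214} = 0.3196$$

or alternatively

$$\sigma_p = \sqrt{\sum p^2 P(p) - \left[\sum p \, P(p)\right]^2} = \sqrt{0.42875 - (0.5715)^2} = \sqrt{0.10214} = 0.3196$$

The results are verified as below:

$$\mu_p = \pi = \frac{4}{7} = 0.571$$

$$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} \cdot \sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{(4/7)(3/7)}{2}} \cdot \sqrt{\frac{7-2}{7-1}} = \sqrt{0.10204} = 0.3194$$

The small differences from the tabulated values come from rounding P(p) to three decimals.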
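The worked example can be reproduced by brute force. The sketch below enumerates all 21 samples with `itertools.combinations` and uses exact fractions so that no rounding enters the comparison. The variable names are illustrative, and the finite-population formula it checks against is the standard one.

```python
from itertools import combinations
from fractions import Fraction
import math

# Education of the seven members, as given in the example.
education = {1: "G", 2: "PG", 3: "PG", 4: "PG", 5: "PG", 6: "G", 7: "G"}
N, n = len(education), 2

# Population proportion of post-graduates.
pi = Fraction(sum(e == "PG" for e in education.values()), N)

# All samples of size n without replacement, and the proportion in each.
samples = list(combinations(education, n))
props = [Fraction(sum(education[m] == "PG" for m in s), n) for s in samples]

# Mean and variance of the sampling distribution of p.
mean_p = sum(props) / len(props)
var_p = sum((p - mean_p) ** 2 for p in props) / len(props)

# Standard formula with the finite population correction.
var_formula = pi * (1 - pi) / n * Fraction(N - n, N - 1)

print(len(samples), mean_p, var_p, math.sqrt(var_p))
```

With exact arithmetic the enumerated variance and the formula agree exactly (both equal 5/49), which confirms that the gap between 0.3196 and 0.3194 in the hand calculation is a rounding effect.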

Shape of the Sampling Distribution of Proportion p:

The central limit theorem also holds for the random variable p:

(i) The sampling distribution of the proportion p approaches a normal distribution with mean $\mu_p = \pi$ and SD $\sigma_p = \sqrt{\pi(1-\pi)/n}$ (with replacement) as n increases.

(ii) If the random sampling is without replacement and the sampling fraction $n/N > 0.05$, the finite population correction (f.p.c.) must be used in the formula of the SD:

$$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} \cdot \sqrt{\frac{N-n}{N-1}}$$

(iii) When n ≥ 50 and both nπ and n(1 – π) are greater than 5, the sampling distribution can be considered normal.

(iv) When the distribution of p is normal, the following statistic is a standard normal variable:

$$z = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$$
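A minimal sketch of rules (ii) to (iv) follows. The 0.05 sampling-fraction threshold is the commonly quoted rule of thumb and is treated here as an assumption. The function names and the numbers (π = 0.3, n = 200, observed p = 0.35) are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_approx_ok(pi, n):
    """Rule (iii): n >= 50 and both n*pi and n*(1 - pi) greater than 5."""
    return n >= 50 and n * pi > 5 and n * (1 - pi) > 5

def proportion_z(p, pi, n, N=None):
    """Rule (iv): standardise a sample proportion p.

    If a population size N is given and n/N exceeds 0.05 (assumed
    threshold), the finite population correction of rule (ii) is applied.
    """
    se = math.sqrt(pi * (1 - pi) / n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return (p - pi) / se

# Hypothetical numbers, for illustration only.
pi, n = 0.3, 200
if normal_approx_ok(pi, n):
    z = proportion_z(0.35, pi, n)
    prob = 1 - normal_cdf(z)  # P(p > 0.35) under the normal approximation
    print(round(z, 3), round(prob, 4))
```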

Sampling Distribution of Difference between Two Proportions:

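  1. If two random samples of sizes $n_1$ and $n_2$ are drawn independently from two populations with proportions $\pi_1$ and $\pi_2$, the sampling distribution of $(p_1 - p_2)$, the difference between the two sample proportions, approaches a normal distribution with:

$$\mu_{p_1 - p_2} = \pi_1 - \pi_2, \qquad \sigma_{p_1 - p_2} = \sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}$$

as $n_1$ and $n_2$ increase.

Moreover:

$$z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\dfrac{\pi_1(1-\pi_1)}{n_1} + \dfrac{\pi_2(1-\pi_2)}{n_2}}}$$

will be a standard normal variable.

  2. For unknown $\pi_1$ and $\pi_2$, the sample estimates $p_1$ and $p_2$ are used, thus:

$$\hat{\sigma}_{p_1 - p_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

  3. When the two unknown population proportions can be assumed equal, a pooled estimate $\hat{p}$ is obtained as below:

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

and the estimated standard error as below:

$$\hat{\sigma}_{p_1 - p_2} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$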
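The two estimated standard errors for a difference of proportions can be compared with a short sketch. The function name `two_proportion_z` and the counts used (60 of 100 and 45 of 100) are hypothetical. The pooled branch assumes the two population proportions are equal, as in the last case above.

```python
import math

def two_proportion_z(x1, n1, x2, n2, pooled=True, diff=0.0):
    """z statistic for p1 - p2 against a hypothesised difference `diff`.

    pooled=True  : standard error from the pooled estimate of the proportion
    pooled=False : standard error from the separate estimates p1 and p2
    Returns (z, standard_error).
    """
    p1, p2 = x1 / n1, x2 / n2
    if pooled:
        p_hat = (x1 + x2) / (n1 + n2)
        se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2 - diff) / se, se

# Hypothetical counts, for illustration only.
z_pooled, se_pooled = two_proportion_z(60, 100, 45, 100, pooled=True)
z_unpooled, se_unpooled = two_proportion_z(60, 100, 45, 100, pooled=False)
print(round(se_pooled, 4), round(se_unpooled, 4))
```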

Sampling Distribution of t:

  1. If a random sample of size n is drawn from a normal population with known mean μ and SD σ, the sampling distribution of the sample mean $\bar{X}$ is a normal distribution with mean $\mu_{\bar{X}} = \mu$ and standard error $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, and hence z is a standard normal variable:

$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

  2. But when the population SD σ is unknown, σ is replaced by the sample SD S:

$$\hat{\sigma} = S$$

Therefore, the standard error is equal to:

$$S_{\bar{X}} = \frac{S}{\sqrt{n}}$$

  3. Following W. S. Gosset, the following statistic is denoted by t instead of z; it follows another distribution known as Student's t-distribution, or simply the t-distribution:

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

  4. The sample standard deviation is given by:

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

In the above equation, (n – 1) is called the degrees of freedom, or simply d.f., with which the t-value is read from the t-table.

  5. The t-distribution approaches the standard normal distribution as n increases. Typically, when n > 30, the t-distribution is considered approximately standard normal.
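The t statistic described above can be computed directly with the standard library. This is a sketch only: the function name `one_sample_t`, the sample values, and the hypothesised mean of 12 are made up for illustration. Critical values would still be read from a t-table at the returned degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """Return (t, degrees_of_freedom) for a sample and a hypothesised mean mu."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)   # sample SD with divisor n - 1
    se = s / math.sqrt(n)          # estimated standard error of the mean
    return (mean - mu) / se, n - 1

# Hypothetical data, for illustration only.
sample = [12, 15, 11, 14, 13]
t, df = one_sample_t(sample, mu=12)
print(round(t, 4), df)
```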

Properties of t-distribution:

  1. The t-distribution, like the standard normal, is bell shaped, unimodal and symmetrical about the mean.
  2. There is a different t-distribution for every possible sample size.
  3. The exact shape of the t-distribution depends on a single parameter, the number of degrees of freedom, denoted by ν.
  4. As the sample size increases, the shape of the t-distribution becomes approximately equal to that of the standard normal distribution.
  5. The mean and standard deviation of the t-distribution are:

$$\mu_t = 0, \qquad \sigma_t = \sqrt{\frac{\nu}{\nu - 2}} \quad (\nu > 2)$$

Sampling Distribution of Variances:

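Population Variance:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

or alternatively

$$\sigma^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2$$

Mean of sampling distribution of $S^2$ ($\mu_{S^2}$), for sampling with replacement and with the sample variance computed as $S^2 = \sum (X - \bar{X})^2 / n$:

$$\mu_{S^2} = \frac{n-1}{n}\,\sigma^2$$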

Example:

A population consists of the following numbers: 1, 3, 5, 7. Find the population variance ($\sigma^2$) and the mean of the sampling distribution of variances ($\mu_{S^2}$), if all possible samples of size 2 are drawn with replacement from the population.

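Solution:

$$\mu = \frac{1+3+5+7}{4} = 4, \qquad \sigma^2 = \frac{(1-4)^2 + (3-4)^2 + (5-4)^2 + (7-4)^2}{4} = \frac{20}{4} = 5$$

No. of possible samples (with replacement) = $N^n = 4^2 = 16$ samples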

Samples:

1,1   1,3   1,5   1,7
3,1   3,3   3,5   3,7
5,1   5,3   5,5   5,7
7,1   7,3   7,5   7,7

Means of samples:

1     2     3     4
2     3     4     5
3     4     5     6
4     5     6     7

Variances of samples (computed with divisor n, i.e. $S^2 = \sum (X - \bar{X})^2 / n$):

0     1     4     9
1     0     1     4
4     1     0     1
9     4     1     0

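Sampling Distribution of $S^2$:

S²      Tally Marks    f     f·S²
0       ||||           4     0
1       ||||| |        6     6
4       ||||           4     16
9       ||             2     18
Total                  16    40

$$\mu_{S^2} = \frac{\sum f \cdot S^2}{\sum f} = \frac{40}{16} = 2.5$$

This equals $\frac{n-1}{n}\,\sigma^2 = \frac{1}{2}(5) = 2.5$, since $\sigma^2 = 5$ for this population.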
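The example can be checked by enumerating all 16 samples. The sketch below uses `itertools.product` for sampling with replacement and exact fractions. It computes each sample variance with divisor n, which is what the listed sample variances imply (for instance, the sample 1,3 has variance 1). The variable and function names are illustrative.

```python
from itertools import product
from fractions import Fraction

population = [1, 3, 5, 7]
N, n = len(population), 2

# Population mean and variance (divisor N).
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

def sample_variance(sample):
    """Variance of one sample with divisor n, as used in the example."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / len(sample)

# All ordered samples of size n drawn with replacement.
samples = list(product(population, repeat=n))
variances = [sample_variance(s) for s in samples]

# Mean of the sampling distribution of the variance.
mean_s2 = sum(variances) / len(variances)
print(len(samples), sigma2, mean_s2)
```

The enumerated mean of the sample variances equals (n - 1)/n times the population variance, so a variance computed with divisor n underestimates σ² on average. That is the motivation for the (n - 1) divisor.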

Pooled Estimate of Variance:

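  1. If random samples of size $n_1$ and $n_2$ are drawn independently from two normal populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, the sampling distribution of the difference between the sample means $(\bar{X}_1 - \bar{X}_2)$ follows a normal distribution with mean and standard error given as below:

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2, \qquad \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Thus, the statistic z will be equal to:

$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

and it will be a standard normal variable.

  2. But if $\sigma_1^2$ and $\sigma_2^2$ are unknown and equal, their estimators $S_1^2$ and $S_2^2$ are defined as:

$$S_1^2 = \frac{\sum (X_1 - \bar{X}_1)^2}{n_1 - 1}, \qquad S_2^2 = \frac{\sum (X_2 - \bar{X}_2)^2}{n_2 - 1}$$

When $\sigma_1^2$ and $\sigma_2^2$ are replaced by the estimators $S_1^2$ and $S_2^2$, the distribution of $(\bar{X}_1 - \bar{X}_2)$ can be standardised provided that the samples are large ($n_1$ and $n_2$ > 30):

$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

  3. But when the samples are small ($n_1$ and $n_2$ ≤ 30), $\sigma_1^2$ and $\sigma_2^2$ are replaced by a single estimator known as the pooled variance, denoted by $S_p^2$:

$$S_p^2 = \frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}$$

Weighted Average of $S_1^2$ and $S_2^2$:

$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$$

Where $(n_1 + n_2 - 2)$ is the number of degrees of freedom.

  4. With samples of the same size ($n_1 = n_2$), the estimator $S_p^2$ is the simple average of $S_1^2$ and $S_2^2$:

$$S_p^2 = \frac{S_1^2 + S_2^2}{2}$$

  5. The pooled variance $S_p^2$ assumes that the population variances are unknown but equal. However, the same $S_p^2$ is used to replace $\sigma_1^2$ and $\sigma_2^2$ for slightly unequal population variances, provided that the samples are of equal size, i.e., $n_1 = n_2$.
  6. In both of the above situations, i.e., equal population variances, or slightly unequal population variances with equal sample sizes ($n_1 = n_2$), the statistic t is calculated as below:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

Where $S_p$ is the pooled SD.

  7. Now consider the situation where $\sigma_1^2$ and $\sigma_2^2$ are considerably different (both unknown) and it is impossible to draw samples of equal size. The statistic used in this case would be:

$$t' = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

Where the number of degrees of freedom ν is as follows:

$$\nu = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}$$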
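The two small-sample statistics can be sketched side by side. The unequal-variance branch assumes the usual Welch-Satterthwaite approximation for ν. The function names `pooled_t` and `welch_t` and the two data sets are hypothetical and serve only as illustration.

```python
import math
import statistics

def pooled_t(sample1, sample2, mu_diff=0.0):
    """t with a pooled variance; returns (t, degrees_of_freedom, pooled_variance)."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq = statistics.variance(sample1)   # divisor n1 - 1
    s2_sq = statistics.variance(sample2)   # divisor n2 - 1
    # Weighted average of the two sample variances.
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    return (diff - mu_diff) / se, n1 + n2 - 2, sp_sq

def welch_t(sample1, sample2, mu_diff=0.0):
    """t for unequal variances; df from the Welch-Satterthwaite approximation (assumed)."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1
    v2 = statistics.variance(sample2) / n2
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    t = (diff - mu_diff) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical data, for illustration only.
a = [10, 12, 14, 16]
b = [9, 11, 13]
t_pooled, df_pooled, sp_sq = pooled_t(a, b)
t_welch, df_welch = welch_t(a, b)
print(round(t_pooled, 4), df_pooled, round(t_welch, 4), round(df_welch, 2))
```

The approximate ν is generally not an integer and never exceeds n1 + n2 - 2. In practice it is rounded down before the t-table is consulted.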
