2.3 Exact Approaches to the Sampling Distribution

A second approach to constructing a sampling distribution has implicitly been demonstrated in the preceding section on bootstrapping (Section 2.1) and the section on probability distributions (Section 1.2.3). In these sections, we calculated the true sampling distribution of the proportion of yellow candies in a sample from the probabilities of the colours. If we know, or think we know, the proportion of yellow candies in the population, we can calculate exactly the probability that a sample of ten candies includes zero, one, two, three, or any other number up to ten yellow candies. Table 2.1 applies the same logic to a toss of three coins. See the section on discrete random variables for details (Section 1.2).

Table 2.1: Number of heads for a toss of three coins.

Outcome (heads)  Combination     Probability: Combination      Probability: Outcome
0                tail-tail-tail  1/2 * 1/2 * 1/2 = 1/8 = .125  1/8 = .125
1                tail-tail-head  1/2 * 1/2 * 1/2 = 1/8 = .125
1                head-tail-tail  1/2 * 1/2 * 1/2 = 1/8 = .125
1                tail-head-tail  1/2 * 1/2 * 1/2 = 1/8 = .125  3/8 = .375
2                head-head-tail  1/2 * 1/2 * 1/2 = 1/8 = .125
2                head-tail-head  1/2 * 1/2 * 1/2 = 1/8 = .125
2                tail-head-head  1/2 * 1/2 * 1/2 = 1/8 = .125  3/8 = .375
3                head-head-head  1/2 * 1/2 * 1/2 = 1/8 = .125  1/8 = .125
Total            8                                             1.000
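
The table can be reproduced with a few lines of code. The following is a minimal sketch, not a definitive implementation: it lists every combination of three fair coins, calculates the probability of each combination, and adds up the probabilities per number of heads. The function name heads_distribution and its parameters are my own choices for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_coins=3, p_head=Fraction(1, 2)):
    """Exact sampling distribution of the number of heads in n_coins tosses."""
    distribution = Counter()
    # List every possible combination, e.g. ('head', 'tail', 'tail').
    for combination in product(("head", "tail"), repeat=n_coins):
        n_heads = combination.count("head")
        # Independent tosses: multiply the probabilities of the single coins.
        probability = p_head ** n_heads * (1 - p_head) ** (n_coins - n_heads)
        # Combinations with the same number of heads belong to one outcome.
        distribution[n_heads] += probability
    return dict(distribution)


distribution = heads_distribution()
for n_heads in sorted(distribution):
    print(n_heads, distribution[n_heads], float(distribution[n_heads]))
```

Fractions are used instead of decimals so that the probabilities stay exact, in line with the idea of an exact approach.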

The calculated probabilities of all possible sample statistic outcomes give us an exact approach to the sampling distribution. Note that I use the word approach instead of approximation here: an approximation is only more or less similar to the true sampling distribution, whereas the sampling distribution that we obtain here is the true sampling distribution itself.

2.3.1 Exact approaches for categorical data

An exact approach lists and counts all possible combinations. This can only be done if we work with discrete or categorical variables. If the number of possible values is unlimited, as it is for a continuous variable, we cannot list all possible combinations.

A proportion is based on frequencies and frequencies are discrete (integer values), so we can use an exact approach to create a sampling distribution for one proportion such as the proportion of yellow candies in the example above. The exact approach uses the binomial probability formula to calculate probabilities. Consult the internet if you want to know this formula; we are not going to use it here.
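
For readers who prefer to see the calculation, here is a small sketch of the binomial probability applied to a sample of ten candies. The population proportion of 0.2 is an assumption that I make only for this illustration, and the names binomial_probability, N_CANDIES and P_YELLOW are hypothetical.

```python
from math import comb


def binomial_probability(k, n, p):
    """Probability of exactly k successes in n independent draws,
    each with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


N_CANDIES = 10   # sample size
P_YELLOW = 0.2   # ASSUMED proportion of yellow candies in the population

# Exact sampling distribution: one probability per possible number of
# yellow candies in the sample (0 up to and including 10).
sampling_distribution = {
    k: binomial_probability(k, N_CANDIES, P_YELLOW)
    for k in range(N_CANDIES + 1)
}

for k, probability in sampling_distribution.items():
    print(f"{k:2d} yellow (proportion {k / N_CANDIES:.1f}): {probability:.4f}")
```

With three draws and a probability of one half, the same function returns the coin-toss probabilities of Table 2.1, for example binomial_probability(1, 3, 0.5) for one head.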

Exact approaches are also available for the association between two categorical (nominal or ordinal) variables in a contingency table: Do some combinations of values for the two variables occur relatively frequently? For example, are yellow candies more often sticky than red candies? If candies are either sticky or not sticky and they have one of a limited set of colours, we have two categorical variables. We can create an exact probability distribution for the combination of colour and stickiness. Fisher's exact test is an example of an exact approach to the sampling distribution of the association between two categorical variables.
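
To give an impression of how such a test works, the sketch below calculates a two-sided exact probability for a two-by-two table. It lists all tables that have the same row and column totals as the observed table and adds up the probabilities of the tables that are as probable as or less probable than the observed one, which is one common convention for the two-sided version. The candy counts are invented for this illustration, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def table_probability(a, row1, row2, col1):
    """Probability of a table with `a` in the top-left cell, given fixed
    row totals (row1, row2) and first column total (col1)."""
    return Fraction(comb(row1, a) * comb(row2, col1 - a),
                    comb(row1 + row2, col1))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    # All values of the top-left cell that are possible with these totals.
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = table_probability(a, row1, row2, col1)
    probabilities = (table_probability(x, row1, row2, col1)
                     for x in range(lowest, highest + 1))
    # Sum over all tables that are at most as probable as the observed one.
    return sum(p for p in probabilities if p <= observed)


# INVENTED counts:        sticky  not sticky
#   yellow candies           6        2
#   red candies              1        7
p_value = fisher_exact_two_sided(6, 2, 1, 7)
print(p_value, float(p_value))
```

Exact fractions avoid rounding problems when two tables have the same probability.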

2.3.2 Computer-intensive

The exact approach can be applied to discrete variables because they have a limited number of values. Discrete variables are usually measured at the nominal or ordinal level. If the number of categories becomes large, a lot of computing time can be needed to calculate the probabilities of all possible sample statistic outcomes. Exact approaches are said to be computer-intensive.
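
The following sketch illustrates how quickly the work grows. It counts, for a sample of ten observations, how many ordered samples and how many distinct frequency tables exist for different numbers of categories. The function names are hypothetical, and the counts say nothing about the actual running time of any particular statistical package.

```python
from math import comb


def n_ordered_samples(n_categories, sample_size):
    """Number of ordered samples: each draw can be any of the categories."""
    return n_categories ** sample_size


def n_count_outcomes(n_categories, sample_size):
    """Number of distinct frequency tables: ways to divide sample_size
    observations over n_categories categories."""
    return comb(sample_size + n_categories - 1, n_categories - 1)


SAMPLE_SIZE = 10
for n_categories in (2, 5, 10):
    print(n_categories,
          n_ordered_samples(n_categories, SAMPLE_SIZE),
          n_count_outcomes(n_categories, SAMPLE_SIZE))
```

For three coins, n_ordered_samples(2, 3) gives the eight combinations of Table 2.1 and n_count_outcomes(2, 3) gives the four outcomes (zero to three heads).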

It is usually wise to limit the time that you allow your computer to work on an exact sampling distribution; otherwise, the calculation may keep your computer occupied for hours or days.
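
Statistical software usually offers such a time limit as a setting. If you write the enumeration yourself, a time budget can be sketched as follows. This is only an illustration under simplifying assumptions: all categories are assumed to be equally likely, and the function exact_distribution_with_budget and its parameters are hypothetical names.

```python
import time
from collections import Counter
from itertools import product


def exact_distribution_with_budget(categories, sample_size, statistic,
                                   max_seconds):
    """List all ordered samples and return the exact distribution of
    `statistic`, assuming EQUALLY LIKELY categories.
    Raises TimeoutError once max_seconds have passed."""
    deadline = time.monotonic() + max_seconds
    counts = Counter()
    for sample in product(categories, repeat=sample_size):
        # Stop instead of keeping the computer occupied for hours or days.
        if time.monotonic() >= deadline:
            raise TimeoutError("time limit reached before all samples were listed")
        counts[statistic(sample)] += 1
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def count_heads(sample):
    return sample.count("head")


# Three coins are finished almost immediately; a limit of 60 seconds is ample.
result = exact_distribution_with_budget(("head", "tail"), 3, count_heads, 60)
print(result)
```

If the limit is reached, a bootstrap or theoretical approximation of the sampling distribution is the obvious alternative.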