What is the distribution of values of a statistic from all possible samples of size n?

Sampling Distribution of a Statistic

What does "sampling distribution" mean?

Does the type of sampling matter?

What are some examples of sampling distributions that we often look at in statistics courses?

What does "sampling distribution" mean?

One of our main goals in statistics is to take some data from a sample, compute something (a statistic), and make an inference about some value (a parameter) of the population.

Before we can make such an inference, we must do some theoretical work to figure out what sorts of values we'd expect to see for the statistic. In other words, what is the distribution of values for the statistic?

The name of that distribution is the sampling distribution of the statistic.

The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

Students often find this a hard concept. The idea that we might have to list and study "all possible samples" is mind-boggling. Surely we won't have to do this! Well, yes we do, in a way. Usually, we approximate the sampling distribution by taking "a lot" of possible samples. At the beginning of the chapter on sampling distributions in Basic Practice of Statistics, the author takes 1000 possible samples and uses them to approximate the sampling distribution of the sample proportion. That is a simulation of the sampling distribution. In the project on the sampling distribution of the sample mean, found in these Web pages, you will list all the possible samples. That is possible only because the population is very small and the sample size is tiny.
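The simulation idea can be sketched in a few lines of Python. This is only an illustration, not the textbook's own example: the population proportion 0.6 and the sample size 100 are assumed values, and each draw is treated as an independent success or failure (reasonable when the population is much larger than the sample).

```python
import random
import statistics

# Assumed setup (not from the text): 60% of the population are "successes",
# and each sample has 100 individuals.
TRUE_P = 0.6
SAMPLE_SIZE = 100
NUM_SAMPLES = 1000  # "a lot" of samples

rng = random.Random(42)  # fixed seed so the run is reproducible


def sample_proportion(rng, p, n):
    """Draw one sample of size n and return its proportion of successes."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n


# Each entry is the statistic (p-hat) computed from one simulated sample.
p_hats = [sample_proportion(rng, TRUE_P, SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

print("mean of simulated p-hats:", round(statistics.mean(p_hats), 3))
print("std. dev. of simulated p-hats:", round(statistics.stdev(p_hats), 3))
```

A histogram of `p_hats` would be the simulated approximation to the sampling distribution of the sample proportion.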

The main point of the textbook chapter on sampling distributions is always to give you some theoretical results about the most important sampling distributions you will encounter in the rest of the course. After you learn those theoretical results, you won't have to bother with simulating entire sampling distributions when you do your work. Occasionally, though, statisticians run across a statistic whose sampling distribution they don't already know, and they need to figure it out from basic principles. Either they can use some of the mathematics they have already learned (usually in M378K or M384C), or they fall back on computer simulations to get an idea of what the sampling distribution looks like and what its mean and variance are.

Does the type of sampling matter in generating the sampling distribution?

YES! "All possible samples" means all possible outcomes for one type of sampling, such as an SRS. So, strictly speaking, we should not talk about "the sampling distribution." Instead, we should say something like "the sampling distribution of the sample mean for SRS's of size 30".
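To see that the type of sampling changes the answer, here is a small Python sketch that lists every possible sample of size 2 under two sampling schemes. The three-value population is made up purely for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

population = [1, 2, 3]  # hypothetical values, chosen only for illustration
n = 2


def distribution_of_mean(samples):
    """Map each possible sample mean to its probability (equally likely samples)."""
    samples = list(samples)
    counts = Counter(Fraction(sum(s), len(s)) for s in samples)
    return {mean: Fraction(c, len(samples)) for mean, c in sorted(counts.items())}


# Without replacement: every unordered pair of distinct units is equally likely.
without_replacement = distribution_of_mean(combinations(population, n))

# With replacement: every ordered pair, repeats allowed, is equally likely.
with_replacement = distribution_of_mean(product(population, repeat=n))

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

Same population, same statistic, same sample size, yet the two distributions differ: sampling with replacement allows sample means of 1 and 3, which cannot occur without replacement.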

What are some sampling distributions we often analyze in a statistics course?

  • Sampling distribution of the sample mean for SRS's of size n (Recall the Central Limit Theorem. That's the main result about these.)
  • Sampling distribution of the sample proportion for SRS's of size n (Recall the formula for the mean and standard deviation of p-hat.)
  • My project also has you look at the sampling distribution of the sample range for an SRS of size 2. The point of this is to illustrate that not all statistics are good estimators of their corresponding parameter. When you sketch this sampling distribution, you see that the center of that sampling distribution (also known as the expected value of the statistic) is nowhere near the parameter value. This shows that the sample range would be a very poor estimator of the population range.
  • Sampling distribution of the sample count of "successes" in a sample when we measure values on a 0-1 random variable. You should recognize this as having a binomial distribution.
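The point about the sample range can be checked by brute force. The sketch below uses a made-up population of five values (not the project's data), lists every SRS of size 2, and compares the expected value of the sample range with the population range.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4, 5]  # hypothetical values, not the project's data
population_range = max(population) - min(population)

# All SRS's of size 2 (without replacement); each is equally likely.
samples = list(combinations(population, 2))
sample_ranges = [max(s) - min(s) for s in samples]

# Expected value of the statistic = center of its sampling distribution.
expected_range = Fraction(sum(sample_ranges), len(samples))

print("population range:", population_range)
print("expected sample range:", expected_range)
```

For this made-up population the expected sample range is 2 while the population range is 4, so the sample range systematically underestimates the parameter.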

Mary Parker

In the following example, we illustrate the sampling distribution of the sample mean for a very small population. Sampling is done without replacement.

Sample Means with a Small Population: Pumpkin Weights

In this example, the population consists of the weights of six pumpkins (in pounds) displayed in a carnival "guess the weight" game booth. You are asked to guess the average weight of the six pumpkins by taking a random sample without replacement from the population.

| Pumpkin | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| Weight (in pounds) | 19 | 14 | 15 | 9 | 10 | 17 |

Since we know the weights from the population, we can find the population mean.

\(\mu=\dfrac{19+14+15+9+10+17}{6}=14\) pounds

To demonstrate the sampling distribution, let's start by obtaining all of the possible samples of size \(n=2\) from the population, sampling without replacement. The table below shows all the possible samples, the weights of the chosen pumpkins, the sample mean, and the probability of obtaining each sample. Since we are drawing at random, each sample has the same probability of being chosen.

| Sample | Weights | \(\boldsymbol{\bar{x}}\) | Probability |
|---|---|---|---|
| A, B | 19, 14 | 16.5 | \(\frac{1}{15}\) |
| A, C | 19, 15 | 17.0 | \(\frac{1}{15}\) |
| A, D | 19, 9 | 14.0 | \(\frac{1}{15}\) |
| A, E | 19, 10 | 14.5 | \(\frac{1}{15}\) |
| A, F | 19, 17 | 18.0 | \(\frac{1}{15}\) |
| B, C | 14, 15 | 14.5 | \(\frac{1}{15}\) |
| B, D | 14, 9 | 11.5 | \(\frac{1}{15}\) |
| B, E | 14, 10 | 12.0 | \(\frac{1}{15}\) |
| B, F | 14, 17 | 15.5 | \(\frac{1}{15}\) |
| C, D | 15, 9 | 12.0 | \(\frac{1}{15}\) |
| C, E | 15, 10 | 12.5 | \(\frac{1}{15}\) |
| C, F | 15, 17 | 16.0 | \(\frac{1}{15}\) |
| D, E | 9, 10 | 9.5 | \(\frac{1}{15}\) |
| D, F | 9, 17 | 13.0 | \(\frac{1}{15}\) |
| E, F | 10, 17 | 13.5 | \(\frac{1}{15}\) |

We can combine all of the values and create a table of the possible values and their respective probabilities.

| \(\boldsymbol{\bar{x}}\) | Probability |
|---|---|
| 9.5 | \(\frac{1}{15}\) |
| 11.5 | \(\frac{1}{15}\) |
| 12.0 | \(\frac{2}{15}\) |
| 12.5 | \(\frac{1}{15}\) |
| 13.0 | \(\frac{1}{15}\) |
| 13.5 | \(\frac{1}{15}\) |
| 14.0 | \(\frac{1}{15}\) |
| 14.5 | \(\frac{2}{15}\) |
| 15.5 | \(\frac{1}{15}\) |
| 16.0 | \(\frac{1}{15}\) |
| 16.5 | \(\frac{1}{15}\) |
| 17.0 | \(\frac{1}{15}\) |
| 18.0 | \(\frac{1}{15}\) |

The table is the probability table for the sample mean and it is the sampling distribution of the sample mean weights of the pumpkins when the sample size is 2. It is also worth noting that the sum of all the probabilities equals 1. It might be helpful to graph these values.
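The enumeration above is easy to reproduce by computer. The following Python sketch is one possible way to do it: it lists every sample of two pumpkins, computes each sample mean, and tallies the probabilities. The variable names are just for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

weights = {"A": 19, "B": 14, "C": 15, "D": 9, "E": 10, "F": 17}  # pounds
n = 2

# Every unordered sample of n pumpkins; each one is equally likely.
samples = list(combinations(weights, n))

# Count how many samples produce each value of the sample mean.
counts = Counter(Fraction(sum(weights[p] for p in s), n) for s in samples)

# Sampling distribution: sample mean -> probability.
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for mean, prob in sampling_dist.items():
    print(f"{float(mean):5.1f}  {prob}")

print("total probability:", sum(sampling_dist.values()))
```

Exact fractions are used instead of floating-point numbers so that the probabilities print as 1/15 and 2/15 and their sum is exactly 1.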

[Figure: "Sampling Distribution". Bar chart of the probability of each possible sample mean (9.5, 11.5, 12, 12.5, 13, 13.5, 14, 14.5, 15.5, 16, 16.5, 17, 18) for samples of size 2.]

One can see that the chance that the sample mean is exactly the population mean is only 1 in 15, very small. (In some other examples, it may happen that the sample mean can never be the same value as the population mean.) When using the sample mean to estimate the population mean, some possible error will be involved since the sample mean is random.

Now that we have the sampling distribution of the sample mean, we can calculate the mean of all the sample means. In other words, we can find the mean (or expected value) of all the possible \(\bar{x}\)’s.

The mean of the sample means is

\(\begin{aligned}\mu_{\bar{x}}=\sum \bar{x}_{i}f(\bar{x}_i)&=9.5\left(\frac{1}{15}\right)+11.5\left(\frac{1}{15}\right)+12\left(\frac{2}{15}\right)\\&\quad+12.5\left(\frac{1}{15}\right)+13\left(\frac{1}{15}\right)+13.5\left(\frac{1}{15}\right)+14\left(\frac{1}{15}\right)\\&\quad+14.5\left(\frac{2}{15}\right)+15.5\left(\frac{1}{15}\right)+16\left(\frac{1}{15}\right)+16.5\left(\frac{1}{15}\right)\\&\quad+17\left(\frac{1}{15}\right)+18\left(\frac{1}{15}\right)=14\end{aligned}\)

Even though each sample may give you an answer involving some error, the expected value is right at the target: exactly the population mean. In other words, if one does the experiment over and over again, the overall average of the sample mean is exactly the population mean.

Now, let's do the same thing as above but with sample size \(n=5\)

| Sample | Weights | \(\boldsymbol{\bar{x}}\) | Probability |
|---|---|---|---|
| A, B, C, D, E | 19, 14, 15, 9, 10 | 13.4 | 1/6 |
| A, B, C, D, F | 19, 14, 15, 9, 17 | 14.8 | 1/6 |
| A, B, C, E, F | 19, 14, 15, 10, 17 | 15.0 | 1/6 |
| A, B, D, E, F | 19, 14, 9, 10, 17 | 13.8 | 1/6 |
| A, C, D, E, F | 19, 15, 9, 10, 17 | 14.0 | 1/6 |
| B, C, D, E, F | 14, 15, 9, 10, 17 | 13.0 | 1/6 |

The sampling distribution is:

| \(\boldsymbol{\bar{x}}\) | Probability |
|---|---|
| 13.0 | 1/6 |
| 13.4 | 1/6 |
| 13.8 | 1/6 |
| 14.0 | 1/6 |
| 14.8 | 1/6 |
| 15.0 | 1/6 |

The mean of the sample means is

\(\mu_{\bar{x}}=\left(\dfrac{1}{6}\right)(13+13.4+13.8+14.0+14.8+15.0)=14\) pounds

The following dot plots show the distribution of the sample means corresponding to sample sizes of \(n=2\) and of \(n=5\).

[Figure: dot plots of the possible sample means for sample sizes 2 and 5, drawn on a common axis from 9 to 18 pounds with the population mean marked.]

Again, we see that using the sample mean to estimate population mean involves sampling error. However, the error with a sample of size \(n=5\) is on the average smaller than with a sample of size \(n= 2\).
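One way to put numbers on "smaller on the average" is to compute the center and spread of the sample means for each sample size. The Python sketch below does this by listing all possible samples; the helper function name is made up for this illustration, and variance is used as the measure of spread.

```python
from fractions import Fraction
from itertools import combinations
from math import sqrt

weights = [19, 14, 15, 9, 10, 17]  # pumpkin weights in pounds


def center_and_variance_of_sample_means(values, n):
    """List all samples of size n (without replacement) and summarize their means."""
    means = [Fraction(sum(s), n) for s in combinations(values, n)]
    center = sum(means) / len(means)  # every sample is equally likely
    variance = sum((m - center) ** 2 for m in means) / len(means)
    return center, variance


results = {n: center_and_variance_of_sample_means(weights, n) for n in (2, 5)}

for n, (center, variance) in results.items():
    print(f"n={n}: mean of sample means = {center}, "
          f"std. dev. of sample means = {sqrt(variance):.3f}")
```

Both sample sizes give a mean of exactly 14, but the standard deviation of the sample means is much smaller for n = 5 than for n = 2, which matches what the dot plots show.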

Sampling Error and Size

Sampling Error: The error resulting from using a sample characteristic to estimate a population characteristic.

Sample size and sampling error: As the dotplots above show, the possible sample means cluster more closely around the population mean as the sample size increases. Thus, the possible sampling error decreases as sample size increases.

What happens when the population is not small, unlike in the pumpkin example?

Sample Means with Large Samples: Exam Example

An instructor of an introduction to statistics course has 200 students. The scores out of 100 points are shown in the histogram.

The population mean is \(\mu=71.18\) and the population standard deviation is \(\sigma=10.73\).

Let's demonstrate the sampling distribution of the sample means using the StatKey website. The first video will demonstrate the sampling distribution of the sample mean when n = 10 for the exam scores data. The second video will show the same data but with samples of n = 30.

  • n=10
  • n=30

You should start to see some patterns. The mean of the sampling distribution is very close to the population mean. The standard deviation of the sampling distribution is smaller than the standard deviation of the population.
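The same patterns can be reproduced without StatKey. The sketch below is only a stand-in: the actual 200 exam scores are not listed here, so it generates a synthetic population of 200 scores (an assumption, with a mean near 71 and a standard deviation near 11) and then draws many random samples of size 10 and size 30 from it.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for reproducibility

# Synthetic stand-in for the 200 exam scores (the real scores are not given).
population = [min(100.0, max(0.0, rng.gauss(71, 11))) for _ in range(200)]
pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)


def simulate_sample_means(n, reps=5000):
    """Draw `reps` random samples of size n without replacement; return their means."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(reps)]


summary = {}
for n in (10, 30):
    means = simulate_sample_means(n)
    summary[n] = (statistics.mean(means), statistics.stdev(means))

print(f"population: mean = {pop_mean:.2f}, sd = {pop_sd:.2f}")
for n, (m, s) in summary.items():
    print(f"n={n}: mean of sample means = {m:.2f}, sd of sample means = {s:.2f}")
```

With this synthetic population, the mean of the simulated sample means lands close to the population mean for both sample sizes, and the standard deviation of the sample means is smaller than the population standard deviation, more so for n = 30 than for n = 10.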

In the examples so far, we were given the population and sampled from that population.

What happens when we do not have the population to sample from? What happens when all that we are given is the sample? Fortunately, we can use some theory to help us. The mathematical details of the theory are beyond the scope of this course but the results are presented in this lesson.

In the next two sections, we will discuss the sampling distribution of the sample mean when the population is Normally distributed and when it is not.

What is the distribution of values taken by a statistic in all possible samples?

The sampling distribution (histogram) of a statistic is the distribution of values taken by the statistic in ALL possible samples of the same size from the same population. The interpretation of a sampling distribution is the same, whether we obtain it by simulation or by the mathematics of probability.

What is the probability distribution of a sample statistic that is formed when samples of size n are repeatedly taken from a population?

A sampling distribution is the probability distribution of a sample statistic that is formed when samples of size n are repeatedly taken from a population. If the sample statistic is the sample mean, then the distribution is the sampling distribution of sample means.

What is the distribution of the sample statistic?

A sampling distribution is the probability distribution of a statistic obtained through repeated sampling from a specific population. It describes the range of possible values of a statistic, such as the mean or mode of some variable, across samples from that population.
