This section will introduce you to statistical hypothesis testing. We will be testing a hypothesis about whether a sample mean came from a particular population. We will be looking at two similar and related procedures.

The simplest form of a statistical test is to compare a sample to
see if it came from a population with __known population parameters__.
It is a very rare situation in which such population parameters are
known, but it is an easier statistical procedure to compute.
Therefore, we will start here, which will provide a basis for
understanding the logic of statistical hypothesis testing.

For example, suppose that we know that the population mean and standard deviation for a given variable are 30.00 and 4.00, respectively. The null hypothesis is that our sample came from a population with a population mean of 30, and the alternative hypothesis is that the population mean for our sample is not equal to 30, as shown below.

This form of the null and alternative hypotheses are said to be
**nondirectional** in that the alternative hypothesis does not
state whether the the population mean is either greater than or less
than 30, only that it is not equal to 30. Shortly, you will learn
about directional hypotheses.

We can draw this situation graphically even before we take a look
at our obtained mean. Let us assume that the sample mean is based on
a sample size that is sufficiently large that the central limit
theorem applies. For this example, we will assume that the sample
size is 100. Therefore, we know that the *sampling distribution of
means* will be normal.

We can compute the standard error of the mean (that is, the standard deviation for this sampling distribution of means) by dividing the population standard deviation by the square root of the sample size, as you learned previously. In this case, the standard error is equal to 4.00 divided by the square root of 100, which is .40. (Check the math to make sure you understand how this is computed.)

You also learned previously that in statistical tests, we set a decision criterion called alpha. For now, let's assume that we have set alpha to the traditional value of .05. When we are testing a nondirectional hypothesis, as we are here, we need to split the value of .05 between the two tails of the distribution so that there is .025 in each tail. This may sound familiar, because we performed this task before when we computed a confidence interval. There you learned that 1.96 standard deviations above the mean will cut off the top 2.5%, and a 1.96 standard deviations below the mean will cut off the bottom 2.5%.

The figure below illustrates the sampling distribution for means for the example we are working on here. So if our sample mean is less than 29.216 or greater than 30.784, it is not likely to have come from a population with a mean of 30.00 and a standard deviation of 4.00. So we would reject the null hypothesis and conclude that our mean must have come from another population with a different population mean. We could be wrong, but 95% of the time we will be correct. Setting the alpha at .05 says that we have to be at least 95% confident of our decision before we reject the null hypothesis.

Suppose that we want to be even more confident of our decision before we reject the null hypothesis. In that case, we might set the alpha level to .01, which says that we want to be 99% confident of our decision before rejecting the null hypothesis. If we assume the nondirectional hypotheses above, we will need to find a value that will cut off .005 of the area in each tail (half of .01).

If you consult the
standard normal table, you will find the a *Z*-score of
2.58 cuts off the top .005 of the distribution. So our confidence
interval will be modified as shown below, with the decision points
moving out a bit further from the decision points for an alpha of
.05. The new decision points are 28.968 and 31.032, as shown in the
figure below. If our sample mean falls outside of those
limits, we would consider that convincing enough evidence that we
should reject the null hypothesis.

These two examples rest on the assumption that the alternative
hypothesis is nondirectional. When we make a nondirectional
hypothesis, we must split the area under the curve that is equal to
alpha evenly among the two tails of the distribution. We refer to
such a situation as a **two-tail probability** or a **two-tail
test**.

However, you may have a strong reason to believe that the
alternative hypothesis should be directional. Instead of the
alternative hypothesis reading that the population mean is not equal
to 30, it reads that the population mean is greater than 30. Such a
directional hypothesis is called a **one-tail test**, because
the entire value of alpha is concentrated in the one tail.

For example, if alpha is .05, we would find the value of *Z*
that cuts off the top 5% (.05) in the upper tail. We do not need to
split the alpha because we are predicting that, if the null
hypothesis is rejected, it must be because the population mean is
actually larger than 30. Again, we consult the
standard normal table to find the value of *Z* that cuts
off the top 5%. If you do that, you will find that a *Z* of
1.64 cuts of .0505 and a *Z* of 1.65 cuts off .0495. So the *
Z* that would cut off exactly .0500 is half way between 1.64 and
1.65, or 1.645. This is probably more precision than we need for
most decisions. The norm is to take the most conservative figure,
which is the *Z* score that comes closest to the alpha level
but does not exceed it, which in this case is 1.65.

This situation is graphed in the figure below. The cutoff for rejecting the null hypothesis is 30.66. If the sample mean is greater than 30.66, we reject the null hypothesis and conclude that the population mean must be greater than 30.

These figures and the above explanation will help you to
visualize the conceptual process behind this first inferential
statistic that we are covering, but the procedures require more
computation than is necessary. Instead of computing the cutoff
scores, all we really need to compute is the *Z*-score for our
sample mean, which we can do using the formula below. Then we
evaluate that *Z*-score against the **critical Z-score**,
which is the

So going back to our original situation, we want to test the null
hypothesis that the population value is equal to 30.00. We know that
the population standard deviation is 4.00. If we conduct a
nondirectional test (i.e., a two-tail test) with an alpha of .05,
our critical values of *Z* are ±1.96. If the mean of our sample
of 100 people is 27.94, is this sufficiently different from our
population mean of 30.00 to reject the null hypothesis and conclude
that the sample must have been drawn from a different population
with a different population mean?

Plugging the numbers into the formula above, we get a computed
value of *Z* of -5.15. This value certainly exceeds our
critical value of ±1.96, so we reject the null hypothesis and
conclude that our sample was not drawn from this population.

It is actually rare to know the population parameters. The more common situation is when we want to test a sample mean against a hypothesized population mean. In this situation, we have to estimate the population standard deviation using the standard deviation from the sample. We have done this estimation before. The trick was to use a formula that would give an unbiased estimate of the population parameter. So our formula will change slightly in this situation.

But there is one more important change in this situation, and
this change involves introducing a new distribution, called the **
Student's t distribution**, or simply the

You were introduced to the concept of degrees of
freedom (df) earlier, and you also learned earlier that dividing
by the degrees of freedom instead of *N* produced an
unbiased estimate of the population variance.
Here we are using the concept of df to identify the appropriate *t*
distribution to use in this statistical test.

It is helpful to understand how the *t* distributions
compare to the standard normal distribution. The larger the sample
size, the closer the *t* distribution is to the standard normal
or *Z* distribution. In fact, when the sample size is infinity,
the *t* and *Z* distributions are identical. But as the
sample size gets smaller, the *t* distribution becomes
progressively flatter and more spread out.

Because there are so many *t* distributions, what is tabled
is not each distribution, but rather the critical values for each
distribution. Click on this link to view the *
t*
distribution table. Note that this table presents critical
values for various alpha levels (labeled level of significance in
the table) ranging from .20 to .001 and for various degrees of
freedom (df).

When we are testing a single sample against a hypothesized
population mean, the degrees of freedom are one less than the sample
size (i.e., df = *N-1)*. If you look at the bottom of the
table, where the dfs are equal to infinity, you will find some
familiar numbers (1.645 for level of significance of .10, 1.96 for
.05, and 2.58 for .01). These were the critical values of *Z*
that we identified earlier on this page.

Note that if you are using a one-tail test, the alpha is not split between the tails. Hence, you look up the critical value that is double the level of significance of the two-tail test. So a one-tail test with an alpha of .05 is equivalent to a two-tail test with an alpha of .10.

Note also that as the degrees of freedom decrease, the critical
value at any level of significance increases. The reason is that
with smaller sample sizes, the *t* distribution is
progressively flatter and more spread out, so you have to go further
out into the tail to cut off a given area of the curve. This effect
is very dramatic for small sample sizes, but much less dramatic for
moderate to large sample sizes.

For example, let's take the two tail alpha value of .05. At
infinity, the critical value is 1.96, but it only increases to 1.98
at df=120 and 2.00 at df=60. However, by the time that you reach df=10,
the critical value is 2.228, at df=5, it is 2.571, and at df=1, it
is 12.706. If the degrees of freedom are equal to 1, the sample size
is 2, which is the minimum *N* for which we can compute a
variance estimate. But with sample sizes that small, population
estimates are not going to be precise. This general concept was
first introduced when we talked about the
standard error of the mean. The larger the sample size (and
therefore the dfs), the more confident we can be in our population
estimates, and therefore the less we have to adjust our critical
values of *t* to compensate for the imprecision of our
population estimates.

The *t*-test comparing a sample against a hypothesized
population mean is called a **one-sample t-test** or a

Now that you have the basic concepts for the single-group *t*-test,
let's actually work through a problem. Let's assume that you have a
sample of 15 people, and the mean and standard deviation of that
sample are 24.33 and 6.29, respectively. You want to test the
hypothesis that the population mean for the population from which
this sample was drawn is 22.00. Let's further assume that you decide
to conduct a two-tail test with the traditional alpha level of .05.

Listed below are the null and alternative hypotheses. The formula
that you would use to compute the value of *t* is virtually
identical to the formula that is used to compute the *Z* when
the population standard deviation is known. The only difference is
that we substitute the unbiased estimate of the population standard
deviation for the actual population standard deviation. The next
formula shows the actual computations.

The computed value of *t* (1.39) needs to be
compared to the critical value of *t* for the alpha level and
degrees of freedom. Since the sample size is 15, df=14. Looking at
the *
t*
distribution table in the column labeled .05 level of
significance (equivalent to a two-tail test with an alpha of .05).
If you do, you will find that the critical value of *t *is
2.145. Since our computed value of *t* is less than this
critical value, we fail to reject the null hypothesis and we
conclude that this sample may have come from a population with a
population mean of 22.00.