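In a previous post, we introduced the basic terminology of hypothesis testing. We also wanted to test the null hypothesis that the two websites have the same clickthrough rate, i.e. $H_0: \mu_X = \mu_Y$, against the alternative hypothesis $H_1$ that $\mu_X \neq \mu_Y$, where $\mu_X$ and $\mu_Y$ denote the true clickthrough rates of the two websites. Here we show how to do it.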
Test Statistic and the t-test
A test statistic is a function of a sample that is used for hypothesis testing. The name of a test usually denotes the distribution of its test statistic under the null hypothesis: for instance, a z-test is a hypothesis test whose test statistic is Gaussian distributed under the null hypothesis. However, to use an exact z-test, we need to know the standard deviation of our data. A related alternative that does not need it is the t-test. Consider the case of iid observations $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ with sample mean $\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$, for some unknown parameters $\mu$ and $\sigma^2$. Let
$$S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2,$$
which is an unbiased estimate of the variance. Then
$$\frac{\bar{X} - \mu}{S / \sqrt{n}} \tag{1}$$
has a Student's t-distribution with $n - 1$ degrees of freedom.
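To make the statistic concrete, here is a minimal sketch that computes the sample mean, the unbiased variance estimate, and the resulting ratio (sample mean minus hypothesized mean, divided by the estimated standard error). The function name `t_statistic` and the observations are hypothetical, chosen only for illustration.

```python
import math
import statistics


def t_statistic(data, mu):
    """Return (sample mean - mu) / (S / sqrt(n)) for a list of observations."""
    n = len(data)
    sample_mean = statistics.mean(data)
    # statistics.variance divides by n - 1, so it is the unbiased estimate S^2.
    s_squared = statistics.variance(data)
    return (sample_mean - mu) / math.sqrt(s_squared / n)


# Hypothetical observations, made up for illustration only.
observations = [2.1, 1.9, 2.4, 2.0, 2.6]
t_value = t_statistic(observations, mu=2.0)
degrees_of_freedom = len(observations) - 1
print(t_value, degrees_of_freedom)
```

Under the Gaussian assumption, `t_value` would be compared against a t-distribution with `degrees_of_freedom` degrees of freedom.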
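If we wanted to test whether the population or true mean was $\mu_0$, we would set a threshold for the p-value, for instance 0.05, compute the observed value $t_{\text{obs}}$ of the statistic (1) with $\mu = \mu_0$, let $T$ be $t$-distributed with $n - 1$ degrees of freedom, and then calculate whether
$$P(T \geq t_{\text{obs}}) \leq 0.025 \quad \text{or} \quad P(T \leq t_{\text{obs}}) \leq 0.025. \tag{2}$$
If one of these holds, then we reject the null hypothesis. The threshold of 0.05 is split equally between the two tails because the test is two-sided.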
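The decision rule can be sketched in code as follows. This is only an illustration under stated assumptions: the data are made up, the helper names (`t_pdf`, `t_cdf`, `one_sample_t_test`) are hypothetical, and the t-distribution's CDF is approximated by Simpson's rule on its density so that the example needs nothing beyond the standard library. In practice one would more likely call a statistics library.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) with Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def one_sample_t_test(data, mu0, alpha=0.05):
    """Two-sided test of 'true mean == mu0'; returns (t_obs, p_value, reject)."""
    n = len(data)
    t_obs = (statistics.mean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))
    lower_tail = t_cdf(t_obs, n - 1)   # P(T <= t_obs)
    upper_tail = 1 - lower_tail        # P(T >= t_obs)
    # Reject if either tail probability falls below alpha / 2.
    reject = upper_tail <= alpha / 2 or lower_tail <= alpha / 2
    return t_obs, 2 * min(lower_tail, upper_tail), reject


# Hypothetical observations, made up for illustration only.
observations = [2.1, 1.9, 2.4, 2.0, 2.6]
t_obs, p_value, reject = one_sample_t_test(observations, mu0=2.0)
print(t_obs, p_value, reject)
```

Checking each tail against half the threshold is equivalent to comparing the two-sided p-value, `2 * min(lower_tail, upper_tail)`, against the full threshold.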
Two Samples
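In our setting we don't want to test whether our sample mean matches some hypothesized true mean, but instead want to test whether the true means of two populations are equal. That is, we have two independent samples, $X_1, \dots, X_n$ with mean $\mu_X$ and $Y_1, \dots, Y_n$ with mean $\mu_Y$, and we assume the sample means satisfy $\bar{X} \sim N(\mu_X, \sigma^2 / n)$ and $\bar{Y} \sim N(\mu_Y, \sigma^2 / n)$. Then $\bar{X} - \bar{Y}$ is the sample mean of the differences. If $S_X^2$ is the unbiased estimator of the variance of $X$ and $S_Y^2$ that for $Y$, then under the null hypothesis $H_0$, $\mu_X - \mu_Y = 0$, so that
$$\frac{\bar{X} - \bar{Y}}{\sqrt{(S_X^2 + S_Y^2) / n}} \tag{3}$$
follows a Student's t-distribution with $2n - 2$ degrees of freedom.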
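As a rough sketch of the two-sample test, the code below compares two hypothetical sets of click indicators (1 for a click, 0 for no click). It assumes two independent samples of equal size $n$ with a common variance, divides the difference of the sample means by the square root of the summed unbiased variance estimates over $n$, and uses $2n - 2$ degrees of freedom. The data, the helper names, and the Simpson's rule approximation of the t CDF are all assumptions made for illustration, not the method of any particular library.

```python
import math
import statistics


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) for Student's t by Simpson's rule and symmetry."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def two_sample_t_test(xs, ys, alpha=0.05):
    """Two-sided test of equal means for two equal-sized samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    n = len(xs)
    diff = statistics.mean(xs) - statistics.mean(ys)
    # statistics.variance uses the n - 1 denominator (unbiased estimate).
    std_err = math.sqrt((statistics.variance(xs) + statistics.variance(ys)) / n)
    t_obs = diff / std_err
    df = 2 * n - 2
    lower_tail = t_cdf(t_obs, df)
    p_value = 2 * min(lower_tail, 1 - lower_tail)
    return t_obs, df, p_value, p_value <= alpha


# Hypothetical click indicators for two websites, made up for illustration.
site_a = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
site_b = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
t_obs, df, p_value, reject = two_sample_t_test(site_a, site_b)
print(t_obs, df, p_value, reject)
```

With samples this small and binary, the Gaussian assumption is only a crude approximation; the example is meant to show the mechanics rather than a recommended analysis.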