Mathwizurd.com is created by David Witten, a mathematics and computer science student at Stanford University. For more information, see the "About" page.

t Distribution

Let's say you're weighing soccer balls at the store, and you pick up 10 balls, and calculate their weights: $$1, 1.3, 1.3, 1.4, 1.5, 1.6, 1.6, 1.7, 1.7, 1.9$$ The advertised weight of the ball is 1.7. Is that reasonable?

Z vs. T

$$Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$ Z is normally distributed with mean 0 and standard deviation 1. In our example above, we could use Z if we knew the standard deviation of the distribution that produces the balls (e.g. the factory has an error of $\pm0.2$ pounds).
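To make the known-$\sigma$ case concrete, here is a minimal sketch that computes Z for the ten weights. It assumes the factory's standard deviation really is 0.2 (the $\pm0.2$ figure from the example, treated here as $\sigma$); the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist, mean

weights = [1, 1.3, 1.3, 1.4, 1.5, 1.6, 1.6, 1.7, 1.7, 1.9]
mu = 1.7      # advertised weight
sigma = 0.2   # ASSUMPTION: the population standard deviation is known

n = len(weights)
z = (mean(weights) - mu) / (sigma / sqrt(n))

# Two-sided probability of seeing |Z| at least this large under a standard normal
p_value = 2 * NormalDist().cdf(-abs(z))
print(f"z = {z:.4f}, two-sided p = {p_value:.4f}")
```

Under that assumption the sample mean sits a little more than three standard errors below 1.7. This only works because $\sigma$ was handed to us.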

However, we don't know the standard deviation. We can only estimate it from our sample. Therefore, our new standardized variable is the quotient of two random variables: one based on the mean of our sample, and one based on the standard deviation of our sample. $$T = \dfrac{\bar{x} - \mu}{S/\sqrt{n}}$$ Now, how does this affect our distribution?

Shape of the Distribution

Simulation

To see how the sample standard deviation behaves, I simulated 40,000 samples of size 3 from a normal distribution (mean = 1.5, stdev = 0.2) and recorded the standard deviation of each sample:

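import numpy as np
from statistics import stdev
from matplotlib import pyplot as plt

# Draw 40,000 samples of size 3 from Normal(1.5, 0.2)
# and record each sample's standard deviation (n - 1 in the denominator)
deviations = []
for i in range(40000):
    deviations.append(stdev(np.random.normal(1.5, 0.2, 3)))

# Histogram of the 40,000 sample standard deviations
plt.hist(deviations, bins = 30)
plt.show()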

It produced this distribution:

standard_deviation.png

As you can see, it is skewed right, so the median of this distribution is less than its mean. The mean is 0.1785, and the median is 0.1680.
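The mean and median can be checked without plotting. This is a dependency-free sketch of the same simulation using only the standard library; the seed is arbitrary and the exact digits will differ from run to run and from the figures quoted above.

```python
import random
from statistics import mean, median, stdev

random.seed(0)  # arbitrary seed, only so the run is repeatable

# 40,000 samples of size 3 from Normal(1.5, 0.2); keep each sample's standard deviation
sample_sds = [
    stdev([random.gauss(1.5, 0.2) for _ in range(3)])
    for _ in range(40000)
]

sd_mean = mean(sample_sds)
sd_median = median(sample_sds)
print(f"mean = {sd_mean:.4f}, median = {sd_median:.4f}")
```

A median below the mean is what a right-skewed distribution should produce. Both values also sit below the true 0.2, a reminder that the sample standard deviation tends to underestimate $\sigma$ in tiny samples.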

From this, can we guess the shape of our final distribution?

Our numerator is normally distributed, because it's the mean of normal variables minus a constant. Our denominator is skewed right, so values below its mean are more common than values above it. When the denominator is smaller, the quotient is larger in magnitude.

So, compared with Z, we should expect to see more extreme values, and fewer values close to 0 (which would require a larger denominator). Therefore, this should look like the normal distribution with heavier tails.
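The heavier tails can be checked by simulation. The sketch below reuses the simulation settings (samples of size 3 from a normal with mean 1.5 and stdev 0.2), computes T for each sample, and compares how often $|T| > 2$ with how often a standard normal exceeds 2 in absolute value. The cutoff of 2 and the helper name `t_statistic` are my own choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(1)  # arbitrary seed, only so the run is repeatable
mu, sigma, n, reps = 1.5, 0.2, 3, 40000

def t_statistic(sample, mu):
    """(sample mean - mu) divided by the estimated standard error S / sqrt(n)."""
    return (mean(sample) - mu) / (stdev(sample) / sqrt(len(sample)))

t_values = [
    t_statistic([random.gauss(mu, sigma) for _ in range(n)], mu)
    for _ in range(reps)
]

tail_t = sum(abs(t) > 2 for t in t_values) / reps  # simulated P(|T| > 2)
tail_z = 2 * NormalDist().cdf(-2)                  # exact P(|Z| > 2)
print(f"P(|T| > 2) ~ {tail_t:.3f}, P(|Z| > 2) = {tail_z:.3f}")
```

With only 3 observations per sample, the simulated T lands beyond $\pm 2$ several times as often as Z does, which is the heavy-tail effect described above.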

Theory

So, we got a general understanding of the denominator, but what is the actual shape of the distribution? For that, we should look at the $\chi^2$ distribution. I will make a post on this, but for now, there are two theorems:

If $X_1, X_2, ..., X_n$ are a random sample from a normal distribution, then $\bar{X}$ and $S^2$ are independent.

If $X_1, X_2, ..., X_n$ are a random sample from a normal distribution, then $\dfrac{(n-1)S^2}{\sigma^2}$ is from a $\chi^2_{n-1}$ distribution.
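Neither theorem is proved here, but both can be sanity-checked numerically. The sketch below assumes samples of size 10 from a normal with mean 1.5 and stdev 0.2 (my choice of parameters). It uses the fact that a $\chi^2_k$ variable has mean $k$ and variance $2k$, and it uses the sample correlation between $\bar{X}$ and $S^2$ as a rough proxy for independence. A correlation near zero is consistent with independence but does not prove it.

```python
import random
from statistics import mean, variance

random.seed(2)  # arbitrary seed, only so the run is repeatable
mu, sigma, n, reps = 1.5, 0.2, 10, 20000

xbars, scaled = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbars.append(mean(sample))
    scaled.append((n - 1) * variance(sample) / sigma**2)  # (n-1) S^2 / sigma^2

# If the second theorem holds, these should be close to n - 1 and 2(n - 1)
scaled_mean = mean(scaled)
scaled_var = variance(scaled)

# Sample correlation between x-bar and the scaled sample variance
mx, ms = mean(xbars), mean(scaled)
cov = sum((x - mx) * (s - ms) for x, s in zip(xbars, scaled)) / (reps - 1)
corr = cov / (variance(xbars) * variance(scaled)) ** 0.5

print(f"mean = {scaled_mean:.2f}, variance = {scaled_var:.2f}, corr = {corr:.3f}")
```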

$\chi^2$ distributions are skew-right, explaining the shape of our result. Now, let's analyze T again. $$T = \dfrac{\bar{x} - \mu}{S/\sqrt{n}}$$ $$ = \dfrac{\bar{x}- \mu}{\frac{S}{\sigma}\frac{\sigma}{\sqrt{n}}}$$ $$ = \dfrac{\bar{x}- \mu}{\sqrt{\frac{S^2}{\sigma^2}}\frac{\sigma}{\sqrt{n}}}$$ $$ = \dfrac{\bar{x}- \mu}{\sqrt{\frac{(n-1)S^2}{\sigma^2}/(n-1)}\frac{\sigma}{\sqrt{n}}}$$ $$ = \dfrac{(\bar{x}- \mu)/\frac{\sigma}{\sqrt{n}}}{\sqrt{\frac{(n-1)S^2}{\sigma^2}/(n-1)}}$$ The numerator is a standard normal variable: $\bar{x}$ has mean $\mu$ and variance $\dfrac{\sigma^2}{n}$, so after standardizing it has mean 0 and variance 1. The denominator is the square root of a chi-squared variable with n-1 degrees of freedom divided by n-1. By the first theorem, the numerator and the denominator are independent. So, a t-distributed variable with n-1 degrees of freedom is a standard normal variable divided by the square root of an independent chi-squared variable over its degrees of freedom.
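The last line of the derivation can be turned into a recipe for drawing t-distributed values directly: take a standard normal and divide it by the square root of an independent chi-squared variable over its degrees of freedom. The sketch below does this for 9 degrees of freedom, using the fact that a $\chi^2_k$ variable is a gamma variable with shape $k/2$ and scale 2. As a check, it counts how often the draws exceed 2.262 in absolute value, the standard t-table cutoff that leaves 5% in the two tails for 9 degrees of freedom. The function name `t_draw` is hypothetical.

```python
import random
from math import sqrt

random.seed(3)  # arbitrary seed, only so the run is repeatable
df, reps = 9, 50000

def t_draw(df):
    """One draw from a t distribution, built as Z / sqrt(V / df)."""
    z = random.gauss(0, 1)              # standard normal numerator
    v = random.gammavariate(df / 2, 2)  # chi-squared with df degrees of freedom
    return z / sqrt(v / df)

draws = [t_draw(df) for _ in range(reps)]

# Fraction of draws beyond the tabulated two-sided 5% cutoff for 9 degrees of freedom
tail = sum(abs(t) > 2.262 for t in draws) / reps
print(f"P(|T| > 2.262) ~ {tail:.3f}")
```

If the construction is right, the printed fraction should come out close to 0.05.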

One Sample T-Test

This goes back to our initial question. Is it plausible that this sample: $1, 1.3, 1.3, 1.4, 1.5, 1.6, 1.6, 1.7, 1.7, 1.9$ comes from a population with a mean of 1.7? Well first, let's calculate the mean and stdev. $$\text{Mean: } 1.5$$ $$\text{Standard deviation: } 0.2582$$ What is our t-value? To calculate it, we assume that the true mean is 1.7, estimate the standard deviation with the one we found, and check whether it's reasonable to observe the mean that we saw. $$\dfrac{1.5- 1.7}{0.2582/\sqrt{10}} = \dfrac{-0.2}{0.0816} = -2.4495$$

Remember, if we knew the underlying standard deviation, the only thing that would vary from sample to sample is the sample mean that we're comparing to 1.7. However, the sample standard deviation (0.258) would also be different in every sample we took. So, we plug our value into the t-distribution with 9 degrees of freedom. The degrees of freedom matter because the bigger the sample we take, the less the standard deviation will vary. If we take samples with 1000 items, their standard deviations will not vary much.

The probability of getting a t-value $\leq -2.4495$ or $\geq 2.4495$ equals $0.037$ or $\boxed{3.7\%}$. That is below the conventional 5% threshold, so this sample gives some evidence that the true mean weight is not 1.7.
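The whole test can be reproduced in a few lines. This sketch computes the t-value and the two-sided probability directly from the ten weights, using only the standard library: the tail probability comes from integrating the t density numerically with Simpson's rule. The helper names `t_pdf` and `two_sided_p` are my own. If SciPy is available, `scipy.stats.ttest_1samp(weights, 1.7)` should report the same two numbers.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

weights = [1, 1.3, 1.3, 1.4, 1.5, 1.6, 1.6, 1.7, 1.7, 1.9]
mu0 = 1.7  # advertised weight, the hypothesized mean

n = len(weights)
df = n - 1
t_value = (mean(weights) - mu0) / (stdev(weights) / sqrt(n))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density over [0, |t|] with Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

p_value = two_sided_p(t_value, df)
print(f"t = {t_value:.4f}, df = {df}, two-sided p = {p_value:.4f}")
```

Integrating over $[0, |t|]$ and doubling relies on the t density being symmetric about 0, which avoids having to truncate an infinite tail.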

Covariance