University of Amsterdam
2024-02-12
Let's start simple and toss a fair coin only 2 times, assigning 1 for heads and 0 for tails.
Each toss can only take one of two values: 0 (tails) or 1 (heads).
If we toss 2 times, we have the following possible outcomes.
Outcome | Toss1 | Toss2 |
---|---|---|
1 | 0 | 0 |
2 | 1 | 0 |
3 | 0 | 1 |
4 | 1 | 1 |
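The outcome table above can be reproduced with a minimal R sketch. The object name `outcomes` is an assumption chosen for illustration; `expand.grid()` enumerates every combination of the values supplied.

```r
# All possible outcomes of two coin tosses (1 = heads, 0 = tails)
outcomes <- expand.grid(Toss1 = 0:1, Toss2 = 0:1)
print(outcomes)
```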
With frequency of heads being
Outcome | Toss1 | Toss2 | frequency |
---|---|---|---|
1 | 0 | 0 | 0 |
2 | 1 | 0 | 1 |
3 | 0 | 1 | 1 |
4 | 1 | 1 | 2 |
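Because heads is coded as 1, the frequency of heads per outcome is simply the row sum. A small sketch, with `outcomes` as an illustrative name:

```r
# Enumerate the outcomes, then count heads per outcome by summing each row
outcomes <- expand.grid(Toss1 = 0:1, Toss2 = 0:1)
outcomes$frequency <- rowSums(outcomes)
print(outcomes)
```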
For each coin toss, regardless of the outcome, the probability is .5, because a fair coin is equally likely to land heads or tails.
Outcome | Toss1 | Toss2 |
---|---|---|
1 | 0.5 | 0.5 |
2 | 0.5 | 0.5 |
3 | 0.5 | 0.5 |
4 | 0.5 | 0.5 |
So for each outcome we can specify the total probability by applying the product rule (i.e., multiplying the probabilities of the individual tosses).
Outcome | Toss1 | Toss2 | probability |
---|---|---|---|
1 | 0.5 | 0.5 | 0.25 |
2 | 0.5 | 0.5 | 0.25 |
3 | 0.5 | 0.5 | 0.25 |
4 | 0.5 | 0.5 | 0.25 |
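As a sketch of the product rule, the probability of each outcome is the product of the per-toss probabilities. The names `p` and `probs` are assumptions for illustration.

```r
# Per-toss probability for a fair coin
p <- 0.5

# One row per outcome; every toss has probability p regardless of its result
probs <- data.frame(Toss1 = rep(p, 4), Toss2 = rep(p, 4))

# Product rule: multiply the probabilities of the independent tosses
probs$probability <- probs$Toss1 * probs$Toss2
print(probs)
```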
This probability is the same for all outcomes.
However, some numbers of heads occur more often than others. Throwing 0 heads occurs in only one outcome and hence has a probability of .25. But throwing 1 head occurs in two outcomes, so for this case we add up the probabilities: .25 + .25 = .5.
Outcome | Toss1 | Toss2 | frequency | probability |
---|---|---|---|---|
1 | 0 | 0 | 0 | 0.25 |
2 | 1 | 0 | 1 | 0.25 |
3 | 0 | 1 | 1 | 0.25 |
4 | 1 | 1 | 2 | 0.25 |
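Adding up the probabilities of outcomes that share the same number of heads can be sketched as follows. The names `outcomes` and `p_heads` are illustrative assumptions.

```r
# Enumerate outcomes, count heads, and attach each outcome's probability
outcomes <- expand.grid(Toss1 = 0:1, Toss2 = 0:1)
outcomes$frequency <- rowSums(outcomes)
outcomes$probability <- 0.5^2

# Sum rule: add the probabilities of all outcomes with the same number of heads
p_heads <- tapply(outcomes$probability, outcomes$frequency, sum)
print(p_heads)
```

The result gives one probability per number of heads (0, 1, or 2), and these probabilities sum to 1.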
Toss1 | Toss2 | Toss3 | Toss4 | Toss5 | Toss6 | Toss7 | Toss8 | Toss9 | Toss10 |
---|---|---|---|---|---|---|---|---|---|
0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
Toss1 | Toss2 | Toss3 | Toss4 | Toss5 | Toss6 | Toss7 | Toss8 | Toss9 | Toss10 | probability |
---|---|---|---|---|---|---|---|---|---|---|
0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.0009766 |
0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.0009766 |
0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.0009766 |
0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.0009766 |
0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.0009766 |
#Heads | frequencies | Probabilities |
---|---|---|
0 | 1 | 0.0009766 |
1 | 10 | 0.0097656 |
2 | 45 | 0.0439453 |
3 | 120 | 0.1171875 |
4 | 210 | 0.2050781 |
5 | 252 | 0.2460938 |
6 | 210 | 0.2050781 |
7 | 120 | 0.1171875 |
8 | 45 | 0.0439453 |
9 | 10 | 0.0097656 |
10 | 1 | 0.0009766 |
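The frequencies and probabilities for 10 tosses can be obtained by brute force, enumerating all 1024 sequences and counting heads in each. This is a sketch; the names `tosses`, `heads`, `freq`, and `prob` are assumptions.

```r
n <- 10

# Every possible sequence of n tosses: 2^n = 1024 rows, one column per toss
tosses <- expand.grid(rep(list(0:1), n))

# Number of heads in each sequence
heads <- rowSums(tosses)

# How many sequences yield 0, 1, ..., 10 heads, and the matching probabilities
freq <- table(heads)
prob <- freq / nrow(tosses)
print(cbind(frequencies = freq, Probabilities = round(prob, 7)))
```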
\[ P(X = k) = {n\choose k}p^k(1-p)^{n-k}, \qquad {n\choose k} = \frac{n!}{k!(n-k)!} \]
n | k | p | n! | k! | (n-k)! | (n over k) | p^k | (1-p)^(n-k) | Binom Prob |
---|---|---|---|---|---|---|---|---|---|
10 | 0 | 0.5 | 3628800 | 1 | 3628800 | 1 | 1.0000000 | 0.0009766 | 0.0009766 |
10 | 1 | 0.5 | 3628800 | 1 | 362880 | 10 | 0.5000000 | 0.0019531 | 0.0097656 |
10 | 2 | 0.5 | 3628800 | 2 | 40320 | 45 | 0.2500000 | 0.0039063 | 0.0439453 |
10 | 3 | 0.5 | 3628800 | 6 | 5040 | 120 | 0.1250000 | 0.0078125 | 0.1171875 |
10 | 4 | 0.5 | 3628800 | 24 | 720 | 210 | 0.0625000 | 0.0156250 | 0.2050781 |
10 | 5 | 0.5 | 3628800 | 120 | 120 | 252 | 0.0312500 | 0.0312500 | 0.2460938 |
10 | 6 | 0.5 | 3628800 | 720 | 24 | 210 | 0.0156250 | 0.0625000 | 0.2050781 |
10 | 7 | 0.5 | 3628800 | 5040 | 6 | 120 | 0.0078125 | 0.1250000 | 0.1171875 |
10 | 8 | 0.5 | 3628800 | 40320 | 2 | 45 | 0.0039063 | 0.2500000 | 0.0439453 |
10 | 9 | 0.5 | 3628800 | 362880 | 1 | 10 | 0.0019531 | 0.5000000 | 0.0097656 |
10 | 10 | 0.5 | 3628800 | 3628800 | 1 | 1 | 0.0009766 | 1.0000000 | 0.0009766 |
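The columns of this table can be recomputed step by step. The helper `binom_prob()` is a hypothetical name written for this sketch; R's built-in `dbinom()` serves as a check.

```r
# Binomial probability written out term by term, following the formula
binom_prob <- function(k, n, p) {
  n_over_k <- factorial(n) / (factorial(k) * factorial(n - k))
  n_over_k * p^k * (1 - p)^(n - k)
}

k <- 0:10
by_hand <- binom_prob(k, n = 10, p = 0.5)

# Compare the hand-written formula with R's built-in binomial density
print(cbind(k, by_hand, dbinom = dbinom(k, size = 10, prob = 0.5)))
```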
Sampling from your sample to approximate the sampling distribution.
Sampling with replacement
4 | 3 | 1 | 1 | 4 | 2 | 3 | 0 | 3 | 1 | 4 | 2 | 2 | 1 | 3 | 0 | 0 | 2 | 1 | 1 | 3 | 3 | 5 | 1 | 3 | 3 | 3 | 1 | 2 | 3 | 1 | 2 | 1 | 2 | 2 | 0 | 1 | 2 | 2 | 2 |
3 | 1 | 3 | 2 | 2 | 4 | 1 | 2 | 2 | 1 | 0 | 0 | 1 | 2 | 3 | 4 | 1 | 2 | 1 | 0 | 4 | 2 | 2 | 2 | 3 | 2 | 2 | 1 | 3 | 2 | 2 | 1 | 1 | 2 | 2 | 1 | 2 | 0 | 2 | 3 |
2 | 2 | 4 | 1 | 4 | 3 | 2 | 3 | 1 | 1 | 3 | 4 | 2 | 0 | 2 | 0 | 1 | 2 | 2 | 2 | 0 | 2 | 3 | 3 | 3 | 2 | 2 | 2 | 3 | 3 | 2 | 1 | 2 | 2 | 1 | 0 | 3 | 1 | 2 | 3 |
2 | 4 | 2 | 4 | 1 | 3 | 3 | 3 | 2 | 5 | 2 | 2 | 2 | 1 | 2 | 0 | 1 | 0 | 1 | 2 | 0 | 3 | 0 | 4 | 1 | 2 | 2 | 2 | 2 | 0 | 4 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 0 |
0 | 2 | 3 | 1 | 0 | 2 | 2 | 1 | 1 | 1 | 2 | 3 | 0 | 1 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 3 | 2 | 1 | 2 | 1 | 2 | 3 | 1 | 1 | 3 | 2 | 1 | 3 | 3 | 0 |
3 | 1 | 3 | 1 | 3 | 1 | 1 | 2 | 1 | 5 | 4 | 4 | 3 | 1 | 4 | 4 | 2 | 3 | 3 | 2 | 1 | 2 | 3 | 3 | 1 | 3 | 1 | 1 | 1 | 1 | 4 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 0 | 0 |
1 | 0 | 3 | 3 | 1 | 2 | 5 | 3 | 2 | 3 | 1 | 3 | 1 | 2 | 3 | 0 | 2 | 1 | 3 | 0 | 1 | 2 | 1 | 3 | 2 | 4 | 1 | 2 | 5 | 2 | 1 | 2 | 2 | 1 | 1 | 2 | 2 | 5 | 3 | 2 |
3 | 3 | 1 | 4 | 1 | 2 | 4 | 2 | 1 | 2 | 1 | 3 | 5 | 3 | 2 | 2 | 5 | 1 | 2 | 1 | 0 | 5 | 2 | 3 | 3 | 2 | 2 | 2 | 3 | 2 | 2 | 3 | 2 | 1 | 1 | 4 | 2 | 3 | 4 | 2 |
0 | 4 | 1 | 1 | 1 | 2 | 0 | 2 | 1 | 1 | 3 | 3 | 2 | 3 | 1 | 1 | 1 | 2 | 3 | 2 | 1 | 2 | 4 | 0 | 1 | 2 | 4 | 3 | 1 | 2 | 2 | 1 | 2 | 1 | 0 | 2 | 1 | 3 | 2 | 3 |
1 | 1 | 5 | 0 | 2 | 2 | 0 | 2 | 3 | 3 | 1 | 2 | 2 | 5 | 3 | 1 | 1 | 2 | 3 | 1 | 3 | 3 | 2 | 1 | 2 | 3 | 1 | 2 | 1 | 3 | 2 | 2 | 1 | 0 | 0 | 0 | 2 | 3 | 1 | 2 |
1 | 1 | 2 | 1 | 3 | 3 | 0 | 3 | 1 | 2 | 0 | 2 | 4 | 1 | 3 | 3 | 4 | 3 | 0 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | 1 | 3 | 0 | 0 | 2 | 3 | 5 | 2 | 0 | 2 | 1 | 2 |
0 | 2 | 0 | 2 | 0 | 2 | 3 | 4 | 1 | 2 | 2 | 1 | 0 | 2 | 2 | 3 | 3 | 0 | 3 | 2 | 3 | 3 | 1 | 3 | 2 | 3 | 1 | 3 | 2 | 3 | 1 | 2 | 2 | 1 | 3 | 4 | 3 | 2 | 3 | 1 |
0 | 1 | 3 | 4 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 1 | 1 | 2 | 0 | 0 | 2 | 3 | 2 | 2 | 1 | 4 | 1 | 3 | 3 | 2 | 2 | 2 | 3 | 0 | 2 | 1 | 4 | 0 | 5 | 3 | 4 | 0 | 3 | 1 |
1 | 2 | 1 | 1 | 2 | 1 | 2 | 2 | 3 | 0 | 3 | 4 | 2 | 5 | 2 | 2 | 1 | 3 | 1 | 1 | 1 | 2 | 3 | 1 | 3 | 3 | 3 | 1 | 3 | 1 | 1 | 2 | 1 | 0 | 1 | 0 | 1 | 3 | 1 | 1 |
3 | 0 | 0 | 2 | 1 | 1 | 0 | 1 | 2 | 3 | 2 | 0 | 3 | 0 | 4 | 1 | 1 | 3 | 4 | 2 | 4 | 4 | 1 | 1 | 2 | 4 | 4 | 1 | 3 | 1 | 1 | 0 | 4 | 3 | 3 | 3 | 2 | 2 | 2 | 2 |
3 | 1 | 1 | 3 | 3 | 3 | 2 | 2 | 5 | 3 | 2 | 4 | 5 | 5 | 3 | 3 | 2 | 4 | 3 | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 2 | 1 | 1 | 1 | 0 | 4 | 1 | 4 | 2 | 1 | 5 | 2 | 2 | 2 |
1 | 1 | 1 | 1 | 2 | 1 | 2 | 2 | 2 | 0 | 4 | 1 | 3 | 0 | 2 | 2 | 0 | 2 | 0 | 1 | 4 | 4 | 3 | 1 | 3 | 1 | 1 | 2 | 3 | 1 | 2 | 2 | 2 | 3 | 3 | 1 | 3 | 0 | 1 | 5 |
1 | 2 | 1 | 1 | 2 | 1 | 3 | 1 | 3 | 2 | 1 | 2 | 3 | 3 | 4 | 0 | 2 | 0 | 2 | 0 | 2 | 2 | 2 | 0 | 3 | 3 | 0 | 4 | 2 | 2 | 2 | 3 | 1 | 3 | 0 | 0 | 3 | 1 | 1 | 2 |
1 | 2 | 2 | 3 | 1 | 3 | 2 | 2 | 2 | 2 | 1 | 5 | 1 | 2 | 3 | 3 | 3 | 1 | 1 | 4 | 1 | 0 | 3 | 2 | 2 | 2 | 2 | 6 | 5 | 1 | 4 | 1 | 0 | 3 | 1 | 3 | 1 | 2 | 3 | 2 |
1 | 4 | 4 | 1 | 3 | 3 | 2 | 6 | 0 | 1 | 1 | 3 | 3 | 1 | 1 | 4 | 3 | 1 | 1 | 3 | 3 | 3 | 1 | 3 | 3 | 0 | 2 | 2 | 2 | 2 | 2 | 0 | 3 | 1 | 3 | 0 | 2 | 1 | 3 | 1 |
0 | 2 | 1 | 0 | 2 | 1 | 2 | 2 | 2 | 3 | 5 | 2 | 2 | 1 | 1 | 2 | 3 | 2 | 0 | 1 | 0 | 1 | 3 | 2 | 2 | 2 | 4 | 2 | 1 | 3 | 1 | 1 | 4 | 5 | 0 | 2 | 3 | 3 | 1 | 3 |
0 | 3 | 1 | 4 | 2 | 2 | 1 | 2 | 2 | 2 | 0 | 1 | 3 | 2 | 6 | 3 | 1 | 4 | 2 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 4 | 1 | 1 | 2 | 1 | 1 | 5 | 5 | 3 | 3 |
4 | 4 | 1 | 0 | 3 | 2 | 2 | 3 | 3 | 2 | 2 | 0 | 1 | 1 | 1 | 1 | 3 | 1 | 1 | 1 | 1 | 1 | 4 | 3 | 2 | 1 | 2 | 1 | 2 | 3 | 5 | 1 | 1 | 1 | 4 | 2 | 1 | 1 | 2 | 1 |
4 | 5 | 2 | 2 | 3 | 4 | 2 | 2 | 0 | 1 | 4 | 3 | 1 | 0 | 2 | 4 | 3 | 2 | 5 | 3 | 0 | 0 | 2 | 3 | 2 | 0 | 3 | 2 | 0 | 4 | 1 | 0 | 1 | 3 | 2 | 2 | 4 | 4 | 1 | 4 |
2 | 0 | 2 | 3 | 3 | 3 | 2 | 3 | 0 | 3 | 2 | 0 | 1 | 2 | 2 | 2 | 7 | 3 | 2 | 2 | 1 | 3 | 1 | 2 | 1 | 0 | 2 | 3 | 2 | 0 | 5 | 2 | 6 | 2 | 1 | 3 | 1 | 2 | 4 | 0 |
Frequencies for number of heads per sample.
#Heads | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|---|
Freq | 104 | 263 | 310 | 215 | 74 | 29 | 4 | 1 | 0 | 0 | 0 |
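Sampling with replacement from one's own sample can be sketched as below. The slides do not show the original sample, so the vector `observed` (10 tosses with 2 heads) is a hypothetical stand-in, and the seed is arbitrary; the resulting counts will therefore not match the table exactly.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical observed sample of 10 tosses (assumption: 2 heads, 8 tails)
observed <- c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0)

# Draw 1000 resamples of the same size, with replacement, and count heads in each
n_boot <- 1000
boot_heads <- replicate(
  n_boot,
  sum(sample(observed, size = length(observed), replace = TRUE))
)

# Frequencies for number of heads per resample (0 to 10)
boot_freq <- table(factor(boot_heads, levels = 0:10))
print(boot_freq)
```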
For all continuous probability distributions:
In probability and statistics, Student’s t-distribution (or simply the t-distribution) is any member of a family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small and population standard deviation is unknown.
In the English-language literature it takes its name from William Sealy Gosset’s 1908 paper in Biometrika under the pseudonym “Student”. Gosset worked at the Guinness Brewery in Dublin, Ireland, and was interested in the problems of small samples, for example the chemical properties of barley where sample sizes might be as low as 3 (Wikipedia, 2024).
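# Arrange 13 panels on a 4 x 4 grid; panel 1 (the population) spans the four center cells
layout(matrix(c(2:6,1,1,7:8,1,1,9:13), 4, 4))
n = 56 # Sample size
df = n - 1 # Degrees of freedom
mu = 120 # Population mean
sigma = 15 # Population standard deviation
IQ = seq(mu-45, mu+45, 1) # IQ scores from 3 SD below to 3 SD above the mean
par(mar=c(4,2,2,0))
# Panel 1: normal density of the population
plot(IQ, dnorm(IQ, mean = mu, sd = sigma), type='l', col="red", main = "Population Distribution")
n.samples = 12 # Number of samples to draw
for(i in 1:n.samples) {
  par(mar=c(2,2,2,0))
  # Panels 2 to 13: histogram of one random sample of size n
  hist(rnorm(n, mu, sigma), main="Sample Distribution", cex.axis=.5, col="beige", cex.main = .75)
}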
Let’s take a larger sample from our normal population.
[1] 107.23685 127.41511 114.49546 117.95965 114.28153 103.07176 111.25836
[8] 113.32382 119.02879 106.59907 126.02548 148.09688 100.41463 115.35363
[15] 110.83483 119.10494 145.80933 112.99751 124.73989 96.95211 133.51865
[22] 115.11211 112.44197 133.23333 127.44063 117.51603 110.32868 97.04325
[29] 107.35497 144.51702 112.78471 111.58268 128.19846 103.32059 116.82780
[36] 105.43609 117.68510 99.12848 113.08409 120.31098 84.90971 128.78364
[43] 141.29826 113.82271 88.68238 132.32723 116.03553 124.13257 107.46536
[50] 122.29764 131.74950 115.54641 92.64111 95.68900 107.86958 102.73153
Let’s take more samples.
mean.x.values | se.x.values |
---|---|
119.7035 | 2.155462 |
123.3844 | 1.795654 |
118.4070 | 2.087566 |
119.0329 | 2.302962 |
119.3704 | 2.487849 |
121.0642 | 2.106344 |
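A sketch of how such a table of sample means and standard errors could be generated, assuming the population values used earlier (`n = 56`, `mu = 120`, `sigma = 15`). The seed is arbitrary and the number of samples (1000) is an assumption, so the values will differ from those shown.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

n <- 56            # sample size
mu <- 120          # population mean
sigma <- 15        # population standard deviation
n.samples <- 1000  # number of samples (assumed)

mean.x.values <- numeric(n.samples)
se.x.values <- numeric(n.samples)

for (i in 1:n.samples) {
  x <- rnorm(n, mu, sigma)             # draw one sample from the population
  mean.x.values[i] <- mean(x)          # sample mean
  se.x.values[i] <- sd(x) / sqrt(n)    # standard error of the mean
}

head(cbind(mean.x.values, se.x.values))
```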
of the mean
\[T_{n-1} = \frac{\bar{x}-\mu}{SE_x} = \frac{\bar{x}-\mu}{s_x / \sqrt{n}}\]
So the t-statistic represents the deviation of the sample mean \(\bar{x}\) from the population mean \(\mu\), expressed in standard errors. The sample size enters through the standard error and through the degrees of freedom \(df = n - 1\).
mean.x.values mu se.x.values t.values
[995,] 123.4837 120 2.404987 1.44853662
[996,] 117.4054 120 1.873479 -1.38491533
[997,] 118.9022 120 1.772061 -0.61949536
[998,] 119.8276 120 2.200470 -0.07835800
[999,] 120.0845 120 1.629206 0.05185194
[1000,] 120.7386 120 1.986962 0.37170451
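The t-values follow directly from the formula: subtract `mu` from each sample mean and divide by that sample's standard error. The sketch below simulates this under the same assumed population (`n = 56`, `mu = 120`, `sigma = 15`) with an arbitrary seed, and also recomputes the t-value of row 995 from its printed mean and standard error.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

n <- 56; mu <- 120; sigma <- 15
df <- n - 1
n.samples <- 1000

# One t-value per simulated sample
t.values <- replicate(n.samples, {
  x <- rnorm(n, mu, sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))
})

# Share of simulated t-values beyond the two-sided 5% critical value of t(df)
crit <- qt(0.975, df)
mean(abs(t.values) > crit)

# Recompute the t-value of row 995 from its printed mean and SE
t.row995 <- (123.4837 - mu) / 2.404987
t.row995
```

If the simulated t-values follow a t-distribution with `df` degrees of freedom, the share beyond the critical value should be close to .05.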
So if the population is normally distributed (assumption of normality), the t-distribution describes how far sample means, expressed in standard errors, deviate from the population mean (\(\mu\)) for a given sample size (\(df = n - 1\)).
The t-distribution therefore differs across sample sizes and converges to the standard normal distribution as the sample size grows.
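The convergence to the standard normal distribution can be illustrated by comparing critical values. The chosen degrees of freedom and the 97.5th percentile are assumptions for illustration.

```r
# Degrees of freedom to compare (illustrative choice)
df <- c(2, 5, 10, 30, 100, 1000)

# 97.5th percentile of the t-distribution versus the standard normal
t_crit <- qt(0.975, df)
z_crit <- qnorm(0.975)

print(round(cbind(df, t_crit, difference = t_crit - z_crit), 4))
```

The difference shrinks as the degrees of freedom increase, which is why the normal distribution is a reasonable approximation for large samples.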
The t-distribution is defined by the probability density function (PDF):
\[\textstyle\frac{\Gamma \left(\frac{\nu+1}{2} \right)} {\sqrt{\nu\pi}\,\Gamma \left(\frac{\nu}{2} \right)} \left(1+\frac{x^2}{\nu} \right)^{-\frac{\nu+1}{2}}\!\]
where \(\nu\) is the number of degrees of freedom and \(\Gamma\) is the gamma function (Wikipedia, 2024).
Warning
Formula not exam material
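For the curious, the density formula can be checked against R's built-in `dt()`. The function name `t_pdf` and the evaluation points are assumptions made for this sketch.

```r
# Probability density function of the t-distribution, written out from the formula
t_pdf <- function(x, nu) {
  gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2)) *
    (1 + x^2 / nu)^(-(nu + 1) / 2)
}

x <- seq(-4, 4, 0.5)  # points at which to evaluate the density
nu <- 55              # degrees of freedom for n = 56

# TRUE if the hand-written formula agrees with dt() up to numerical tolerance
all.equal(t_pdf(x, nu), dt(x, nu))
```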
Two sided
One sided
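As a sketch of the difference between two-sided and one-sided testing, the critical t-values can be looked up with `qt()`. The significance level `alpha = 0.05` and `df = 55` are assumptions for illustration.

```r
df <- 55       # degrees of freedom for n = 56
alpha <- 0.05  # significance level (assumed)

# Two sided: alpha is split over both tails
two_sided <- qt(c(alpha / 2, 1 - alpha / 2), df)

# One sided: all of alpha sits in the upper tail
one_sided <- qt(1 - alpha, df)

print(two_sided)
print(one_sided)
```

The one-sided critical value is smaller than the upper two-sided one because the full 5% lies in a single tail.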
SMCR / SMCO