Klinkenberg
University of Amsterdam
10 November 2022
\(\LARGE{\text{outcome} = \text{model} + \text{error}}\)
In statistics, linear regression is a linear approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables denoted X.
\(\LARGE{Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dotso + \beta_n X_{ni} + \epsilon_i}\)
In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters (the \(\beta\)'s) are estimated from the data.
Source: Wikipedia
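A minimal R sketch (not part of the original slides) of this idea: simulate data from a known linear model and let lm() estimate the unknown \(\beta\)'s from the data. All names and values below are made up for illustration.

# Sketch: in practice the beta's are unknown; here we choose them ourselves
# so we can see lm() recover them from the simulated data.
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 0.5 * x1 - 1.2 * x2 + rnorm(n)  # true beta0 = 2, beta1 = 0.5, beta2 = -1.2

fit <- lm(y ~ x1 + x2)
coef(fit)  # estimates close to 2, 0.5 and -1.2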
A selection from Field (8.3.2.1. Assumptions of the linear model):
For simple regression
For multiple regression
To meet the multicollinearity assumption, the linear relation between the predictor variables must not be too strong.
This can be assessed through:
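The slide's own list of diagnostics is not reproduced here. As an illustration (my choice, not necessarily the diagnostics intended), multicollinearity is often assessed with the correlation between predictors and the variance inflation factor (VIF), which can be computed in base R:

# Illustration with simulated predictors; the course data are not used here.
set.seed(2)
IQ         <- rnorm(50, mean = 128, sd = 2)
motivation <- rnorm(50, mean = 3,   sd = 0.5)

cor(IQ, motivation)                  # pairwise correlation between predictors

# VIF of IQ: regress it on the other predictor(s), then 1 / (1 - R^2)
r2_IQ <- summary(lm(IQ ~ motivation))$r.squared
1 / (1 - r2_IQ)                      # values far above 1 signal multicollinearity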
For the linearity assumption to hold, the predictors must have a linear relation to the outcome variable.
This can be checked through:
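Again, the slide's list is not shown; one common visual check (an illustrative choice) is to plot each predictor against the outcome, together with the residuals-versus-fitted plot of the fitted lm object:

# Illustration with simulated data: visual linearity checks.
set.seed(3)
x <- rnorm(50)
y <- 1 + 0.8 * x + rnorm(50)
fit <- lm(y ~ x)

plot(x, y); abline(fit)   # predictor vs outcome should look roughly linear
plot(fit, which = 1)      # residuals vs fitted: no systematic curve expected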
Predict study outcome based on IQ and motivation.
Studieprestatie Motivatie IQ
1 2.710330 3.276778 129.9922
2 2.922617 2.598901 128.4936
3 1.997056 3.207279 130.2709
4 2.322539 2.104968 125.7457
5 2.162133 3.264948 128.6770
6 2.278899 2.217771 127.5349
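The data set itself is not included in these notes; the head() output above is consistent with a data frame such as the simulated sketch below (column names taken from the output, the numbers invented).

# Hypothetical reconstruction of the data structure (values are made up).
set.seed(4)
n <- 39  # the logical vector further down has 39 elements
d <- data.frame(
  Studieprestatie = rnorm(n, mean = 2.5, sd = 0.4),
  Motivatie       = rnorm(n, mean = 3,   sd = 0.5),
  IQ              = rnorm(n, mean = 128, sd = 2)
)
head(d)  # same layout as the output above, different numbers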
Predict study outcome based on IQ and motivation.
(Intercept) IQ motivation
-30.2822189 0.2690984 -0.6314253
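The values above have the shape of coef() applied to an lm fit. A hedged sketch of such a call, with the data frame and its construction assumed (only the predictor names IQ and motivation come from the output):

# Sketch, not the original call: fit the regression and extract the b's.
set.seed(5)
d <- data.frame(IQ = rnorm(39, 128, 2), motivation = rnorm(39, 3, 0.5))
d$outcome <- -30.28 + 0.27 * d$IQ - 0.63 * d$motivation + rnorm(39, sd = 0.2)

fit <- lm(outcome ~ IQ + motivation, data = d)
coef(fit)  # (Intercept), IQ, motivation: same layout as the output above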
The beta coefficients are:
\(\widehat{\text{study outcome}} = b_0 + b_1 \text{IQ} + b_2 \text{motivation}\)
\(\text{model} = \widehat{\text{study outcome}}\)
\(\widehat{\text{model}} = b_0 + b_1 \text{IQ} + b_2 \text{motivation}\)
model b.0 b.1 IQ b.2 motivation
[1,] 2.753512 -30.28 0.27 129.9922 -0.63 3.276778
[2,] 2.775969 -30.28 0.27 128.4936 -0.63 2.598901
[3,] 2.872561 -30.28 0.27 130.2709 -0.63 3.207279
[4,] 2.345205 -30.28 0.27 125.7457 -0.63 2.104968
[5,] 2.405860 -30.28 0.27 128.6770 -0.63 3.264948
\(\widehat{\text{model}} = -30.28 + 0.27 \times \text{IQ} + (-0.63) \times \text{motivation}\)
Is that true?
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
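A sketch of the kind of check those TRUE values suggest: compute the model estimates by hand from the b's and compare them with the fitted values from lm() (again on simulated data, since the course data are not included here).

# Sketch: by-hand model estimates versus fitted() from lm().
set.seed(6)
d <- data.frame(IQ = rnorm(39, 128, 2), motivation = rnorm(39, 3, 0.5))
d$outcome <- -30.28 + 0.27 * d$IQ - 0.63 * d$motivation + rnorm(39, sd = 0.2)

fit <- lm(outcome ~ IQ + motivation, data = d)
b   <- coef(fit)

model <- b[1] + b[2] * d$IQ + b[3] * d$motivation   # b0 + b1*IQ + b2*motivation
round(model, 6) == round(unname(fitted(fit)), 6)    # all TRUE, as above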
The explained variance is based on the deviation of the model's estimated outcomes from the overall mean of the outcome variable.
To express it as a proportion of explained variance, it is compared to the total variance. In terms of sums of squares:
\(\frac{{SS}_{model}}{{SS}_{total}}\)
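Written out, with \(\hat{y}_i\) the model estimate for case \(i\), \(y_i\) the observed outcome and \(\bar{y}\) the overall mean:

\({SS}_{model} = \sum_i (\hat{y}_i - \bar{y})^2 \qquad {SS}_{total} = \sum_i (y_i - \bar{y})^2\)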
We also call this \(r^2\) or \(R^2\).
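As a hedged R sketch (simulated data, not the course data set), the same ratio can be computed directly and compared with the \(R^2\) that summary() reports:

# Sketch: R^2 as SS_model / SS_total, checked against summary()$r.squared.
set.seed(7)
d <- data.frame(IQ = rnorm(39, 128, 2), motivation = rnorm(39, 3, 0.5))
d$outcome <- -30.28 + 0.27 * d$IQ - 0.63 * d$motivation + rnorm(39, sd = 0.2)

fit <- lm(outcome ~ IQ + motivation, data = d)

ss_model <- sum((fitted(fit) - mean(d$outcome))^2)
ss_total <- sum((d$outcome   - mean(d$outcome))^2)

ss_model / ss_total     # proportion of explained variance
summary(fit)$r.squared  # the same value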
Why?
Scientific & Statistical Reasoning