Multiple regression

Klinkenberg

University of Amsterdam

10 nov 2022

Multiple regression

Multiple regression

\(\LARGE{\text{outcome} = \text{model} + \text{error}}\)

In statistics, linear regression is a linear approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables denoted X.

\(\LARGE{Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dotso + \beta_n X_{ni} + \epsilon_i}\)

In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters \(\beta\)’s are estimated from the data.

Source: wikipedia

Outcome vs Model

Assumptions

A selection from Field (8.3.2.1. Assumptions of the linear model):

For simple regression

  • Sensitivity
  • Homoscedasticity

For multiple regressin

  • Multicollinearity
  • Linearity

Multicollinearity

To adhere to the multicollinearity assumptien, there must not be a too high linear relation between the predictor variables.

This can be assessed through:

  • Correlations
  • Matrix scatterplot
  • VIF: max < 10, mean < 1
  • Tolerance > 0.2

Linearity

For the linearity assumption to hold, the predictors must have a linear relation to the outcome variable.

This can be checked through:

  • Correlations
  • Matrix scatterplot with predictors and outcome variable

Example

Perdict study outcome based on IQ and motivation.

Read data

  Studieprestatie Motivatie       IQ
1        2.710330  3.276778 129.9922
2        2.922617  2.598901 128.4936
3        1.997056  3.207279 130.2709
4        2.322539  2.104968 125.7457
5        2.162133  3.264948 128.6770
6        2.278899  2.217771 127.5349

Regression model in R

Perdict study outcome based on IQ and motivation.

fit <- lm(study.outcome ~ IQ + motivation)

What is the model

fit$coefficients
(Intercept)          IQ  motivation 
-30.2822189   0.2690984  -0.6314253 
b.0 = round(fit$coefficients[1], 2) ## Intercept
b.1 = round(fit$coefficients[2], 2) ## Beta coefficient for IQ
b.2 = round(fit$coefficients[3], 2) ## Beta coefficient for motivation

De beta coëfficients are:

  • \(b_0\) (intercept) = -30.28
  • \(b_1\) = 0.27
  • \(b_2\) = -0.63.

Visual

What are the expected values based on this model

\(\widehat{\text{studie prestatie}} = b_0 + b_1 \text{IQ} + b_2 \text{motivation}\)

exp.stu.prest = b.0 + b.1 * IQ + b.2 * motivation

model = exp.stu.prest

\(\text{model} = \widehat{\text{studie prestatie}}\)

Apply regression model

\(\widehat{\text{studie prestatie}} = b_0 + b_1 \text{IQ} + b_2 \text{motivation}\) \(\widehat{\text{model}} = b_0 + b_1 \text{IQ} + b_2 \text{motivation}\)

cbind(model, b.0, b.1, IQ, b.2, motivation)[1:5,]
        model    b.0  b.1       IQ   b.2 motivation
[1,] 2.753512 -30.28 0.27 129.9922 -0.63   3.276778
[2,] 2.775969 -30.28 0.27 128.4936 -0.63   2.598901
[3,] 2.872561 -30.28 0.27 130.2709 -0.63   3.207279
[4,] 2.345205 -30.28 0.27 125.7457 -0.63   2.104968
[5,] 2.405860 -30.28 0.27 128.6770 -0.63   3.264948

\(\widehat{\text{model}} = -30.28 + 0.27 \times \text{IQ} + -0.63 \times \text{motivation}\)

How far are we off?

error = study.outcome - model

cbind(model, study.outcome, error)[1:5,]
        model study.outcome       error
[1,] 2.753512      2.710330 -0.04318159
[2,] 2.775969      2.922617  0.14664823
[3,] 2.872561      1.997056 -0.87550534
[4,] 2.345205      2.322539 -0.02266610
[5,] 2.405860      2.162133 -0.24372667

Outcome = Model + Error

Is that true?

study.outcome == model + error
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
  • Yes!

Visual

Explained variance

The explained variance is the deviation of the estimated model outcome compared to the total mean.

To get a percentage of explained variance, it must be compared to the total variance. In terms of squares:

\(\frac{{SS}_{model}}{{SS}_{total}}\)

We also call this: \(r^2\) of \(R^2\).

Why?

r = cor(study.outcome, model)
r^2
[1] 0.7527463

End

Contact

CC BY-NC-SA 4.0