18 oct 2018

Analysis of covariance (ANCOVA) is a general linear model which blends ANOVA and regression. ANCOVA evaluates whether population means of a dependent variable (DV) are equal across levels of a categorical independent variable (IV) often called a treatment, while statistically controlling for the effects of other continuous variables that are not of primary interest, known as covariates (CV).

Determine main effect while correcting for covariate

- 1 dependent variable
- 1 or more independent variables
- 1 or more covariates

A covariate is a variable can be a confounding variable biasing your results. By adding a covariate we reduce error/residual in the model.

- Same as ANOVA
- Independence of the covariate and treatment effect §12.3.1.
- No difference on ANOVA with covar and independent variable
- Matching experimental groups on the covariate

- Homogeneity of regression slopes §12.3.2.
- Visual: scatterplot dep var * covar per condition
- Testing: interaction indep. var * covar

We want to test the difference in national extraversion but want to controll for openness to experience.

- Dependent variable: Extraversion
- Independent variabele: Nationality
- Dutch
- German
- Belgian

- Covariate: Openness to experience

# Simulate data n = 20 k = 3 group = round(runif(n,1,k),0) mu.covar = 8 sigma.covar = 1 covar = round(rnorm(n,mu.covar,sigma.covar),2) # Create dummy variables dummy.1 <- ifelse(group == 2, 1, 0) dummy.2 <- ifelse(group == 3, 1, 0) # Set parameters b.0 = 15 # initial value for group 1 b.1 = 3 # difference between group 1 and 2 b.2 = 4 # difference between group 1 and 3 b.3 = 3 # Weight for covariate # Create error error = rnorm(n,0,1)

\[{outcome} = {model} + {error}\] \[{model} = {indvar} + {covariate} = {nationality} + {openness}\]

Formal model

\[b_0 + b_1 {dummy}_1 + b_2 {dummy}_2 + b_3 covar\]

# Define model outcome = b.0 + b.1 * dummy.1 + b.2 * dummy.2 + b.3 * covar + error

aggregate(outcome ~ group, data, mean)

## group outcome ## 1 1 38.67000 ## 2 2 42.16625 ## 3 3 43.79800

What are the beta coëfficients based on the data without the covariate?

fit.group <- lm(outcome ~ factor(group), data); fit.group

## ## Call: ## lm(formula = outcome ~ factor(group), data = data) ## ## Coefficients: ## (Intercept) factor(group)2 factor(group)3 ## 38.670 3.496 5.128

fit.group$coefficients[2:3] + fit.group$coefficients[1]

## factor(group)2 factor(group)3 ## 42.16625 43.79800

\[{Dutch} = 38.67 \> {German} = 42.16625 \> {Belgian} = 43.798\]

What is the weight of only the covariate?

fit.covar <- lm(outcome ~ covar, data) fit.covar

## ## Call: ## lm(formula = outcome ~ covar, data = data) ## ## Coefficients: ## (Intercept) covar ## 15.667 3.185

fit <- lm(outcome ~ factor(group) + covar, data); fit

## ## Call: ## lm(formula = outcome ~ factor(group) + covar, data = data) ## ## Coefficients: ## (Intercept) factor(group)2 factor(group)3 covar ## 15.965 2.769 4.181 2.881

fit$coefficients[2:3] + fit$coefficients[1]

## factor(group)2 factor(group)3 ## 18.73401 20.14609

\[{Dutch} = 15.96 \> {German} = 18.73 \> {Belgian} = 20.14\]

What is the total variance

\[{MS}_{total} = s^2_{outcome} = \frac{{SS}_{outcome}}{{df}_{outcome}}\]

ms.t = var(data$outcome); ms.t

## [1] 11.97756

ss.t = var(data$outcome) * (length(data$outcome) - 1); ss.t

## [1] 227.5737

plot <- ggplot(data, aes(x=n, y=outcome)) + geom_point(shape=1) + geom_hline(yintercept=mean(outcome)) + geom_segment(aes(x = n, y = grand.mean, xend = n, yend = outcome)) + ggtitle("Total variance") plot