ANCOVA
Analysis of covariance (ANCOVA) is a general linear model which blends ANOVA and regression. ANCOVA evaluates whether population means of a dependent variable (DV) are equal across levels of a categorical independent variable (IV) often called a treatment, while statistically controlling for the effects of other continuous variables that are not of primary interest, known as covariates (CV).
ANCOVA
Determine main effect while correcting for covariate
- 1 dependent variable
- 1 or more independent variables
- 1 or more covariates
A covariate is a variable that can act as a confounder and bias your results. By adding a covariate we reduce the error/residual variance in the model.
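In R such a model is simply a linear model that contains both the grouping factor and the covariate as predictors. A minimal generic sketch (dv, group, covariate, and mydata are placeholder names, not objects from the example below):
# Generic ANCOVA-style model: grouping factor plus covariate in one linear model
fit.ancova.sketch <- lm(dv ~ factor(group) + covariate, data = mydata)
summary(fit.ancova.sketch)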
Assumptions
- Same as ANOVA
- Independence of the covariate and treatment effect §12.3.1.
- Check: an ANOVA with the covariate as dependent variable and the independent variable as factor should show no group differences
- Matching experimental groups on the covariate
- Homogeneity of regression slopes §12.3.2.
- Visual check: a scatterplot of the dependent variable against the covariate per condition; the slopes should look similar
- Test: the interaction between the independent variable and the covariate should not be significant (see the sketch below)
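A minimal sketch of the interaction test, assuming the variable names used in the simulation below (outcome, group, covar, collected in a data frame data):
# Homogeneity of regression slopes: the group x covariate interaction should be non-significant
fit.slopes <- lm(outcome ~ factor(group) * covar, data)
anova(fit.slopes)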
Data example
We want to test the difference in extraversion between nationalities while controlling for openness to experience.
- Dependent variable: Extraversion
- Independent variable: Nationality
- Dutch
- German
- Belgian
- Covariate: Openness to experience
Simulate data
# Simulate data
n = 20
k = 3
group = round(runif(n,1,k),0)
mu.covar = 8
sigma.covar = 1
covar = round(rnorm(n,mu.covar,sigma.covar),2)
# Create dummy variables
dummy.1 <- ifelse(group == 2, 1, 0)
dummy.2 <- ifelse(group == 3, 1, 0)
# Set parameters
b.0 = 15 # initial value for group 1
b.1 = 3 # difference between group 1 and 2
b.2 = 4 # difference between group 1 and 3
b.3 = 3 # Weight for covariate
# Create error
error = rnorm(n,0,1)
Define the model
\({outcome} = {model} + {error}\) \({model} = {indvar} + {covariate} = {nationality} + {openness}\)
Formal model
\({outcome} = b_0 + b_1 {dummy}_1 + b_2 {dummy}_2 + b_3 {covar} + {error}\)
# Define model
outcome = b.0 + b.1 * dummy.1 + b.2 * dummy.2 + b.3 * covar + error
Dummies
The data
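The snippets below refer to a data frame called data, which is not created in the simulation code above; presumably it simply collects the simulated vectors. A minimal construction under that assumption:
# Assumed construction of the data frame used in the remaining snippets
data <- data.frame(group, covar, outcome)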
Group means
aggregate(outcome ~ group, data, mean)
  group  outcome
1     1 38.67000
2     2 42.16625
3     3 43.79800
Model fit no covar
What are the beta coefficients without the covariate in the model?
fit.group <- lm(outcome ~ factor(group), data); fit.group
Call:
lm(formula = outcome ~ factor(group), data = data)
Coefficients:
(Intercept) factor(group)2 factor(group)3
38.670 3.496 5.128
fit.group$coefficients[2:3] + fit.group$coefficients[1]
factor(group)2 factor(group)3
      42.16625       43.79800
\({Dutch} = 38.67 \quad {German} = 42.16625 \quad {Belgian} = 43.798\)
Model fit only covar
What is the regression weight when only the covariate is in the model?
fit.covar <- lm(outcome ~ covar, data)
fit.covar
Call:
lm(formula = outcome ~ covar, data = data)
Coefficients:
(Intercept) covar
15.667 3.185
Model fit with covar
fit <- lm(outcome ~ factor(group) + covar, data); fit
Call:
lm(formula = outcome ~ factor(group) + covar, data = data)
Coefficients:
(Intercept) factor(group)2 factor(group)3 covar
15.965 2.769 4.181 2.881
fit$coefficients[2:3] + fit$coefficients[1]
factor(group)2 factor(group)3
      18.73401       20.14609
\({Dutch} = 15.96 \quad {German} = 18.73 \quad {Belgian} = 20.14\)
Total variance
What is the total variance?
\({MS}_{total} = s^2_{outcome} = \frac{{SS}_{outcome}}{{df}_{outcome}}\)
ms.t = var(data$outcome); ms.t
[1] 11.97756
ss.t = var(data$outcome) * (length(data$outcome) - 1); ss.t
[1] 227.5737
The data
Total variance visual

Model variance group
The model variance consists of two parts: one for the independent variable and one for the covariate. Let's first look at the independent variable.
Model variance group visual

Model variance covariate visual

Model variance group and covariate
Model variance group and covariate visual

Error variance with covariate

Sums of squares
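The objects grand.mean, model, model.group, and model.covar are not defined in the code shown above; presumably they are the grand mean of the outcome and the fitted values of the three models. A sketch under that assumption:
# Assumed construction of the predictions used in the sums of squares below
data$grand.mean  <- mean(data$outcome)
data$model       <- predict(fit)        # group + covariate
data$model.group <- predict(fit.group)  # group only
data$model.covar <- predict(fit.covar)  # covariate only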
SS.model = with(data, sum((model - grand.mean)^2))
SS.error = with(data, sum((outcome - model)^2))
# Sums of squares for individual effects
SS.model.group = with(data, sum((model.group - grand.mean)^2))
SS.model.covar = with(data, sum((model.covar - grand.mean)^2))
SS.covar = SS.model - SS.model.group; SS.covar ## SS.covar corrected for group
[1] 121.8463
SS.group = SS.model - SS.model.covar; SS.group ## SS.group corrected for covar
[1] 54.65778
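As a check: for a linear model with an intercept, the model and error sums of squares should add up to the total sum of squares computed earlier.
# Sanity check: SS.model + SS.error should equal ss.t (up to rounding)
SS.model + SS.error
ss.t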
F-ratio
\(F = \frac{{MS}_{model}}{{MS}_{error}} = \frac{{SIGNAL}}{{NOISE}}\)
n = 20
k = 3
df.model = k - 1
df.error = n - k - 1 # one extra degree of freedom is used by the covariate
MS.model = SS.group / df.model
MS.error = SS.error / df.error
F = MS.model / MS.error
F
[1] 21.74406
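The same F-ratio for the group effect should also appear in R's sequential ANOVA table when the covariate is entered before the grouping factor; a sketch, assuming the data frame constructed above:
# ANCOVA table: covariate entered first, group effect adjusted for the covariate
fit.ancova <- lm(outcome ~ covar + factor(group), data)
anova(fit.ancova)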
\(P\)-value
library("visualize")
visualize.f(F, df.model, df.error, section = "upper") 
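The numeric p-value behind this plot can also be computed directly from the F distribution:
# Upper-tail probability of the observed F-ratio
pf(F, df.model, df.error, lower.tail = FALSE)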
Alpha & Power

Adjusted means
# Add dummy variables
data$dummy.1 <- ifelse(data$group == 2, 1, 0)
data$dummy.2 <- ifelse(data$group == 3, 1, 0)
# b coefficients
b.cov = fit$coefficients["covar"]; b.int = fit$coefficients["(Intercept)"]
b.2 = fit$coefficients["factor(group)2"]; b.3 = fit$coefficients["factor(group)3"]
# Adjusted means: predicted values with the covariate fixed at its overall mean
data$mean.adj <- with(data, b.int + b.cov * mean(covar) + b.2 * dummy.1 + b.3 * dummy.2)
aggregate(mean.adj ~ group, data, mean)
  group mean.adj
1     1 39.18576
2     2 41.95576
3     3 43.36576
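The same adjusted means can presumably also be obtained with predict(), evaluating each group at the overall mean of the covariate:
# Adjusted means via predict(): each group at the grand mean of the covariate
newdat <- data.frame(group = 1:3, covar = mean(data$covar))
predict(fit, newdata = newdat)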
Real \(\beta\)’s
b.0 = 15 # initial value for group 1
b.1 = 3 # difference between group 1 and 2
b.2 = 4 # difference between group 1 and 3
b.3 = 3 # Weight for covariate
cbind(m.covar = b.3*mu.covar,
      BETA = c(b.0, b.0+b.1, b.0+b.2),
      sum = b.3*mu.covar + c(b.0, b.0+b.1, b.0+b.2))
     m.covar BETA sum
[1,]      24   15  39
[2,]      24   18  42
[3,]      24   19  43
The adjusted means estimated above (39.19, 41.96, 43.37) come close to these true values.
