15 Nov 2018

Correlation

Pearson Correlation

In statistics, the Pearson correlation coefficient, also referred to as Pearson's r, the Pearson product-moment correlation coefficient (PPMCC) or the bivariate correlation, is a measure of the linear correlation between two variables X and Y. It has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation. It is widely used in the sciences. It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s.

Source: Wikipedia
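A quick check of the three reference values in R, using made-up vectors rather than course data: an exact increasing line gives r = 1, an exact decreasing line gives r = −1, and a symmetric parabola gives r = 0 even though y is fully determined by x. That last case is why the coefficient is described as measuring *linear* correlation only.

```r
# Hypothetical example vectors, chosen only to hit the boundary cases
x = 1:10

r.pos = cor(x, 2 * x + 1)  # exact increasing line
r.neg = cor(x, -3 * x)     # exact decreasing line

# A symmetric, non-linear relation: v depends fully on u, yet r is 0
u = -2:2
v = u^2
r.zero = cor(u, v)

print(c(positive = r.pos, negative = r.neg, parabola = r.zero))
```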

PMCC

\[r_{xy} = \frac{{COV}_{xy}}{S_xS_y}\]

where \(S_x\) and \(S_y\) are the standard deviations of \(x\) and \(y\), and \({COV}_{xy}\) is their covariance.

\[{COV}_{xy} = \frac{\sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})}{N-1}\]
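The two formulas can be checked by hand in R. This sketch reuses the seed and `rnorm` calls from the plotting code so the numbers match that plot. Because `sd()` also divides by N − 1, the manual covariance and r should agree with the built-in `cov()` and `cor()` up to floating-point rounding.

```r
set.seed(565433)
x = rnorm(10, 5)
y = rnorm(10, 5)
N = length(x)

# Covariance: summed products of deviations from the means, over N - 1
cov.xy = sum((x - mean(x)) * (y - mean(y))) / (N - 1)

# Pearson's r: covariance scaled by both standard deviations
r.xy = cov.xy / (sd(x) * sd(y))

print(c(cov.manual = cov.xy, cov.R = cov(x, y),
        r.manual = r.xy, r.R = cor(x, y)))
```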

Plot correlation

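# Fixed seed so the plot is reproducible
set.seed(565433)

# Ten independent draws each for x and y, normal with mean 5 and sd 1
x = rnorm(10, 5)
y = rnorm(10, 5)

plot(x, y, las = 1)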

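# The means of x and y split the plane into four quadrants
m.x = mean(x)
m.y = mean(y)

# Green: quadrants where the deviation product is positive (top right, bottom left)
# (0 and 8 are hard-coded; they only need to reach past the plot region)
polygon(c(m.x,8,8,m.x),c(m.y,m.y,8,8), col = rgb(0,1,0,.5))
polygon(c(m.x,0,0,m.x),c(m.y,m.y,0,0), col = rgb(0,1,0,.5))

# Red: quadrants where the deviation product is negative (top left, bottom right)
polygon(c(m.x,0,0,m.x),c(m.y,m.y,8,8), col = rgb(1,0,0,.5))
polygon(c(m.x,8,8,m.x),c(m.y,m.y,0,0), col = rgb(1,0,0,.5))

# Redraw the points on top of the shading
points(x,y)

# Mean lines
abline(h = m.y, lwd = 3)
abline(v = m.x, lwd = 3)

# Deviations from the means: vertical (y minus mean of y), horizontal (x minus mean of x)
segments(x, m.y, x, y, col = "orange",    lwd = 2)
segments(x, y, m.x, y, col = "darkgreen", lwd = 2)

# Sign of the deviation product in each quadrant
text(m.x+.7, m.y+.7, "+ x +", cex = 2)
text(m.x-.7, m.y-.7, "- x -", cex = 2)
text(m.x+.7, m.y-.7, "+ x -", cex = 2)
text(m.x-.7, m.y+.7, "- x +", cex = 2)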

\[(x_i - \bar{x})(y_i - \bar{y})\]
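The quadrant picture can be tied to numbers: each point contributes one product of deviations, positive in the green quadrants and negative in the red ones, and the sum of those products divided by N − 1 is the covariance. A minimal sketch, regenerating the same ten points; the `quadrant` labels simply mirror the text drawn on the plot.

```r
set.seed(565433)
x = rnorm(10, 5)
y = rnorm(10, 5)
m.x = mean(x)
m.y = mean(y)

# One deviation product per point
cross = (x - m.x) * (y - m.y)

# Quadrant of each point, labelled as on the plot
quadrant = paste(ifelse(x > m.x, "+", "-"), "x", ifelse(y > m.y, "+", "-"))

print(data.frame(x = round(x, 2), y = round(y, 2), quadrant,
                 cross = round(cross, 2)))

# Summing the products and dividing by N - 1 gives the covariance
print(sum(cross) / (length(x) - 1))
```

If more of the weight sits in the green quadrants, the sum (and so r) is positive; if it sits in the red quadrants, it is negative.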

Guess the correlation

Simulate data

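n     = 50
# Grades: mean 6, sd 1.6
grade = rnorm(n, 6, 1.6)
b.0   = 100
b.1   = .3
error = rnorm(n, 0, 0.7)

# IQ depends linearly on grade, plus noise
IQ = b.0 + b.1 * grade + error

# Motivation depends on IQ only, with a fresh noise term
error = rnorm(n, 0, 0.7)
motivation = 3.2 + .2 * IQ + error

# Collect the variables; the next step reads them from `data`
data = data.frame(grade, IQ, motivation)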
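Before guessing, it can help to see what correlations the simulation produces. This sketch repeats the simulation with a hypothetical seed (`set.seed(1)`, added only for reproducibility) and prints the sample correlation matrix. With these parameters the population correlation between grade and IQ works out to b.1 · 1.6² / (1.6 · √(b.1² · 1.6² + 0.7²)) ≈ 0.57; any single sample of 50 will scatter around that value. Grade and motivation are linked only through IQ, so their correlation should be weaker.

```r
set.seed(1)  # hypothetical seed, not from the original slides
n     = 50
grade = rnorm(n, 6, 1.6)
b.1   = .3
IQ    = 100 + b.1 * grade + rnorm(n, 0, 0.7)
motivation = 3.2 + .2 * IQ + rnorm(n, 0, 0.7)
data  = data.frame(grade, IQ, motivation)

# Sample correlation matrix of the three simulated variables
r.mat = round(cor(data), 2)
print(r.mat)

# Population r(grade, IQ) implied by the parameters:
# cov = b.1 * var(grade), sd(IQ) = sqrt(b.1^2 * var(grade) + var(error))
r.pop = b.1 * 1.6^2 / (1.6 * sqrt(b.1^2 * 1.6^2 + 0.7^2))
print(r.pop)
```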

Explaining variance

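# Pull the two variables out of the simulated data frame
grade      = data$grade
IQ         = data$IQ
mean.grade = mean(grade)
mean.IQ    = mean(IQ)
N          = length(grade)

# Both variables against observation index, on one shared y-axis
plot(grade, ylim = range(c(grade, IQ)), col = 'orange', las = 1,
     xlab = 'Observation', ylab = 'Value')
points(IQ, col = 'blue')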
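One common way to make "explaining variance" concrete, offered here as a sketch rather than as the route these slides take next: compute the total variance of IQ from its deviations around `mean.IQ`, then ask what share of it a straight line on grade accounts for. For a regression with a single predictor, that share (R²) equals the squared Pearson correlation. The data are regenerated with a hypothetical seed so the block runs on its own.

```r
set.seed(1)  # hypothetical seed, not from the original slides
n     = 50
grade = rnorm(n, 6, 1.6)
IQ    = 100 + .3 * grade + rnorm(n, 0, 0.7)

mean.IQ = mean(IQ)
N       = length(IQ)

# Total variance of IQ: squared deviations from its mean, over N - 1
var.IQ = sum((IQ - mean.IQ)^2) / (N - 1)

# Share of that variance explained by a linear fit on grade
fit = lm(IQ ~ grade)
r2  = summary(fit)$r.squared

print(c(var.manual = var.IQ, var.R = var(IQ),
        r.squared = r2, cor.squared = cor(grade, IQ)^2))
```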