15 nov 2018

## Pearson Correlation

In statistics, the Pearson correlation coefficient, also referred to as the Pearson's r, Pearson product-moment correlation coefficient (PPMCC) or bivariate correlation, is a measure of the linear correlation between two variables X and Y. It has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation. It is widely used in the sciences. It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s.

Source: Wikipedia
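
The three reference values can be reproduced with a few lines of R. This is a minimal sketch: the vector `u` and its transformations are made up for illustration and are not part of the original material; `cor()` is base R's correlation function, which computes Pearson's r by default.

```r
# Illustrative data (an assumption, not from the original post)
u = c(1, 2, 3, 4, 5)

r.pos  = cor(u,  2 * u + 1)   # perfectly increasing linear relation
r.neg  = cor(u, -3 * u + 10)  # perfectly decreasing linear relation
r.zero = cor(u, (u - 3)^2)    # symmetric U-shape: related, but not linearly

print(c(r.pos, r.neg, r.zero))
```

The last case is a reminder that a coefficient of 0 only rules out a linear relation, not a relation of any kind.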

## PMCC

$r_{xy} = \frac{{COV}_{xy}}{S_xS_y}$

where $$S$$ is the standard deviation and $$COV$$ is the covariance.

${COV}_{xy} = \frac{\sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})}{N-1}$
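
Both formulas can be checked by hand against R's built-in `cov()` and `cor()`. The following is a minimal sketch on simulated data; the names `cov.xy` and `r.xy` are chosen here for illustration.

```r
set.seed(565433)

x = rnorm(10, 5)
y = rnorm(10, 5)
N = length(x)

# Covariance: sum of cross-products of deviations, divided by N - 1
cov.xy = sum((x - mean(x)) * (y - mean(y))) / (N - 1)

# Pearson r: covariance divided by the product of the standard deviations
r.xy = cov.xy / (sd(x) * sd(y))

# Both should agree with the built-in functions
print(all.equal(cov.xy, cov(x, y)))
print(all.equal(r.xy,   cor(x, y)))
```

Note that `sd()` also divides by N - 1, so the denominators in the two formulas are consistent.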

## Plot correlation

set.seed(565433)

x = rnorm(10, 5)
y = rnorm(10, 5)

plot(x, y, las = 1)

m.x = mean(x)
m.y = mean(y)

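# Green quadrants: both deviations share a sign, so their product is positive
polygon(c(m.x,8,8,m.x),c(m.y,m.y,8,8), col = rgb(0,1,0,.5))
polygon(c(m.x,0,0,m.x),c(m.y,m.y,0,0), col = rgb(0,1,0,.5))

# Red quadrants: the deviations differ in sign, so their product is negative
polygon(c(m.x,0,0,m.x),c(m.y,m.y,8,8), col = rgb(1,0,0,.5))
polygon(c(m.x,8,8,m.x),c(m.y,m.y,0,0), col = rgb(1,0,0,.5))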

points(x,y)

abline(h = m.y, lwd = 3)
abline(v = m.x, lwd = 3)

# Orange: deviation of each y from its mean; dark green: deviation of each x from its mean
segments(x, m.y, x, y, col = "orange",    lwd = 2)
segments(x, y, m.x, y, col = "darkgreen", lwd = 2)

text(m.x+.7, m.y+.7, "+ x +", cex = 2)
text(m.x-.7, m.y-.7, "- x -", cex = 2)
text(m.x+.7, m.y-.7, "+ x -", cex = 2)
text(m.x-.7, m.y+.7, "- x +", cex = 2)

$(x_i - \bar{x})(y_i - \bar{y})$
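
The shaded quadrants correspond to the sign of each cross-product: points in the green quadrants add to the covariance, points in the red quadrants subtract from it. The sketch below reuses the seed and sampling from the plot code; the names `d.x`, `d.y`, `cp`, and `quadrant` are introduced here for illustration.

```r
set.seed(565433)

x = rnorm(10, 5)
y = rnorm(10, 5)

d.x = x - mean(x)   # deviation of each x from its mean
d.y = y - mean(y)   # deviation of each y from its mean
cp  = d.x * d.y     # cross-product per point

# Positive products lie in the green quadrants, negative ones in the red
quadrant = ifelse(cp > 0, "green", "red")
print(data.frame(d.x, d.y, cp, quadrant))

# The covariance is the sum of these products divided by N - 1
cov.by.hand = sum(cp) / (length(x) - 1)
print(cov.by.hand)
```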

## Simulate data

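n     = 50                    # sample size
grade = rnorm(n, 6, 1.6)      # grades: mean 6, sd 1.6
b.0   = 100                   # intercept
b.1   = .3                    # slope: change in IQ per grade point
error = rnorm(n, 0, 0.7)      # random noise

# IQ is a linear function of grade plus noise
IQ = b.0 + b.1 * grade + error
#IQ = group(IQ)

# Motivation depends on IQ, with a fresh draw of noise
error = rnorm(n, 0, 0.7)
motivation = 3.2 + .2 * IQ + error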
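
One way to inspect the simulated relations is a correlation matrix. The sketch below repeats the simulation so that it runs on its own. The seed is an assumption added for reproducibility (the simulation above sets none), and the names `sim` and `rho.grade.IQ` are hypothetical.

```r
set.seed(1)  # assumption: arbitrary seed, added only for reproducibility

n     = 50
grade = rnorm(n, 6, 1.6)
IQ    = 100 + .3 * grade + rnorm(n, 0, 0.7)
motivation = 3.2 + .2 * IQ + rnorm(n, 0, 0.7)

sim = data.frame(grade, IQ, motivation)

# Pairwise Pearson correlations between the three variables
print(round(cor(sim), 2))

# Population correlation between grade and IQ implied by the parameters:
# slope * sd(grade) / sqrt(slope^2 * var(grade) + var(error)), approx. 0.57
rho.grade.IQ = .3 * 1.6 / sqrt(.3^2 * 1.6^2 + 0.7^2)
print(rho.grade.IQ)
```

With only 50 observations, the sample correlation will scatter around the implied population value rather than match it.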

## Explaining variance

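# Assumption: `data` is a data frame holding the variables simulated above
data  = data.frame(grade, IQ, motivation)

grade = data$grade
IQ    = data$IQ

# Plot both variables against the observation index on a shared y-axis
plot(data$grade, ylim = range(data$grade, data$IQ), col = 'orange')
points(data$IQ, col = 'blue')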
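
In a simple linear regression, the squared correlation equals the proportion of variance in the outcome that the predictor explains. The following is a minimal sketch of that identity, assuming `grade` and `IQ` are simulated as in the previous section; the seed and the names `fit` and `explained` are added here for illustration.

```r
set.seed(1)  # assumption: arbitrary seed, added only for reproducibility

n     = 50
grade = rnorm(n, 6, 1.6)
IQ    = 100 + .3 * grade + rnorm(n, 0, 0.7)

r = cor(grade, IQ)

# Regress IQ on grade; the fitted values carry the explained part of IQ
fit       = lm(IQ ~ grade)
explained = var(fitted(fit)) / var(IQ)

# r squared, the explained proportion, and lm's R-squared coincide
print(c(r^2, explained, summary(fit)$r.squared))
```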