21 dec 2018

Contents

Nonparametric tests

Parametric vs Nonparametric

| Attribute | Parametric | Nonparametric |
|---|---|---|
| distribution | normally distributed | any distribution |
| sampling | random sample | random sample |
| sensitivity to outliers | yes | no |
| works with | large data sets | small and large data sets |
| speed | fast | slow |

Ranking

x = c(1, 4, 6, 7, 8, 9)
y = c(1, 4, 6, 7, 8, 39)

# Two boxplots side by side: 39 shows up as an outlier in y
layout(matrix(1:2, 1, 2))
boxplot(x, horizontal=TRUE, col='red')
boxplot(y, horizontal=TRUE, col='red')

rbind(rx = rank(x), ry = rank(y))
##    [,1] [,2] [,3] [,4] [,5] [,6]
## rx    1    2    3    4    5    6
## ry    1    2    3    4    5    6
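
A minimal sketch of why this matters, reusing the same x and y: the outlier 39 pulls the mean of y upwards, while the ranks (and anything computed from them) stay exactly the same.

```r
x <- c(1, 4, 6, 7, 8, 9)
y <- c(1, 4, 6, 7, 8, 39)  # same as x, except the largest value is an outlier

# Statistics on the raw values react to the outlier
c(mean.x = mean(x), mean.y = mean(y))

# Statistics on the ranks do not
c(mean.rank.x = mean(rank(x)), mean.rank.y = mean(rank(y)))
```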

Ties

x = c(1, 4, 6, 7, 8, 8, 4, 7, 9)

rbind(x, ordered = sort(x), non.tied.rank = 1:length(x), ranked = rank(sort(x)))
##               [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## x                1  4.0  6.0    7  8.0  8.0  4.0  7.0    9
## ordered          1  4.0  4.0    6  7.0  7.0  8.0  8.0    9
## non.tied.rank    1  2.0  3.0    4  5.0  6.0  7.0  8.0    9
## ranked           1  2.5  2.5    4  5.5  5.5  7.5  7.5    9

Tied values share the mean of the ranks they would otherwise occupy:

\[\frac{2 + 3}{2} = 2.5, \quad \frac{5 + 6}{2} = 5.5, \quad \frac{7 + 8}{2} = 7.5\]
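
The same averaging can be reproduced by hand. The sketch below assumes the x from the ties example; the names `position` and `manual.rank` are only illustrative. It averages the sorted positions within each group of equal values and compares the result with `rank()`.

```r
x <- c(1, 4, 6, 7, 8, 8, 4, 7, 9)
s <- sort(x)

# Position each value would get if ties were ignored
position <- as.numeric(seq_along(s))

# Average the positions within each group of equal values
manual.rank <- ave(position, s)

rbind(s, position, manual.rank, builtin = rank(s))
```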

Procedure

  1. Assumption: independent random samples.
  2. Hypotheses:
    \(H_0\) : equal population distributions (implies equal mean ranks)
    \(H_A\) : unequal mean ranks (two sided), or
    \(H_A\) : higher mean rank for one group (one sided).
  3. The test statistic is based on the difference in mean ranks or rank sums between the groups.
  4. Standardise the test statistic.
  5. Calculate the p-value, one or two sided.
  6. Reject \(H_0\) if \(p < \alpha\).
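
The steps above can be sketched as one small function. This is a hypothetical helper, not part of any package. It assumes two groups, uses the normal approximation without a tie or continuity correction, and takes the rank sum of the first group as the statistic. The data are made up for illustration.

```r
# Hypothetical helper: rank-sum test via the normal approximation
# (no tie correction, no continuity correction).
rank.sum.test <- function(values, group, alpha = 0.05) {
  group <- factor(group)
  stopifnot(nlevels(group) == 2, length(values) == length(group))

  r  <- rank(values)                          # rank all observations together
  n  <- as.numeric(table(group))              # group sizes n1 and n2
  W  <- sum(r[group == levels(group)[1]])     # step 3: rank sum of the first group
  mu <- n[1] * (n[1] + n[2] + 1) / 2          # expected rank sum under H0
  se <- sqrt(n[1] * n[2] * (n[1] + n[2] + 1) / 12)
  z  <- (W - mu) / se                         # step 4: standardise
  p  <- 2 * pnorm(-abs(z))                    # step 5: two-sided p-value
  list(W = W, z = z, p.value = p, reject.H0 = p < alpha)  # step 6
}

# Made-up illustration data
values <- c(12, 15, 11, 18, 14, 22, 25, 19, 27, 24)
group  <- rep(c("a", "b"), each = 5)
rank.sum.test(values, group)
```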

Wilcoxon rank-sum test

Independent 2 samples

Developed by Frank Wilcoxon, the rank-sum test is a nonparametric alternative to the independent-samples t-test.

All observations are first ranked together, ignoring group membership, and the ranks are then summed per group. Under the null hypothesis, and with equal group sizes, one would expect roughly equal rank sums in both groups.

After standardising the observed rank sum against its expected value, it can be tested against a standard normal distribution.
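
For comparison, base R ships this test as `wilcox.test()`. The sketch below uses made-up data without ties. Note that the W reported by R is the rank sum of the first sample minus n(n+1)/2 for that sample, so it is not the raw rank sum itself.

```r
# Made-up illustration data without ties
a <- c(12, 15, 11, 18, 14)
b <- c(22, 25, 19, 27, 24)

res <- wilcox.test(a, b)  # two sided by default; exact p-value for small samples without ties
res$statistic             # R's W: rank sum of `a` minus n.a * (n.a + 1) / 2
res$p.value

# Recover R's W from the plain rank sum of the first sample
rank.sum.a <- sum(rank(c(a, b))[seq_along(a)])
rank.sum.a - length(a) * (length(a) + 1) / 2
```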

Simulate data

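n      = 20                                    # total sample size
factor = rep(c("Ecstasy","Alcohol"),each=n/2)  # group labels, 10 per group
dummy  = ifelse(factor == "Ecstasy", 0, 1)     # 0 = Ecstasy, 1 = Alcohol
b.0    = 23                                    # mean of the Ecstasy group
b.1    = 5                                     # group difference
error  = rnorm(n, 0, 1.7)                      # no seed set: values differ per run
depres = b.0 + b.1*dummy + error
depres = round(depres)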

data <- data.frame(factor, depres)

## add the ranks
data$R <- rank(data$depres)

Example

Calculate the sum of ranks per group

R <- aggregate(R ~ factor, data, sum)
R
##    factor   R
## 1 Alcohol 102
## 2 Ecstasy 108

W is the smaller of the two rank sums.

\[W=\min\left(\sum{R_1},\sum{R_2}\right)\]

W <- min(R$R)
W
## [1] 102
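
A quick consistency check on the rank sums: the ranks 1 to \(N\) always add up to \(N(N+1)/2\), so with \(N = 20\) the two group sums must total 210, which they do.

\[\sum{R_1} + \sum{R_2} = \frac{N(N+1)}{2} = \frac{20 \cdot 21}{2} = 210 = 102 + 108\]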

Standardise W

To calculate the z score we need to standardise W, which requires the mean of W and the standard error of W.

Both depend on the sample size of each group.

n <- aggregate(R ~ factor, data, length)

n.1 = n$R[1]
n.2 = n$R[2]

cbind(n.1, n.2)
##      n.1 n.2
## [1,]  10  10

Mean W

\[\bar{W}_s=\frac{n_1(n_1+n_2+1)}{2}\]

W.mean = (n.1*(n.1+n.2+1))/2
W.mean
## [1] 105
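
Where the formula comes from: under \(H_0\) every observation is equally likely to hold any of the \(n_1+n_2\) ranks, so each rank has expected value \((n_1+n_2+1)/2\). A group of \(n_1\) observations therefore has the expected rank sum

\[\bar{W}_s = n_1 \cdot \frac{n_1+n_2+1}{2} = 10 \cdot \frac{21}{2} = 105\]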

SE W

\[{SE}_{\bar{W}_s}=\sqrt{ \frac{n_1 n_2 (n_1+n_2+1)}{12} }\]

W.se = sqrt((n.1*n.2*(n.1+n.2+1))/12)
W.se
## [1] 13.22876
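
Both formulas can be checked with a small simulation. The sketch assumes that, under \(H_0\), the ranks of one group are simply a random draw of 10 ranks out of 1 to 20. The seed and the number of replications are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

n.1 <- 10
n.2 <- 10

# Under H0 the ranks of group 1 are a random draw of n.1 ranks out of 1..(n.1 + n.2)
W.sim <- replicate(20000, sum(sample(n.1 + n.2, n.1)))

# Simulated mean and standard deviation next to the formula values
c(mean.sim = mean(W.sim), mean.formula = n.1 * (n.1 + n.2 + 1) / 2)
c(sd.sim = sd(W.sim), se.formula = sqrt(n.1 * n.2 * (n.1 + n.2 + 1) / 12))
```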

Calculate Z

\[z = \frac{W - \bar{W}_s}{{SE}_{\bar{W}_s}}\]

This has the same form as

\[\frac{X - \bar{X}}{{SE}_X} \quad \text{or} \quad \frac{b - \mu_{b}}{{SE}_b}\]

z = (W - W.mean) / W.se
z
## [1] -0.2267787

Test for significance, one sided

# Install the visualize package only if it is missing
if (!requireNamespace("visualize", quietly = TRUE)) { install.packages("visualize") }
library("visualize")

visualize.norm(z, section="lower")
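
The shaded area in the plot can also be computed directly with `pnorm()`. A minimal sketch, with the z score from the example typed in by hand: the one-sided p-value is about 0.41, far above any conventional \(\alpha\), so \(H_0\) is not rejected for these data.

```r
z <- -0.2267787  # z score from the example

p.lower     <- pnorm(z)            # one sided, lower tail
p.two.sided <- 2 * pnorm(-abs(z))  # two sided

round(c(p.lower = p.lower, p.two.sided = p.two.sided), 3)
```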