Two-sample Mann-Whitney-Wilcoxon test

Description
This non-parametric test is used to determine whether two samples were drawn from populations with similar values. Rather than the raw values, the test works with their ranks: both samples are pooled and ranked together, and the test statistic is based on the sum of the ranks in one of the samples.
If the values in one sample tend to be larger than those in the other, the ranks in that sample will, on average, also be larger than in the other sample. Under the null hypothesis, by contrast, the ranks will be similarly distributed in both samples.
It is a non-parametric alternative to the two-sample t-test or the Welch test, and it does not require the two populations to be normally distributed or to have equal variances.
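To make the rank-sum idea concrete, here is a minimal R sketch on two small, hypothetical samples (the values are invented for illustration and contain no ties). It pools and ranks the observations, computes the rank sum of the first sample, and checks it against the W statistic returned by wilcox.test(). R reports W as the rank sum of the first sample minus its smallest possible value, n1(n1 + 1)/2, which is the Mann-Whitney U statistic.

```r
# Hypothetical toy samples, invented for illustration (no tied values)
x <- c(1.1, 2.3, 3.8, 4.0)
y <- c(2.9, 5.2, 6.1, 7.5, 8.0)

# Pool both samples and rank all values together (rank 1 = smallest)
pooled_ranks <- rank(c(x, y))
ranks_x <- pooled_ranks[seq_along(x)]
ranks_y <- pooled_ranks[-seq_along(x)]

# Mean rank per sample: x tends to take smaller values, so it gets smaller ranks
mean(ranks_x)  # 3
mean(ranks_y)  # 6.6

# Rank sum of the first sample
rank_sum_x <- sum(ranks_x)  # 12

# wilcox.test() reports W = rank sum of x minus n_x * (n_x + 1) / 2
n_x <- length(x)
W_manual <- rank_sum_x - n_x * (n_x + 1) / 2
W_manual                             # 2
unname(wilcox.test(x, y)$statistic)  # 2
```

Subtracting n1(n1 + 1)/2 shifts the rank sum so that W ranges from 0 to n1 n2, which makes it comparable across sample sizes.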
R Syntax
In R, the test can be performed using the following function from the stats package:

wilcox.test{stats}

wilcox.test(var1, var2, alternative = c("two.sided", "less", "greater"), exact = TRUE/FALSE)

wilcox.test(var1 ~ factor, alternative = c("two.sided", "less", "greater"), exact = TRUE/FALSE)
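  • var1 is a numeric vector containing the observations of the first sample. In the formula form, var1 contains the observations of both samples.
  • var2 is a numeric vector containing the observations of the second sample. var1 and var2 do not need to have the same length.
  • factor is a factor with exactly two levels, of the same length as var1, indicating which sample each observation belongs to.
  • alternative is one of "two.sided" (the default), "less" or "greater". This parameter defines the type of alternative hypothesis: "two.sided" for a two-sided (bilateral) test, "greater" or "less" for a one-sided (unilateral) test.
  • "greater" means that the values (and therefore the ranks) in var1 tend to be larger than those in var2; "less" means the opposite. In the formula form, the first level of factor plays the role of var1.
  • exact controls how the p-value is computed. By default, R computes an exact p-value when both samples contain fewer than 50 observations and there are no ties; otherwise it uses a normal approximation (with a continuity correction by default). exact = TRUE requests the exact p-value, but when there are ties R may be unable to compute it, in which case it falls back to the normal approximation and issues a warning. If exact = FALSE, the normal approximation is always used.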
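The sketch below, again on hypothetical toy data without ties, illustrates the arguments listed above: the vector and formula forms give the same result, the one-sided "less" test gives half the two-sided exact p-value in this case, and exact = FALSE switches to the normal approximation. The approximation is then reproduced by hand with the no-ties mean n1 n2 / 2, variance n1 n2 (n1 + n2 + 1) / 12 and a continuity correction of 0.5; the group labels "A" and "B" are hypothetical.

```r
# Hypothetical toy samples, invented for illustration (no tied values)
x <- c(1.1, 2.3, 3.8, 4.0)
y <- c(2.9, 5.2, 6.1, 7.5, 8.0)

# Vector form: one vector per sample (the lengths may differ)
two_sided <- wilcox.test(x, y, alternative = "two.sided")

# Formula form: all values in one vector plus a 2-level grouping factor;
# the first level ("A", a hypothetical label) plays the role of var1
values <- c(x, y)
group  <- factor(rep(c("A", "B"), times = c(length(x), length(y))))
formula_form <- wilcox.test(values ~ group)

# One-sided test: "less" = values in x tend to be smaller than in y
one_sided <- wilcox.test(x, y, alternative = "less")

# Small samples without ties: the default already gives the exact p-value
two_sided$p.value  # 8/126, about 0.0635
one_sided$p.value  # 4/126, about 0.0317

# exact = FALSE: normal approximation with continuity correction
approx_p <- wilcox.test(x, y, exact = FALSE)$p.value

# Same approximation by hand (this variance formula assumes no ties,
# and this simple form assumes W != mu, which holds here)
W     <- unname(two_sided$statistic)
n_x   <- length(x)
n_y   <- length(y)
mu    <- n_x * n_y / 2
sigma <- sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
z     <- (abs(W - mu) - 0.5) / sigma
approx_manual <- 2 * pnorm(-z)
```

With samples this small, the exact and approximate p-values differ noticeably, which is why R prefers the exact computation when it is available.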
Code Example
Mann-Whitney-Wilcoxon

This test is known both as the Mann-Whitney U test and as the Wilcoxon rank-sum test. In R it is implemented as wilcox.test().

For this example, we use the mtcars dataset, and more specifically two of its variables: mpg (miles per US gallon, i.e. the fuel efficiency of each car) and the binary variable am (transmission: 0 for automatic, 1 for manual). We would like to know whether the mileage of cars with automatic transmission is similar to that of cars with manual transmission.

wilcox.test(mtcars$mpg ~ mtcars$am, data = mtcars)
## Warning in wilcox.test.default(x = c(21.4, 18.7, 18.1, 14.3, 24.4, 22.8, :
## cannot compute exact p-value with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  mtcars$mpg by mtcars$am
## W = 42, p-value = 0.001871
## alternative hypothesis: true location shift is not equal to 0

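Note the formula form of the first argument, y ~ x. It works here because the am variable has only two levels (0/1). We could also have used the other form of the test, with two vectors x and y, each containing one sample. The p-value (0.001871) is well below the usual 5% significance level, so we reject the null hypothesis: the data suggest that the mileages of automatic and manual cars do not come from the same population. The warning only indicates that, because some mpg values are tied, R used the normal approximation with continuity correction instead of an exact p-value.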
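As a sketch of the two-vector form, the mpg values can be split by am and passed as two separate vectors. Setting exact = FALSE is our choice rather than part of the original example: it requests the normal approximation directly, so R should report the same W and p-value as the formula version above, without the warning about ties.

```r
# Split mpg into the two samples defined by am (0 = automatic, 1 = manual)
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]
length(auto)    # 19 cars
length(manual)  # 13 cars

# exact = FALSE: use the normal approximation with continuity correction
# directly, since mpg contains tied values
res <- wilcox.test(auto, manual, exact = FALSE)
res$statistic   # W = 42
res$p.value     # about 0.001871

# Decision at the 5% significance level
res$p.value < 0.05  # TRUE: reject H0
```

Because auto is passed first, W here is computed from the ranks of the automatic cars, matching the formula version, where level 0 of am is the first group.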