Wilcoxon signed-rank test – Wikipedia


Statistical hypothesis test


The Wilcoxon signed-rank test is a non-parametric statistical hypothesis test used either to test the location of a population based on a sample of data, or to compare the locations of two populations using two matched samples.[1] The one-sample version serves a purpose similar to that of the one-sample Student’s t-test.[2] For two matched samples, it is a paired difference test like the paired Student’s t-test (also known as the “t-test for matched pairs” or “t-test for dependent samples”). The Wilcoxon test can be a good alternative to the t-test when population means are not of interest; for example, when one wishes to test whether a population’s median is nonzero, or whether there is a better than 50% chance that a sample from one population is greater than a sample from another population.

History

The test is named for Frank Wilcoxon (1892–1965) who, in a single paper, proposed both it and the rank-sum test for two independent samples.[3] The test was popularized by Sidney Siegel (1956) in his influential textbook on non-parametric statistics.[4] Siegel used the symbol T for the test statistic, and consequently, the test is sometimes referred to as the Wilcoxon T-test.

Test procedure

There are two variants of the signed-rank test. From a theoretical point of view, the one-sample test is more fundamental because the paired sample test is performed by converting the data to the situation of the one-sample test. However, most practical applications of the signed-rank test arise from paired data.

For a paired sample test, the data consists of samples (X_1, Y_1), …, (X_n, Y_n). Each sample is a pair of measurements. In the simplest case, the measurements are on an interval scale. Then they may be converted to real numbers, and the paired sample test is converted to a one-sample test by replacing each pair of numbers (X_i, Y_i) by its difference X_i − Y_i.[5] In general, it must be possible to rank the differences between the pairs. This requires that the data be on an ordered metric scale, a type of scale that carries more information than an ordinal scale but may have less than an interval scale.[6]

The data for a one-sample test is a set of real number samples X_1, …, X_n. Assume for simplicity that the samples have distinct absolute values and that no sample equals zero. (Zeros and ties introduce several complications; see below.) The test is performed as follows:[7][8]

  1. Compute |X_1|, …, |X_n|.
  2. Sort |X_1|, …, |X_n|, and use the sorted list to assign ranks R_1, …, R_n: the sample smallest in absolute value gets rank 1, the next smallest gets rank 2, and so on.
  3. Let sgn denote the sign function: sgn(x) = 1 if x > 0 and sgn(x) = −1 if x < 0. The test statistic is the signed-rank sum T = sgn(X_1) R_1 + ⋯ + sgn(X_n) R_n.
  4. Produce a p-value by comparing T to its distribution under the null hypothesis.
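The steps above can be sketched in Python; this is a minimal illustration (the function name is ours, not a standard API), assuming no observation is zero and giving tied absolute values their average rank:

```python
def signed_rank_statistics(x):
    """Compute the signed-rank sum T, the positive-rank sum T+,
    and the negative-rank sum T- for a one-sample Wilcoxon test.

    Assumes no observation is zero; observations tied in absolute
    value receive the average of the ranks they span."""
    n = len(x)
    order = sorted(range(n), key=lambda i: abs(x[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # find the block of observations tied in absolute value
        j = i
        while j + 1 < n and abs(x[order[j + 1]]) == abs(x[order[i]]):
            j += 1
        avg = (i + 1 + j + 1) / 2  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    t_plus = sum(r for xi, r in zip(x, ranks) if xi > 0)
    t_minus = sum(r for xi, r in zip(x, ranks) if xi < 0)
    return t_plus - t_minus, t_plus, t_minus
```

For x = [1.5, −2.0, 3.0] the absolute values already appear in rank order 1, 2, 3, so T+ = 4, T− = 2, and T = 2.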

The ranks are defined so that R_i is the number of j for which |X_j| ≤ |X_i|. Additionally, if σ: {1, …, n} → {1, …, n} is such that |X_{σ(1)}| < ⋯ < |X_{σ(n)}|, then R_{σ(i)} = i for all i.

The signed-rank sum T is closely related to two other test statistics. The positive-rank sum T^+ and the negative-rank sum T^− are defined by[9]

T^+ = Σ_{X_i > 0} R_i,  T^− = Σ_{X_i < 0} R_i.

Because T^+ + T^− equals the sum of all the ranks, which is 1 + 2 + ⋯ + n = n(n + 1)/2, these three statistics are related by:[10]

T = T^+ − T^− = 2T^+ − n(n + 1)/2 = n(n + 1)/2 − 2T^−.

Because T, T^+, and T^− carry the same information, any of them may be used as the test statistic.

The positive-rank sum and negative-rank sum have alternative interpretations that are useful for the theory behind the test. Define the Walsh average W_{ij} to be (X_i + X_j)/2. Then:[11]

T^+ = #{(i, j) : i ≤ j and W_{ij} > 0},  T^− = #{(i, j) : i ≤ j and W_{ij} < 0}.
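One way to see the Walsh-average characterization concretely is to count the averages by brute force; this sketch (names are illustrative) counts the positive and negative W_{ij} = (x_i + x_j)/2 over all pairs with i ≤ j, which reproduces T^+ and T^−:

```python
def walsh_statistics(x):
    """Count positive and negative Walsh averages W_ij = (x_i + x_j)/2
    over all pairs with i <= j. For samples with distinct nonzero
    absolute values, these counts equal T+ and T-."""
    n = len(x)
    t_plus = t_minus = 0
    for i in range(n):
        for j in range(i, n):
            w = (x[i] + x[j]) / 2
            if w > 0:
                t_plus += 1
            elif w < 0:
                t_minus += 1
    return t_plus, t_minus
```

For x = [1.5, −2.0, 3.0] the counts are (4, 2), matching the positive-rank and negative-rank sums computed directly from the ranks.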

Null and alternative hypotheses

One-sample test

The one-sample Wilcoxon signed-rank test can be used to test whether data comes from a symmetric population with a specified median.[12] If the population median is known, then it can be used to test whether data is symmetric about its center.[13]

To explain the null and alternative hypotheses formally, assume that the data consists of independent and identically distributed samples from a distribution F. If X_1 and X_2 are IID F-distributed random variables, define F^(2) to be the cumulative distribution function of (X_1 + X_2)/2. Set

p_2 = Pr((X_1 + X_2)/2 > 0) = 1 − F^(2)(0),

and assume for simplicity that F is continuous. The one-sample Wilcoxon signed-rank sum test is a test for the following null hypothesis against one of the following alternative hypotheses:[14]

Null hypothesis H0: p_2 = 1/2
One-sided alternative hypothesis H1: p_2 > 1/2
One-sided alternative hypothesis H2: p_2 < 1/2
Two-sided alternative hypothesis H3: p_2 ≠ 1/2

The alternative hypothesis being tested depends on whether the test statistic is used to compute a one-sided or two-sided p-value (and if one-sided, which side). If μ is a fixed, predetermined quantity, then the test can also be used as a test for the value of Pr((X_1 + X_2)/2 > μ) by subtracting μ from every data point.

The above null and alternative hypotheses are derived from the fact that 2T^+/n^2 is a consistent estimator of p_2.[15] It can also be derived from the description of T^+ and T^− in terms of Walsh averages, since that description shows that the Wilcoxon test is the same as the sign test applied to the set of Walsh averages.[16]

Restricting the distributions of interest can lead to more interpretable null and alternative hypotheses. One mildly restrictive assumption is that F^(2) has a unique median. This median is called the pseudomedian of F; in general it is different from the mean and the median, even when all three exist. If the existence of a unique pseudomedian can be assumed true under both the null and alternative hypotheses, then these hypotheses can be restated as:

Null hypothesis H0: The pseudomedian of F is located at zero
One-sided alternative hypothesis H1: The pseudomedian of F is located at some point greater than zero
One-sided alternative hypothesis H2: The pseudomedian of F is located at some point less than zero

Most often, the null and alternative hypotheses are stated under the assumption of symmetry. Fix a real number μ. Define F to be symmetric about μ if a random variable X with distribution F satisfies Pr(X ≤ μ − x) = Pr(X ≥ μ + x) for all x. If F has a density function f, then F is symmetric about μ if and only if f(μ + x) = f(μ − x) for every x.[17]

If the null and alternative distributions of F can be assumed symmetric, then the null and alternative hypotheses simplify to the following:[18]

Null hypothesis H0: F is symmetric about μ = 0
One-sided alternative hypothesis H1: F is symmetric about some μ > 0
One-sided alternative hypothesis H2: F is symmetric about some μ < 0

If in addition Pr(X = μ) = 0, then μ is a median of F. If this median is unique, then the Wilcoxon signed-rank sum test becomes a test for the location of the median.[19] When the mean of F is defined, then the mean is μ, and the test is also a test for the location of the mean.[20]

The restriction that the alternative distribution is symmetric is highly restrictive, but for one-sided tests it can be weakened. Say that F is stochastically smaller than a distribution symmetric about zero if an F-distributed random variable X satisfies Pr(X < −x) ≥ Pr(X > x) for all x ≥ 0. Similarly, F is stochastically larger than a distribution symmetric about zero if Pr(X < −x) ≤ Pr(X > x) for all x ≥ 0. Then the Wilcoxon signed-rank sum test can also be used for the following null and alternative hypotheses:[21][22]

Null hypothesis H0: F is symmetric about zero
One-sided alternative hypothesis H1: F is stochastically smaller than a distribution symmetric about zero
One-sided alternative hypothesis H2: F is stochastically larger than a distribution symmetric about zero

The hypothesis that the data are IID can be weakened. Each data point may be taken from a different distribution, as long as all the distributions are assumed to be continuous and symmetric about a common point μ_0. The data points are not required to be independent as long as the conditional distribution of each observation given the others is symmetric about μ_0.[23]

Paired data test

Because the paired data test arises from taking paired differences, its null and alternative hypotheses can be derived from those of the one-sample test. In each case, they become assertions about the behavior of the differences X_i − Y_i.

Let F(x, y) be the joint cumulative distribution of the pairs (X_i, Y_i). If F is continuous, then the most general null and alternative hypotheses are expressed in terms of

p_2 = Pr((X_1 − Y_1) + (X_2 − Y_2) > 0)

and are identical to those of the one-sample case:

Null hypothesis H0: p_2 = 1/2
One-sided alternative hypothesis H1: p_2 > 1/2
One-sided alternative hypothesis H2: p_2 < 1/2
Two-sided alternative hypothesis H3: p_2 ≠ 1/2

Like the one-sample case, under some restrictions the test can be interpreted as a test for whether the pseudomedian of the differences is located at zero.

A common restriction is to symmetric distributions of differences. In this case, the null and alternative hypotheses are:[24][25]

Null hypothesis H0: The observations X_i − Y_i are symmetric about μ = 0
One-sided alternative hypothesis H1: The observations X_i − Y_i are symmetric about some μ > 0
One-sided alternative hypothesis H2: The observations X_i − Y_i are symmetric about some μ < 0

These can also be expressed more directly in terms of the original pairs:[26]

Null hypothesis H0: The observations (X_i, Y_i) are exchangeable
One-sided alternative hypothesis H1: For some μ > 0, the pairs (X_i, Y_i) and (Y_i + μ, X_i − μ) are exchangeable
One-sided alternative hypothesis H2: For some μ < 0, the pairs (X_i, Y_i) and (Y_i + μ, X_i − μ) are exchangeable
Two-sided alternative hypothesis H3: For some μ ≠ 0, the pairs (X_i, Y_i) and (Y_i + μ, X_i − μ) are exchangeable

The null hypothesis of exchangeability can arise from a matched pair experiment with a treatment group and a control group. Randomizing the treatment and control within each pair makes the observations exchangeable. For an exchangeable distribution,

XiYi{displaystyle X_{i}-Y_{i}}

has the same distribution as

YiXi{displaystyle Y_{i}-X_{i}}

, and therefore, under the null hypothesis, the distribution is symmetric about zero.[27]

Because the one-sample test can be used as a one-sided test for stochastic dominance, the paired difference Wilcoxon test can be used to compare the following hypotheses:[28]

Null hypothesis H0: The observations (X_i, Y_i) are exchangeable
One-sided alternative hypothesis H1: The differences X_i − Y_i are stochastically smaller than a distribution symmetric about zero

Let u_n(t^+) denote the number of subsets of {1, …, n} whose elements sum to t^+. Then u_n satisfies the recursion

u_n(t^+) = u_{n−1}(t^+) + u_{n−1}(t^+ − n).

The formula is true because every subset of {1, …, n} which sums to t^+ either does not contain n, in which case it is also a subset of {1, …, n − 1}, or it does contain n, in which case removing n from the subset produces a subset of {1, …, n − 1} which sums to t^+ − n. Under the null hypothesis, the probability mass function of T^+ satisfies Pr(T^+ = t^+) = u_n(t^+)/2^n. The function u_n is closely related to the integer partition function.[54]

If p_n(t^+) is the probability that T^+ = t^+ under the null hypothesis when there are n samples, then p_n satisfies a similar recursion:[55]

p_n(t^+) = (p_{n−1}(t^+) + p_{n−1}(t^+ − n))/2,

with similar boundary conditions. There is also a recursive formula for the cumulative distribution function Pr(T^+ ≤ t^+).[56]
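The subset-sum recursion gives a simple dynamic program for tabulating the exact null distribution of T^+; this sketch (our own naming) iterates the ranks 1, …, n and updates the subset counts in place:

```python
def null_distribution(n):
    """Exact null distribution of the positive-rank sum T+ for sample
    size n, via the subset-sum recursion
        u_n(t) = u_{n-1}(t) + u_{n-1}(t - n).
    Returns a list p with p[t] = Pr(T+ = t) for t = 0 .. n(n+1)/2."""
    max_t = n * (n + 1) // 2
    u = [1] + [0] * max_t  # u_0: only the empty subset, summing to 0
    for k in range(1, n + 1):
        # iterate t downward so each rank k is counted at most once
        for t in range(max_t, k - 1, -1):
            u[t] += u[t - k]
    total = 2 ** n
    return [c / total for c in u]
```

For n = 2 the four subsets of {1, 2} have sums 0, 1, 2, 3, so each value of T^+ has probability 1/4.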

For very large n, even the above recursion is too slow. In this case, the null distribution can be approximated. The null distributions of T, T^+, and T^− are asymptotically normal with means and variances:[57]

E[T] = 0,  Var(T) = n(n + 1)(2n + 1)/6,
E[T^+] = E[T^−] = n(n + 1)/4,  Var(T^+) = Var(T^−) = n(n + 1)(2n + 1)/24.

Better approximations can be produced using Edgeworth expansions. Using a fourth-order Edgeworth expansion corrects the normal approximation with additional terms in powers of n^{−1/2}.[58][59]
The technical underpinnings of these expansions are rather involved, because conventional Edgeworth expansions apply to sums of IID continuous random variables, while T^+ is a sum of non-identically distributed discrete random variables. The final result, however, is that the above expansion has an error of O(n^{−3/2}), just like a conventional fourth-order Edgeworth expansion.[58]
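As a hedged illustration of the plain normal approximation (before any Edgeworth correction), the following sketch uses the standard null moments E[T^+] = n(n + 1)/4 and Var(T^+) = n(n + 1)(2n + 1)/24 with a continuity correction of 1/2; the function name is ours:

```python
import math

def normal_sf_Tplus(n, t):
    """Approximate Pr(T+ >= t) under the null hypothesis using the
    asymptotic normal distribution of T+, with a continuity
    correction of 1/2."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - 0.5 - mean) / sd
    # 0.5 * erfc(z / sqrt(2)) is the standard normal survival function
    return 0.5 * math.erfc(z / math.sqrt(2))
```

For n = 10 the null mean of T^+ is 27.5, so the approximate one-sided p-value at t = 28 is exactly 1/2 after the continuity correction.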

The moment generating function of T has the exact formula:[60]

M(t) = ∏_{j=1}^{n} cosh(jt),

since under the null hypothesis each rank j contributes +j or −j to T with equal probability, independently across ranks.

When zeros are present and the signed-rank zero procedure is used, or when ties are present and the average rank procedure is used, the null distribution of T changes. Cureton derived a normal approximation for this situation.[61][62] Suppose that the original number of observations was n and the number of zeros was z. The tie correction is

c = Σ (t^3 − t),

where the sum is over all the sizes t of each group of tied observations. The expectation of T is still zero, while the expectation of T^+ is

E[T^+] = (n(n + 1) − z(z + 1))/4.

If

σ^2 = (n(n + 1)(2n + 1) − z(z + 1)(2z + 1))/6 − c/12,

then Var(T) = σ^2, and T/σ is approximately a standard normal deviate under the null hypothesis.
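A small sketch of these corrected moments (our own function; it assumes the ranks of the z zeros are simply removed and the tie correction c = Σ(t^3 − t) is applied to the variance):

```python
def cureton_moments(n, z, tie_sizes):
    """Null mean of T+ and variance of T when z of the n observations
    are zero (signed-rank zero procedure) and the nonzero absolute
    differences are tied in groups of the given sizes (average-rank
    procedure)."""
    c = sum(t ** 3 - t for t in tie_sizes)  # tie correction
    mean_t_plus = (n * (n + 1) - z * (z + 1)) / 4
    var_t = (n * (n + 1) * (2 * n + 1)
             - z * (z + 1) * (2 * z + 1)) / 6 - c / 12
    return mean_t_plus, var_t
```

With no zeros and no ties this reduces to the uncorrected moments, e.g. Var(T) = n(n + 1)(2n + 1)/6.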

Alternative statistics

Wilcoxon[63] originally defined the Wilcoxon rank-sum statistic to be min(T^+, T^−). Early authors such as Siegel[64] followed Wilcoxon. This is appropriate for two-sided hypothesis tests, but it cannot be used for one-sided tests.

Instead of assigning ranks between 1 and n, it is also possible to assign ranks between 0 and n − 1. These are called modified ranks.[65] The modified signed-rank sum T_0, the modified positive-rank sum T_0^+, and the modified negative-rank sum T_0^− are defined analogously to T, T^+, and T^− but with the modified ranks in place of the ordinary ranks. The probability that the sum of two independent F-distributed random variables is positive can be estimated as 2T_0^+/(n(n − 1)).[66] When consideration is restricted to continuous distributions, this is a minimum variance unbiased estimator of p_2.[67]

Example

[Table: paired samples, their differences, and the signed ranks of the differences, ordered by absolute difference.]

In the table, sgn is the sign function, abs is the absolute value, and R_i is the rank. Notice that pairs 3 and 9 are tied in absolute value. They would be ranked 1 and 2, so each gets the average of those ranks, 1.5. The positive-rank sum is T^+ = 27, the negative-rank sum is T^− = 18, and the signed-rank sum is T = 27 − 18 = 9.

Effect size

To compute an effect size for the signed-rank test, one can use the rank-biserial correlation.

If the test statistic T is reported, the rank correlation r is equal to the test statistic T divided by the total rank sum S, or r = T/S.[68] Using the above example, the test statistic is T = 9. The sample size of 9 has a total rank sum of S = 1 + 2 + ⋯ + 9 = 45. Hence, the rank correlation is 9/45, so r = 0.20.

If the test statistic T is reported, an equivalent way to compute the rank correlation is with the difference in proportion between the two rank sums, which is the Kerby (2014) simple difference formula.[68] To continue with the current example, the sample size is 9, so the total rank sum is 45. T is the smaller of the two rank sums, so T is 3 + 4 + 5 + 6 = 18. From this information alone, the remaining rank sum can be computed, because it is the total sum S minus T, or in this case 45 − 18 = 27. Next, the two rank-sum proportions are 27/45 = 60% and 18/45 = 40%. Finally, the rank correlation is the difference between the two proportions (.60 minus .40), hence r = .20.
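Both computations reduce to the Kerby simple difference formula; a one-line sketch (the function name is ours):

```python
def rank_biserial(favorable, unfavorable):
    """Kerby simple difference formula: the rank correlation is the
    difference between the favorable and unfavorable rank-sum
    proportions, r = (favorable - unfavorable) / S."""
    s = favorable + unfavorable  # total rank sum S = n(n+1)/2
    return (favorable - unfavorable) / s
```

With the example's rank sums 27 and 18, rank_biserial(27, 18) recovers r = 0.20.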

Software implementations

  • R includes an implementation of the test as wilcox.test(x,y, paired=TRUE), where x and y are vectors of equal length.[69]
  • ALGLIB includes implementation of the Wilcoxon signed-rank test in C++, C#, Delphi, Visual Basic, etc.
  • GNU Octave implements various one-tailed and two-tailed versions of the test in the wilcoxon_test function.
  • SciPy includes an implementation of the Wilcoxon signed-rank test in Python as scipy.stats.wilcoxon.
  • Accord.NET includes an implementation of the Wilcoxon signed-rank test in C# for .NET applications
  • MATLAB implements the Wilcoxon signed-rank test as p = signrank(x,y); the form [p,h] = signrank(x,y) also returns a logical value indicating the test decision. The result h = 1 indicates a rejection of the null hypothesis, and h = 0 indicates a failure to reject the null hypothesis at the 5% significance level.
  • Julia HypothesisTests package includes the Wilcoxon signed-rank test as pvalue(SignedRankTest(x, y)).
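As a usage sketch of the SciPy entry above (the data here are hypothetical, not from the article's example):

```python
from scipy.stats import wilcoxon

# hypothetical paired measurements (before/after treatment)
before = [125.0, 115.0, 130.0, 140.0, 140.0, 115.0, 140.0, 125.0]
after_ = [110.0, 122.0, 125.0, 120.0, 141.0, 124.0, 123.0, 137.0]

# two-sided paired signed-rank test on the differences before - after
res = wilcoxon(before, after_)
print(res.statistic, res.pvalue)
```

The call tests the null hypothesis that the differences are symmetric about zero; a small p-value suggests a systematic shift between the paired measurements.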

See also

References

  1. ^ Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). John Wiley & Sons, Inc. ISBN 0-471-16068-7., p. 350
  2. ^ “Wilcoxon signed-rank test – Handbook of Biological Statistics”. www.biostathandbook.com. Retrieved 2021-09-02.
  3. ^ Wilcoxon, Frank (Dec 1945). “Individual comparisons by ranking methods” (PDF). Biometrics Bulletin. 1 (6): 80–83. doi:10.2307/3001968. hdl:10338.dmlcz/135688. JSTOR 3001968.
  4. ^ Siegel, Sidney (1956). Non-parametric statistics for the behavioral sciences. New York: McGraw-Hill. pp. 75–83. ISBN 9780070573482.
  5. ^ Conover, p. 352
  6. ^ Siegel, p. 76
  7. ^ Conover, p. 353
  8. ^ Pratt, John W.; Gibbons, Jean D. (1981). Concepts of Nonparametric Theory. Springer-Verlag. ISBN 978-1-4612-5933-6., p. 148
  9. ^ Pratt and Gibbons, p. 148
  10. ^ Pratt and Gibbons, p. 148
  11. ^ Pratt and Gibbons, p. 150
  12. ^ Conover, pp. 352–357
  13. ^ Hettmansperger, Thomas P. (1984). Statistical Inference Based on Ranks. John Wiley & Sons. ISBN 0-471-88474-X., pp. 32, 50
  14. ^ Pratt and Gibbons, p. 153
  15. ^ Pratt and Gibbons, pp. 153–154
  16. ^ Hettmansperger, pp. 38–39
  17. ^ Pratt and Gibbons, pp. 146–147
  18. ^ Pratt and Gibbons, pp. 146–147
  19. ^ Hettmansperger, pp. 30–31
  20. ^ Conover, p. 353
  21. ^ Pratt and Gibbons, pp. 155–156
  22. ^ Hettmansperger, pp. 49–50
  23. ^ Pratt and Gibbons, p. 155
  24. ^ Conover, p. 354
  25. ^ Hollander, Myles; Wolfe, Douglas A.; Chicken, Eric (2014). Nonparametric Statistical Methods (Third ed.). John Wiley & Sons, Inc. ISBN 978-0-470-38737-5., pp. 39–41
  26. ^ Pratt and Gibbons, p. 147
  27. ^ Pratt and Gibbons, p. 147
  28. ^ Hettmansperger, pp. 49–50
  29. ^ Wilcoxon, Frank (1949). Some Rapid Approximate Statistical Procedures. American Cyanamid Co.
  30. ^ Pratt, J. (1959). “Remarks on zeros and ties in the Wilcoxon signed rank procedures”. Journal of the American Statistical Association. 54 (287): 655–667. doi:10.1080/01621459.1959.10501526.
  31. ^ Pratt, p. 659
  32. ^ Pratt, p. 663
  33. ^ Derrick, B; White, P (2017). “Comparing Two Samples from an Individual Likert Question”. International Journal of Mathematics and Statistics. 18 (3): 1–13.
  34. ^ Conover, William Jay (1973). “On Methods of Handling Ties in the Wilcoxon Signed-Rank Test”. Journal of the American Statistical Association. 68 (344): 985–988. doi:10.1080/01621459.1973.10481460.
  35. ^ Pratt and Gibbons, p. 162
  36. ^ Conover, pp. 352–353
  37. ^ Pratt and Gibbons, p. 164
  38. ^ Conover, pp. 358–359
  39. ^ Pratt, p. 660
  40. ^ Pratt and Gibbons, pp. 168–169
  41. ^ Pratt, pp. 661–662
  42. ^ Pratt and Gibbons, p. 170
  43. ^ Pratt and Gibbons, pp. 163, 166
  44. ^ Pratt, p. 660
  45. ^ Pratt and Gibbons, p. 166
  46. ^ Pratt and Gibbons, p. 171
  47. ^ Pratt, p. 661
  48. ^ Pratt, p. 660
  49. ^ Gibbons, Jean D.; Chakraborti, Subhabrata (2011). Nonparametric Statistical Inference (Fifth ed.). Chapman & Hall/CRC. ISBN 978-1-4200-7762-9., p. 194
  50. ^ Hettmansperger, p. 34
  51. ^ Pratt and Gibbons, pp. 148–149
  52. ^ Pratt and Gibbons, pp. 148–149, pp. 186–187
  53. ^ Hettmansperger, p. 171
  54. ^ Pratt and Gibbons, p. 187
  55. ^ Pratt and Gibbons, p. 187
  56. ^ Pratt and Gibbons, p. 187
  57. ^ Pratt and Gibbons, p. 149
  58. ^ a b Kolassa, John E. (1995). “Edgeworth approximations for rank sum test statistics”. Statistics and Probability Letters. 24 (2): 169–171. doi:10.1016/0167-7152(95)00164-H.
  59. ^ Hettmansperger, p. 37
  60. ^ Hettmansperger, p. 35
  61. ^ Cureton, Edward E. (1967). “The normal approximation to the signed-rank sampling distribution when zero differences are present”. Journal of the American Statistical Association. 62 (319): 1068–1069. doi:10.1080/01621459.1967.10500917.
  62. ^ Pratt and Gibbons, p. 193
  63. ^ Wilcoxon, p. 82
  64. ^ Siegel, p. 76
  65. ^ Pratt and Gibbons, p. 158
  66. ^ Pratt and Gibbons, p. 159
  67. ^ Pratt and Gibbons, p. 191
  68. ^ a b Kerby, Dave S. (2014), “The simple difference formula: An approach to teaching nonparametric correlation.”, Comprehensive Psychology, 3: 11.IT.3.1, doi:10.2466/11.IT.3.1
  69. ^ Dalgaard, Peter (2008). Introductory Statistics with R. Springer Science & Business Media. pp. 99–100. ISBN 978-0-387-79053-4.

External links

