# Kolmogorov-Smirnov Test in R Programming

• Last Updated : 22 Jul, 2020

The Kolmogorov-Smirnov (K-S) test is a non-parametric test of the equality of one-dimensional probability distributions, continuous or discontinuous. It can be used to compare a sample with a reference probability distribution (the one-sample K-S test) or to compare two samples with each other (the two-sample K-S test). The K-S statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples. In the one-sample K-S test, the distribution considered under the null hypothesis may be purely discrete, continuous, or mixed. In the two-sample K-S test, the distribution considered under the null hypothesis is generally continuous but is otherwise unrestricted. The Kolmogorov-Smirnov test is straightforward to run in R.

#### Kolmogorov-Smirnov Test Formula

The Kolmogorov-Smirnov statistic for a given cumulative distribution function $F(x)$ can be given as:

$$D_n = \sup_x \left| F_n(x) - F(x) \right|$$

where,

$\sup_x$ : the supremum of the set of distances

$F_n(x)$ : the empirical distribution function for n i.i.d. observations $X_i$

$F(x)$ : the cumulative distribution function of the reference distribution

The empirical distribution function is the distribution function associated with the empirical measure of the chosen sample. It is a step function that jumps up by 1/n at each of the n data points.
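The definition above can be checked numerically. The following is a minimal sketch, assuming a standard normal reference distribution and an arbitrary seed and sample size (all chosen here only for illustration). It uses `ecdf()` and `stats::ks.test()` from base R, so no extra package is needed, and compares a hand-computed statistic with the one reported by the test.

```r
set.seed(42)                 # arbitrary seed, only for reproducibility
x  <- rnorm(20)              # illustrative sample
n  <- length(x)
xs <- sort(x)

# empirical distribution function: a step function built from the sample
Fn <- ecdf(x)

# reference CDF F(x) evaluated at the sorted observations
F0 <- pnorm(xs)

# F_n jumps by 1/n at each observation, so the largest gap occurs
# either at a jump (i/n - F) or just before it (F - (i - 1)/n)
d_plus   <- max(seq_len(n) / n - F0)
d_minus  <- max(F0 - (seq_len(n) - 1) / n)
d_manual <- max(d_plus, d_minus)

# statistic reported by the built-in one-sample test
d_builtin <- unname(stats::ks.test(x, "pnorm")$statistic)

print(c(manual = d_manual, builtin = d_builtin))
```

Because the sample has no ties, `Fn(xs)` takes the values 1/n, 2/n, ..., 1, which is why the hand computation only needs to look at the sorted data points.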

#### Implementation in R

The K-S test can be performed using the ks.test() function in R.

Syntax:

ks.test(x, y, ..., alternative = c("two.sided", "less", "greater"), exact = NULL, tol = 1e-8,
simulate.p.value = FALSE, B = 2000)

Parameters:

x: a numeric vector of data values.
y: a numeric vector of data values, or a character string naming a cumulative distribution function.
...: the parameters of the distribution specified by y.
alternative: indicates the alternative hypothesis; one of "two.sided" (the default), "less", or "greater".
exact: NULL, or a logical value indicating whether an exact p-value should be computed.
tol: an upper bound for possible rounding error when data values are checked for equality.
simulate.p.value: a logical value indicating whether to compute the p-value by the Monte Carlo method.
B: an integer specifying the number of replicates used in the Monte Carlo method.
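The `y` and `...` parameters are easiest to see in a one-sample test. The sketch below is illustrative only: the seed, sample size, mean, and standard deviation are arbitrary assumptions, and it calls `stats::ks.test()` from base R so that it runs without installing anything.

```r
set.seed(1)                            # arbitrary seed, only for reproducibility
x <- rnorm(100, mean = 5, sd = 2)      # illustrative sample

# y names a CDF; mean and sd are passed through ... to pnorm()
res_match <- stats::ks.test(x, "pnorm", mean = 5, sd = 2)

# the same data tested against the standard normal distribution
# (pnorm() defaults: mean = 0, sd = 1)
res_mismatch <- stats::ks.test(x, "pnorm")

print(res_match$p.value)
print(res_mismatch$p.value)
```

The second call compares the sample with a distribution it was not drawn from, so its statistic is much larger and its p-value much smaller than in the first call.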

Let us understand how to execute a K-S Test step by step using an example of a two-sample K-S test.

• Step 1: First, install the required package. This tutorial uses the ks.test() function from the “dgof” package, which extends the ks.test() function of base R's stats package to discrete distributions. Install it with the install.packages() function from the R console.
```
install.packages("dgof")
```
• Step 2: After a successful installation, load the package in the R script using the library() function as follows:

```r
# loading the required package
library("dgof")
```
• Step 3: Use the rnorm() function and the runif() function to generate two samples, say x and y. The rnorm() function generates random deviates from a normal distribution, while the runif() function generates random deviates from a uniform distribution.

```r
# loading the required package
library(dgof)

# generating random variate
# sample 1
x <- rnorm(50)

# generating random deviates
# sample 2
y <- runif(30)
```
• Step 4: Now perform the K-S test on these two samples using the ks.test() function of the dgof package.

```r
# loading the required package
library(dgof)

# generating random variate
# sample 1
x <- rnorm(50)

# generating random deviates
# sample 2
y <- runif(30)

# performing the K-S Test
# Do x and y come from
# the same distribution?
ks.test(x, y)
```

Output:

```    Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 0.84, p-value = 5.151e-14
alternative hypothesis: two-sided
```
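The two-sample statistic D is the largest vertical gap between the two empirical distribution functions. As a rough sketch (the seed and the 0.05 significance level are assumptions made for illustration, and `stats::ks.test()` from base R is used so the snippet runs without dgof), the statistic can be recomputed by hand and compared with the value returned by the test:

```r
set.seed(123)                # arbitrary seed, only for reproducibility
x <- rnorm(50)               # sample 1: normal deviates
y <- runif(30)               # sample 2: uniform deviates

Fx <- ecdf(x)
Fy <- ecdf(y)

# both ECDFs only change at observed values, so the largest gap
# is attained at one of the pooled data points
pooled   <- sort(c(x, y))
d_manual <- max(abs(Fx(pooled) - Fy(pooled)))

res       <- stats::ks.test(x, y)
d_builtin <- unname(res$statistic)

# decision at the (assumed) 5% significance level
reject_null <- res$p.value < 0.05

print(c(manual = d_manual, builtin = d_builtin))
print(reject_null)
```

In the output shown above, the p-value of 5.151e-14 is far below 0.05, so the null hypothesis that x and y come from the same distribution is rejected.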

#### Visualization of the Kolmogorov-Smirnov Test in R

The two-sample K-S test is sensitive to differences in both the location and the shape of the empirical cumulative distribution functions of the two samples, which makes it one of the most general and useful non-parametric tests. Plotting the two empirical distribution functions shows the difference that the test measures.

Example:

Here we generate both samples using the rnorm() function (the second with mean -1), plot their empirical cumulative distribution functions, and then run a one-sided K-S test.

```r
# loading the required package
library(dgof)

# sample 1
# generating a random variate
x <- rnorm(50)

# sample 2
# generating a random variate
x2 <- rnorm(50, -1)

# plotting the result
# visualization
plot(ecdf(x),
     xlim = range(c(x, x2)),
     col = "blue")
plot(ecdf(x2),
     add = TRUE,
     lty = "dashed",
     col = "red")

# performing the K-S
# Test on x and x2
ks.test(x, x2, alternative = "l")
```

Output:

```    Two-sample Kolmogorov-Smirnov test

data:  x and x2
D^- = 0.34, p-value = 0.003089
alternative hypothesis: the CDF of x lies below that of y
```
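The one-sided statistics reported for the "less" and "greater" alternatives can also be recovered from the two empirical distribution functions. The sketch below is a hedged illustration: the seed is arbitrary, and it uses `stats::ks.test()` from base R rather than dgof so that it is self-contained.

```r
set.seed(7)                  # arbitrary seed, only for reproducibility
x  <- rnorm(50)              # sample 1
x2 <- rnorm(50, mean = -1)   # sample 2, shifted to the left

# signed gap between the two ECDFs at every pooled data point
pooled <- sort(c(x, x2))
gap    <- ecdf(x)(pooled) - ecdf(x2)(pooled)

d_greater <- max(gap)        # largest amount by which the ECDF of x lies above that of x2
d_less    <- max(-gap)       # largest amount by which the ECDF of x lies below that of x2

res_less    <- stats::ks.test(x, x2, alternative = "less")
res_greater <- stats::ks.test(x, x2, alternative = "greater")
res_two     <- stats::ks.test(x, x2)

print(c(d_less = d_less, d_greater = d_greater))
```

The two-sided statistic is the larger of the two one-sided statistics, which is why a shift in either direction is picked up by the default test.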