
Kolmogorov-Smirnov Test in R Programming

Last Updated : 10 Mar, 2023

The Kolmogorov-Smirnov (K-S) test is a non-parametric test of the equality of one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (the one-sample K-S test) or to compare two samples with each other (the two-sample K-S test). The K-S statistic quantifies the distance between the empirical distribution function of a sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples. In the one-sample K-S test, the distribution considered under the null hypothesis may be purely discrete, purely continuous, or mixed. In the two-sample K-S test, the distribution considered under the null hypothesis is generally a continuous distribution, but it is otherwise unrestricted. The Kolmogorov-Smirnov test can be carried out very easily in the R programming language.

Kolmogorov-Smirnov Test Formula

The formula for the Kolmogorov-Smirnov test can be given as:

D_n = \sup_x \left| F_n(x) - F(x) \right|

where,

  • sup_x : the supremum of the set of distances
  • F_n(x) : the empirical distribution function for n independent and identically distributed (i.i.d.) observations X_i
  • F(x) : the cumulative distribution function of the reference distribution

The empirical distribution function is the distribution function associated with the empirical measure of the chosen sample. Being a step function, this cumulative distribution jumps up by 1/n at each of the n data points.
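As a quick, minimal sketch of the formula (the sample z and the standard normal reference distribution below are chosen arbitrarily for illustration), the statistic D_n can be computed by hand with the ecdf() function and compared with the value reported by ks.test():

R

# hand-computing D_n for an arbitrary sample against the standard normal CDF
set.seed(42)                       # for reproducibility
z  <- rnorm(40)                    # illustrative sample of size n = 40
n  <- length(z)

Fn <- ecdf(z)                      # empirical distribution function F_n
zs <- sort(z)

# F_n is a step function, so the supremum is attained at (or just before) a jump
D  <- max(abs(Fn(zs) - pnorm(zs)),
          abs(Fn(zs) - 1/n - pnorm(zs)))

D                                  # hand-computed statistic
ks.test(z, "pnorm")$statistic      # should agree up to rounding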

One Sample Kolmogorov-Smirnov Test in R

The K-S test can be performed using the ks.test() function in R. 

Syntax:

ks.test(x, y, alternative = c("two.sided", "less", "greater"), exact = NULL, tol = 1e-8,
simulate.p.value = FALSE, B = 2000)

Parameters:

  • x: numeric vector of data values
  • y: numeric vector of data values or a character string which is used to name a cumulative distribution function.
  • alternative: used to indicate the alternative hypothesis.
  • exact: NULL or a logical indicating whether an exact p-value should be computed.
  • tol: used as an upper bound for rounding error in the data values.
  • simulate.p.value: a logical indicating whether to compute the p-value by Monte Carlo simulation.
  • B: an integer indicating the number of replicates used in the Monte Carlo simulation.
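For a quick illustration of these arguments (the sample below is hypothetical and generated only to show the call), a one-sided, exact one-sample test against the standard normal CDF can be requested as follows:

R

# hypothetical sample, used only to illustrate the arguments of ks.test()
w <- rnorm(25)

# one-sided alternative ("less") with an exact p-value requested
ks.test(w, "pnorm", alternative = "less", exact = TRUE)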

Let us understand how to execute a K-S test step by step. First, install the required package. For performing the K-S test we need to install the "dgof" package using the install.packages() function from the R console.

R

# installing the required package
install.packages("dgof")

                    

After a successful installation of the package, load it into the R script. For that purpose, use the library() function as follows:

R

# loading the required package
library("dgof")

                    

Use the rnorm() function to generate a sample, say x1. The rnorm() function generates random variates from the normal distribution. Passing "pnorm" as the second argument to ks.test() compares x1 with the standard normal distribution.

R

# one-sample K-S test of x1 against the standard normal CDF
x1 <- rnorm(100)
ks.test(x1, "pnorm")

                    

Output:

    One-sample Kolmogorov-Smirnov test

data:  x1
D = 0.10091, p-value = 0.2603
alternative hypothesis: two-sided
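The value returned by ks.test() is an object of class "htest", so its components can also be stored and inspected programmatically. A small sketch, continuing with the x1 generated above:

R

# storing the test result and extracting its components
res <- ks.test(x1, "pnorm")
res$statistic    # the D statistic
res$p.value      # the p-value

# a simple decision rule at the 5% significance level
if (res$p.value < 0.05) {
  print("Reject the null hypothesis: x1 does not follow the standard normal distribution")
} else {
  print("Fail to reject the null hypothesis")
}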

Two Sample Kolmogorov-Smirnov Test in R

Use the rnorm() function and the runif() function to generate two samples, say x and y. The rnorm() function draws random variates from the normal distribution, while the runif() function draws random deviates from the uniform distribution.

R

# sample 1: 50 random normal variates
x <- rnorm(50)
 
# sample 2: 30 random uniform deviates
y <- runif(30)

                    

Now perform the K-S test on these two samples. For that purpose, use the ks.test() function of the dgof package.

R

# performing the K-S test: do x and y
# come from the same distribution?
ks.test(x, y)

                    

Output: 

    Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 0.84, p-value = 5.151e-14
alternative hypothesis: two-sided
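To connect this result back to the formula, the two-sample statistic can be reproduced by hand as the largest absolute gap between the two empirical CDFs. A minimal sketch, continuing with x and y from above (evaluating at the pooled sample points is enough because both ECDFs are step functions that jump only there):

R

# hand-computing the two-sample statistic over the pooled sample points
pooled <- sort(c(x, y))
D <- max(abs(ecdf(x)(pooled) - ecdf(y)(pooled)))
D    # should match the D reported by ks.test(x, y)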

Visualization of the Kolmogorov-Smirnov Test in R

The two-sample K-S test is quite sensitive to differences in both the shape and the location of the empirical cumulative distributions of the two chosen samples, which makes it one of the most general and useful non-parametric tests. Hence we will see how a graph represents the difference between the two samples.

Here we generate both samples using the rnorm() function, plot their empirical CDFs, and then run the test.

R

# loading the required package
library(dgof)
 
# sample 1: 50 variates from a
# standard normal distribution
x <- rnorm(50)
 
# sample 2: 50 variates from a
# normal distribution with mean -1
x2 <- rnorm(50, -1)
 
# plotting the result
# visualization
plot(ecdf(x),
     xlim = range(c(x, x2)),
     col = "blue")
plot(ecdf(x2),
     add = TRUE,
     lty = "dashed",
     col = "red")
 
# performing the one-sided K-S test on x and x2
# ("l" abbreviates the "less" alternative)
ks.test(x, x2, alternative = "l")

                    

Output: 

    Two-sample Kolmogorov-Smirnov test

data:  x and x2
D^- = 0.34, p-value = 0.003089
alternative hypothesis: the CDF of x lies below that of y
Plot output: empirical CDFs of x (blue, solid) and x2 (red, dashed)
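If a legend makes the plot easier to read, one can be added with base graphics. This is a small optional addition, assuming the plotting code above has just been run:

R

# optional: label the two empirical CDFs on the plot above
legend("bottomright",
       legend = c("x  (mean 0)", "x2 (mean -1)"),
       col    = c("blue", "red"),
       lty    = c("solid", "dashed"))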


