Runs Test of Randomness in Python

Random numbers are an essential part of many systems, including simulations and cryptography. The ability to produce values with no apparent pattern or predictability is therefore a core requirement. Because a computer running a deterministic algorithm cannot produce values that are completely random, algorithms known as pseudorandom number generators (PRNGs) are used to accomplish this task.

The values produced by a PRNG are not truly random; they depend on the initial value provided to the algorithm, known as the seed value. A pseudorandom sequence is reproducible given its seed value, and this property is essential in simulations such as Monte Carlo simulation, where the system might need to be tested on the same sequence more than once.
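The reproducibility described above can be seen with Python's standard `random` module. This is a minimal sketch; the seed values 42 and 43 are arbitrary choices made for illustration.

```python
import random

# Seeding the generator with the same value reproduces the same sequence.
random.seed(42)          # 42 is an arbitrary seed chosen for illustration
first = [random.random() for _ in range(5)]

random.seed(42)          # reset the generator to the same seed
second = [random.random() for _ in range(5)]

random.seed(43)          # a different seed gives a different sequence
third = [random.random() for _ in range(5)]

print(first == second)   # True
print(first == third)    # False
```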

Some of the most popular and highly used PRNGs are:

  1. Mersenne Twister: Used as the default random number generator in Python, R, Excel, Matlab, Ruby and many other popular software systems.
  2. Linear Congruential Generator: Used in C++ and Java.
  3. Wichmann-Hill Generator: Used in Excel and was the default in Python 2.2.
  4. Park-Miller Generator
  5. Middle Square Weyl Sequence
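To give a feel for how simple such a generator can be, here is a rough sketch of a linear congruential generator. The function name `lcg` is hypothetical, and the default parameters (multiplier 16807, increment 0, modulus 2^31 - 1) are the classic Park-Miller "minimal standard" choice; other libraries use different constants.

```python
def lcg(seed, a=16807, c=0, m=2**31 - 1):
    """Yield pseudorandom floats in (0, 1) from a linear congruential generator.

    With c = 0 the seed must be an integer between 1 and m - 1.
    """
    state = seed
    while True:
        # Core recurrence: next_state = (a * state + c) mod m
        state = (a * state + c) % m
        yield state / m


gen = lcg(seed=1)
sample = [next(gen) for _ in range(3)]
print(sample)
```

Every value depends only on the previous state, which is why the whole sequence is fixed once the seed is chosen.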

To ensure that the values generated by the PRNG are as close to random as possible, several statistical tests including the Diehard tests, TestU01 series, Chi-Square test and the Runs test of Randomness are used. This article focuses on the Runs Test of Randomness.

What is the Runs Test?

The runs test of randomness is a statistical test used to check whether a sequence of data is random. It is a nonparametric test that uses runs in the data to decide whether the data is random or tends to follow a pattern. In one common formulation, a run is a series of consecutively increasing or consecutively decreasing values, and the number of values in the series is the length of the run.



The first step in the runs test is to count the number of runs in the data sequence. There are several ways to define runs, but in every case the formulation must produce a dichotomous sequence of values. This article uses the median: values above the median are treated as positive and values below it as negative, and a run is a series of consecutive positive values or consecutive negative values.

Applying Runs Test

  • The first step in applying this test is to formulate the null and alternative hypotheses.

                 H_{null} : The sequence was produced in a random manner

                 H_{alt}  : The sequence was not produced in a random manner

  • Calculate the test statistic, Z, as:
\qquad\,Z = \frac{R - \bar{R}}{s_R}

Where,
R = the number of observed runs
\bar{R} = the expected number of runs, given as

\qquad\,\bar{R} = \frac{2 n_1 n_2}{n_1 + n_2} + 1

s_R = the standard deviation of the number of runs, given by

\qquad\,s_{R}^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}

with n_1 and n_2 = the number of positive and negative values in the series.

  • Compare the calculated Z-statistic with Z_{critical} for the chosen confidence level (Z_{critical} = 1.96 for a 95% confidence level). The null hypothesis is rejected, i.e. the numbers are declared not to be random, if |Z| > Z_{critical}.
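The steps above can be sketched as a single function that takes the run counts and returns the decision. The name `runs_test_decision` and its parameters are hypothetical. The two-sided p-value is an addition not discussed in the article; it assumes Z is approximately standard normal under the null hypothesis, an approximation that is rough for very short sequences.

```python
import math


def runs_test_decision(runs, n1, n2, z_critical=1.96):
    """Return (z, p_value, reject_null) for the given run counts."""
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    # Two-sided p-value of a standard normal variable: P(|N| > |z|)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value, abs(z) > z_critical


# 5 runs among 4 positive and 4 negative values: exactly the expected count
print(runs_test_decision(5, 4, 4))      # (0.0, 1.0, False)

# Only 2 runs among 10 positive and 10 negative values: far too few runs
z, p, reject = runs_test_decision(2, 10, 10)
print(round(z, 2), reject)              # -4.14 True
```

A Z-statistic far below zero means fewer runs than expected (values cluster together), while one far above zero means more runs than expected (values alternate too regularly).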

Example:    

Python3

# Simple implementation of the runs test of randomness

import random
import math
import statistics


def runsTest(l, l_median):

    # The first value always opens the first run
    runs, n1, n2 = 1, 0, 0

    for i in range(len(l)):

        # A new run starts whenever two consecutive values fall on
        # opposite sides of the median. Index 0 is skipped because it
        # has no predecessor (l[-1] would wrap around to the last value).
        if i > 0 and ((l[i] >= l_median) != (l[i-1] >= l_median)):
            runs += 1

        # no. of positive values (at or above the median)
        if l[i] >= l_median:
            n1 += 1

        # no. of negative values (below the median)
        else:
            n2 += 1

    # Expected number of runs and its standard deviation
    runs_exp = ((2*n1*n2)/(n1+n2))+1
    stan_dev = math.sqrt((2*n1*n2*(2*n1*n2-n1-n2)) /
                         (((n1+n2)**2)*(n1+n2-1)))

    z = (runs-runs_exp)/stan_dev

    return z


# Making a list of 100 random numbers
l = [random.random() for _ in range(100)]

l_median = statistics.median(l)

Z = abs(runsTest(l, l_median))

print('Z-statistic= ', Z)



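Output (one sample run; the value changes from run to run because the input is random):

Z-statistic=  1.809160364503323

Since 1.809 is less than 1.96, the null hypothesis is not rejected at the 95% confidence level for this run.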
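As a sanity check, it can help to run the test on sequences that are obviously not random. The sketch below uses a hypothetical, self-contained helper `runs_z` (ties with the median are counted as positive, which is an assumption) on a sorted sequence and on a strictly alternating one.

```python
import math
import statistics


def runs_z(values):
    """Z-statistic of the runs test, dichotomizing around the median."""
    median = statistics.median(values)
    signs = [v >= median for v in values]   # ties count as positive (assumed)
    n1 = sum(signs)
    n2 = len(signs) - n1
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / (
        (n1 + n2) ** 2 * (n1 + n2 - 1))
    return (runs - expected) / math.sqrt(variance)


z_sorted = runs_z(list(range(100)))   # only 2 runs: all lows, then all highs
z_alternating = runs_z([0, 1] * 50)   # 100 runs: the sign flips every step
print(round(z_sorted, 2))             # -9.85
print(round(z_alternating, 2))        # 9.85
```

Both values are far beyond 1.96 in absolute terms, so the null hypothesis of randomness is rejected for both sequences: one has too few runs and the other too many.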


