
SPSA (Simultaneous Perturbation Stochastic Approximation) Algorithm using Python

SPSA (Simultaneous Perturbation Stochastic Approximation) is an optimization algorithm invented by James C. Spall. It is especially useful for noisy cost functions and for problems where the exact gradient is not available. The general idea is to improve the solution by estimating the gradient of the cost function and using that estimate to update the parameters in a direction that lowers the cost function's value.

Random Perturbation

The derivative of the cost function is not calculated directly; it is approximated by applying a random perturbation to all parameters simultaneously. The perturbation is made by adding (and subtracting) to the parameter vector w a vector whose components are drawn from a Bernoulli-like distribution: each component is +1 or -1 with equal probability.



The perturbation made by the addition creates the w_plus vector, and the one made by the subtraction generates the w_minus vector. See the equations below, where δk is the random ±1 vector described above and ck is a small positive perturbation size:

w_plus = w + ck δk
w_minus = w - ck δk

The cost function is then evaluated at these two vectors, and the slope (an approximation of the gradient) is computed from the difference between the two values.
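
To make this concrete, here is a minimal sketch (separate from the article's implementation, which follows later) that samples a ±1 perturbation for a small parameter vector and forms the two-sided gradient estimate; the vector w, the loss L and the perturbation size c below are hypothetical placeholders.

import numpy as np

# hypothetical parameter vector, loss function and perturbation size
w = np.array([0.5, -1.0, 2.0])
L = lambda v: np.sum(v**2)   # placeholder loss
c = 0.01

# Bernoulli-like perturbation: each component is +1 or -1
delta = np.random.choice([-1, 1], size=w.shape)

# perturb all parameters simultaneously
w_plus = w + c*delta
w_minus = w - c*delta

# two-sided gradient estimate (elementwise division by 2*c*delta)
g_hat = (L(w_plus) - L(w_minus)) / (2*c*delta)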

Hyper Parameters

SPSA relies on a few constants: the step-size sequence ak = a/(k + A)^α and the perturbation-size sequence ck = c/k^γ. Common choices, used in the implementation below, are α = 0.602 and γ = 0.101, A equal to roughly 10% of the number of iterations, a small value for c, and a scaled according to the magnitude of the first gradient estimates.

Pseudo Code for the SPSA

A <--- 0.1N
a <--- 2(A + 1)^α / magnitude_g0
for k = 1, 2, ..., N do
    ak <--- a/(k + A)^α
    ck <--- c/k^γ
    δk <--- random ±1 perturbation vector
    g <--- (L(w + ck δk) - L(w - ck δk)) / (2 ck δk)
    w <--- w - ak g
end for

Implementation of SPSA

import numpy as np
import matplotlib.pyplot as plt
 
# Set the seed to get reproducible results
np.random.seed(42)


Below is a helper function that calculates the output of a polynomial, given the coefficients in the array a and the value x at which to evaluate it.

def polynomial(a, x):
    # evaluate a[0] + a[1]*x + ... + a[N-1]*x**(N-1)
    N = len(a)
    S = 0
    for k in range(N):
        S += a[k]*x**k
    return S


The function below calculates the mean squared error between the predicted values and the true values. Some random noise is then added to the error before it is returned, to simulate a noisy cost function.

def Loss(parameters, X, Y):

    # Predictions of our model
    Y_pred = polynomial(parameters, X)

    # mse (mean squared error)
    L = ((Y_pred - Y)**2).mean()

    # Random noise in range: [0, 5)
    noise = 5*np.random.random()
    return L + noise


The function below approximates the gradient, given the loss function L, the weights (coefficients) w and the perturbation size ck.

def grad(L, w, ck):
     
    # number of parameters
    p = len(w)
     
    # bernoulli-like distribution
    deltak = np.random.choice([-1, 1], size=p)
     
    # simultaneous perturbations
    ck_deltak = ck * deltak
 
    # gradient approximation
    DELTA_L = L(w + ck_deltak) - L(w - ck_deltak)
 
    return (DELTA_L) / (2 * ck_deltak)


The helper function below chooses the constants that the algorithm needs in order to work well.

def initialize_hyperparameters(alpha, lossFunction, w0, N_iterations):
 
    c = 1e-2   # a small number
 
    # A is <= 10% of the number of iterations
    A = N_iterations*0.1
 
    # order of magnitude of first gradients
    magnitude_g0 = np.abs(grad(lossFunction, w0, c).mean())
 
    # the number 2 in front is an estimate of
    # the desired initial change in the parameters;
    # different problems might need other choices
    a = 2*((A+1)**alpha)/magnitude_g0
 
    return a, A, c


The function below is the main optimization algorithm; it makes use of all the helper functions defined above.

# optimization algorithm
def SPSA(LossFunction, parameters, alpha=0.602,
         gamma=0.101, N_iterations=int(1e5)):

    # model's parameters
    w = parameters
 
    a, A, c = initialize_hyperparameters(
      alpha, LossFunction, w, N_iterations)
 
    for k in range(1, N_iterations):
 
        # update ak and ck
        ak = a/((k+A)**(alpha))
        ck = c/(k**(gamma))
 
        # estimate gradient
        gk = grad(LossFunction, w, ck)
 
        # update parameters
        w -= ak*gk
 
    return w


Now we will demonstrate the working of the algorithm implemented above with an example.

# Y is the polynomial to be approximated
X = np.linspace(0, 10, 100)
Y = 1*X**2 - 4*X + 3
 
noise = 3*np.random.normal(size=len(X))
Y += noise
 
# plot polynomial
plt.title("polynomial with noise")
plt.plot(X, Y, 'go')
plt.show()


Output:

Set of random points with noise

Before training

# Initial parameters are randomly
# chosen in the range: [-10, 10]
parameters = (2*np.random.random(3) - 1)*10
 
plt.title("Before training")
 
# Compare true and predicted values before
# training
plt.plot(X, polynomial(parameters, X), "bo")
plt.plot(X, Y, 'go')
plt.legend(["predicted value", "true value"])
plt.show()


Output:

Predictions of the function before training

After training

# Training with SPSA
parameters = SPSA(LossFunction = lambda parameters: Loss(parameters, X, Y),
                  parameters = parameters)
 
plt.title("After training")
plt.plot(X, polynomial(parameters, X), "bo")
plt.plot(X, Y, 'go')
plt.legend(["predicted value", "true value"])
plt.show()
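
Since the noisy data was generated from the polynomial 1*X**2 - 4*X + 3 (coefficients [3, -4, 1] in increasing powers of x), the learned coefficients can also be checked numerically. This quick comparison is an addition to the original example:

# compare learned coefficients with the ones used to generate the data
print("learned parameters:", np.round(parameters, 2))
print("true parameters:   ", [3, -4, 1])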


Output:

Predictions after learning the weights

