
Python – Gaussian fit

Last Updated : 14 Jan, 2022

What is normal or Gaussian distribution?

When we plot a dataset, for example as a histogram, the shape of the resulting plot is what we call its distribution. The most commonly observed shape for continuous values is the bell curve, also called the Gaussian or normal distribution.

It is named after the German mathematician Carl Friedrich Gauss. Common examples of data that approximately follow a Gaussian distribution include body temperature, people's height, car mileage, and IQ scores.

Let’s try to generate the ideal normal distribution and plot it using Python.

How to plot Gaussian distribution in Python

Libraries such as NumPy, SciPy, and Matplotlib help us plot an ideal normal curve.


import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Generate the data for an ideal normal curve and plot it
# x-axis for the plot
x_data = np.arange(-5, 5, 0.001)
# y-axis as the Gaussian (mean 0, standard deviation 1)
y_data = stats.norm.pdf(x_data, 0, 1)
# Plot the data
plt.plot(x_data, y_data)
plt.show()


The points on the x-axis are the observations, and the y-axis is the probability density at each observation, which indicates how likely values near it are.

We generated regularly spaced observations from -5 up to (but not including) 5 using np.arange(). We then passed them to the norm.pdf() function with a mean of 0.0 and a standard deviation of 1, which returned the probability density at each observation. Observations around 0 are the most common, and the ones around -5.0 and 5.0 are rare. The name pdf() stands for probability density function.
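
As a rough check of what norm.pdf() computes, the sketch below compares it with the textbook normal density formula, exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). The helper name normal_pdf is hypothetical and exists only for this illustration.

```python
import numpy as np
from scipy import stats


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Textbook normal density (illustrative helper, not part of SciPy)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))


# A handful of observations between -5 and 5
x = np.linspace(-5, 5, 11)

manual = normal_pdf(x)              # formula written out by hand
library = stats.norm.pdf(x, 0, 1)   # SciPy, mean 0 and standard deviation 1

# Largest difference between the two; it should be at rounding-error level
max_abs_diff = np.max(np.abs(manual - library))
print(max_abs_diff)
```

The density peaks at the mean and falls off symmetrically on both sides, which is the bell shape seen in the plot.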

The Gaussian function:

Next, let's fit data to a Gaussian function. Our goal is to find the parameter values that best fit our data. First, we need to write a Python function for the Gaussian equation. The function should accept the independent variable (the x-values) followed by all the parameters that define the curve: here the offset H, the amplitude A, the center x0, and the width sigma.


# Define the Gaussian function
def gauss(x, H, A, x0, sigma):
    return H + A * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

We will use the function curve_fit from the Python module scipy.optimize to fit our data. It uses non-linear least squares to fit data to a functional form. You can learn more about curve_fit by using the help function within a Jupyter notebook or the SciPy online documentation.
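
To make "least squares" concrete, the sketch below computes the quantity that such a fit tries to make as small as possible: the sum of squared differences between the data and the model. The helper sum_squared_residuals and the sample values are assumptions for illustration; the optimizer inside curve_fit is more involved than this.

```python
import numpy as np


def gauss(x, H, A, x0, sigma):
    """Gaussian with offset H, amplitude A, center x0 and width sigma."""
    return H + A * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))


def sum_squared_residuals(params, x, y):
    """Sum of squared differences between data y and the model (illustrative helper)."""
    return np.sum((y - gauss(x, *params)) ** 2)


# Noise-free sample data generated from known, made-up parameters
x = np.linspace(-5, 5, 50)
true_params = (0.0, 2.0, 0.0, 1.0)
y = gauss(x, *true_params)

# The generating parameters reproduce the data exactly; shifting the center does not
ssr_true = sum_squared_residuals(true_params, x, y)
ssr_shifted = sum_squared_residuals((0.0, 2.0, 0.5, 1.0), x, y)
print(ssr_true, ssr_shifted)
```

A least-squares fit searches the parameter space for the combination that gives the smallest such sum.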

The curve_fit function has three required inputs: the function you want to fit, the x-data, and the y-data to fit. There are two outputs. The first is an array of the optimal values of the parameters. The second is a matrix of the estimated covariance of the parameters, from which you can calculate the standard errors of the parameters.
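
A minimal sketch of reading both outputs follows. One common way to get one-standard-deviation errors is to take the square root of the diagonal of the covariance matrix. The synthetic data, the seed, the noise level, and the initial guess p0 are all assumptions made for this illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def gauss(x, H, A, x0, sigma):
    """Gaussian with offset H, amplitude A, center x0 and width sigma."""
    return H + A * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))


# Synthetic noisy data from made-up parameters (seeded so the run is repeatable)
rng = np.random.default_rng(0)
x = np.linspace(-10, 10, 201)
true_params = (1.0, 5.0, 0.5, 2.0)
y = gauss(x, *true_params) + rng.normal(scale=0.1, size=x.size)

# Rough starting guess read off the data
p0 = [y.min(), y.max() - y.min(), x[np.argmax(y)], 1.0]

# First output: best-fit parameters. Second output: their covariance matrix.
popt, pcov = curve_fit(gauss, x, y, p0=p0)

# Standard errors: square roots of the diagonal of the covariance matrix
perr = np.sqrt(np.diag(pcov))

for name, value, err in zip(("H", "A", "x0", "sigma"), popt, perr):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```

The off-diagonal entries of the covariance matrix describe how the parameter estimates vary together; they are not needed for the individual standard errors.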

Example 1:


import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

xdata = [ -10.0, -9.0, -8.0, -7.0, -6.0, -5.0, -4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ydata = [1.2, 4.2, 6.7, 8.3, 10.6, 11.7, 13.5, 14.5, 15.7, 16.1, 16.6, 16.0, 15.4, 14.4, 14.2, 12.7, 10.3, 8.6, 6.1, 3.9, 2.1]
# Recast xdata and ydata into NumPy arrays so we can use their handy features
xdata = np.asarray(xdata)
ydata = np.asarray(ydata)

# Define a two-parameter Gaussian centered at x = 0
def Gauss(x, A, B):
    y = A*np.exp(-1*B*x**2)
    return y

# Fit the model to the data
parameters, covariance = curve_fit(Gauss, xdata, ydata)
fit_A = parameters[0]
fit_B = parameters[1]
# Evaluate the fitted curve and plot it together with the data
fit_y = Gauss(xdata, fit_A, fit_B)
plt.plot(xdata, ydata, 'o', label='data')
plt.plot(xdata, fit_y, '-', label='fit')
plt.legend()
plt.show()
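
The two-parameter model in Example 1 is the general Gaussian with the offset fixed at 0 and the center fixed at x = 0, so its B corresponds to 1 / (2 * sigma**2). The sketch below converts between the two forms. The helper names and the sample values of A and B are hypothetical, not results from the fit.

```python
import numpy as np


def sigma_from_B(B):
    """Width sigma implied by B in A*exp(-B*x**2), using B = 1 / (2*sigma**2)."""
    return 1.0 / np.sqrt(2.0 * B)


def gauss_ab(x, A, B):
    """Two-parameter form used in Example 1."""
    return A * np.exp(-B * x ** 2)


def gauss_sigma(x, A, sigma):
    """Same curve written in terms of the width sigma."""
    return A * np.exp(-x ** 2 / (2 * sigma ** 2))


x = np.linspace(-10, 10, 21)
A_example = 16.0   # made-up amplitude
B_example = 0.01   # made-up B value

sigma_example = sigma_from_B(B_example)

# Both parameterizations should describe the same curve
same_curve = np.allclose(gauss_ab(x, A_example, B_example),
                         gauss_sigma(x, A_example, sigma_example))
print(sigma_example, same_curve)
```

Applying sigma_from_B to fit_B gives the fitted curve's width in the same units as the x-data.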

Example 2:


import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Let's create a function to model and create data
def func(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))

# Generating clean data
x = np.linspace(0, 10, 100)
y = func(x, 1, 5, 2)
# Adding noise to the data
yn = y + 0.2 * np.random.normal(size=len(x))
# Plot out the current state of the data and model
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y, c='k', label='Function')
ax.scatter(x, yn)
# Executing curve_fit on noisy data
popt, pcov = curve_fit(func, x, yn)
# popt holds the best-fit values for the parameters of the given model (func)
print(popt)
ym = func(x, popt[0], popt[1], popt[2])
ax.plot(x, ym, c='r', label='Best fit')
ax.legend()
plt.show()
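
Example 2 calls curve_fit without a starting point, in which case every parameter starts at 1. When the peak sits far from that starting point, passing an initial guess through the p0 argument can make the fit more reliable. The sketch below is one possible way to build such a guess from the data; the seed and the heuristics for the guess are assumptions, not part of the original example.

```python
import numpy as np
from scipy.optimize import curve_fit


def func(x, a, x0, sigma):
    """Gaussian with amplitude a, center x0 and width sigma."""
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))


# Same setup as Example 2, but with a seeded generator so the run is repeatable
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 100)
yn = func(x, 1, 5, 2) + 0.2 * rng.normal(size=x.size)

# Rough guesses read off the data (simple heuristics chosen for illustration)
a_guess = yn.max()                       # peak height
x0_guess = x[np.argmax(yn)]              # location of the highest point
sigma_guess = (x.max() - x.min()) / 4    # a fraction of the x-range

popt, pcov = curve_fit(func, x, yn, p0=[a_guess, x0_guess, sigma_guess])
print(popt)
```

Because sigma only appears squared in the model, a fit can return it with either sign; taking its absolute value gives the width.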

