# Data Analysis with SciPy

SciPy is a Python library for solving many kinds of mathematical equations and algorithms. It is built on top of the NumPy library and extends it with routines for scientific computations such as matrix rank, matrix inverse, polynomial equations and LU decomposition. Its high-level functions significantly reduce the complexity of your code and help you analyze data more effectively. Used from an interactive Python session, SciPy becomes a data-processing environment that competes with systems such as MATLAB, Octave, R-Lab, etc. It provides many user-friendly and efficient functions for problems like numerical integration, interpolation, optimization, linear algebra and statistics.

An additional benefit of using SciPy while building ML models is that, because it is based on Python, a powerful general-purpose programming language remains available for developing the rest of your program or application.

```python
# import the NumPy library
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 8]])
```

### Linear Algebra

**Determinant of a Matrix**

```python
# import the linalg module from SciPy
from scipy import linalg

# Compute the determinant of a matrix
linalg.det(A)
```

**Output:** 2.999999999999997

**Compute pivoted LU decomposition of a matrix**

LU decomposition is a method that factors a matrix into simpler constituent parts, which makes complex matrix operations easier to compute. Decomposition methods, also called matrix factorization methods, are the foundation of linear algebra on computers, even for basic operations such as solving systems of linear equations, calculating the inverse, and calculating the determinant of a matrix.

The decomposition is:

A = P L U

where P is a permutation matrix, L is lower triangular with unit diagonal elements, and U is upper triangular.

```python
P, L, U = linalg.lu(A)
print(P)
print(L)
print(U)

# print the product L U, which equals A with its rows permuted
print(np.dot(L, U))
```

**Output:**

```
array([[ 0., 1., 0.],
       [ 0., 0., 1.],
       [ 1., 0., 0.]])
array([[ 1. , 0. , 0. ],
       [ 0.14285714, 1. , 0. ],
       [ 0.57142857, 0.5 , 1. ]])
array([[ 7. , 8. , 8. ],
       [ 0. , 0.85714286, 1.85714286],
       [ 0. , 0. , 0.5 ]])
array([[ 7., 8., 8.],
       [ 1., 2., 3.],
       [ 4., 5., 6.]])
```

**Eigenvalues and eigenvectors of this matrix**

```python
eigen_values, eigen_vectors = linalg.eig(A)
print(eigen_values)
print(eigen_vectors)
```

**Output:**

```
array([ 15.55528261+0.j, -1.41940876+0.j, -0.13587385+0.j])
array([[-0.24043423, -0.67468642, 0.51853459],
       [-0.54694322, -0.23391616, -0.78895962],
       [-0.80190056, 0.70005819, 0.32964312]])
```

**Solving systems of linear equations can also be done**

```python
v = np.array([[2], [3], [5]])
print(v)

s = linalg.solve(A, v)
print(s)
```

**Output:**

```
array([[2],
       [3],
       [5]])
array([[-2.33333333],
       [ 3.66666667],
       [-1. ]])
```
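A quick way to build confidence in these results is to check them against their defining identities. The sketch below is not part of the original walkthrough; it rebuilds the same matrix `A` and vector `v`, and the names `lu_ok`, `eig_ok`, `solve_ok` and `det_ok` are chosen here purely for illustration.

```python
import numpy as np
from scipy import linalg

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 8]])
v = np.array([[2], [3], [5]])

# LU: multiplying the three factors should reproduce A
P, L, U = linalg.lu(A)
lu_ok = np.allclose(P @ L @ U, A)

# Eigenpairs: column j of eigen_vectors satisfies A x = lambda_j x
eigen_values, eigen_vectors = linalg.eig(A)
eig_ok = np.allclose(A @ eigen_vectors, eigen_vectors * eigen_values)

# Linear system: A s should reproduce the right-hand side v
s = linalg.solve(A, v)
solve_ok = np.allclose(A @ s, v)

# Determinant: equals the product of the eigenvalues
det_ok = np.isclose(linalg.det(A), np.prod(eigen_values).real)

print(lu_ok, eig_ok, solve_ok, det_ok)
```

Comparisons use `np.allclose` rather than `==` because floating-point results such as the determinant 2.999999999999997 are only approximately equal to their exact values.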

### Sparse Linear Algebra

SciPy has some routines for computing with sparse and potentially very large matrices. The necessary tools are in the submodule scipy.sparse.

**Let's look at how to construct a large sparse matrix:**

```python
# import necessary modules
import numpy as np
from scipy import sparse

# Row-based list of lists (LIL) sparse matrix
A = sparse.lil_matrix((1000, 1000))
# repr() gives a one-line summary; print(A) would list every stored element
print(repr(A))

A[0, :100] = np.random.rand(100)
A[1, 100:200] = A[0, :100]
A.setdiag(np.random.rand(1000))
print(repr(A))
```

**Output** (the exact wording depends on the SciPy version):

```
<1000x1000 sparse matrix of type '<class 'numpy.float64'>' with 0 stored elements in LInked List format>
<1000x1000 sparse matrix of type '<class 'numpy.float64'>' with 1199 stored elements in LInked List format>
```
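To see why a sparse format pays off for a matrix like this, it can help to count the stored elements and compare memory use with the equivalent dense array. The following is a minimal sketch under a few assumptions: it uses a seeded `np.random.default_rng` generator instead of `np.random.rand` so that runs are repeatable, and the variable names (`stored`, `density`, `sparse_bytes`, `dense_bytes`) are illustrative only.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)  # seeded generator, an assumption for repeatability

# Build the same structure as above: two partial rows plus a full diagonal
A = sparse.lil_matrix((1000, 1000))
row = rng.random(100)
A[0, :100] = row
A[1, 100:200] = row
A.setdiag(rng.random(1000))

# Number of explicitly stored entries and the fraction of the matrix they fill
stored = A.nnz
density = stored / (A.shape[0] * A.shape[1])

# Memory held by the CSR arrays versus a full dense array
csr = A.tocsr()
sparse_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes
dense_bytes = csr.toarray().nbytes

print(stored, density)
print(sparse_bytes, dense_bytes)
```

The count of 1199 comes from 100 + 100 row entries plus 1000 diagonal entries, minus one because position (0, 0) belongs to both the first row and the diagonal.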

**Linear Algebra for Sparse Matrices**

```python
# the alias avoids shadowing scipy.linalg, which was imported earlier as linalg
from scipy.sparse import linalg as sparse_linalg

# Convert this matrix to Compressed Sparse Row format.
A = A.tocsr()

b = np.random.rand(1000)
ans = sparse_linalg.spsolve(A, b)

# it will print the solution array of size 1000
print(ans)
```

**Output:**

```
array([-2.53380006e+03, -1.25513773e+03, 9.14885544e-01, 2.74521543e+00, 5.99942835e-01,
       4.57778093e-01, 1.87104209e-01, 2.15228367e+00, 8.78588432e-01, 1.85105721e+03,
       1.00842538e+00, 4.33970632e+00, 5.26601699e+00, 2.17572231e-01, 1.79869079e+00,
       3.83800946e-01, 2.57817130e-01, 5.18025462e-01, 1.68672669e+00, 3.07971950e+00,
       6.20604437e-01, 1.41365890e-01, 3.18167429e-01, 2.06457302e-01, 8.94813817e-01,
       5.06084834e+00, 5.00913942e-01, 1.37391305e+00, 2.32081425e+00, 4.98093749e+00,
       1.75492222e+00, 3.17278127e-01, 8.50013844e-01, 1.17524493e+00, 1.70173722e+00,
       .............)
```
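Since the inputs are random, the printed solution cannot be compared against fixed numbers, but it can be checked through its residual and against a dense solve. The sketch below makes some assumptions of its own: it uses a smaller 200x200 matrix so the dense comparison stays cheap, a seeded generator, and a diagonal shifted by 1.0 to keep the system well conditioned (with a diagonal drawn from [0, 1), a tiny entry can produce very large solution values like the ones above).

```python
import numpy as np
from scipy import linalg, sparse
from scipy.sparse.linalg import spsolve

rng = np.random.default_rng(0)  # seeded generator, an assumption for repeatability
n = 200                         # smaller than the article's 1000, for a cheap dense check

A = sparse.lil_matrix((n, n))
row = rng.random(20)
A[0, :20] = row
A[1, 20:40] = row
A.setdiag(rng.random(n) + 1.0)  # shifted diagonal keeps the matrix well conditioned
A = A.tocsr()

b = rng.random(n)

# Solve with the sparse solver and, for comparison, with the dense solver
x_sparse = spsolve(A, b)
x_dense = linalg.solve(A.toarray(), b)

# The residual norm ||A x - b|| should be close to machine precision
residual = np.linalg.norm(A @ x_sparse - b)
same_solution = np.allclose(x_sparse, x_dense)

print(residual, same_solution)
```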

### Integration

When a function is very difficult to integrate analytically, one can instead approximate the integral with numerical integration methods. SciPy provides these methods in the **scipy.integrate** module.

**Single Integrals**

The quad routine is the most important of SciPy's integration functions. It computes the definite integral of a function f(x) where x ranges from a to b:

$$\int_a^b f(x)\,dx$$

The call signature is scipy.integrate.quad(f, a, b), where ‘f’ is the function to be integrated and ‘a’ and ‘b’ are the lower and upper limits of x. Let us see an example of integrating over the **range from 0 to 1 with respect to x**.

We will first define the function f(x) = e^(-x^2) using a lambda expression and then call the quad routine.

```python
import numpy as np
import scipy.integrate

f = lambda x: np.exp(-x**2)

# integrate f from 0 to 1 and print the result
i = scipy.integrate.quad(f, 0, 1)
print(i)
```

**Output:** (0.7468241328124271, 8.291413475940725e-15)

The quad function returns two values: the first is the value of the integral and the second is an estimate of the absolute error in that value.
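For this particular integrand the exact answer is known in closed form, (sqrt(pi) / 2) * erf(1), so the error estimate can be compared with the actual error. The sketch below does that with `scipy.special.erf`, and also shows that quad accepts infinite limits; the variable names are illustrative only.

```python
import numpy as np
from scipy import integrate, special

f = lambda x: np.exp(-x**2)

# Numerical value and quad's own estimate of the absolute error
value, abs_error = integrate.quad(f, 0, 1)

# Closed-form value of the same integral, using the error function
exact = np.sqrt(np.pi) / 2 * special.erf(1)
actual_error = abs(value - exact)

# quad also handles infinite limits: the integral over the whole
# real line of exp(-x^2) equals sqrt(pi)
total, total_error = integrate.quad(f, -np.inf, np.inf)

print(value, exact, actual_error)
print(total, np.sqrt(np.pi))
```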

**Double Integrals**

The signature of the **dblquad** function is **scipy.integrate.dblquad(f, a, b, g, h)**, where ‘f’ is the function to be integrated, ‘a’ and ‘b’ are the lower and upper limits of the x variable, respectively, and ‘g’ and ‘h’ are functions of x that give the lower and upper limits of the y variable. Note that ‘f’ must take y as its first argument and x as its second.

As an example, let us compute the double integral of x*y^2 with x ranging from 0 to 2 and y ranging from 0 to 1.

We define the functions f, g, and h using lambda expressions. Note that even though g and h are constants here, as they may be in many cases, we define them as functions of x, as done below for both limits.

```python
from scipy import integrate

# dblquad expects the integrand as f(y, x): y first, then x
f = lambda y, x: x * y**2

# x runs from 0 to 2; y runs from g(x) = 0 to h(x) = 1
i = integrate.dblquad(f, 0, 2, lambda x: 0, lambda x: 1)

# print the results
print(i)
```

**Output:** (0.6666666666666667, 7.401486830834377e-15)
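The reason the y limits are passed as functions becomes clearer when they actually depend on x. As a hypothetical example that is not in the original walkthrough, the sketch below integrates x*y over the triangle 0 <= y <= x, 0 <= x <= 1, whose exact value is 1/8.

```python
from scipy import integrate

# Integrand in dblquad's argument order: y first, then x
f = lambda y, x: x * y

# x runs from 0 to 1; for each x, y runs from 0 up to x
value, abs_error = integrate.dblquad(f, 0, 1, lambda x: 0, lambda x: x)

exact = 1 / 8  # inner integral gives x**3 / 2, which integrates to 1/8
print(value, abs_error, exact)
```

Working it by hand, the inner integral of x*y over y from 0 to x is x^3/2, and integrating that over x from 0 to 1 gives 1/8, which the numerical result should match to within the reported error.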

There is a lot more that SciPy is capable of, such as Fourier Transforms, Bessel Functions, etc.

You can refer to the SciPy documentation for more details!
