Open In App

How to reduce dimensionality on Sparse Matrix in Python?

A matrix usually consists of a combination of zeros and non-zeros. When a matrix is comprised mostly of zeros, then such a matrix is called a sparse matrix. A matrix that consists of maximum non-zero numbers, such a matrix is called a dense matrix. Sparse matrix finds its application in high dimensional Machine learning and deep learning problems. In other words, when a matrix has many of its coefficients as zero, such a matrix is said to be sparse.

The common area where we come across such sparse dimensionality problems is



In such cases, we cannot afford to have a matrix of the large dimensional matrix, as it can increase the time and space complexity of the problem, so it is recommended to reduce the dimensionality of the sparse matrix. In this article let us discuss the implementation of how to reduce the dimensionality of the sparse matrix in python

The dimensionality of the sparse matrix can be reduced by first representing the dense matrix as a Compressed sparse row representation in which the sparse matrix is represented using three one-dimensional arrays for the non-zero values, the extents of the rows, and the column indexes. Then, by using scikit-learn’s TruncatedSVD, it is possible to reduce the dimensionality of the sparse matrix.



Example:

First load the inbuilt digits dataset from the scikit-learn package, Standardize each data point using standardscaler. Represent the Standardized matrix in its sparse form using csr_matrix as shown. Now import the TruncatedSVD from sklearn and specify the no. of dimensions required in the final output Finally check for the shape of the reduced matrix




from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import TruncatedSVD
from scipy.sparse import csr_matrix
from sklearn import datasets
from numpy import count_nonzero
 
# load the inbuilt digits dataset
digits = datasets.load_digits()
 
print(digits.data)
 
# shape of the dense matrix
print(digits.data.shape)
 
# standardizing the data points
X = StandardScaler().fit_transform(digits.data)
print(X)
 
# representing in CSR form
X_sparse = csr_matrix(X)
print(X_sparse)
 
# specify the no of output features
tsvd = TruncatedSVD(n_components=10)
 
# apply the truncatedSVD function
X_sparse_tsvd = tsvd.fit(X_sparse).transform(X_sparse)
print(X_sparse_tsvd)
 
# shape of the reduced matrix
print(X_sparse_tsvd.shape)

Output:

Code:

Let us cross verify the original dimension and transformed dimension




print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_sparse_tsvd.shape[1])

Output:


Article Tags :