
Spectral Embedding

Machine learning and data analysis are essential for uncovering patterns, connections, and structures in large datasets. Spectral embedding is a powerful method for dimensionality reduction and clustering. This tutorial walks you through spectral embedding from its fundamentals to a practical implementation, giving beginners a clear grasp of the method while covering everything you need to apply it.


Spectral embedding is a dimensionality reduction method that projects data onto a lower-dimensional subspace while retaining some of its original structure. It is based on the eigenvectors of a matrix that encodes the affinity, or similarity, between the data points. Spectral embedding is useful for visualizing high-dimensional data, clustering, manifold learning, and other applications.
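As a quick orientation before the details, here is a minimal sketch of spectral embedding with scikit-learn; the toy dataset and parameter values are illustrative only.

# A minimal sketch: embed a small 2-D toy dataset into 2 spectral components.
# The dataset and parameter values are illustrative, not recommendations.
from sklearn.datasets import make_moons
from sklearn.manifold import SpectralEmbedding

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
embedding = SpectralEmbedding(n_components=2, affinity='nearest_neighbors', random_state=0)
X_embedded = embedding.fit_transform(X)
print(X_embedded.shape)  # (300, 2)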



This article covers the idea behind spectral embedding, how it works, and how to apply it in Python using the scikit-learn library. We will also look at examples of spectral embedding applied to different datasets and compare the results with other approaches.

Mathematical Concept of Spectral Embedding

Spectral embedding is a dimensionality reduction method frequently applied in data analysis and machine learning, and it is especially useful for visualizing and clustering high-dimensional data. It is based on spectral graph theory and, unlike Principal Component Analysis (PCA), it can capture non-linear structure in the data.



The first step in spectral embedding is to represent the data as a graph. There are several ways to build this graph, including a fully connected similarity graph, an epsilon-neighborhood graph, and a k-nearest-neighbor graph. The graph's nodes stand for the data points, while the edges connecting them encode pairwise similarities.
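To make the graph-construction step concrete, the following sketch builds a k-nearest-neighbor graph and an epsilon-neighborhood graph with scikit-learn's neighbor-graph utilities; the dataset and the values of k and epsilon are illustrative assumptions.

# Build two common kinds of neighborhood graphs; parameter values are illustrative.
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# k-nearest-neighbor graph: each point is connected to its 10 nearest neighbors.
W_knn = kneighbors_graph(X, n_neighbors=10, mode='connectivity', include_self=False)

# Epsilon-neighborhood graph: points within distance 0.3 of each other are connected.
W_eps = radius_neighbors_graph(X, radius=0.3, mode='connectivity', include_self=False)

# Symmetrize so the adjacency matrix describes an undirected graph.
W_knn = 0.5 * (W_knn + W_knn.T)
print(W_knn.shape, W_eps.shape)  # (200, 200) (200, 200)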

The next step is the creation of the Laplacian matrix, which encodes the graph's structure. Laplacian matrices come in various forms, but the most widely used type is the unnormalized Laplacian L, computed as:

L = D - W

Where:

L = the Laplacian matrix

D = the diagonal degree matrix. Each diagonal entry Dii is the sum of the weights of the edges connected to node i.

W = the weighted adjacency matrix, where Wij represents the similarity or weight between nodes i and j.
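As a small worked example (the 4-node graph and its edge weights below are made up purely for illustration), the Laplacian follows directly from these definitions.

import numpy as np

# A toy weighted adjacency matrix W for a 4-node undirected graph (illustrative values).
W = np.array([[0.0, 1.0, 0.5, 0.0],
              [1.0, 0.0, 0.0, 0.2],
              [0.5, 0.0, 0.0, 1.0],
              [0.0, 0.2, 1.0, 0.0]])

# Degree matrix D: each diagonal entry is the sum of the edge weights at that node.
D = np.diag(W.sum(axis=1))

# Unnormalized graph Laplacian L = D - W.
L = D - W
print(L)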

The eigenvalues and eigenvectors of the Laplacian matrix L must then be calculated. They can be obtained by solving the following generalized eigenvalue problem:

L v = λ D v

Where:

λ = the eigenvalues

v = the corresponding eigenvectors

Once the eigenvalues and eigenvectors are obtained, dimensionality reduction is carried out by selecting the k eigenvectors corresponding to the smallest non-trivial eigenvalues (the constant eigenvector associated with eigenvalue 0 is discarded). These k eigenvectors are stacked together to form a new matrix Vk.

To obtain the spectral embedding, the eigenvectors are used as the new feature vectors for the data points. The coordinates of each data point in the lower-dimensional space are given by its entries in the k eigenvectors of Vk, so every data point is now represented by a k-dimensional vector. The sketch below walks through these steps from start to finish.
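To make the whole pipeline concrete, here is a minimal from-scratch sketch using NumPy, SciPy, and scikit-learn's neighbor-graph helper; the dataset, the neighborhood size, and k are illustrative assumptions rather than a fixed recipe.

import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph

# Build a symmetric k-nearest-neighbor adjacency matrix (illustrative parameters).
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
W = kneighbors_graph(X, n_neighbors=10, mode='connectivity', include_self=False).toarray()
W = 0.5 * (W + W.T)

# Degree matrix and unnormalized Laplacian.
D = np.diag(W.sum(axis=1))
L = D - W

# Solve the generalized eigenvalue problem L v = lambda D v (eigenvalues in ascending order).
eigenvalues, eigenvectors = eigh(L, D)

# Skip the trivial constant eigenvector (eigenvalue ~ 0) and keep the next k eigenvectors.
k = 2
V_k = eigenvectors[:, 1:k + 1]
print(V_k.shape)  # (200, 2): each row is the k-dimensional embedding of one data point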

Parameters of Spectral Embedding

Spectral embedding is available in Python through the scikit-learn framework via the SpectralEmbedding class in the sklearn.manifold module. Several parameters of this class determine how the affinity matrix is built and how the eigenvalue decomposition is carried out. These are a few of the parameters:

n_components: the dimensionality of the projected subspace (2 by default).

affinity: how the affinity matrix is constructed; options include 'nearest_neighbors', 'rbf', 'precomputed', 'precomputed_nearest_neighbors', or a callable.

gamma: the kernel coefficient used when affinity is 'rbf'.

n_neighbors: the number of neighbors used when affinity is 'nearest_neighbors'.

eigen_solver: the eigenvalue decomposition strategy ('arpack', 'lobpcg', or 'amg').

random_state: seed used by the eigensolver, for reproducible results.

n_jobs: the number of parallel jobs used for the nearest-neighbor search.
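The sketch below (values chosen purely for illustration, not as recommendations) shows how these parameters are passed when constructing the estimator.

from sklearn.manifold import SpectralEmbedding

# Illustrative configuration of SpectralEmbedding; the values are not recommendations.
se = SpectralEmbedding(
    n_components=2,                 # dimensionality of the embedded space
    affinity='nearest_neighbors',   # how the affinity matrix is built
    n_neighbors=10,                 # neighbors used when affinity='nearest_neighbors'
    eigen_solver='arpack',          # algorithm for the eigenvalue decomposition
    random_state=42,                # for reproducibility
    n_jobs=-1,                      # use all CPU cores for the neighbor search
)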

Implementation of Spectral Embedding

In this section, we will see some examples of applying spectral embedding to different datasets and compare the results with other methods.

Implementation using Swiss roll dataset

The Swiss roll dataset is a synthetic dataset that consists of points sampled from a two-dimensional manifold (a Swiss roll) embedded in three-dimensional space. It is often used to demonstrate the effectiveness of manifold learning methods. Here is what it looks like:

# Import the functions needed to generate and plot the dataset
from sklearn.datasets import make_swiss_roll
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # enables the 3-D projection

# Generate the Swiss roll dataset
X, y = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

# Plot the dataset, coloring points by their position along the roll
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y)
ax.set_title('Swiss roll dataset')
plt.show()


Output:


The dataset has an intrinsic two-dimensional structure that is hard to see from the raw three-dimensional coordinates. We can use spectral embedding with two components and a nearest-neighbors affinity to reveal this structure:

from sklearn.manifold import SpectralEmbedding
# Apply spectral embedding
se = SpectralEmbedding(
    n_components=2, affinity='nearest_neighbors', n_neighbors=10, random_state=42)
X_se = se.fit_transform(X)
 
# Plot the embedded data
plt.scatter(X_se[:, 0], X_se[:, 1], c=y)
plt.title('Spectral embedding of Swiss roll dataset')
plt.show()


Output:

We can see that spectral embedding successfully unrolled the Swiss roll and preserved the color gradient of the original manifold. The embedded data has an elongated, nearly one-dimensional shape that can be easily analyzed or clustered, as in the sketch below.
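As a brief illustration of that last point (KMeans and the number of clusters are an added, illustrative choice, not part of the original walkthrough), the embedded coordinates can be handed straight to a standard clustering algorithm.

from sklearn.cluster import KMeans

# Continuing from the previous snippet: cluster the 2-D embedded coordinates.
# The choice of KMeans and n_clusters=4 is illustrative only.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_se)

plt.scatter(X_se[:, 0], X_se[:, 1], c=labels)
plt.title('KMeans clusters on the embedded Swiss roll')
plt.show()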

For comparison, let’s see how principal component analysis (PCA) and multidimensional scaling (MDS) perform on the same dataset. PCA is a linear dimensionality reduction method that finds the directions of maximum variance in the data. MDS is a non-linear dimensionality reduction method that tries to preserve the pairwise distances between the data points.

# Import PCA and MDS classes
from sklearn.decomposition import PCA
from sklearn.manifold import MDS
 
# Apply PCA
pca = PCA(n_components=2, random_state=42)
X_pca = pca.fit_transform(X)
 
# Apply MDS
mds = MDS(n_components=2, random_state=42)
X_mds = mds.fit_transform(X)
 
# Plot the results
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
ax1.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
ax1.set_title('PCA of Swiss roll dataset')
ax2.scatter(X_mds[:, 0], X_mds[:, 1], c=y)
ax2.set_title('MDS of Swiss roll dataset')
plt.show()


Output:

We can see that PCA failed to capture the structure of the Swiss roll and only projected it onto a plane. MDS did better than PCA but still distorted some of the distances and angles between the data points.

Implementation using Digits dataset

The digits dataset is a real-world dataset that consists of images of handwritten digits from 0 to 9. Each image has 8×8 pixels and is represented as a 64-dimensional vector of grayscale values. The dataset has 1797 samples in total and is often used to benchmark classification algorithms. Here is what some of the images look like:

# Import the load_digits function
from sklearn.datasets import load_digits
 
# Load the digits dataset
X, y = load_digits(return_X_y=True)
 
# Plot some of the images
plt.figure(figsize=(10, 10))
for i in range(100):
    plt.subplot(10, 10, i + 1)
    plt.imshow(X[i].reshape(8, 8), cmap='gray')
    plt.axis('off')
plt.show()


Output:

We can see that the digits have different shapes, sizes, orientations, and styles. To classify them correctly, we need to extract features that are invariant to these variations. One way to do that is to apply spectral embedding with two components and rbf affinity:

# Apply spectral embedding with an RBF (Gaussian kernel) affinity, so gamma is used
se = SpectralEmbedding(
    n_components=2, affinity='rbf', gamma=0.03, random_state=42)
X_se = se.fit_transform(X)
 
# Plot the embedded data
plt.scatter(X_se[:, 0], X_se[:, 1], c=y, cmap='tab10')
plt.title('Spectral embedding of digits dataset')
plt.colorbar()
plt.show()


Output:

Spectral embedding of digits dataset


We can see that spectral embedding separated the digits into clusters that largely correspond to their labels. The arrangement of the clusters reflects the similarity between the digits: visually similar digits tend to be embedded close to each other, while dissimilar digits end up far apart. The sketch below gives one way to quantify this.
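One way to back up this impression with a number (this evaluation step is an illustrative addition; the choice of KMeans and of 10 clusters is an assumption) is to cluster the embedded points and compare the assignments with the true labels.

from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Continuing from the previous snippet: cluster the 2-D embedding into 10 groups
# and measure agreement with the true digit labels (1.0 would be a perfect match).
kmeans = KMeans(n_clusters=10, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(X_se)
print('Adjusted Rand index:', adjusted_rand_score(y, cluster_labels))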

Let's compare the results of PCA and MDS on the same dataset. PCA is a linear dimensionality reduction technique that finds the directions of greatest variance in the data, while MDS is a non-linear technique that attempts to preserve the pairwise distances between the data points.

# Import PCA and MDS classes
from sklearn.decomposition import PCA
from sklearn.manifold import MDS
 
# Apply PCA
pca = PCA(n_components=2, random_state=42)
X_pca = pca.fit_transform(X)
 
# Apply MDS
mds = MDS(n_components=2, random_state=42)
X_mds = mds.fit_transform(X)
 
# Plot the results
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
ax1.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='tab10')
ax1.set_title('PCA of digits dataset')
ax2.scatter(X_mds[:, 0], X_mds[:, 1], c=y, cmap='tab10')
ax2.set_title('MDS of digits dataset')
plt.show()


Output:

We can see that PCA and MDS did not perform as well as spectral embedding on this dataset. PCA failed to separate some of the digits whose projections overlap, such as 3 and 5 or 4 and 9. MDS did better than PCA but still mixed some digits together, such as 0 and 6 or 2 and 8.

Advantages of Spectral Embedding

There are several advantages of spectral embedding. Some of them are:

It can capture non-linear structure in the data, which linear methods such as PCA miss.

It preserves local neighborhood relationships, which makes the embedded data well suited to clustering.

It makes no strong assumptions about the distribution of the data; only a meaningful similarity measure between points is required.

The core computation reduces to an eigenvalue problem, for which stable, well-understood solvers exist.

Disadvantages of Spectral Embedding

Although spectral embedding techniques have many benefits, they also have several drawbacks and restrictions:

They can be computationally expensive for large datasets, since they require building an affinity matrix and solving an eigenvalue problem.

The result is sensitive to the choice of affinity and its parameters, such as the number of neighbors or the kernel coefficient.

There is no direct out-of-sample extension: embedding new points requires recomputing or approximating the embedding.

They can be sensitive to noise and outliers, which distort the affinity graph.

Conclusion

This article introduced spectral embedding, a dimensionality reduction method that projects data onto a lower-dimensional subspace while keeping some of the original data's structure. We saw how spectral embedding works: create an affinity matrix, compute its graph Laplacian, and take the eigenvectors of that Laplacian. Using the scikit-learn library, we learned how to apply spectral embedding in Python and went through several examples on different datasets. Comparing the results with other techniques such as PCA and MDS, we found that spectral embedding captures the structure of non-linear data better than linear approaches.

Spectral embedding is a powerful tool for manifold learning that can be applied to a number of tasks, including visualization, clustering, classification, and dimensionality reduction. It also has some restrictions and difficulties that should be taken into account. Among them are:

Dealing with noise and outliers: Spectral embedding can be sensitive to noise and outliers in the data, because they alter the affinity matrix and the graph Laplacian. Noise and outliers can introduce spurious connections or disconnect some of the data points, which leads to a poor embedding. Preprocessing steps such as denoising, filtering, or robust estimation may be required before applying spectral embedding to reduce this issue.

Choosing the right affinity matrix: The quality of the embedding depends heavily on the choice of affinity matrix, and different affinities suit different kinds of data or tasks. For instance, a nearest-neighbors affinity may work well for data with a distinct neighborhood structure, whereas an rbf affinity may work well for data lying on a smooth manifold. The affinity parameters, such as the number of neighbors or the kernel coefficient, must also be set carefully to avoid overfitting or underfitting the data, as in the sketch below.
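As a hedged illustration of this tuning question (the dataset and parameter values are arbitrary), the same data can be embedded with both affinities and the results compared side by side.

from sklearn.datasets import make_moons
from sklearn.manifold import SpectralEmbedding

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Neighborhood-based affinity: suits data with a clear local neighborhood structure.
se_knn = SpectralEmbedding(n_components=2, affinity='nearest_neighbors',
                           n_neighbors=10, random_state=0)

# RBF (Gaussian kernel) affinity: gamma controls how quickly similarity decays with distance.
se_rbf = SpectralEmbedding(n_components=2, affinity='rbf',
                           gamma=1.0, random_state=0)

X_knn = se_knn.fit_transform(X)
X_rbf = se_rbf.fit_transform(X)
print(X_knn.shape, X_rbf.shape)  # (300, 2) (300, 2)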

Scaling to huge datasets: Spectral embedding can be computationally costly for large datasets, because it requires building and decomposing a large matrix. A dense eigendecomposition has O(n³) complexity, where n is the number of data points, which can be prohibitive for tens of thousands or millions of points. Approximation or optimization techniques, such as sparse affinity matrices, random projections, or iterative eigensolvers, can be used to reduce the size or complexity of the problem, as sketched below.
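Here is a hedged sketch of one such mitigation (the sample size, neighborhood size, and solver choice are illustrative assumptions): precomputing a sparse k-nearest-neighbor affinity avoids ever forming the dense n × n matrix, and an iterative eigensolver handles the decomposition.

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import SpectralEmbedding
from sklearn.neighbors import kneighbors_graph

# A larger Swiss roll; at this size a dense affinity matrix already becomes expensive.
X, _ = make_swiss_roll(n_samples=10000, noise=0.1, random_state=42)

# Sparse, symmetric k-nearest-neighbor affinity (the dense n x n matrix is never built).
A = kneighbors_graph(X, n_neighbors=10, mode='connectivity', include_self=False)
A = 0.5 * (A + A.T)

# 'precomputed' tells SpectralEmbedding to use A directly as the affinity matrix.
se = SpectralEmbedding(n_components=2, affinity='precomputed',
                       eigen_solver='arpack', random_state=42)
X_se = se.fit_transform(A)
print(X_se.shape)  # (10000, 2)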

Despite these drawbacks and difficulties, spectral embedding remains a useful and popular method for manifold learning that can uncover the underlying structure of high-dimensional data. It is worth exploring and experimenting with spectral embedding to gain fresh perspectives and discover new patterns in your data.

