Isomap | A Non-linear Dimensionality Reduction Technique

Last Updated : 02 Jan, 2024

Isomap, short for isometric mapping, is a nonlinear dimensionality reduction method used in data analysis and machine learning. It was developed as an alternative to conventional techniques such as Principal Component Analysis (PCA) in order to preserve the intrinsic geometry of high-dimensional data. Isomap builds a low-dimensional representation, usually a two- or three-dimensional map, by focusing on preserving the pairwise distances between data points.

This technique works especially well for extracting the underlying structure from large, complex datasets, such as those arising in speech recognition, image analysis, and biological systems. Isomap's capacity to highlight the fundamental relationships in data makes it possible to find patterns and insights in a variety of scientific and engineering domains.

Isomap

Understanding and representing complicated data structures is crucial in machine learning, and Manifold Learning, a subset of unsupervised learning, plays a significant role here. Among manifold learning techniques, ISOMAP (Isometric Mapping) stands out for its ability to capture the intrinsic geometry of high-dimensional data, and it has proved particularly effective in situations where linear methods fall short.

ISOMAP is a flexible tool that blends manifold learning and dimensionality reduction in order to obtain more detailed knowledge of the underlying structure of data. This article looks at ISOMAP's inner workings and sheds light on its parameters, behaviour, and implementation with scikit-learn.

Isometric mapping is an approach for reducing the dimensionality of data in machine learning.

Manifold Learning

Manifold Learning is an unsupervised method for understanding the underlying structure of complex data. It aims to capture the inherent characteristics of high-dimensional datasets and represent them in a lower-dimensional space. Unlike linear techniques, manifold learning can uncover nonlinear relationships hidden within data, which makes it a valuable asset in many applications.

Isometric Mapping Concept

The idea of an isometric map, one that preserves pairwise distances between points, is central to ISOMAP. The algorithm seeks a low-dimensional representation of the data while keeping the geodesic distances, that is, the lengths of the shortest paths along the curved surface of the data manifold. This is particularly important in situations where the underlying structure is curved or folded, since traditional methods such as PCA are not able to take these nuances into account.

Relation between Geodesic Distances and Euclidean Distances

Understanding the distinction between geodesic and Euclidean distances is of vital importance for ISOMAP. The geodesic distance is the length of the shortest path along the curved surface of the manifold, whereas the Euclidean distance is the straight-line distance measured in the input space. By working with geodesic rather than straight-line distances, ISOMAP provides a more precise representation of the data's internal structure.
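
For intuition, here is a minimal sketch that approximates the geodesic distance between two points on an S-curve by taking shortest paths over a k-nearest-neighbor graph and compares it with the straight-line Euclidean distance. The sample size, neighbor count, and chosen pair of points are illustrative assumptions.

Python

import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.datasets import make_s_curve
from sklearn.neighbors import kneighbors_graph

# Sample points from a curved 3D manifold (the S-curve)
X, _ = make_s_curve(n_samples=500, random_state=0)

# Euclidean distance: a straight line through the ambient space
i, j = 0, 250  # an arbitrary pair of points
euclidean = np.linalg.norm(X[i] - X[j])

# Geodesic distance: shortest path over a k-nearest-neighbor graph,
# which has to follow the manifold instead of cutting across it
knn_graph = kneighbors_graph(X, n_neighbors=10, mode='distance')
geodesic = shortest_path(knn_graph, method='D', directed=False)[i, j]

print(f"Euclidean distance: {euclidean:.2f}")
print(f"Geodesic (graph) distance: {geodesic:.2f}")

On a folded manifold the geodesic distance is typically larger than the Euclidean one, and it is precisely this information that ISOMAP preserves.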

ISOMAP Parameters

ISOMAP comes with several parameters, each influencing the dimensionality reduction process:

  • n_neighbors: The number of neighbors used to approximate geodesic distances. Larger values capture more of the global structure but require more computing power.
  • n_components: The number of dimensions of the low-dimensional representation.
  • eigen_solver: The method used for the eigenvalue decomposition. Options include 'auto', 'arpack', and 'dense'.
  • radius: Instead of a fixed number of neighbors, you can specify a radius within which points count as neighbors; data points outside this radius are not regarded as neighbors.
  • tol: Convergence tolerance for the eigenvalue solver. A lower value may produce a more accurate solution but can lengthen the computation time.
  • max_iter: The maximum number of iterations for the eigenvalue solver. If None, it runs until convergence or another stopping condition is satisfied.
  • path_method: The algorithm used to compute shortest (geodesic) paths on the graph. Options include 'auto' (automatic selection), 'FW' (Floyd-Warshall), and 'D' (Dijkstra).
  • neighbors_algorithm: The algorithm used for the nearest-neighbor search. Options are 'auto', 'ball_tree', 'kd_tree', and 'brute'; 'auto' selects the best algorithm for the input data.
  • metric: The distance metric for the nearest-neighbor search. 'minkowski' is the default; 'euclidean', 'manhattan', and several other metrics are also available. A short instantiation sketch using these parameters follows the list.
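
The snippet below shows several of these parameters set together on a scikit-learn Isomap estimator; the specific values are illustrative defaults, not tuned recommendations.

Python

from sklearn.manifold import Isomap

# Illustrative values only -- not tuned recommendations
isomap = Isomap(
    n_neighbors=10,              # neighbors used to build the graph
    n_components=2,              # dimensionality of the embedding
    eigen_solver='auto',         # let scikit-learn pick 'arpack' or 'dense'
    path_method='auto',          # shortest-path algorithm for geodesic distances
    neighbors_algorithm='auto',  # nearest-neighbor search strategy
    metric='minkowski',          # distance metric for the neighbor search
)
# embedding = isomap.fit_transform(X)  # X: array of shape (n_samples, n_features)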

Working of ISOMAP

  • Compute pairwise distances: The algorithm starts by computing the Euclidean distances between the data points.
  • Find nearest neighbors: For each data point, its k nearest neighbors are identified from these distances.
  • Build a neighborhood graph: Each point is connected by edges to its nearest neighbors, producing a graph that represents the local structure of the data.
  • Compute geodesic distances: A shortest-path algorithm such as Floyd-Warshall (or Dijkstra) finds the shortest path between every pair of points in the neighborhood graph; the lengths of these shortest paths approximate the geodesic distances.
  • Perform dimensionality reduction: Classical Multidimensional Scaling (MDS) is applied to the geodesic distance matrix, yielding the low-dimensional embedding of the data. A from-scratch sketch of these steps follows below.
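
The following sketch mirrors these five steps using NumPy, SciPy, and scikit-learn utilities. It is a simplified illustration of the idea rather than scikit-learn's internal implementation, and it assumes the neighborhood graph is connected (a disconnected graph would leave some geodesic distances infinite).

Python

import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def isomap_sketch(X, n_neighbors=10, n_components=2):
    # Steps 1-3: pairwise distances, nearest neighbors, neighborhood graph
    graph = kneighbors_graph(X, n_neighbors=n_neighbors, mode='distance')

    # Step 4: geodesic distances as all-pairs shortest paths on the graph
    D = shortest_path(graph, directed=False)

    # Step 5: classical MDS on the geodesic distance matrix
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n             # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                     # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0))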

Implementation of Isomap

Python




from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap
import matplotlib.pyplot as plt
 
# Generate S-curve data
X, color = make_s_curve(n_samples=1000, random_state=42)
 
# Apply Isomap
isomap = Isomap(n_neighbors=10, n_components=2)
X_isomap = isomap.fit_transform(X)
 
# Plot a 2D projection of the original 3D data next to the reduced data
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
 
ax[0].scatter(X[:, 0], X[:, 2], c=color, cmap=plt.cm.Spectral)
ax[0].set_title('Original 3D Data')
 
ax[1].scatter(X_isomap[:, 0], X_isomap[:, 1], c=color, cmap=plt.cm.Spectral)
ax[1].set_title('Isomap Reduced 2D Data')
 
plt.show()


Output:


The output of the above code

This code sample illustrates how to apply Isomap to an S-curve dataset. It first generates S-curve data with 3D coordinates, applies Isomap to reduce the data to two dimensions, and then plots the original data next to the reduced 2D data. The Isomap transformation captures the underlying geometric structure, preserving the fundamental relationships between data points in the lower-dimensional space. The resulting visualization shows how effectively Isomap unfolds the S-curve into a more manageable 2D representation while keeping the inherent patterns in the data intact.
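
The fitted estimator can also be reused. Assuming the isomap object and imports from the snippet above are still in scope, scikit-learn lets you embed new points with transform() and inspect the fit with reconstruction_error():

Python

# Assumes `isomap` and `make_s_curve` from the previous snippet are in scope
X_new, _ = make_s_curve(n_samples=50, random_state=0)

# Embed points that were not part of the training data
X_new_2d = isomap.transform(X_new)

# Lower values indicate a more faithful embedding of the training data
print("Reconstruction error:", isomap.reconstruction_error())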

Implementation on the Digits Dataset

Python




from sklearn.datasets import load_digits
from sklearn.manifold import Isomap
import matplotlib.pyplot as plt
 
# Load the digits dataset
digits = load_digits()
 
# Apply Isomap
isomap = Isomap(n_neighbors=30, n_components=2)
digits_isomap = isomap.fit_transform(digits.data)
 
# Plot the raw data (first two of its 64 features) next to the reduced data
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
 
ax[0].scatter(digits.data[:, 0], digits.data[:, 1], c=digits.target, cmap=plt.cm.tab10)
ax[0].set_title('Original 2D Data (First Two Features)')
 
ax[1].scatter(digits_isomap[:, 0], digits_isomap[:, 1], c=digits.target, cmap=plt.cm.tab10)
ax[1].set_title('Isomap Reduced 2D Data')
 
plt.show()


Output:


The output of the above code

This code applies Isomap to the well-known digits dataset. It loads the digit images and uses Isomap to reduce the 64-dimensional data to two dimensions. For comparison, the code plots the raw data using only its first two features alongside the 2D Isomap representation. Each point in the scatter plots represents one digit image, and the color indicates the digit's actual value. The visualization shows how Isomap captures the inherent structure of the high-dimensional digit data, preserving relationships and making the digit clusters easy to see in a lower-dimensional space.

Advantages and Disadvantages of Isomap

Advantages

  • Capturing non-linear relationships: Unlike linear dimensionality reduction techniques such as PCA, Isomap can capture the underlying non-linear structure of the data.
  • Global structure: Isomap aims to preserve the global relationships between data points, giving a better representation of the entire manifold.
  • Globally optimal embedding: Given the constructed neighborhood graph on which the geodesic distances are defined, the algorithm finds a globally optimal low-dimensional solution.

Disadvantages

  • Computational cost: For large datasets, computing geodesic distances with an all-pairs shortest-path algorithm such as Floyd-Warshall can be expensive and lead to long run times.
  • Sensitivity to parameter settings: A poor choice of parameters (for example, n_neighbors) can produce a distorted or misleading embedding.
  • Topological complexity: Isomap can struggle with manifolds that contain holes or other topological complexity, which may lead to inaccurate representations.

Applications of Isomap

  • Visualization: High-dimensional data like face images can be visualized in a lower-dimensional space, enabling easier exploration and understanding.
  • Data exploration: Isomap can help identify clusters and patterns within the data that are not readily apparent in the original high-dimensional space.
  • Anomaly detection: Outliers that deviate significantly from the underlying manifold can be identified using Isomap.
  • Machine learning tasks: Isomap can be used as a pre-processing step for other machine learning tasks, such as classification and clustering, improving the performance and interpretability of the models (a brief pipeline sketch follows this list).
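
As a sketch of the pre-processing use case, the snippet below chains Isomap with a simple classifier on the digits dataset; the component count, neighbor count, and choice of classifier are illustrative assumptions rather than tuned settings.

Python

from sklearn.datasets import load_digits
from sklearn.manifold import Isomap
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=42)

# Reduce to a handful of Isomap components, then classify in that space
clf = make_pipeline(Isomap(n_neighbors=30, n_components=10),
                    KNeighborsClassifier(n_neighbors=5))
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))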

Conclusion

Isomap is a valuable tool for studying and analysing high-dimensional data. It captures non-linear relationships and makes complex data structures easier to visualize, which makes it an excellent addition to any data scientist's toolbox. With the ease of scikit-learn, researchers can use Isomap to gain deeper insight into their data and open up new opportunities for machine learning applications.


