
Faces dataset decompositions in Scikit Learn

The Faces dataset is a collection of labeled pictures of people’s faces available through the well-known machine learning library Scikit-Learn. It is commonly used for face recognition, facial expression analysis, and other computer vision applications. The dataset comes from the Labeled Faces in the Wild (LFW) benchmark.

What is Decomposition?

Decomposition is the process of breaking a complicated data matrix down into smaller, easier-to-understand parts. For high-dimensional data such as images, Principal Component Analysis (PCA) is a frequently used decomposition technique. It identifies the principal components, linear combinations of the original features that capture the directions of highest variance in the data.
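
As a quick illustration of the idea, here is a minimal sketch (separate from the faces example, using a small random matrix) that reduces 20 features to 3 principal components:

import numpy as np
from sklearn.decomposition import PCA

# Small synthetic dataset: 100 samples with 20 features each
rng = np.random.RandomState(0)
data = rng.rand(100, 20)

# Keep only the 3 directions of highest variance
pca_demo = PCA(n_components=3)
reduced = pca_demo.fit_transform(data)

print(reduced.shape)                       # (100, 3) - compressed representation
print(pca_demo.components_.shape)          # (3, 20)  - the principal axes
print(pca_demo.explained_variance_ratio_)  # variance captured by each axis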



Concepts related to the topic:

  1. Principal Component Analysis (PCA): A dimensionality reduction technique that finds the principal components of a dataset, the directions along which the data varies the most.
  2. Eigenfaces: In the context of face recognition, the principal components derived by PCA are often referred to as eigenfaces.
  3. Singular Value Decomposition (SVD): Another matrix factorization technique used for dimensionality reduction; PCA is commonly computed via SVD (see the short sketch after this list).
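
For concreteness, here is a minimal NumPy sketch (not part of the faces walkthrough) showing that SVD factors a matrix into three pieces whose product rebuilds the original:

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# Factor A into U (left singular vectors), S (singular values), Vt (right singular vectors)
U, S, Vt = np.linalg.svd(A, full_matrices=False)

print(S)                                    # singular values, largest first
print(np.allclose(A, U @ np.diag(S) @ Vt))  # True: the factors rebuild A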

Implementing Faces Dataset Decompositions

1. Import necessary libraries:




import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import PCA

This step imports the necessary libraries: NumPy for numerical operations, Matplotlib for plotting, and Scikit-Learn for the PCA implementation and access to the Faces dataset.

2. Load the Faces dataset:




faces_data = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

The code uses Scikit-Learn’s fetch_lfw_people function to download the Labeled Faces in the Wild (LFW) dataset. Only people with at least 70 images are kept, and each image is resized to 40% of its original size.
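
Once the download finishes, a few quick prints (an optional check, not part of the original walkthrough) confirm what was fetched:

print(faces_data.images.shape)   # (n_samples, height, width) of the resized images
print(faces_data.data.shape)     # the same images flattened to one row per face
print(faces_data.target_names)   # the people who have at least 70 images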



3. Preprocess the data:




X = faces_data.data
n_samples, n_features = X.shape

In this step, the feature matrix X is extracted from the dataset, and the number of samples (n_samples) and features (n_features) is read from its shape.
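
Since each row of X is a flattened grayscale image, n_features equals the height times the width of the resized pictures; a short optional sanity check makes that explicit:

h, w = faces_data.images.shape[1], faces_data.images.shape[2]
print(n_samples, "samples,", n_features, "features per face")
assert n_features == h * w   # each feature is one pixel of an h x w image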

4. Apply PCA for decomposition:




n_components = 150
pca = PCA(n_components=n_components, svd_solver='randomized', whiten=True).fit(X)

The code sets the number of PCA components (n_components) to 150 and fits the model to the data with the fit method. For efficiency, it uses the randomized SVD solver, and whiten=True rescales the components to unit variance.
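
To see how much information the 150 components retain, an optional check (explained_variance_ratio_ remains available even with whiten=True) prints the total explained variance:

explained = pca.explained_variance_ratio_
print(f"Variance retained by {n_components} components: {explained.sum():.2%}")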

5. Visualize eigenfaces:




eigenfaces = pca.components_.reshape(
    (n_components, faces_data.images.shape[1], faces_data.images.shape[2]))

In this step, the principal components returned by PCA are reshaped back into image form, producing the eigenfaces. These eigenfaces represent the directions of highest variance in the original face images.

6. Plot the first 10 eigenfaces:




plt.figure(figsize=(10, 3))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(eigenfaces[i], cmap='gray')   # each eigenface is a (height, width) image
    plt.title(f"Eigenface {i + 1}")
plt.show()

Output:

Eigenfaces

The code uses Matplotlib to plot the first ten eigenfaces, visualizing them in a 2×5 grid.

7. Reconstruct faces using a subset of principal components:




n_faces = 5
random_faces_indices = np.random.randint(0, n_samples, n_faces)
random_faces = X[random_faces_indices]

Five faces are chosen at random from the dataset in this section to illustrate the reconstruction procedure.

8. Transform faces into principal components:




faces_pca = pca.transform(random_faces)

Using the previously fitted PCA model, the chosen faces are projected into the space of principal components.
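
Each face is now described by 150 component weights instead of thousands of pixel values, which an optional shape comparison makes visible:

print(random_faces.shape)   # (5, n_features) - raw pixel vectors
print(faces_pca.shape)      # (5, 150)        - compact PCA representation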

9. Reconstruct faces from principal components:




faces_reconstructed = pca.inverse_transform(faces_pca)

The inverse_transform method maps the transformed principal components back to pixel space, producing approximate reconstructions of the original faces.
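
To quantify how faithful the reconstruction is (an optional step beyond the original tutorial), the per-face mean squared error between the originals and the reconstructions can be computed:

# Mean squared pixel error for each of the five reconstructed faces
mse = np.mean((random_faces - faces_reconstructed) ** 2, axis=1)
for idx, err in zip(random_faces_indices, mse):
    print(f"Face {idx}: reconstruction MSE = {err:.2f}")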

10. Visualize original and reconstructed faces:




plt.figure(figsize=(10, 3))
for i in range(n_faces):
    # Top row: the original faces
    plt.subplot(2, n_faces, i + 1)
    plt.imshow(random_faces[i].reshape(
        faces_data.images.shape[1], faces_data.images.shape[2]), cmap='gray')
    plt.title("Original")
 
    # Bottom row: the PCA reconstructions
    plt.subplot(2, n_faces, i + 1 + n_faces)
    plt.imshow(faces_reconstructed[i].reshape(
        faces_data.images.shape[1], faces_data.images.shape[2]), cmap='gray')
    plt.title("Reconstructed")
plt.show()

Output:

Faces dataset decompositions

Similarly, we can perform Non-Negative Matrix Factorization (NMF).

Non-Negative Matrix Factorization (NMF)

Non-Negative Matrix Factorization (NMF) is a mathematical technique used in machine learning and data analysis for dimensionality reduction and feature extraction. It is particularly useful when the data involved has non-negative values, such as images, audio spectrograms, or text data represented as term-document matrices.
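
As a minimal toy illustration (separate from the faces pipeline), NMF factors a small non-negative matrix V into two non-negative matrices W and H whose product approximates V:

import numpy as np
from sklearn.decomposition import NMF

# A small non-negative matrix to factor
V = np.random.RandomState(0).rand(6, 4)

model = NMF(n_components=2, init='random', random_state=0, max_iter=500)
W = model.fit_transform(V)   # (6, 2) non-negative activations
H = model.components_        # (2, 4) non-negative basis rows

print(np.round(W @ H, 2))    # approximately reconstructs V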

The following code snippet demonstrates how NMF can be used for facial image decomposition and reconstruction. The visualizations help in understanding the learned facial features and how well the NMF model reconstructs faces from the reduced feature space. Adjusting parameters such as the number of components (n_components) affects the quality of the reconstruction.




from sklearn.decomposition import NMF
nmf = NMF(n_components=n_components, tol=5e-3)
nmf.fit(X)  # fit on the original non-negative pixel data
 
# Visualize
nmf_faces = nmf.components_.reshape(
    (n_components, faces_data.images.shape[1], faces_data.images.shape[2]))
 
# Plot the first 10 faces
plt.figure(figsize=(10, 3))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(nmf_faces[i], cmap='gray')
    plt.title(f"NMF face {i + 1}")
 
plt.show()
 
# Reconstruct faces
n_faces = 5
random_faces_indices = np.random.randint(0, n_samples, n_faces)
random_faces = X[random_faces_indices]
 
# Transform faces
faces_nmf = nmf.transform(random_faces)
 
# Reconstruct faces
faces_reconstructed = nmf.inverse_transform(faces_nmf)
 
# Visualize original and reconstructed faces
plt.figure(figsize=(10, 3))
for i in range(n_faces):
    plt.subplot(2, n_faces, i + 1)
    plt.imshow(random_faces[i].reshape(
        faces_data.images.shape[1], faces_data.images.shape[2]), cmap='gray')
    plt.title("Original")
 
    plt.subplot(2, n_faces, i + 1 + n_faces)
    plt.imshow(faces_reconstructed[i].reshape(
        faces_data.images.shape[1], faces_data.images.shape[2]), cmap='gray')
    plt.title("Reconstructed")
 
plt.show()

Output:

Non-Negative Matrix Factorization (NMF)

Conclusion

The Faces dataset and the eigenfaces decomposition approach in Scikit-Learn provide a practical way to understand and implement facial recognition systems. The workflow consists of loading the dataset, preprocessing the images, and applying PCA to reduce dimensionality so that the resulting eigenfaces can be used for face-related tasks. The sample code demonstrates a basic application of Scikit-Learn’s decomposition features.

