
Faces dataset decompositions in Scikit Learn

The Faces dataset is a collection of labeled pictures of people’s faces available through the well-known machine learning library Scikit-Learn. It is commonly used for face recognition, facial expression analysis, and other computer vision applications. The dataset comes from the Labeled Faces in the Wild (LFW) benchmark.

What is Decomposition?

Decomposition is the process of breaking a complicated data matrix down into smaller, easier-to-understand parts. For high-dimensional data such as images, Principal Component Analysis (PCA) is a frequently used decomposition technique. It identifies the principal components, linear combinations of the original features that capture the directions of highest variance in the data.
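
As a quick illustration of the idea, here is a minimal sketch (separate from the faces example, using a small random matrix) that reduces 20 features to 3 principal components:

import numpy as np
from sklearn.decomposition import PCA

# Small synthetic dataset: 100 samples with 20 features each
rng = np.random.RandomState(0)
data = rng.rand(100, 20)

# Keep only the 3 directions of highest variance
pca_demo = PCA(n_components=3)
reduced = pca_demo.fit_transform(data)

print(reduced.shape)                       # (100, 3) - compressed representation
print(pca_demo.components_.shape)          # (3, 20)  - the principal axes
print(pca_demo.explained_variance_ratio_)  # variance captured by each axis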



Concepts related to the topic:

  1. Principal Component Analysis (PCA): A dimensionality reduction technique that finds the principal components of a dataset, the directions along which the data varies the most.
  2. Eigenfaces: In the context of face recognition, the principal components derived by PCA are often referred to as eigenfaces.
  3. Singular Value Decomposition (SVD): Another matrix factorization technique used for dimensionality reduction; PCA is commonly computed via SVD (see the short sketch after this list).
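
For concreteness, here is a minimal NumPy sketch (not part of the faces walkthrough) showing that SVD factors a matrix into three pieces whose product rebuilds the original:

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# Factor A into U (left singular vectors), S (singular values), Vt (right singular vectors)
U, S, Vt = np.linalg.svd(A, full_matrices=False)

print(S)                                    # singular values, largest first
print(np.allclose(A, U @ np.diag(S) @ Vt))  # True: the factors rebuild A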

Implementing Faces Dataset Decompositions

1. Import necessary libraries:




import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import PCA

This step imports the necessary libraries: NumPy for numerical operations, Matplotlib for plotting, and Scikit-Learn for the PCA implementation and access to the Faces dataset.

2. Load the Faces dataset:




faces_data = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

The code uses Scikit-Learn’s fetch_lfw_people function to download the Labeled Faces in the Wild (LFW) dataset. Only people with at least 70 images are kept, and each image is resized to 40% of its original size.
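
Once the download finishes, a few quick prints (an optional check, not part of the original walkthrough) confirm what was fetched:

print(faces_data.images.shape)   # (n_samples, height, width) of the resized images
print(faces_data.data.shape)     # the same images flattened to one row per face
print(faces_data.target_names)   # the people who have at least 70 images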



3. Preprocess the data:




X = faces_data.data
n_samples, n_features = X.shape

In this step, the feature matrix X is extracted from the dataset, and the number of samples (n_samples) and features (n_features) is read from its shape.
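
Since each row of X is a flattened grayscale image, n_features equals the height times the width of the resized pictures; a short optional sanity check makes that explicit:

h, w = faces_data.images.shape[1], faces_data.images.shape[2]
print(n_samples, "samples,", n_features, "features per face")
assert n_features == h * w   # each feature is one pixel of an h x w image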

4. Apply PCA for decomposition:




n_components = 150
pca = PCA(n_components=n_components, svd_solver='randomized', whiten=True).fit(X)

The code sets the number of PCA components (n_components) to 150 and fits the model to the data with the fit method. For efficiency, it uses the randomized SVD solver, and whiten=True rescales the components to unit variance.
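
To see how much information the 150 components retain, an optional check (explained_variance_ratio_ remains available even with whiten=True) prints the total explained variance:

explained = pca.explained_variance_ratio_
print(f"Variance retained by {n_components} components: {explained.sum():.2%}")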

5. Visualize eigenfaces:




eigenfaces = pca.components_.reshape(
    (n_components, faces_data.images.shape[1], faces_data.images.shape[2]))

In this step, the principal components returned by PCA are reshaped back into image form, producing the eigenfaces. These eigenfaces represent the directions of highest variance in the original face images.

6. Plot the first 10 eigenfaces:




plt.figure(figsize=(10, 3))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(eigenfaces[i], cmap='gray')   # each eigenface is a (height, width) image
    plt.title(f"Eigenface {i + 1}")
plt.show()

Output:

Eigenfaces

The code uses Matplotlib to plot the first ten eigenfaces, visualizing them in a 2×5 grid.

7. Reconstruct faces using a subset of principal components:




n_faces = 5
random_faces_indices = np.random.randint(0, n_samples, n_faces)
random_faces = X[random_faces_indices]

Five faces are chosen at random from the dataset in this section to illustrate the reconstruction procedure.

8. Transform faces into principal components:




faces_pca = pca.transform(random_faces)

Using the previously fitted PCA model, the chosen faces are projected into the space of principal components.
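
Each face is now described by 150 component weights instead of thousands of pixel values, which an optional shape comparison makes visible:

print(random_faces.shape)   # (5, n_features) - raw pixel vectors
print(faces_pca.shape)      # (5, 150)        - compact PCA representation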

9. Reconstruct faces from principal components:




faces_reconstructed = pca.inverse_transform(faces_pca)

The inverse_transform method maps the transformed principal components back to pixel space, producing approximate reconstructions of the original faces.
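
To quantify how faithful the reconstruction is (an optional step beyond the original tutorial), the per-face mean squared error between the originals and the reconstructions can be computed:

# Mean squared pixel error for each of the five reconstructed faces
mse = np.mean((random_faces - faces_reconstructed) ** 2, axis=1)
for idx, err in zip(random_faces_indices, mse):
    print(f"Face {idx}: reconstruction MSE = {err:.2f}")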

10. Visualize original and reconstructed faces:




plt.figure(figsize=(10, 3))
for i in range(n_faces):
    # Top row: the original faces
    plt.subplot(2, n_faces, i + 1)
    plt.imshow(random_faces[i].reshape(
        faces_data.images.shape[1], faces_data.images.shape[2]), cmap='gray')
    plt.title("Original")
 
    # Bottom row: the PCA reconstructions
    plt.subplot(2, n_faces, i + 1 + n_faces)
    plt.imshow(faces_reconstructed[i].reshape(
        faces_data.images.shape[1], faces_data.images.shape[2]), cmap='gray')
    plt.title("Reconstructed")
plt.show()

Output:

Faces dataset decompositions

Similarly, we can perform Non-Negative Matrix Factorization (NMF).

Non-Negative Matrix Factorization (NMF)

Non-Negative Matrix Factorization (NMF) is a mathematical technique used in machine learning and data analysis for dimensionality reduction and feature extraction. It is particularly useful when the data involved has non-negative values, such as images, audio spectrograms, or text data represented as term-document matrices.
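
As a minimal toy illustration (separate from the faces pipeline), NMF factors a small non-negative matrix V into two non-negative matrices W and H whose product approximates V:

import numpy as np
from sklearn.decomposition import NMF

# A small non-negative matrix to factor
V = np.random.RandomState(0).rand(6, 4)

model = NMF(n_components=2, init='random', random_state=0, max_iter=500)
W = model.fit_transform(V)   # (6, 2) non-negative activations
H = model.components_        # (2, 4) non-negative basis rows

print(np.round(W @ H, 2))    # approximately reconstructs V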

The following code snippet demonstrates how NMF can be used for facial image decomposition and reconstruction. The visualizations help in understanding the learned facial features and how well the NMF model reconstructs faces from the reduced feature space. Adjusting parameters such as the number of components (n_components) affects the quality of the reconstruction.




from sklearn.decomposition import NMF
nmf = NMF(n_components=n_components, tol=5e-3)
nmf.fit(X)  # fit on the original non-negative pixel data
 
# Visualize
nmf_faces = nmf.components_.reshape(
    (n_components, faces_data.images.shape[1], faces_data.images.shape[2]))
 
# Plot the first 10 faces
plt.figure(figsize=(10, 3))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(nmf_faces[i], cmap='gray')
    plt.title(f"NMF face {i + 1}")
 
plt.show()
 
# Reconstruct faces
n_faces = 5
random_faces_indices = np.random.randint(0, n_samples, n_faces)
random_faces = X[random_faces_indices]
 
# Transform faces
faces_nmf = nmf.transform(random_faces)
 
# Reconstruct faces
faces_reconstructed = nmf.inverse_transform(faces_nmf)
 
# Visualize original and reconstructed faces
plt.figure(figsize=(10, 3))
for i in range(n_faces):
    plt.subplot(2, n_faces, i + 1)
    plt.imshow(random_faces[i].reshape(
        faces_data.images.shape[1], faces_data.images.shape[2]), cmap='gray')
    plt.title("Original")
 
    plt.subplot(2, n_faces, i + 1 + n_faces)
    plt.imshow(faces_reconstructed[i].reshape(
        faces_data.images.shape[1], faces_data.images.shape[2]), cmap='gray')
    plt.title("Reconstructed")
 
plt.show()

Output:

Non-Negative Matrix Factorization (NMF)

Conclusion

The Faces dataset and the eigenfaces decomposition approach in Scikit-Learn provide a practical way to understand and implement facial recognition systems. The workflow consists of loading the dataset, preprocessing the images, and applying PCA to reduce dimensionality so that the resulting eigenfaces can be used for face-related tasks. The sample code demonstrates a basic application of Scikit-Learn’s decomposition features.

