Fashion MNIST with Python Keras and Deep Learning
Deep learning is a subfield of machine learning related to artificial neural networks. The word deep means bigger neural networks with a lot of hidden units. Deep learning’s CNN’s have proved to be the state-of-the-art technique for image recognition tasks. Keras is a deep learning library in Python which provides an interface for creating an artificial neural network. It is an open-sourced program. It is built on top of Tensorflow.
The prime objective of this article is to implement a CNN to perform image classification on the famous fashion MNIST dataset. In this, we will be implementing our own CNN architecture. The process will be divided into three steps: data analysis, model training, and prediction.
First, let’s include all the required libraries
In the data analysis, we will see the number of images available, the dimensions of each image, etc. We will then split the data into training and testing.
The fashion MNIST dataset consists of 60,000 images for the training set and 10,000 images for the testing set. Each image is a 28 x 28 size grayscale image categorized into ten different classes.
Each image has a label associated with it. There are, in total, ten labels available, and they are:
- Ankle boot
Now we will see some of the sample images from the fashion MNIST dataset. For this, we will use the library matplotlib to show our np array data in the form of plots of images.
We will add an empty color dimension to the dataset. Now the dimensions of the images will be 28 x 28 x 1, so now the images have become three-channel images.
Convolutional Neural Networks (CNN)
Convolutional Neural Network(CNN) is a subclass of an artificial neural network(ANN) which is mostly used for image-related applications. The input for a CNN is an image, and there are different operations performed on that image to extract its important features of it and then decide the weights to give the correct output. These features are learned using filters. Filters help to detect certain image properties such as horizontal lines, vertical lines, edges, corners, etc. As we go deep into the network, the network learns to defect complex features such as objects, face, background, foreground, etc.
CNNs have three main types of layers:
- Convolutional Layer: This layer is the main layer of CNN. When an image is fed into the convolution layer, a filter or a kernel of varying size but generally of size 3×3 is used to detect the features. The dot product is carried out with the image, and the kernel is the output is stored in a cell of a matrix which is called a feature map or an activation map. Once the operation is done, the filter moves by a distance and then repeats the process. This distance is called a stride. After each convolution operation, a ReLu transformation is applied to the feature map to introduce non-linearity into the model.
- Pooling Layer: This layer is responsible for reducing the number of parameters in the next layer. It is also known as downsampling or dimensionality reduction.
- Fully Connected Layer: Neurons in this layer have full connectivity to all the neurons in the preceding layer and the succeeding layer. FC layer helps to map the input with the output.
We will create a straightforward CNN architecture with three convolutional layers followed by three max-pooling layers for this dataset. Convolutional layers will perform the convolutional operation and extract the features, while the max-pooling layer will downsample the features.
Once the model architecture is defined, we will compile and build the model.
We use Adam optimizers in most CNN architectures because it is very efficient on larger problems and helps us achieve correct weights and learning rates with minimum loss. The summary of the model is as follows.
Once all the model parameters are set, the model is ready to be trained. We will train the model for ten epochs, with each epoch having 100 steps.
Let us save the model.
In this section, we will plot some graphs related to accuracy and loss to evaluate model performance. First, we will see the accuracy and plot the loss.
To make the predictions call the predict() function on the model and pass the image into it. To perform the prediction, we will first create a list of labels in order of the corresponding output layer of the CNN. The predict() function will return the list of values of probabilities that the current input belongs probably belongs to which class. Then by using the argmax(), we will find the highest value and then output the correct label.
Hence we have successfully performed image classification on the fashion MNIST dataset.