Machine Learning for recognizing handwritten digits (MNIST dataset) | Set 1

This article demonstrates the basic workflow of recognizing handwritten digits. After loading the MNIST data-set of images of handwritten digits, we define and optimize a simple mathematical model in TensorFlow. The results are then plotted and discussed.

This was developed using Python 3.5.2 (Anaconda) and TensorFlow version 0.12.0-rc1.


import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
from sklearn.metrics import confusion_matrix
    Input : tf.__version__
    Output : '0.12.0-rc1'

Loading Data

from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets("data/MNIST/", one_hot=True)
Extracting data/MNIST/train-images-idx3-ubyte.gz
Extracting data/MNIST/train-labels-idx1-ubyte.gz
Extracting data/MNIST/t10k-images-idx3-ubyte.gz
Extracting data/MNIST/t10k-labels-idx1-ubyte.gz

The MNIST data-set has now been loaded. It consists of 70,000 images and associated labels (i.e. classifications of the images). The data-set is split into 3 mutually exclusive sub-sets. We will only use the training and test-sets in this tutorial.

print("Size of:")
print("- Training-set:\t\t{}".format(len(data.train.labels)))
print("- Test-set:\t\t{}".format(len(data.test.labels)))
print("- Validation-set:\t{}".format(len(data.validation.labels)))
    Size of:
    - Training-set:        55000
    - Test-set:        10000
    - Validation-set:    5000

One-Hot Encoding
The data-set has been loaded with so-called One-Hot encoding. This means the labels have been converted from a single number to a vector whose length equals the number of possible classes. All elements of the vector are zero except for the $i$th element, which is one and means the class is $i$. For example, the One-Hot encoded labels for the first 5 images in the test-set are:

data.test.labels[0:5, :]
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.]])

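We also need the classes as single numbers. These are obtained by taking the index of the largest element of each One-Hot encoded vector, and the later code expects them in data.test.cls. The classes for the first five images in the test-set are shown below. Compare these to the One-Hot encoded vectors above. For example, the class for the first image is 7, which corresponds to a One-Hot encoded vector where all elements are zero except for the element with index 7.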

    array([7, 2, 1, 0, 4])
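
The conversion from One-Hot vectors to class numbers can be sketched in plain Python as follows. This is only an illustration: the names one_hot_labels and one_hot_to_class are made up here, and with NumPy the same result would come from np.argmax(labels, axis=1).

```python
# The five One-Hot encoded test labels shown above, as plain Python lists.
one_hot_labels = [
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
]


def one_hot_to_class(vector):
    """Return the index of the largest element (hypothetical helper)."""
    return max(range(len(vector)), key=lambda i: vector[i])


classes = [one_hot_to_class(v) for v in one_hot_labels]
print(classes)  # [7, 2, 1, 0, 4]
```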

Data dimensions
The data dimensions are used in several places in the source-code below. In computer programming it is generally best to use variables and constants rather than having to hard-code specific numbers every time that number is used. This means the numbers only have to be changed in one single place. Ideally these would be inferred from the data that has been read, but here we just write the numbers.

    # We know that MNIST images are 28 pixels in each dimension.
    img_size = 28
    # Images are stored in one-dimensional arrays of this length.
    img_size_flat = img_size * img_size
    # Tuple with height and width of images used to reshape arrays.
    img_shape = (img_size, img_size)
    # Number of classes, one class for each of 10 digits.
    num_classes = 10
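
As noted above, these numbers could instead be inferred from the data. A rough sketch of that idea, assuming each image is a flat sequence of pixels from a square image and each label is a One-Hot vector; infer_dimensions is a hypothetical helper and not part of the tutorial code.

```python
import math


def infer_dimensions(flat_image, one_hot_label):
    """Infer image and class dimensions from one example (hypothetical helper)."""
    img_size_flat = len(flat_image)
    img_size = int(round(math.sqrt(img_size_flat)))
    # Only square images can be handled this way.
    if img_size * img_size != img_size_flat:
        raise ValueError("image is not square")
    return {
        "img_size": img_size,
        "img_size_flat": img_size_flat,
        "img_shape": (img_size, img_size),
        "num_classes": len(one_hot_label),
    }


# Dummy example with the same sizes as one MNIST image and label.
dims = infer_dimensions([0.0] * 784, [0.0] * 10)
print(dims)
```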

Helper-function for plotting images
This function plots 9 images in a 3×3 grid and writes the true and predicted classes below each image.

    def plot_images(images, cls_true, cls_pred=None):
        assert len(images) == len(cls_true) == 9
        # Create figure with 3x3 sub-plots.
        fig, axes = plt.subplots(3, 3)
        fig.subplots_adjust(hspace=0.3, wspace=0.3)
        for i, ax in enumerate(axes.flat):
            # Plot image.
            ax.imshow(images[i].reshape(img_shape), cmap='binary')
            # Show true and predicted classes.
            if cls_pred is None:
                xlabel = "True: {0}".format(cls_true[i])
            else:
                xlabel = "True: {0}, Pred: {1}".format(cls_true[i], cls_pred[i])
            ax.set_xlabel(xlabel)
            # Remove ticks from the plot.
            ax.set_xticks([])
            ax.set_yticks([])
        plt.show()

Plot a few images to see if data is correct

# Get the first images from the test-set.
images = data.test.images[0:9]

# Get the true classes for those images.
cls_true = data.test.cls[0:9]

# Plot the images and labels using our helper-function above.
plot_images(images=images, cls_true=cls_true)

Placeholder variables

    x = tf.placeholder(tf.float32, [None, img_size_flat])
    y_true = tf.placeholder(tf.float32, [None, num_classes])
    y_true_cls = tf.placeholder(tf.int64, [None])

Variables to be optimized
Apart from the placeholder variables defined above, which feed input data into the model, there are also model variables that TensorFlow must change so that the model performs better on the training data.

    weights = tf.Variable(tf.zeros([img_size_flat, num_classes]))
    biases = tf.Variable(tf.zeros([num_classes]))

This simple mathematical model multiplies the images in the placeholder variable x with the weights and then adds the biases.

    logits = tf.matmul(x, weights) + biases
    # Now logits is a matrix with num_images rows and num_classes columns,
    # where the element of the ith row and jth column is an estimate
    # of how likely the ith input image is to be of the jth class.
    y_pred = tf.nn.softmax(logits)
    # The predicted class can be calculated from the
    # y_pred matrix by taking the index of the largest
    # element in each row.
    y_pred_cls = tf.argmax(y_pred, dimension=1)
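
To make the three steps above concrete, here is a small sketch in plain Python of what the logits, softmax and argmax compute for a single image. The 3-pixel "image", the 2 classes, all numbers and the helper names are invented for illustration; TensorFlow performs the same arithmetic on whole batches.

```python
import math


def linear_logits(x, weights, biases):
    """Compute x * weights + biases for one image (a row vector)."""
    num_classes = len(biases)
    return [sum(x[p] * weights[p][c] for p in range(len(x))) + biases[c]
            for c in range(num_classes)]


def softmax(logits):
    """Turn logits into probabilities that sum to one."""
    largest = max(logits)
    # Subtracting the largest logit avoids overflow in exp().
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: one "image" with 3 pixels and 2 possible classes.
x = [1.0, 0.0, 2.0]
weights = [[0.5, -0.5],
           [0.1, 0.3],
           [-1.0, 1.0]]
biases = [0.0, 0.1]

logits = linear_logits(x, weights, biases)   # one score per class
y_pred = softmax(logits)                     # scores as probabilities
y_pred_cls = y_pred.index(max(y_pred))       # index of the largest one
print(logits, y_pred, y_pred_cls)
```

Here the second class receives the larger logit, so the predicted class is 1.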

Cost-function to be optimized
TensorFlow has a built-in function for calculating the cross-entropy. Note that it uses the values of the logits because it also calculates the softmax internally.

    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits,
                                                            labels=y_true)
    cost = tf.reduce_mean(cross_entropy)
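
For a One-Hot label, the cross-entropy is the negative logarithm of the probability that the softmax assigns to the true class. The following plain-Python sketch illustrates this with made-up logits; the helper name is hypothetical and this is not TensorFlow's implementation.

```python
import math


def softmax_cross_entropy(logits, one_hot_label):
    """Cross-entropy of softmax(logits) against a One-Hot label."""
    # log(softmax) computed with the log-sum-exp trick for stability.
    largest = max(logits)
    log_sum_exp = largest + math.log(sum(math.exp(v - largest) for v in logits))
    log_probs = [v - log_sum_exp for v in logits]
    return -sum(t * lp for t, lp in zip(one_hot_label, log_probs))


label = [0.0, 0.0, 1.0]  # the true class is 2
confident_right = softmax_cross_entropy([0.0, 0.0, 5.0], label)
uniform = softmax_cross_entropy([0.0, 0.0, 0.0], label)
confident_wrong = softmax_cross_entropy([5.0, 0.0, 0.0], label)

# The cost is the average over the batch, as with tf.reduce_mean.
cost = (confident_right + uniform + confident_wrong) / 3
print(confident_right, uniform, confident_wrong, cost)
```

The value is close to zero when the model is confident and right, equals log(3) for a uniform guess over 3 classes, and grows large when the model is confident and wrong.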

Optimization Method
Note that optimization is not performed at this point. In fact, nothing is calculated at all; we just add the optimizer-object to the TensorFlow graph for later execution.

    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(cost)
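
What gradient descent does in each iteration is move every variable a small step against the gradient of the cost. A minimal sketch on a one-variable toy cost, unrelated to MNIST: the function and variable names are assumptions for illustration, and a learning rate of 0.1 is used instead of 0.5 so that the gradual convergence is visible.

```python
def gradient_descent(grad, start, learning_rate, num_iterations):
    """Repeatedly step against the gradient and return the final value."""
    w = start
    for _ in range(num_iterations):
        w = w - learning_rate * grad(w)
    return w


# Toy cost (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_final = gradient_descent(grad=lambda w: 2.0 * (w - 3.0),
                           start=0.0,
                           learning_rate=0.1,
                           num_iterations=20)
print(w_final)
```

Each step shrinks the distance to the minimum by a factor of 0.8, so after 20 steps the value is close to 3.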

Performance Measure
We need a few more performance measures to display the progress to the user.

    # This is a vector of booleans telling whether the predicted class
    # equals the true class of each image.
    correct_prediction = tf.equal(y_pred_cls, y_true_cls)
    # Cast the booleans to floats (False becomes 0, True becomes 1)
    # and calculate the average of these numbers.
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
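
The same accuracy calculation can be sketched in plain Python. The predictions below are made up (the last one is deliberately wrong) and the function name is only illustrative.

```python
def accuracy(cls_pred, cls_true):
    """Fraction of predictions that equal the true class."""
    correct = [p == t for p, t in zip(cls_pred, cls_true)]
    # True counts as 1 and False as 0, like casting booleans to floats.
    return sum(correct) / len(correct)


# Made-up predictions compared with the five test classes shown earlier.
acc = accuracy(cls_pred=[7, 2, 1, 0, 9], cls_true=[7, 2, 1, 0, 4])
print("Accuracy: {0:.1%}".format(acc))  # Accuracy: 80.0%
```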

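Running Our TensorFlow Program
Create TensorFlow Session

    # TensorFlow Session
    session = tf.Session()
    # Initialize the variables for weights and biases before use.
    session.run(tf.global_variables_initializer())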

Helper Functions

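    # Number of training images used in each optimization iteration.
    # The value is not given in the text; 100 is an assumption.
    batch_size = 100

    # Optimizer
    def optimize(num_iterations):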
        for i in range(num_iterations):
            # Get a batch of training examples.
            # x_batch now holds a batch of images and
            # y_true_batch are the true labels for those images.
            x_batch, y_true_batch = data.train.next_batch(batch_size)
            # Put the batch into a dict with the proper names
            # for placeholder variables in the TensorFlow graph.
            # Note that the placeholder for y_true_cls is not set
            # because it is not used during training.
            feed_dict_train = {x: x_batch,
                               y_true: y_true_batch}
            # Run the optimizer using this batch of training data.
            # TensorFlow assigns the variables in feed_dict_train
            # to the placeholder variables and then runs the optimizer.
            session.run(optimizer, feed_dict=feed_dict_train)
    # for performance
    feed_dict_test = {x: data.test.images,
                      y_true: data.test.labels,
                      y_true_cls: data.test.cls}
    # for accuracy
    def print_accuracy():
        # Use TensorFlow to compute the accuracy.
        acc = session.run(accuracy, feed_dict=feed_dict_test)
        # Print the accuracy.
        print("Accuracy on test-set: {0:.1%}".format(acc))
    # printing confusion matrix using scikit-learn
    def print_confusion_matrix():
        # Get the true classifications for the test-set.
        cls_true = data.test.cls
        # Get the predicted classifications for the test-set.
        cls_pred = session.run(y_pred_cls, feed_dict=feed_dict_test)
        # Get the confusion matrix using sklearn.
        cm = confusion_matrix(y_true=cls_true,
                              y_pred=cls_pred)
        # Print the confusion matrix as text.
        print(cm)
        # Plot the confusion matrix as an image.
        plt.imshow(cm, interpolation='nearest')
        # Make various adjustments to the plot.
        plt.colorbar()
        tick_marks = np.arange(num_classes)
        plt.xticks(tick_marks, range(num_classes))
        plt.yticks(tick_marks, range(num_classes))
        plt.xlabel('Predicted')
        plt.ylabel('True')
        plt.show()
    # plotting model weights
    def plot_weights():
        # Get the values for the weights from the TensorFlow variable.
        w = session.run(weights)
        # Get the lowest and highest values for the weights.
        # This is used to correct the colour intensity across
        # the images so they can be compared with each other.
        w_min = np.min(w)
        w_max = np.max(w)
        # Create figure with 3x4 sub-plots,
        # where the last 2 sub-plots are unused.
        fig, axes = plt.subplots(3, 4)
        fig.subplots_adjust(hspace=0.3, wspace=0.3)
        for i, ax in enumerate(axes.flat):
            # Only use the weights for the first 10 sub-plots.
            if i<10:
                # Get the weights for the i'th digit and reshape it.
                # Note that w.shape == (img_size_flat, 10)
                image = w[:, i].reshape(img_shape)
                # Set the label for the sub-plot.
                ax.set_xlabel("Weights: {0}".format(i))
                # Plot the image.
                ax.imshow(image, vmin=w_min, vmax=w_max, cmap='seismic')
            # Remove ticks from each sub-plot.
            ax.set_xticks([])
            ax.set_yticks([])
        plt.show()
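
To clarify what the confusion matrix contains, here is a plain-Python sketch that counts the same thing for a tiny made-up example with 3 classes. The function name and the data are hypothetical; rows are true classes and columns are predicted classes, as in scikit-learn.

```python
def confusion_matrix_counts(cls_true, cls_pred, num_classes):
    """Count how often each true class (row) got each prediction (column)."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(cls_true, cls_pred):
        cm[t][p] += 1
    return cm


cm = confusion_matrix_counts(cls_true=[0, 1, 2, 2, 1],
                             cls_pred=[0, 2, 2, 2, 1],
                             num_classes=3)
for row in cm:
    print(row)
```

Correct predictions end up on the diagonal; the single off-diagonal count in row 1, column 2 is the one image of class 1 that was predicted as class 2.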

Performance before any optimization
The accuracy on the test-set is 9.8%. This is because the model has only been initialized and not optimized at all. All weights and biases start at zero, so every class receives the same score and the tie is resolved in favour of the first class. The model therefore always predicts that the image shows a zero digit, as demonstrated in the plot below, and it turns out that 9.8% of the images in the test-set happen to be zero digits.

Accuracy on test-set: 9.8%

Performance after 1 optimization iteration
After a single optimization iteration, the model has increased its accuracy on the test-set from 9.8% to 40.7%.

    Accuracy on test-set: 40.7%

Performance after 10 optimization iterations

#plotting weights
Accuracy on test-set: 78.2%

After 1000 iterations the accuracy reaches 91.7%, a large improvement over the 9.8% of the untrained model.

This article is contributed by Shubham Singh.