Sudoku Solver using TensorFlow

Last Updated : 14 Dec, 2023

The goal of the project is to build a Sudoku solver that completes Sudoku puzzles autonomously using TensorFlow, Google's open-source machine learning library. The model learns patterns and relationships within incomplete grids so that the solver can predict the missing numbers and ultimately provide a solution.

This article walks through the data pipeline, the architecture of the Sudoku solver model, and the training process using TensorFlow. The project offers a fascinating junction of logic and technology, whether you are a Sudoku enthusiast, a machine learning enthusiast, or both. Let’s take on the challenge of creating a Sudoku solver with TensorFlow.

What is Sudoku?

Sudoku is a classic puzzle that has captured the minds of millions worldwide. It’s not only an excellent way to challenge your logical thinking and problem-solving skills but also serves as an intriguing subject for machine learning and artificial intelligence.

Sudoku is a 9×9 grid puzzle with numbers filled in some cells, leaving others empty. The objective is to fill in the empty cells so that each row, column, and 3×3 sub-grid contains all the digits from 1 to 9 without repetition. Sudoku is traditionally solved with backtracking algorithms; here we instead train a convolutional neural network with TensorFlow to predict the digit for each cell and fill the grid one cell at a time.
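The row, column, and sub-grid rule can be checked mechanically. The following is a minimal sketch and not part of the original project; the function name is_valid_solution and the pattern-generated example grid are assumptions made for illustration.

```python
def is_valid_solution(grid):
    """Return True if a 9x9 grid (list of 9 lists of ints) obeys the Sudoku rules."""
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(grid[r][c] for r in range(9)) for c in range(9)]
    boxes = [
        set(grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3))
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Every row, column and 3x3 box must contain exactly the digits 1..9
    return all(unit == digits for unit in rows + cols + boxes)


# A completed grid built from a simple shifting pattern (illustrative only)
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
print(is_valid_solution(solved))
```

A check like this is useful later on, because a neural network's output is not guaranteed to satisfy these constraints.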

Let’s build a Sudoku solver.

Importing required libraries

The implementation requires the following libraries:

NumPy
Pandas
Keras, a high-level API for TensorFlow

Python3

import numpy as np
import pandas as pd
import keras
import keras.backend as K
from keras.optimizers import Adam
from keras.models import Sequential
from keras.utils import Sequence
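# Import only the layers the model uses instead of a wildcard import
from keras.layers import Conv2D, BatchNormalization, Flatten, Dense, Reshape, Activation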

Loading data

The following code loads the CSV file and builds a DataFrame with columns “quizzes” and “solutions” from the “puzzle” and “solution” columns in the dataset. The “puzzle” column is assumed to contain the initial Sudoku puzzle configuration, with 0 marking an empty cell, and the “solution” column the correct solution.

Dataset: 9 million Sudoku Puzzles and Solutions

Python3

data = pd.read_csv("/content/sudoku.csv")
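# Rename the dataset's columns to the names used in the rest of the code.
# If the file has no "puzzle"/"solution" columns, the DataFrame is left unchanged.
if "puzzle" in data.columns and "solution" in data.columns:
    data = pd.DataFrame({"quizzes": data["puzzle"], "solutions": data["solution"]})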

Define a Data generator

The following code defines a custom data generator class (DataGenerator) that inherits from Keras’s Sequence class. This class is used to generate batches of data for training the neural network.

Python3

class DataGenerator(Sequence):
    def __init__(self, df, batch_size=16, subset="train", shuffle=False, info=None):
        super().__init__()
        self.df = df
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.subset = subset
        # Create a fresh dict per instance instead of sharing a mutable default
        self.info = {} if info is None else info

        self.on_epoch_end()

    def __len__(self):
        # Number of full batches; a trailing partial batch is dropped
        return len(self.df) // self.batch_size

    def on_epoch_end(self):
        self.indexes = np.arange(len(self.df))
        if self.shuffle:
            np.random.shuffle(self.indexes)

    def __getitem__(self, index):
        X = np.empty((self.batch_size, 9, 9, 1))
        y = np.empty((self.batch_size, 81, 1))
        indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        for i, f in enumerate(self.df['quizzes'].iloc[indexes]):
            self.info[index*self.batch_size+i] = f
            # Scale the digits 0..9 to the range [-0.5, 0.5]
            X[i,] = (np.array(list(map(int, list(f)))).reshape((9, 9, 1))/9) - 0.5
        if self.subset == 'train':
            for i, f in enumerate(self.df['solutions'].iloc[indexes]):
                self.info[index*self.batch_size+i] = f
                # Shift the digits 1..9 to class labels 0..8
                y[i,] = np.array(list(map(int, list(f)))).reshape((81, 1)) - 1
            return X, y
        return X

In the code snippet,

Initialization (__init__):
- df: DataFrame containing “quizzes” and “solutions” columns.
- batch_size: Number of samples in each batch.
- subset: “train” makes each batch return inputs and targets; any other value returns inputs only.
- shuffle: Whether to shuffle the data.
- info: Dictionary to store information (optional).
__len__ Method:
- Returns the number of batches in the dataset.
on_epoch_end Method:
- Shuffles the indexes at the end of each epoch if shuffle is set to True.
__getitem__ Method:
- Generates one batch of data.
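- Normalizes the input Sudoku puzzles by dividing each digit by 9 and subtracting 0.5, which gives values between -0.5 and 0.5.
- For the training subset, also prepares the target solutions, shifting the digits 1 to 9 to the class labels 0 to 8.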
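The encoding performed inside __getitem__ can be tried on a single puzzle. This is a small sketch under assumptions: the helper names encode_puzzle and encode_solution are hypothetical, and the example strings are generated from a pattern rather than taken from the dataset.

```python
import numpy as np


def encode_puzzle(quiz):
    """Map an 81-character puzzle string to a (9, 9, 1) array in [-0.5, 0.5]."""
    digits = np.array([int(ch) for ch in quiz], dtype=np.float64)
    return digits.reshape((9, 9, 1)) / 9 - 0.5


def encode_solution(solution):
    """Map an 81-character solution string to (81, 1) class labels 0..8."""
    return np.array([int(ch) for ch in solution]).reshape((81, 1)) - 1


# Illustrative data: a pattern-generated solution with every other cell blanked
solution = "".join(
    str((3 * (r % 3) + r // 3 + c) % 9 + 1) for r in range(9) for c in range(9)
)
quiz = "".join(ch if i % 2 == 0 else "0" for i, ch in enumerate(solution))

X = encode_puzzle(quiz)
y = encode_solution(solution)
print(X.shape, X.min(), X.max())
print(y.shape, y.min(), y.max())
```

Empty cells (0) map to -0.5 and the digit 9 maps to 0.5. The labels start at 0 because sparse categorical crossentropy expects class indices starting from 0.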

Building the Neural Network

The following code snippet,

Creates a Sequential model in Keras.
Adds Convolutional Neural Network (CNN) layers to the model.
Compiles the model using the Adam optimizer with a specified learning rate and sparse categorical crossentropy loss.

Python3

model = Sequential()
 
model.add(Conv2D(64, kernel_size=(3,3), activation='relu', padding='same', input_shape=(9,9,1)))
model.add(BatchNormalization())
model.add(Conv2D(64, kernel_size=(3,3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(128, kernel_size=(1,1), activation='relu', padding='same'))
 
model.add(Flatten())
model.add(Dense(81*9))
model.add(Reshape((-1, 9)))
model.add(Activation('softmax'))
 
model.compile(loss='sparse_categorical_crossentropy',  optimizer=keras.optimizers.Adam(learning_rate=0.001), metrics=['accuracy'])
model.summary()

Output:

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 9, 9, 64)          640       
                                                                 
 batch_normalization (Batch  (None, 9, 9, 64)          256       
 Normalization)                                                  
                                                                 
 conv2d_1 (Conv2D)           (None, 9, 9, 64)          36928     
                                                                 
 batch_normalization_1 (Bat  (None, 9, 9, 64)          256       
 chNormalization)                                                
                                                                 
 conv2d_2 (Conv2D)           (None, 9, 9, 128)         8320      
                                                                 
 flatten (Flatten)           (None, 10368)             0         
                                                                 
 dense (Dense)               (None, 729)               7559001   
                                                                 
 reshape (Reshape)           (None, 81, 9)             0         
                                                                 
 activation (Activation)     (None, 81, 9)             0         
                                                                 
=================================================================
Total params: 7605401 (29.01 MB)
Trainable params: 7605145 (29.01 MB)
Non-trainable params: 256 (1.00 KB)
_________________________________________________________________

In the above code snippet,

The model architecture is defined with
- Three convolutional layers with ReLU activation; the first two are each followed by batch normalization.
- The output is flattened and passed through a Dense layer with 81×9 = 729 units.
- The result is reshaped to (81, 9), giving 9 class scores for each of the 81 cells.
- Softmax activation is applied to predict a probability distribution over the digits for each cell.
The model is compiled with
- Loss: Sparse categorical crossentropy.
- Optimizer: Adam with a learning rate of 0.001.
- Metrics: Accuracy.
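The parameter counts in the model summary can be reproduced by hand. The sketch below uses the standard formulas for convolutional, dense, and batch normalization layers; the helper names are hypothetical and exist only for this illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One weight per kernel position and input channel, plus one bias, per filter
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_features, units):
    # One weight per input feature, plus one bias, per unit
    return (in_features + 1) * units


def batchnorm_params(channels):
    # gamma and beta are trainable; moving mean and variance are not
    return 4 * channels


counts = [
    conv2d_params(3, 3, 1, 64),         # conv2d
    batchnorm_params(64),               # batch_normalization
    conv2d_params(3, 3, 64, 64),        # conv2d_1
    batchnorm_params(64),               # batch_normalization_1
    conv2d_params(1, 1, 64, 128),       # conv2d_2
    dense_params(9 * 9 * 128, 81 * 9),  # dense
]
total = sum(counts)
non_trainable = 2 * (2 * 64)  # moving statistics of the two batch norm layers
print(counts, total, total - non_trainable)
```

Almost all of the parameters sit in the Dense layer, because it connects all 10,368 flattened features to all 729 outputs.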
The model is trained with the fit method, which accepts Sequence objects directly (the older fit_generator method is deprecated in TensorFlow 2.x). training_generator and validation_generator are instances of the DataGenerator class. Training runs for 5 epochs with the callbacks defined below.

Python3

# Shuffle the rows, then hold out the last 5% for validation
train_idx = int(len(data)*0.95)
data = data.sample(frac=1).reset_index(drop=True)
training_generator = DataGenerator(data.iloc[:train_idx], subset="train", batch_size=640)
# subset="train" is used here too, so that the generator also yields the targets
validation_generator = DataGenerator(data.iloc[train_idx:], subset="train", batch_size=640)

from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
filepath1 = "weights-improvement-{epoch:02d}-{val_accuracy:.2f}.hdf5"
filepath2 = "best_weights.hdf5"
checkpoint1 = ModelCheckpoint(filepath1, monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')
checkpoint2 = ModelCheckpoint(filepath2, monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')

reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    patience=3,
    verbose=1,
    min_lr=1e-6
)
callbacks_list = [checkpoint1, checkpoint2, reduce_lr]

history = model.fit(training_generator, validation_data=validation_generator, epochs=5, verbose=1, callbacks=callbacks_list)

Output:

Epoch 1/5
1485/1485 [==============================] - 140s 95ms/step - loss: 0.7082 - accuracy: 0.7124 - val_loss: 0.4045 - val_accuracy: 0.8139

Epoch 00001: val_accuracy improved from -inf to 0.81389, saving model to weights-improvement-01-0.81.hdf5

Epoch 00001: val_accuracy improved from -inf to 0.81389, saving model to best_weights.hdf5
Epoch 2/5
1485/1485 [==============================] - 133s 90ms/step - loss: 0.3917 - accuracy: 0.8190 - val_loss: 0.3809 - val_accuracy: 0.8201

Model Training Callbacks:

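The list callbacks_list = [checkpoint1, checkpoint2, reduce_lr] contains three callbacks:

checkpoint1 (filepath1): Saves the model each time the validation accuracy improves, to a new file whose name contains the epoch number and the validation accuracy.
checkpoint2 (filepath2): Saves the model with the best validation accuracy so far, overwriting best_weights.hdf5 each time.
ReduceLROnPlateau Callback: Reduces the learning rate if the validation loss has not improved for 3 epochs (patience=3), down to a minimum of 1e-6.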
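To see how the learning rate schedule behaves, the plateau rule can be simulated without training anything. This is a simplified sketch, not the Keras implementation: it ignores cooldown and mode handling, the function name simulate_reduce_lr is hypothetical, and the validation losses are made up. The values factor=0.1 and min_delta=1e-4 are the Keras defaults, which the training code above does not override.

```python
def simulate_reduce_lr(val_losses, lr=1e-3, factor=0.1, patience=3,
                       min_lr=1e-6, min_delta=1e-4):
    """Return the learning rate in effect after each epoch's plateau check."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:
            # Validation loss improved: remember it and reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs: shrink the rate
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Made-up validation losses for six epochs
val_losses = [0.40, 0.38, 0.39, 0.385, 0.381, 0.37]
print(simulate_reduce_lr(val_losses))
```

With patience=3 and only 5 training epochs, a reduction needs three non-improving epochs in a row, so it can happen at most once during the run shown above.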

Loading the best model weights

Python3

model.load_weights('/content/best_weights.hdf5')

Sudoku solver function:

Python3

def solve_sudoku_with_nn(model, puzzle):
    # Preprocess the input Sudoku puzzle
    puzzle = puzzle.replace('\n', '').replace(' ', '')
    initial_board = np.array([int(j) for j in puzzle]).reshape((9, 9, 1))
    initial_board = (initial_board / 9) - 0.5
 
    while True:
        # Use the neural network to predict values for empty cells
        predictions = model.predict(initial_board.reshape((1, 9, 9, 1))).squeeze()
        pred = np.argmax(predictions, axis=1).reshape((9, 9)) + 1
        prob = np.around(np.max(predictions, axis=1).reshape((9, 9)), 2)
 
        initial_board = ((initial_board + 0.5) * 9).reshape((9, 9))
        mask = (initial_board == 0)
 
        if mask.sum() == 0:
            # Puzzle is solved
            break
 
        prob_new = prob * mask
 
        ind = np.argmax(prob_new)
        x, y = (ind // 9), (ind % 9)
 
        val = pred[x][y]
        initial_board[x][y] = val
        initial_board = (initial_board / 9) - 0.5
 
    # Convert the solved puzzle back to a string representation
    solved_puzzle = ''.join(map(str, initial_board.flatten().astype(int)))
 
    return solved_puzzle

In the above code snippet,

def solve_sudoku_with_nn(model, puzzle):: Defines a function named solve_sudoku_with_nn that takes a neural network model (model) and a string representation of a Sudoku puzzle (puzzle) as input.
puzzle = puzzle.replace('\n', '').replace(' ', ''): Removes newline characters and spaces from the input puzzle string.
initial_board = np.array([int(j) for j in puzzle]).reshape((9, 9, 1)): Converts the string to a NumPy array of integers and reshapes it to a 3D array representing the Sudoku grid.
initial_board = (initial_board / 9) - 0.5: Scales the values in the array to be between -0.5 and 0.5.

Solving the Puzzle with the Neural Network:

while True:: Starts a loop that runs until the board has no empty cells left.
predictions = model.predict(initial_board.reshape((1, 9, 9, 1))).squeeze(): Uses the neural network to predict values for empty cells in the Sudoku puzzle.
pred = np.argmax(predictions, axis=1).reshape((9, 9)) + 1: Extracts the most probable digit predictions and reshapes them into a 9×9 grid.
prob = np.around(np.max(predictions, axis=1).reshape((9, 9)), 2): Extracts the maximum probability for each prediction and reshapes it into a 9×9 grid.

Updating the Sudoku Grid:

initial_board = ((initial_board + 0.5) * 9).reshape((9, 9)): Rescales the Sudoku grid to the original range (0 to 9).
mask = (initial_board == 0): Creates a mask for identifying empty cells in the Sudoku grid.

Checking for Completion:

if mask.sum() == 0:: Checks if there are no more empty cells in the Sudoku grid, indicating the puzzle is solved.
break: Breaks out of the loop if the puzzle is solved.

Selecting the Next Cell:

prob_new = prob * mask: Applies the mask to the probabilities to consider only empty cells.
ind = np.argmax(prob_new): Finds the index of the maximum probability among the empty cells.
x, y = (ind // 9), (ind % 9): Converts the 1D index to 2D coordinates.
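The selection step can be tried in isolation with made-up probabilities in place of a trained model. In this sketch the random predictions array and the partially filled board are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the network output: one distribution over 9 digits per cell
logits = rng.normal(size=(81, 9))
predictions = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Stand-in board: the first row is already filled, everything else is empty
board = np.zeros((9, 9), dtype=int)
board[0, :] = np.arange(1, 10)

pred = np.argmax(predictions, axis=1).reshape((9, 9)) + 1  # most likely digit
prob = np.max(predictions, axis=1).reshape((9, 9))         # its probability
mask = (board == 0)                                        # True for empty cells

# Filled cells get probability 0, so argmax can only pick an empty cell
ind = np.argmax(prob * mask)
x, y = divmod(ind, 9)
board[x, y] = pred[x, y]
print((x, y), board[x, y])
```

Only one cell is committed per pass. The solver then runs the network again on the updated board, so later predictions can take the newly filled digit into account.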

Updating the Grid with Predicted Value:

val = pred[x][y]: Gets the predicted digit for the selected empty cell.
initial_board[x][y] = val: Updates the Sudoku grid with the predicted digit.
initial_board = (initial_board / 9) - 0.5: Rescales the Sudoku grid for the next iteration.

Conversion to String Representation:

solved_puzzle = ''.join(map(str, initial_board.flatten().astype(int))): Converts the solved Sudoku grid back to a string representation.

Returning the Solved Puzzle:

return solved_puzzle: Returns the solved Sudoku puzzle as a string.

Example Sudoku puzzles

Python3

def print_sudoku_grid(puzzle):
    puzzle = puzzle.replace('\n', '').replace(' ', '')
    for i in range(9):
        if i % 3 == 0 and i != 0:
            print("-"*21)
 
        for j in range(9):
            if j % 3 == 0 and j != 0:
                print("|", end=" ")
            print(puzzle[i*9 + j], end=" ")
        print()
new_game = '''
          0 0 0 0 0 0 0 0 0
          0 0 0 0 0 0 0 0 0
          0 0 0 0 0 0 0 0 0
          0 0 0 0 0 0 0 0 0
          0 0 0 0 0 0 0 0 0
          0 0 0 0 0 0 0 0 0
          0 0 0 0 0 0 0 0 0
          0 0 0 0 0 0 0 0 0
          0 0 0 0 0 0 0 0 0
      '''
 
game = '''
          0 0 0 7 0 0 0 9 6
          0 0 3 0 6 9 1 7 8
          0 0 7 2 0 0 5 0 0
          0 7 5 0 0 0 0 0 0
          9 0 1 0 0 0 3 0 0
          0 0 0 0 0 0 0 0 0
          0 0 9 0 0 0 0 0 1
          3 1 8 0 2 0 4 0 7
          2 4 0 0 0 5 0 0 0
      '''
 
solved_puzzle_nn = solve_sudoku_with_nn(model, game)
 
# Print the solved puzzle as a grid
print("Sudoku Solution (NN):")
print_sudoku_grid(solved_puzzle_nn)

Output:
(Image: screenshot of the printed solution grid.)
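Because the solver fills cells according to the network's confidence, nothing guarantees that the returned grid obeys the Sudoku rules. The following sketch reports which rows, columns, or boxes are broken in an 81-character solution string such as the one returned by solve_sudoku_with_nn. The function name broken_units and the example strings are assumptions for illustration.

```python
def broken_units(solution):
    """List the rows, columns and boxes that are not a permutation of 1..9."""
    grid = [[int(ch) for ch in solution[r * 9:(r + 1) * 9]] for r in range(9)]
    expected = list(range(1, 10))
    broken = []
    for i in range(9):
        if sorted(grid[i]) != expected:
            broken.append(f"row {i + 1}")
        if sorted(grid[r][i] for r in range(9)) != expected:
            broken.append(f"column {i + 1}")
        # Top-left corner of the i-th 3x3 box, counted left to right, top to bottom
        br, bc = 3 * (i // 3), 3 * (i % 3)
        box = [grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
        if sorted(box) != expected:
            broken.append(f"box {i + 1}")
    return broken


# Illustrative strings: a pattern-generated valid grid and a corrupted copy
valid = "".join(
    str((3 * (r % 3) + r // 3 + c) % 9 + 1) for r in range(9) for c in range(9)
)
corrupted = "2" + valid[1:]
print(broken_units(valid))
print(broken_units(corrupted))
```

Calling broken_units(solved_puzzle_nn) after the solver runs gives a quick way to confirm whether the network actually produced a valid solution. An empty list means every row, column, and box is complete.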

Here is the link to the Kaggle notebook: Kaggle notebook on Sudoku solver
