Implementing Deep Q-Learning using Tensorflow
Last Updated: 18 Jun, 2019

Prerequisites: Deep Q-Learning

This article demonstrates how to apply reinforcement learning to a larger environment than the one used previously. We will implement the Deep Q-Learning technique using TensorFlow, Keras and the keras-rl library.

Note: A graphics rendering library is required for the following demonstration. On Windows, PyOpenGL is suggested, while on Ubuntu, OpenGL is recommended.

Step 1: Importing the required libraries




import numpy as np
import gym

# Keras components for building the Q-network
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam

# keras-rl components: the DQN agent, exploration policy and replay memory
from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory

Step 2: Building the Environment



Note: A preloaded environment will be used from OpenAI's gym module, which contains many different environments for different purposes. The list of environments can be viewed on their website, or listed programmatically as in the short sketch below.
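As a quick sanity check, the registered environment IDs can also be listed from Python; a minimal sketch, assuming an older gym release in which the registry exposes an all() method (newer releases expose the registry as a plain dictionary instead):

# Listing the environments registered with gym (illustrative only)
import gym

all_env_ids = sorted(spec.id for spec in gym.envs.registry.all())
print(len(all_env_ids), 'environments registered')
print(all_env_ids[:10])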

Here, the ‘MountainCar-v0’ environment will be used. In this environment, a car (the agent) is stuck between two mountains and has to drive up one of them. The car’s engine is not strong enough to drive up on its own, so the car has to build momentum by rocking back and forth to get uphill.




# Building the environment
environment_name = 'MountainCar-v0'
env = gym.make(environment_name)
np.random.seed(0)
env.seed(0)
  
# Extracting the number of possible actions
num_actions = env.action_space.n
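Before training, it can help to see what the environment returns at each step. The short sketch below takes a few random actions; it assumes the classic gym step API, where step() returns (observation, reward, done, info):

# Quick look at the environment (illustrative only, not part of training)
observation = env.reset()
for _ in range(5):
    action = env.action_space.sample()  # pick a random action
    observation, reward, done, info = env.step(action)
    print(observation, reward, done)
    if done:
        observation = env.reset()

For MountainCar-v0, the observation is a pair (position, velocity) and the reward is -1 on every step until the goal is reached.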

Step 3: Building the learning agent

The learning agent will be built using a deep neural network, and for this purpose we will use the Sequential class from the Keras module.




# Building the Q-network: maps an observation to one Q-value per action
agent = Sequential()
agent.add(Flatten(input_shape=(1,) + env.observation_space.shape))
agent.add(Dense(16))
agent.add(Activation('relu'))
agent.add(Dense(num_actions))
agent.add(Activation('linear'))
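To verify the architecture before training, the layer structure can be printed as a quick check:

# Print the layer structure of the Q-network
agent.summary()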

Step 4: Finding the Optimal Strategy




# Building the model to find the optimal strategy
strategy = EpsGreedyQPolicy()
memory = SequentialMemory(limit=10000, window_length=1)
dqn = DQNAgent(model=agent, nb_actions=num_actions,
               memory=memory, nb_steps_warmup=10,
               target_model_update=1e-2, policy=strategy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])

# Visualizing the training
dqn.fit(env, nb_steps=5000, visualize=True, verbose=2)

The agent tries different approaches to reach the top, gaining knowledge from each episode.
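The exploration behind this comes from the epsilon-greedy policy chosen in Step 4: with a small probability the agent picks a random action, otherwise it picks the action with the highest predicted Q-value. Below is a minimal, standalone sketch of that idea (not keras-rl's internal implementation; the eps value and q_values array are illustrative):

# Epsilon-greedy action selection (illustrative sketch)
import numpy as np

def epsilon_greedy_action(q_values, eps=0.1):
    # Explore with probability eps, otherwise exploit the best-known action
    if np.random.rand() < eps:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))

# Example: Q-values for MountainCar's three actions (push left, no push, push right)
print(epsilon_greedy_action(np.array([-1.2, -0.8, -1.5]), eps=0.1))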

Step 5: Testing the Learning Agent




# Testing the learning agent
dqn.test(env, nb_episodes = 5, visualize = True)

The agent tries to apply its knowledge to reach the top.
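Once the test episodes look reasonable, the trained weights can be saved and restored later; a short sketch using keras-rl's save_weights and load_weights (the filename is arbitrary):

# Persist the trained Q-network weights (filename is just an example)
dqn.save_weights('dqn_MountainCar-v0_weights.h5f', overwrite=True)

# Later, after rebuilding the same model and agent, restore the weights
dqn.load_weights('dqn_MountainCar-v0_weights.h5f')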
