Implementing Deep Q-Learning using TensorFlow

Prerequisites: Deep Q-Learning

This article demonstrates how to apply reinforcement learning to a larger environment than the ones covered previously. We will implement the Deep Q-Learning technique using Keras (running on TensorFlow) together with the keras-rl library, which provides ready-made agents, policies and replay memories.

Note: A graphics rendering library is required to visualize the environment. On Windows, PyOpenGL is suggested, while on Ubuntu, OpenGL is recommended.



Step 1: Importing the required libraries

import numpy as np
import gym

# Keras components for building the Q-network
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam

# keras-rl components: the DQN agent, an epsilon-greedy
# policy and a replay memory
from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory


Step 2: Building the Environment

Note: A preloaded environment from OpenAI’s gym module will be used. The module contains many environments for different purposes; the full list can be viewed on the OpenAI Gym website.

Here, the ‘MountainCar-v0’ environment will be used. In it, a car (the agent) is stuck in a valley between two mountains and has to drive up one of them. The car’s engine is not strong enough to climb the slope on its own, so the car has to build up momentum by driving back and forth to reach the top.

# Building the environment
environment_name = 'MountainCar-v0'
env = gym.make(environment_name)
np.random.seed(0)
env.seed(0)
  
# Extracting the number of possible actions
num_actions = env.action_space.n
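
As a quick sanity check, you can inspect the environment’s spaces. MountainCar-v0 observations are 2-dimensional (the car’s position and velocity) and there are three discrete actions:

# Inspecting the environment (optional)
print(env.observation_space)  # Box(2,): position and velocity
print(env.action_space)       # Discrete(3): push left, no push, push right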


Step 3: Building the learning agent

The learning agent is built as a deep neural network; for this purpose, we will use the Sequential class from the Keras module.

agent = Sequential()
# The leading (1, ) matches the window_length of the replay memory used in Step 4
agent.add(Flatten(input_shape=(1,) + env.observation_space.shape))
agent.add(Dense(16))
agent.add(Activation('relu'))
agent.add(Dense(num_actions))
agent.add(Activation('linear'))
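
The network maps each observation to one Q-value per action, which is why the output layer is linear. To verify the architecture before training, you can optionally print a summary:

# Displaying the network architecture (optional)
agent.summary()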


Step 4: Finding the Optimal Strategy
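
This step wires everything together: the epsilon-greedy policy trades off exploration and exploitation by occasionally choosing a random action, the replay memory stores the last 10,000 transitions so training batches can be drawn from past experience, and target_model_update=1e-2 (a value below 1 in keras-rl) applies soft updates to a separate target network, which stabilizes the Q-value targets.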

# Building the model to find the optimal strategy
strategy = EpsGreedyQPolicy()
memory = SequentialMemory(limit=10000, window_length=1)
dqn = DQNAgent(model=agent, nb_actions=num_actions,
               memory=memory, nb_steps_warmup=10,
               target_model_update=1e-2, policy=strategy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])

# Visualizing the training
dqn.fit(env, nb_steps=5000, visualize=True, verbose=2)


The agent tries different strategies to reach the top, gaining knowledge from each episode.
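
After training, the learned weights can be saved so the agent can be reused later. A minimal sketch using keras-rl’s save_weights; the file name here is an arbitrary choice:

# Saving the trained weights (the file name is an arbitrary choice)
dqn.save_weights('dqn_MountainCar-v0_weights.h5f', overwrite=True)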

Step 5: Testing the Learning Agent

# Testing the learning agent
dqn.test(env, nb_episodes=5, visualize=True)


The agent applies the knowledge gained during training to reach the top.
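
To evaluate a previously saved agent without retraining, build the same network and agent as above and reload the stored weights before calling dqn.test. This assumes the file name used in the saving sketch above:

# Reloading previously saved weights into an identically built agent
dqn.load_weights('dqn_MountainCar-v0_weights.h5f')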


