Prerequisites: Deep Q-Learning
This article demonstrates reinforcement learning on a larger environment than the ones used previously. We will implement the Deep Q-Learning technique using Keras and the keras-rl library, running on a TensorFlow backend.
Note: A graphics rendering library is required for the following demonstration. On Windows, PyOpenGL is suggested, while on Ubuntu, OpenGL is recommended.
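The Python dependencies can typically be installed with pip; the exact package set below is an assumption inferred from the imports in Step 1:

pip install numpy tensorflow keras keras-rl gym
pip install PyOpenGL   # for rendering on Windows, as suggested above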
Step 1: Importing the required libraries
import numpy as np
import gym

from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory
Step 2: Building the Environment
Note: A prebuilt environment from OpenAI’s gym module will be used; gym contains many environments built for different purposes. The full list can be viewed on the gym website.
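The registered environments can also be listed programmatically. This small sketch assumes the classic gym API used throughout this article (newer gym/gymnasium versions expose the registry as a plain dict instead):

from gym import envs

# Print the IDs of all environments registered with gym
print(sorted(spec.id for spec in envs.registry.all()))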
Here, the ‘MountainCar-v0’ environment will be used. In it, a car (the agent) is stuck between two mountains and has to drive up one of them. The car’s engine is not strong enough to drive up on its own, so the car has to build momentum by rocking back and forth to get uphill.
# Building the environment
environment_name = 'MountainCar-v0'
env = gym.make(environment_name)

# Seeding for reproducible results
np.random.seed(0)
env.seed(0)

# Extracting the number of possible actions
num_actions = env.action_space.n
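Before training, it can help to inspect the environment. The following optional sketch samples a few random actions; it assumes the classic gym step API, where reset() returns only the observation and step() returns four values:

# The observation is [position, velocity]; there are 3 discrete actions
print(env.observation_space)   # Box(2,)
print(env.action_space)        # Discrete(3)

state = env.reset()
for _ in range(5):
    action = env.action_space.sample()            # pick a random action
    state, reward, done, info = env.step(action)  # classic 4-tuple API
    print(state, reward, done)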
Step 3: Building the learning agent
The learning agent’s Q-network will be built as a deep neural network; for this purpose, we will use the Sequential class of the Keras module.
agent = Sequential()

# Flatten the (window_length, observation) input into a single vector
agent.add(Flatten(input_shape=(1,) + env.observation_space.shape))

# Hidden layer
agent.add(Dense(16))
agent.add(Activation('relu'))

# One linear output per action, giving that action's Q-value
agent.add(Dense(num_actions))
agent.add(Activation('linear'))
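The network maps the flattened two-dimensional observation to one Q-value per action. To double-check the architecture, Keras can print the layer shapes and parameter counts:

# Optional: inspect the network architecture
agent.summary()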
Step 4: Finding the Optimal Strategy
# Building the model to find the optimal strategy
strategy = EpsGreedyQPolicy()
memory = SequentialMemory(limit=10000, window_length=1)
dqn = DQNAgent(model=agent, nb_actions=num_actions, memory=memory,
               nb_steps_warmup=10, target_model_update=1e-2,
               policy=strategy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])

# Visualizing the training
dqn.fit(env, nb_steps=5000, visualize=True, verbose=2)
The agent tries different actions to reach the top, gaining knowledge from each episode. The epsilon-greedy policy drives this: most of the time the agent exploits the action with the highest predicted Q-value, but with a small probability it explores a random action, as sketched below.
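Conceptually, EpsGreedyQPolicy behaves like the following standalone sketch (eps = 0.1 here is illustrative, not necessarily keras-rl’s configured value):

import numpy as np

def epsilon_greedy(q_values, eps=0.1):
    # Explore: pick a uniformly random action with probability eps
    if np.random.rand() < eps:
        return np.random.randint(len(q_values))
    # Exploit: otherwise pick the action with the highest Q-value
    return int(np.argmax(q_values))

# With these Q-values, action 2 (the greedy choice) is selected most of the time
print(epsilon_greedy(np.array([0.1, -0.5, 0.9])))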
Step 5: Testing the Learning Agent
# Testing the learning agent
dqn.test(env, nb_episodes=5, visualize=True)
The agent tries to apply its knowledge to reach the top.
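Optionally, the trained weights can be persisted and later restored with keras-rl’s save_weights and load_weights methods (the filename here is arbitrary):

# Save the trained weights so the agent can be reused later
dqn.save_weights('dqn_MountainCar-v0_weights.h5f', overwrite=True)

# To reuse them, rebuild the same model and call:
# dqn.load_weights('dqn_MountainCar-v0_weights.h5f')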