Dynamic vs Static Computational Graphs – PyTorch and TensorFlow

TensorFlow and PyTorch are two of the most popular deep learning libraries today. Both have developed their respective niches in mainstream deep learning, with excellent documentation, tutorials, and, most importantly, an active and supportive community behind them.

Difference between Static Computational Graphs in TensorFlow and Dynamic Computational Graphs in PyTorch

Though both libraries employ a directed acyclic graph (DAG) to represent their machine learning and deep learning models, they differ substantially in how data and calculations flow through that graph. The key difference is that TensorFlow (versions before 2.0) builds a static computation graph, whereas PyTorch builds a dynamic one. This article covers these differences with code examples. It assumes a working knowledge of computation graphs and a basic understanding of the TensorFlow and PyTorch modules.

Static Computation Graph in TensorFlow

Properties of nodes & edges: The nodes represent the operations that are applied to the data flowing in and out through the edges. For the example used in this article, we can keep the following things in mind while implementing it in TensorFlow:

Since the inputs act as the edges of the graph, we can use tf.placeholder(), which accepts an input of the desired datatype.
To calculate the output ‘c’, we define a simple multiplication operation and start a TensorFlow session, passing the required input values through the feed_dict argument of the sess.run() method to calculate the outputs and the gradients.
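
The equations referred to above are not reproduced in this text. Judging from the code and its printed output, the example is presumably the product of two scalars evaluated at a = 15 and b = 20, which gives the following values to check against:

```latex
\[
c = a \cdot b, \qquad
\frac{\partial c}{\partial a} = b, \qquad
\frac{\partial c}{\partial b} = a
\]
\[
a = 15,\; b = 20 \;\Longrightarrow\;
c = 300, \quad
\frac{\partial c}{\partial a} = 20, \quad
\frac{\partial c}{\partial b} = 15
\]
```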

Now let’s implement the above calculations in TensorFlow and observe how the operations occur:

Python3

# Importing the TensorFlow 1.x compatibility API
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Initializing the placeholders of the graph
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)

# Defining the operation
c = tf.multiply(a, b)

# Instantiating a TensorFlow session
with tf.Session() as sess:
    # Computing the output of the graph by giving
    # the respective input values
    out = sess.run([c], feed_dict={a: [15.0], b: [20.0]})[0][0]

    # Computing the gradient of the output with
    # respect to the input 'a'
    derivative_out_a = sess.run(tf.gradients(c, a), feed_dict={
                                a: [15.0], b: [20.0]})[0][0]

    # Computing the gradient of the output with
    # respect to the input 'b'
    derivative_out_b = sess.run(tf.gradients(c, b), feed_dict={
                                a: [15.0], b: [20.0]})[0][0]

    # Displaying the outputs
    print(f'c = {out}')
    print(f'Derivative of c with respect to a = {derivative_out_a}')
    print(f'Derivative of c with respect to b = {derivative_out_b}')

Output:

c = 300.0
Derivative of c with respect to a = 20.0
Derivative of c with respect to b = 15.0

As we can see, the output matches the hand calculation for c = a × b (c = 15 × 20 = 300, with derivatives b = 20 and a = 15). The static structure is evident from the code: the placeholders and the multiplication are declared up front as a graph, nothing is computed until sess.run() is called, and between runs only the input values change, supplied through the feed_dict argument.
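
The define-then-run pattern can also be sketched without any framework. The snippet below is a toy illustration in plain Python, not TensorFlow code: the `Placeholder` and `Multiply` classes and the `run` function are hypothetical names invented for this sketch. It only shows the idea that the graph is described once and then executed with different fed values.

```python
# Toy define-then-run graph in plain Python (illustrative only, not TensorFlow).

class Placeholder:
    """A graph input whose value is supplied only at run time."""
    def __init__(self, name):
        self.name = name


class Multiply:
    """An operation node that multiplies the values of its two inputs."""
    def __init__(self, left, right):
        self.left = left
        self.right = right


def run(node, feed_dict):
    """Evaluate `node` by walking the fixed graph with the fed values."""
    if isinstance(node, Placeholder):
        return feed_dict[node]
    if isinstance(node, Multiply):
        return run(node.left, feed_dict) * run(node.right, feed_dict)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Phase 1: define the graph once. Nothing is computed here.
a = Placeholder("a")
b = Placeholder("b")
c = Multiply(a, b)

# Phase 2: execute the same, unchanged graph with different inputs.
first = run(c, {a: 15.0, b: 20.0})
second = run(c, {a: 2.0, b: 3.0})
print(first, second)
```

The graph object `c` is never rebuilt between the two calls; only the dictionary of inputs changes, which mirrors the role of feed_dict in the TensorFlow example.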

Advantages:

Since the graph is static, the framework can optimize its structure and resource allocation before execution.
Computations can be slightly faster than with a dynamic graph because the structure is fixed.

Disadvantages:

Scales poorly to variable-dimension inputs. For example, a CNN (Convolutional Neural Network) architecture with a static computation graph trained on 28×28 images wouldn’t perform well on images of different sizes, such as 100×100, without a lot of pre-processing boilerplate code.
Poor debugging. Static graphs are difficult to debug, primarily because the user has no direct access to how the information flows. E.g., if a user creates a malformed static graph, the bug can’t be tracked down directly until the TensorFlow session hits an error while computing forward propagation or backpropagation. This becomes a major issue when the model is enormous, as it wastes both the user’s time and computation resources.
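
The debugging problem can be illustrated with a small framework-free sketch. The `placeholder` and `multiply` helpers below are hypothetical stand-ins written for this example, not TensorFlow functions. The point is only that, when execution is deferred, a mistake goes unnoticed while the graph is being defined and surfaces later, at run time, away from the line that caused it.

```python
# Toy deferred-execution graph built from closures (illustrative only).

def placeholder(name):
    """Return a deferred node that looks up its value when the graph runs."""
    return lambda feed: feed[name]


def multiply(x, y):
    """Return a deferred node that multiplies two other deferred nodes."""
    return lambda feed: x(feed) * y(feed)


# Defining the graph succeeds, because nothing is evaluated yet.
a = placeholder("a")
b = placeholder("b")
c = multiply(a, b)

# The mistake (no value supplied for 'b') shows up only on execution.
try:
    c({"a": 15.0})
    error_message = None
except KeyError as exc:
    error_message = f"missing input: {exc.args[0]}"

print(error_message)
```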

Dynamic Computation Graph in PyTorch

Properties of nodes & edges: The nodes represent the data (in the form of tensors) and the edges represent the operations applied to the input data.

For the same example, we can keep the following things in mind while implementing it in PyTorch:

Since everything in PyTorch is created dynamically, we don’t need any placeholders and can define our inputs and operations on the fly.
After defining the inputs and computing the output ‘c’, we call the backward() method, which calculates the partial derivatives with respect to the two inputs. These are then accessible through each input’s .grad attribute.

Now let’s check out a code example to verify our findings:

Python3

# Importing torch
import torch

# Initializing input tensors
a = torch.tensor(15.0, requires_grad=True)
b = torch.tensor(20.0, requires_grad=True)

# Computing the output
c = a * b

# Computing the gradients
c.backward()

# Collecting the gradient of the
# output with respect to the input 'a'
derivative_out_a = a.grad

# Collecting the gradient of the
# output with respect to the input 'b'
derivative_out_b = b.grad

# Displaying the outputs
print(f'c = {c}')
print(f'Derivative of c with respect to a = {derivative_out_a}')
print(f'Derivative of c with respect to b = {derivative_out_b}')

Output:

c = 300.0
Derivative of c with respect to a = 20.0
Derivative of c with respect to b = 15.0

As we can see, the output matches the hand calculation for c = a × b (c = 300, with derivatives 20 and 15). The dynamic structure is evident from the code: the graph is built as each operation executes, so inputs, intermediate values, and outputs are ordinary objects that can be inspected and changed at runtime. This is entirely different from the define-then-run approach of TensorFlow 1.x.
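
To make the define-by-run idea concrete, here is a minimal sketch of a scalar reverse-mode autodiff in plain Python. It is a toy written for this article, not PyTorch's implementation: the `Value` class and the `repeated_product` function are hypothetical names. The graph is recorded while the operations run, so the number of nodes can depend on the data itself.

```python
# Toy define-by-run autodiff for scalars (illustrative only, not PyTorch).

class Value:
    """A scalar that records the operation that produced it."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        # Each entry pairs a parent node with the local derivative
        # d(self)/d(parent), recorded at the moment the operation ran.
        self.parents = parents

    def __mul__(self, other):
        return Value(self.data * other.data,
                     parents=((self, other.data), (other, self.data)))

    def backward(self):
        """Propagate gradients from this node back to its inputs."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Reverse topological order: a node's gradient is complete
        # before it is pushed on to its parents (chain rule).
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


def repeated_product(x, limit):
    """Keep multiplying by x until the result reaches `limit`."""
    out, steps = x, 1
    while out.data < limit:  # the loop count depends on the data
        out = out * x
        steps += 1
    return out, steps


x = Value(2.0)
y, steps = repeated_product(x, 10.0)
y.backward()
print(steps, y.data, x.grad)
```

With x = 2 the loop stops at the fourth power, so y is x⁴ = 16 and the gradient is 4x³ = 32. A different starting value would produce a graph of a different depth without any change to the code, which is the behaviour a fixed, pre-declared graph cannot express directly.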

Advantages:

Scalability to different dimensional inputs: Scales very well to inputs of different dimensions, as a new pre-processing layer can be dynamically added to the network itself.
Ease of debugging: Dynamic graphs are very easy to debug, which is one of the reasons why many people are shifting from TensorFlow to PyTorch. Because the nodes are created as each operation executes, an error surfaces at the line that caused it, and the user can inspect every variable used in the training process.

Disadvantages:

Leaves little room for graph optimization, because a new graph is created for each training instance or batch.

Conclusion

This article sheds light on the difference between the modeling structures of TensorFlow and PyTorch. It also lists some advantages and disadvantages of both approaches, illustrated through code examples. The organizations behind these libraries keep improving them in subsequent releases, but the reader can now make a better-informed decision when choosing a framework for their next project.
