391. Graph Execution vs. Eager Execution

▮ Execution Methods

Execution methods can be confusing to people who have just started machine learning, so in this post I'd like to explain the difference between graph execution and eager execution.

▮ Graph Execution

Fig.1 – Graph Execution

Graph execution first builds a graph from the tensor computations and only then runs the calculation. Executing operations one by one, in the order they appear in your code, is not always the most efficient approach; once the whole computation is known as a graph, it can be optimized and rearranged to run faster. Graph execution offers the following benefits:

  1. High Speed
  2. Highly Flexible
  3. Highly Efficient
  4. Parallel Execution

However, this method suffers from the following drawbacks:

  1. Difficult to learn
  2. Difficult to test
  3. Non-intuitive

▮ Eager Execution

Fig.2 – Eager Execution

Unlike graph execution, eager execution runs your code operation by operation, computing the value of each tensor immediately and in the same order as your code. This solves all the cons graph execution was facing:

  1. Easy to debug
  2. Natural control flow with Python
  3. Write with natural Python code and data structures.
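To make this concrete, here is a minimal sketch of eager execution in TensorFlow 2.x (where it is the default mode): each operation returns a concrete value immediately, so you can inspect tensors with ordinary Python.

```python
import tensorflow as tf  # assumes TensorFlow 2.x, where eager mode is the default

# Each op runs immediately and returns a concrete value
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)

print(y.numpy())               # the result is available right away
print(tf.executing_eagerly())  # True in TensorFlow 2.x by default
```

Because values are materialized at every step, you can drop in `print` statements or a debugger anywhere, which is exactly the "easy to debug" and "natural control flow" benefit listed above.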

As you can imagine, eager execution is slower than graph execution. If the model is large, it is better to switch to graph execution. Alternatively, you can use eager execution while developing the model architecture, and enable graph execution for inference.
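This hybrid workflow can be sketched with the `@tf.function` decorator: the function body is written (and debuggable) as ordinary eager code, and TensorFlow traces it into a graph on the first call. The function below is a hypothetical example, not from the original post.

```python
import tensorflow as tf

# Develop and debug the body in eager mode, then wrap it with @tf.function
# so TensorFlow traces it into a graph the first time it is called.
@tf.function
def scaled_sum(a, b):
    return tf.reduce_sum(a * 2.0 + b)

a = tf.ones([3])
b = tf.ones([3])
print(scaled_sum(a, b).numpy())  # 9.0 — runs as a graph after tracing
```

Removing the decorator (or calling `tf.config.run_functions_eagerly(True)` temporarily) switches the same code back to eager mode for debugging.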

▮ Speed Comparison

Now that we know the basic difference between the two execution methods, let's measure the actual inference time.

  1. Imports
import tensorflow as tf
import timeit
  2. Prepare data and model
#parameters
batch_size = 1
img_height = 224
img_width = 224
channels = 3

#input_data
input_data = tf.random.uniform([batch_size, img_height, img_width, channels])

#load model in eager mode
eager_densenet = tf.keras.applications.DenseNet121(
        input_shape=(img_height, img_width, channels), include_top=True, weights='imagenet')

#convert to graph
graph_densenet = tf.function(eager_densenet)
  3. Log inference speed
print("Inference Count: 1")
print("Eager time:", timeit.timeit(lambda: eager_densenet(input_data), number=1))
print("Graph time:", timeit.timeit(lambda: graph_densenet(input_data), number=1))

print("\nInference Count: 50")
print("Eager time:", timeit.timeit(lambda: eager_densenet(input_data), number=50))
print("Graph time:", timeit.timeit(lambda: graph_densenet(input_data), number=50))

This would output the log below.

Inference Count: 1
Eager time: 0.07804840000005697
Graph time: 0.9398225999999568

Inference Count: 50
Eager time: 4.1049653000000035
Graph time: 0.6856538999999202

As you can see, if you run inference only once, eager execution is faster. This is because graph execution first needs time to build the graph before it can run inference.
However, if you run inference 50 times, graph execution takes only about 16% of the time of eager execution. You can see how the calculation is being optimized by building the graph up front.
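The one-time cost on the first call is the tracing step. A small sketch (hypothetical example, not from the original post) makes it visible: Python-level side effects inside a `tf.function` only run while TensorFlow traces the graph, so a counter incremented in the body shows the graph is built once and then reused.

```python
import tensorflow as tf

trace_count = 0

@tf.function
def square(x):
    global trace_count
    trace_count += 1  # Python code runs only while TensorFlow traces the graph
    return x * x

x = tf.constant(2.0)
square(x)
square(x)
square(x)
print(trace_count)  # 1 — traced on the first call, reused afterwards
```

This is why the graph-mode timing above is slow for a single inference but amortizes well over 50 calls.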