391. Graph Execution vs. Eager Execution

▮ Execution Methods

Execution methods can be confusing to people who have just started machine learning, so in this post I'd like to explain the difference between graph execution and eager execution.

▮ Graph Execution

Fig.1 – Graph Execution

Graph execution first builds a graph from the tensor computations and only then runs the calculation. Executing operations one by one, in the order they appear in your code, is not always the most efficient approach; once the whole computation is known as a graph, it can be optimized and rearranged to run faster. Graph execution offers the following benefits:

  1. High Speed
  2. Highly Flexible
  3. Highly Efficient
  4. Parallel Execution

However, this method suffers from the following drawbacks:

  1. Difficult to learn
  2. Difficult to test
  3. Non-intuitive

▮ Eager Execution

Fig.2 – Eager Execution

Unlike graph execution, eager execution runs your code operation by operation, computing the value of each tensor immediately and in the same order as your code. This solves all the cons graph execution was facing:

  1. Easy to debug
  2. Natural control flow with Python
  3. Write with natural Python code and data structures.
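To make this concrete, here is a minimal sketch of eager execution in TensorFlow 2.x (where it is the default mode): each operation returns a concrete value immediately, so you can inspect tensors with ordinary Python.

```python
import tensorflow as tf  # assumes TensorFlow 2.x, where eager mode is the default

# Each op runs immediately and returns a concrete value
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)

print(y.numpy())               # the result is available right away
print(tf.executing_eagerly())  # True in TensorFlow 2.x by default
```

Because values are materialized at every step, you can drop in `print` statements or a debugger anywhere, which is exactly the "easy to debug" and "natural control flow" benefit listed above.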

As you can imagine, eager execution is slower than graph execution. If the model is large, it is better to switch to graph execution. Alternatively, you can use eager execution while developing the model architecture, and enable graph execution for inference.
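This hybrid workflow can be sketched with the `@tf.function` decorator: the function body is written (and debuggable) as ordinary eager code, and TensorFlow traces it into a graph on the first call. The function below is a hypothetical example, not from the original post.

```python
import tensorflow as tf

# Develop and debug the body in eager mode, then wrap it with @tf.function
# so TensorFlow traces it into a graph the first time it is called.
@tf.function
def scaled_sum(a, b):
    return tf.reduce_sum(a * 2.0 + b)

a = tf.ones([3])
b = tf.ones([3])
print(scaled_sum(a, b).numpy())  # 9.0 — runs as a graph after tracing
```

Removing the decorator (or calling `tf.config.run_functions_eagerly(True)` temporarily) switches the same code back to eager mode for debugging.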

▮ Speed Comparison

Now that we know the basic difference between the two execution methods, let's measure the actual inference time.

  1. Imports
import tensorflow as tf
import timeit
  2. Prepare data and model
#parameters
batch_size = 1
img_height = 224
img_width = 224
channels = 3

#input_data
input_data = tf.random.uniform([batch_size, img_height, img_width, channels])

#load model in eager mode
eager_densenet = tf.keras.applications.DenseNet121(
        input_shape=(img_height, img_width, channels), include_top=True, weights='imagenet')

#convert to graph
graph_densenet = tf.function(eager_densenet)
  3. Log inference speed
print("Inference Count: 1")
print("Eager time:", timeit.timeit(lambda: eager_densenet(input_data), number=1))
print("Graph time:", timeit.timeit(lambda: graph_densenet(input_data), number=1))

print("\nInference Count: 50")
print("Eager time:", timeit.timeit(lambda: eager_densenet(input_data), number=50))
print("Graph time:", timeit.timeit(lambda: graph_densenet(input_data), number=50))

This would output the log below.

Inference Count: 1
Eager time: 0.07804840000005697
Graph time: 0.9398225999999568

Inference Count: 50
Eager time: 4.1049653000000035
Graph time: 0.6856538999999202

As you can see, if you run inference only once, eager execution is faster. This is because graph execution first needs time to build the graph before it can run inference.
However, if you run inference 50 times, graph execution takes only about 16% of the time of eager execution. You can see how the calculation is being optimized by building the graph up front.
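The one-time cost on the first call is the tracing step. A small sketch (hypothetical example, not from the original post) makes it visible: Python-level side effects inside a `tf.function` only run while TensorFlow traces the graph, so a counter incremented in the body shows the graph is built once and then reused.

```python
import tensorflow as tf

trace_count = 0

@tf.function
def square(x):
    global trace_count
    trace_count += 1  # Python code runs only while TensorFlow traces the graph
    return x * x

x = tf.constant(2.0)
square(x)
square(x)
square(x)
print(trace_count)  # 1 — traced on the first call, reused afterwards
```

This is why the graph-mode timing above is slow for a single inference but amortizes well over 50 calls.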