401. Optimizing GPU Utilization

▮ Low GPU Utilization

Ideally, we want to fully utilize our GPUs during training and inference. However, if you are not paying attention to GPU utilization when writing training scripts for your deep learning model, the odds are that the GPU utilization rate will be pretty low, often lower than 30%.

(Figure: GPU metrics)

Today I’d like to share where the bottlenecks that cause this tend to be, and how to fix them so you can get close to a 90% GPU utilization rate.

▮ Monitor GPU Metrics

Before moving on, I’d like to first share how to check the GPU metrics.

Here is one way to monitor GPU metrics and see whether the training process has a high GPU utilization rate (on Linux).

nvidia-smi \
--query-gpu=timestamp,name,utilization.gpu,utilization.memory,memory.used,memory.free \
--format=csv \
-l 1

The output will look like the following.

timestamp, name, utilization.gpu [%], utilization.memory [%], memory.used [MiB], memory.free [MiB]
2023/03/14 07:52:06.453, NVIDIA GeForce RTX 3080 Ti, 82 %, 57 %, 5959 MiB, 6148 MiB
2023/03/14 07:52:07.454, NVIDIA GeForce RTX 3080 Ti, 81 %, 57 %, 5959 MiB, 6148 MiB
2023/03/14 07:52:08.455, NVIDIA GeForce RTX 3080 Ti, 82 %, 56 %, 5959 MiB, 6148 MiB
2023/03/14 07:52:09.456, NVIDIA GeForce RTX 3080 Ti, 78 %, 52 %, 5959 MiB, 6148 MiB
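
If you’d rather log these numbers from inside Python (for example, alongside your training logs), the NVML bindings expose the same counters. The sketch below assumes the nvidia-ml-py (pynvml) package is installed and that you are watching the GPU at index 0.

import time
import pynvml

# Poll GPU utilization and memory via NVML (the same counters nvidia-smi reports).
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU index 0; change if needed

for _ in range(5):  # sample a few times, once per second
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu / .memory in %
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # .used / .free in bytes
    print(f"gpu {util.gpu}% | mem util {util.memory}% | "
          f"used {mem.used / 1024**2:.0f} MiB | free {mem.free / 1024**2:.0f} MiB")
    time.sleep(1)

pynvml.nvmlShutdown()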

▮ Utilization Methods

The basic workflow when training a model takes the following steps.

  1. Load Batch Data
  2. Run Inference
  3. Calculate Metrics For Model Evaluation

(Figure: lazy-load workflow)

Usually, steps 1 and 3 are done on the CPU, and they can easily become a bottleneck when training a model. No matter how fast inference runs on the GPU, if the CPU-side steps are slow, the GPU has to wait for the next batch of data to arrive. During that “waiting”, the machine cannot utilize the GPU.
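
To make the bottleneck concrete, here is a minimal sketch of that loop in PyTorch. The toy model and random data are placeholders I made up for illustration; the point is that steps 1 and 3 run on the CPU, while the GPU only works during step 2.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(128, 10).to(device)

# Toy stand-in for a real dataset
dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))
loader = DataLoader(dataset, batch_size=32)

for x, y in loader:                         # 1. Load batch data (CPU)
    x = x.to(device)
    with torch.no_grad():
        logits = model(x)                   # 2. Run inference (GPU)
    preds = logits.argmax(dim=1).cpu()      # 3. Calculate metrics (CPU);
    acc = (preds == y).float().mean()       #    the GPU idles while this runs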

Considering the above, here are some ways to improve GPU utilization.

Method 1. Preload Data

As in the workflow in the previous section, some training scripts use “lazy loading” (reading each image only when a batch of data is retrieved). This means the CPU has to load data every time the process asks for the next input batch.

Instead, you can preload the data by caching all of the training data before training starts, so that the training loop never has to read input data from disk while it is running.

If you are interested, go check out my previous post on implementing preloading with PyTorch.
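
As a rough illustration (not the exact code from that post), a lazy-loading Dataset can be turned into a preloading one by reading everything into memory up front. The image_paths list here is a hypothetical list of file paths, and I’m assuming torchvision is available for reading images.

import torch
from torch.utils.data import Dataset
from torchvision.io import read_image

class LazyImageDataset(Dataset):
    """Reads an image from disk every time a sample is requested."""
    def __init__(self, image_paths):
        self.image_paths = image_paths

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # Disk read on every access -> the GPU waits on the CPU
        return read_image(self.image_paths[idx]).float() / 255.0

class PreloadedImageDataset(Dataset):
    """Reads all images once, before training starts."""
    def __init__(self, image_paths):
        self.images = [read_image(p).float() / 255.0 for p in image_paths]

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx]  # in-memory lookup only at training time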

Method 2. Remove Unnecessary Metric Calculation

Let’s say our training process calculates the F1 score and IoU after each model prediction to evaluate the model.

Each of these metrics is computed on the CPU, so this can also become a bottleneck. If you don’t need ALL of the metrics, removing some of them can help improve GPU utilization.
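
One way to do this, sketched below with made-up numbers and scikit-learn’s f1_score, is to drop the metrics you don’t monitor and compute the remaining ones only every N steps instead of every batch. The interval is an arbitrary choice.

import numpy as np
from sklearn.metrics import f1_score

LOG_EVERY_N = 50  # arbitrary interval; tune it for your own run
rng = np.random.default_rng(0)

for step in range(200):
    preds = rng.integers(0, 2, size=256)    # stand-ins for model predictions
    labels = rng.integers(0, 2, size=256)   # and ground-truth labels
    if step % LOG_EVERY_N == 0:
        # F1 on the CPU only occasionally; IoU is dropped from the training
        # loop entirely and computed once on the validation set instead.
        print(step, f1_score(labels, preds))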

Method 3. Batch Size

Changing the batch size can also help increase GPU utilization. A larger batch size means more data is fed into the model at each step, which means more parallel computation on the GPU.

However, how much you can increase the batch size depends on your hardware resources. If you increase it too much, the training process may be killed with an out-of-memory error.
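
In PyTorch this is usually a one-line change in the DataLoader. The dataset and batch sizes below are arbitrary examples; raise the batch size only as far as your GPU memory allows.

import torch
from torch.utils.data import DataLoader, TensorDataset

train_dataset = TensorDataset(torch.randn(4096, 128), torch.randint(0, 10, (4096,)))

# batch_size=16 would keep the GPU underfed; 128 gives it more parallel work
# per step, as long as the larger batches still fit in GPU memory.
loader = DataLoader(train_dataset, batch_size=128, shuffle=True,
                    num_workers=4, pin_memory=True)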

Method 4. Memory Allocation Methods

How you allocate and store each variable within the training script can also affect GPU utilization. The details depend on the framework you are using, so here are some links that might help.
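
As one hedged PyTorch example (your framework’s docs are the authority here), allocating tensors directly on the GPU and using pinned memory with asynchronous copies avoids extra CPU-to-GPU round trips. The shapes below are arbitrary, and the snippet assumes a CUDA device is available.

import torch

device = torch.device("cuda")  # assumes a CUDA-capable GPU

# Create tensors directly on the GPU instead of building them on the CPU
# and copying them over afterwards.
weights = torch.zeros(1024, 1024, device=device)

# When data does start on the CPU, pin it and copy asynchronously so the
# transfer can overlap with GPU compute.
batch = torch.randn(32, 3, 224, 224).pin_memory()
batch_gpu = batch.to(device, non_blocking=True)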