▮ Stack Structure
In this post, I’d like to share an intuitive overview of how your code communicates with the GPU when implementing deep learning models.
Starting from the GPU side, the stack is made up of the following layers.
- GPU
- CUDA
- cuDNN
- C++ Backend
- Deep Learning Framework
▮ CUDA: Compute Unified Device Architecture
CUDA is NVIDIA’s platform for running computations on the GPU: it provides a programming model and a compiler that map your code onto the GPU’s cores.
In other words, CUDA is the way your software talks to the GPU, and it is the layer closest to the GPU hardware.
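To make this concrete, here is a minimal sketch of what “talking to the GPU” through CUDA looks like: a kernel that adds two vectors, launched from host code. The function and variable names are mine, chosen for illustration only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A kernel: a function that runs on the GPU, one thread per element.
__global__ void addVectors(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Allocate memory visible to both CPU and GPU (unified memory).
    float *a, *b, *out;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&out, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch the kernel: enough blocks of 256 threads to cover n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    addVectors<<<blocks, threads>>>(a, b, out, n);
    cudaDeviceSynchronize();

    printf("out[0] = %f\n", out[0]);  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}
```

Everything here is about moving data and launching work on the GPU; notice that nothing in plain CUDA knows what a convolution or a pooling layer is. That is the gap the next layer fills.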
▮ cuDNN: CUDA Deep Neural Network
Deep learning requires A LOT of computation. This means communicating with the GPU effectively and running computations efficiently becomes crucial. You CAN use CUDA to talk to the GPU, but CUDA by itself does not know HOW to run deep learning operations well. That is where cuDNN becomes helpful. cuDNN provides highly tuned implementations of standard deep learning operations such as forward/backward convolution, pooling, normalization, and activation layers.
Therefore, you can think of cuDNN as a library of GPU-accelerated deep learning primitives built on top of CUDA.
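Here is a rough sketch of what using cuDNN from C++ looks like (error handling and input initialization omitted, tensor sizes are placeholders): you create a handle, describe your tensors and the operation, and ask cuDNN to run its tuned kernel, in this case a ReLU activation.

```cpp
#include <cudnn.h>
#include <cuda_runtime.h>

int main() {
    // A handle through which all cuDNN calls are issued.
    cudnnHandle_t handle;
    cudnnCreate(&handle);

    // Describe the tensor layout: batch of 1, 3 channels, 32x32 (placeholder sizes).
    cudnnTensorDescriptor_t desc;
    cudnnCreateTensorDescriptor(&desc);
    cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, 1, 3, 32, 32);

    // Describe the operation we want: a ReLU activation.
    cudnnActivationDescriptor_t act;
    cudnnCreateActivationDescriptor(&act);
    cudnnSetActivationDescriptor(act, CUDNN_ACTIVATION_RELU, CUDNN_NOT_PROPAGATE_NAN, 0.0);

    // Input/output buffers live in GPU memory.
    float *x, *y;
    size_t bytes = 1 * 3 * 32 * 32 * sizeof(float);
    cudaMalloc(&x, bytes);
    cudaMalloc(&y, bytes);

    // Ask cuDNN to run its tuned ReLU kernel: y = relu(x).
    float alpha = 1.0f, beta = 0.0f;
    cudnnActivationForward(handle, act, &alpha, desc, x, &beta, desc, y);

    cudaFree(x); cudaFree(y);
    cudnnDestroyActivationDescriptor(act);
    cudnnDestroyTensorDescriptor(desc);
    cudnnDestroy(handle);
    return 0;
}
```

Even for a single layer this is verbose, which is exactly why frameworks wrap calls like these behind a backend rather than expecting you to write them by hand.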
▮ C++ Backend
Most deep learning frameworks (including PyTorch, TensorFlow, ONNX Runtime, etc.) use C++ for their backend because of its performance. cuDNN can be called directly from this C++ layer, or through an additional layer such as TensorRT, which performs further graph optimization.
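The sketch below is purely hypothetical and does not come from any real framework’s source, but it shows the shape of this layer: one public op, dispatched to cuDNN when the tensors live on the GPU and to a plain C++ path otherwise. The `Tensor`, `conv2d_cudnn`, and `conv2d_cpu` names are made up for illustration.

```cpp
#include <stdexcept>

// Hypothetical minimal tensor type; real backends (e.g. PyTorch's ATen)
// carry shapes, strides, dtypes, reference counts, and more.
struct Tensor {
    float* data = nullptr;
    bool on_gpu = false;
};

// Hypothetical low-level paths. In a real framework the GPU path would call
// into cuDNN (e.g. cudnnConvolutionForward), the CPU path into plain C++ or a BLAS.
static Tensor conv2d_cudnn(const Tensor& x, const Tensor& /*w*/) { return x; }
static Tensor conv2d_cpu(const Tensor& x, const Tensor& /*w*/)   { return x; }

// The backend's public op: one entry point, dispatched by device.
// A Python call like a framework's conv2d ultimately bottoms out in a
// C++ routine shaped roughly like this.
Tensor conv2d(const Tensor& x, const Tensor& w) {
    if (x.on_gpu != w.on_gpu)
        throw std::invalid_argument("inputs must be on the same device");
    return x.on_gpu ? conv2d_cudnn(x, w) : conv2d_cpu(x, w);
}

int main() {
    Tensor x{nullptr, true}, w{nullptr, true};
    Tensor y = conv2d(x, w);  // would take the cuDNN path
    (void)y;
    return 0;
}
```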
▮ Deep Learning Framework
This is the “front end” of the deep learning software. In most cases, you use a deep learning framework such as PyTorch or TensorFlow, with Python as the high-level interface. Some frameworks also offer a C++ front end, such as LibTorch, the PyTorch C++ API.
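As a concrete example of such a C++ front end, here is a small LibTorch sketch: the same kind of model definition and forward pass you would write in Python, expressed through the PyTorch C++ API (the CMake/linking setup against libtorch is not shown, and the layer sizes are arbitrary).

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
    // A tiny model: one fully connected layer, the C++ counterpart of torch.nn.Linear.
    torch::nn::Linear fc(784, 10);

    // Move it to the GPU if one is available; the framework routes the actual
    // computation down through its C++ backend, CUDA, and cuDNN/cuBLAS.
    torch::Device device(torch::cuda::is_available() ? torch::kCUDA : torch::kCPU);
    fc->to(device);

    // A random batch of 16 "images", flattened to 784 features each.
    torch::Tensor x = torch::randn({16, 784}, device);
    torch::Tensor y = fc->forward(x);

    std::cout << "output shape: " << y.sizes() << std::endl;  // [16, 10]
    return 0;
}
```

Whether you write this in Python or C++, the call travels down the same stack described above: framework front end, C++ backend, cuDNN/CUDA, and finally the GPU.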