Two Frameworks
There are two main frameworks for training large-scale deep learning models:
- Data Parallelism
This method is usually chosen when your model CAN fit completely into a single GPU's memory. Each GPU holds a full copy of the model and processes a different batch of data, and the resulting gradients are synchronized across GPUs (see the first sketch below).
- Model Parallelism
This method is usually chosen when your model CAN’T fit completely into a single GPU's memory. Each GPU holds a different subset of the model's layers, receiving activations from the previous GPU and sending its outputs to the next (see the second sketch below).
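To make the data-parallel setup concrete, here is a minimal PyTorch sketch using `DistributedDataParallel`. The linear model, batch size, and random tensors are placeholders standing in for a real model and dataset, and the script assumes a single machine launched with `torchrun`:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Placeholder model: any nn.Module that fits on a single GPU works here.
model = torch.nn.Linear(512, 10).to(local_rank)
# Each process holds a full replica; DDP all-reduces gradients during backward().
model = DDP(model, device_ids=[local_rank])

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

# Each rank would normally draw a different shard of the dataset
# (e.g. via DistributedSampler); random tensors stand in here.
inputs = torch.randn(32, 512, device=local_rank)
targets = torch.randint(0, 10, (32,), device=local_rank)

optimizer.zero_grad()
loss = loss_fn(model(inputs), targets)
loss.backward()   # gradients are averaged across all replicas here
optimizer.step()  # every replica applies the identical update

dist.destroy_process_group()
```

The key point is that each process runs the same code on different data; the only cross-GPU communication is the gradient all-reduce hidden inside `backward()`.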
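And here is a minimal sketch of model parallelism, assuming two GPUs (`cuda:0` and `cuda:1`) are available; the split point and layer sizes are arbitrary placeholders:

```python
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    def __init__(self):
        super().__init__()
        # The first half of the layers lives on GPU 0...
        self.part1 = nn.Sequential(nn.Linear(512, 2048), nn.ReLU()).to("cuda:0")
        # ...and the second half on GPU 1.
        self.part2 = nn.Linear(2048, 10).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        # Activations are copied between devices at the split point.
        return self.part2(x.to("cuda:1"))

model = TwoGPUModel()
inputs = torch.randn(32, 512)                        # batch starts on the CPU
targets = torch.randint(0, 10, (32,)).to("cuda:1")   # loss lives on GPU 1

loss = nn.CrossEntropyLoss()(model(inputs), targets)
loss.backward()  # autograd routes gradients back across the device boundary
```

Here the communication happens at the `.to(...)` calls in `forward()`, where activations cross from one GPU to the next, and autograd sends gradients back over the same boundary during `backward()`.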