331. Basics Of Training Large Models

Two Parallelism Strategies

There are two main parallelism strategies for training large-scale deep learning models.

  1. Data Parallelism
    This strategy is usually chosen when your model CAN fit completely into a single GPU's memory. The full model is replicated on every GPU, each replica processes a different batch of data, and the resulting gradients are synchronized (typically averaged) across GPUs so all replicas stay identical.
  2. Model Parallelism
    This strategy is usually chosen when your model CAN'T fit completely into a single GPU's memory. The model's layers are partitioned across GPUs: each GPU receives activations from the previous GPU, runs its own layers, and sends its outputs on to the next.
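The data-parallel idea above can be sketched in plain Python. This is a toy simulation, not a real multi-GPU setup: the "devices" are just loop iterations, the model is a single scalar weight for y = w * x, and the names (`data_parallel_step`, `grad_mse`) are illustrative, not from any library. The key point it demonstrates is that each replica computes a gradient on its own shard of the batch, and the averaged gradient drives one shared weight update.

```python
def grad_mse(w, shard):
    # dL/dw for L = mean((w*x - y)^2) over one shard of the batch.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, batch, num_devices, lr=0.01):
    # Split the batch into one shard per "device".
    shards = [batch[i::num_devices] for i in range(num_devices)]
    # Each device computes a local gradient on its shard...
    local_grads = [grad_mse(w, shard) for shard in shards]
    # ...then an all-reduce averages them so every replica applies
    # the same update and stays in sync.
    avg_grad = sum(local_grads) / num_devices
    return w - lr * avg_grad

# Data generated by the true relation y = 3x; start from w = 0.
batch = [(x, 3 * x) for x in range(1, 9)]
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, batch, num_devices=4)
print(round(w, 3))  # converges toward 3.0
```

With equal-sized shards, the average of the per-shard gradients equals the full-batch gradient, which is why training behaves the same as on one device with a larger batch.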
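The model-parallel idea can likewise be sketched in plain Python. Again a toy simulation with no real GPUs: each "device" holds only its own subset of the layers, and a forward pass hands the activation from one device's partition to the next, standing in for the GPU-to-GPU transfers described above. The partition layout and function names (`make_layer`, `forward`) are made up for illustration.

```python
def make_layer(scale, shift):
    # A trivial "layer": an affine map on a scalar activation.
    return lambda a: scale * a + shift

# Four layers partitioned across two "devices", two layers each.
# No single device holds the whole model.
device_partitions = [
    [make_layer(2, 1), make_layer(3, 0)],     # "GPU 0"
    [make_layer(1, -2), make_layer(0.5, 4)],  # "GPU 1"
]

def forward(x):
    activation = x
    for layers in device_partitions:
        # In a real setup this loop body runs on a different GPU,
        # and `activation` is copied between devices at this boundary.
        for layer in layers:
            activation = layer(activation)
    return activation

print(forward(1.0))  # 1 -> 3 -> 9 -> 7 -> 7.5
```

Note that a naive split like this keeps only one device busy at a time; practical systems pipeline multiple micro-batches through the partitions to keep all GPUs working.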