When you want to load your data for training, the data preparation pipeline typically looks like the following:
- Randomly shuffle your data
- Split it into batches
- Iterate over the batches during training
Although you can do all of this manually, it becomes tedious and error-prone as the dataset grows. A minimal sketch of the manual approach is shown below.
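Here is what that manual pipeline might look like. This is only a sketch; the toy tensor and batch size are made up for illustration:

```python
import torch

# Toy data for illustration: 10 samples, 3 features each (hypothetical).
data = torch.arange(30, dtype=torch.float32).reshape(10, 3)
batch_size = 4

# 1. Randomly shuffle the rows.
shuffled = data[torch.randperm(data.size(0))]

# 2. Split into batches (the last batch may be smaller).
batches = [shuffled[i:i + batch_size]
           for i in range(0, shuffled.size(0), batch_size)]

# 3. Iterate.
for batch in batches:
    ...  # forward pass, loss, backward pass, etc.
```

Every extra detail, such as reshuffling each epoch or keeping features paired with labels, adds more code you have to write and maintain.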
That is where `DataLoader` in PyTorch comes in handy. It handles the shuffling and batching for you, so you can go straight to the iteration phase.
Its full constructor signature looks like this:
```python
DataLoader(dataset, batch_size=1, shuffle=False, sampler=None,
           batch_sampler=None, num_workers=0, collate_fn=None,
           pin_memory=False, drop_last=False, timeout=0,
           worker_init_fn=None, *, prefetch_factor=2,
           persistent_workers=False)
```
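In practice you usually set only a handful of these arguments. A minimal usage sketch, with a made-up `TensorDataset` standing in for real data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: 100 samples, 3 features and a binary label each.
features = torch.randn(100, 3)
labels = torch.randint(0, 2, (100,))
dataset = TensorDataset(features, labels)

# shuffle=True reshuffles the data at the start of every epoch;
# batch_size controls how many samples each iteration yields.
loader = DataLoader(dataset, batch_size=16, shuffle=True)

for batch_features, batch_labels in loader:
    ...  # forward pass, loss, backward pass, etc.
```

Setting `drop_last=True` discards a final incomplete batch, and `num_workers > 0` loads batches in background worker processes; both are common tweaks once basic training is up and running.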