333. Domain Shift

Domain Shift occurs when the distribution of the training set (source domain) differs from that of the test set (target domain), leading to poor results after deployment. Recent work shows that Transformers are more robust than CNNs to this kind of shift.
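A minimal sketch of the idea with a toy scikit-learn classifier (the data, the non-linear labeling rule, and the [0, 3] / [-3, 0] domains are all invented for illustration): a model fit on the source domain scores well there but degrades sharply on the shifted target domain.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def true_labels(x):
    # The true rule is non-linear: positive class whenever |x| > 1
    return (np.abs(x[:, 0]) > 1.0).astype(int)

X_src = rng.uniform(0.0, 3.0, size=(1000, 1))   # source domain: x in [0, 3]
X_tgt = rng.uniform(-3.0, 0.0, size=(1000, 1))  # target domain: x in [-3, 0]

clf = LogisticRegression().fit(X_src, true_labels(X_src))
print("source accuracy:", clf.score(X_src, true_labels(X_src)))  # high
print("target accuracy:", clf.score(X_tgt, true_labels(X_tgt)))  # far worse
```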

332. TorchServe

Deploying Your Model TorchServe allows you to expose a web API for your PyTorch model that can be accessed directly or from your application. 3 Steps Choose a default handler or author a custom model handler. You will define a…
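As a rough sketch of the custom-handler option (the class name, the model file model.pt, and the numeric JSON payload are assumptions for illustration), a handler subclasses TorchServe's BaseHandler and overrides the pre- and post-processing steps; packaging and serving then happen on the command line.

```python
# Hypothetical custom handler sketch for TorchServe. The default BaseHandler
# loads the model in initialize() and wires preprocess -> inference -> postprocess.
import torch
from ts.torch_handler.base_handler import BaseHandler

class MyHandler(BaseHandler):
    def preprocess(self, data):
        # Each request arrives as a dict with a "data" or "body" field; this
        # sketch assumes the payload is a list of floats.
        rows = [row.get("data") or row.get("body") for row in data]
        return torch.tensor(rows, dtype=torch.float32)

    def postprocess(self, output):
        # TorchServe expects one JSON-serializable entry per request.
        return output.argmax(dim=1).tolist()

# Packaging and serving, roughly:
#   torch-model-archiver --model-name my_model --version 1.0 \
#       --serialized-file model.pt --handler handler.py --export-path model_store
#   torchserve --start --model-store model_store --models my_model=my_model.mar
```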

331. Basics Of Training Large Models

Two Frameworks There are mainly two frameworks for training large-scale deep learning models. Data Parallelism This approach is usually used when your model CAN fit completely into GPU memory: each GPU holds a replica of the model and receives a different batch of data. Model…
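A minimal sketch of data parallelism in PyTorch (the toy model and batch size are made up): nn.DataParallel replicates the model on each visible GPU and splits every input batch across them. The same code also runs on CPU, where it simply skips the wrapping.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

if torch.cuda.device_count() > 1:
    # Replicate the model on each GPU and scatter every batch across them.
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(64, 512, device=next(model.parameters()).device)
out = model(x)  # each GPU processed a different chunk of the 64 samples
print(out.shape)
```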

330. The Minimalist Entrepreneur

3 Messages Here are my top 3 key takeaways from the book “The Minimalist Entrepreneur” by Sahil Lavingia. There are thousands of “creator first, entrepreneur second” MVP (manual valuable process): As you fulfill the first customer cycle, document each part of…

229. Quantization Using PyTorch

Quantization Quantization is a technique that changes the data type used to compute a neural network, enabling faster inference. After you’ve deployed your model, there is no need to backpropagate (which is sensitive to precision). This means that, if a slight decrease in…
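A minimal sketch of post-training dynamic quantization in PyTorch (the toy model is made up): weights of the listed layer types are stored in int8 and dequantized on the fly, trading a little precision for smaller, faster inference.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()  # quantization is applied for inference, not training

# Convert the Linear layers to use int8 weights with dynamic activation quantization.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 784)
print(quantized(x).shape)
```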

228. Pruning Using PyTorch

Pruning State-of-the-art deep learning techniques rely on over-parameterized models, which makes deployment hard when the target device has limited resources. Pruning is used to study the differences between over-parameterized and under-parameterized networks and to sparsify your neural networks. In PyTorch, you…
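A minimal sketch of magnitude pruning with torch.nn.utils.prune (the toy model and the 30% pruning ratio are made up): the smallest-magnitude weights in each Linear layer are zeroed out via a mask, which can then be made permanent.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Zero out the 30% smallest-magnitude weights in every Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)

# Make the pruning permanent, leaving plain (sparse) weight tensors behind.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")

sparsity = (model[0].weight == 0).float().mean().item()
print(f"layer-0 sparsity: {sparsity:.2f}")
```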

227. Polynomial Features

Adding Linear Complexity When we want to train a model, it is easy to imagine cases where straight lines alone cannot capture the patterns in the training data. Polynomial features are useful when you want to add more…
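A minimal sketch with scikit-learn (the quadratic toy data and degree-2 expansion are invented for illustration): a linear model over polynomial features can fit a curve that a plain linear fit cannot.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
# Quadratic target with a little noise
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + rng.normal(scale=0.2, size=200)

linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print("linear R^2:", round(linear.score(X, y), 3))
print("poly   R^2:", round(poly.score(X, y), 3))
```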

226. Training Methods for EBMs

Contrastive Method Push down on the energy of training samples while pulling up on the energies of suitably placed contrastive samples. The disadvantage is that you always need contrastive samples in order to constrain the low-energy region. Regularized Method Push…
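A hypothetical sketch of the contrastive recipe (the energy network, the way contrastive samples are placed, and the margin hinge loss are all assumptions, not a prescribed method): the loss lowers the energy of training samples and raises the energy of contrastive samples until they are separated by a margin.

```python
import torch
import torch.nn as nn

energy = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))  # E(x)
opt = torch.optim.Adam(energy.parameters(), lr=1e-3)

x_pos = torch.randn(128, 2)                 # training samples
x_neg = x_pos + 2.0 * torch.randn(128, 2)   # contrastive samples placed nearby

for _ in range(100):
    e_pos = energy(x_pos).mean()            # energy on training data
    e_neg = energy(x_neg).mean()            # energy on contrastive data
    # Hinge loss: push e_pos down and e_neg up until separated by a margin of 1.
    loss = torch.relu(1.0 + e_pos - e_neg)
    opt.zero_grad()
    loss.backward()
    opt.step()
```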

225. Latent Variable Energy-Based Model

World Model If you haven’t read my previous blog post about the “world model”, please go check it out. Training the world model is a prototypical example of self-supervised learning: learning the mutual dependencies between its inputs. It is said…
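A hypothetical sketch of the latent-variable idea (the architecture, the dimensions, and the gradient-descent inference loop are all invented for illustration): the energy of a pair (x, y) is taken as the minimum over a latent variable z, found here by a few gradient steps on z.

```python
import torch
import torch.nn as nn

class LatentEBM(nn.Module):
    def __init__(self, x_dim=4, y_dim=2, z_dim=2):
        super().__init__()
        self.z_dim = z_dim
        self.net = nn.Sequential(nn.Linear(x_dim + y_dim + z_dim, 64),
                                 nn.ReLU(), nn.Linear(64, 1))

    def energy(self, x, y, z):
        # Scalar energy per (x, y, z) triple
        return self.net(torch.cat([x, y, z], dim=-1))

    def infer(self, x, y, steps=20, lr=0.1):
        # Minimize the energy over the latent variable z for each (x, y) pair.
        z = torch.zeros(x.shape[0], self.z_dim, requires_grad=True)
        for _ in range(steps):
            e = self.energy(x, y, z).sum()
            (g,) = torch.autograd.grad(e, z)
            z = (z - lr * g).detach().requires_grad_(True)
        return z, self.energy(x, y, z)

model = LatentEBM()
x, y = torch.randn(8, 4), torch.randn(8, 2)
z_star, e_min = model.infer(x, y)
print(e_min.shape)  # one (minimized) energy per pair
```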