Abstract
Siamese networks have become a common structure in various recent models for unsupervised visual representation learning. However, previous works rely on one of the following strategies to prevent all outputs from “collapsing” to a constant:
- Negative Sampling
- Large Batches
- Momentum Encoders
This paper shows that none of these components is necessary: a simple Siamese network can avoid collapse using only a stop-gradient operation.
Architecture
The architecture takes two augmented “views” of the same image. Both views pass through the same encoder (shared weights), and a prediction MLP is applied after the encoder on the first view to transform its output so that it matches the output of the second view. The model minimizes the negative cosine similarity between the first view’s prediction and the second view’s encoder output, applying stop-gradient to the second view so no gradient flows through that branch. The loss is then symmetrized by swapping the roles of the two views.
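Below is a minimal PyTorch-style sketch of one training step under these assumptions; the `encoder` and `predictor` here are small stand-in MLPs rather than the paper’s actual backbone and projection/prediction heads, and the random tensors stand in for two augmented views of a batch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def neg_cosine(p, z):
    """Negative cosine similarity; stop-gradient is applied to z via detach()."""
    z = z.detach()                      # stop-gradient: no gradient flows through this branch
    p = F.normalize(p, dim=1)
    z = F.normalize(z, dim=1)
    return -(p * z).sum(dim=1).mean()

# Stand-in encoder f (backbone + projection) and predictor h (prediction MLP).
encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 64))
predictor = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 64))
opt = torch.optim.SGD(list(encoder.parameters()) + list(predictor.parameters()), lr=0.05)

# x1, x2 stand in for two augmented views of the same batch of images.
x1, x2 = torch.randn(8, 128), torch.randn(8, 128)

z1, z2 = encoder(x1), encoder(x2)       # shared-weight encoder applied to both views
p1, p2 = predictor(z1), predictor(z2)   # predictor transforms one view to match the other

# Symmetrized loss: each view takes a turn as the (stop-gradient) target.
loss = neg_cosine(p1, z2) / 2 + neg_cosine(p2, z1) / 2
opt.zero_grad()
loss.backward()
opt.step()
```

Note that `detach()` is the only mechanism preventing collapse here: there are no negative pairs, no large-batch requirement, and no momentum encoder in this sketch.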