Sik-Ho Tang | Review — MoCo v3: An Empirical Study of Training Self-Supervised Vision Transformers. #140
Overview
Instability Study of ViT for Self-Supervised Learning: An Empirical Study of Training Self-Supervised Vision Transformers.
MoCo v3 is an incremental improvement of MoCo v1/MoCo v2, studying the instability issue when ViT is used as the backbone for self-supervised learning.
MoCo v3 Using ResNet (Before Using ViT)
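MoCo v3 keeps the query-encoder / momentum-encoder design of its predecessors but abandons the memory queue, contrasting the two augmented crops of each large batch directly. Below is a minimal PyTorch sketch of the symmetrized contrastive (InfoNCE) loss, following the pseudocode in the paper; the temperature value and the variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(q, k, tau=0.2):
    """InfoNCE between queries q and keys k, both [N, C].
    Positives are the matching pairs on the diagonal."""
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    logits = q @ k.t() / tau                           # [N, N] similarities
    labels = torch.arange(q.size(0), device=q.device)  # diagonal targets
    return 2 * tau * F.cross_entropy(logits, labels)   # 2*tau scaling per the paper

# Symmetrized over the two crops x1, x2 of a batch:
#   q1, q2 = f_q(x1), f_q(x2)   # query encoder: backbone + proj + pred MLP
#   k1, k2 = f_k(x1), f_k(x2)   # momentum encoder, no gradient
#   loss = contrastive_loss(q1, k2) + contrastive_loss(q2, k1)
# then f_k is updated as m * f_k + (1 - m) * f_q.
```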
Stability Study for Basic Factors When Using ViT
It is straightforward to replace a ResNet backbone with a ViT backbone. But in practice, a main challenge is the instability of training.
Batch Size
A larger batch is also beneficial for accuracy. Batches of 1k and 2k produce reasonably smooth curves, with 71.5% and 72.6% linear probing accuracy, respectively.
Learning Rate
When the learning rate is smaller, training is more stable, but the result is prone to under-fitting; when it is larger, training becomes unstable and accuracy drops.
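A small sketch of how such a schedule is commonly implemented; the linear scaling rule lr × BatchSize / 256 is the one used in the paper, while the base learning rate, warmup, and total step counts below are placeholder assumptions.

```python
import math

def moco_v3_lr(step, base_lr=1.5e-4, batch_size=4096,
               warmup_steps=10_000, total_steps=250_000):
    """Learning rate with linear warmup followed by cosine decay.
    Peak lr follows the linear scaling rule base_lr * batch_size / 256;
    the step counts here are illustrative, not the paper's values."""
    peak = base_lr * batch_size / 256
    if step < warmup_steps:
        return peak * step / warmup_steps  # linear warmup
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay
```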
Optimizer
Besides the default AdamW, the LAMB optimizer is also studied: it reaches comparable accuracy when the learning rate is chosen appropriately, but it is more sensitive to the learning rate, so AdamW is kept.
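For completeness, a minimal sketch of the AdamW setup under the scaling rule above; the stand-in model and the weight-decay value are illustrative assumptions, not a verbatim recipe from the paper.

```python
import torch
import torch.nn as nn

model = nn.Linear(768, 256)  # stand-in for the ViT encoder
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1.5e-4 * 4096 / 256,  # linearly scaled learning rate (see above)
    weight_decay=0.1,        # illustrative value
)
```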
Tricks for Improving Stability
Random Patch Projection
It is found that a sudden change of gradients (a "spike") causes a "dip" in the training curve.
Training is more stable when the patch projection layer is frozen, i.e., a fixed random patch projection is used instead of a learned one.
The instability happens earlier in the shallower layers.
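A minimal sketch of the trick, assuming the standard convolutional patch embedding of ViT: the projection stays at its random initialization and is excluded from optimization.

```python
import torch.nn as nn

class RandomPatchProjection(nn.Module):
    """ViT patch embedding kept frozen at random initialization.
    Patch size and embedding dimension below are illustrative."""
    def __init__(self, patch=16, in_ch=3, dim=768):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        for p in self.proj.parameters():
            p.requires_grad = False  # never trained: stays random

    def forward(self, x):                    # x: [B, 3, H, W]
        x = self.proj(x)                     # [B, dim, H/patch, W/patch]
        return x.flatten(2).transpose(1, 2)  # [B, num_patches, dim]
```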
Random Patch Projection on SimCLR and BYOL
Random patch projection also improves the stability of SimCLR and BYOL, suggesting the trick is not specific to the MoCo v3 framework.
Experimental Results
Models
Training Time
Self-Supervised Learning Frameworks
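Since the accuracies quoted throughout this review come from linear probing, here is a minimal sketch of that protocol, with illustrative names and dimensions: the pretrained encoder is frozen and only a linear classifier on top is trained.

```python
import torch.nn as nn

def build_linear_probe(encoder, feat_dim=768, num_classes=1000):
    """Freeze the pretrained encoder; only the linear head is trained.
    feat_dim and num_classes are illustrative (ImageNet-1k probing)."""
    for p in encoder.parameters():
        p.requires_grad = False
    encoder.eval()                 # fixed features, no BN/dropout updates
    head = nn.Linear(feat_dim, num_classes)
    return head                    # optimize head.parameters() only
```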
Sik-Ho Tang. Review — MoCo v3: An Empirical Study of Training Self-Supervised Vision Transformers.