Question on Masking ratio #47
Comments

Hi, thanks for sharing the PyTorch implementation! I am curious how you selected the statistics for the varied masking ratios. In the paper, you mention "a truncated Gaussian distribution centered at 0.55, left truncated by 0.5, and right truncated by 1." What is the motivation for using such a distribution? Why not use the cosine schedule as done in MaskGIT? Thank you!

Thanks for your interest! The masking ratio is left truncated at 0.5 so that we can always drop 50% of the input tokens in the ViT encoder, which largely saves computation (a similar idea to MAE). In Table 5 of the paper, we show ablations on the center and std of the Gaussian distribution. We also tried a cosine masking ratio schedule similar to MaskGIT, and the performance is slightly worse.

Thanks for your reply! I am also curious about training convergence and finding the best model among the variants. I think the best eval loss could vary with different masking strategies. How can I find the best masking strategy when I conduct these experiments? For example, if I choose the truncated normal with mu=0.55 and std=0.25, should I run the training until it converges, check the FID score, and then run another experiment?

Our evaluation protocol is based on both FID and linear probing accuracy -- once we train a model with certain hyper-parameters, we evaluate it on ImageNet and pick the best hyper-parameters based on FID and linear probing.

Thanks again for your reply. Regarding linear probing, have you tried using the CLS token output instead of average-pooling the rest of the encoder output features? I saw that in your code, but I wondered how it might affect performance.

We tried using the CLS token. However, the performance is not very stable -- normally it achieves similar performance to the average-pooled features, but occasionally it gets very poor accuracy (~10%). Therefore we chose the global average-pooled feature for stability.
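For concreteness, here is a minimal NumPy/SciPy sketch (not the repository's code; all names are invented for illustration) of the scheme discussed in this thread: sample a masking ratio from a Gaussian centered at 0.55 with std 0.25, truncated to [0.5, 1], and drop half of the patch tokens MAE-style before the encoder:

```python
import numpy as np
from scipy.stats import truncnorm

def sample_masking_ratio(mu=0.55, std=0.25, lo=0.5, hi=1.0, rng=None):
    """Sample one masking ratio from N(mu, std^2) truncated to [lo, hi].

    Left-truncating at 0.5 guarantees at least half the tokens are masked,
    so the encoder can always drop 50% of its input tokens (as in MAE)."""
    # scipy's truncnorm takes the truncation bounds in std units around loc.
    a, b = (lo - mu) / std, (hi - mu) / std
    rng = rng if rng is not None else np.random.default_rng()
    return float(truncnorm.rvs(a, b, loc=mu, scale=std, random_state=rng))

def drop_tokens(tokens, keep_frac=0.5, rng=None):
    """Keep a random keep_frac of the patch tokens (MAE-style token dropping)."""
    rng = rng if rng is not None else np.random.default_rng()
    n = tokens.shape[1]
    keep = np.sort(rng.permutation(n)[: int(n * keep_frac)])
    return tokens[:, keep]

rng = np.random.default_rng(0)
ratio = sample_masking_ratio(rng=rng)             # always in [0.5, 1.0]
x = rng.normal(size=(2, 196, 768))                # (batch, tokens, dim)
x_kept = drop_tokens(x, keep_frac=0.5, rng=rng)   # shape (2, 98, 768)
```

The `a, b = (lo - mu) / std, (hi - mu) / std` conversion is the detail that is easy to get wrong: `scipy.stats.truncnorm` expects the truncation bounds expressed in standard deviations relative to `loc`, not in raw masking-ratio units.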
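As a toy illustration of the two linear-probing readouts compared above (hypothetical shapes, not the repository's code): with encoder outputs laid out as one leading CLS token followed by patch tokens, the two candidate features are

```python
import numpy as np

# Toy encoder output: batch 2, 1 CLS token + 4 patch tokens, embedding dim 3.
tokens = np.arange(2 * 5 * 3, dtype=float).reshape(2, 5, 3)

cls_feature = tokens[:, 0]           # CLS-token readout, shape (2, 3)
gap_feature = tokens[:, 1:].mean(1)  # global average pooling over patch tokens

print(cls_feature[0])  # [0. 1. 2.]
print(gap_feature[0])  # [7.5 8.5 9.5]
```

A linear classifier is then trained on one of these frozen features; per the discussion above, the averaged-patch-token variant was chosen for its stability.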