Hi,
I really like your paper's idea of using a GAN discriminator as a feature extractor for perceptual losses to improve MAE pre-training.
I tried to play around with the idea myself, but the codebase is unfortunately incomplete and the implementation details in the paper are lacking, to say the least.
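For context, here is roughly how I interpreted the perceptual-loss part when experimenting: a minimal PyTorch sketch based entirely on my own assumptions. The choice of which discriminator layers to tap, the L1 feature distance, the `blocks` attribute on the discriminator, and the loss weight are all guesses on my part, since the paper does not specify them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiscriminatorPerceptualLoss(nn.Module):
    """Perceptual loss on intermediate features of a frozen GAN discriminator.

    Everything here (which layers to tap, L1 vs. L2 distance, the `blocks`
    attribute on the discriminator) is my own guess, not something the paper
    specifies.
    """

    def __init__(self, discriminator: nn.Module, feature_layers=(2, 5, 8)):
        super().__init__()
        self.disc = discriminator.eval()
        for p in self.disc.parameters():
            p.requires_grad_(False)  # keep the discriminator frozen
        self.feature_layers = set(feature_layers)

    def _features(self, x: torch.Tensor):
        feats = []
        # Assumes the discriminator exposes its conv blocks as `self.disc.blocks`.
        for i, block in enumerate(self.disc.blocks):
            x = block(x)
            if i in self.feature_layers:
                feats.append(x)
        return feats

    def forward(self, recon: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        loss = recon.new_zeros(())
        for f_r, f_t in zip(self._features(recon), self._features(target)):
            loss = loss + F.l1_loss(f_r, f_t)
        return loss


# Hypothetical usage inside the MAE pre-training step:
#   perc_loss_fn = DiscriminatorPerceptualLoss(discriminator)
#   loss = mae_pixel_loss + lam * perc_loss_fn(reconstructed_img, img)
# where `lam` is a loss weight that would have to be tuned.
```

If this is not how the loss is actually computed, clarification in the README or paper would be very helpful.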
Are there any plans to update the repo and/or publish the trained models?
In my opinion, the strong claims in the paper require at least some form of reproducibility. Claiming such strong results without anything to back them up is quite questionable. To clarify: the paper reports an ImageNet-1K fine-tuning accuracy of 88.1% with a ViT-L/16, which would be 0.3% better than the ViT-H/14 at 448 resolution from the original MAE paper.