Inquiry about the Fig-6 #4
Comments
Thanks for your interest. For Figure 6 we don't add noise to the extracted representation: the SSL representation extracted from the pre-trained encoder is fed directly into the pixel generator to generate the images. The "GT Representation Reconstruction" section of the provided Jupyter notebook contains code for this functionality. If you are interested in how random noise is added during training and unconditional generation, you can check the DDPM and DDIM code.
Thanks for your reply.
I'm sorry, I don't quite understand what you mean. Did you input the GT image into MAGE, use SSL(GT image) as the condition for MAGE, and then generate repeatedly with different random seeds?
@mapengsen Thanks for your interest. For Figure 6, we extract the representation from the GT image and generate image pixels conditioned on this representation. You can refer to the provided visualization notebook for more implementation details. For Figure 7, please refer to issue #20.
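For readers following along, the Figure 6 procedure described above can be sketched roughly as follows. This is only a toy illustration, not the repository's code: `encode_representation` and `generate_pixels` are hypothetical stand-ins for the frozen SSL encoder and the representation-conditioned pixel generator, and the sizes, weights, and noise scale are made up. It also assumes that the variation between generated images comes from the generator's own sampling seed, while the representation itself stays noise-free.

```python
import numpy as np

REP_DIM = 8             # hypothetical representation size
IMG_SHAPE = (4, 4, 3)   # hypothetical tiny image shape
N_PIXELS = int(np.prod(IMG_SHAPE))

# Fixed weights standing in for frozen, pre-trained networks (assumption).
_ENC_W = np.random.default_rng(0).standard_normal((N_PIXELS, REP_DIM))
_DEC_W = np.random.default_rng(1).standard_normal((REP_DIM, N_PIXELS))


def encode_representation(image):
    """Hypothetical SSL encoder stand-in: deterministic, no noise is added."""
    rep = image.reshape(-1) @ _ENC_W
    return rep / np.linalg.norm(rep)


def generate_pixels(rep, seed):
    """Hypothetical pixel generator stand-in, conditioned on `rep`.

    Only the generator's sampling is random (controlled by `seed`);
    the conditioning representation is used exactly as extracted.
    """
    rng = np.random.default_rng(seed)
    mean = rep @ _DEC_W
    sample = mean + 0.1 * rng.standard_normal(mean.shape)
    return sample.reshape(IMG_SHAPE)


# 1. Extract the representation from a (toy) ground-truth image.
gt_image = np.random.default_rng(42).random(IMG_SHAPE)
rep = encode_representation(gt_image)

# 2. Generate several images conditioned on the same representation.
samples = [generate_pixels(rep, seed) for seed in range(3)]
```

In this sketch every sample shares the same conditioning vector, so any differences between samples come only from the generator's sampling seed.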
Thank you very much! I've understood.
Thank you for your exceptional work! Could you please clarify if the Representation Reconstruction function depicted in Figure 6 also applies to images that are not part of the ImageNet dataset?
Thanks for your interest! The provided MoCo v3 and MAGE checkpoints are both trained on ImageNet, so they should give reasonable results on natural images that are not contained in ImageNet. However, if the image is too far from the ImageNet distribution, the reconstruction quality can be poor.
Thanks for your reply!
Does anyone know how to generate the visual results in Figure 6? I see that they extract SSL representations from image samples, and the authors don't seem to describe how they combine these features with randomly generated noise in the RDM.