I recently read your fascinating paper titled "Self-conditioned Image Generation via Generating Representations" and have a question regarding the training and inference processes of the RCG model, particularly about the image masking strategy.
In the paper, it's mentioned that during training, the pixel generator is trained with partially masked images. However, during inference, images are fully masked. I am curious about how this difference in masking (partial during training and full during inference) affects the model's performance and its ability to reconstruct images.
Your insights into this aspect of the RCG model would be greatly appreciated, as it would deepen my understanding of your novel approach.
Thanks for your interest! During training, the masking ratio is randomly selected from 50%-100%, so it covers both the fully masked and the partially masked scenarios. During inference we use a multi-step parallel decoding strategy: generation starts from a 100% masked image, which is gradually filled in over several steps until all masked tokens have been generated. You might refer to the MaskGIT and MAGE papers for more detailed illustrations of the parallel decoding strategy.
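To make the two regimes concrete, here is a minimal, simplified sketch (not the RCG implementation). It assumes a uniform distribution over [0.5, 1.0] for the training masking ratio (the reply above only gives the 50%-100% range; the actual distribution may differ) and a MaskGIT-style cosine schedule for the number of tokens left masked after each decoding step. `MASK`, `toy_predict`, and all other helper names are hypothetical; `toy_predict` stands in for the pixel generator and simply returns a random token and confidence. Real implementations predict all positions in one forward pass and typically add randomness to the confidence ranking.

```python
import math
import random

MASK = -1  # hypothetical sentinel marking a masked token


def sample_mask_ratio(rng: random.Random) -> float:
    """Training-time masking ratio. Assumption: uniform over [0.5, 1.0]."""
    return rng.uniform(0.5, 1.0)


def mask_tokens(tokens, ratio, rng):
    """Replace round(ratio * len(tokens)) randomly chosen positions with MASK."""
    n_mask = round(ratio * len(tokens))
    masked = list(tokens)
    for p in rng.sample(range(len(tokens)), n_mask):
        masked[p] = MASK
    return masked


def masked_count_schedule(num_tokens, num_steps):
    """Tokens still masked after each step, using a cosine schedule as in MaskGIT."""
    return [
        math.floor(num_tokens * math.cos(math.pi / 2 * t / num_steps))
        for t in range(1, num_steps + 1)
    ]


def toy_predict(tokens, position, rng):
    """Hypothetical stand-in for the generator: (token id, confidence)."""
    return rng.randrange(1024), rng.random()


def parallel_decode(num_tokens, num_steps, predict, rng):
    """Start from a 100% masked sequence and fill it in over num_steps steps."""
    tokens = [MASK] * num_tokens
    for still_masked in masked_count_schedule(num_tokens, num_steps):
        masked_pos = [i for i, t in enumerate(tokens) if t == MASK]
        # Predict a token and a confidence for every currently masked position.
        preds = {i: predict(tokens, i, rng) for i in masked_pos}
        # Keep the most confident predictions; the rest stay masked this step.
        ranked = sorted(masked_pos, key=lambda i: preds[i][1], reverse=True)
        for i in ranked[: len(masked_pos) - still_masked]:
            tokens[i] = preds[i][0]
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    print(masked_count_schedule(256, 8))  # [251, 236, 212, 181, 142, 97, 49, 0]
    out = parallel_decode(256, 8, toy_predict, rng)
    print(MASK in out)  # False: every token has been generated
```

Because the schedule's first step already leaves most tokens masked and later steps leave progressively fewer, each inference step sees a masking ratio that the training-time range can cover, which is why training on 50%-100% masking matches the fully masked starting point of inference.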