Request for Download Link for VideoMAEv2 Pretraining Model Checkpoint #8
Comments
Hi Olga, yes! The provided code uses the checkpoint fine-tuned on SSv2 by default, but it should be able to load any VideoMAE-v2 checkpoint. They do have a pre-trained VideoMAE model, which you can find here. Hope this helps!
Thanks for the swift reply!
I've downloaded the checkpoint, but received the following error when loading it:
Hi Songwei, I've managed to resolve the loading issue with the pretrained models by modifying the loading code. However, I'm now curious about the feature extraction process using the pretrained model, as described in your paper. Specifically, I'd like to know whether the feature extraction was performed similarly to the following code snippet. If so, could you please clarify what value was used for the mask parameter in this context? Was it a tensor of ones (unmasking all patches)? Thank you for your time and assistance! Olga
Updated question: based on this description, I'm wondering whether the feature extraction code for the pretrained model is similar to the following. Could you please confirm whether this is the correct interpretation of the feature extraction process described in your paper? If there's any discrepancy, could you please provide the correct code snippet for feature extraction using the pretrained VideoMAE model? Thank you!
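The original snippets in this exchange were not captured in the thread. As context for the mask question: in the VideoMAE codebase the boolean mask conventionally marks the patches that are *dropped* (True = masked), so keeping every patch visible means an all-False mask rather than a tensor of ones — though the convention should be double-checked against the specific checkpoint's code. A minimal sketch with numpy and hypothetical ViT-B shapes:

```python
import numpy as np

# Hypothetical VideoMAE ViT-B geometry: 16 frames, tubelet size 2,
# 224x224 input, 16x16 patches -> (16/2) * (224/16)**2 = 1568 tokens.
num_frames, tubelet_size, img_size, patch_size = 16, 2, 224, 16
num_tokens = (num_frames // tubelet_size) * (img_size // patch_size) ** 2

# Assumed convention: True = patch is masked out. An all-False mask
# therefore leaves every patch visible for full feature extraction.
mask = np.zeros((1, num_tokens), dtype=bool)
```

Passing such a mask (as a boolean tensor) to the model's forward pass would then extract features over all patches.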
Hi Olga, this is what I did before:
It seems that the only difference is that the input range should be [0, 1] for that function.
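To illustrate the range point: decoded video frames usually arrive as uint8 in [0, 255], so they need to be divided by 255 before any normalization. A minimal numpy sketch (the torch version is analogous):

```python
import numpy as np

# Fake a decoded clip: (T, H, W, C) uint8 frames in [0, 255].
frames = np.random.randint(0, 256, size=(16, 224, 224, 3), dtype=np.uint8)

# Scale to float in [0, 1], the range the preprocessing expects.
frames01 = frames.astype(np.float32) / 255.0
```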
Thanks for the clarification. Super helpful, and I will definitely check my input range. 😀
@songweige I have a follow-up question regarding preprocessing, related to this issue: issue link. It appears that you didn't normalize the features using the mean and standard deviation. Thanks again for all your help. We appreciate it and will definitely acknowledge your help in our project!
Hi Olga, thank you for your kind words; I think you are correct. I mainly followed this function to extract the features from the VideoMAE models and didn't check their training code before. It looks like they did normalization as part of the augmentation during both training and fine-tuning. It would be good to hear from the authors what the proper way to do preprocessing at inference time is!
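For reference, the normalization in question is, in most VideoMAE configurations, the standard ImageNet per-channel mean/std; that is an assumption here, and the exact statistics should be confirmed against the checkpoint's config or transforms. Applied to frames already scaled to [0, 1], it looks like:

```python
import numpy as np

# Standard ImageNet statistics -- assumed; confirm against the repo's
# data transforms before relying on them for a given checkpoint.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize(frames01: np.ndarray) -> np.ndarray:
    """Normalize (T, H, W, C) frames already scaled to [0, 1]."""
    return (frames01 - IMAGENET_MEAN) / IMAGENET_STD

out = normalize(np.full((2, 4, 4, 3), 0.5, dtype=np.float32))
```

Skipping this step at inference while the model was trained with it would shift the input distribution, which may explain small discrepancies in extracted features.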
Hi,
Can you confirm that the model provided in the code is the VideoMAE v2 model fine-tuned on the SSv2 dataset? Additionally, is a pre-trained (not fine-tuned) VideoMAE model available, and if so, can you provide the link?
Thank you for your help!