Testing code #3
See my implementation below.
Hmm, I'm not sure I quite understand what you're asking for here... do you just want some code to test the model? Also, is the first image you linked the results from the model, or from the test code you pasted?
Hi, thanks for your reply. Yes, that would be great. The image is the result of the test code.
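For anyone looking for a starting point, a minimal test script might look like the sketch below. The class name `SlotAttentionModel`, its constructor arguments, and the output tuple convention are assumptions rather than this repo's confirmed API; adjust them to match the actual code.

```python
import torch
import torchvision.utils as vutils

# Hypothetical import: the module path and class name are assumptions.
from slot_attention.model import SlotAttentionModel

device = "cuda" if torch.cuda.is_available() else "cpu"

model = SlotAttentionModel(resolution=(128, 128), num_slots=7, num_iterations=3)
model.load_state_dict(torch.load("checkpoint.pt", map_location=device))
model.to(device).eval()

# A single input image, preprocessed exactly as during training; a random
# tensor stands in here so the sketch is self-contained.
image = torch.rand(1, 3, 128, 128, device=device)

with torch.no_grad():
    # Assumed output convention: combined reconstruction, per-slot RGB
    # reconstructions, and per-slot alpha masks.
    recon_combined, recons, masks, slots = model(image)

# Tile input, combined reconstruction, and masked per-slot reconstructions.
tiles = torch.cat([image, recon_combined, (recons * masks).squeeze(0)]).cpu()
vutils.save_image(vutils.make_grid(tiles, nrow=tiles.shape[0]), "test_output.png")

# Note: if training normalized images to [-1, 1], un-normalize ((x + 1) / 2)
# before saving, or the output will look washed out.
```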
I also find that training diverges: I get similar-looking (i.e. terrible) results after 100 epochs. In other words, we cannot get the model to train properly with this code. @brydenfogelman, have you been able to successfully train a model with this code?
Hello, I just wanted to follow up on this issue.
Hi! I was able to successfully train the model; the resulting image in the README was from this model. I can try rerunning the model to see if I can replicate the issue you are all having. I may also have introduced a bug in 2fdd396 by switching the LR scheduler to match the paper. I'll test this over the weekend. In the meantime, @greeneggsandyaml, could you try reverting the model back to the exponential LR scheduler and see if that works?
Hi, thanks for the response! Yes, I will try with the exponential scheduler and report back results here when the experiment is complete.
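For reference, the revert being discussed might look like the following sketch. The optimizer settings and schedule constants mirror the Slot Attention paper's reported values, but they are assumptions here, not the repo's confirmed config.

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4)

# Option A: plain exponential decay (the scheduler being reverted to).
# gamma is illustrative, not the repo's exact value.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)

# Option B: paper-style linear warmup followed by exponential decay,
# expressed as a LambdaLR multiplier on the base learning rate.
warmup_steps, decay_rate, decay_steps = 10_000, 0.5, 100_000

def warmup_then_decay(step: int) -> float:
    warmup = min(step / warmup_steps, 1.0)
    return warmup * decay_rate ** (step / decay_steps)

# Uncomment to use Option B instead of Option A:
# scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, warmup_then_decay)

for _ in range(1000):    # training loop sketch
    optimizer.step()     # ...forward/backward elided...
    scheduler.step()     # stepped per training step, not per epoch
```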
As promised, here is an update. I ran the code with the exponential LR scheduler, as discussed above. For those who are interested, here is my Weights and Biases log: https://wandb.ai/lukemelas2/public-experiments/runs/td2j9zcn?workspace=user- Overall, this is great to see. I'll be doing more investigation into this as well.
Hello, I'm back with another update. Also, @brydenfogelman, did you manage to run the code again? I'm finding that sometimes I get results that look good, and sometimes I get results that look bad. Have you seen these sorts of "splotchy" results before? Is it just due to random initialization? It feels to me like there is too much variation to be caused solely by random initialization.
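One way to check whether the variation comes from random initialization alone is to pin every seed and rerun; a minimal sketch, assuming the standard PyTorch and NumPy randomness sources:

```python
import random

import numpy as np
import torch

def set_seed(seed: int) -> None:
    """Pin all common sources of randomness for an apples-to-apples rerun."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # cuDNN autotuning introduces nondeterminism; trade speed for determinism.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(0)
# Two runs with the same seed should now be (nearly) identical; if results
# still vary, the splotchiness is not explained by initialization alone.
```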
@greeneggsandyaml How long did you train for? I think even the original authors found that results can vary with the network and slot initialization; I think this figure from the paper demonstrates that finding. (Looking at the caption again, I also realized that I didn't increase the number of slot iterations at test time; increasing this would probably make the results look better.) Here's an image from one of my earlier experiments where it did randomly learn to separate the background. They also trained their model for significantly longer than I trained it here (5 days of wall-clock time). My best guess is that increasing the number of slot iterations at test time will improve the visualizations. What are your thoughts, @greeneggsandyaml?
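If the model exposes its iteration count the way many Slot Attention implementations do, bumping it at test time might look like the sketch below (continuing from the test sketch earlier in the thread; the attribute path is an assumption about this repo):

```python
import torch

model.eval()
# Assumed attribute path; Slot Attention shares weights across iterations,
# so the module can be unrolled for more steps at inference than in training.
model.slot_attention.num_iterations = 5  # e.g. trained with 3, evaluate with 5

with torch.no_grad():
    recon_combined, recons, masks, slots = model(image)
```

Nothing needs retraining here: because the refinement is recurrent with tied weights, the iteration count is a free inference-time knob.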
Hi, how did you solve the problem? I used the test code from above but get gray-scale output. I checked the images in the wandb log files, and they look acceptable. Is there anything wrong with the testing code above? How should I change it? Thank you.
Hi, this is a follow-up: I know where the test code above might go wrong. Depending on how you save your checkpoint file, the loaded state_dict might not match the model's state_dict. You should check the state_dict keys while loading the model. :)
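A concrete way to run that check: inspect the checkpoint's keys against the model's, and strip wrapper prefixes before loading. The sketch below assumes a Lightning-style checkpoint layout, which is a guess about how this repo saves; `model` is the instantiated network from the earlier sketch.

```python
import torch

ckpt = torch.load("checkpoint.ckpt", map_location="cpu")

# Lightning-style checkpoints nest the weights under "state_dict";
# a plain torch.save(model.state_dict()) does not. Handle both.
state_dict = ckpt["state_dict"] if isinstance(ckpt, dict) and "state_dict" in ckpt else ckpt

# Compare a few keys from each side before loading.
print(list(state_dict)[:3])
print(list(model.state_dict())[:3])

# Strip wrapper prefixes (e.g. "module." from nn.DataParallel, or "model."
# from a LightningModule that holds the network as `self.model`).
for prefix in ("module.", "model."):
    state_dict = {k[len(prefix):] if k.startswith(prefix) else k: v
                  for k, v in state_dict.items()}

missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing:", missing)
print("unexpected:", unexpected)
```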
@ZiwenZhuang As mentioned above, the LR scheduler change ended up breaking subsequent runs. I'll try to push the fixes to the repo later today.
Hi, thank you very much for the really nice implementation! I have trained the model for 100 epochs and the evaluation results look nice. I was wondering if there's also testing code available. I implemented my own, but I get results such as the image below.
Thank you very much for your reply.