Enable torch.compile to work with classification #3758
Conversation
Thanks for your work. LGTM, but how about adding an integration test for this?
Also, referring to the experiment results, there seems to be no speed gain. Do you know why?
This will require more experimentation. Currently, a lot of time is spent at epoch 0 compiling the model. In my opinion, this overhead is relatively high for the current OTX Classification use case. Also, custom models may not be optimized in most cases. I'll have to look into this further and experiment with improving it later. For now, this PR is only about getting compile to work.
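The epoch-0 overhead mentioned above can be made visible by timing the first call to a compiled model (which triggers graph capture and compilation) against a later steady-state call. This is a minimal sketch with a toy model, not the actual OTX classifier; `backend="eager"` is used so the snippet runs anywhere, while a real measurement would use the default `"inductor"` backend.

```python
import time

import torch
import torch.nn as nn

# Toy stand-in for a classification model (illustrative only).
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)
x = torch.randn(4, 3, 32, 32)

eager_out = model(x)

# backend="eager" skips kernel codegen but still traces the graph,
# so the first call is measurably slower than the second.
compiled = torch.compile(model, backend="eager")

t0 = time.perf_counter()
first_out = compiled(x)  # triggers graph capture/compilation
warmup_s = time.perf_counter() - t0

t0 = time.perf_counter()
compiled(x)  # reuses the captured graph
steady_s = time.perf_counter() - t0

print(f"warm-up call: {warmup_s:.4f}s, steady call: {steady_s:.4f}s")
```

The compiled model should produce the same outputs as the eager one; only the first-call latency differs.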
LGTM.
Regarding the results below, as you said, more experiments seem to be needed, since there is no gain in iter_time.
I understand that compilation takes time and can be a bottleneck for e2e time, but the official results show improved training performance, so a much shorter iter_time would be needed to reduce e2e time, as those results suggest.
I also searched some custom experimental results, and torch.compile usually showed gains with large batch sizes or large input resolutions.
As far as I know, we use a (maybe?) large batch size (=64) but a small input resolution (=224) for classification.
I'd suggest extensive experiments to find which cases critically hurt training performance.
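The suggested sweep over batch size and input resolution could be sketched as below: time a training-style forward/backward step for eager vs. compiled variants of a model across a small grid. The model, sizes, and step count here are illustrative assumptions, not the OTX setup, and `backend="eager"` keeps the sketch portable (a real experiment would use the default `"inductor"` backend on the target hardware).

```python
import time

import torch
import torch.nn as nn


def make_model() -> nn.Module:
    # Toy stand-in for a classification model (illustrative only).
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
    )


def iter_time(model: nn.Module, batch: int, res: int, steps: int = 3) -> float:
    """Mean seconds per forward+backward step after one warm-up call."""
    x = torch.randn(batch, 3, res, res)
    model(x)  # warm-up; for compiled models this includes compilation
    t0 = time.perf_counter()
    for _ in range(steps):
        model(x).sum().backward()
    return (time.perf_counter() - t0) / steps


results = {}
for batch in (8, 64):
    for res in (64, 224):
        eager_t = iter_time(make_model(), batch, res)
        compiled_t = iter_time(
            torch.compile(make_model(), backend="eager"), batch, res
        )
        results[(batch, res)] = (eager_t, compiled_t)
        print(f"batch={batch} res={res}: "
              f"eager={eager_t:.4f}s compiled={compiled_t:.4f}s")
```

Comparing the two columns per (batch, resolution) cell would show where compilation starts to pay off, separately from the one-time warm-up cost.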
Summary
How to test
otx train ~~~ --model.torch_compile True
Checklist
License
Feel free to contact the maintainers if that's a concern.