Support for Tensorboard #65
Comments
Hi @remicres, maybe this repo could be useful: https://github.com/RustingSword/tensorboard_logger
TensorBoard support (which is not developed yet) would let the user follow the computed metrics (accuracy, precision, f-score, loss value, ...) during training. I don't know whether it would be useful for monitoring GPU/CPU usage.
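For illustration only, here is a minimal sketch of what following those metrics could look like with the TensorFlow 2 summary API. It is not OTBTF code: the helper names `classification_metrics`, `make_writer` and `log_scalars`, and the use of confusion counts as input, are assumptions made for the example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F-score from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    fscore = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "fscore": fscore,
    }


def make_writer(logdir):
    """Create a TensorBoard event writer for the given log directory."""
    # Imported lazily so the metric helper above works without TensorFlow.
    import tensorflow as tf

    return tf.summary.create_file_writer(logdir)


def log_scalars(writer, step, metrics):
    """Write each metric as a TensorBoard scalar at the given training step."""
    import tensorflow as tf

    with writer.as_default():
        for name, value in metrics.items():
            tf.summary.scalar(name, value, step=step)
    writer.flush()
```

In a training loop, one would create the writer once with `make_writer("logs/run1")` (a hypothetical path), call `log_scalars(writer, step, classification_metrics(tp, fp, fn, tn))` after each evaluation, and then point `tensorboard --logdir logs` at the directory.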
While @vidlb this looks really useful to implement the feature in
Hi @remicres, Thanks for your reply! You're right; TensorBoard doesn't show much helpful GPU info out of the box. I came across it when I googled how to improve GPU utilization: https://www.tensorflow.org/guide/gpu_performance_analysis. I was thinking TensorBoard could be helpful. And yes, as you mention, that requires the TensorBoard Profiler plugin. It does not work with the current OTBTF/SR4RS because of the different TF versions, right?
I would also like to add the I still haven't dived into the TF Python yet, but I will likely do so later. For now, I look forward to the OTBTF-3.0! Thanks a lot,
Hi @FerdinandKlingenberg, maybe you should try with TF 2.4.3 and CUDA 11.0.3 (with TF_CUDA_COMPUTE_CAPABILITIES=7.0)? But it's also possible that the low usage is due to your shared GPU setup, in which case it would require some additional configuration on the server side... Good luck!
Hi @vidlb, A somewhat late follow-up. I tried your suggestion earlier, unfortunately with no change. After more testing with other models, I think it is instead related to the patch size/grid size I am using. @remicres, my part of the problem is solved. I will let you decide if you want to close this issue, given the headline is still valid. Thank you both a lot,
Dear Rémi,
First of all, thank you for this great module!
I notice I have very low GPU utilization during my training, and I would like to monitor it using TensorBoard to find the bottleneck. I see issue #26 is closed, but I hope it can be done like you did for SR4RS.
Thanks,
Ferdinand Klingenberg
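Until TensorBoard support exists, one rough way to watch GPU utilization during training is to poll `nvidia-smi` from a small script. The sketch below is only an illustration and assumes an NVIDIA driver with `nvidia-smi` on the PATH; the function names and the dictionary keys are made up for this example.

```python
import subprocess

# Fields requested from nvidia-smi, in the order they appear in each CSV row.
QUERY_FIELDS = ("utilization.gpu", "memory.used", "memory.total")


def parse_gpu_stats(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output, one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(QUERY_FIELDS):
            continue  # skip malformed lines
        try:
            util, used, total = (float(v) for v in values)
        except ValueError:
            continue  # skip rows with non-numeric values such as "[N/A]"
        stats.append(
            {"util_percent": util, "mem_used_mib": used, "mem_total_mib": total}
        )
    return stats


def query_gpu_stats():
    """Run nvidia-smi once and return the parsed per-GPU statistics."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Calling `query_gpu_stats()` every few seconds in a separate terminal while the training runs gives a coarse utilization trace. On a shared GPU the numbers include other users' workloads.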