Training GPU+CPU Utilization Stops #20
@vjsrinivas:
I'm trying the training procedure as laid out in the README file, and ran

```
CUDA_VISIBLE_DEVICES=0 python main.py &>log &
```

It seems to run fine up until near the end of the first epoch, where the GPU and CPU utilization completely stop. The utilization never recovers, so the first epoch never actually finishes.

Here is the output from my log: …

My system specs as well:

OS: Pop!_OS 20.04 LTS x86_64
CPU: AMD Ryzen 7 2700X (16) @ 3.7 GHz
GPU: NVIDIA GeForce RTX 2070 SUPER
Memory: 16017 MiB
CUDA: 10.2

@vjsrinivas:
I believe it might be related to another issue I was having with evaluating detections on the hicodet repo. When I run …
@fredzzhang:
Hi @vjsrinivas,

Thanks for reporting the problem. I did encounter something similar in a new environment, different from the one the repo was developed in. You are correct, the problem seems to be related to multiprocessing. The code was developed under …

Cheers,
Fred
@vjsrinivas:
Thanks for the quick response, @fredzzhang.
@fredzzhang:
The OS could be the reason. I had the same issue when I was running on Ubuntu 20.04 LTS.

See if disabling the multiprocessing works. Start with the data loader: by specifying the number of workers as 0, the data is loaded in the main process and no worker subprocesses are spawned.

Let me know if this solves the issue.

Cheers,
Fred
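For context, a minimal sketch of what that change looks like, assuming the repository uses a standard PyTorch DataLoader; the dataset class below is a placeholder, not the repo's actual HICO-DET loader:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class DummyDataset(Dataset):
    """Placeholder dataset standing in for the repo's HICO-DET loader."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return torch.tensor(idx)

# num_workers=0 keeps data loading in the main process, so no worker
# subprocesses are forked; this avoids fork-related hangs that can
# appear once CUDA or internal thread pools have been initialised.
loader = DataLoader(DummyDataset(), batch_size=4, num_workers=0)

for batch in loader:
    print(batch)
```

Loading with num_workers=0 is slower, but it is immune to worker-process hangs, which makes it a useful first diagnostic.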
@vjsrinivas:
Thank you for your solution. It seems to have worked! I was also working on a quick fix for this, and got the multiprocessing to work by modifying …:

```python
"""
The Australian National University
Australian Centre for Robotic Vision
"""
import multiprocessing
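# Force the "spawn" start method before CUDA or DataLoader workers
# are initialised; the default "fork" can deadlock in that situation.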
multiprocessing.set_start_method("spawn", force=True)
import time
```

… and replacing … That said, I have no idea if this fix will cause problems on other distros. I used this as a reference.
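One general Python caveat with the spawn start method (not specific to this repo): spawned workers re-import the main module, so the script's entry point has to be guarded, roughly like this:

```python
import multiprocessing

def main():
    # training / evaluation entry point goes here
    ...

if __name__ == "__main__":
    # Without this guard, every spawned worker re-imports the module
    # and would re-run the top-level training code.
    multiprocessing.set_start_method("spawn", force=True)
    main()
```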
@fredzzhang:
Thank you very much for the reference! I'll update …

Fred