Hi all, I've been trying to reproduce your work in your alanine_dipeptide_basics notebook by running it as a .py script from the terminal. I removed the n_workers=1 argument from the energy model to speed up the KLL training section, but I've noticed that the program never finishes in this format. I get the same sort of output as from the notebook (all the graphs appear), but the process has to be cancelled manually with a keyboard interrupt to make it exit. From the traceback, it looks like the issue is caused by worker processes that are still alive. The traceback points to:

bgflow/distribution/energy/openmm.py", line 380, in run
    for task in iter(self._task_queue.get, None):

and then to:

python3.7/multiprocessing/queues.py

I've been trying to determine whether this behaviour affects the performance of the generator, but I'm unsure how to solve the problem and I haven't had much experience with multiprocessing. Do you have any ideas about what could be causing the issue?
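For context, the pattern in that traceback reduces to something like the minimal sketch below (generic Python multiprocessing, not bgflow's actual code): a worker that loops over iter(queue.get, None) only exits once it receives a None sentinel, so if the sentinel is never sent the worker blocks forever and the main process hangs at exit waiting to join it.

```python
# Minimal sketch of the generic worker pattern from the traceback (not bgflow code).
import multiprocessing as mp

def worker(task_queue):
    # blocks in task_queue.get() until the sentinel value None arrives
    for task in iter(task_queue.get, None):
        print("processed", task)

if __name__ == "__main__":
    queue = mp.Queue()
    p = mp.Process(target=worker, args=(queue,))
    p.start()
    queue.put("some task")
    queue.put(None)   # without this sentinel the worker never returns and the script hangs
    p.join()
```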
I'm running this with:
CUDA 11.0, driver 450.51.06, on a GeForce GTX 980 Ti
cudatoolkit 10.2.89
Python 3.7
PyTorch 1.9.1
Hi, sorry for the late reply, I only just saw this. This is a known issue that we have not yet fixed (but plan to). It seems to be a race-condition / dead-lock issue. For now, it is best to keep n_workers=1.
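For reference, a minimal sketch of setting that up in the notebook's style is shown below. The class names follow the alanine_dipeptide_basics notebook and the module path in the traceback, but treat the exact constructor signatures as assumptions and check them against your bgflow version.

```python
# Hedged sketch: build the OpenMM target energy with a single worker.
# OpenMMBridge / OpenMMEnergy names follow the alanine_dipeptide_basics notebook;
# check your bgflow version for the exact constructor signatures.
from simtk import openmm, unit                      # OpenMM 7.x import path
from openmmtools.testsystems import AlanineDipeptideImplicit
from bgflow.distribution.energy.openmm import OpenMMBridge, OpenMMEnergy

ala2 = AlanineDipeptideImplicit()
integrator = openmm.LangevinIntegrator(
    300.0 * unit.kelvin, 1.0 / unit.picosecond, 1.0 * unit.femtosecond
)

# n_workers=1 avoids the multiprocessing dead-lock described above;
# values > 1 can leave worker processes hanging at shutdown.
bridge = OpenMMBridge(ala2.system, integrator, n_workers=1)
dim = ala2.system.getNumParticles() * 3             # 22 atoms -> 66 coordinates
target_energy = OpenMMEnergy(dim, bridge)
```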
Any updates on this? When doing energy-based training, the energy evaluation is the slowest part, so it would be really helpful if we could use more workers.
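Until the multi-worker bridge is fixed, the general shape of what we'd want (parallel energy evaluation with workers that shut down cleanly) looks roughly like the sketch below. This is generic Python, not bgflow's API; expensive_energy and batch_energy are hypothetical stand-ins.

```python
# Generic illustration (not bgflow's internals): evaluate energies in parallel
# and shut the pool down cleanly so the interpreter can exit afterwards.
import multiprocessing as mp
import numpy as np

def expensive_energy(x):
    # hypothetical stand-in for a single-point OpenMM energy evaluation
    return float(np.sum(x ** 2))

def batch_energy(batch, n_workers=4):
    pool = mp.Pool(processes=n_workers)
    try:
        return pool.map(expensive_energy, batch)
    finally:
        # explicitly close and join so no worker processes are left dangling
        pool.close()
        pool.join()

if __name__ == "__main__":
    batch = [np.random.randn(66) for _ in range(128)]   # 22 atoms * 3 coordinates
    energies = batch_energy(batch)
    print(len(energies), "energies computed")
```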
@JenkeScheen @jmichel80