Parallel pool starting very late (not starting at all?) #113

Open
ggrrll opened this issue Jun 18, 2019 · 13 comments

@ggrrll
Contributor

ggrrll commented Jun 18, 2019

Hi,

Recently, it has sometimes happened that with multi_runs nothing happens within the first 20 seconds: I don't see the CPU load ramping up (only 1 CPU is at 100%; I usually check via htop).

Is this known behaviour? I am not sure whether the processes don't run at all or whether the pool simply takes longer to start...

When I interrupt the kernel of my Jupyter notebook, I get this:

[Screenshot 2019-06-18 at 16:30:15: output shown after interrupting the notebook kernel]

(Again, at the time I kill the kernel, only 1 CPU is at 100%.)

Any idea? Thanks

ggrrll changed the title "Parallel pool not starting at all" to "Parallel pool not starting (?)" on Jun 18, 2019
ggrrll changed the title "Parallel pool not starting (?)" to "Parallel pool starting very late (not starting at all?)" on Jun 18, 2019
@ggrrll
Contributor Author

ggrrll commented Jun 18, 2019

I am running 2N executions, with N = n_processes, and between the two 'waves' of CPU load there was a longer waiting time (wt) than expected (other times I have seen wt of a few seconds, while this time wt was about 1 min). I don't know what this might be due to...

@GiulioRossetti
Owner

It seems to be something related to the Python multiprocessing library... I'll try to understand whether something can be done on our side to overcome this issue.

@ggrrll
Contributor Author

ggrrll commented Jun 18, 2019

Thanks, feel free to close it, as it's probably not really an issue...
(I am wondering whether this delay depends, for instance, on the number of n_processes involved, or something similar...)

@ggrrll
Contributor Author

ggrrll commented Jun 19, 2019

OK, I did another run and I can see that after 8 min I still have only 1 CPU busy... so clearly there is an issue here.

I can see from htop that exactly N processes have started, but they are all at 0%.

(The processes look something like: python -m ipykernel_launcher -f ... .json)
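One way to tell "slow to start" apart from "never started" is to submit a trivial task to a fresh pool and wait with a timeout instead of indefinitely. A minimal sketch using only the standard library; worker_pid and pool_responds are hypothetical helpers, not part of the library discussed here. It also prints the multiprocessing start method in use, since how worker processes are created (for example fork vs spawn) differs between platforms and Python versions.

```python
import multiprocessing as mp
import os


def worker_pid(_):
    """Trivial task: report the PID of the worker process that ran it."""
    return os.getpid()


def pool_responds(nprocesses, timeout=10.0):
    """Return True if nprocesses trivial tasks complete within `timeout` seconds."""
    with mp.Pool(processes=nprocesses) as pool:
        pending = pool.map_async(worker_pid, range(nprocesses))
        try:
            pending.get(timeout=timeout)
        except mp.TimeoutError:
            # The worker processes exist but never answered; leaving the
            # with block calls pool.terminate() on them.
            return False
    return True


if __name__ == "__main__":
    print("start method:", mp.get_start_method())
    print("pool responds:", pool_responds(4))
```

If this returns False in the notebook but True when run as a script, that points at the notebook environment rather than at the simulation code.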

@GiulioRossetti
Owner

I assume you are using a Jupyter notebook to run the experiments: have you tried running the same code directly in the interpreter?
Of course, if there is an issue with multiprocessing this will not address it, but it will help remove one of the possible factors from the equation.

@ggrrll
Contributor Author

ggrrll commented Jun 19, 2019

Yes, correct. The weird thing is that I always run it in a notebook, and sometimes it works and sometimes it doesn't (in the failing cases the waiting time is so long that I decide to kill the kernel).

to the interpreter?

Do you mean from a *.py script?

@GiulioRossetti
Owner

Exactly, just try to run the classic:

python your_script.py
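When moving pool-based code from a notebook into a script, the pool creation should sit behind an `if __name__ == "__main__":` guard; with the spawn start method each worker imports the main module, and without the guard it would try to create a pool again. A minimal layout sketch; run_execution and main are hypothetical stand-ins for the actual simulation call.

```python
# your_script.py (hypothetical layout)
import multiprocessing as mp


def run_execution(seed):
    """Hypothetical stand-in for one simulation run."""
    return seed * seed


def main(execution_number=8, nprocesses=4):
    # Pool is created only when main() is called, never at import time.
    with mp.Pool(processes=nprocesses) as pool:
        return pool.map(run_execution, range(execution_number))


if __name__ == "__main__":
    # Child processes that import this module skip this block.
    print(main())
```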

@ggrrll
Contributor Author

ggrrll commented Jun 20, 2019

Well, I did that in the past and it worked (there were two nested parallelization loops)... I will check again just in case.

@ggrrll
Contributor Author

ggrrll commented Jun 20, 2019

I was wondering... are you shutting down the mp pool once multi_runs is done?
I see you are using from contextlib import closing; I don't know it... I am just wondering whether that's enough.
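For reference, contextlib.closing(obj) simply calls obj.close() when the with block exits. For a multiprocessing.Pool, close() stops the pool from accepting new tasks and lets the workers exit once pending work is done; it does not wait for them (that is what join() does). By contrast, using the pool itself as a context manager (with Pool() as pool:) calls terminate() on exit. A minimal sketch with a hypothetical square task:

```python
import multiprocessing as mp
from contextlib import closing


def square(x):
    """Hypothetical task used only for illustration."""
    return x * x


def run_with_closing(values, nprocesses=2):
    # closing() calls pool.close() on exit: no new tasks are accepted and
    # workers exit once queued tasks are done. It does NOT call join().
    with closing(mp.Pool(processes=nprocesses)) as pool:
        results = pool.map(square, values)
    pool.join()  # explicitly wait for the worker processes to exit
    return results


if __name__ == "__main__":
    print(run_with_closing([1, 2, 3]))
```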

@GiulioRossetti
Owner

Actually, it should be.

My feeling is that the issue is related to the maxtasksperchild parameter value.
I set it to 10 to avoid continuously creating new worker processes and reassigning tasks to existing ones. However, it could have side effects: I don't remember testing alternative setups.
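For context, in Python's multiprocessing.Pool, maxtasksperchild is the number of tasks a worker process completes before it exits and is replaced by a fresh process; with the default None, workers live as long as the pool. So compared with the default, a finite value means workers are periodically replaced. The sketch below (task_pid and distinct_workers are hypothetical helpers) counts distinct worker PIDs to make this visible; chunksize=1 is used so that each item counts as one task.

```python
import multiprocessing as mp
import os


def task_pid(_):
    """Return the PID of the worker process that ran this task."""
    return os.getpid()


def distinct_workers(n_tasks, nprocesses, maxtasksperchild):
    """Count how many distinct worker processes served n_tasks tasks."""
    with mp.Pool(processes=nprocesses, maxtasksperchild=maxtasksperchild) as pool:
        # chunksize=1 makes every item a separate task, so the
        # maxtasksperchild limit applies per item.
        pids = pool.map(task_pid, range(n_tasks), chunksize=1)
    return len(set(pids))


if __name__ == "__main__":
    print("maxtasksperchild=None:", distinct_workers(40, 2, None))
    print("maxtasksperchild=10:  ", distinct_workers(40, 2, 10))
```

With 40 tasks and maxtasksperchild=10, at least 4 worker processes must be created, while with None at most 2 are.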

@ggrrll
Contributor Author

ggrrll commented Jun 21, 2019

Not sure... since nothing starts. In htop I see only 1 CPU running, and all the other n_processes are alive but 'silent'... no idea what's going on.

@ggrrll
Contributor Author

ggrrll commented Oct 29, 2019

I am running some simulations again, and I noticed that there are long periods of time (up to a few minutes) during which no parallelization happens...
Is it perhaps waiting for all CPUs from 'the same batch' to finish? Is there a way to skip this and run the processes 'asynchronously'?
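The "same batch" hypothesis can be checked with a little arithmetic. Assuming runs are dispatched in batches of n_processes and each batch must finish before the next starts, the wall time is the sum of the slowest run in each batch; if instead each worker picks up the next run as soon as it is free, most of that idle time disappears. A sketch with made-up durations (batched_makespan and greedy_makespan are hypothetical helpers):

```python
import heapq


def batched_makespan(durations, nprocesses):
    """Wall time if runs go in batches of nprocesses and each batch
    must finish completely before the next one starts."""
    return sum(max(durations[i:i + nprocesses])
               for i in range(0, len(durations), nprocesses))


def greedy_makespan(durations, nprocesses):
    """Wall time if each worker takes the next run as soon as it is free."""
    finish_times = [0.0] * nprocesses  # min-heap of per-worker finish times
    for d in durations:
        earliest = heapq.heappop(finish_times)  # first worker to become free
        heapq.heappush(finish_times, earliest + d)
    return max(finish_times)


print(batched_makespan([10, 1, 1, 10], 2), greedy_makespan([10, 1, 1, 10], 2))
```

For durations [10, 1, 1, 10] on 2 processes, the batched schedule takes 20 time units while the greedy one takes 12: in each batch one CPU sits idle for 9 units waiting for the slow run.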

@GiulioRossetti
Owner

Hi, unfortunately, I haven't had the time to check this issue lately.

I'm not sure how to force the batch-parallel execution to perform an async allocation of the processes... honestly, I'm not even sure that this can be done. If you have time to look at it, it would certainly be a nice improvement for the library; otherwise, I'll try to tackle it as soon as I can (but it could take a while).
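One standard-library pattern that gives this kind of asynchronous allocation, sketched under the assumption that each run is independent: submit all runs to a single pool and consume them with imap_unordered(..., chunksize=1), so a worker that finishes early immediately takes the next run instead of waiting for the rest of its batch. run_execution is a hypothetical stand-in for one simulation, and batched_runs only illustrates the batch-per-pool style for contrast; neither is the library's actual code.

```python
import multiprocessing as mp
import random
import time


def run_execution(seed):
    """Hypothetical stand-in for one simulation run of variable length."""
    time.sleep(random.Random(seed).uniform(0.01, 0.05))
    return seed


def batched_runs(execution_number, nprocesses):
    """Batch style: one pool per batch; each batch waits for its slowest run."""
    results = []
    for start in range(0, execution_number, nprocesses):
        seeds = range(start, min(start + nprocesses, execution_number))
        with mp.Pool(processes=nprocesses) as pool:
            results.extend(pool.map(run_execution, seeds))
    return results


def streamed_runs(execution_number, nprocesses):
    """Single pool: each worker pulls the next run as soon as it is free."""
    with mp.Pool(processes=nprocesses) as pool:
        # list() consumes every result before the pool is terminated on exit.
        return list(pool.imap_unordered(run_execution, range(execution_number),
                                        chunksize=1))


if __name__ == "__main__":
    print(batched_runs(8, 2))
    print(sorted(streamed_runs(8, 2)))
```

Results from imap_unordered arrive in completion order, so any code that relies on the order of the returned executions would need to sort or tag them.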
