When running distributed code, worker processes do not, by default, inherit the project environment of the primary process. That is, `julia --project -p 4` creates a pool of workers that are started without `--project`. The same holds for `addprocs(4)`. The only solution I'm aware of is to pass the project to the workers explicitly through `addprocs`.
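A workaround of this kind might look like the minimal sketch below. It assumes the `exeflags` keyword of `addprocs` and uses `Base.active_project()` to locate the primary's project file. The worker count of 4 only mirrors the example above, and this is not necessarily the exact code originally posted.

```julia
using Distributed

# Project file currently active on the primary process.
project = Base.active_project()

# `exeflags` is appended to each worker's `julia` command line, so the
# workers start with the same --project as the primary (assumption: the
# project path is also valid on the machine where the workers run).
addprocs(4; exeflags = "--project=$project")

# Show which project each worker actually ended up with.
for pid in workers()
    println(pid, " => ", remotecall_fetch(Base.active_project, pid))
end
```

Passing the path explicitly, rather than a bare `--project`, avoids depending on the directory in which the workers happen to start.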
1. The fact that the `--project` flag is not propagated from the command line is non-intuitive.
2. The `addprocs` solution is, in my opinion, both inelegant and non-intuitive.
3. The errors caused by this are non-intuitive to troubleshoot.
Point 3 is what I think makes this worth posting as an issue. The first two items are an annoyance, but the real problem is that when users encounter this behavior, the error messages appear to contradict obvious facts. If the user is running code that is not registered, workers will report that a module does not exist, despite the primary process having no problem calling functions from it. Worse still, if a package is in the machine's global registry, but the local package version is different, then the workers will run the incorrect version of the code! This can result in a Sisyphean experience of endlessly altering a distributed calculation and having the result remain the same (because the workers do not run your new code, despite it being loaded on the primary process).
Is there some compelling reason that the `--project` flag should not propagate to the workers by default? If so, would it not be appropriate to at least raise a warning if a worker loads a module that is not identical to the one loaded on the primary?
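To make the idea of a warning concrete, here is a rough user-level sketch of such a check. The helper `warn_on_project_mismatch` is hypothetical and not part of `Distributed`. It compares only the active project file, not the loaded module versions, and it assumes that workers and primary see the same file system.

```julia
using Distributed

# Hypothetical helper (not part of Distributed): warn when a worker's
# active project file differs from the one on the primary process.
function warn_on_project_mismatch(pids = workers())
    primary = Base.active_project()
    mismatched = Int[]
    for pid in pids
        # Ask the worker which project file it is using.
        remote = remotecall_fetch(Base.active_project, pid)
        if remote != primary
            @warn "Worker uses a different project than the primary" pid primary remote
            push!(mismatched, pid)
        end
    end
    return mismatched  # ids of workers whose project differs
end
```

Calling `warn_on_project_mismatch()` right after `addprocs` would surface the mismatch before any distributed code runs, instead of through a confusing missing-module error later.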