Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wreck: better integration of global lwj.environ environment for wreck jobs #1405

Merged
merged 4 commits into from
Apr 1, 2018

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Mar 31, 2018

I was thinking that perhaps reducing content in per-job environment lwj.x.y.ID.environ might reduce the size of content store with many jobs. Turns out in most cases of submitting a long series of jobs, the environment in the KVS is identical and squashed by the content-store, but I thought I'd push up the work I did in support of better integration of the "global" job environment lwj.environ, which might be useful if each job has a slightly different environment, or just for better control of the environment.

This PR adds support to wreckrun/submit to first check lwj.environ table by default and only submit any different variables as the per job environment. A set of utility commands flux wreck setenv,getenv,unsetenv are provided to set and manipulate this global environment. A new option -S, --skip-env is also added to flux-wreckrun and submit to skip exporting the environment completely, so that the global environment is always used.

To export the current environment to all future jobs, use flux wreck setenv all. If you want to run all jobs using this environment (FLUX_URI and other Flux env vars are set explicitly per-job by wrexecd), then pass -S, --skip-env in flux-submit or flux-wreckrun and avoid the extra KVS fetch of the global lwj.environ, otherwise any environment differences are exported to the job in the job-specific env table.

E.g.:

(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck getenv
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck setenv FOO=bar
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck getenv
FOO=bar
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck unsetenv FOO
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck getenv
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck setenv all    
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck getenv | wc -l
84
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreckrun -n4 /bin/true
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux kvs get lwj.0.0.1.environ
{ }
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ FOO=bar flux wreckrun -n4 /bin/true
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux kvs get lwj.0.0.2.environ
{ "FOO": "bar" }
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ FOO=bar flux wreckrun -S -n4 /bin/true
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux kvs get lwj.0.0.3.environ
{ }
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck getenv LANG
LANG=C
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck setenv LANG=en_US
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck getenv LANG
LANG=en_US
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreckrun -S -n4 printenv LANG
en_US
en_US
en_US
en_US

@dongahn
Copy link
Member

dongahn commented Mar 31, 2018

Excellent! Thanks @grondo. I am sure this in combination with R_lite instead of rdl emitted from the scheduler will make the memory size and growth day and night difference. At some point, it seems measuring the actual differences would provide a good data point towards the final design!

@coveralls
Copy link

coveralls commented Mar 31, 2018

Coverage Status

Coverage increased (+0.05%) to 78.928% when pulling 42b982e on grondo:wreck-setenv into 1e7d621 on flux-framework:master.

@garlick
Copy link
Member

garlick commented Mar 31, 2018

This is great! Will merge once travis finishes.

@grondo
Copy link
Contributor Author

grondo commented Mar 31, 2018

This may not make as much difference as expected, unless for some reason each flux submit has a different environment. I tried out the soak test and discovered that the environment for all jobs was squashed by content-store (so that was working as designed!)

In addition to .rdl key, I'm now thinking that another source of content growth might be the job "lwj" directories themselves, given how they are built up piecemeal, so each job directory may have several versions as new directory entries are created and committed. I'm wondering if another short-term hack would be to precreate all know keys and directories in the lwj.x.y.z dir at job.create time by the job module...

@grondo
Copy link
Contributor Author

grondo commented Mar 31, 2018

Had to force-push a fix for the new tests under --chain-lint. Hopefully should pass Travis now.

@garlick
Copy link
Member

garlick commented Mar 31, 2018

I'm wondering if another short-term hack would be to precreate all know keys and directories in the lwj.x.y.z dir at job.create time by the job module...

Do you mean have the commands stop writing to the KVS, and instead send their info to the job module to be put in (like request parameters, environment, etc?)

@grondo
Copy link
Contributor Author

grondo commented Mar 31, 2018

Do you mean have the commands stop writing to the KVS, and instead send their info to the job module to be put in (like request parameters, environment, etc?)

That's too big a change for this implementation I think.

I think actually I retract my previous statement. I was at first thinking that if we precreate all directory entries for the job (e.g. rank 0-n dirs, exit_status, etc), then the directory object itself would not have as many revisions, but I forgot for a second how the kvs works, sorry...

@garlick
Copy link
Member

garlick commented Mar 31, 2018

restarted clang build that hung here

PASS: t4000-issues-test-driver.t 2 - t0505-msg-handler-reg.lua
PASS: t4000-issues-test-driver.t 3 - t0821-kvs-segfault.sh

@garlick
Copy link
Member

garlick commented Apr 1, 2018

I think this just needs a rebase? I can take care of it for you @grondo if you're already heading out of town.

grondo added 4 commits April 1, 2018 02:58
wreck execution first reads a global lwj.environ table and uses
that as the default environment for all jobs, overridden by any
environment variables encoded in the per-job `environ` table.

Currently, flux-wreckrun and flux-submit do not check for lwj.environ
and instead export the entire (filtered) current environment to each
job, so the lwj.environ is largely going ignored.

Fix this by first reading a "default environment" from lwj.environ
in wreckrun/submit, and only push variables that are not already
the same as the default env to the job. This should cut down on
KVS size for large numbers of jobs with largely the same, but slightly
different, environments. (If all environment tables are the same, then
this change does nothing since all environ entries in KVS will be
squashed by CAS)
Add an option -S, --skip-env to submit and wreckrun to skip the
export of current environment to the job. This can be used to speed
up submit/wreckrun when the global lwj.environ is sufficient, since
that default table doesn't need to be fetched from the KVS before
submitting the job to the instance.
Add flux-wreck setenv, getenv, unsetenv commands to set, get, and
manipulate the global wreck environment under lwj.environ.
Add tests for wreck global environment lwj.environ, and corresponding
commands to manipulate and use that environemnt.

Note: the test is split from t2000-wreck.t but still prefixed with
t2000 for easier test devlepment, parallel tests, etc. Since this is
short-lived code, it is probably ok for now.
@grondo
Copy link
Contributor Author

grondo commented Apr 1, 2018

Rebased. Thanks!

@garlick
Copy link
Member

garlick commented Apr 1, 2018

Restarted another build hung in the same t4000-issues test as above.

@garlick garlick merged commit 4a51fe1 into flux-framework:master Apr 1, 2018
@grondo
Copy link
Contributor Author

grondo commented Apr 1, 2018

Thanks, @garlick!

@grondo grondo deleted the wreck-setenv branch April 26, 2018 16:23
@grondo grondo mentioned this pull request May 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants