Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement per-job private KVS namespace #1460

Closed
wants to merge 6 commits into from

Conversation

garlick
Copy link
Member

@garlick garlick commented Apr 13, 2018

Now that we have @chu11's excellent work on private KVS namespaces merged, it seems useful to implement per-job "guest" namespaces as envisioned in RFC 16. This PR arranges for the job module to create a guest namespace named job<jobid> during job creation, and to "reap" it when the job reaches a terminal state.

During execution, the job may access the the namespace via the FLUX_JOB_KVSPATH environment variable, which is set to the direct namespace path, e.g. ns:job<jobid>/.. The enclosing instance can access it via the lwj.X.Y.<jobid>.guest symlink. After the job completes or fails, the namespace is "reaped", e.g. the symlink is replaced with a snapshot of the final content of the private namespace and the namespace is deleted.

Migrating job data to a private namespace can potentially reduce the bottleneck on serialized commits in the primary KVS namespace and protects the instance against an unintentional KVS DoS by programs that use the KVS heavily.

In this PR, the only data placed in the private namespace is the flux.local_uri and flux.remote_uri values recently added in support of flux wreck uri. It is empty if the job is not a flux instance. In a future PR we may want to migrate stdio and other per-job info.

garlick added 5 commits April 13, 2018 11:33
Problem: valgrind test fails when private KVS namespace
is made part of job setup.

Looks like some memory allocated in namespace_create()
that was freed in the error path, was not freed in
the non-error path.  Fix that.
When a job is initiated, create a private KVS namespace
for the job and link it into the job's primary namespace
subdirectory, e.g.
  lwj.X.Y.<jobid>.guest -> ns:job42/.

This implements the guest namespace described in RFC 16
for wreck, albeit in a KVS base path that does not conform
to RFC 16.
When job enters a terminal state (complete or
failed), replace symlink to per-job KVS namespace e.g.
  lwj.X.Y.<jobid>.guest -> ns:job42/.
with a snapshot reference, then delete the namespace.

Thus the final content of the private namespace is made
part of the primary namespace.
Add --kvs-guest option to override the default value
of FLUX_JOB_KVSPATH.
Arrange for the FLUX_JOB_KVSPATH environment
variable to point to the guest KVS namespace
in a launched program, rather than the KVS path
in the primary namespace.

This changes the KVS path used to look up a
sub-instance's FLUX_URI from
  <job_kvs_dir>.flux.remote_uri
to
  <job_kvs_dir>.guest.flux.remote_uri

Update flux-wreck and t2003-recurse.t accordingly.
@grondo
Copy link
Contributor

grondo commented Apr 13, 2018

This is pretty nifty!

I know this is an experiment meant to get us to RFC 16 guest namespace, but since the wreck prototype isn't multi-user capable anyway, I wonder if it would be interesting to have an option to replace the whole job kvs directory with a namespace. This would be interesting because

  1. Most of the data in the lwj.x.y.z dirs would normally be under the guest namespace in the exec replacement anyway, so we'd get more representative usage
  2. It would allow us to run soak or other tests to see an upper bound in improvement by using namespaces for job kvs data without much work in the other wreck tools we're throwing away anyway
  3. It would abstract the kvs path to jobs from the data, e.g. tools could iterate namespaces to get data for jobs instead of iterating kvs dirs.

Maybe the third item above wouldn't really be worth pursuing at this point, but I do find it interesting that the list of current job* namespaces is de-facto a list of currently "active" jobs (with potential for a few stale complete jobs). It would be interesting to leverage that somehow.

Other that that, my only other comment is that I'm confused by the FLUX_JOB_KVSPATH environment variable, which now holds the path to the guest namespace only. Will this be confusing to users that stumble across it and try to use it to get to the actual kvs path? Should it be named FLUX_JOB_KVS_NAMESPACE for clarity, or did I miss the point (very likely)?

Add a sharness test intended to test conformance to
RFC 16: KVS job schema.

Initially this just checks behavior of the "guest"
subdirectory that is a private namespace during
program execution.
@chu11
Copy link
Member

chu11 commented Apr 13, 2018

Looks pretty good to me. One thought:

+test_expect_success 'schema: guest namespace converted on completion' '
+	flux wreckrun /bin/true &&
+	test_must_fail flux kvs readlink $(last_job_path).guest
+'

should there be the opposite test, to make sure $(last_job_path).guest is a link when the job is running?

@grondo
Copy link
Contributor

grondo commented Apr 13, 2018

should there be the opposite test, to make sure $(last_job_path).guest is a link when the job is running?

Good idea. The wreckrun option --wait-until=<state> is useful for this kind of test (in case the option had slipped by unnoticed)

@garlick
Copy link
Member Author

garlick commented Apr 13, 2018

Thanks for the thoughts guys! I'm especially intrigued by the idea of moving the whole job directory into the namespace. It gets us further along and will probably be a simpler change, even if it doesn't wind up looking quite like RFC 16.

@garlick
Copy link
Member Author

garlick commented Apr 18, 2018

Closing pending rework.

@garlick garlick closed this Apr 18, 2018
@garlick garlick deleted the guestns branch February 25, 2020 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants