-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wreck: use cmb.exec instead of direct fork/exec for wrexecd execution in job module #1508
Conversation
This change does seem pretty clean! In
so maybe that case should be fatal as long as the rule holds on the server side? |
Great thanks! I had forgotten about that case. In the current |
I've reworked this PR to simply replace direct The Perhaps this simple step forward can be merged until we figure out the best way to recover from failed wrexecd launch. |
Ensure that jobids in the job.kvspath rpc are encoded as integers.
Problem: wreck/io.lua uses a hand-coded rpc to the job.kvspath service to lookup the kvs path for a jobid if needed, resulting in multiple call sites to check for fixes and proper args to the rpc. Use the wreck.id_to_path() function built-in to wreck.lua instead. This not only consolidates the rpc to a single call site, but also makes use of memoization of the wreck version of this function.
Remove the last bits of json-c and shortjson from the wreck "job" module and fully convert to jansson instead.
Add a reference counter to struct wreck_job objects to allow use in multiple job hashes.
Add tests for wreck_job_incref expected functionality to wreck_job unit tests.
0f41bd2
to
0371bd1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! I added a few minor review comments inline.
src/modules/wreck/job.c
Outdated
* nor even advisable. With the setsid above, the wrexecd | ||
* process should be reparented to init. | ||
*/ | ||
if (flux_future_get (f, &msg) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than unpacking msg
piecemeal, would it make sense to call flux_rpc_unpack()
once here with s?
optional keys for the conditional status
key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably. I was possibly just preserving the existing structure, unnecessarily.
src/modules/wreck/job.c
Outdated
exit (255); | ||
if (type && strcmp (type, "io") == 0) | ||
spawn_io_cb (h, job, msg); | ||
else if (strcmp (state, "Exited") == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
State is optionally unpacked and so may be NULL here.
src/modules/wreck/job.c
Outdated
#endif | ||
exec_handler (wrexecd_path, job); | ||
n = snprintf (buf, sizeof (buf), "--kvs-path=%s", job->kvs_path); | ||
if ((n >= sizeof (buf)) || (n < 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is snprintf cannot return < 0, because man page says that only happens on an "output error" which I assume only applies to the fprintf variants. I bring it up not so much to nitpick but to see if you agree because I generally don't check for < 0 result from snprintf and if I'm wrong that's more serious!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a portability habit I picked up from somewhere else. Though unlikely, snprintf
can return -1
on ENOMEM and possibly other errors. For example, found this reproducer on the internet of webpages:
$ cat t.c
#include <stdio.h>
#include <string.h>
#include <errno.h>
int
main(int argc, char **argv)
{
char buf[200];
char *fmt = argv[1];
if (argc < 2)
return 1;
int n = snprintf (buf, sizeof buf, fmt, 1);
int saved_errno = errno;
printf ("fmt: %s retval=%d errno=%s\n", fmt, n,
n < 0 ? strerror(saved_errno) : "");
return 0;
}
$ gcc -o tm t.c
$ bash -c 'ulimit -v 5000; ./tm %$((5*2**20))d'
fmt: %5242880d retval=-1 errno=Cannot allocate memory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, the atrocities I've committed.
src/modules/wreck/job.c
Outdated
} | ||
json_array_append_new (o, json_string (buf)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
json_string() can return NULL on error.
src/modules/wreck/job.c
Outdated
struct wreck_job *job = arg; | ||
const char *state = NULL; | ||
int rank; | ||
pid_t pid; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pid is uninitialized, then optionally unpacked, then not used. Initialize or drop it entirely?
src/modules/wreck/job.c
Outdated
const char *wrexecd_path; | ||
char buf [4096]; | ||
int n; | ||
|
||
if (!(wrexecd_path = flux_attr_get (h, "wrexec.wrexecd_path", NULL))) { | ||
flux_log_error (h, "spawn_exec_handler: flux_attr_get"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spawn_exec_handler -> wrexecd_cmdline_create? (or use __FUNCTION__
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really dislike using __FUNCTION__
because you can't effectively search code based on the actual error message. I'll update the error text if that is required, but the error is during spawn_exec...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry - reasonable. Maybe I should have said the error prefix in this message is different than the one right after it. And it's minor so not worth making a big deal over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah, the inconsistency is a good catch and it should be fixed.
src/modules/wreck/job.c
Outdated
} | ||
if (!(o = json_array ())) { | ||
flux_log_error (h, "wrexecd_cmdlin_create: json_array"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
misspelled "cmdlin"
Ok, I've pushed some fixes to address your comments @garlick. If they seem ok I'll squash. Sorry this PR's been so sloppy! |
Not at all! LGTM. |
Problem: The job module spawns wrexecd on ranks involved in a job by direct fork/exec. Not only does this duplicate code from the cmb.exec service, but this is a fork from a threaded program, it may be problematic to reap exit status of wrexecd since the job module may not be successful in registering child watchers. Switch from direct fork/exec in job module to and rpc to the `cmb.exec` service on the local broker. Use the new flux_future_reset() facility to accept multiple responses from the `cmb.exec` service. Finally destroying the job and the future after the remote wrexecd has exited. For now non-zero exit codes and failure of exec(2) for wrexecd are simply logged, and no further action is taken.
Don't create a separate wreck_job object for run event handling, instead pull the existing job from the active_jobs hash.
Now that all job allocation mechanisms (wreck internal, sched) use R_lite for resources, drop support for rank.N dirs in wreck job launch. Update tests accordingly.
Somehow the ngpus tests in t2000-wreck.t were prefixed with a '+' and more concerning, this just caused the tests to be skipped. Remove the stray character and fix one of the tests.
Ok, I've squashed the fixup commits |
Restarted a stuck clang builder. I'll merge once that's successful. |
This is a very early WIP, posted now for early comments and discussion if necessary.
The end goal of this work is to detect nonzero exit status from launched wrexecd and set jobs to "failed" state, as well as enable the potential to cancel/kill jobs in the starting/runrequest state.
Currently the main work here is to convert the job module from direct fork/exec to the
cmb.exec
service. At this point failed wrexecd is logged but there is no mechanism yet to cancel the rest of the job.There is an active_pids hash added in addition to active_jobs here, but that might be removed as this PR moves forward. It was only strictly necessary before
flux_future_reset
was implemented. Right now it just serves as a way to stash wreck_job structures that have a second reference because they are tied to local wrexecds.