-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[enhancement] Need to emit R_lite #312
Comments
@grondo: I integrated What was the expected BTW, as part of post processing hostname to rank resolution, the
|
Any error message in dmesg or on stderr?
The R_lite element below looks ok (the decode is done in rcalc.c in
modules/wreck if you want to see the source). Json objects are unordered
dictionaries, so the order of keys shouldn't matter.
Not at a computer now, but you could try running wreckrun and compare
R_lite green created there.
On Mon, Apr 23, 2018, 10:09 PM Dong H. Ahn ***@***.***> wrote:
@grondo <https://github.com/grondo>: I integrated
resrc_tree_serialize_lite with the scheduler framework module. It now
emits R_lite like this, but somehow the job is not running but gets stuck
in runrequest.
What was the expected R_lite format? I found our discussion at flux-framework/flux-core#1378
(comment)
<flux-framework/flux-core#1378 (comment)>
and also vaguely remember we wanted to use core instead of cores.
BTW, as part of post processing hostname to rank resolution, the rank key
appears after the children key in my R_lite. Could this be a problem?
quartz770{dahn}24: flux kvs dir -R lwj.0.0.1
lwj.0.0.1.R_lite = [ { "children": { "core": "0" }, "rank": 0 } ]
lwj.0.0.1.cmdline = ["sleep", "0"]
lwj.0.0.1.create-time = 1524544985.712936
lwj.0.0.1.cwd = /g/g0/dahn/workspace/flux-cancel/flux-sched
lwj.0.0.1.environ = {"BASH_ENV": "/usr/share/lmod/lmod/init/bash", "CVS_RSH": "ssh", "DISPLAY": "192.168.131.8:16.0", "DK_NODE": "/usr/global/tools/dotkit", "DK_ROOT"...
lwj.0.0.1.input.config = [{"dst": "*", "src": "stdin"}]
lwj.0.0.1.ncores = 1
lwj.0.0.1.ngpus = 0
lwj.0.0.1.nnodes = 1
lwj.0.0.1.ntasks = 1
lwj.0.0.1.opts.nnodes = 1
lwj.0.0.1.opts.tasks-per-node = 1
lwj.0.0.1.rank.0.cores = 1
lwj.0.0.1.rank.0.gpus = 0
lwj.0.0.1.state = runrequest
lwj.0.0.1.walltime = 0
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#312 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAtSUvchXmH5qUSROdygURIpTp_Ib7T0ks5trrN9gaJpZM4TQmFE>
.
On Apr 23, 2018 10:09 PM, "Dong H. Ahn" <[email protected]> wrote:
@grondo <https://github.com/grondo>: I integrated resrc_tree_serialize_lite
with the scheduler framework module. It now emits R_lite like this, but
somehow the job is not running but gets stuck in runrequest.
What was the expected R_lite format? I found our discussion at
flux-framework/flux-core#1378
(comment)
<flux-framework/flux-core#1378 (comment)>
and also vaguely remember we wanted to use core instead of cores.
BTW, as part of post processing hostname to rank resolution, the rank key
appears after the children key in my R_lite. Could this be a problem?
quartz770{dahn}24: flux kvs dir -R lwj.0.0.1
lwj.0.0.1.R_lite = [ { "children": { "core": "0" }, "rank": 0 } ]
lwj.0.0.1.cmdline = ["sleep", "0"]
lwj.0.0.1.create-time = 1524544985.712936
lwj.0.0.1.cwd = /g/g0/dahn/workspace/flux-cancel/flux-sched
lwj.0.0.1.environ = {"BASH_ENV": "/usr/share/lmod/lmod/init/bash",
"CVS_RSH": "ssh", "DISPLAY": "192.168.131.8:16.0", "DK_NODE":
"/usr/global/tools/dotkit", "DK_ROOT"...
lwj.0.0.1.input.config = [{"dst": "*", "src": "stdin"}]
lwj.0.0.1.ncores = 1
lwj.0.0.1.ngpus = 0
lwj.0.0.1.nnodes = 1
lwj.0.0.1.ntasks = 1
lwj.0.0.1.opts.nnodes = 1
lwj.0.0.1.opts.tasks-per-node = 1
lwj.0.0.1.rank.0.cores = 1
lwj.0.0.1.rank.0.gpus = 0
lwj.0.0.1.state = runrequest
lwj.0.0.1.walltime = 0
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#312 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAtSUvchXmH5qUSROdygURIpTp_Ib7T0ks5trrN9gaJpZM4TQmFE>
.
|
I think the issue is related to flux-framework/flux-core#1378 (comment) Because we wanted to remove json type from the public interfaces, we converted json type to string type in JSC. So I was writing I will see how I can avoid this conversion. But it seems we will need a big surgery to JSC to use "getter/setter" pattern to avoid excessive json to string type conversions.. |
Do we have an encoded JSON object being stored as a JSON string? |
@dongahn - just realized I'm not sure what to look at. Where is the the code that is writing |
Still in my repo. I will spend a bit more time this afternoon before creating a PR. If I need you to look at it, i will call out @garlick. Thanks! |
You can also push to a branch on your fork and point us to it here.
I was going to ask the same question. That was how the |
Yes I was using the same interface as From this sched code, the rdl is encoded into a JSON string (Jtostr (jcb)). if (resrc_tree_serialize (ro, job->resrc_tree)) {
flux_log (h, LOG_ERR, "%"PRId64" resource serialization failed: %s",
job->lwj_id, strerror (errno));
goto done;
}
json_object_set_new (jcb, JSC_RDL, ro);
jcbstr = Jtostr (jcb);
if (jsc_update_jcb (h, job->lwj_id, JSC_RDL, jcbstr) != 0) {
flux_log (h, LOG_ERR, "error jsc udpate: %"PRId64" (%s)", job->lwj_id,
strerror (errno));
goto done;
}
Jput (jcb); Then, in this JSC code, it is stored as a string. if (flux_kvs_txn_pack (txn, 0, key, "s", rs) < 0) {
flux_log_error (h, "update %s", key);
goto done;
}
if (!(f = flux_kvs_commit (h, 0, txn)) || flux_future_get (f, NULL) < 0) {
flux_log_error (h, "commit failed");
goto done;
} |
If the If this JSC function is also used to update JSON string values, then it seems like it should be split or something similar to the "pack" interface would be more appropriate? |
@garlick: I actually reached the same conclusion with some more testing and I now have working R_lite. Thanks. |
Ok great! |
Both flux-framework/flux-core#1485 and #321 have just been merged. |
Scheduler needs to emit
R_lite
. WIP is at here. This is related to flux-framework/flux-core#1439 as well.The text was updated successfully, but these errors were encountered: