Initial stab jobspec language definition #53
It occurs to me that |
|
Ok, I pushed a new Jobspec Language Definition section which I hope captures the result of yesterday's meeting, and @trws comments above. This version is still pretty rough, but I'd like to narrow down on something acceptable to merge soon. @trws, given the above, I'm still assuming a conformant jobspec "SHALL" be a dictionary consisting of the keys |
@grondo: looks good! My only confusion is with the |
Yeah, jobspec only deals with query so I think we're only talking about the use as a label. (The other use of |
I like that to help clarify the difference. There may be a time when using an id or a UUID in a query is appropriate, if someone is requesting a specific resource by id or something. I like how this is looking, and given our discussions the other day I agree that requiring resources and tasks as keys at the program level is the way to go. Programs, tasks, and resources are distinct types now, and the "slot" resource type serves as the task target in the resource specification. My parser does not currently work that way, but it will make life noticeably easier once I make it work that way. |
Sorry... I glossed over the earlier comments before lunch. I don't mean to rehash what's been said. |
Hm, I perhaps wasn't clear on this particular point from yesterday. Are you saying that a "slot" is now a full type, e.g.

    resources:
    - type: core
      label: default
      # other keys left out for brevity
    tasks:
    - command: myapp
      slot: default

vs

    resources:
    - type: slot
      label: default
      with:
      - type: core
        # ...
    tasks:
    - command: myapp
      slot: default

Either way is fine with me actually, though the

    resources:
    - type: group
      label: default
      count: { min: 1, max: 1, operator: "+", operand: 1 }
      with:
      - type: core
        count: { min: 1, max: 1, operator: "+", operand: 1 }
        sharing: exclusive
      - type: memory
        count: { min: Xmb_per_core, max: Xmb_per_core, operator: "+", operand: 1 }
        units: "MB"
        sharing: exclusive
    tasks:
    - command: myapp
      slot: default
      count_per_slot: 1
      distribution: default
    attrs: {}
|
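The label mechanism in the examples above lends itself to a mechanical check. The following is a minimal sketch in Python, assuming the jobspec has already been parsed into plain dicts and lists, that child resources are nested under `with`, and that each task names its target through a `slot` key as in the snippets above. The helper names `collect_labels` and `unresolved_slots` are hypothetical and not part of any existing tool.

```python
def collect_labels(vertices):
    """Return the set of labels found anywhere in a resource list."""
    labels = set()
    for vertex in vertices:
        if "label" in vertex:
            labels.add(vertex["label"])
        # Recurse into child resources nested under "with".
        labels |= collect_labels(vertex.get("with", []))
    return labels


def unresolved_slots(jobspec):
    """Return the slot names used by tasks that match no resource label."""
    labels = collect_labels(jobspec.get("resources", []))
    return [
        task.get("slot")
        for task in jobspec.get("tasks", [])
        if task.get("slot") not in labels
    ]


# Hypothetical jobspec mirroring the "type: slot" example above.
jobspec = {
    "resources": [
        {"type": "slot", "label": "default", "with": [{"type": "core"}]},
    ],
    "tasks": [{"command": "myapp", "slot": "default"}],
}
```

An empty list from `unresolved_slots(jobspec)` would mean every task refers to a label that exists somewhere in the resource request; anything else could be reported as a parse-time error.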
Formatting comment: consider using capitalized subheadings for top level keys. The indentation is a bit subtle in the github rendering with the current approach. |
Thanks, I assume you are talking about the RFC not the yaml snippets? I think I noticed that about the document as well, however I was hesitant to change the case of top level jobspec keys just for the RFC. I don't actually like the way the whole thing in the RFC is presented and would be open to comments on a different way to present the constraints... a table perhaps? |
Table, nested bullets so we have bullet style, or similar might be practical. I'm thinking we might add an actual schema based on something like the Rx yaml/json schema setup also, since that makes it trivial to verify the structure, if not the semantic content, of a spec. |
Unfortunately it appears Github rendering does not indent definition lists. I've changed formatting as @garlick suggested, but I'm afraid the indentation is still a bit subtle, but hopefully enough of an improvement that we can parse the content and review that part of the doc. Sadly, when I was formatting I mixed in the changes from However, @trws, the current language in this doc allows any resource vertex to be labeled as a slot with the I like the idea of formalizing a schema here as well -- If we do it as JSON, we could use the JSON content rules we've used elsewhere (as explained to me by @garlick). |
That looks quite a bit more readable to me :-) Since a min/max count of a resource can be requested, does wallclock need to be expressible as a function of the actual quantity of resources allocated? (maybe using a label like with slots?) |
Great point! Perhaps this was already discussed but perhaps it would be sufficient for now to say walltime is also a dict with at least one supported key, where the first key we support is Your comment also reminds me that we should ensure we have some way in the future for users to be able to submit a list of resource descriptors that represent a set of alternatives ( |
I agree, this is a good point. This should be important for expressing moldable jobs. A simple way to express a Specifically, users can either specify a constant
|
BTW, I just noticed this in one of the resource vertex fields: I think this was meant to be a system-defined id? It would also be good if we can pencil in the purpose of having two id spaces as well. |
@dongahn, I don't think we should conflate the "resource spec" or resource description which is currently defined in RFC4, and the jobspec "resource query spec" which is what we are defining in this RFC. This RFC now uses "label" instead of "id" to denote task slot labels which can then be referred to in other parts of the jobspec, most notably task specifications. As @trws points out, we will need to support |
@dongahn: Sorry I just realized you were correcting a typo above, and not necessarily talking about adding |
@grondo: yes the last comment was a typo correction. Sorry, I should have inlined my comment with the original posting. |
Three misc comments:
|
@lipari, good points. I think 'walltime' as a range supports the same concept as the scale factor-based spec I suggested, if I understand you right. One is just more explicit than the other. So as long as we are clear on what a walltime range means in the spec, and it covers the scale factor case, I think this is a good idea. |
Using a range for walltime, that matches or exceeds the number of levels allowed by the corresponding resource ranges, as @lipari suggests was also the idea that Suraj, @tpatki and I settled on when hashing over the walltime issue last summer. I think that's the easiest way to apply it in the short term, and we can always extend it later. As to supporting both per and total, we certainly can, but we need to be clear about which overrides the other. I would probably say that total is the maximum, regardless of "per_slot," but either way it would need to be explicit. |
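One way to read the walltime-range idea is sketched below in Python. It assumes, which this thread does not spell out, that a `count` dict enumerates values starting at `min` and repeatedly applying `operator` with `operand` until `max` is exceeded, and that a walltime range is a plain list with one entry per level. The names `count_levels` and `walltime_for`, and the `"*"` operator, are hypothetical.

```python
import operator

# Assumed operator table; only "+" appears in the examples in this thread.
OPS = {"+": operator.add, "*": operator.mul}


def count_levels(count):
    """Enumerate the resource counts a {min, max, operator, operand} dict allows."""
    step = OPS[count["operator"]]
    levels, value = [], count["min"]
    while value <= count["max"]:
        levels.append(value)
        nxt = step(value, count["operand"])
        if nxt <= value:  # guard against a step that never advances
            break
        value = nxt
    return levels


def walltime_for(count, walltimes, allocated):
    """Pick the walltime entry that corresponds to the allocated count."""
    levels = count_levels(count)
    # The walltime range must match or exceed the number of count levels.
    if len(walltimes) < len(levels):
        raise ValueError("walltime range has fewer entries than count levels")
    return walltimes[levels.index(allocated)]


count = {"min": 2, "max": 8, "operator": "+", "operand": 2}
walltimes = [3600, 2400, 1800, 1500]  # seconds, one entry per level 2, 4, 6, 8
```

With these inputs, `walltime_for(count, walltimes, 6)` selects the third entry. How a `per_slot` value would interact with a total, as raised above, is left out of the sketch.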
@dongahn, I see two approaches to consider and I was inserting the first:
|
As @grondo points out, RFC4 defines the actual resource hierarchy spec, and we have diverged in a couple of places. It might be good to reconcile some of these, since this is effectively defining the find and/or match components of RFC4 in terms of an RFC4 resource graph. The resource spec syntax, or what goes into the "resources" key, could also be considered a valid format for a serialization of at least an abstract resource graph as defined in RFC4, though it might need a couple of tweaks. The only things that jump out at me:
|
I think this was the case @garlick had mentioned above. I had suggested, for extensibility, we promote |
I like the idea of making it a dict so it's easier to extend later. Especially to support a user-defined function or expression embedded in it. |
@grondo: making it extensible makes sense to me. I think duration, range and function already covers a wide range of cases. |
Actually, it should not require basename and id. basename is optional and defaults to the type name (e.g. "core" for a "core", but a node can have a basename of "hype" for instance), name defaults to We should have a way in the jobspec language to request (and exclude!) a set of resources by id or name.
The important distinction in RFC4 is that properties are a shared attribute of a common resource type, inherited by all instances, and attributes/tags are specific to an instance of a resource type. For the query language that distinction probably does not matter. (Perhaps that is what you were saying above.) In general, I like the direction @trws is going with normalizing the RFC4 resource hierarchy terms and the jobspec resource spec. This is actually getting quite close to one of our original goals of making the language used for resource queries and resource configuration the same (borrowed from ClassAd). We should be able to define a "matcher" for each of the components of a hierarchical resource and make this part of the resource query language. (e.g. we should be able to query against Using this spec as the serialization language seems like a very good idea, and it would be interesting to explore that after we've settled on the basic components here (e.g. merge this PR) ;-) (Sorry if I kind of got off on a tangent here) |
BTW, at the expense of making noise, I sort of see commonality between the extensible walltime spec and the task shape spec. Do we want to make the task count/shape spec extensible also? Right now we support tasks per slot and total, but later we might want to extend it to cover some odd shapes: a task counts list, or some mapping function, etc., for shapes that the conventional distribution policy and count cannot easily create. To be clear, I am not suggesting we specify these now, just that we keep it extensible for later use. |
I believe it would be more understandable, to me anyway, if you would provide the YAML slot definitions for each of the two scenarios: a request for an exclusive allocation of a node that has at least two sockets, and a request for two sockets on the same node. |
    - type: slot
      with:
      - type: node
        with:
        - type: socket
          count: 2

    - type: node
      with:
      - type: slot
        with:
        - type: socket
          count: 2
|
Good point. Your idea here works pretty well, however I fear use of |
I like the optional exclusive boolean idea, it shouldn't be needed very often, and if it is, it will get set. |
@trws's case of resources exclusively allocated to a job but not part of any task slot makes a great use case. However, I'm having trouble coming up with a non-trivial example, like a license which is obviously exclusively allocated since it will be a leaf in the request graph. Any ideas of a use case requesting a hierarchical resource exclusively, but not part of any task slot? (I guess a node that doesn't run anything is an example, but anything better?) |
@grondo, if we go with the rule that the resources under a slot are exclusive, then we wouldn't necessarily have exclusive leaf resources anymore. A license is actually the best case I can immediately think of. Alternatively, a user might want to exclusively allocate every node they run on, but confine their task slot to the socket level. |
Good point, shared/exclusive and the slot are really two different things, and I agree it does seem we need both. The slight changes we've kind of agreed on so far are indeed making it less awkward for me. Let me make sure I've got the rules correct:
That works for me, but I realize the last point might still be in question. For me, it doesn't make sense to request a resource "shared" without saying what portion of that resource you actually need to allocate. |
I agree with the first two, the last I'm less sure. That said I'm not sure I have a good counter-example. The only thing that's bugging at me is it seems possible, despite my present inability to come up with an example, that a user would want to request that the structure of the graph contain something at a leaf location that they are not actually allocating. Actually, maybe that's the issue. If it's limited to leaves of the |
That is what I was thinking, too. My rules above only apply to the My problem with not requiring leaf vertex in However, I realize it is taking me awhile to get some of this stuff, so I'm willing to admit I may be wrong. |
That's actually not quite how I see it. If the thing is shared, it has On 25 Jul 2016, at 9:35, Mark Grondona wrote:
|
Ah, I think your conceptual view of resources differs from mine. In my mental model (of the Your mental model is more what I was thinking of for something like the "topology" graph, which as you say could have entities in it which are non-allocatable, and which could affect the structure or relation of the resulting allocated resources -- but I would hesitate to call these things "resources" if they cannot be allocated by the resource manager. So I do agree with you that a leaf vertex in the topology or other structural matching case does not imply exclusive. However, if the resource is non-allocatable then the exclusivity of leaves could just be ignored? Also, I still wonder if a request for a single shared node by itself makes any sense, as the request has no "shape". |
> Also, I still wonder if a request for a single shared node by itself
> makes any sense, as the request has no "shape".
Maybe a good way to deal with this is to say that a slot must have at
least one "with:" child, and that's what's exclusively allocated. That
way, there's no way to request a single shared anything without at least
one exclusive resource?
|
@grondo As a user, there can be a scenario where you want to allocate, say, a core or set of cores to yourself because you don't need the full node, right? I think the question is what the default graph looks like if someone doesn't specify a "with" clause (one core and some memory?). Or should we make a "with" clause mandatory in that scenario, as @trws suggested? |
That works for me, and actually seems similar to what we were discussing before. |
Yes, I think we're saying the same thing. It may have been incorrect to say it "doesn't make sense", but rather a request for one shared node alone is incomplete, and therefore should not be allowed in canonical jobspec. |
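The "a slot must have at least one `with:` child" rule discussed above could be enforced by a parser along the following lines. This is only a sketch in Python under stated assumptions: resources are already parsed into dicts and lists, slots are vertices with `type: slot`, and children live under `with`. The function name `incomplete_slots` and the path format it reports are hypothetical.

```python
def incomplete_slots(vertices, path=""):
    """Return paths of slot vertices that lack a non-empty "with" list."""
    bad = []
    for index, vertex in enumerate(vertices):
        here = f"{path}/{vertex.get('type', '?')}[{index}]"
        children = vertex.get("with", [])
        # A slot with no children has nothing to allocate exclusively.
        if vertex.get("type") == "slot" and not children:
            bad.append(here)
        bad.extend(incomplete_slots(children, here))
    return bad


# Hypothetical requests: the first is complete, the second is not.
complete = [{"type": "slot", "label": "default", "with": [{"type": "core"}]}]
incomplete = [{"type": "node", "with": [{"type": "slot", "label": "default"}]}]
```

A conformance check would reject any request for which `incomplete_slots` returns a non-empty list, matching the view that such a request is incomplete rather than given a default shape.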
Another question: what if the user gives a request that can't be translated successfully into a "shape"? Do we have a mechanism for addressing this yet -- do we just throw an error or do we default to a basic setup (for example, if you request a shared node and forget the with clause)? I don't know how SLURM or current resource managers address this. |
@tpatki It depends on the type of issue. If what they specify doesn't follow the spec, it will get rejected with an error by the parser. If it's something we can determine can't be supplied as part of the resources that can reasonably be made available, they'll get an error or a long-waiting job depending on policy. |
OK, in this PR I've updated the description of the |
If we're getting close to ready to merge this, I can squash down all the incremental work. I think there may still be a lot of editing to do, and the spec language definition could be presented better, but perhaps we could do that in a future PR. |
I'm good with that. It seems like we're at the point where it doesn't really make sense to do much more tweaking without trying it out and seeing what happens. |
Take an initial stab at defining the version 1 jobspec language, including a sample of JSON Content Rules summarizing the requirements.
f178e64 to 5007b18
Ok, squashed! |
Very nice, thanks for all the hard work on this @grondo! |
Thanks!!
On Wed, Jul 27, 2016 at 11:20 AM, Tom Scogland wrote:
|
Based on our meeting yesterday I took an initial stab at fleshing out some of the Jobspec Language Definition section. I'm not too happy with what's here, but maybe this can generate some discussion and we can get an initial version merged today?