-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TEP-0076] Propose array results and indexing 🐠 #477
Conversation
p.s. why 76 you might ask? what happened to 74 and 75 XD? well you see, i started working on 74, then i realized i needed 75 (dictionary params + results) and THEN i realized i needed 76, so here we are 😆 (should see 74 + 75 as well asap!!) |
/assign @sbwsg @pierretasci |
teps/0076-array-result-types.md
Outdated
|
||
* What if the file contains json we don't support? (e.g. we don't yet support dictionaries, and wouldn't | ||
initially support arrays of arrays) | ||
* The TaskRun (or Pipeline Task) would fail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could at least partially validate this on the creation of the resource too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you explain a bit more? I'm not sure how we'd know until runtime that something was written to the file in an unexpected format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that is what I mean. We can validate it at runtime and potentially fail the pipeline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm adding a section "Validating JSON results at runtime" to cover this in more detail and moving this list out of the notes/caveats section
tasks: | ||
- name: deploy | ||
params: | ||
- name: environment | ||
value: '$(params.environments[0])' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is where I think dynamism could be super cool. What if we did:
tasks: | |
- name: deploy | |
params: | |
- name: environment | |
value: '$(params.environments[0])' | |
tasks: | |
- name: deploy | |
params: | |
- name: environment | |
value: '$(params.environments[])' |
Having the non-indexed array there, similar to jq
would produce a copy of task deploy for each valid in the input params.environments
. Seems fairly straight-forward and a reasonable addition to the api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree there, or at least in the scope of the TEP. This might be something that appear confusing to the users, and most importantly, I would like to make sure we really need this (and custom task are not enough) before commiting to that level of complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can think of many use cases related to test sharding where this would be super useful. We also have many "build these X services and deploy them to an ephemeral cluster" uses cases. All involve a ton of copy+pasta of the same exact spec with different params
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but as of today, this is what task-loop
custom task is there for, isn't it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pierretasci 100% agree with the test sharding use case and i have a languishing todo list item about trying to write up a TEP about better supporting test sharding specifically (which ultimately would lead to needing looping support)
I also agree with @vdemeester though that I don't personally see that as being exaclty within the scope of this particular TEP to add looping support BUT I do see array result support as a requirement for being able to support looping (and sharding) - and maybe that's what you're getting at @pierretasci (not that this TEP includes implementing looping, but that we need this TEP for looping and sharding is a great use case to list to justify adding array results)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I was specifically advocating for supporting it directly in the Tekton API because I think that unlocks many more use-cases but I can also see the argument against.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am 💯 in favor of looping in the pipelines api at this point but i think we need to explore the use cases and alternatives a bit (and hopefully @vdemeester will find them convincing :D )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#600 🎉 🎉 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think this is a more than worthwhile change and use case. My only comment is that I think we should take it one step further but otherwise, this is LGTM from me.
/LGTM
/hold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think array params require the an author to explicitly declare the type:
spec:
params:
- name: foo
type: array
If type isn't specified then the default is assumed to be string
. A taskrun cannot supply an array of results when task doesn't specify param type
. Validation fails with something like invalid input params for task mytask: param types don''t match the user-specified type: [foo]
.
Assuming that we either keep type
and add type: object
or deprecate type
in favor of schema
I think we can avoid introducing a new result path - echoing what @pierretasci already suggested - the result's type/schema in the task definition would explicitly tell the entrypoint/controller how the file content should have been written, should be parsed, and whether it's valid or not. I think this would be totally backwards compatible since type
would default to string
.
wdyt? I might be missing something here though.
Edited to include mention of the possible type deprecation.
Ah ok, sorry, just read the |
My questions are entirely focused around the implementation and since we're only at the proposal stage I'd def like to see this move ahead! /approve |
/assign |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Just added a few comments, mostly I'm not 100% sure about indexing, but in general this looks good!
teps/0076-array-result-types.md
Outdated
|
||
### Use Cases | ||
|
||
1. Looping (i.e. providing values for an interface such as the task-loop custom task |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since looping is not part of the core API today, I wonder if we could add another use case here.
Even if an array is not used as an input to a loop, it might be used as args for another task.
I could have a task that builds the args for the next task, by producing an array of parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+:100: on what @afrittoli just wrote. Given the current feature set of pipeline, I think that should appear as the main use case 👼🏼
|
||
* Add two missing pieces of array support | ||
* As results | ||
* Indexing into arrays |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's no use case defined to support the indexing goal.
If we add indexing, we might need to introduce an operator that allows checking the size of an array too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be helpful in loops when we want to loop over certain indices of the array. But if the loop reconciler can take the results
as strings and extract them into arrays, then we can workaround without array indexing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another +1 to including the size as a variable replacement, that could be a way to solve tektoncd/pipeline#4097
(will need to check if json schema provides something like this or if we would need to pave new ground)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bobcatfish This seems like a good use case to have in the TEP! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
circling back to this, I forgot that @sbwsg and I had already discussed this a bit and I'm back to leaving this out of scope for now (discussion: #477 (comment) TL;DR not clear how to support this for other types e.g. objects, or where we'd draw the line - also we can always add this later!) - it's currently listed in the alternatives section of the doc
teps/0076-array-result-types.md
Outdated
* Con: Without indexing, arrays will only work with arrays. This means you cannot combine arrays with Tasks that | ||
only take string parameters and this limits the interoperabiltiy between Tasks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense, I kind of struggle to imaging an actual example though.
I would need an array parameter where certain positions in the array hold special meaning I guess?
/assign |
Ahh I see, thanks for explaining in so much detail @mogsie . How strongly do you feel about the json text sequence syntax being supported right from the start? If I'm understanding correctly it seems like a) we'll likely want support for plain old json regardless of whether we support text sequences or not (we need to parse the content of each line anyway which is itself a complete json type) and b) this can be additively supported in a backwards compatible way. My preference is to keep this as simple as we can in the initial iterations (which is why I don't even want to support type nesting initially!) and expand the functionality later, including potentially adding support for json text sequences. What do you think? I've added JSON text sequences as an alternative in the proposal in the meantime - and I've tried to capture the properties you listed above
I don't quite follow that one - is this an easier way to specify 'null'? |
* Kubernetes supports `yaml` and `json`, so it makes sense to narrow our choices to at least these, as opposed to | ||
introducing a new format into the mix. | ||
|
||
We may want to consider also adding support for [JSON text sequences](#also-support-json-text-sequences) in future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other important note, YAML is a superset of json, all valid json is valid yaml so in a sense, we get the best of both 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I'll add that as a bullet as well :D
/approve |
I understand that supporting object syntax is over-the-top for this "only arrays" TEP. I initially suggested just that, a pure "array of strings" syntax, but the point about "future proofness" of this didn't quite come across. My main reasoning was that if you serialize an array of strings using JSON Text Sequence, then you gain a lot of future compatibility with TEP-0075, you get to be able to join two arrays by concatinating them, and so on. So instead of a result array being serialized here depicted as a "value" in yaml as a string representation of a JSON array
I suggest using this (json text sequence compatible):
I'm just concerned that Tekton might miss out on opportunities if it selects an array syntax that isn't easily composable / needs yet another syntax for object notation. e.g. if you have a set of tasks, and each one emits a result and then you want to join these results together to form an array; if results are emitted as "single element json text sequences", they can just be joined together:
I think I was talking about how when value is empty, (i.e. an empty array) it is just an empty string.
Null is just the word "null" without quotes, same as true, false: Here the second value is a null element
|
teps/0076-array-result-types.md
Outdated
[TEP-0075](https://github.com/tektoncd/community/pull/479), dictionaries (technically "objects"). Possible other formats | ||
we could use are: | ||
|
||
* `yaml` (i dont think there is a syntax like jsonpath available for yaml? i.e. it'd be harder to expose via variable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yaml ends up being in-memory, a memory structure that's very similar to json, so you should be able to use jsonpath / jsonpointer etc on a yaml document.
steps: ... | ||
results: | ||
- name: animals | ||
type: array # this field is new for results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the whole proposal is about making array results, but this "type: array" is in conflict with "type: object" — have you considered a different attribute like "multiple: true" or something?
With both 0075 and 0076, you'd be able to define something like:
name: animals
type: object
properties:
name:
type: string
weight:
type: number
multiple: true
and emit (using json text sequence, of course)
cat <<EOF > $(results.animals.path)
{ "name": "cat", weight: 33}
{ "name": "dog", weight: 44}
EOF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting - i was imagining the syntax being more json schema aligned to support the use case you're describing (an array of objects), e.g. something like this (based on https://json-schema.org/understanding-json-schema/reference/array.html and https://json-schema.org/learn/miscellaneous-examples.html#arrays-of-things):
name: animals
type: array
items: {
type: object
properties:
name:
type: string
weight:
type: number
}
And if we supported json text sequences, i would think the schema would look the same, the difference would be in how the results written by the Task were formatted? (i.e. im not seeing these features as being in conflict but let me know if I'm missing something)
However I have intentionally tried to keep nested types out of the scope of this proposal and TEP-0075 as well. I would want to make sure nothing we propose here would stop us from adding nested type support in the future so if you see this being incompatible please continue to raise it.
Personally I would still lean toward using the json schema syntax for this vs. introducing a new multiple
field.
This TEP proposes adding more support for array params by adding array results as well as the ability to index into arrays. It refers to TEP-0075 which will be added in a separate commit, which proposes adding support for dictionary types. Related issues: * [pipelines#1393 Consider removing type from params (or _really_ support types)](tektoncd/pipeline#1393) * [pipelines#3255 Arguments of type array cannot be access via index](tektoncd/pipeline#3255)
Thanks for clarifying @mogsie - I've tried to add more detail to the section I added on JSON text sequences to try to capture how this syntax could make composability easier for Tasks in a Pipeline, let me know if I'm still not capturing it (and feel free if you want to write something verbatim to add into the proposal, I'd be happy to add you as an author if you want). My preference still remains to keep this out of scope from the initial proposal - provided none of the choices we make here preclude adding support in the future. |
strings. | ||
|
||
If we add support for array results before we add support for looping (and until we have an approved proposal, there is | ||
no guarantee we ever will!) there will be no way to use an array result from one Task with a Task that expects a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getting there soon 😉
https://github.com/tektoncd/community/blob/main/teps/0090-matrix.md
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: afrittoli, jerop, pierretasci, sbwsg, vdemeester The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/cc @pritidesai to review |
@dibyom: GitHub didn't allow me to request PR reviews from the following users: to, review. Note that only tektoncd members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
thank you @bobcatfish for addressing my comments! @mogsie can this be merged given that your proposal is added as a possible extension to the existing proposal? |
@mogsie another thought on this, which may (or may not) help is that in a pipeline, users would be able to concatenate array results together without needing to be aware of anything json specific at all our syntax for using array params currently looks like this: spec:
params:
- name: flags
type: array
description: List of flags
tasks:
- name: build-skaffold-web
params:
- name: flags
value: ["$(params.flags[*])"] The array has to be explicitly expanded, into a field of type array: Once Tasks can emit array results, the same syntax would be used, which means that combining the array results from two Tasks would look like this: tasks:
- name: build-skaffold-web
params:
- name: flags
value: ["$(tasks.prev-task.results.some-flags[*])", "$(tasks.another-task.results.other-flags[*]"] As far as I can tell, the question of supporting json text sequences vs just json would only come into play within the steps of a Task (not between Tasks in a Pipeline) - and again this is something we could expand to support both formats later on. |
/cancel hold |
/hold cancel |
API WG - ready to merge, cancelling the hold |
/area s3c |
While working on (finally) updating the array result and object param/result TEPs (tektoncd/community#479 tektoncd/community#477) I realized I hadn't included an example of how to specify defaults for the new format, so I looked for an example of how we currently do this for arrays, but we had none! Hopefully now we do :D
While working on (finally) updating the array result and object param/result TEPs (tektoncd/community#479 tektoncd/community#477) I realized I hadn't included an example of how to specify defaults for the new format, so I looked for an example of how we currently do this for arrays, but we had none! Hopefully now we do :D
This TEP proposes adding more support for array params by adding
array results as well as the ability to index into arrays.
It refers to TEP-0075 which will be added in a separate commit, which
proposes adding support for dictionary types.
Related issues:
/kind tep