Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TEP-0076] Propose array results and indexing 🐠 #477

Merged
merged 1 commit into from
Feb 14, 2022

Conversation

bobcatfish
Copy link
Contributor

@bobcatfish bobcatfish commented Jul 15, 2021

This TEP proposes adding more support for array params by adding
array results as well as the ability to index into arrays.

It refers to TEP-0075 which will be added in a separate commit, which
proposes adding support for dictionary types.

Related issues:

/kind tep

@tekton-robot tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Jul 15, 2021
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 15, 2021
@bobcatfish
Copy link
Contributor Author

p.s. why 76 you might ask? what happened to 74 and 75 XD? well you see, i started working on 74, then i realized i needed 75 (dictionary params + results) and THEN i realized i needed 76, so here we are 😆 (should see 74 + 75 as well asap!!)

@chhsia0
Copy link
Contributor

chhsia0 commented Jul 19, 2021

/assign @sbwsg @pierretasci


* What if the file contains json we don't support? (e.g. we don't yet support dictionaries, and wouldn't
initially support arrays of arrays)
* The TaskRun (or Pipeline Task) would fail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could at least partially validate this on the creation of the resource too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you explain a bit more? I'm not sure how we'd know until runtime that something was written to the file in an unexpected format

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that is what I mean. We can validate it at runtime and potentially fail the pipeline

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding a section "Validating JSON results at runtime" to cover this in more detail and moving this list out of the notes/caveats section

Comment on lines +198 to +414
tasks:
- name: deploy
params:
- name: environment
value: '$(params.environments[0])'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is where I think dynamism could be super cool. What if we did:

Suggested change
tasks:
- name: deploy
params:
- name: environment
value: '$(params.environments[0])'
tasks:
- name: deploy
params:
- name: environment
value: '$(params.environments[])'

Having the non-indexed array there, similar to jq would produce a copy of task deploy for each valid in the input params.environments. Seems fairly straight-forward and a reasonable addition to the api.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree there, or at least in the scope of the TEP. This might be something that appear confusing to the users, and most importantly, I would like to make sure we really need this (and custom task are not enough) before commiting to that level of complexity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can think of many use cases related to test sharding where this would be super useful. We also have many "build these X services and deploy them to an ephemeral cluster" uses cases. All involve a ton of copy+pasta of the same exact spec with different params

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but as of today, this is what task-loop custom task is there for, isn't it ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pierretasci 100% agree with the test sharding use case and i have a languishing todo list item about trying to write up a TEP about better supporting test sharding specifically (which ultimately would lead to needing looping support)

I also agree with @vdemeester though that I don't personally see that as being exaclty within the scope of this particular TEP to add looping support BUT I do see array result support as a requirement for being able to support looping (and sharding) - and maybe that's what you're getting at @pierretasci (not that this TEP includes implementing looping, but that we need this TEP for looping and sharding is a great use case to list to justify adding array results)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I was specifically advocating for supporting it directly in the Tekton API because I think that unlocks many more use-cases but I can also see the argument against.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am 💯 in favor of looping in the pipelines api at this point but i think we need to explore the use cases and alternatives a bit (and hopefully @vdemeester will find them convincing :D )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#600 🎉 🎉 🎉

Copy link
Contributor

@pierretasci pierretasci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think this is a more than worthwhile change and use case. My only comment is that I think we should take it one step further but otherwise, this is LGTM from me.

/LGTM

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 20, 2021
@vdemeester
Copy link
Member

/hold

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 22, 2021
Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think array params require the an author to explicitly declare the type:

spec:
  params:
  - name: foo
    type: array

If type isn't specified then the default is assumed to be string. A taskrun cannot supply an array of results when task doesn't specify param type. Validation fails with something like invalid input params for task mytask: param types don''t match the user-specified type: [foo].

Assuming that we either keep type and add type: object or deprecate type in favor of schema I think we can avoid introducing a new result path - echoing what @pierretasci already suggested - the result's type/schema in the task definition would explicitly tell the entrypoint/controller how the file content should have been written, should be parsed, and whether it's valid or not. I think this would be totally backwards compatible since type would default to string.

wdyt? I might be missing something here though.

Edited to include mention of the possible type deprecation.

teps/0076-array-result-types.md Show resolved Hide resolved
@ghost
Copy link

ghost commented Jul 23, 2021

Ah ok, sorry, just read the dictionary proposal and I see now that we're discussing type being deprecated in favor of schema.

@ghost
Copy link

ghost commented Jul 23, 2021

My questions are entirely focused around the implementation and since we're only at the proposal stage I'd def like to see this move ahead!

/approve

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 23, 2021
@vdemeester
Copy link
Member

/assign

Copy link
Member

@afrittoli afrittoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Just added a few comments, mostly I'm not 100% sure about indexing, but in general this looks good!

teps/0076-array-result-types.md Outdated Show resolved Hide resolved
teps/0076-array-result-types.md Show resolved Hide resolved
teps/0076-array-result-types.md Outdated Show resolved Hide resolved

### Use Cases

1. Looping (i.e. providing values for an interface such as the task-loop custom task
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since looping is not part of the core API today, I wonder if we could add another use case here.
Even if an array is not used as an input to a loop, it might be used as args for another task.
I could have a task that builds the args for the next task, by producing an array of parameters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+:100: on what @afrittoli just wrote. Given the current feature set of pipeline, I think that should appear as the main use case 👼🏼


* Add two missing pieces of array support
* As results
* Indexing into arrays
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's no use case defined to support the indexing goal.
If we add indexing, we might need to introduce an operator that allows checking the size of an array too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be helpful in loops when we want to loop over certain indices of the array. But if the loop reconciler can take the results as strings and extract them into arrays, then we can workaround without array indexing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another +1 to including the size as a variable replacement, that could be a way to solve tektoncd/pipeline#4097

(will need to check if json schema provides something like this or if we would need to pave new ground)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bobcatfish This seems like a good use case to have in the TEP! 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

circling back to this, I forgot that @sbwsg and I had already discussed this a bit and I'm back to leaving this out of scope for now (discussion: #477 (comment) TL;DR not clear how to support this for other types e.g. objects, or where we'd draw the line - also we can always add this later!) - it's currently listed in the alternatives section of the doc

teps/0076-array-result-types.md Show resolved Hide resolved
Comment on lines 276 to 277
* Con: Without indexing, arrays will only work with arrays. This means you cannot combine arrays with Tasks that
only take string parameters and this limits the interoperabiltiy between Tasks.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense, I kind of struggle to imaging an actual example though.
I would need an array parameter where certain positions in the array hold special meaning I guess?

@jerop
Copy link
Member

jerop commented Aug 2, 2021

/assign

@bobcatfish
Copy link
Contributor Author

Ahh I see, thanks for explaining in so much detail @mogsie . How strongly do you feel about the json text sequence syntax being supported right from the start? If I'm understanding correctly it seems like a) we'll likely want support for plain old json regardless of whether we support text sequences or not (we need to parse the content of each line anyway which is itself a complete json type) and b) this can be additively supported in a backwards compatible way.

My preference is to keep this as simple as we can in the initial iterations (which is why I don't even want to support type nesting initially!) and expand the functionality later, including potentially adding support for json text sequences. What do you think?

I've added JSON text sequences as an alternative in the proposal in the meantime - and I've tried to capture the properties you listed above

empty values represent exactly that.

I don't quite follow that one - is this an easier way to specify 'null'?

* Kubernetes supports `yaml` and `json`, so it makes sense to narrow our choices to at least these, as opposed to
introducing a new format into the mix.

We may want to consider also adding support for [JSON text sequences](#also-support-json-text-sequences) in future.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other important note, YAML is a superset of json, all valid json is valid yaml so in a sense, we get the best of both 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I'll add that as a bullet as well :D

@pierretasci
Copy link
Contributor

/approve

@mogsie
Copy link

mogsie commented Feb 3, 2022

I understand that supporting object syntax is over-the-top for this "only arrays" TEP. I initially suggested just that, a pure "array of strings" syntax, but the point about "future proofness" of this didn't quite come across. My main reasoning was that if you serialize an array of strings using JSON Text Sequence, then you gain a lot of future compatibility with TEP-0075, you get to be able to join two arrays by concatinating them, and so on.

So instead of a result array being serialized here depicted as a "value" in yaml as a string representation of a JSON array

value: |
  [ "one", "two", "three" ]

I suggest using this (json text sequence compatible):

value: |
  "one"
  "two"
  "three"

How strongly do you feel about the json text sequence syntax being supported right from the start?

I'm just concerned that Tekton might miss out on opportunities if it selects an array syntax that isn't easily composable / needs yet another syntax for object notation. e.g. if you have a set of tasks, and each one emits a result and then you want to join these results together to form an array; if results are emitted as "single element json text sequences", they can just be joined together:

task 1 result = "foo"\n
task 2 result = "bar"\n
task 3 result = "baz"\n

input to some array = task1 result + task2 result + task 3 result =
"foo"
"bar"
"baz"

-- an array with three elements in it

empty values represent exactly that.

I think I was talking about how when value is empty, (i.e. an empty array) it is just an empty string.

I don't quite follow that one - is this an easier way to specify 'null'?

Null is just the word "null" without quotes, same as true, false: Here the second value is a null element

value: |
  "one"
  null
  "three"

[TEP-0075](https://github.com/tektoncd/community/pull/479), dictionaries (technically "objects"). Possible other formats
we could use are:

* `yaml` (i dont think there is a syntax like jsonpath available for yaml? i.e. it'd be harder to expose via variable
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yaml ends up being in-memory, a memory structure that's very similar to json, so you should be able to use jsonpath / jsonpointer etc on a yaml document.

steps: ...
results:
- name: animals
type: array # this field is new for results
Copy link

@mogsie mogsie Feb 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the whole proposal is about making array results, but this "type: array" is in conflict with "type: object" — have you considered a different attribute like "multiple: true" or something?

With both 0075 and 0076, you'd be able to define something like:

  name: animals
  type: object
  properties:
    name:
      type: string
    weight:
      type: number
  multiple: true

and emit (using json text sequence, of course)

cat <<EOF > $(results.animals.path)
{ "name": "cat", weight: 33}
{ "name": "dog", weight: 44}
EOF

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - i was imagining the syntax being more json schema aligned to support the use case you're describing (an array of objects), e.g. something like this (based on https://json-schema.org/understanding-json-schema/reference/array.html and https://json-schema.org/learn/miscellaneous-examples.html#arrays-of-things):

  name: animals
  type: array
  items: {
    type: object
    properties:
      name:
        type: string
      weight:
        type: number
  }

And if we supported json text sequences, i would think the schema would look the same, the difference would be in how the results written by the Task were formatted? (i.e. im not seeing these features as being in conflict but let me know if I'm missing something)

However I have intentionally tried to keep nested types out of the scope of this proposal and TEP-0075 as well. I would want to make sure nothing we propose here would stop us from adding nested type support in the future so if you see this being incompatible please continue to raise it.

Personally I would still lean toward using the json schema syntax for this vs. introducing a new multiple field.

This TEP proposes adding more support for array params by adding
array results as well as the ability to index into arrays.

It refers to TEP-0075 which will be added in a separate commit, which
proposes adding support for dictionary types.

Related issues:
* [pipelines#1393 Consider removing type from params (or _really_ support types)](tektoncd/pipeline#1393)
* [pipelines#3255 Arguments of type array cannot be access via index](tektoncd/pipeline#3255)
@bobcatfish
Copy link
Contributor Author

I initially suggested just that, a pure "array of strings" syntax, but the point about "future proofness" of this didn't quite come across. My main reasoning was that if you serialize an array of strings using JSON Text Sequence, then you gain a lot of future compatibility with TEP-0075, you get to be able to join two arrays by concatinating them, and so on.

Thanks for clarifying @mogsie - I've tried to add more detail to the section I added on JSON text sequences to try to capture how this syntax could make composability easier for Tasks in a Pipeline, let me know if I'm still not capturing it (and feel free if you want to write something verbatim to add into the proposal, I'd be happy to add you as an author if you want).

My preference still remains to keep this out of scope from the initial proposal - provided none of the choices we make here preclude adding support in the future.

strings.

If we add support for array results before we add support for looping (and until we have an approved proposal, there is
no guarantee we ever will!) there will be no way to use an array result from one Task with a Task that expects a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tekton-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli, jerop, pierretasci, sbwsg, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dibyom
Copy link
Member

dibyom commented Feb 7, 2022

/cc @pritidesai to review

@tekton-robot
Copy link
Contributor

@dibyom: GitHub didn't allow me to request PR reviews from the following users: to, review.

Note that only tektoncd members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @pritidesai to review

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jerop jerop mentioned this pull request Feb 7, 2022
@pritidesai
Copy link
Member

thank you @bobcatfish for addressing my comments!
/lgtm

@mogsie can this be merged given that your proposal is added as a possible extension to the existing proposal?

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 11, 2022
@bobcatfish
Copy link
Contributor Author

I'm just concerned that Tekton might miss out on opportunities if it selects an array syntax that isn't easily composable / needs yet another syntax for object notation. e.g. if you have a set of tasks, and each one emits a result and then you want to join these results together to form an array; if results are emitted as "single element json text sequences", they can just be joined together:

@mogsie another thought on this, which may (or may not) help is that in a pipeline, users would be able to concatenate array results together without needing to be aware of anything json specific at all

our syntax for using array params currently looks like this:

spec:
  params:
    - name: flags
      type: array
      description: List of flags
  tasks:
    - name: build-skaffold-web
      params:
        - name: flags
          value: ["$(params.flags[*])"]

The array has to be explicitly expanded, into a field of type array: value: ["$(params.flags[*])"]

Once Tasks can emit array results, the same syntax would be used, which means that combining the array results from two Tasks would look like this:

  tasks:
    - name: build-skaffold-web
      params:
        - name: flags
          value: ["$(tasks.prev-task.results.some-flags[*])", "$(tasks.another-task.results.other-flags[*]"]

As far as I can tell, the question of supporting json text sequences vs just json would only come into play within the steps of a Task (not between Tasks in a Pipeline) - and again this is something we could expand to support both formats later on.

@pritidesai
Copy link
Member

/cancel hold

@pritidesai
Copy link
Member

/hold cancel

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 14, 2022
@pritidesai
Copy link
Member

API WG - ready to merge, cancelling the hold

@tekton-robot tekton-robot merged commit e0e7d73 into tektoncd:main Feb 14, 2022
@xchapter7x
Copy link
Contributor

/area s3c

@tekton-robot tekton-robot added the area/s3c Issues or PRs that are related to Secure Software Supply Chain (S3C) label Mar 4, 2022
bobcatfish added a commit to bobcatfish/pipeline that referenced this pull request Mar 18, 2022
While working on (finally) updating the array result and object
param/result TEPs (tektoncd/community#479
tektoncd/community#477) I realized I hadn't
included an example of how to specify defaults for the new format, so I
looked for an example of how we currently do this for arrays, but we had
none! Hopefully now we do :D
tekton-robot pushed a commit to tektoncd/pipeline that referenced this pull request Mar 23, 2022
While working on (finally) updating the array result and object
param/result TEPs (tektoncd/community#479
tektoncd/community#477) I realized I hadn't
included an example of how to specify defaults for the new format, so I
looked for an example of how we currently do this for arrays, but we had
none! Hopefully now we do :D
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/s3c Issues or PRs that are related to Secure Software Supply Chain (S3C) kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Status: UnAssigned
Status: Implementable
Development

Successfully merging this pull request may close these issues.