TEP-0090: Looping [Problem Statement]
This change adds the problem statement for Looping. It scopes the
problem, describes the use cases, and identifies the goals, the
requirements for the solution, and related work in other continuous
delivery systems.

Today, users cannot supply varying `Parameters` to the same `Task` or
`Custom Task` - that is, fan out their `Task` or `Custom Tasks`.

In this TEP, we aim to provide a way to run the same `Task` or
`Custom Task` with varying `Parameters` by spinning up a `TaskRun` or
`Run` for each `Parameter` in a loop.

This looping construct is aimed at improving the composability,
scalability, flexibility and reusability of *Tekton Pipelines*.

References:
- [Task Loops Experimental Project][task-loops]
- Issues:
  - tektoncd/pipeline#2050
  - tektoncd/pipeline#4097

[task-loops]: https://github.com/tektoncd/experimental/tree/main/task-loops
jerop committed Oct 18, 2021
1 parent e60296f commit 2f7b065
Showing 2 changed files with 394 additions and 0 deletions.
393 changes: 393 additions & 0 deletions teps/0090-looping.md
@@ -0,0 +1,393 @@
---
status: proposed
title: Looping
creation-date: '2021-10-13'
last-updated: '2021-10-13'
authors:
- '@jerop'
- '@pritidesai'
---

# TEP-0090: Looping

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
  - [Use Cases](#use-cases)
    - [Parallel Kaniko Build](#parallel-kaniko-build)
    - [Dynamic Parallel Docker Build](#dynamic-parallel-docker-build)
    - [Fan Out Vault Reading](#fan-out-vault-reading)
    - [Multiple Testing Strategies](#multiple-testing-strategies)
  - [Requirements](#requirements)
  - [Related Work](#related-work)
    - [GitHub Actions](#github-actions)
    - [Argo Workflows](#argo-workflows)
    - [Ansible](#ansible)
- [References](#references)
<!-- /toc -->

## Summary

Today, users cannot supply varying `Parameters` to the same `Task` or `Custom Task` - that is, fan out their `Task` or
`Custom Tasks`. In this TEP, we aim to provide a way to run the same `Task` or `Custom Task` with varying `Parameters`
by spinning up a `TaskRun` or `Run` for each `Parameter` in a loop. This looping construct is aimed at improving the
composability, scalability, flexibility and reusability of *Tekton Pipelines*.

## Motivation

Users can specify `Parameters`, such as artifact names, that they want to supply to [`Tasks`][tasks-docs] and
[`Custom Tasks`][custom-tasks-docs] at execution. However, they don't have a way to supply varying `Parameters` to
the same `Task` or `Custom Task`.

Today, users would have to duplicate that `Task` or `Custom Task` in the `Pipeline` specification as many times as the
number of varying `Parameters` that they want to pass in. This creates some limitations and challenges:
- It is tedious and does not scale well because users have to add a `Task` entry for each additional `Parameter`.
- It is error-prone to duplicate the `Task` specifications, and it may be challenging to debug those errors.
- It is not flexible enough to handle a dynamic set of `Parameters`, which makes the `Pipeline` less reusable.

A common scenario is [a user needs to build multiple images][kaniko-example-1] from one repository using the
[kaniko][kaniko-task] `Task` from the *Tekton Catalog*. Let's assume it's three images. The user would have to specify
that `Pipeline` with the kaniko `Task` duplicated, as such:

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: kaniko-pipeline
spec:
  workspaces:
  - name: shared-workspace
  params:
  - name: image-1
    description: reference of the first image to build
  - name: image-2
    description: reference of the second image to build
  - name: image-3
    description: reference of the third image to build
  tasks:
  - name: fetch-repository
    taskRef:
      name: git-clone
    workspaces:
    - name: output
      workspace: shared-workspace
    params:
    - name: url
      value: https://github.com/tektoncd/pipeline
    - name: subdirectory
      value: ""
    - name: deleteExisting
      value: "true"
  - name: kaniko-1
    taskRef:
      name: kaniko
    runAfter:
    - fetch-repository
    workspaces:
    - name: source
      workspace: shared-workspace
    params:
    - name: IMAGE
      value: $(params.image-1)
  - name: kaniko-2
    taskRef:
      name: kaniko
    runAfter:
    - fetch-repository
    workspaces:
    - name: source
      workspace: shared-workspace
    params:
    - name: IMAGE
      value: $(params.image-2)
  - name: kaniko-3
    taskRef:
      name: kaniko
    runAfter:
    - fetch-repository
    workspaces:
    - name: source
      workspace: shared-workspace
    params:
    - name: IMAGE
      value: $(params.image-3)
```

As shown in the above example, the limitations and challenges include:
- the user would have to add another `Task` entry if they need to build another image.
- the user can easily make errors while duplicating the `Task` specifications.
- the `Pipeline` cannot handle a dynamic set of images, which makes it less reusable.
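
To make the duplication concrete, here is a minimal Python sketch that generates the three kaniko entries from a list of `Parameter` names. It is illustrative only and not part of Tekton: the helper `kaniko_task_entries` is hypothetical, and the field values are copied from the `Pipeline` above. It shows that the entries differ only in their name and their `IMAGE` value.

```python
def kaniko_task_entries(image_params):
    """Build one kaniko Pipeline task entry per image parameter name."""
    entries = []
    for index, param_name in enumerate(image_params, start=1):
        entries.append({
            "name": f"kaniko-{index}",
            "taskRef": {"name": "kaniko"},
            "runAfter": ["fetch-repository"],
            "workspaces": [{"name": "source", "workspace": "shared-workspace"}],
            # Only this value (and the entry name) changes between copies.
            "params": [{"name": "IMAGE", "value": f"$(params.{param_name})"}],
        })
    return entries


entries = kaniko_task_entries(["image-1", "image-2", "image-3"])
for entry in entries:
    print(entry["name"], entry["params"][0]["value"])
```

Everything except those two fields is repeated verbatim, which is the boilerplate a looping construct would let authors avoid writing by hand.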

The `Parameters` used in the above example are user-defined. In some cases, the `Parameter` may be the `Result` of a
previous `Task` in the `Pipeline`. For example, a user [needs to build a dynamic set of images][kaniko-example-2] and
they share their current experience:
> "Right now I'm doing all of this by just having a statically defined single `Pipeline` with a `Task` and then
delegating to code/loops within that single `Task` to achieve the `N` things I want to do. This works, but then
I'd prefer the concept of a single Task does a single thing, rather than overloading it like this. Especially
when viewing it in the dashboard etc, things get lost" ~ [bitsofinfo][kaniko-example-2]

We need to address these challenges and limitations to improve the composability, scalability, flexibility,
reusability and debuggability of *Tekton Pipelines*.

**In this TEP, we aim to provide a way to run the same `Task` or `Custom Task` with varying `Parameters` by spinning up
a `TaskRun` or `Run` for each `Parameter` in a loop. This looping construct is aimed at improving the composability,
scalability, flexibility and reusability of *Tekton Pipelines*.**

### Goals

- Executing `Tasks` and `Custom Tasks` in a loop with varying `Parameter` values.
- Configuring whether the `TaskRuns` and `Runs` created in the loop execute sequentially or in parallel.
- Controlling the concurrency of `TaskRuns` or `Runs` created in a given loop.
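
The difference between sequential execution, parallel execution and a concurrency limit can be sketched in Python as follows. This is only an analogy under stated assumptions: `run_loop` and `fake_task_run` are hypothetical names, and a thread pool stands in for whatever mechanism would eventually create `TaskRuns` or `Runs`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_loop(run_one, values, max_concurrency=1):
    """Call run_one for every value, with at most max_concurrency calls in flight.

    max_concurrency=1 gives sequential execution; larger values run in parallel.
    Results come back in the same order as the input values.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(run_one, values))


def fake_task_run(image):
    # Hypothetical stand-in for creating a TaskRun and waiting for it to finish.
    time.sleep(0.01)
    return f"built {image}"


images = ["image-1", "image-2", "image-3"]
sequential = run_loop(fake_task_run, images)                   # one at a time
parallel = run_loop(fake_task_run, images, max_concurrency=2)  # at most two at once
```

Both calls produce the same results in the same order; only the number of iterations running at a time differs.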

### Non-Goals

- Terminating early when the `Tasks` or `Custom Tasks` are executed in parallel - in-progress `TaskRuns` and `Runs` have
to complete execution before termination.
- Ignoring a failure when the `Tasks` or `Custom Tasks` are executed sequentially - addressed in [TEP-0050][tep-0050].

### Use Cases

#### Parallel Kaniko Build

As a `Pipeline` author, I [need to build multiple images][kaniko-example-1] from one repository using the same `Task`.
I choose to use the [*kaniko*][kaniko-task] `Task` from the *Tekton Catalog*. Let's assume it's three images. I want to
pass in varying `Parameter` values for `IMAGE` to create three `TaskRuns`, one for each image.

```
                         clone
                           |
                           v
   --------------------------------------------------
       |                   |                   |
       v                   v                   v
ko-build-image-1    ko-build-image-2    ko-build-image-3
```

In other circumstances, the `Parameter` values for `IMAGE` may be produced by a previous `Task` in the `Pipeline`
instead of being supplied by me.

Read more in [user experience report #1][kaniko-example-1] and [user experience report #2][kaniko-example-2].

#### Dynamic Parallel Docker Build

As a `Pipeline` author, I have several dockerfiles in my repository.

```
/ docker / Dockerfile
  python / Dockerfile
  Ubuntu / Dockerfile
  ...
```

I have a *clone* `Task` that fetches the repository to a shared `Workspace`. Then I have a *get-dir* `Task` that
produces a `Result` array with the directory names of the dockerfiles. Finally, I want to dynamically generate the
parallel *docker build* `Tasks`, each of which gets one dockerfile and runs docker build and push.

```
                         clone
                           |
                           v
                        get-dir
                           |
                           v
   --------------------------------------------------
       |                   |                   |
       v                   v                   v
 docker-build-1      docker-build-2      docker-build-3
```

Read more in the [user experience report][docker-example].

#### Fan Out Vault Reading

As a `Pipeline` author, I have a file in my repository with several vault paths.

```text
path1
path2
path3
...
```

I have a *vault-read* `Task` that I need to run for every entry in the file and get the secrets in each of them.
As such, I need to fan out the *vault-read* `Task` N times, where N is the number of vault paths in my file.

```
                         clone
                           |
                           v
                    get-vault-paths
                           |
                           v
   --------------------------------------------------
       |                   |                   |
       v                   v                   v
  vault-read-1        vault-read-2        vault-read-3
```

Read more in the [user experience report][vault-example].

#### Multiple Testing Strategies

As a `Pipeline` author, I have a file configuring the test types that I want to run.

```text
code-analysis
unit-tests
e2e-tests
...
```

I have a *test* `Task` that I need to run for each test type in the file - the `Task` runs tests based on a `Parameter`.
I need to run this *test* `Task` for multiple test types that are defined in my repository (fetched using the
*tests-selector* `Task`).

```
                         clone
                           |
                           v
                    tests-selector
                           |
                           v
   --------------------------------------------------
       |                   |                   |
       v                   v                   v
 code-analysis        unit-tests           e2e-tests
```

### Requirements

- Users should be able to pass in an array `Parameter` to a `Task` or `Custom Task` and generate as many `TaskRuns` or
`Runs` as the length of the array `Parameter`.
- Users should be able to pass in several array `Parameters` to a `Task` or `Custom Task` and generate as many `TaskRuns`
or `Runs` as there are combinations of the values in the array `Parameters`.
- Users should be able to configure whether the loop is executed sequentially or in parallel.
- Users should be able to control the concurrency limit (maximum `TaskRuns` or `Runs` executed at a time).
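
The first two requirements can be illustrated with a short Python sketch, assuming that "combinations" means the Cartesian product of the array values. The function `expand_combinations` and the `platform` and `browser` parameter names are hypothetical and used only for illustration; this TEP does not define a syntax or an implementation.

```python
from itertools import product


def expand_combinations(array_params):
    """Return one dict of parameter values per combination of the array values."""
    names = list(array_params)
    return [dict(zip(names, values)) for values in product(*array_params.values())]


# One array Parameter: one run per element.
one_array = expand_combinations({"IMAGE": ["image-1", "image-2", "image-3"]})

# Several array Parameters: one run per combination (2 x 3 here).
two_arrays = expand_combinations({
    "platform": ["linux", "mac"],
    "browser": ["chrome", "firefox", "safari"],
})

print(len(one_array), len(two_arrays))
```

Under this reading, the number of `TaskRuns` or `Runs` is the product of the array lengths, so it grows quickly as array `Parameters` are added, which is one reason a concurrency limit matters.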

### Related Work

The looping construct is related to `for loops` which are available in most programming languages. In this section, we
explore related work on looping constructs in other continuous delivery systems.

#### GitHub Actions

GitHub Actions allows users to define a matrix of job configurations, which creates one job per combination after
substituting the matrix variables into each job. It also allows users to include or exclude combinations in the build matrix.

For example:

```yaml
runs-on: ${{ matrix.os }}
strategy:
  matrix:
    os: [macos-latest, windows-latest, ubuntu-18.04]
    node: [8, 10, 12, 14]
    exclude:
      # excludes node 8 on macOS
      - os: macos-latest
        node: 8
    include:
      # includes node 15 on ubuntu-18.04
      - os: ubuntu-18.04
        node: 15
```

GitHub Actions workflow syntax also allows users to:
- cancel in-progress jobs if one of the matrix jobs fails
- specify the maximum number of jobs to run in parallel

Read more in the [documentation][github-actions].

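
As a rough illustration of how the matrix above expands, the following Python sketch computes the resulting job configurations. It uses simplified semantics as an assumption: `exclude` entries are removed first, and each `include` entry is then added as an extra standalone combination, which is what happens for this particular example. The real `include` rules can also extend existing combinations, and `expand_matrix` is a hypothetical helper, not a GitHub API.

```python
from itertools import product


def expand_matrix(axes, exclude=(), include=()):
    """Expand matrix axes into job configurations (simplified semantics)."""
    names = list(axes)
    jobs = [dict(zip(names, values)) for values in product(*axes.values())]
    # Drop every combination that matches all keys of some exclude entry.
    jobs = [
        job for job in jobs
        if not any(all(job.get(k) == v for k, v in entry.items()) for entry in exclude)
    ]
    # Simplification: treat each include entry as an extra standalone combination.
    jobs.extend(dict(entry) for entry in include)
    return jobs


jobs = expand_matrix(
    {"os": ["macos-latest", "windows-latest", "ubuntu-18.04"], "node": [8, 10, 12, 14]},
    exclude=[{"os": "macos-latest", "node": 8}],
    include=[{"os": "ubuntu-18.04", "node": 15}],
)
print(len(jobs))
```

The 3 x 4 product yields 12 combinations; one is excluded and one is included, so this sketch ends with 12 job configurations.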
#### Argo Workflows

Argo Workflows allows users to iterate over:
- a list of items as static inputs
- a list of sets of items as static inputs
- parameterized list of items or list of sets of items
- dynamic list of items or lists of sets of items

Here's an example from the [documentation][argo-workflows]:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: loops-param-result-
spec:
  entrypoint: loop-param-result-example
  templates:
  - name: loop-param-result-example
    steps:
    - - name: generate
        template: gen-number-list
    # Iterate over the list of numbers generated by the generate step above
    - - name: sleep
        template: sleep-n-sec
        arguments:
          parameters:
          - name: seconds
            value: "{{item}}"
        withParam: "{{steps.generate.outputs.result}}"

  # Generate a list of numbers in JSON format
  - name: gen-number-list
    script:
      image: python:alpine3.6
      command: [python]
      source: |
        import json
        import sys
        json.dump([i for i in range(20, 31)], sys.stdout)
  - name: sleep-n-sec
    inputs:
      parameters:
      - name: seconds
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo sleeping for {{inputs.parameters.seconds}} seconds; sleep {{inputs.parameters.seconds}}; echo done"]
```

Read more in the [documentation][argo-workflows].

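
The data flow in the workflow above, where one step emits a JSON list and the next step runs once per item, can be mimicked in plain Python. This is a sketch of the idea only, not of how Argo is implemented; `generate` and `fan_out` are hypothetical stand-ins for the `gen-number-list` step and the `withParam` expansion.

```python
import json


def generate():
    """Stand-in for the gen-number-list step: emit a JSON list as its result."""
    return json.dumps(list(range(20, 31)))


def fan_out(result_json):
    """Stand-in for withParam: one argument set per item of the parsed list."""
    return [{"seconds": str(item)} for item in json.loads(result_json)]


sleep_args = fan_out(generate())
print(len(sleep_args), sleep_args[0], sleep_args[-1])
```

The number of iterations is not known until the first step has run, which is the same situation as the dynamic use cases in this TEP where a `Result` of an earlier `Task` drives the loop.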
#### Ansible

Ansible allows users to execute a task multiple times using the `loop`, `with_<lookup>` and `until` keywords.

For example:

```yaml
- name: Show the environment
  ansible.builtin.debug:
    msg: " The environment is {{ item }} "
  loop:
    - staging
    - qa
    - production
```

Read more in the [documentation][ansible].

## References

- [Task Loops Experimental Project][task-loops]
- Issues:
  - [#2050: `Task` Looping inside `Pipelines`][issue-2050]
  - [#4097: List of `Results` of a `Task`][issue-4097]

[task-loops]: https://github.com/tektoncd/experimental/tree/main/task-loops
[issue-2050]: https://github.com/tektoncd/pipeline/issues/2050
[issue-4097]: https://github.com/tektoncd/pipeline/issues/4097
[tasks-docs]: https://github.com/tektoncd/pipeline/blob/main/docs/tasks.md
[custom-tasks-docs]: https://github.com/tektoncd/pipeline/blob/main/docs/pipelines.md#using-custom-tasks
[kaniko-example-1]: https://github.com/tektoncd/pipeline/issues/2050#issuecomment-625423085
[kaniko-task]: https://github.com/tektoncd/catalog/tree/main/task/kaniko/0.5
[kaniko-example-2]: https://github.com/tektoncd/pipeline/issues/2050#issuecomment-671959323
[docker-example]: https://github.com/tektoncd/pipeline/issues/2050#issuecomment-814847519
[vault-example]: https://github.com/tektoncd/pipeline/issues/2050#issuecomment-841291098
[tep-0050]: https://github.com/tektoncd/community/blob/main/teps/0050-ignore-task-failures.md
[argo-workflows]: https://github.com/argoproj/argo-workflows/blob/7684ef4a0c5f57e8723dc8e4d3a17246f7edc2e6/examples/README.md#loops
[github-actions]: https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions
[ansible]: https://docs.ansible.com/ansible/latest/user_guide/playbooks_loops.html#loops
1 change: 1 addition & 0 deletions teps/README.md
@@ -225,3 +225,4 @@ This is the complete list of Tekton teps:
|[TEP-0081](0081-add-chains-subcommand-to-the-cli.md) | Add Chains sub-command to the CLI | proposed | 2021-08-31 |
|[TEP-0084](0084-endtoend-provenance-collection.md) | end-to-end provenance collection | proposed | 2021-09-16 |
|[TEP-0085](0085-per-namespace-controller-configuration.md) | Per-Namespace Controller Configuration | proposed | 2021-10-14 |
|[TEP-0090](0090-looping.md) | Looping | proposed | 2021-10-13 |