Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memoization doesn't work in templates that are steps #10769

Closed
2 of 3 tasks
moleskin-smile opened this issue Mar 28, 2023 · 6 comments · Fixed by #11257 or #11379
Closed
2 of 3 tasks

Memoization doesn't work in templates that are steps #10769

moleskin-smile opened this issue Mar 28, 2023 · 6 comments · Fixed by #11257 or #11379
Labels

Comments

@moleskin-smile
Copy link

moleskin-smile commented Mar 28, 2023

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issues exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

memoize configuration added to a steps template doesn't have any effect. No ConfigMap is created at all.

Memoization status is always like this after multiple submits:

      memoizationStatus:
        cacheName: cache-this-cm-doesnt-exist
        hit: false
        key: hello-10

It prevents us from caching larger reusable chunks of work exposed via Workflow Templates. For example, building images is a multiple step effort – we have to build an app, login to container registry, build Dockerfile and push image. Entire thing could be cached as a whole, but it doesn't work.

There are few workarounds we considered. We're not happy with neither of them.

  1. Add memoize to all of the steps. It's hard to maintain and it's error-prone and sometimes incorrect. For example we we shouldn't cache "container registry login" step, because token it generates has short time to live, but also we don't need it at all if the image is already built (and it's waste of time and resources to login just for nothing).
  2. Merge all steps into one, which is bad for code reusability and it's harder to maintain special images with all the tools installed (they're also bigger, so slower).
  3. Add memoize to the slowest templates only and run these faster ones unnecessarily.

Image building is just an example. We have more use cases for this.

Is there a way to fix or work around this? Thanks!

Similar issue was reported for dag: #10426

Version

v3.4.5

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: testing-memoization-mini-
  annotations:
    workflows.argoproj.io/description: Test memoization
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: echo-and-wait-step
            template: echo-and-wait
            arguments:
              parameters:
                - name: time
                  value: 10
                - name: message
                  value: "hello"
      memoize:
        key: "hello-10"
        cache:
          configMap:
            name: cache-this-cm-doesnt-exist

    - name: echo-and-wait
      inputs:
        parameters:
          - name: message
          - name: time
      script:
        image: alpine:3.17.2
        command: [sh]
        source: |
          set -e
          date
          echo {{inputs.parameters.message}}
          sleep {{inputs.parameters.time}}

Logs from the workflow controller

time="2023-03-28T18:19:27.757Z" level=info msg="Processing workflow" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:27.762Z" level=info msg="Updated phase  -> Running" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:27.767Z" level=info msg="Steps node testing-memoization-mini-d2k2z initialized Pending" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:27.767Z" level=info msg="StepGroup node testing-memoization-mini-d2k2z-1462015324 initialized Running" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:27.767Z" level=info msg="Pod node testing-memoization-mini-d2k2z-2612586120 initialized Pending" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:27.792Z" level=info msg="Created pod: testing-memoization-mini-d2k2z[0].echo-and-wait-step (testing-memoization-mini-d2k2z-echo-and-wait-2612586120)" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:27.792Z" level=info msg="Workflow step group node testing-memoization-mini-d2k2z-1462015324 not yet completed" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:27.792Z" level=info msg="TaskSet Reconciliation" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:27.792Z" level=info msg=reconcileAgentPod namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:27.801Z" level=info msg="Workflow update successful" namespace=ci-workflows phase=Running resourceVersion=273753196 workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:37.799Z" level=info msg="Processing workflow" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:37.799Z" level=info msg="Task-result reconciliation" namespace=ci-workflows numObjs=0 workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:37.799Z" level=info msg="node changed" namespace=ci-workflows new.message=PodInitializing new.phase=Pending new.progress=0/1 nodeID=testing-memoization-mini-d2k2z-2612586120 old.message= old.phase=Pending old.progress=0/1 workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:37.800Z" level=info msg="Workflow step group node testing-memoization-mini-d2k2z-1462015324 not yet completed" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:37.800Z" level=info msg="TaskSet Reconciliation" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:37.800Z" level=info msg=reconcileAgentPod namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:37.811Z" level=info msg="Workflow update successful" namespace=ci-workflows phase=Running resourceVersion=273753277 workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:47.812Z" level=info msg="Processing workflow" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:47.813Z" level=info msg="Task-result reconciliation" namespace=ci-workflows numObjs=0 workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:47.813Z" level=info msg="node unchanged" namespace=ci-workflows nodeID=testing-memoization-mini-d2k2z-2612586120 workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:47.813Z" level=info msg="Workflow step group node testing-memoization-mini-d2k2z-1462015324 not yet completed" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:47.813Z" level=info msg="TaskSet Reconciliation" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:47.813Z" level=info msg=reconcileAgentPod namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:57.906Z" level=info msg="Processing workflow" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:57.906Z" level=info msg="Task-result reconciliation" namespace=ci-workflows numObjs=0 workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:57.907Z" level=info msg="node changed" namespace=ci-workflows new.message= new.phase=Running new.progress=0/1 nodeID=testing-memoization-mini-d2k2z-2612586120 old.message=PodInitializing old.phase=Pending old.progress=0/1 workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:57.907Z" level=info msg="Workflow step group node testing-memoization-mini-d2k2z-1462015324 not yet completed" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:57.907Z" level=info msg="TaskSet Reconciliation" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:57.907Z" level=info msg=reconcileAgentPod namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:19:57.917Z" level=info msg="Workflow update successful" namespace=ci-workflows phase=Running resourceVersion=273753440 workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.944Z" level=info msg="Processing workflow" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.945Z" level=info msg="Task-result reconciliation" namespace=ci-workflows numObjs=1 workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.945Z" level=info msg="task-result changed" namespace=ci-workflows nodeID=testing-memoization-mini-d2k2z-2612586120 workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.945Z" level=info msg="node changed" namespace=ci-workflows new.message= new.phase=Succeeded new.progress=0/1 nodeID=testing-memoization-mini-d2k2z-2612586120 old.message= old.phase=Running old.progress=0/1 workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.945Z" level=info msg="Step group node testing-memoization-mini-d2k2z-1462015324 successful" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.945Z" level=info msg="node testing-memoization-mini-d2k2z-1462015324 phase Running -> Succeeded" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.945Z" level=info msg="node testing-memoization-mini-d2k2z-1462015324 finished: 2023-03-28 18:20:08.945950054 +0000 UTC" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.945Z" level=info msg="Outbound nodes of testing-memoization-mini-d2k2z-2612586120 is [testing-memoization-mini-d2k2z-2612586120]" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.945Z" level=info msg="Outbound nodes of testing-memoization-mini-d2k2z is [testing-memoization-mini-d2k2z-2612586120]" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.945Z" level=info msg="node testing-memoization-mini-d2k2z phase Pending -> Succeeded" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.945Z" level=info msg="node testing-memoization-mini-d2k2z finished: 2023-03-28 18:20:08.945993045 +0000 UTC" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.946Z" level=info msg="Checking daemoned children of testing-memoization-mini-d2k2z" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.946Z" level=info msg="TaskSet Reconciliation" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.946Z" level=info msg=reconcileAgentPod namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.946Z" level=info msg="Updated phase Running -> Succeeded" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.946Z" level=info msg="Marking workflow completed" namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.946Z" level=info msg="Checking daemoned children of " namespace=ci-workflows workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.951Z" level=info msg="cleaning up pod" action=deletePod key=ci-workflows/testing-memoization-mini-d2k2z-1340600742-agent/deletePod
time="2023-03-28T18:20:08.957Z" level=info msg="Workflow update successful" namespace=ci-workflows phase=Succeeded resourceVersion=273753515 workflow=testing-memoization-mini-d2k2z
time="2023-03-28T18:20:08.958Z" level=info msg="Queueing Succeeded workflow ci-workflows/testing-memoization-mini-d2k2z for delete in 4h0m0s due to TTL"
time="2023-03-28T18:20:08.978Z" level=info msg="cleaning up pod" action=labelPodCompleted key=ci-workflows/testing-memoization-mini-d2k2z-echo-and-wait-2612586120/labelPodCompleted

Logs from in your workflow's wait container

time="2023-03-28T18:19:48.072Z" level=info msg="capturing logs" argo=true
Tue Mar 28 18:19:48 UTC 2023
hello
time="2023-03-28T18:19:58.078Z" level=info msg="sub-process exited" argo=true error="<nil>"
@JPZ13 JPZ13 added type/feature Feature request and removed type/bug labels Mar 30, 2023
@JPZ13
Copy link
Member

JPZ13 commented Mar 30, 2023

It seems like memoization is only declared on execution templates rather than orchestration templates currently. This and #10426 are requesting that it work for orchestration templates

@moleskin-smile
Copy link
Author

@JPZ13: Thanks for looking into this. What needs to be done to implement this? I think it's an important missing feature that would allow us to cache more complicated steps that run on big monorepos.

@JPZ13
Copy link
Member

JPZ13 commented Apr 4, 2023

Not sure about what technical steps, but I'll bring it up at the next contributors meeting to see if anyone has capacity. Would you be interested in contributing it @moleskin-smile?

@moleskin-smile
Copy link
Author

I'd love to, but I lack Go skills. :(

Thanks for taking care of this issue.

Joibel added a commit to Joibel/argo-workflows that referenced this issue Jun 23, 2023
Despite argoproj#10769 and argoproj#10426 both having examples of memoization not
working with the examples having no output, no-one has picked up on
this.

To address this improve the documentation for memoization and
work-avoidance, linking the two ideas and pointing people who want to
skip steps towards work-avoidance unless they are really doing what
memoization was designed to do.

Issue argoproj#10426 is problematic in that some steps get memoized when
perhaps they should't, so this commit shouldn't close it.

Fixes argoproj#10769

Signed-off-by: Alan Clucas <[email protected]>
@Joibel
Copy link
Member

Joibel commented Jun 23, 2023

This is not a bug. Memoization should not memoize steps without outputs. Adding outputs to the above example does what is intended.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: memo-steps-
  annotations:
    workflows.argoproj.io/description: Test memoization
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: echo-and-wait-step
            template: echo-and-wait
            arguments:
              parameters:
                - name: time
                  value: 10
                - name: message
                  value: "hello"
      memoize:
        key: "hello-10"
        cache:
          configMap:
            name: cache-this-cm-doesnt-exist
      outputs:
        parameters:
          - name: message
            valueFrom:
              parameter: "{{ steps.echo-and-wait-step.outputs.parameters.message }}"

    - name: echo-and-wait
      inputs:
        parameters:
          - name: message
          - name: time
      script:
        image: alpine:3.17.2
        command: [sh]
        source: |
          set -e
          date
          echo {{inputs.parameters.message}} > /tmp/msg
          sleep {{inputs.parameters.time}}
      outputs:
        parameters:
          - name: message
            valueFrom:
              path: /tmp/msg

I have put up a PR to try to address this issue through documentation.

@moleskin-smile
Copy link
Author

Thanks! That works indeed.

@JPZ13 JPZ13 closed this as completed Jun 28, 2023
Joibel added a commit to Joibel/argo-workflows that referenced this issue Jul 17, 2023
Please note: I am not totally in favour of this change, it makes
memoization work without outputs and I'm not conviced this is actually
healthy, but it's what people seem to expect. See the linked issues in
the fixes list below.

This change will cause memoization to work for all step and dag tasks
even without outputs. Note: They were working semi-erroneously for
some dag tasks due to implicit outputs before this change.

Fixes argoproj#11280 (raised to cover this desire)
Fixes argoproj#10769 (already closed by documentation)
Partially addresses argoproj#10426: Dags will memoize now, but retries still
won't

Signed-off-by: Alan Clucas <[email protected]>
Joibel added a commit to Joibel/argo-workflows that referenced this issue Jul 17, 2023
Please note: I am not totally in favour of this change, it makes
memoization work without outputs and I'm not conviced this is actually
healthy, but it's what people seem to expect. See the linked issues in
the fixes list below.

This change will cause memoization to work for all step and dag tasks
even without outputs. Note: They were working semi-erroneously for
some dag tasks due to implicit outputs before this change.

Fixes argoproj#11280 (raised to cover this desire)
Fixes argoproj#10769 (already closed by documentation)
Partially addresses argoproj#10426: Dags will memoize now, but retries still
won't

Signed-off-by: Alan Clucas <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants