docs: Link memoization and work-avoidance #11257

Joibel · 2023-06-23T13:13:37Z

Despite #10769 and #10426 both having examples of memoization not working with the examples having no output, no-one has picked up on this.

To address this improve the documentation for memoization and work-avoidance, linking the two ideas and pointing people who want to skip steps towards work-avoidance unless they are really doing what memoization was designed to do.

Issue #10426 is problematic in that some steps get memoized when perhaps they should't, so this commit shouldn't close it.

Fixes #10769

Despite argoproj#10769 and argoproj#10426 both having examples of memoization not working with the examples having no output, no-one has picked up on this. To address this improve the documentation for memoization and work-avoidance, linking the two ideas and pointing people who want to skip steps towards work-avoidance unless they are really doing what memoization was designed to do. Issue argoproj#10426 is problematic in that some steps get memoized when perhaps they should't, so this commit shouldn't close it. Fixes argoproj#10769 Signed-off-by: Alan Clucas <[email protected]>

tico24

nits

tico24 · 2023-06-26T13:47:57Z

docs/memoization.md

 it stores the outputs of a template into a specified cache with a variable key.

+Memoization only works for steps which have outputs, if you attempt to use it on steps which do not it should not work (there are some cases where it does, but they shouldn't). It is designed for 'pure' steps, where the purpose of running the step is to calculate some outputs based upon the steps inputs, and only the inputs. Pure steps should not interact with the outside world, but workflows won't enforce this on you.


"steps inputs" needs an appropriate apostrophe.

tico24 · 2023-06-26T13:49:20Z

docs/work-avoidance.md

@@ -2,7 +2,11 @@

 > v2.9 and after


did we ever work out why this is a v2.9 feature?

No idea, unless PVCs were introduced then.

Signed-off-by: Alan Clucas <[email protected]>

moleskin-smile · 2023-06-26T17:26:21Z

Thanks for clarifying on intended usage of Argo Workflows features.

Since an issue I've submitted triggered this change, let me describe our use case.

We're using Argo Workflows for CI and we're using memoization for work avoidance and I think it's valid usage. We don't want to run tests again for the same sets of inputs. So our expensive to compute output is... the information that tests pass for the given set of inputs. And we're happy that hooks run for memoized steps, because what we really need is green GitHub status check.

"Work avoidance" is not really Argo Workflows feature. It's just a technique that could be used to implement caching anywhere (which we also use with Argo in some places). But in order to not "abuse" memoization (replace memoization with "work avoidance"), we would have to repeat the same (part of) shell script in every template we want to cache, add input and output artifacts, handle artifact lifecycle etc. That's a lot of repeated YAML and shell code. And also pods would still be scheduled, run, they would fetch all input artifacts (cloning repositories for example) and would waste requested resources, increasing costs by a lot (especially for templates requesting a lot of cpu and memory). Why do that if we could just say "don't run this again for this set of inputs" which is exactly what memoization does?

Why not treat the success of template run part of its outputs and allow to memoize templates without outputs?

Memoization works really good when it works (when we don't run into 1MB limit of configmaps or this). It makes cached steps to be instant and makes Argo Workflows possible with monorepos.

(Let me know if this is not the right place to raise this concern).

mnarozny · 2023-06-26T18:20:57Z

I would also argue that this default approach is a bit odd to me. The memoization feature seems extremely powerful and restricting it to only steps with output seems counterintuitive when in a lot of cases it is the outcome with a set of parameters that matters, not necessarily the pure output. So if I memoized on my inputs and knew that the steps will produce the same output every time, I was expecting it not to have to run but give me a stamp of approval on the top level.

I mean I could put all the steps inside into a single template and memoize it but it seems counter intuitive when I can divide it into smaller reusable templates and then mix and match them and memoize the whole thing because the outcome of that whole template will never change for the same set of inputs.

Joibel · 2023-06-30T15:08:40Z

I understand that people are unhappy that memoization doesn't work without outputs, and would like it to work.

This PR doesn't change any behaviour, it just tries to document the current behaviour, which even in the event of a PR to change the behaviour of memoziation needs documenting for users prior to that change.

@terrytangyuan, I'd like you to consider merging this as is unless you feel it's factually incorrect.

I have raised #11280 as a proposal to implement memoization as @moleskin-smile and @mnarozny would like it to work to see what the feelings of the community are.

terrytangyuan

Thanks!

agilgur5 · 2023-07-06T21:14:25Z

docs/work-avoidance.md

-You can make workflows faster and more robust by employing **work avoidance**. A workflow that utilizes this is simply a workflow containing steps that do not run if the work has already been done. This simplest way to do this is to use **marker files**.
+You can make workflows faster and more robust by employing **work avoidance**. A workflow that utilizes this is simply a workflow containing steps that do not run if the work has already been done.
+
+This technique is similar to [memoization](memoization.md) but they have distinct use cases. Work avoidance is totally in your control and you make the decisions as to have to skip the work. [Memoization](memoization.md) is a feature of Argo Workflows to automatically skip steps which generate outputs - it is designed


The last sentence here was incomplete; it ends with "it is designed". Not sure how it was intended to be written

Oh, erm... yeah, I'll finish that off. Thanks @agilgur5

Fixed in #11357 . Thanks!

Complete my sentence from PR argoproj#11257 Signed-off-by: Alan Clucas <[email protected]>

Joibel marked this pull request as ready for review June 26, 2023 13:44

tico24 approved these changes Jun 26, 2023

View reviewed changes

Joibel added 2 commits June 26, 2023 13:51

Fix nit

04b6827

Signed-off-by: Alan Clucas <[email protected]>

Merge branch 'master' into memo-workavoidance

5cb725b

Joibel mentioned this pull request Jun 30, 2023

proposal: Allow memoization of things that do not have outputs #11280

Closed

terrytangyuan approved these changes Jun 30, 2023

View reviewed changes

terrytangyuan merged commit 476cd7a into argoproj:master Jun 30, 2023

JPZ13 pushed a commit to pipekit/argo-workflows that referenced this pull request Jul 4, 2023

docs: Link memoization and work-avoidance (argoproj#11257)

c53b68b

agilgur5 reviewed Jul 6, 2023

View reviewed changes

Joibel added a commit to Joibel/argo-workflows that referenced this pull request Jul 14, 2023

docs: Finish describing memoization

219f82f

Complete my sentence from PR argoproj#11257 Signed-off-by: Alan Clucas <[email protected]>

Joibel mentioned this pull request Jul 14, 2023

docs: Finish describing memoization #11357

Merged

Joibel added a commit to Joibel/argo-workflows that referenced this pull request Jul 14, 2023

docs: Finish describing memoization

1b77fe1

Complete my sentence from PR argoproj#11257 Signed-off-by: Alan Clucas <[email protected]>

Joibel mentioned this pull request Aug 2, 2023

REQUEST: Promotion to Reviewer for @Joibel argoproj/argoproj#231

Closed

6 tasks

Joibel deleted the memo-workavoidance branch November 20, 2023 21:38

agilgur5 added the area/docs Incorrect, missing, or mistakes in docs label Jan 29, 2024

Joibel mentioned this pull request Jan 31, 2024

REQUEST: Promotion to Approver for Joibel argoproj/argoproj#276

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: Link memoization and work-avoidance #11257

docs: Link memoization and work-avoidance #11257

Joibel commented Jun 23, 2023

tico24 left a comment

tico24 Jun 26, 2023

Joibel Jun 26, 2023

tico24 Jun 26, 2023

Joibel Jun 26, 2023

moleskin-smile commented Jun 26, 2023

mnarozny commented Jun 26, 2023

Joibel commented Jun 30, 2023

terrytangyuan left a comment

agilgur5 Jul 6, 2023

Joibel Jul 14, 2023

agilgur5 Jul 19, 2023

		it stores the outputs of a template into a specified cache with a variable key.

		Memoization only works for steps which have outputs, if you attempt to use it on steps which do not it should not work (there are some cases where it does, but they shouldn't). It is designed for 'pure' steps, where the purpose of running the step is to calculate some outputs based upon the steps inputs, and only the inputs. Pure steps should not interact with the outside world, but workflows won't enforce this on you.

docs: Link memoization and work-avoidance #11257

docs: Link memoization and work-avoidance #11257

Conversation

Joibel commented Jun 23, 2023

tico24 left a comment

Choose a reason for hiding this comment

tico24 Jun 26, 2023

Choose a reason for hiding this comment

Joibel Jun 26, 2023

Choose a reason for hiding this comment

tico24 Jun 26, 2023

Choose a reason for hiding this comment

Joibel Jun 26, 2023

Choose a reason for hiding this comment

moleskin-smile commented Jun 26, 2023

mnarozny commented Jun 26, 2023

Joibel commented Jun 30, 2023

terrytangyuan left a comment

Choose a reason for hiding this comment

agilgur5 Jul 6, 2023

Choose a reason for hiding this comment

Joibel Jul 14, 2023

Choose a reason for hiding this comment

agilgur5 Jul 19, 2023

Choose a reason for hiding this comment