-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: Link memoization and work-avoidance #11257
Conversation
Despite argoproj#10769 and argoproj#10426 both having examples of memoization not working with the examples having no output, no-one has picked up on this. To address this improve the documentation for memoization and work-avoidance, linking the two ideas and pointing people who want to skip steps towards work-avoidance unless they are really doing what memoization was designed to do. Issue argoproj#10426 is problematic in that some steps get memoized when perhaps they should't, so this commit shouldn't close it. Fixes argoproj#10769 Signed-off-by: Alan Clucas <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits
docs/memoization.md
Outdated
it stores the outputs of a template into a specified cache with a variable key. | ||
|
||
Memoization only works for steps which have outputs, if you attempt to use it on steps which do not it should not work (there are some cases where it does, but they shouldn't). It is designed for 'pure' steps, where the purpose of running the step is to calculate some outputs based upon the steps inputs, and only the inputs. Pure steps should not interact with the outside world, but workflows won't enforce this on you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"steps inputs" needs an appropriate apostrophe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -2,7 +2,11 @@ | |||
|
|||
> v2.9 and after |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did we ever work out why this is a v2.9 feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea, unless PVCs were introduced then.
Signed-off-by: Alan Clucas <[email protected]>
Thanks for clarifying on intended usage of Argo Workflows features. Since an issue I've submitted triggered this change, let me describe our use case. We're using Argo Workflows for CI and we're using memoization for work avoidance and I think it's valid usage. We don't want to run tests again for the same sets of inputs. So our expensive to compute output is... the information that tests pass for the given set of inputs. And we're happy that hooks run for memoized steps, because what we really need is green GitHub status check. "Work avoidance" is not really Argo Workflows feature. It's just a technique that could be used to implement caching anywhere (which we also use with Argo in some places). But in order to not "abuse" memoization (replace memoization with "work avoidance"), we would have to repeat the same (part of) shell script in every template we want to cache, add input and output artifacts, handle artifact lifecycle etc. That's a lot of repeated YAML and shell code. And also pods would still be scheduled, run, they would fetch all input artifacts (cloning repositories for example) and would waste requested resources, increasing costs by a lot (especially for templates requesting a lot of cpu and memory). Why do that if we could just say "don't run this again for this set of inputs" which is exactly what memoization does? Why not treat the success of template run part of its outputs and allow to memoize templates without outputs? Memoization works really good when it works (when we don't run into 1MB limit of configmaps or this). It makes cached steps to be instant and makes Argo Workflows possible with monorepos. (Let me know if this is not the right place to raise this concern). |
I would also argue that this default approach is a bit odd to me. The memoization feature seems extremely powerful and restricting it to only steps with output seems counterintuitive when in a lot of cases it is the outcome with a set of parameters that matters, not necessarily the pure output. So if I memoized on my inputs and knew that the steps will produce the same output every time, I was expecting it not to have to run but give me a stamp of approval on the top level. I mean I could put all the steps inside into a single template and memoize it but it seems counter intuitive when I can divide it into smaller reusable templates and then mix and match them and memoize the whole thing because the outcome of that whole template will never change for the same set of inputs. |
I understand that people are unhappy that memoization doesn't work without outputs, and would like it to work. This PR doesn't change any behaviour, it just tries to document the current behaviour, which even in the event of a PR to change the behaviour of memoziation needs documenting for users prior to that change. @terrytangyuan, I'd like you to consider merging this as is unless you feel it's factually incorrect. I have raised #11280 as a proposal to implement memoization as @moleskin-smile and @mnarozny would like it to work to see what the feelings of the community are. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
You can make workflows faster and more robust by employing **work avoidance**. A workflow that utilizes this is simply a workflow containing steps that do not run if the work has already been done. This simplest way to do this is to use **marker files**. | ||
You can make workflows faster and more robust by employing **work avoidance**. A workflow that utilizes this is simply a workflow containing steps that do not run if the work has already been done. | ||
|
||
This technique is similar to [memoization](memoization.md) but they have distinct use cases. Work avoidance is totally in your control and you make the decisions as to have to skip the work. [Memoization](memoization.md) is a feature of Argo Workflows to automatically skip steps which generate outputs - it is designed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last sentence here was incomplete; it ends with "it is designed". Not sure how it was intended to be written
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, erm... yeah, I'll finish that off. Thanks @agilgur5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in #11357 . Thanks!
Complete my sentence from PR argoproj#11257 Signed-off-by: Alan Clucas <[email protected]>
Complete my sentence from PR argoproj#11257 Signed-off-by: Alan Clucas <[email protected]>
Despite #10769 and #10426 both having examples of memoization not working with the examples having no output, no-one has picked up on this.
To address this improve the documentation for memoization and work-avoidance, linking the two ideas and pointing people who want to skip steps towards work-avoidance unless they are really doing what memoization was designed to do.
Issue #10426 is problematic in that some steps get memoized when perhaps they should't, so this commit shouldn't close it.
Fixes #10769