Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed has_release? when called in multiple windows #168

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

cvanderschuere
Copy link
Contributor

Currently has_release does not work with long running workflows. Once has_release has been hit once, it will never change its value for the lifetime of a workflow.

Example

Original

def execute
    counter = 0
    loop do
      workflow.sleep(2.minutes.to_i)
      HelloActivity.execute("loop_#{counter}")
      counter += 1
    end
  end

Modification

def execute
    counter = 0
    loop do
      workflow.sleep(2.minutes.to_i)
      if workflow.has_release?('testing')
        HelloActivity.execute("TESTING_#{counter}")
      else
        HelloActivity.execute("loop_#{counter}")
      end
      counter += 1
    end
  end

The desired outcome with releasing this modification is that all future loops will call the new code, but previous tasks will execute the original version. Prior to this PR, the original code will always be executed.

With this change, here are the possible scenarios:

  1. has_release seen for the first time when not replaying --- mark release
  2. has_release seen for the first time while replaying --- set to false locally
  3. has_release seen for first time while replaying and then again while not replaying -- mark release
  4. has_release seen for first time when not replaying and then again while not replaying -- use previous value / don't mark release

Copy link
Contributor

@antstorm antstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cvanderschuere thanks a lot for the PR 👍

Can you please help me understand your thought process here?

Originally I've implemented this in a way where you can expect a has_release?('version') to return the same value within a single workflow instance no matter at what point the new code got deployed. The reasoning here is simple — you can add a change with the same condition to different parts of the workflow code and expect it to either always be true or false, but never change the value along the way.

I can definitely see how your example is also valid where you want the new release to affect the loops that haven't yet happened. But this breaks the previous assumption

@cvanderschuere
Copy link
Contributor Author

The main use case that is not well supported right now is for long running workflows (specifically those with loops). If you have a workflow that runs for days or even weeks, there is currently no good way to make safe / deterministic updates and that's a critical feature for us.

As far as tools to make breaking changes, there are currently two approaches: version the entire workflow (i.e. WorkflowV1, WorkflowV2, etc) or use has_release? to be more specific with the versioning.

For workflows that are short (i.e. minutes), the current approach works just fine:

  1. (assume multiple workflow executions running)
  2. Wrap new codepath in has_release? and deploy to worker(s)
    a. For currently running workflows, they complete as though they would have before
    b. For new workflows, they use the new code
  3. After the max workflow duration (minutes in this example), all workflows are running the new code

The problem is when the max workflow duration is longer than a tolerable time to make changes. In our case, workflows can last for weeks which is more than tolerable.

I see two paths forward:
A. We change the assumption for has_release? to be consistent based on time in event history rather for entire workflow execution -- model has_release? like a SideEffect rather than more like a global variable
B. Add a new way to give more granular control over workflow execution potentially copying from the GetVersion example in the Go SDK

I would favor option A, but would be curious to hear use-cases where that would be a significant breaking change.

@antstorm
Copy link
Contributor

@cvanderschuere thanks for explaining this 🙌

What you're describing does make sense and it might be a better default behaviour. However it is a breaking change if there are people right now who depend on the old behaviour.

There might be a 3rd option which is a compromise — either add a new method or add a flag (e.g. has_release?('my_release', sticky: false)) that alters the behaviour of the has_release? for your use-case. At some point we'll have to go for the GetVersion-like API to give people more flexibility, but we can probably delay for a bit longer.

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants