You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Worker receives a history containing the sleep finishing followed by timer firing.
The workflow thus receives two TIMER_FIRED events in its history at once. Now what should happen?
What’s happening in temporal-ruby today
It processes the timers in order. Sleep first, then the other timer. The workflow completes first, and it’s not allowed to do anything after the workflow has completed. Thus, if the timer issues a command, the task will fail with Temporal::WorkflowAlreadyCompletingError.
Attempted solution
My first attempt was to early-out in state_manager.rb
history_window.events.each do |event|
apply_event(event)
break if if workflow_finished?
end
This fixed my problem and is logical; the workflow will execute the same as it would have if the worker had never gone down. But, it will silently drop events.
I asked Maxim what other SDKs do…
Other SDKs’ behaviors
According to Maxim:
Temporal processes new events in batches. One batch per workflow task. It applies all the events before running workflow code. So in case of Java or Go, for example the Futures that correspond to timer will be set to ready before running any workflow threads. This way workflow can check if any of these Futures are ready to decide what to do next.
Signals have similar properties, but they use callbacks in Java and some other SDKs. Temporal SDK schedules callback threads before running workflow threads. This way workflow cannot complete before the callback threads were called.
Then in case of Java SDK it is going to make the main workflow thread eligible to run. But it still will run only after all other threads executed and all other Futures resolved.
Without this functionality you guaranteed to lose signals if they are delivered as callbacks.
Rust core does the same
Java implementation - we can a see that callbacks (timer in this case) have a higher priority than the main thread (which is awoken by the sleep in this case).
What now?
Changing temporal-ruby's behavior is obviously somewhat tricky and would cause version changes if not done in a version-safe way. We might take this fix on but want to work with the community.
Should we fix this?
Implementation suggestions for the fix itself?
Version safety. I'm guessing we should add a configuration that changes this behavior that folks can opt into.
The text was updated successfully, but these errors were encountered:
I think there might be a relatively painless fix to this. All the concurrent execution threads in Ruby are represented by fibers, so if we can distinguish the callback fibers from the main one in which the workflow was executed and skip resuming it until the very end of the history window. We already do something very similar with event handlers ordering.
A simpler fix that you can do straight away — is to check yourself whether a workflow has finished or not from your time callback using workflow.completed?. Not pretty, but at least will save you from failing workflow tasks.
In term of version safety — I don't think it should affect anyone. At least I don't think there's a non-failure scenario where anyone would depend on this kind of behaviour. It they did they would already be seeing failures
temporal-ruby Timer timing issue
Scenario:
The workflow thus receives two TIMER_FIRED events in its history at once. Now what should happen?
What’s happening in temporal-ruby today
It processes the timers in order. Sleep first, then the other timer. The workflow completes first, and it’s not allowed to do anything after the workflow has completed. Thus, if the timer issues a command, the task will fail with
Temporal::WorkflowAlreadyCompletingError
.Attempted solution
My first attempt was to early-out in
state_manager.rb
This fixed my problem and is logical; the workflow will execute the same as it would have if the worker had never gone down. But, it will silently drop events.
I asked Maxim what other SDKs do…
Other SDKs’ behaviors
According to Maxim:
Rust core does the same
Java implementation - we can a see that callbacks (timer in this case) have a higher priority than the main thread (which is awoken by the sleep in this case).
What now?
Changing temporal-ruby's behavior is obviously somewhat tricky and would cause version changes if not done in a version-safe way. We might take this fix on but want to work with the community.
The text was updated successfully, but these errors were encountered: