I went deep into the rabbit hole. Being able to configure the timeout values is definitely useful, and a must-have given the variety of environments this system can run in. However, the root cause of the timeout failures at a high SIP count in a queue was different: I was able to eliminate the timeouts for up to 30k queued SIPs (I did not test beyond that) by configuring how many concurrent workflows the worker is willing to process at a time. Check this PR: artefactual-labs/enduro#616 @jraddaoui
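To illustrate the general idea of capping how much work is in flight at once, here is a minimal, generic Go sketch using a buffered channel as a semaphore. It is not the change made in the PR: `runBounded`, its parameters, and the numbers in `main` are hypothetical names and values chosen for illustration only.

```go
package main

import (
	"fmt"
	"sync"
)

// runBounded is a hypothetical helper (not from enduro). It processes n queued
// items with at most limit of them running at once, and returns the highest
// number of items observed in flight. limit must be at least 1.
func runBounded(n, limit int, work func(i int)) int {
	sem := make(chan struct{}, limit) // one slot per allowed concurrent item
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		inFlight int
		peak     int
	)
	for i := 0; i < n; i++ {
		sem <- struct{}{} // blocks while limit items are already running
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this item finishes

			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()

			work(i)

			mu.Lock()
			inFlight--
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return peak
}

func main() {
	// 1000 queued items, at most 10 processed concurrently (illustrative values).
	peak := runBounded(1000, 10, func(int) {})
	fmt.Println("peak in flight:", peak)
}
```

However long the queue grows, the number of items being processed never exceeds the limit, so the resources available for each running item stay predictable.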
Is your feature request related to a problem? Please describe.
In some high-load scenarios and environments with limited resources, we have seen workflows end unexpectedly with activity heartbeat timeouts:
Describe the solution you'd like
Thanks to the detective work done by @DanielCosme, we found out that increasing the activities' HeartbeatTimeout and the worker's DefaultHeartbeatThrottleInterval and MaxHeartbeatThrottleInterval may reduce the likelihood of such timeouts. These values should be configurable so they can be set based on the expected system load and available resources.
Describe alternatives you've considered
Over-provision everywhere!
Additional context
Check @DanielCosme's PR implementing this solution in artefactual-labs/enduro: artefactual-labs/enduro#612