Refactor Run #749

Merged
merged 4 commits into from
May 7, 2020

Member

@vcastellm vcastellm commented May 4, 2020

Rationale

Serf queries work over UDP: the server sends messages to nodes and expects a response, but message delivery is neither guaranteed nor verified. This was causing some job executions to be lost, around 1%-2% in edge cases.

Solution

Agents will listen for gRPC calls from the servers, using server-side streaming. This reverses the current direction of the calls; agents will keep sending Execution progress as they do now.

Servers will actively order nodes to run the jobs, so we can verify the job is actually being executed or report an error in case it's not.

This keeps the same guarantees for job status reporting and streaming, and it opens the possibility of cancelling an execution in the future.
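
To illustrate the flow described above, here is a minimal sketch in plain Go with no gRPC dependency. It is not the code in this PR: the `Agent` interface, `ExecutionUpdate`, `RunOnAgent`, and both agent types are hypothetical names made up for the example, and a channel stands in for the stream. The point it shows is that the server gets either a stream of progress or an explicit error, instead of silently losing the execution.

```go
package main

import (
	"errors"
	"fmt"
)

// ExecutionUpdate is a hypothetical progress message streamed back by an agent.
type ExecutionUpdate struct {
	JobName string
	Output  string
	Done    bool
}

// Agent is a hypothetical stand-in for the client a server would hold per node.
// In the real design this would be a gRPC call; a channel models the stream here.
type Agent interface {
	// Run asks the node to start the job. It returns a stream of updates,
	// or an error if the request could not be delivered.
	Run(jobName string) (<-chan ExecutionUpdate, error)
}

// healthyAgent simulates a node that accepts the job and reports progress.
type healthyAgent struct{}

func (healthyAgent) Run(jobName string) (<-chan ExecutionUpdate, error) {
	ch := make(chan ExecutionUpdate, 2)
	ch <- ExecutionUpdate{JobName: jobName, Output: "started"}
	ch <- ExecutionUpdate{JobName: jobName, Output: "finished", Done: true}
	close(ch)
	return ch, nil
}

// unreachableAgent simulates a node that cannot be contacted.
type unreachableAgent struct{}

func (unreachableAgent) Run(string) (<-chan ExecutionUpdate, error) {
	return nil, errors.New("node unreachable")
}

// RunOnAgent orders one agent to run a job and consumes its progress stream.
// It returns the last output seen, and an error when the job never started
// or the stream ended without a final update.
func RunOnAgent(a Agent, jobName string) (string, error) {
	stream, err := a.Run(jobName)
	if err != nil {
		return "", fmt.Errorf("job %q not started: %w", jobName, err)
	}
	last, done := "", false
	for u := range stream {
		last, done = u.Output, u.Done
	}
	if !done {
		return last, fmt.Errorf("job %q: stream ended before completion", jobName)
	}
	return last, nil
}

func main() {
	for _, a := range []Agent{healthyAgent{}, unreachableAgent{}} {
		out, err := RunOnAgent(a, "backup")
		fmt.Printf("output=%q err=%v\n", out, err)
	}
}
```

With a fire-and-forget UDP query, the second case would look the same as a job that was simply never scheduled; here the caller receives an error it can record against the execution.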

Fixes #731 #744

@vcastellm vcastellm marked this pull request as draft May 4, 2020 22:06
@vcastellm vcastellm marked this pull request as ready for review May 5, 2020 22:19
Development

Successfully merging this pull request may close these issues.

Scheduled jobs sometimes not being executed