-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consul Catalog Integration Fixes #5536
Conversation
54f0e0e
to
761314d
Compare
The current implementation of Service Registration uses a hash of the nomad-internal state of a service to register it with Consul, this means that any update to the service invalidates this name and we then deregister, and recreate the service in Consul. While this behaviour slightly simplifies reasoning about service registration, this becomes problematic when we add consul health checks to a service. When the service is re-registered, so are the checks, which default to failing for at least one check period. This commit migrates us to using a stable identifier based on the allocation, task, and service identifiers, and uses the difference between the remote and local state to decide when to push updates. It uses the existing hashing mechanic to decide when UpdateTask should regenerate service registrations for providing to Sync, but this should be removable as part of a future refactor. It additionally introduces the _nomad-check- prefix for check definitions, to allow for future allowing of consul features like maintenance mode.
This commit causes sync to skip deregistering checks that are not managed by nomad, such as service maintenance mode checks. This is handled in the same way as service registrations - by doing a Nomad specific prefix match.
429d157
to
14c231a
Compare
Only travis job that's failing on push is the website and that seems unrelated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good Dani! Don't forget to update the CHANGELOG either in this PR or post merge.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
fixes #4566
fixes #4537
The current implementation of Service Registration uses a hash of the nomad-internal state of a service to register it with Consul, this means that any update to the service invalidates this name and we then de-register, and recreate the service in Consul.
While this behavior slightly simplifies reasoning about service registration, this becomes problematic when we add consul health checks to a service. When the service is re-registered, so are the checks, which default to failing for at least one check period. This causes issues when a canary is promoted, as the old, stable allocs are immediately stopped, and the new ones updated, which causes a consul registration cycle and takes your service offline for one health check period.
This pull request introduces
v3
of the Nomad-Consul integration style that moves back to having a stable identifier for a service, and registers updates based on changes to the internal struct state and remote state.Notes
There is currently a UX issue in Consul that means that when you update a Service's tags, its Checks will still return the ones they were registered against, however this does not effect the functionality of Consul or Nomad (Updating a Service definition via the API does not update Checks consul#5631).
There is an outstanding Nomad bug where the consul reconciler races against task restoration on a host, which means when restarting or updating a client, services will be removed from consul and re-registered, causing a 1-health-check-period status flap. This PR does not attempt to solve that bug as it will require a larger refactor of the client startup flow to be reliable, thus this PR does not attempt to handle older registrations outside of removing them as expected.