Nomad & Consul version
Nomad v0.8.4 (dbee1d7d051619e90a809c23cf7e55750900742a)
Consul v1.2.0 Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
Operating system and Environment details
Server - Windows 2016 Datacenter, running on a VM
Agent - Windows 2016 Datacenter, inside the container microsoft/windowsservercore:latest.
Issue
The service maintenance mode check is removed from Consul by the Nomad agent. It is probably a bug related to #4170.
We have the following setup:
nomad agent and consul agent are installed on the same VM.
When running a Nomad job, it successfully registers a service and its health checks, and these are visible from both the Consul agent and the server.
Then we switch the service registered by Nomad into maintenance mode by directly calling the agent API (or via the command line):
consul maint -enable -service=_nomad-task-7jnnnudilvhc7up4z6yjvm2vjwx576jw -reason "Testing"
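For reference, here is a minimal sketch of the same toggle done directly against the agent HTTP API using the official Go client (github.com/hashicorp/consul/api); the service ID is the one from this report and will differ per allocation:

```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Connects to the local agent on 127.0.0.1:8500 by default.
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Service ID taken from this report; substitute your own _nomad-task-* ID.
	const serviceID = "_nomad-task-7jnnnudilvhc7up4z6yjvm2vjwx576jw"

	// Issues PUT /v1/agent/service/maintenance/<serviceID>?enable=true&reason=Testing,
	// which registers the critical _service_maintenance:<serviceID> check.
	if err := client.Agent().EnableServiceMaintenance(serviceID, "Testing"); err != nil {
		log.Fatal(err)
	}
}
```

The CLI command above is just a wrapper around the same PUT endpoint, as the agent log below shows.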
For some time the service is shown in Consul as it should be, in maintenance mode; then it is switched back to the normal state.
Studying the logs of both the Nomad and Consul agents shows explicitly that the Consul agent receives a request from localhost to de-register the service maintenance health check:
2018/07/27 10:44:12 [INFO] agent: Service "_nomad-task-7jnnnudilvhc7up4z6yjvm2vjwx576jw" entered maintenance mode
2018/07/27 10:44:12 [DEBUG] agent: Service "_nomad-task-7jnnnudilvhc7up4z6yjvm2vjwx576jw" in sync
2018/07/27 10:44:12 [DEBUG] agent: Check "_service_maintenance:_nomad-task-7jnnnudilvhc7up4z6yjvm2vjwx576jw" in sync
2018/07/27 10:44:12 [DEBUG] http: Request PUT /v1/agent/service/maintenance/_nomad-task-7jnnnudilvhc7up4z6yjvm2vjwx576jw?enable=true&reason=Testing (42.0073ms) from=127.0.0.1:62209
...
...
2018/07/27 10:44:40 [DEBUG] http: Request PUT /v1/agent/check/deregister/_service_maintenance:_nomad-task-7jnnnudilvhc7up4z6yjvm2vjwx576jw (21.0081ms) from=127.0.0.1:52544
And it looks like this is triggered by the Nomad agent.
Corresponding log from the Nomad agent:
I have a feeling that Nomad should not remove maintenance mode checks from services in Consul in this case, though the rest should be synced as it works now, per #4170.
The tests also showed that if the node as a whole is put into maintenance mode, it remains in that state until explicitly taken out of maintenance mode.
P.S. There are no traces at all in the Consul server and Nomad server logs.
Reproduction steps
Run the Consul agent and Nomad agent on the same VM. Point the Nomad agent to Consul on localhost: consul-address=127.0.0.1:8500. It does not matter whether Consul runs in dev mode or the agent connects to a server.
Run a Nomad job with service registration
Put the service registered by Nomad into maintenance mode
After some time the maintenance mode health check is automatically removed (a polling sketch for observing this follows below)
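For step 4, a small sketch (assuming the Go Consul client and the service ID from this report) that polls the local agent until the maintenance check disappears:

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Check ID as seen in the Consul agent logs above; adjust for your allocation.
	checkID := "_service_maintenance:_nomad-task-7jnnnudilvhc7up4z6yjvm2vjwx576jw"

	for {
		// GET /v1/agent/checks returns all checks registered with the local agent.
		checks, err := client.Agent().Checks()
		if err != nil {
			log.Fatal(err)
		}
		if _, ok := checks[checkID]; !ok {
			fmt.Println("maintenance check has been deregistered:", checkID)
			return
		}
		fmt.Println("maintenance check still present:", checkID)
		time.Sleep(5 * time.Second)
	}
}
```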
@i-prudnikov Thanks for the details and reproduction steps, I confirmed this behavior as well.
As part of #4170 we made an assumption that any checks registered on behalf of Nomad tasks are only created and managed by Nomad, so we remove extraneous checks that Nomad is not aware of. This plays badly with maintenance mode, which caused the behavior you saw.
We'll fix this in an upcoming release so that maintenance mode works. In general, any out-of-band registered checks for services that Nomad manages should still get removed; i.e., if you want to register any checks, use the service stanza in Nomad to do so. Maintenance mode is a special case, so we will fix that.
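To illustrate the assumption described above, here is a rough sketch (not Nomad's actual sync code) of the behavior: checks attached to a Nomad-managed service that Nomad itself did not register are treated as extraneous and deregistered; the fix would special-case Consul's maintenance checks, which carry the _service_maintenance: ID prefix seen in the logs above.

```go
package sketch

import (
	"strings"

	"github.com/hashicorp/consul/api"
)

const maintCheckPrefix = "_service_maintenance:"

// extraneousChecks returns the IDs of agent checks attached to serviceID that
// are not in the set Nomad registered itself (wanted), optionally sparing
// operator-set maintenance-mode checks.
func extraneousChecks(agentChecks map[string]*api.AgentCheck, wanted map[string]bool, serviceID string, spareMaint bool) []string {
	var remove []string
	for id, chk := range agentChecks {
		if chk.ServiceID != serviceID || wanted[id] {
			continue
		}
		if spareMaint && strings.HasPrefix(id, maintCheckPrefix) {
			// The proposed fix: leave maintenance-mode checks alone.
			continue
		}
		remove = append(remove, id)
	}
	return remove
}
```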