You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
While tracking down a potential cause of #9307, I noticed serviceHook.deregister is called many times when stopping a job... including jobs that have no task services. We call deregistertwice from each of Exited, PreKilling, and Stop, regardless if there are any task services to dereigister. At no point does the absense of services to deregister get optimized out of the reconciliation with Consul, and the result is 6+* syncs with Consul that do nothing and did not need to happen.
* It's actually more than 6, but I can't explain the others yet ... looks like PreKill gets called more than once, like 3x, so that would explain 10 syncs total.
print trace on task stop
calling groupServiceHook.PreKill()
ServiceClient.RemoveWorkload from: groupServiceHook.dereigster (noCanary)
SH Remove service: _nomad-task-86bc61d3-6bd9-214c-f356-ef4d3802e9a5-group-api-count-api-9001
commit ops, from: remove-workload
ServiceClient.RemoveWorkload from: groupServiceHook.deregister (canary)
SH Remove service: _nomad-task-86bc61d3-6bd9-214c-f356-ef4d3802e9a5-group-api-count-api-9001
commit ops, from: remove-workload
SYNC(operations) @ 2021-06-21 15:52:43.406748182 -0500 CDT m=+54.063630885...
serviceHook.deregister from: serviceHook.PreKilling
serviceHook.getWorkloadService, services: []
ServiceClient.RemoveWorkload from: serviceHook.deregister (noCanary)
commit ops, from: remove-workload
serviceHook.deregister from: serviceHook.PreKilling
serviceHook.getWorkloadService, services: []
ServiceClient.RemoveWorkload from: serviceHook.deregister (noCanary)
commit ops, from: remove-workload
ServiceClient.RemoveWorkload from: serviceHook.deregister (canary)
commit ops, from: remove-workload
ServiceClient.RemoveWorkload from: serviceHook.deregister (canary)
commit ops, from: remove-workload
maybe remove service id: _nomad-client-ypilmc5s7ufrslte5wisd5qj52tgy6ol
-> skip A
maybe remove service id: _nomad-server-hmlqn3wkqsbobqnrz3lbxiofcyr3lv4f
-> skip A
maybe remove service id: _nomad-server-nwdddf6h2o4nrsovinzt3m5a3xahfv2m
-> skip A
maybe remove service id: _nomad-server-pfad5tnxrcovuhqd2cjfwmuu27qckjzr
-> skip A
maybe remove service id: _nomad-task-86bc61d3-6bd9-214c-f356-ef4d3802e9a5-group-api-count-api-9001
-> killing service /_nomad-task-86bc61d3-6bd9-214c-f356-ef4d3802e9a5-group-api-count-api-9001
killed service /_nomad-task-86bc61d3-6bd9-214c-f356-ef4d3802e9a5-group-api-count-api-9001
maybe remove service id: _nomad-task-86bc61d3-6bd9-214c-f356-ef4d3802e9a5-group-api-count-api-9001-sidecar-proxy
-> skip C
maybe remove check id: _nomad-check-aa54236a98ee5a3bd394aa76f46955ff42e4273c
-> skip A
maybe remove check id: _nomad-check-d9af14be9a42cd9ef46a2d23fcf1043b7b782eba
-> skip A
maybe remove check id: _nomad-check-7bce93dbe8acae494625366cc1b08b2d1a689652
-> skip A
maybe remove check id: _nomad-check-a16b004d61a3ec73ceec25a9e327a13c27ae0cf6
-> skip A
END SYNC(operations)
SYNC(operations) @ 2021-06-21 15:52:43.409282508 -0500 CDT m=+54.066165221...
maybe remove service id: _nomad-server-hmlqn3wkqsbobqnrz3lbxiofcyr3lv4f
-> skip A
maybe remove service id: _nomad-server-nwdddf6h2o4nrsovinzt3m5a3xahfv2m
-> skip A
maybe remove service id: _nomad-server-pfad5tnxrcovuhqd2cjfwmuu27qckjzr
-> skip A
maybe remove service id: _nomad-client-ypilmc5s7ufrslte5wisd5qj52tgy6ol
-> skip A
maybe remove check id: _nomad-check-aa54236a98ee5a3bd394aa76f46955ff42e4273c
-> skip A
maybe remove check id: _nomad-check-d9af14be9a42cd9ef46a2d23fcf1043b7b782eba
-> skip A
maybe remove check id: _nomad-check-7bce93dbe8acae494625366cc1b08b2d1a689652
-> skip A
maybe remove check id: _nomad-check-a16b004d61a3ec73ceec25a9e327a13c27ae0cf6
-> skip A
END SYNC(operations)
SYNC(operations) @ 2021-06-21 15:52:43.411934106 -0500 CDT m=+54.068816799...
maybe remove service id: _nomad-server-hmlqn3wkqsbobqnrz3lbxiofcyr3lv4f
-> skip A
maybe remove service id: _nomad-server-nwdddf6h2o4nrsovinzt3m5a3xahfv2m
-> skip A
maybe remove service id: _nomad-server-pfad5tnxrcovuhqd2cjfwmuu27qckjzr
-> skip A
maybe remove service id: _nomad-client-ypilmc5s7ufrslte5wisd5qj52tgy6ol
-> skip A
maybe remove check id: _nomad-check-a16b004d61a3ec73ceec25a9e327a13c27ae0cf6
-> skip A
maybe remove check id: _nomad-check-aa54236a98ee5a3bd394aa76f46955ff42e4273c
-> skip A
maybe remove check id: _nomad-check-d9af14be9a42cd9ef46a2d23fcf1043b7b782eba
-> skip A
maybe remove check id: _nomad-check-7bce93dbe8acae494625366cc1b08b2d1a689652
-> skip A
END SYNC(operations)
SYNC(operations) @ 2021-06-21 15:52:43.413364745 -0500 CDT m=+54.070247458...
maybe remove service id: _nomad-server-hmlqn3wkqsbobqnrz3lbxiofcyr3lv4f
-> skip A
maybe remove service id: _nomad-server-nwdddf6h2o4nrsovinzt3m5a3xahfv2m
-> skip A
maybe remove service id: _nomad-server-pfad5tnxrcovuhqd2cjfwmuu27qckjzr
-> skip A
maybe remove service id: _nomad-client-ypilmc5s7ufrslte5wisd5qj52tgy6ol
-> skip A
maybe remove check id: _nomad-check-7bce93dbe8acae494625366cc1b08b2d1a689652
-> skip A
maybe remove check id: _nomad-check-a16b004d61a3ec73ceec25a9e327a13c27ae0cf6
-> skip A
maybe remove check id: _nomad-check-aa54236a98ee5a3bd394aa76f46955ff42e4273c
-> skip A
maybe remove check id: _nomad-check-d9af14be9a42cd9ef46a2d23fcf1043b7b782eba
-> skip A
END SYNC(operations)
SYNC(operations) @ 2021-06-21 15:52:43.414354655 -0500 CDT m=+54.071237359...
maybe remove service id: _nomad-client-ypilmc5s7ufrslte5wisd5qj52tgy6ol
-> skip A
maybe remove service id: _nomad-server-hmlqn3wkqsbobqnrz3lbxiofcyr3lv4f
-> skip A
maybe remove service id: _nomad-server-nwdddf6h2o4nrsovinzt3m5a3xahfv2m
-> skip A
maybe remove service id: _nomad-server-pfad5tnxrcovuhqd2cjfwmuu27qckjzr
-> skip A
maybe remove check id: _nomad-check-a16b004d61a3ec73ceec25a9e327a13c27ae0cf6
-> skip A
maybe remove check id: _nomad-check-aa54236a98ee5a3bd394aa76f46955ff42e4273c
-> skip A
maybe remove check id: _nomad-check-d9af14be9a42cd9ef46a2d23fcf1043b7b782eba
-> skip A
maybe remove check id: _nomad-check-7bce93dbe8acae494625366cc1b08b2d1a689652
-> skip A
END SYNC(operations)
SYNC(operations) @ 2021-06-21 15:52:43.415381657 -0500 CDT m=+54.072264380...
maybe remove service id: _nomad-server-nwdddf6h2o4nrsovinzt3m5a3xahfv2m
-> skip A
maybe remove service id: _nomad-server-pfad5tnxrcovuhqd2cjfwmuu27qckjzr
-> skip A
maybe remove service id: _nomad-client-ypilmc5s7ufrslte5wisd5qj52tgy6ol
-> skip A
maybe remove service id: _nomad-server-hmlqn3wkqsbobqnrz3lbxiofcyr3lv4f
-> skip A
maybe remove check id: _nomad-check-7bce93dbe8acae494625366cc1b08b2d1a689652
-> skip A
maybe remove check id: _nomad-check-a16b004d61a3ec73ceec25a9e327a13c27ae0cf6
-> skip A
maybe remove check id: _nomad-check-aa54236a98ee5a3bd394aa76f46955ff42e4273c
-> skip A
maybe remove check id: _nomad-check-d9af14be9a42cd9ef46a2d23fcf1043b7b782eba
-> skip A
END SYNC(operations)
2021-06-21T15:52:43.462-0500 [INFO] client.driver_mgr.docker: stopped container: container_id=94b087d772b04675fd50fd73eee7ba828169f4c325addc7ea8660dc1ea81e0ab driver=docker
serviceHook.deregister from: serviceHook.Exited
serviceHook.getWorkloadService, services: []
ServiceClient.RemoveWorkload from: serviceHook.deregister (noCanary)
commit ops, from: remove-workload
ServiceClient.RemoveWorkload from: serviceHook.deregister (canary)
commit ops, from: remove-workload
SYNC(operations) @ 2021-06-21 15:52:43.469069665 -0500 CDT m=+54.125952378...
calling groupServiceHook.PreKill()
ServiceClient.RemoveWorkload from: groupServiceHook.dereigster (noCanary)
SH Remove service: _nomad-task-86bc61d3-6bd9-214c-f356-ef4d3802e9a5-group-api-count-api-9001
commit ops, from: remove-workload
ServiceClient.RemoveWorkload from: groupServiceHook.deregister (canary)
SH Remove service: _nomad-task-86bc61d3-6bd9-214c-f356-ef4d3802e9a5-group-api-count-api-9001
commit ops, from: remove-workload
maybe remove service id: _nomad-server-nwdddf6h2o4nrsovinzt3m5a3xahfv2m
-> skip A
maybe remove service id: _nomad-server-pfad5tnxrcovuhqd2cjfwmuu27qckjzr
-> skip A
maybe remove service id: _nomad-client-ypilmc5s7ufrslte5wisd5qj52tgy6ol
-> skip A
maybe remove service id: _nomad-server-hmlqn3wkqsbobqnrz3lbxiofcyr3lv4f
-> skip A
maybe remove check id: _nomad-check-7bce93dbe8acae494625366cc1b08b2d1a689652
-> skip A
maybe remove check id: _nomad-check-a16b004d61a3ec73ceec25a9e327a13c27ae0cf6
-> skip A
maybe remove check id: _nomad-check-aa54236a98ee5a3bd394aa76f46955ff42e4273c
-> skip A
maybe remove check id: _nomad-check-d9af14be9a42cd9ef46a2d23fcf1043b7b782eba
-> skip A
END SYNC(operations)
SYNC(operations) @ 2021-06-21 15:52:43.470338295 -0500 CDT m=+54.127220988...
serviceHook.deregister from: serviceHook.Stop
serviceHook.getWorkloadService, services: []
ServiceClient.RemoveWorkload from: serviceHook.deregister (noCanary)
commit ops, from: remove-workload
ServiceClient.RemoveWorkload from: serviceHook.deregister (canary)
commit ops, from: remove-workload
maybe remove service id: _nomad-server-hmlqn3wkqsbobqnrz3lbxiofcyr3lv4f
-> skip A
maybe remove service id: _nomad-server-nwdddf6h2o4nrsovinzt3m5a3xahfv2m
-> skip A
maybe remove service id: _nomad-server-pfad5tnxrcovuhqd2cjfwmuu27qckjzr
-> skip A
maybe remove service id: _nomad-client-ypilmc5s7ufrslte5wisd5qj52tgy6ol
-> skip A
maybe remove check id: _nomad-check-7bce93dbe8acae494625366cc1b08b2d1a689652
-> skip A
maybe remove check id: _nomad-check-a16b004d61a3ec73ceec25a9e327a13c27ae0cf6
-> skip A
maybe remove check id: _nomad-check-aa54236a98ee5a3bd394aa76f46955ff42e4273c
-> skip A
maybe remove check id: _nomad-check-d9af14be9a42cd9ef46a2d23fcf1043b7b782eba
-> skip A
END SYNC(operations)
SYNC(operations) @ 2021-06-21 15:52:43.471226022 -0500 CDT m=+54.128108715...
maybe remove service id: _nomad-server-hmlqn3wkqsbobqnrz3lbxiofcyr3lv4f
-> skip A
maybe remove service id: _nomad-server-nwdddf6h2o4nrsovinzt3m5a3xahfv2m
-> skip A
maybe remove service id: _nomad-server-pfad5tnxrcovuhqd2cjfwmuu27qckjzr
-> skip A
maybe remove service id: _nomad-client-ypilmc5s7ufrslte5wisd5qj52tgy6ol
-> skip A
maybe remove check id: _nomad-check-7bce93dbe8acae494625366cc1b08b2d1a689652
-> skip A
maybe remove check id: _nomad-check-a16b004d61a3ec73ceec25a9e327a13c27ae0cf6
-> skip A
maybe remove check id: _nomad-check-aa54236a98ee5a3bd394aa76f46955ff42e4273c
-> skip A
maybe remove check id: _nomad-check-d9af14be9a42cd9ef46a2d23fcf1043b7b782eba
-> skip A
END SYNC(operations)
SYNC(operations) @ 2021-06-21 15:52:43.472036172 -0500 CDT m=+54.128918865...
maybe remove service id: _nomad-client-ypilmc5s7ufrslte5wisd5qj52tgy6ol
-> skip A
maybe remove service id: _nomad-server-hmlqn3wkqsbobqnrz3lbxiofcyr3lv4f
-> skip A
maybe remove service id: _nomad-server-nwdddf6h2o4nrsovinzt3m5a3xahfv2m
-> skip A
maybe remove service id: _nomad-server-pfad5tnxrcovuhqd2cjfwmuu27qckjzr
-> skip A
maybe remove check id: _nomad-check-7bce93dbe8acae494625366cc1b08b2d1a689652
-> skip A
maybe remove check id: _nomad-check-a16b004d61a3ec73ceec25a9e327a13c27ae0cf6
-> skip A
maybe remove check id: _nomad-check-aa54236a98ee5a3bd394aa76f46955ff42e4273c
-> skip A
maybe remove check id: _nomad-check-d9af14be9a42cd9ef46a2d23fcf1043b7b782eba
-> skip A
END SYNC(operations)
SYNC(operations) @ 2021-06-21 15:52:43.47285652 -0500 CDT m=+54.129739223...
maybe remove service id: _nomad-server-hmlqn3wkqsbobqnrz3lbxiofcyr3lv4f
-> skip A
maybe remove service id: _nomad-server-nwdddf6h2o4nrsovinzt3m5a3xahfv2m
-> skip A
maybe remove service id: _nomad-server-pfad5tnxrcovuhqd2cjfwmuu27qckjzr
-> skip A
maybe remove service id: _nomad-client-ypilmc5s7ufrslte5wisd5qj52tgy6ol
-> skip A
maybe remove check id: _nomad-check-7bce93dbe8acae494625366cc1b08b2d1a689652
-> skip A
maybe remove check id: _nomad-check-a16b004d61a3ec73ceec25a9e327a13c27ae0cf6
-> skip A
maybe remove check id: _nomad-check-aa54236a98ee5a3bd394aa76f46955ff42e4273c
-> skip A
maybe remove check id: _nomad-check-d9af14be9a42cd9ef46a2d23fcf1043b7b782eba
-> skip A
END SYNC(operations)
SYNC(operations) @ 2021-06-21 15:52:43.473637114 -0500 CDT m=+54.130519807...
maybe remove service id: _nomad-client-ypilmc5s7ufrslte5wisd5qj52tgy6ol
-> skip A
maybe remove service id: _nomad-server-hmlqn3wkqsbobqnrz3lbxiofcyr3lv4f
-> skip A
maybe remove service id: _nomad-server-nwdddf6h2o4nrsovinzt3m5a3xahfv2m
-> skip A
maybe remove service id: _nomad-server-pfad5tnxrcovuhqd2cjfwmuu27qckjzr
-> skip A
maybe remove check id: _nomad-check-d9af14be9a42cd9ef46a2d23fcf1043b7b782eba
-> skip A
maybe remove check id: _nomad-check-7bce93dbe8acae494625366cc1b08b2d1a689652
-> skip A
maybe remove check id: _nomad-check-a16b004d61a3ec73ceec25a9e327a13c27ae0cf6
-> skip A
maybe remove check id: _nomad-check-aa54236a98ee5a3bd394aa76f46955ff42e4273c
-> skip A
END SYNC(operations)
2021-06-21T15:52:53.518-0500 [INFO] client.driver_mgr.docker: stopped container: container_id=b495f130a7ab03e19d59dad5e2d84072650bba965963afae12aa3b376a917ec1 driver=docker
serviceHook.deregister from: serviceHook.Exited
serviceHook.getWorkloadService, services: []
ServiceClient.RemoveWorkload from: serviceHook.deregister (noCanary)
commit ops, from: remove-workload
ServiceClient.RemoveWorkload from: serviceHook.deregister (canary)
commit ops, from: remove-workload
SYNC(operations) @ 2021-06-21 15:52:53.525461567 -0500 CDT m=+64.182344290...
serviceHook.deregister from: serviceHook.Stop
serviceHook.getWorkloadService, services: []
ServiceClient.RemoveWorkload from: serviceHook.deregister (noCanary)
commit ops, from: remove-workload
ServiceClient.RemoveWorkload from: serviceHook.deregister (canary)
commit ops, from: remove-workload
2021-06-21T15:52:53.527-0500 [INFO] client.alloc_runner: killing task: alloc_id=86bc61d3-6bd9-214c-f356-ef4d3802e9a5 task=connect-proxy-count-api
2021-06-21T15:52:53.527-0500 [INFO] client.gc: marking allocation for GC: alloc_id=86bc61d3-6bd9-214c-f356-ef4d3802e9a5
maybe remove service id: _nomad-client-ypilmc5s7ufrslte5wisd5qj52tgy6ol
-> skip A
maybe remove service id: _nomad-server-hmlqn3wkqsbobqnrz3lbxiofcyr3lv4f
-> skip A
maybe remove service id: _nomad-server-nwdddf6h2o4nrsovinzt3m5a3xahfv2m
-> skip A
maybe remove service id: _nomad-server-pfad5tnxrcovuhqd2cjfwmuu27qckjzr
-> skip A
maybe remove check id: _nomad-check-a16b004d61a3ec73ceec25a9e327a13c27ae0cf6
-> skip A
maybe remove check id: _nomad-check-aa54236a98ee5a3bd394aa76f46955ff42e4273c
-> skip A
maybe remove check id: _nomad-check-d9af14be9a42cd9ef46a2d23fcf1043b7b782eba
-> skip A
maybe remove check id: _nomad-check-7bce93dbe8acae494625366cc1b08b2d1a689652
-> skip A
END SYNC(operations)
SYNC(operations) @ 2021-06-21 15:52:53.529104449 -0500 CDT m=+64.185987162...
maybe remove service id: _nomad-client-ypilmc5s7ufrslte5wisd5qj52tgy6ol
-> skip A
maybe remove service id: _nomad-server-hmlqn3wkqsbobqnrz3lbxiofcyr3lv4f
-> skip A
maybe remove service id: _nomad-server-nwdddf6h2o4nrsovinzt3m5a3xahfv2m
-> skip A
maybe remove service id: _nomad-server-pfad5tnxrcovuhqd2cjfwmuu27qckjzr
-> skip A
maybe remove check id: _nomad-check-7bce93dbe8acae494625366cc1b08b2d1a689652
-> skip A
maybe remove check id: _nomad-check-a16b004d61a3ec73ceec25a9e327a13c27ae0cf6
-> skip A
maybe remove check id: _nomad-check-aa54236a98ee5a3bd394aa76f46955ff42e4273c
-> skip A
maybe remove check id: _nomad-check-d9af14be9a42cd9ef46a2d23fcf1043b7b782eba
-> skip A
END SYNC(operations)
SYNC(operations) @ 2021-06-21 15:52:53.530041329 -0500 CDT m=+64.186924033...
maybe remove service id: _nomad-server-nwdddf6h2o4nrsovinzt3m5a3xahfv2m
-> skip A
maybe remove service id: _nomad-server-pfad5tnxrcovuhqd2cjfwmuu27qckjzr
-> skip A
maybe remove service id: _nomad-client-ypilmc5s7ufrslte5wisd5qj52tgy6ol
-> skip A
maybe remove service id: _nomad-server-hmlqn3wkqsbobqnrz3lbxiofcyr3lv4f
-> skip A
maybe remove check id: _nomad-check-d9af14be9a42cd9ef46a2d23fcf1043b7b782eba
-> skip A
maybe remove check id: _nomad-check-7bce93dbe8acae494625366cc1b08b2d1a689652
-> skip A
maybe remove check id: _nomad-check-a16b004d61a3ec73ceec25a9e327a13c27ae0cf6
-> skip A
maybe remove check id: _nomad-check-aa54236a98ee5a3bd394aa76f46955ff42e4273c
-> skip A
END SYNC(operations)
SYNC(operations) @ 2021-06-21 15:52:53.531021542 -0500 CDT m=+64.187904245...
maybe remove service id: _nomad-client-ypilmc5s7ufrslte5wisd5qj52tgy6ol
-> skip A
maybe remove service id: _nomad-server-hmlqn3wkqsbobqnrz3lbxiofcyr3lv4f
-> skip A
maybe remove service id: _nomad-server-nwdddf6h2o4nrsovinzt3m5a3xahfv2m
-> skip A
maybe remove service id: _nomad-server-pfad5tnxrcovuhqd2cjfwmuu27qckjzr
-> skip A
maybe remove check id: _nomad-check-7bce93dbe8acae494625366cc1b08b2d1a689652
-> skip A
maybe remove check id: _nomad-check-a16b004d61a3ec73ceec25a9e327a13c27ae0cf6
-> skip A
maybe remove check id: _nomad-check-aa54236a98ee5a3bd394aa76f46955ff42e4273c
-> skip A
maybe remove check id: _nomad-check-d9af14be9a42cd9ef46a2d23fcf1043b7b782eba
-> skip A
END SYNC(operations)
calling groupServiceHook.Postrun()
Note that a correct implementation would trigger SYNC(operations) only once per human action.
The text was updated successfully, but these errors were encountered:
This PR makes it so the Consul sync logic will ignore operations that
do not specify an action to take (i.e. [de-]register [services|checks]).
Ideally such noops would be discarded at the callsites (i.e. users
of [Create|Update|Remove]Workload], but we can also be defensive
at the commit point.
Also adds 2 trace logging statements which are helpful for diagnosing
sync operations with Consul - when they happen and why.
Fixes#10797
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
While tracking down a potential cause of #9307, I noticed
serviceHook.deregister
is called many times when stopping a job... including jobs that have no task services. We callderegister
twice from each ofExited
,PreKilling
, andStop
, regardless if there are any task services to dereigister. At no point does the absense of services to deregister get optimized out of the reconciliation with Consul, and the result is 6+* syncs with Consul that do nothing and did not need to happen.* It's actually more than 6,
but I can't explain the others yet... looks like PreKill gets called more than once, like 3x, so that would explain 10 syncs total.print trace on task stop
Note that a correct implementation would trigger SYNC(operations) only once per human action.
The text was updated successfully, but these errors were encountered: