Delay between fleet service submit and actual systemd service start #1362
Comments
AfterIndex watcher option usage ensures registry event watcher won't miss events that trigger Agent reconciliation. Fixes coreos#1362
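To illustrate the general idea behind watching after a known index, here is a minimal in-memory sketch. It is not fleet's code or the etcd client API: `EventLog`, `events_after`, and `simulate` are hypothetical names made up for this example, and whether this race is what actually caused the delay reported here is disputed later in the thread.

```python
class EventLog:
    """Toy stand-in for an indexed event store (hypothetical, not the etcd API)."""

    def __init__(self):
        self._events = []  # list of (index, payload); indexes start at 1

    def append(self, payload):
        """Store an event and return the index assigned to it."""
        index = len(self._events) + 1
        self._events.append((index, payload))
        return index

    @property
    def current_index(self):
        """Index of the most recent event (0 if the log is empty)."""
        return len(self._events)

    def events_after(self, after_index):
        """Return every event whose index is strictly greater than after_index."""
        return [(i, p) for (i, p) in self._events if i > after_index]


def simulate():
    log = EventLog()

    # The watcher receives this event and remembers its index.
    last_seen = log.append("job-a scheduled")

    # While the watcher is busy, a second event is written
    # before the watch is re-established.
    log.append("job-b scheduled")

    # Re-watching "from now" starts at the current index,
    # so the event written in the gap is never delivered.
    from_now = log.events_after(log.current_index)

    # Re-watching after the last index the watcher saw replays the gap.
    resumed = log.events_after(last_seen)
    return from_now, resumed


if __name__ == "__main__":
    from_now, resumed = simulate()
    print("watch from now:       ", from_now)
    print("watch after last seen:", resumed)
```

In this toy model the watcher that resumes from its last seen index still receives the second event, while the one that restarts from the current index does not.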
Fixed problem for us just by adding |
Hi guys! |
Could you explain in a little more detail how the afterindex change helped? (how were the events getting missed?). thanks! |
Would love to know more about how you're missing events, though |
Hi @jonboulle ! |
Very tricky thing here is that const in
^ Update is tied to that time. So no matter what |
This commit bcwaldon@bdf5f72 states
Actually content of events is used here:
So |
@paveltiunov which etcd version do you use for fleet? |
|
@paveltiunov etcd 0.4.x has lots of bugs and architectural flaws. can you upgrade your etcd cluster to the latest 2.2? Please note that you have to upgrade 0.4.x to 2.0.x, then 2.0.x to 2.1.x, and then 2.1.x to 2.2.x. |
Relates to #1363 |
@paveltiunov were you able to upgrade etcd cluster and test fleet with the new one? |
Bumping milestone for now. @paveltiunov As @kayrus mentioned, we'd love to know if you can replicate the problem with a recent (2.2+) version of etcd |
@kayrus is there some automatic way to upgrade? Upgrading production environment with proposed method doesn't seem reasonable for us. What release version of CoreOS is using etcd 2.2? |
@paveltiunov latest coreos stable release ships etcd 2.2.0. and I would like to inform you that etcd v0.4.x binary will be removed in future coreos releases. |
@paveltiunov, automatic upgrades work in 2.x branch only. but I strongly recommend you to create a backup before each upgrade. |
@kayrus Sorry. My bad. I forgot etcd 2 is
So we're up to date here. |
@paveltiunov great. so we have to get more info at this point.
|
I had a closer look into this issue. Even if Also it's true that fleet doesn't make use of the content of the returned event, in agent or engine, as noted in bcwaldon@bdf5f72. I think it's hard to find a logical relation between The only scenario I can imagine is etcd being overloaded due to a well-known issue, which results in temporarily missed events. That could eventually be fixed by using etcd v3 and fleet's recent versions. Even if it's worth setting AfterIndex to a non-zero value, I think the long-term plan should be just to change fleet to use the new protobuf-based API from etcd v3. So I would suggest closing this issue and #1363, unless the original author can give further feedback. |
Hi guys!
Great job on CoreOS!
We're using fleet to run high frequency services: there are many starts and stops and they should be performed as fast as possible.
We're experiencing significant delays of about 2-3 seconds between submitting a service to fleet and its actual start by systemd.
Tried to tweak engine_reconcile_interval, with no effect so far.
What I see in the logs with verbosity=10 and journalctl -f -u fleet.service | grep Reconciler:
It looks like the "Reconciler triggered" message is missing for AgentReconciler.
Do you have any suggestions we can start from to solve this problem?
Fleet version is 0.10.2.