Proposal Apply Rate spikes #8096

gyuho · 2017-06-14T17:47:40Z

@xiang90 and I found the root cause and have a solution. Creating this issue for reference.

Our kubemark test cluster showed spikes in sum(rate(etcd_server_proposals_applied_total[1m])).

Logs around the spike:

# node 1
2017-06-14 17:01:50.527744 I | etcdserver: start to snapshot (applied: 2900060, lastsnap: 2800057)
2017-06-14 17:01:50.536900 I | etcdserver: saved snapshot at index 2900060
2017-06-14 17:01:50.537312 I | etcdserver: compacted raft log at 2895060
2017-06-14 17:01:55.759008 I | pkg/fileutil: purged file /root/etcd/member/snap/0000000000000006-0000000000249f35.snap successfully
2017-06-14 17:02:29.528986 I | mvcc: store.index: compact 2228138
2017-06-14 17:02:29.649299 I | mvcc: finished scheduled compaction at 2228138 (took 117.021781ms)

# node 2
2017-06-14 17:01:50.527419 I | etcdserver: start to snapshot (applied: 2900060, lastsnap: 2800057)
2017-06-14 17:01:50.536373 I | etcdserver: saved snapshot at index 2900060
2017-06-14 17:01:50.536692 I | etcdserver: compacted raft log at 2895060
2017-06-14 17:02:17.497063 I | pkg/fileutil: purged file /root/etcd/member/snap/0000000000000006-0000000000249f35.snap successfully
2017-06-14 17:02:29.528974 I | mvcc: store.index: compact 2228138
2017-06-14 17:02:29.651662 I | mvcc: finished scheduled compaction at 2228138 (took 118.936079ms)
2017-06-14 17:03:12.529455 N | compactor: Starting auto-compaction at revision 2117484
2017-06-14 17:03:12.531058 N | compactor: Finished auto-compaction at revision 2117484
2017-06-14 17:04:59.707802 I | raft: 88087a036e7c60e1 [logterm: 9, index: 2943624, vote: 88087a036e7c60e1] ignored MsgVote from 69df290beb2b8a14 [logterm: 9, index: 2943623] at term 9: lease is not expired (remaining ticks: 7)

# node 3
2017-06-14 16:52:29.496868 I | mvcc: finished scheduled compaction at 2207578 (took 118.841479ms)
2017-06-14 16:57:29.451265 I | mvcc: store.index: compact 2217813
2017-06-14 16:57:29.574574 I | mvcc: finished scheduled compaction at 2217813 (took 119.858753ms)
2017-06-14 17:01:50.527964 I | etcdserver: start to snapshot (applied: 2900060, lastsnap: 2800056)
2017-06-14 17:01:50.536614 I | etcdserver: saved snapshot at index 2900060
2017-06-14 17:01:50.536890 I | etcdserver: compacted raft log at 2895060
2017-06-14 17:02:04.694354 I | pkg/fileutil: purged file /root/etcd/member/snap/0000000000000006-0000000000249f34.snap successfully
2017-06-14 17:02:29.529014 I | mvcc: store.index: compact 2228138
2017-06-14 17:02:29.649298 I | mvcc: finished scheduled compaction at 2228138 (took 117.061855ms)

# node 4
2017-06-14 16:57:29.574395 I | mvcc: finished scheduled compaction at 2217813 (took 120.044099ms)
2017-06-14 17:01:50.528166 I | etcdserver: start to snapshot (applied: 2900060, lastsnap: 2800056)
2017-06-14 17:01:50.546346 I | etcdserver: saved snapshot at index 2900060
2017-06-14 17:01:50.546639 I | etcdserver: compacted raft log at 2895060
2017-06-14 17:02:15.691987 I | pkg/fileutil: purged file /root/etcd/member/snap/0000000000000006-0000000000249f34.snap successfully
2017-06-14 17:02:29.529039 I | mvcc: store.index: compact 2228138
2017-06-14 17:02:29.659175 I | mvcc: finished scheduled compaction at 2228138 (took 126.962864ms)

# node 5
2017-06-14 17:01:50.528009 I | etcdserver: start to snapshot (applied: 2900060, lastsnap: 2800056)
2017-06-14 17:01:50.536932 I | etcdserver: saved snapshot at index 2900060
2017-06-14 17:01:50.537209 I | etcdserver: compacted raft log at 2895060
2017-06-14 17:02:15.967542 I | pkg/fileutil: purged file /root/etcd/member/snap/0000000000000006-0000000000249f34.snap successfully
2017-06-14 17:02:23.816398 I | wal: segmented wal file /root/etcd/member/wal/0000000000000031-00000000002cd00d.wal is created
2017-06-14 17:02:29.528944 I | mvcc: store.index: compact 2228138
2017-06-14 17:02:29.648869 I | mvcc: finished scheduled compaction at 2228138 (took 116.532844ms)
2017-06-14 17:02:46.152472 I | pkg/fileutil: purged file /root/etcd/member/wal/000000000000002c-000000000027ce1f.wal successfully
2017-06-14 17:04:59.707762 I | raft: 3213e0b0f8d70928 [term: 9] received a MsgVote message with higher term from 69df290beb2b8a14 [term: 10]
2017-06-14 17:04:59.707798 I | raft: 3213e0b0f8d70928 became follower at term 10

Nothing unusual in the etcd server logs.

So we inspected the WAL entries and found that all of the spikes come from lease revokes:

etcd-dump-logs --data-dir ./etcd-1 > etcd-1.log
etcd-dump-logs --data-dir ./etcd-2 > etcd-2.log
etcd-dump-logs --data-dir ./etcd-3 > etcd-3.log
etcd-dump-logs --data-dir ./etcd-4 > etcd-4.log
etcd-dump-logs --data-dir ./etcd-5 > etcd-5.log

   9	   2899991	norm	header:<ID:6980962663844685081 > lease_revoke:<ID:659879111853478908 > 
   9	   2899992	norm	header:<ID:6980962663844685082 > lease_revoke:<ID:659879111853503344 > 
   9	   2899993	norm	header:<ID:6980962663844685058 > lease_revoke:<ID:659879111853427852 >
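To check how much of a dump is made up of lease revokes, the entries can be tallied by request type. The following is a minimal sketch that assumes the tab-separated layout shown in the dump lines above (term, index, type, data); countRequests is a hypothetical helper, not part of etcd-dump-logs.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countRequests tallies dump lines by request type. It assumes the
// tab-separated "term, index, type, data" layout shown above, with the data
// column shaped like "header:<...> request_name:<...>".
func countRequests(dump string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(dump))
	// Large txn payloads can exceed the scanner's default 64 KiB line limit.
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)
	for sc.Scan() {
		fields := strings.Split(sc.Text(), "\t")
		if len(fields) < 4 {
			continue // not an entry line
		}
		// Skip past the header to the request that follows it.
		_, rest, ok := strings.Cut(fields[3], "> ")
		if !ok {
			continue
		}
		name, _, ok := strings.Cut(rest, ":<")
		if !ok {
			continue
		}
		counts[name]++
	}
	return counts
}

func main() {
	// Two entries copied from the dump above, plus a shortened lease_grant entry.
	dump := "   9\t   2899991\tnorm\theader:<ID:6980962663844685081 > lease_revoke:<ID:659879111853478908 > \n" +
		"   9\t   2899992\tnorm\theader:<ID:6980962663844685082 > lease_revoke:<ID:659879111853503344 > \n" +
		"   6\t   2730283\tnorm\theader:<ID:659879111853503345 > lease_grant:<TTL:3600 ID:659879111853503344 > \n"
	fmt.Println(countRequests(dump))
}
```

Reading the real dump files instead of an in-memory string only requires passing the file contents to countRequests.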

The dashboard for rate(etcd_debugging_server_lease_expired_total[1m]) also confirms this on the run-etcd-1-2 node.

Digging further into the logs from the run-etcd-1-2 node, starting from an earlier snapshot (the dump from the latest snapshot showed only lease revokes):

etcd-dump-logs --data-dir ./etcd-2 -start-snap 0000000000000006-0000000000293318.snap > etcd-2-previous.log

   6	   2730283	norm	header:<ID:659879111853503345 > lease_grant:<TTL:3600 ID:659879111853503344 > 
   6	   2730284	norm	header:<ID:6980962663844573262 > txn:<compare:<target:MOD key:"/registry/services/endpoints/kube-system/kube-controller-manager" mod_revision:2106700 > success:<request_put:<key:"/registry/services/endpoints/kube-system/kube-controller-manager" value:"k8s\000\n\017\n\002v1\022\tEndpoints\022\376\002\n\373\002\n\027kube-controller-manager\022\000\032\013kube-system\"@/api/v1/namespaces/kube-system/endpoints/kube-controller-manager*$48c6d4eb-508a-11e7-bdd4-42010af0000b2\0008\000B\014\010\203\327\201\312\005\020\311\224\253\344\001b\324\001\n(control-plane.alpha.kubernetes.io/leader\022\247\001{\"holderIdentity\":\"kubernetes-kubemark-master\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2017-06-13T22:47:31Z\",\"renewTime\":\"2017-06-14T15:57:53Z\",\"leaderTransitions\":0}z\000\032\000\"\000" > > failure:<request_range:<key:"/registry/services/endpoints/kube-system/kube-controller-manager" > > >

The TTL was 3600 (1 hour), and there was a leader election an hour before the spike.
When a leader election happens, we renew all of the leases, so the renewed leases are all revoked at the same time one hour later. Hence the spikes.
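The effect can be reproduced with a toy model. This is only an illustration of the reasoning above, not etcd's lessor code; the lease struct and the grantStaggered, refreshAll and distinctExpiries helpers are hypothetical names, and times are offsets from the start of the simulation.

```go
package main

import (
	"fmt"
	"time"
)

// lease is a toy lease: a TTL and an expiry expressed as an offset from the
// start of the simulation.
type lease struct {
	ttl    time.Duration
	expiry time.Duration
}

// grantStaggered grants n leases, one every interval, each with the same TTL.
func grantStaggered(n int, interval, ttl time.Duration) []lease {
	leases := make([]lease, n)
	for i := range leases {
		grantedAt := time.Duration(i) * interval
		leases[i] = lease{ttl: ttl, expiry: grantedAt + ttl}
	}
	return leases
}

// refreshAll models the renewal on leader election: every lease gets a fresh
// full TTL counted from the election time.
func refreshAll(leases []lease, now time.Duration) {
	for i := range leases {
		leases[i].expiry = now + leases[i].ttl
	}
}

// distinctExpiries counts how many different expiry instants the leases have.
func distinctExpiries(leases []lease) int {
	seen := make(map[time.Duration]struct{})
	for _, l := range leases {
		seen[l.expiry] = struct{}{}
	}
	return len(seen)
}

func main() {
	leases := grantStaggered(1000, time.Second, time.Hour)
	fmt.Println("distinct expiries before election:", distinctExpiries(leases))

	refreshAll(leases, 30*time.Minute) // leader election half an hour in
	fmt.Println("distinct expiries after election:", distinctExpiries(leases))
}
```

Before the election the expiries are spread over the grant times; after it they collapse onto a single instant one TTL later, which is where the burst of lease_revoke proposals comes from.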

We plan to randomize leases on recovery, in addition to rate limiting lease revokes.
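The rate-limiting half of the plan can be sketched as a per-tick cap on revokes. This is a minimal illustration under assumed names and numbers (nextBatch and the limit of 2 are hypothetical), not the change that was merged into etcd.

```go
package main

import "fmt"

// nextBatch splits the queue of expired lease IDs into the IDs to revoke on
// this tick (at most limit of them) and the IDs left for later ticks.
func nextBatch(expired []int64, limit int) (batch, rest []int64) {
	if limit < 0 {
		limit = 0
	}
	if len(expired) <= limit {
		return expired, nil
	}
	return expired[:limit], expired[limit:]
}

func main() {
	expired := []int64{101, 102, 103, 104, 105}
	for tick := 1; len(expired) > 0; tick++ {
		var batch []int64
		batch, expired = nextBatch(expired, 2) // hypothetical cap per tick
		fmt.Printf("tick %d: revoke %v\n", tick, batch)
	}
}
```

Capping the batch turns one large burst of proposals into several smaller ones. Leases waiting in the queue are revoked slightly later than their expiry.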

xiang90 · 2017-06-14T17:55:10Z

xref: kubernetes/kubernetes#47532

xinxiaogang · 2017-06-14T18:02:41Z

@gyuho @xiang90 So what's the solution?

gyuho · 2017-06-14T18:03:03Z

@xinxiaogang

We plan to randomize leases on recovery, in addition to rate limiting lease revokes.

Address etcd-io#8096. Signed-off-by: Gyu-Ho Lee <[email protected]>

Randomize the very first expiry on lease recovery to prevent recovered leases from expiring all at the same time. Address etcd-io#8096. Signed-off-by: Gyu-Ho Lee <[email protected]>
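One way to picture the randomized first expiry is to add jitter on top of the TTL when a recovered lease is refreshed. The sketch below is an assumption about the general shape of such a change, not the code from the commit: randomizedExpiry and the one-minute jitter window are hypothetical, and the random source is passed in so the function can be driven deterministically.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// randomizedExpiry returns the first expiry for a recovered lease: the usual
// now+ttl, extended by a random amount in [0, maxJitter). The lease is only
// ever extended, never shortened, so a client's TTL is still honored.
// int63n must return a value in [0, n), like rand.Int63n.
func randomizedExpiry(now, ttl, maxJitter time.Duration, int63n func(n int64) int64) time.Duration {
	expiry := now + ttl
	if maxJitter <= 0 {
		return expiry
	}
	return expiry + time.Duration(int63n(int64(maxJitter)))
}

func main() {
	now := 30 * time.Minute // offset of the recovery from the start of the run
	for i := 0; i < 3; i++ {
		// Hypothetical jitter window of one minute.
		fmt.Println(randomizedExpiry(now, time.Hour, time.Minute, rand.Int63n))
	}
}
```

With a jitter window much larger than one second, leases recovered together no longer share a single expiry instant.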

gyuho · 2017-06-16T01:00:33Z

I manually triggered a leader election with master, and we no longer see the spikes.

xiang90 · 2017-06-16T01:02:53Z

You need to wait one or two hours to allow leases to accumulate before triggering an election. Then wait another hour to check for spikes.

xiang90 · 2017-06-16T01:03:45Z

Also k8s tests need to be running against etcd.

armstrongli · 2017-06-16T07:01:19Z

@gyuho The spike doesn't appear for every leader election. Actually, there are events generated every second in a Kubernetes cluster.
I believe the proposed solution ("We plan to randomize leases on recovery, in addition to rate limiting lease revokes.") can reduce the symptoms.

xiang90 · 2017-06-16T21:38:58Z

I can confirm it is fixed, based on our new test results. @heyitsanthony is working on a better smoothing algorithm, though.

Instead of unconditionally randomizing, extend leases on promotion if too many leases expire within the same time span. If the server has few leases or spread out expires, there will be no extension.

Squashed previous commits for #8149.

Author: Anthony Romano <[email protected]>

This is a combination of 4 commits below:

lease: randomize expiry on initial refresh call

Randomize the very first expiry on lease recovery to prevent recovered leases from expiring all at the same time. Address #8096.

Signed-off-by: Gyu-Ho Lee <[email protected]>

integration: remove lease exist checking on randomized expiry

Lease with TTL 5 should be renewed with randomization, thus it's still possible to exist after 3 seconds.

Signed-off-by: Gyu-Ho Lee <[email protected]>

lessor: extend leases on promote if expires will be rate limited

Instead of unconditionally randomizing, extend leases on promotion if too many leases expire within the same time span. If the server has few leases or spread out expires, there will be no extension.

Revert "integration: remove lease exist checking on randomized expiry"

This reverts commit 95bc33f. The new lease extension algorithm should pass this test.
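The idea of extending leases only when too many expire within the same time span can be sketched as a bucketing pass over the sorted expiries. This is a simplified illustration, not the algorithm from the commit: smoothExpiries, sameExpiry, the one-second bucket size and the per-bucket limit are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// sameExpiry builds n expiries that all fall on the same instant, which is
// what a renewal of every lease at once produces.
func sameExpiry(n int, at time.Duration) []time.Duration {
	out := make([]time.Duration, n)
	for i := range out {
		out[i] = at
	}
	return out
}

// smoothExpiries returns a sorted copy of expiries in which no one-second
// bucket holds more than perSecond leases. A lease that would overflow its
// bucket is extended to the start of the next one; leases are never shortened,
// and sparse or spread-out expiries come back unchanged.
func smoothExpiries(expiries []time.Duration, perSecond int) []time.Duration {
	out := append([]time.Duration(nil), expiries...)
	if len(out) == 0 || perSecond <= 0 {
		return out
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })

	slot, used := out[0], 0 // start of the current bucket and leases placed in it
	for i, e := range out {
		switch {
		case e >= slot+time.Second:
			slot, used = e, 0 // the lease starts a fresh bucket at its own expiry
		case used == perSecond:
			slot, used = slot+time.Second, 0 // bucket full: spill into the next second
		}
		if e < slot {
			e = slot // extend the lease into the current bucket
		}
		out[i] = e
		used++
	}
	return out
}

func main() {
	burst := sameExpiry(5, 10*time.Second)
	fmt.Println(smoothExpiries(burst, 2)) // hypothetical limit of 2 per second
}
```

Because extension only happens once a bucket is full, a server with few leases, or with expiries that are already spread out, sees no change at all.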

xiang90 mentioned this issue Jun 14, 2017

spike in ETCD IO: raft apply, peer traffic #7981

Closed

gyuho mentioned this issue Jun 14, 2017

Rate-limit Lease Revoke #8097

Closed

gyuho added a commit to gyuho/etcd that referenced this issue Jun 14, 2017

lease: randomize TTL on recovery

3d1cfa5

Address etcd-io#8096. Signed-off-by: Gyu-Ho Lee <[email protected]>

gyuho mentioned this issue Jun 14, 2017

lease: randomize expiry on initial refresh call #8101

Merged

xiang90 added this to the v3.3.0 milestone Jun 16, 2017

xiang90 closed this as completed Jun 16, 2017
