KeepAlive in etcd client cannot ensure the permanent validity of lease #2654

Closed
px3303 opened this issue Jan 12, 2022 · 0 comments · Fixed by #2655

KeepAlive in the etcd client does not guarantee that a lease stays valid indefinitely. If automatic renewal of Trillian's lease is interrupted unexpectedly, Trillian keeps running, but its entry is no longer visible through the etcd server, so the instance cannot be discovered and can no longer provide service. In addition, an external process health check cannot detect the abnormality, so system administrators are not warned in time.

// AnnounceSelf announces this binary's presence to etcd.
func AnnounceSelf(ctx context.Context, client *clientv3.Client, etcdService, endpoint string) func() {
	if client == nil {
		return func() {}
	}

	// Get a lease so our entry self-destructs.
	leaseRsp, err := client.Grant(ctx, 30)
	if err != nil {
		glog.Exitf("Failed to get lease from etcd: %v", err)
	}
	client.KeepAlive(ctx, leaseRsp.ID)
...
// recvKeepAlive updates a lease based on its LeaseKeepAliveResponse
func (l *lessor) recvKeepAlive(resp *pb.LeaseKeepAliveResponse) {
	karesp := &LeaseKeepAliveResponse{
		ResponseHeader: resp.GetHeader(),
		ID:             LeaseID(resp.ID),
		TTL:            resp.TTL,
	}

	l.mu.Lock()
	defer l.mu.Unlock()

	ka, ok := l.keepAlives[karesp.ID]
	if !ok {
		return
	}

	if karesp.TTL <= 0 {
		// lease expired; close all keep alive channels
		delete(l.keepAlives, karesp.ID)
		ka.close()
		return
	}
...

Meanwhile, because Trillian does not consume the LeaseKeepAliveResponse channel returned by KeepAlive, the etcd client prints a large number of warnings (see #2249).
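
The following is a simplified model of why an unread channel can produce these warnings. It assumes the client delivers each response with a non-blocking send into a bounded buffer and reports a drop when the buffer is full. The type `keepAliveResponse` and the functions `deliver` and `countDropped` are hypothetical stand-ins written for illustration, not etcd APIs.

```go
package main

import "fmt"

// keepAliveResponse is a hypothetical stand-in for the etcd client's
// LeaseKeepAliveResponse; only the TTL field is modeled here.
type keepAliveResponse struct {
	TTL int64
}

// deliver attempts a non-blocking send. It returns false when the buffer
// is full, which is the point where a client would log a warning and drop
// the response.
func deliver(ch chan<- keepAliveResponse, resp keepAliveResponse) bool {
	select {
	case ch <- resp:
		return true
	default:
		return false
	}
}

// countDropped sends n responses into a channel of size bufSize that
// nobody reads, and returns how many of them were dropped.
func countDropped(bufSize, n int) int {
	ch := make(chan keepAliveResponse, bufSize)
	dropped := 0
	for i := 0; i < n; i++ {
		if !deliver(ch, keepAliveResponse{TTL: 30}) {
			dropped++
		}
	}
	return dropped
}

func main() {
	// With a buffer of 2 and no reader, 3 of 5 responses are dropped.
	fmt.Println(countDropped(2, 5)) // prints 3
}
```

Once the buffer fills, every further renewal response is dropped, which would explain a steady stream of warnings for as long as the process runs.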

px3303 changed the title from "keepAlive in etcd client cannot ensure the permanent validity of lease" to "KeepAlive in etcd client cannot ensure the permanent validity of lease" on Jan 12, 2022
mhutchinson added a commit that referenced this issue Jan 19, 2022
This PR makes Trillian actively listen on the LeaseKeepAliveResponse channel returned by KeepAlive in the etcd client. When an interruption of automatic renewal is detected, the program exits by canceling the context.

Fixes #2654, #2249

Co-authored-by: Simba Peng <[email protected]>
Co-authored-by: Martin Hutchinson <[email protected]>