using lease for cluster remote server healthz checker. #249

Merged on Apr 28, 2021 (1 commit)

Contributor

@zyjhtangtang zyjhtangtang commented Apr 2, 2021

Ⅰ. Describe what this PR does

In yurthub, the existing cloud-edge network check calls the apiserver's health check endpoint. Because the health check response is very small, it can still succeed on a weak cloud-edge network even when other, larger requests fail, so it is not an accurate indicator of network health. This PR therefore uses lease requests for the health check instead.
(1) Yurthub sends a lease request to report the node heartbeat, and the result serves as the basis for judging the health of the cloud-edge network.
(2) Yurthub caches the lease data, and all of kubelet's lease requests are served from the locally cached data.
(3) Add a Run() method to the HealthChecker interface.
(4) Certificate initialization no longer depends on the health check.
(5) Adjust the startup order of the health check module in yurthub.
(6) When a request to the cloud fails, the data is fetched from the local cache.
(7) Add unit tests.

Ⅱ. Does this pull request fix one issue?

None.
This PR optimizes the cloud-edge network check in yurthub.

Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews

if url == nil {
	for _, server := range kcm.remoteServers {
		if kcm.checker.IsHealthy(server) {
			s = server
Member

How about adding a break statement?

Contributor Author

fixed
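
The effect of the suggested `break` can be illustrated with a small sketch. It assumes servers are plain strings and uses a hypothetical `staticChecker` in place of the real health checker; the PR's code works with URLs and its own checker type.

```go
package main

import "fmt"

// healthChecker is a hypothetical stand-in for the checker used in the PR.
type healthChecker interface {
	IsHealthy(server string) bool
}

// staticChecker reports health from a fixed map, for demonstration only.
type staticChecker map[string]bool

func (s staticChecker) IsHealthy(server string) bool { return s[server] }

// firstHealthy returns the first healthy server in list order,
// or "" if no server is healthy.
func firstHealthy(servers []string, checker healthChecker) string {
	var s string
	for _, server := range servers {
		if checker.IsHealthy(server) {
			s = server
			// Without this break the loop keeps running and the
			// last healthy server overwrites the first one.
			break
		}
	}
	return s
}

func main() {
	servers := []string{"https://a:6443", "https://b:6443", "https://c:6443"}
	checker := staticChecker{"https://b:6443": true, "https://c:6443": true}
	fmt.Println(firstHealthy(servers, checker))
}
```

Stopping at the first match also avoids calling the checker for servers that will never be used.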

@@ -27,6 +27,7 @@ import (
"github.com/openyurtio/openyurt/pkg/yurthub/proxy/remote"
"github.com/openyurtio/openyurt/pkg/yurthub/proxy/util"
"github.com/openyurtio/openyurt/pkg/yurthub/transport"
util2 "github.com/openyurtio/openyurt/pkg/yurthub/util"
Member

util2 --> hubutil?

Contributor Author

fixed

@@ -122,6 +122,17 @@ func ReqInfoString(info *apirequest.RequestInfo) string {
return fmt.Sprintf("%s %s for %s", info.Verb, info.Resource, info.Path)
}

func IsKubeletLeaseReq(req *http.Request) bool {
Member

Please add a comment for this function.

Contributor Author

fixed
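
For readers unfamiliar with what such a predicate checks, here is a simplified, documented sketch. It is an assumption-laden stand-in, not the body of `IsKubeletLeaseReq` from this PR: `isKubeletLeaseReqSketch` and `newReq` are hypothetical names, and the sketch inspects the `User-Agent` header and URL path directly, whereas the real function sits next to helpers that work with the parsed `RequestInfo`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// nodeLeasePrefix is the API path under which kubelet keeps node leases.
const nodeLeasePrefix = "/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases"

// isKubeletLeaseReqSketch reports whether req looks like a node lease request
// sent by kubelet. It is a simplified stand-in that checks the User-Agent
// header and the URL path only.
func isKubeletLeaseReqSketch(req *http.Request) bool {
	if !strings.HasPrefix(req.UserAgent(), "kubelet/") {
		return false
	}
	return strings.HasPrefix(req.URL.Path, nodeLeasePrefix)
}

// newReq builds a GET request with the given User-Agent, for demonstration.
func newReq(userAgent, path string) *http.Request {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	req.Header.Set("User-Agent", userAgent)
	return req
}

func main() {
	fmt.Println(isKubeletLeaseReqSketch(newReq("kubelet/v1.20.0", nodeLeasePrefix+"/node-1")))
	fmt.Println(isKubeletLeaseReqSketch(newReq("kubelet/v1.20.0", "/api/v1/pods")))
	fmt.Println(isKubeletLeaseReqSketch(newReq("kube-proxy/v1.20.0", nodeLeasePrefix+"/node-1")))
}
```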

return checker.isHealthy()
}
//if there is not checker for server, default healthy.
return true
Member

Why is the server considered healthy when there is no checker for it?

Contributor Author

@zyjhtangtang zyjhtangtang Apr 21, 2021

If there is no checker for the server, it defaults to healthy so that the certificate can be initialized before the health checker runs.
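
The default-healthy behavior discussed in this thread can be sketched as a map lookup with a fallback. The types below are simplified, hypothetical versions of the ones in the PR (servers are keyed by plain strings and `checker` only holds a flag).

```go
package main

import "fmt"

// checker is a simplified, hypothetical per-server checker.
type checker struct{ healthy bool }

func (c *checker) isHealthy() bool { return c.healthy }

// healthCheckerManager maps each remote server to its checker.
type healthCheckerManager struct {
	checkers map[string]*checker
}

// IsHealthy returns the checker's verdict when a checker exists. When no
// checker is registered for the server (for example before the health checker
// has started), it reports healthy so that early callers are not blocked.
func (hcm *healthCheckerManager) IsHealthy(server string) bool {
	if c, ok := hcm.checkers[server]; ok {
		return c.isHealthy()
	}
	return true
}

func main() {
	hcm := &healthCheckerManager{checkers: map[string]*checker{
		"https://a:6443": {healthy: false},
	}}
	fmt.Println(hcm.IsHealthy("https://a:6443")) // has a checker
	fmt.Println(hcm.IsHealthy("https://b:6443")) // no checker registered
}
```

The trade-off is that an unknown or not-yet-registered server is optimistically treated as reachable, so callers still need to handle request failures themselves.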

}

func (hcm *healthCheckerManager) getChecker() *checker {
hcm.RLock()
Member

How about removing the lock from healthCheckerManager?

Contributor Author

fixed


return false
func (hcm *healthCheckerManager) sync() {
Member

@rambohe-ch rambohe-ch Apr 20, 2021

Please add comments explaining the logic of checking the server's health status via lease.

Contributor Author

Ensure that the node heartbeat can be reported when there is a healthy remote server

serverHealthzURL := *url
if serverHealthzURL.Path == "" || serverHealthzURL.Path == "/" {
serverHealthzURL.Path = "/healthz"
func (hcm *healthCheckerManager) backoffEnsureLease(checker *checker) (*coordinationv1.Lease, bool) {
Member

Please define the lease process in an interface.

Contributor Author

fixed

Timeout: hcm.heartbeatTimeoutSeconds,
}
return cfg

}

func PingClusterHealthz(client *http.Client, addr string) (bool, error) {
Member

PingClusterHealthz() is no longer used. How about deleting it?

Contributor Author

yurtctl uses this method to check the health status of yurthub when setting it up.

lease.Spec.RenewTime = &metav1.MicroTime{Time: hcm.clock.Now()}

if lease.OwnerReferences == nil || len(lease.OwnerReferences) == 0 {
if node, err := checker.client.CoreV1().Nodes().Get(hcm.holderIdentity, metav1.GetOptions{}); err == nil {
Member

Can we get node.UID from the yurt-hub env instead of fetching it from kube-apiserver?

Contributor Author

The yurt-hub env does not contain node.UID.

return lease
}

func (hcm *healthCheckerManager) storageLease(lease *coordinationv1.Lease) error {
Member

@rambohe-ch rambohe-ch Apr 20, 2021

I think we only need to update the lease through storageWrapper directly; there is no need to verify resourceVersion.

Contributor Author

fixed

return err
}
klog.Infof("%d. run health checker for remote servers", trace)
healthChecker.Run()
Member

Why not call healthChecker.Run() right after healthchecker.NewHealthChecker()?

Contributor Author

fixed

if !rp.cacheMgr.CanCacheFor(req) {
	klog.Errorf("can not cache for %s", util.ReqString(req))
	rw.WriteHeader(http.StatusBadGateway)
	return
Member

How about writing an error message to rw?

Contributor Author

This follows the default errHandler: no error message is written, and only a 502 is returned.
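
What "only return 502" means for a client can be checked with the standard `net/http/httptest` recorder. The handler and helper names below (`badGatewayOnly`, `recordBadGateway`) are hypothetical and exist only for this sketch; the request path is an arbitrary example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// badGatewayOnly mirrors the behavior described in the reply: it sets the
// status code to 502 and writes no response body.
func badGatewayOnly(rw http.ResponseWriter, _ *http.Request) {
	rw.WriteHeader(http.StatusBadGateway)
}

// recordBadGateway runs the handler against an in-memory recorder and
// returns the status code and the number of body bytes written.
func recordBadGateway() (int, int) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/v1/nodes", nil)
	badGatewayOnly(rec, req)
	return rec.Code, rec.Body.Len()
}

func main() {
	code, bodyLen := recordBadGateway()
	fmt.Println(code, bodyLen)
}
```

A client therefore sees the status code alone, so any diagnostic detail has to come from the yurthub logs rather than from the response.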

var errInfo error
obj, errQuery := rp.cacheMgr.QueryCache(req)
if errQuery == storage.ErrStorageNotFound {
reqInfo, _ := apirequest.RequestInfoFrom(req.Context())
Member

How about using info directly?

Contributor Author

fixed

}
}
rw.WriteHeader(http.StatusBadGateway)
}
Member

How about writing an error message to rw?

Contributor Author

This follows the default errHandler: no error message is written, and only a 502 is returned.

var timeout time.Duration
if info.Verb == "watch" {
	timeout = time.Duration(*opts.TimeoutSeconds+watchTimeoutMargin) * time.Second
} else if *opts.TimeoutSeconds > getAndListTimeoutReduce {
Member

Please add unit test cases for get/list requests.

Contributor Author

fixed

@zyjhtangtang zyjhtangtang changed the title from "using lease for cluster remote server healthz." to "using lease for cluster remote server healthz checker." on Apr 27, 2021
Member

@rambohe-ch rambohe-ch left a comment

LGTM

@rambohe-ch rambohe-ch merged commit a8ebd34 into openyurtio:master Apr 28, 2021
MrGirl pushed a commit to MrGirl/openyurt that referenced this pull request Mar 29, 2022
using lease for cluster remote server healthz checker.
2 participants