-
Notifications
You must be signed in to change notification settings - Fork 404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
using lease for cluster remote server healthz checker. #249
Conversation
if url == nil { | ||
for _, server := range kcm.remoteServers { | ||
if kcm.checker.IsHealthy(server) { | ||
s = server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about add a break statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
pkg/yurthub/proxy/proxy.go
Outdated
@@ -27,6 +27,7 @@ import ( | |||
"github.com/openyurtio/openyurt/pkg/yurthub/proxy/remote" | |||
"github.com/openyurtio/openyurt/pkg/yurthub/proxy/util" | |||
"github.com/openyurtio/openyurt/pkg/yurthub/transport" | |||
util2 "github.com/openyurtio/openyurt/pkg/yurthub/util" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
util2 --> hubutil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -122,6 +122,17 @@ func ReqInfoString(info *apirequest.RequestInfo) string { | |||
return fmt.Sprintf("%s %s for %s", info.Verb, info.Resource, info.Path) | |||
} | |||
|
|||
func IsKubeletLeaseReq(req *http.Request) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment for func
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
99fe6ba
to
b7bcfff
Compare
return checker.isHealthy() | ||
} | ||
//if there is not checker for server, default healthy. | ||
return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the status of server is healthy when no checker for server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there is not checker for server, default healthy, so certificate can be made up before healthcheker run.
} | ||
|
||
func (hcm *healthCheckerManager) getChecker() *checker { | ||
hcm.RLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about delete lock of healthCheckerManager
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
||
return false | ||
func (hcm *healthCheckerManager) sync() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comments to explain the business of checking healthy status of server by lease
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure that the node heartbeat can be reported when there is a healthy remote server
serverHealthzURL := *url | ||
if serverHealthzURL.Path == "" || serverHealthzURL.Path == "/" { | ||
serverHealthzURL.Path = "/healthz" | ||
func (hcm *healthCheckerManager) backoffEnsureLease(checker *checker) (*coordinationv1.Lease, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please define Lease process in a interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Timeout: hcm.heartbeatTimeoutSeconds, | ||
} | ||
return cfg | ||
|
||
} | ||
|
||
func PingClusterHealthz(client *http.Client, addr string) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PingClusterHealthz() has not been used now. how about delete it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yurtctl uses this method to detect the health status of yurthub when setup yurthub.
lease.Spec.RenewTime = &metav1.MicroTime{Time: hcm.clock.Now()} | ||
|
||
if lease.OwnerReferences == nil || len(lease.OwnerReferences) == 0 { | ||
if node, err := checker.client.CoreV1().Nodes().Get(hcm.holderIdentity, metav1.GetOptions{}); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we get node.UID from yurt-hub env instead of getting from kube-apiserver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the env of yurt-hub have not node.UID.
return lease | ||
} | ||
|
||
func (hcm *healthCheckerManager) storageLease(lease *coordinationv1.Lease) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we only need to update lease by using storageWrapper directly, not need to verify resourceVersion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
return err | ||
} | ||
klog.Infof("%d. run health checker for remote servers", trace) | ||
healthChecker.Run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not add healthChecker.Run()
after healthchecker.NewHealthChecker()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
if !rp.cacheMgr.CanCacheFor(req) { | ||
klog.Errorf("can not cache for %s", util.ReqString(req)) | ||
rw.WriteHeader(http.StatusBadGateway) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about write some error message to rw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refer to the default errHandler, do not write error messages. only return 502
pkg/yurthub/proxy/remote/remote.go
Outdated
var errInfo error | ||
obj, errQuery := rp.cacheMgr.QueryCache(req) | ||
if errQuery == storage.ErrStorageNotFound { | ||
reqInfo, _ := apirequest.RequestInfoFrom(req.Context()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about use info
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
} | ||
} | ||
rw.WriteHeader(http.StatusBadGateway) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about write some error message to rw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refer to the default errHandler, do not write error messages. only return 502
pkg/yurthub/proxy/util/util.go
Outdated
var timeout time.Duration | ||
if info.Verb == "watch" { | ||
timeout = time.Duration(*opts.TimeoutSeconds+watchTimeoutMargin) * time.Second | ||
} else if *opts.TimeoutSeconds > getAndListTimeoutReduce { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add ut testcases for get/list request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
6eb7888
to
8b5cccf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
using lease for cluster remote server healthz checker.
Ⅰ. Describe what this PR does
In yurthub, the existing cloud-side network check uses the health check interface of apiserver. Because the data packet of the health check interface is very small, when the cloud edge weak network, the health check interface can request data normally, but other requests cannot. It is inaccurate for health check. Therefore, the lease request is used to do a health check.
(1)Yurthub sends a lease request to report the heartbeat of the node, and serves as the basis for the health of the cloud-side network.
(2)Yurthub caches lease data, and kubelet's lease requests all use locally cached data.
(3)Add interface Run() for HealthChecker.
(4)The initialization certificate no longer depends on the health check.
(5)Adjust the startup sequence of the health check module in yurthub.
(6)When the request from the cloud fails, the data is fetched from the local cache.
(7)Add UT.
Ⅱ. Does this pull request fix one issue?
NONE
Optimize cloud-side network inspection for yurthub.
Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.
Ⅳ. Describe how to verify it
Ⅴ. Special notes for reviews