Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: re-list when target change #1195

Merged
merged 1 commit into from
Feb 10, 2023

Conversation

LaurenceLiZhixin
Copy link
Member

Force re-list pool-scope resource when proxy change between loadbalancer and pool-coordinator

@openyurt-bot openyurt-bot added the size/M size/M: 30-99 label Feb 7, 2023
@codecov
Copy link

codecov bot commented Feb 7, 2023

Codecov Report

Merging #1195 (5636359) into master (c524116) will decrease coverage by 0.20%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master    #1195      +/-   ##
==========================================
- Coverage   52.35%   52.16%   -0.20%     
==========================================
  Files         106      106              
  Lines       13852    13880      +28     
==========================================
- Hits         7252     7240      -12     
- Misses       5952     5991      +39     
- Partials      648      649       +1     
Flag Coverage Δ
unittests 52.16% <0.00%> (-0.20%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/yurthub/proxy/remote/loadbalancer.go 12.54% <0.00%> (ø)
pkg/yurthub/proxy/util/util.go 62.88% <0.00%> (-5.91%) ⬇️
pkg/yurthub/storage/etcd/storage.go 77.35% <0.00%> (-3.53%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@Congrool
Copy link
Member

Congrool commented Feb 7, 2023

I still have some questions for such problem:

  1. client sent list/watch requests and they were handled by yurthub. Then the yurthub crashed and restarted. And at this time, the handler changed(from pool-coordinator to cloud APIServer or vice versa), will the client relist the resource?
  2. When the long time watch connection break is caused by the server-side(e.g. cloud-edge network breaks or pool-coordinator restarted), will yurthub always make the client relist?

@LaurenceLiZhixin
Copy link
Member Author

Thanks for your great comments @Congrool

  1. After my tests, yurthub restart will cause all existing watch requests to be disconnected from client side, and the client will trigger a re-list-watch. If the restart process occurs with the flow change from edge to cloud, after the stability, yurthub got ready coordinator status, and yurthub will cut the cloud to the edge again , the client-side re-list-watch will still be triggered by the error event injection provided by this pr.

  • If the cloud edge network is disconnected: If yurthub is directly connected to the cloud, the edge pool-coordinator is unavailable. Therefore, there is no problem.
  • If the pool-coordinator restarts, the existing watch requests are cut and the client re-list-watch. Then yurthub will determine that a pool-coordinator is abnormal, and trigger a re-list-watch for all existing watches through the error event injection provided by this pr, and then client side list-watch agent and yurthub proxy to the cloud. After the pool-coordinator restarted, the flow is changed to local and all re-list-watch are triggered again. There will be no problem after the experiment.

@Congrool
Copy link
Member

Congrool commented Feb 8, 2023

Sounds good to me

if lb.coordinatorGetter == nil {
continue
}
klog.Infof("lb.coordinatorGetter is %s", lb.coordinatorGetter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may not need such log, please remove

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine!

coordinator := lb.coordinatorGetter()
if coordinator == nil {
klog.Infof("lb.coordinatorGetter().coordinator is nil")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

cloudServeCancel()
return
}
klog.Infof("coordinator.IsReady() is false")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@Congrool
Copy link
Member

Congrool commented Feb 8, 2023

Could you please squash your commits to make the commit message more readable?

@Congrool
Copy link
Member

Congrool commented Feb 9, 2023

Another question, is it possible that:
list request is first handled by one handler(for example cloud APIServer), but the later watch request is handled by another handler(for pool-coordinator becoming healthy)

Will it still have some problems in such situation?

@LaurenceLiZhixin
Copy link
Member Author

Another question, is it possible that: list request is first handled by one handler(for example cloud APIServer), but the later watch request is handled by another handler(for pool-coordinator becoming healthy)

Will it still have some problems in such situation?

Yes, that's still a problem. But it's very rare. I think we can talk about it and fix it totally later.

…hange(cloud to pool-coor or pool-coor to cloud)
@LaurenceLiZhixin
Copy link
Member Author

I've fixed your great comments above. @Congrool

@Congrool
Copy link
Member

LGTM as a solution for most cases, thanks for your contribution!

/lgtm

@rambohe-ch rambohe-ch merged commit 08e1619 into openyurtio:master Feb 10, 2023
@openyurt-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LaurenceLiZhixin, rambohe-ch

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openyurt-bot openyurt-bot added the approved approved label Feb 10, 2023
@rambohe-ch rambohe-ch added the backport release-v1.2 backport release-v1.2 label Feb 17, 2023
@rambohe-ch
Copy link
Member

/backport release-v1.2

@github-actions
Copy link

Successfully created backport PR for release-v1.2:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved approved backport release-v1.2 backport release-v1.2 lgtm lgtm size/M size/M: 30-99
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants