[BUG] Yurthub cloud mode is not working as intended. #600

Closed
DrmagicE opened this issue Nov 17, 2021 · 1 comment · Fixed by #607
Labels
kind/bug

Comments

@DrmagicE
Member

Background

The Yurthub "working mode" feature was introduced via #483. It distinguishes cloud nodes from edge nodes by setting yurthub's --working-mode flag to cloud or edge respectively.

This feature aims to enable the service topology feature on the cloud side, as a prerequisite for enabling yurt-tunnel DNS mode.

#270 provides some background on what yurt-tunnel DNS mode is and how to use it.

What happened

When Yurthub is running in cloud mode, the local cache manager is disabled, which means no data is written to the local disk. However, the servicetopology filter (and maybe other components too) in yurthub will still try to read data from the disk cache:

func (ssf *serviceTopologyFilter) Approve(comp, resource, verb string) bool {
	if !ssf.Approver.Approve(comp, resource, verb) {
		return false
	}

	if ok := cache.WaitForCacheSync(ssf.stopCh, ssf.nodeSynced, ssf.serviceSynced, ssf.nodePoolSynced); !ok {
		return false
	}

	return true
}

The ssf.nodeSynced in the above code will try to sync node state from the local disk and hang forever.

ssf.nodeSynced = func() bool {
	obj, err := s.Get(nodeKey)
	if err != nil || obj == nil {
		return false
	}

	if _, ok := obj.(*v1.Node); !ok {
		return false
	}

	return true
}

How to fix

We should check all usages of the StorageWrapper interface, and make them compatible with yurthub in cloud mode.

type StorageWrapper interface {
	Create(key string, obj runtime.Object) error
	Delete(key string) error
	Get(key string) (runtime.Object, error)
	ListKeys(key string) ([]string, error)
	List(key string) ([]runtime.Object, error)
	Update(key string, obj runtime.Object) error
	Replace(rootKey string, objs map[string]runtime.Object) error
	DeleteCollection(rootKey string) error
	GetRaw(key string) ([]byte, error)
	UpdateRaw(key string, contents []byte) error
}

A Temporary Solution

For users who encounter the same problem, here is a temporary solution:

# Edit the yurthub manifest on all your cloud nodes (usually located at /etc/kubernetes/manifests/yurt-hub.yaml),
# switch --working-mode to edge,
# and add --disabled-resource-filters=discardcloudservice to disable the discardcloudservice filter.
    command:
    - yurthub
    - --v=2
    - --server-addr=https://xxx
    - --node-name=$(NODE_NAME)
    - --join-token=xxx
    - --working-mode=edge
    - --disabled-resource-filters=discardcloudservice

/kind bug

@DrmagicE DrmagicE added the kind/bug label Nov 17, 2021
@rambohe-ch
Member

@DrmagicE Thank you for raising this issue.
I had forgotten that the servicetopology filter uses the cachemanager. The reason for syncing the node through the cachemanager, instead of having yurthub list/watch nodes from the kube-apiserver directly, is to eliminate traffic between yurthub and the kube-apiserver by reusing the kubelet's node cache, as follows:

func (ssf *serviceTopologyFilter) SetStorageWrapper(s cachemanager.StorageWrapper) error {
	if len(ssf.nodeName) == 0 {
		return fmt.Errorf("node name for serviceTopologyFilter is not ready")
	}

	nodeKey := fmt.Sprintf("kubelet/nodes/%s", ssf.nodeName) // <-- use the kubelet's node cache
	ssf.nodeSynced = func() bool {
		obj, err := s.Get(nodeKey)
		if err != nil || obj == nil {
			return false
		}

		if _, ok := obj.(*v1.Node); !ok {
			return false
		}

		return true
	}
  • solution:
  1. In my opinion, having yurthub list/watch nodes from the kube-apiserver is not good, because traffic between cloud and edge will go up, and so will the public network cost.
  2. Maybe we need to cache the node on local disk even in yurthub's cloud mode.
  3. So we need to add a cache mechanism that stores only the node resource from kubelet's list/watch requests. I will consider the solution later. @DrmagicE, if you have any ideas, please let me know.

rambohe-ch added a commit to rambohe-ch/openyurt that referenced this issue Nov 18, 2021
--> solution: the service topology filter will list/watch nodes from the kube-apiserver if the working mode is cloud
fixes openyurtio#600

2. optimize shared informer registration: extract all informer registrations and
make a common function named registerInformers
openyurt-bot pushed a commit that referenced this issue Nov 19, 2021
#607)

--> solution: the service topology filter will list/watch nodes from the kube-apiserver if the working mode is cloud
fixes #600

2. optimize shared informer registration: extract all informer registrations and
make a common function named registerInformers
MrGirl pushed a commit to MrGirl/openyurt that referenced this issue Mar 29, 2022
openyurtio#607)

--> solution: the service topology filter will list/watch nodes from the kube-apiserver if the working mode is cloud
fixes openyurtio#600

2. optimize shared informer registration: extract all informer registrations and
make a common function named registerInformers