Cluster member state isn't updated to 'DOWN' after the pod becomes down. #4364
Comments
@i will solve it@
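My Nacos is 1.3.2 with a three-node deployment, and the same problem occurs.
This is a network connectivity problem between the nodes. Check the cluster's cluster.conf settings and the ports and IPs the nodes listen on.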
Describe the bug
The health state of a cluster member is never updated to 'DOWN'.
Expected behavior
I reduced the pod replica count of a Nacos cluster deployed on k8s by one (originally 3). The cluster should detect the health state change of the Nacos node in the removed pod, update its state to 'DOWN', stop forwarding requests to that node, and handle all client requests properly.
Actual behavior
The node state is updated to 'SUSPICIOUS' but stays there and never becomes 'DOWN'. The other members still forward requests to the node, so about 1/3 of requests fail.
How to Reproduce
Desktop (please complete the following information):
Additional context
cluster log:
My guess is that the code below leads to the bug.
Here we increment the failAccessCnt of the cloneMember by one, but when I look into the function
I find that the failAccessCnt of cloneMember isn't copied back to the original member object:
In the method 'copy', we don't copy the field 'failAccessCnt'.
Maybe altering the method 'copy' as follows would help?
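The original snippets did not survive on this page, so here is a minimal, self-contained sketch of the mechanism described above. It is not the Nacos source: `Member`, `NodeState`, `MemberCopySketch`, the threshold of 3, and the address `10.0.0.3:8848` are all hypothetical stand-ins chosen for illustration. It only assumes what the report states: the failure handler runs on a clone, and `copy` writes the clone back to the stored member without `failAccessCnt`.

```java
// Hypothetical stand-ins for the types discussed in this issue; not the real Nacos classes.
enum NodeState { UP, SUSPICIOUS, DOWN }

class Member {
    String address;
    NodeState state = NodeState.UP;
    int failAccessCnt = 0;

    Member(String address) { this.address = address; }
}

final class MemberCopySketch {
    // Assumed threshold: a member goes DOWN once its failure count exceeds this value.
    static final int MAX_FAIL_ACCESS_CNT = 3;

    /** Failure handler as described in the report: it works on a clone of the stored member. */
    static void onFail(Member cloneMember) {
        cloneMember.state = NodeState.SUSPICIOUS;
        cloneMember.failAccessCnt++;
        if (cloneMember.failAccessCnt > MAX_FAIL_ACCESS_CNT) {
            cloneMember.state = NodeState.DOWN;
        }
    }

    /** Copy as reported: address and state are written back, the failure counter is not. */
    static void copyWithoutCounter(Member from, Member to) {
        to.address = from.address;
        to.state = from.state;
    }

    /** Proposed change: also carry the failure counter across. */
    static void copyWithCounter(Member from, Member to) {
        copyWithoutCounter(from, to);
        to.failAccessCnt = from.failAccessCnt;
    }

    /** Simulates `rounds` failed health checks and returns the stored member's final state. */
    static NodeState simulate(int rounds, boolean copyCounter) {
        Member stored = new Member("10.0.0.3:8848"); // hypothetical address
        for (int i = 0; i < rounds; i++) {
            // Clone the stored member, then report the failure against the clone.
            Member clone = new Member(stored.address);
            clone.state = stored.state;
            clone.failAccessCnt = stored.failAccessCnt;
            onFail(clone);

            // Write the clone back to the stored member.
            if (copyCounter) {
                copyWithCounter(clone, stored);
            } else {
                copyWithoutCounter(clone, stored);
            }
        }
        return stored.state;
    }

    public static void main(String[] args) {
        System.out.println("copy without failAccessCnt: " + simulate(10, false));
        System.out.println("copy with failAccessCnt:    " + simulate(10, true));
    }
}
```

In this sketch the stored member's counter is reset to its old value on every round when `copy` skips it, so each clone starts from 0 and the member stays 'SUSPICIOUS' no matter how many checks fail. With the counter copied, the fourth consecutive failure crosses the assumed threshold and the member becomes 'DOWN'.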