Coordinator loses watch on /druid/announcements due to unreasonable usage of PathChildrenCache #6597

Closed
warren288 opened this issue Nov 10, 2018 · 3 comments
@warren288

When a coordinator node becomes leader, it starts a PathChildrenCache for /druid/announcements in CuratorInventoryManager.start. If no historical node has started yet, so that /druid/announcements does not exist, the PathChildrenCache created by the coordinator creates /druid/announcements in CONTAINER mode, and the historicals then announce themselves under it.
If all historicals later shut down because of some exception and /druid/announcements becomes empty, the ZooKeeper server cleans up the /druid/announcements node and the coordinator leader loses its watch on it.
When the historicals recover and create /druid/announcements again, the coordinator leader never notices.

So, in CuratorInventoryManager.start, the coordinator leader should check whether the /druid/announcements node exists and, if it does not, create it in PERSISTENT mode before it starts the PathChildrenCache for /druid/announcements.
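
To make the sequence above easier to follow, here is a minimal, self-contained toy model of it in plain Java. It is not Druid, Curator, or ZooKeeper code: `ToyZk`, `AnnouncementWatchDemo`, and the child name `historical-1` are hypothetical names invented for illustration, and the model simplifies real behavior (empty container nodes are reaped immediately, watches are not one-shot, and a re-created parent is assumed to be PERSISTENT). In a real fix the "check and create" step would presumably go through Curator's `checkExists()` and `create().withMode(CreateMode.PERSISTENT)` before the cache is started, but that is an assumption about how the change would be written, not the actual patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Consumer;

/** Replays the scenario from this issue against the toy model below. */
final class AnnouncementWatchDemo {
    static final String PATH = "/druid/announcements";

    /** Returns the announcements the coordinator-side listener sees after the historicals restart. */
    static List<String> run(ToyZk.Mode modeUsedByCoordinator) {
        ToyZk zk = new ToyZk();
        List<String> seen = new ArrayList<>();
        zk.createIfAbsent(PATH, modeUsedByCoordinator); // coordinator creates the parent
        zk.watchChildren(PATH, seen::add);              // coordinator starts watching
        zk.addChild(PATH, "historical-1");              // historical announces itself
        zk.removeChild(PATH, "historical-1");           // all historicals go away
        seen.clear();                                   // only count events after the restart
        zk.addChild(PATH, "historical-1");              // historical comes back
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("CONTAINER:  " + run(ToyZk.Mode.CONTAINER));
        System.out.println("PERSISTENT: " + run(ToyZk.Mode.PERSISTENT));
    }
}

/** Toy in-memory stand-in for a ZooKeeper server (hypothetical, heavily simplified). */
final class ToyZk {
    enum Mode { PERSISTENT, CONTAINER }

    private final Map<String, Mode> parents = new HashMap<>();
    private final Map<String, Set<String>> children = new HashMap<>();
    private final Map<String, List<Consumer<String>>> childWatches = new HashMap<>();

    boolean exists(String path) {
        return parents.containsKey(path);
    }

    /** Creates the parent only when it is missing; an existing node keeps its mode. */
    void createIfAbsent(String path, Mode mode) {
        if (parents.putIfAbsent(path, mode) == null) {
            children.put(path, new TreeSet<>());
        }
    }

    /** Registers a listener for new children; the parent must already exist. */
    void watchChildren(String path, Consumer<String> listener) {
        if (!exists(path)) {
            throw new IllegalStateException("no such node: " + path);
        }
        childWatches.computeIfAbsent(path, p -> new ArrayList<>()).add(listener);
    }

    /** Adds a child, re-creating a missing parent (assumed PERSISTENT in this toy). */
    void addChild(String parent, String child) {
        createIfAbsent(parent, Mode.PERSISTENT);
        children.get(parent).add(child);
        for (Consumer<String> w : childWatches.getOrDefault(parent, Collections.emptyList())) {
            w.accept(child);
        }
    }

    /** Removes a child; an empty CONTAINER parent is reaped together with its watches. */
    void removeChild(String parent, String child) {
        Set<String> kids = children.get(parent);
        if (kids == null || !kids.remove(child)) {
            return;
        }
        if (kids.isEmpty() && parents.get(parent) == Mode.CONTAINER) {
            parents.remove(parent);
            children.remove(parent);
            childWatches.remove(parent);
        }
    }
}
```

In this model, `run(ToyZk.Mode.CONTAINER)` returns an empty list because the watch disappears together with the reaped parent, while `run(ToyZk.Mode.PERSISTENT)` still reports the restarted historical.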

@kaijianding
Contributor

@warren288 maybe #6683 also fixes this issue. It uses NodeCache instead of PathChildrenCache for /druid/announcements. You can apply that fix in your environment and try it.

@stale

stale bot commented Sep 7, 2019

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the [email protected] list. Thank you for your contributions.

@stale stale bot added the stale label Sep 7, 2019
@stale

stale bot commented Oct 5, 2019

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

@stale stale bot closed this as completed Oct 5, 2019