Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove ServerDiscoverySelector i.e. hardcoded zk dependency from DruidLeaderClient #10537

Merged
merged 1 commit into from
Oct 28, 2020

Conversation

himanshug
Copy link
Contributor

@himanshug himanshug commented Oct 27, 2020

towards #9053

Brings back #9481 that was reverted in #9702 because bug #9701 was discovered.

#9701 root cause was really same as that of #8716, both were fixed in #9717

TLDR:
Change in this patch was introduced in the past, an issue was discovered whose root cause was ideally fixed by another change. That couldn't be done at the time to let Druid release go forward quickly, so the change was reverted. Later, original problem got fixed and now it is [rolling eyes] safe to bring the change back.

Copy link
Member

@nishantmonu51 nishantmonu51 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, 👍

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤘

@himanshug
Copy link
Contributor Author

thanks @clintropolis , @nishantmonu51

I also tested it manually to see that stuff in sys.segments and sys.tasks table continues to work when coordinator/overlord leadership changes.

@himanshug himanshug merged commit 4de4d4d into apache:master Oct 28, 2020
@himanshug himanshug deleted the druid_leader_client_zk_removal branch October 28, 2020 17:57
@suneet-s
Copy link
Contributor

suneet-s commented Nov 3, 2020

@himanshug How should we write an integration test for this so that we do not need to test this manually with each release? Can you describe the steps you did. Is there something missing in the integration tests framework that prevents us from writing an integration test for this?

@himanshug
Copy link
Contributor Author

himanshug commented Nov 3, 2020

@suneet-s for the specific change here, it does get covered by integration tests but scenarios like one of the leader overlord/coordinator node crash while some operation was happening, don't get tested in our build environments as all of them work with one overlord and coordinator nodes . Even in a real more complex env we would need to simulate node crashes to handle such scenarios to simulate different possible leader/node failure models.
does that answer your question ?

@jihoonson jihoonson added this to the 0.21.0 milestone Jan 4, 2021
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jan 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants