Support Zookeeper probes parameters in Apache Solr Operator helm charts. #477

Closed
iampranabroy opened this issue Sep 21, 2022 · 7 comments · Fixed by #546
Labels
custom kube options: Adding options related to customizing parts of the default Kubernetes resources.
zookeeper: Related to Zookeeper or the Zookeeper Operator
Comments

@iampranabroy

Describe the issue:

When deploying SolrCloud via the Apache Solr Operator with a ZooKeeper ensemble, one of the ZooKeeper pods sometimes fails at startup with the following error:

2021-03-29 13:33:56,645 [myid:2] - ERROR [main:QuorumPeerMain@113] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: My id 2 not in the peer list
	at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1073)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)

Possible solutions:

As discussed in pravega/zookeeper-operator#315, the suggestion there is to increase probes.readiness.initialDelaySeconds from the default of 10 seconds to 30 or 60 seconds.
Can we add support for the zookeeper config.* parameters in Apache Solr Operator helm charts?
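
To make the request concrete, here is a rough sketch of what such an override could look like in a values file. The `zk.provided.probes` key path is an assumption made for illustration only, mirroring the `probes.readiness.initialDelaySeconds` name quoted above; it is not a confirmed Solr Operator chart option.

```yaml
# Hypothetical values.yaml fragment. The zk.provided.probes path is an
# assumption for illustration, not a documented chart parameter.
zk:
  provided:
    probes:
      readiness:
        # Raised from the default of 10 seconds mentioned above.
        initialDelaySeconds: 30
```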

@iampranabroy changed the title from "Support Zookeeper probes inputs via Apache Solr Operator helm charts" to "Support Zookeeper probes parameters in Apache Solr Operator helm charts." Sep 21, 2022
@mmoscher
Contributor

@iampranabroy did you find an interim solution?
I'm facing the same issue, and I think (for now) the only way to go is to deploy a separate zookeeper cluster.

I will dig into this and submit a PR. It shouldn't be that hard, I think.

@iampranabroy
Author

Hey @mmoscher - As of now, NO. If you can raise a PR that would be great.
@HoustonPutman - If there are any upcoming minor releases, can we add this item?

@mmoscher
Contributor

mmoscher commented Sep 26, 2022

However, I can confirm that the described solution, i.e. increasing livenessProbe.initialDelaySeconds, works. After setting it to 30s I was able to successfully deploy a zookeeper cluster with replicas > 1.

//Edit: false positive ... just had a bunch of luck. For now I'm unable to successfully (re-)deploy a zookeeper cluster.
Let's move this discussion back to: pravega/zookeeper-operator#315

@HoustonPutman added the "zookeeper" and "custom kube options" labels Oct 21, 2022
@HoustonPutman
Contributor

@mmoscher We can definitely add probes support through the Solr Operator, but just to make sure: you solved this issue independently of any Solr/ZK settings, correct?

@mmoscher
Contributor

@HoustonPutman yes, solved it without using any probes.
The problem was related to wrong NetworkPolicies and old (maybe corrupted) configs in the zookeeper PVC, cf. pravega/zookeeper-operator#315 (comment)

@iampranabroy
Author

iampranabroy commented Oct 25, 2022

Hey, @mmoscher - Thanks for your response.
In my case, I have the Solr cluster and zookeeper cluster deployed in the same namespace, but I have seen this error several times. If we can add the support for probes.readiness.initialDelaySeconds, we can see if that resolves the problem.

@mmoscher - Do you have your zookeeper and solr deployed in the same namespace or a different namespace? Was curious about allow-zookeeper-access: true

@mmoscher
Contributor

@iampranabroy yes, all resources (Solr + ZK) are in the same namespace, with NetworkPolicies denying all pods' egress traffic.
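
For a setup like this, where a deny-all egress policy is in place, a minimal sketch of an allow rule for ZooKeeper traffic might look as follows. The policy name and the pod label are hypothetical and would need to match the labels actually present on the ZooKeeper pods; the ports are ZooKeeper's standard client (2181), quorum (2888), and leader-election (3888) ports.

```yaml
# Sketch only: the policy name and the pod label below are hypothetical.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-zookeeper-ensemble        # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: example-zookeeper            # assumed label on the ZooKeeper pods
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Client connections from any pod in the same namespace (e.g. Solr).
    - from:
        - podSelector: {}
      ports:
        - protocol: TCP
          port: 2181
    # Quorum and leader-election traffic from the other ensemble members.
    - from:
        - podSelector:
            matchLabels:
              app: example-zookeeper
      ports:
        - protocol: TCP
          port: 2888
        - protocol: TCP
          port: 3888
  egress:
    # Quorum and leader-election traffic to the other ensemble members.
    - to:
        - podSelector:
            matchLabels:
              app: example-zookeeper
      ports:
        - protocol: TCP
          port: 2888
        - protocol: TCP
          port: 3888
```

With deny-all egress in effect, the Solr pods would likely also need their own egress rule to port 2181, and every pod would need DNS egress to resolve peer hostnames; both are left out of this sketch.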
