Clean install does not work #265
Comments
After bringing it up and down a few times, it seems to be hit or miss whether the master election actually occurs. It depends on the number of nodes and whether the requesting application is hitting a sentinel that is colocated with the master.
Yeah, I just don't think this is compatible with smaller node-count clusters; the risk that a sentinel and a Redis instance are scheduled together is too high. You can reproduce this error easily by starting the operator in Kubernetes on Docker.
Same issue.
Are you storing to disk? I had the same issue, and it was caused by bad permissions on the Redis storage folder (at the host level). Once I corrected that, the cluster booted and synchronized. I'll keep testing, but that solved my issue.
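For reference, a minimal sketch of the host-level fix described above, assuming a hostPath-backed volume at an illustrative path; UID/GID 999 is the `redis` user in the official Redis image, so adjust both path and ownership to your setup:

```sh
# Hypothetical host path; run on the node that backs the Redis volume.
# 999:999 is the redis user/group in the official Redis image.
sudo chown -R 999:999 /var/lib/redis-data
sudo chmod -R 0750 /var/lib/redis-data
```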
I have the same issue. Were you able to fix it? @Sieabah
In my tests it usually starts within ~3–4 minutes. A workaround is to spin up the cluster with 1 replica, and then expand to 3 replicas when a node is ready. I think the problem was introduced in #206 with …. This will hit the code path with …. Now, instead, it calls:
redis-operator/operator/redisfailover/checker.go, lines 102–132 at b3dad57
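A minimal sketch of that workaround, assuming the RedisFailover schema exposes `spec.redis.replicas` and `spec.sentinel.replicas` (field names and apiVersion may differ across operator versions):

```yaml
apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: redisfailover
spec:
  sentinel:
    replicas: 3
  redis:
    replicas: 1   # start with a single replica so a master is elected quickly
```

Once the single Redis pod is running and elected as master, re-apply the manifest with `spec.redis.replicas: 3` (or patch it with `kubectl patch`) to expand the cluster.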
This issue is stale because it has been open for 45 days with no activity. |
This issue was closed because it has been inactive for 14 days since being marked as stale. |
@XBeg9 Sorry for not catching this reply earlier. No, I was not; I wasn't using Redis for anything other than pub/sub, so I ended up rolling my own exchange with Elixir and Phoenix. I couldn't find any reliable solution for Redis on small clusters. I use singleton instances, and when nodes cannot connect to Redis they abruptly kill any in-progress work until they can reconnect. It's the only way I can get Redis to work at all.
Expected behaviour
To be able to connect to Redis without the additional work of creating extra charts and services to work around the 127.0.0.1 "redis host".
Actual behaviour
All Redis master addresses resolve to 127.0.0.1, which fails 100% of the time and is incompatible with actual Redis sidecar caches.
Steps to reproduce the behaviour
Clean install, directly into the default namespace. Use ioredis to connect to the sentinel; I'm pretty sure this doesn't work with any library.
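For context, this is roughly the connection path that fails — a sketch using ioredis against the sentinel Service. The Service name `rfs-redisfailover` and the master group name `mymaster` follow the operator's usual conventions, but both are assumptions here:

```typescript
import Redis from "ioredis";

// Connect through Sentinel: ioredis asks the sentinels for the current
// master address and then connects to it. When the sentinels report
// 127.0.0.1 as the master IP, this step is what breaks.
const redis = new Redis({
  sentinels: [{ host: "rfs-redisfailover", port: 26379 }],
  name: "mymaster", // master group name registered in Sentinel
});

redis.on("error", (err) => console.error("redis error:", err));
```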
Environment
How are the pieces configured?
Logs
Redis operates fine and the sentinels operate fine. The value returned when querying the master IP is useless because all masters are reported as local to the sentinels. (3 instances in a cluster?)
Is the idea to run more than 3 instances, with a minimum of 6 or more? Do we need to scale the cluster well past 6, and have the scheduler evict any pods that land on the same nodes, to absolutely ensure 127.0.0.1 is never the result?
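For reference, the kind of scheduling constraint this points at is a required pod anti-affinity so the Redis pods never share a node; whether and how this operator lets you inject it into the generated pods is an assumption, and the labels below are illustrative:

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/component: redis   # illustrative label
        topologyKey: kubernetes.io/hostname
```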
In the meantime, I'm going to go with another solution, as this seems fundamentally broken on smaller clusters.
The cluster permissions are also excessively broad.
Is this operator meant to be used in production? It doesn't seem like it can avoid an outage if pods are scheduled on the same node.
Edit:
Additional issues were found with the "all resources" YAML. The order in which the resources are created produces a race for the operator deployment: if the resources are not created in time, the operator spits out an error that it doesn't have permission to run. (This is not treated as a failed state; the pod must be terminated manually.)
Due to this failure the CRDs are never created, which also means you're unable to apply the redisfailover resource. Why are the CRDs not provided separately from the operator? Why must the operator be the one that creates the CRD?
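One way to avoid the ordering race, sketched under the assumption that the manifests are split into separate files (the file names below are illustrative, and the CRD name may differ by operator version), is to install RBAC and the CRD first, wait for the CRD to be established, and only then create the operator and the RedisFailover:

```sh
kubectl apply -f rbac.yaml
kubectl apply -f crd.yaml
# Wait until the API server accepts the new resource type.
kubectl wait --for=condition=Established \
  crd/redisfailovers.databases.spotahome.com --timeout=60s
kubectl apply -f operator-deployment.yaml
kubectl apply -f redisfailover.yaml
```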