
v0.2.0 Sentinel Config ReadOnly #35

Closed
Paic opened this issue Mar 26, 2018 · 7 comments · Fixed by #36

Comments

@Paic
Contributor

Paic commented Mar 26, 2018

I deployed redis-operator v0.2.0 on a brand new GKE cluster, added RBAC permissions (also, is there a list of those permissions?), and the Sentinels seem to crash.

Redis Sentinel logs:

# Sentinel config file /redis/sentinel.conf is not writable: Read-only file system. Exiting...

Am I missing something?

@Paic Paic changed the title Sentinel Config ReadOnly v0.2.0 Sentinel Config ReadOnly Mar 26, 2018
@jchanam
Collaborator

jchanam commented Mar 26, 2018

Hi!

To enable RBAC on the operator, I recommend using the Helm chart. The list of permissions needed is here.
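
For reference, a minimal sketch of what such a ClusterRole could look like. The resource list below is an assumption based on what the operator manages in this thread (pods, services, configmaps, statefulsets, deployments, pod disruption budgets); the authoritative list lives in the Helm chart:

```yaml
# Hypothetical ClusterRole sketch; not the chart's actual manifest.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: redis-operator
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
  - apiGroups: ["policy"]
    resources: ["poddisruptionbudgets"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
```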

Apart from that, I'm not able to reproduce your issue. Here is what I've done:

  1. Launch a fresh minikube with RBAC active:
    sudo minikube start --vm-driver=none --extra-config=apiserver.Authorization.Mode=RBAC
  2. Create a clusterRoleBinding to the default service-account on kube-system namespace:
    kubectl create clusterrolebinding add-on-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default
  3. Install tiller on my cluster:
    helm init
  4. Deploy the redis-operator with the given chart:
    helm install charts/redisoperator/ --name redis-operator --set "rbac.install=true"
  5. Create the example redis-failover:
    kubectl create -f example/redisfailover.yaml
  6. After a couple of minutes, all the pods are ready:
rfr-redisfailover-0                             2/2       Running   0          4m
rfr-redisfailover-1                             2/2       Running   0          4m
rfr-redisfailover-2                             2/2       Running   0          3m
rfs-redisfailover-7d9f479b65-bw27m              1/1       Running   0          4m
rfs-redisfailover-7d9f479b65-ks9kz              1/1       Running   0          4m
rfs-redisfailover-7d9f479b65-pvn6m              1/1       Running   0          4m

If I check the status of the Redis nodes, I get this:

  • Node 0:
# Replication
role:master
connected_slaves:2
slave0:ip=172.17.0.11,port=6379,state=online,offset=12663,lag=0
slave1:ip=172.17.0.10,port=6379,state=online,offset=12663,lag=0
master_repl_offset:12663
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:12662
  • Node 1:
# Replication
role:slave
master_host:172.17.0.6
master_port:6379
master_link_status:up
master_last_io_seconds_ago:2
master_sync_in_progress:0
slave_repl_offset:12663
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
  • Node 2:
# Replication
role:slave
master_host:172.17.0.6
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:12798
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

The status of the sentinels is as follows:

  • Node A:
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.17.0.6:6379,slaves=2,sentinels=3
  • Node B:
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.17.0.6:6379,slaves=2,sentinels=3
  • Node C:
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.17.0.6:6379,slaves=2,sentinels=3

Here is an example of a sentinel's log output:

                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 3.2.11 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in sentinel mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 26379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

1:X 26 Mar 15:32:11.089 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:X 26 Mar 15:32:11.095 # Sentinel ID is 5d56f5cc89225e5431a4cac20514e2ce04f4aa0b
1:X 26 Mar 15:32:11.095 # +monitor master mymaster 127.0.0.1 6379 quorum 2
1:X 26 Mar 15:32:12.106 # +sdown master mymaster 127.0.0.1 6379
1:X 26 Mar 15:35:06.719 # -monitor master mymaster 127.0.0.1 6379
1:X 26 Mar 15:35:06.729 # +monitor master mymaster 172.17.0.6 6379 quorum 2
1:X 26 Mar 15:35:06.758 # +reset-master master mymaster 172.17.0.6 6379
1:X 26 Mar 15:35:06.778 * +slave slave 172.17.0.11:6379 172.17.0.11 6379 @ mymaster 172.17.0.6 6379
1:X 26 Mar 15:35:07.344 # +reset-master master mymaster 172.17.0.6 6379
1:X 26 Mar 15:35:08.742 * +sentinel sentinel af9b6b10ed64532d73903c1aed80772b3090d459 172.17.0.8 26379 @ mymaster 172.17.0.6 6379
1:X 26 Mar 15:35:08.763 * +sentinel sentinel 7a0ff75331388455c357baeedc915cc730676a8b 172.17.0.7 26379 @ mymaster 172.17.0.6 6379
1:X 26 Mar 15:35:16.844 * +slave slave 172.17.0.11:6379 172.17.0.11 6379 @ mymaster 172.17.0.6 6379
1:X 26 Mar 15:35:16.849 * +slave slave 172.17.0.10:6379 172.17.0.10 6379 @ mymaster 172.17.0.6 6379

Could you give me more information about how to reproduce the issue you're facing?

Thank you for opening this issue!

@Paic
Contributor Author

Paic commented Mar 27, 2018

Thanks for the quick reply!
I did not even notice a chart was available. Awesome!

After recreating a cluster (GKE - OS: cos - Kubernetes version: 1.9.4-gke.1) and following your steps (installing Helm, deploying the operator with Helm, and using the example failover), I still have the same error on the sentinels:

                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 3.2.11 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in sentinel mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 26379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

1:X 27 Mar 07:36:55.265 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:X 27 Mar 07:36:55.265 # Sentinel config file /redis/sentinel.conf is not writable: Read-only file system. Exiting...
NAME                                            READY     STATUS             RESTARTS   AGE
redis-operator-redisoperator-57d74cd97c-2wvff   1/1       Running            0          5m
rfr-redisfailover-0                             1/1       Running            0          3m
rfr-redisfailover-1                             1/1       Running            0          3m
rfr-redisfailover-2                             1/1       Running            0          3m
rfs-redisfailover-5645bc4c57-hjgst              0/1       CrashLoopBackOff   5          3m
rfs-redisfailover-5645bc4c57-j6ld6              0/1       CrashLoopBackOff   5          3m
rfs-redisfailover-5645bc4c57-sxbsr              0/1       CrashLoopBackOff   5          3m

The redis-operator keeps outputting these:

time="2018-03-27T07:56:47Z" level=info msg="configMap updated" configMap=rfs-redisfailover namespace=default service=k8s.configMap src="configmap.go:58"
time="2018-03-27T07:56:48Z" level=info msg="configMap updated" configMap=rfr-redisfailover namespace=default service=k8s.configMap src="configmap.go:58"
time="2018-03-27T07:56:48Z" level=info msg="podDisruptionBudget updated" namespace=default podDisruptionBudget=rfr-redisfailover service=k8s.podDisruptionBudget src="poddisruptionbudget.go:58"
time="2018-03-27T07:56:48Z" level=info msg="statefulSet updated" namespace=default service=k8s.statefulSet src="statefulset.go:77" statefulSet=rfr-redisfailover
time="2018-03-27T07:56:48Z" level=info msg="podDisruptionBudget updated" namespace=default podDisruptionBudget=rfs-redisfailover service=k8s.podDisruptionBudget src="poddisruptionbudget.go:58"
time="2018-03-27T07:56:48Z" level=info msg="deployment updated" deployment=rfs-redisfailover namespace=default service=k8s.deployment src="deployment.go:77"
time="2018-03-27T07:56:49Z" level=error msg="Error processing default/redisfailover: dial tcp 10.60.2.13:26379: getsockopt: connection refused" controller=redisfailover operator=redisfailover src="generic.go:158"

However, the Redis nodes seem to be configured properly:

  • Redis 0:
# Replication
role:master
connected_slaves:2
slave0:ip=10.60.1.11,port=6379,state=online,offset=617,lag=0
slave1:ip=10.60.0.14,port=6379,state=online,offset=617,lag=0
master_repl_offset:617
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:616
  • Redis 1:
# Replication
role:slave
master_host:10.60.2.12
master_port:6379
master_link_status:up
master_last_io_seconds_ago:3
master_sync_in_progress:0
slave_repl_offset:813
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
  • Redis 2:
# Replication
role:slave
master_host:10.60.2.12
master_port:6379
master_link_status:up
master_last_io_seconds_ago:5
master_sync_in_progress:0
slave_repl_offset:841
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

@jchanam
Collaborator

jchanam commented Mar 28, 2018

Hi @Paic,

That file comes from a volume mounted from a ConfigMap. I don't know if it works differently on the images that Google provides, which is why I cannot reproduce it.

The mode can be set with this: https://kubernetes.io/docs/concepts/storage/volumes/#example-pod-with-multiple-secrets-with-a-non-default-permission-mode-set

Could you edit that deployment and add mode: 666 to the volume called sentinel-config?

About the error seen in the logs: it's because the operator is trying to connect to the sentinels to check their status and fix it if needed.

@jchanam
Copy link
Collaborator

jchanam commented Mar 28, 2018

Hi again,

As this is a ConfigMap, it seems, based on the K8s API Reference, that the field is defaultMode.

It is weird because, as stated in the API Reference, the default is 644, and root should be able to write to it.

Please, keep us updated with this :)
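
For illustration, a sketch of where defaultMode would go in the Deployment's pod spec. The volume name (sentinel-config) and ConfigMap name (rfs-redisfailover) are taken from this thread; the rest of the spec is omitted. Note that 0666 is an octal literal; some clients expect the decimal equivalent (438):

```yaml
# Sketch only: setting defaultMode on the ConfigMap-backed volume.
volumes:
  - name: sentinel-config
    configMap:
      name: rfs-redisfailover
      defaultMode: 0666
```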

@adamresson

This likely has to do with the security issue and fix in 1.9.4 that requires ConfigMaps to be mounted read-only. See the overarching issue here: kubernetes/kubernetes#61563
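
Since 1.9.4 mounts ConfigMaps read-only regardless of mode, and Sentinel rewrites its own config file at runtime, a common workaround (the general shape of a fix, not necessarily what #36 does) is to copy the config into a writable emptyDir with an init container and point Sentinel at the copy. All names below are illustrative:

```yaml
# Illustrative workaround: the read-only ConfigMap mount only seeds
# the initial config; Sentinel works on a writable copy in an emptyDir.
initContainers:
  - name: sentinel-config-copy
    image: redis:3.2.11-alpine
    command: ["sh", "-c", "cp /redis-readonly/sentinel.conf /redis/sentinel.conf"]
    volumeMounts:
      - name: sentinel-config
        mountPath: /redis-readonly
      - name: sentinel-config-writable
        mountPath: /redis
containers:
  - name: sentinel
    image: redis:3.2.11-alpine
    command: ["redis-server", "/redis/sentinel.conf", "--sentinel"]
    volumeMounts:
      - name: sentinel-config-writable
        mountPath: /redis
volumes:
  - name: sentinel-config
    configMap:
      name: rfs-redisfailover
  - name: sentinel-config-writable
    emptyDir: {}
```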

@jchanam
Collaborator

jchanam commented Mar 28, 2018

@Paic please use version 0.2.1 and confirm that this problem is solved.

Thanks for helping us improve this!

@Paic
Contributor Author

Paic commented Mar 29, 2018

Can confirm it's working on a fresh GKE 1.9.4 cluster.

Thanks guys.
