New cluster doesn't happen to end up in orchestrator #673

Open
ynnt opened this issue Apr 7, 2021 · 10 comments

Comments

ynnt commented Apr 7, 2021

The cluster is stuck with the condition Ready: False because the MySQL pod never becomes Ready.

2021-04-07T14:52:54.553072021Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"Lease","namespace":"default","name":"mysql-operator-leader-election","uid":"c61dd4bc-4706-4b5e-9c46-b2267447f087","apiVersion":"coordination.k8s.io/v1","resourceVersion":"78295"}, "reason": "LeaderElection", "message": "cm-mysql-operator-0_750f6e52-877d-473c-9935-69d30acfd952 became leader"}
2021-04-07T14:52:54.553275205Z	INFO	controller-runtime.manager.controller.mysqlbackup-controller	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.553360761Z	INFO	controller-runtime.manager.controller.mysqlbackup-controller	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.553824764Z	INFO	controller-runtime.manager.controller.mysqlbackupcron-controller	Starting EventSource	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.554015469Z	INFO	controller-runtime.manager.controller.controller.mysqlcluster	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.554121207Z	INFO	controller-runtime.manager.controller.mysql-database	Starting EventSource	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.554335196Z	INFO	controller-runtime.manager.controller.mysql-user	Starting EventSource	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.554562241Z	INFO	controller-runtime.manager.controller.controller.mysqlNode	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.554786339Z	INFO	controller-runtime.manager.controller.controller.orchestrator	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.654175743Z	INFO	controller-runtime.manager.controller.mysqlbackup-controller	Starting Controller
2021-04-07T14:52:54.65433261Z	INFO	controller-runtime.manager.controller.mysql-database	Starting Controller
2021-04-07T14:52:54.654463104Z	INFO	controller-runtime.manager.controller.mysqlbackupcron-controller	Starting Controller
2021-04-07T14:52:54.654492149Z	INFO	controller-runtime.manager.controller.mysqlbackupcron-controller	Starting workers	{"worker count": 1}
2021-04-07T14:52:54.654561787Z	INFO	controller-runtime.manager.controller.mysql-user	Starting Controller
2021-04-07T14:52:54.654585036Z	INFO	controller-runtime.manager.controller.mysql-user	Starting workers	{"worker count": 1}
2021-04-07T14:52:54.654638173Z	INFO	controller-runtime.manager.controller.controller.mysqlcluster	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.654938043Z	INFO	controller-runtime.manager.controller.controller.orchestrator	Starting EventSource  	{"source": "channel source: 0xc00022c2d0"}
2021-04-07T14:52:54.655058696Z	INFO	controller-runtime.manager.controller.controller.orchestrator	Starting Controller
2021-04-07T14:52:54.655927767Z	INFO	controller-runtime.manager.controller.controller.mysqlNode	Starting Controller
2021-04-07T14:52:54.656018901Z	DEBUG	controller.orchestrator	register cluster in clusters list	{"obj": {"kind":"MysqlCluster","apiVersion":"mysql.presslabs.org/v1alpha1","metadata":{"name":"kl-my","namespace":"default","uid":"5a09c95d-a977-4a4b-94e3-a97209938043","resourceVersion":"74547","generation":1,"creationTimestamp":"2021-04-07T14:38:02Z","annotations":{"mysql.presslabs.org/version":"300"},"ownerReferences":[{"apiVersion":"kuberlogic.com/v1","kind":"KuberLogicService","name":"kl-my","uid":"9db08315-5aed-4f29-8c18-aa3e95ceb053","controller":true,"blockOwnerDeletion":true}],"finalizers":["mysql.presslabs.org/registered-in-orchestrator"],"managedFields":[{"manager":"operator","operation":"Update","apiVersion":"mysql.presslabs.org/v1alpha1","time":"2021-04-07T14:38:02Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{}},"f:spec":{".":{},"f:image":{},"f:podSpec":{".":{},"f:annotations":{".":{},"f:monitoring.cloudlinux.com/port":{},"f:monitoring.cloudlinux.com/scrape":{}},"f:containers":{},"f:imagePullSecrets":{},"f:initContainers":{},"f:metricsExporterResources":{".":{},"f:limits":{".":{},"f:cpu":{},"f:memory":{}},"f:requests":{".":{},"f:cpu":{},"f:memory":{}}},"f:mysqlOperatorSidecarResources":{".":{},"f:requests":{".":{},"f:cpu":{},"f:memory":{}}},"f:resources":{".":{},"f:limits":{".":{},"f:cpu":{},"f:memory":{}},"f:requests":{".":{},"f:cpu":{},"f:memory":{}}}},"f:replicas":{},"f:secretName":{},"f:volumeSpec":{".":{},"f:persistentVolumeClaim":{".":{},"f:resources":{".":{},"f:requests":{".":{},"f:storage":{}}}}}}}},{"manager":"mysql-operator","operation":"Update","apiVersion":"mysql.presslabs.org/v1alpha1","time":"2021-04-07T14:38:06Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:mysql.presslabs.org/version":{}},"f:finalizers":{}},"f:spec":{"f:minAvailable":{},"f:podSpec":{"f:mysqlOperatorSidecarResources":{"f:limits":{".":{},"f:cpu":{},"f:memory":{}}}},"f:volumeSpec":{"f:persistentVolumeClaim":{"f:accessModes":{}}}},"f:status":{".":{},"f:conditions":{}}}}]},"spec":{"replicas":2,"secretName":"kl-my-cred","image":"quay.io/kuberlogic/mysql:5.7.26","podSpec":{"imagePullSecrets":[{"name":"kuberlogic-registry"}],"annotations":{"monitoring.cloudlinux.com/port":"9999","monitoring.cloudlinux.com/scrape":"true"},"resources":{"limits":{"cpu":"100m","memory":"512Mi"},"requests":{"cpu":"10m","memory":"256Mi"}},"initContainers":[{"name":"myisam-repair","image":"quay.io/kuberlogic/mysql:5.7.26","command":["/bin/sh","-c","for f in $(ls /var/lib/mysql/mysql/*MYI); do myisamchk -r --update-state $(echo $f | tr -d .MYI); done"],"resources":{},"volumeMounts":[{"name":"data","mountPath":"/var/lib/mysql"}]}],"containers":[{"name":"kuberlogic-exporter","image":"quay.io/kuberlogic/mysql-exporter-deprecated:v2","ports":[{"name":"metrics","containerPort":9999,"protocol":"TCP"}],"resources":{},"volumeMounts":[{"name":"data","mountPath":"/var/lib/mysql"}]}],"metricsExporterResources":{"limits":{"cpu":"100m","memory":"128Mi"},"requests":{"cpu":"10m","memory":"32Mi"}},"mysqlOperatorSidecarResources":{"requests":{"cpu":"10m","memory":"64Mi"}}},"volumeSpec":{"persistentVolumeClaim":{"resources":{"requests":{"storage":"1Gi"}}}}},"status":{"conditions":[{"type":"ReadOnly","status":"True","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"ClusterReadOnlyTrue","message":"read-only nodes: "},{"type":"Ready","status":"False","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"StatefulSetNotReady","message":"StatefulSet is not ready"},{"type":"PendingFailoverAck","status":"False","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"NoPendingFailoverAckExists","message":"no pending ack"}]}}}
2021-04-07T14:52:54.65837839Z	INFO	controller-runtime.manager.controller.mysql-database	Starting workers	{"worker count": 1}
2021-04-07T14:52:54.755666407Z	INFO	controller-runtime.manager.controller.mysqlbackup-controller	Starting workers      	{"worker count": 1}
2021-04-07T14:52:54.755720705Z	INFO	controller-runtime.manager.controller.controller.orchestrator	Starting workers      	{"worker count": 10}
2021-04-07T14:52:54.757668027Z	INFO	controller-runtime.manager.controller.controller.mysqlNode	Starting workers      	{"worker count": 1}
2021-04-07T14:52:54.757705105Z	INFO	controller-runtime.manager.controller.controller.mysqlcluster	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.858307211Z	INFO	controller-runtime.manager.controller.controller.mysqlcluster	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.959827321Z	INFO	controller-runtime.manager.controller.controller.mysqlcluster	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:55.060318519Z	INFO	controller-runtime.manager.controller.controller.mysqlcluster	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:55.161471864Z	INFO	controller-runtime.manager.controller.controller.mysqlcluster	Starting Controller
2021-04-07T14:52:55.161604803Z	INFO	controller-runtime.manager.controller.controller.mysqlcluster	Starting workers      	{"worker count": 1}
2021-04-07T14:52:55.161879663Z	DEBUG	controller.mysqlcluster	reconcile cluster	{"key": "default/kl-my"}
2021-04-07T14:52:55.163217074Z	DEBUG	unchanged	{"syncer": "ConfigMap", "key": {"namespace": "default", "name": "kl-my-mysql"}, "kind": "/v1, Kind=ConfigMap", "diff": []}
2021-04-07T14:52:55.163743132Z	DEBUG	unchanged	{"syncer": "OperatedSecret", "key": {"namespace": "default", "name": "kl-my-mysql-operated"}, "kind": "/v1, Kind=Secret", "diff": []}
2021-04-07T14:52:55.164085532Z	DEBUG	unchanged	{"syncer": "Secret", "key": {"namespace": "default", "name": "kl-my-cred"}, "kind": "/v1, Kind=Secret", "diff": []}
2021-04-07T14:52:55.16461333Z	DEBUG	unchanged	{"syncer": "HeadlessSVC", "key": {"namespace": "default", "name": "mysql"}, "kind": "/v1, Kind=Service", "diff": []}
2021-04-07T14:52:55.166243362Z	DEBUG	unchanged	{"syncer": "MasterSVC", "key": {"namespace": "default", "name": "kl-my-mysql-master"}, "kind": "/v1, Kind=Service", "diff": []}
2021-04-07T14:52:55.1668702Z	DEBUG	unchanged	{"syncer": "HealthySVC", "key": {"namespace": "default", "name": "kl-my-mysql"}, "kind": "/v1, Kind=Service", "diff": []}
2021-04-07T14:52:55.167596425Z	DEBUG	unchanged	{"syncer": "HealthyReplicasSVC", "key": {"namespace": "default", "name": "kl-my-mysql-replicas"}, "kind": "/v1, Kind=Service", "diff": []}
2021-04-07T14:52:55.208066905Z	DEBUG	updated	{"syncer": "StatefulSet", "key": {"namespace": "default", "name": "kl-my-mysql"}, "kind": "apps/v1, Kind=StatefulSet", "diff": []}
2021-04-07T14:52:55.20854835Z	DEBUG	unchanged	{"syncer": "PDB", "key": {"namespace": "default", "name": "kl-my-mysql"}, "kind": "policy/v1beta1, Kind=PodDisruptionBudget", "diff": []}
2021-04-07T14:52:55.2085749Z	DEBUG	controller.mysqlcluster	cluster status	{"key": "default/kl-my", "status": {"conditions":[{"type":"ReadOnly","status":"True","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"ClusterReadOnlyTrue","message":"read-only nodes: "},{"type":"Ready","status":"False","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"StatefulSetNotReady","message":"StatefulSet is not ready"},{"type":"PendingFailoverAck","status":"False","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"NoPendingFailoverAckExists","message":"no pending ack"}]}}
2021-04-07T14:52:55.208888335Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"MysqlCluster","namespace":"default","name":"kl-my","uid":"5a09c95d-a977-4a4b-94e3-a97209938043","apiVersion":"mysql.presslabs.org/v1alpha1","resourceVersion":"74547"}, "reason": "StatefulSetSyncSuccessfull", "message": "apps/v1, Kind=StatefulSet default/kl-my-mysql updated successfully"}
2021-04-07T14:52:55.310250803Z	DEBUG	controller.mysqlcluster	reconcile cluster	{"key": "default/kl-my"}
2021-04-07T14:52:55.311133499Z	DEBUG	unchanged	{"syncer": "ConfigMap", "key": {"namespace": "default", "name": "kl-my-mysql"}, "kind": "/v1, Kind=ConfigMap", "diff": []}
2021-04-07T14:52:55.311607029Z	DEBUG	unchanged	{"syncer": "OperatedSecret", "key": {"namespace": "default", "name": "kl-my-mysql-operated"}, "kind": "/v1, Kind=Secret", "diff": []}
2021-04-07T14:52:55.311900251Z	DEBUG	unchanged	{"syncer": "Secret", "key": {"namespace": "default", "name": "kl-my-cred"}, "kind": "/v1, Kind=Secret", "diff": []}
2021-04-07T14:52:55.312375685Z	DEBUG	unchanged	{"syncer": "HeadlessSVC", "key": {"namespace": "default", "name": "mysql"}, "kind": "/v1, Kind=Service", "diff": []}
2021-04-07T14:52:55.313117698Z	DEBUG	unchanged	{"syncer": "MasterSVC", "key": {"namespace": "default", "name": "kl-my-mysql-master"}, "kind": "/v1, Kind=Service", "diff": []}
2021-04-07T14:52:55.313745106Z	DEBUG	unchanged	{"syncer": "HealthySVC", "key": {"namespace": "default", "name": "kl-my-mysql"}, "kind": "/v1, Kind=Service", "diff": []}
2021-04-07T14:52:55.314438017Z	DEBUG	unchanged	{"syncer": "HealthyReplicasSVC", "key": {"namespace": "default", "name": "kl-my-mysql-replicas"}, "kind": "/v1, Kind=Service", "diff": []}
2021-04-07T14:52:55.34431649Z	DEBUG	updated	{"syncer": "StatefulSet", "key": {"namespace": "default", "name": "kl-my-mysql"}, "kind": "apps/v1, Kind=StatefulSet", "diff": []}
2021-04-07T14:52:55.344757091Z	DEBUG	unchanged	{"syncer": "PDB", "key": {"namespace": "default", "name": "kl-my-mysql"}, "kind": "policy/v1beta1, Kind=PodDisruptionBudget", "diff": []}
2021-04-07T14:52:55.344790801Z	DEBUG	controller.mysqlcluster	cluster status	{"key": "default/kl-my", "status": {"conditions":[{"type":"ReadOnly","status":"True","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"ClusterReadOnlyTrue","message":"read-only nodes: "},{"type":"Ready","status":"False","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"StatefulSetNotReady","message":"StatefulSet is not ready"},{"type":"PendingFailoverAck","status":"False","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"NoPendingFailoverAckExists","message":"no pending ack"}]}}
2021-04-07T14:52:55.345053401Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"MysqlCluster","namespace":"default","name":"kl-my","uid":"5a09c95d-a977-4a4b-94e3-a97209938043","apiVersion":"mysql.presslabs.org/v1alpha1","resourceVersion":"74547"}, "reason": "StatefulSetSyncSuccessfull", "message": "apps/v1, Kind=StatefulSet default/kl-my-mysql updated successfully"}
2021-04-07T14:52:59.553012175Z	DEBUG	controller.orchestrator	Schedule new cluster for reconciliation	{"key": "default/kl-my"}
2021-04-07T14:52:59.553225885Z	DEBUG	controller.orchestrator	reconciling cluster	{"key": "default/kl-my"}
2021-04-07T14:52:59.554547195Z	DEBUG	unchanged	{"syncer": "OrchestratorFinalizerSyncer", "key": {"namespace": "default", "name": "kl-my"}, "kind": "mysql.presslabs.org/v1alpha1, Kind=MysqlCluster", "diff": []}
2021-04-07T14:52:59.56895656Z	WARNING	orchestrator-reconciler	cluster not found in Orchestrator	{"key": "default/kl-my", "error": "not found"}
github.com/go-logr/zapr.(*zapLogger).Info
	/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:126
github.com/presslabs/mysql-operator/pkg/controller/orchestrator.(*orcUpdater).getFromOrchestrator
	/go/src/github.com/presslabs/mysql-operator/pkg/controller/orchestrator/orchestrator_reconcile.go:133
github.com/presslabs/mysql-operator/pkg/controller/orchestrator.(*orcUpdater).Sync
	/go/src/github.com/presslabs/mysql-operator/pkg/controller/orchestrator/orchestrator_reconcile.go:83
github.com/presslabs/controller-util/syncer.Sync
	/go/pkg/mod/github.com/presslabs/[email protected]/syncer/syncer.go:82
github.com/presslabs/mysql-operator/pkg/controller/orchestrator.(*ReconcileMysqlCluster).Reconcile
	/go/src/github.com/presslabs/mysql-operator/pkg/controller/orchestrator/orchestrator_controller.go:216
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:216
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:99
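
For anyone comparing their own operator output against the paste above, here is a minimal sketch for pulling the warnings for one cluster out of such a log. It assumes the fields are tab-separated as they appear in this paste (timestamp, level, optional logger name, message, optional JSON payload). The function names `parse_operator_log_line` and `warnings_for` are made up for this example and are not part of mysql-operator.

```python
import json

LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def parse_operator_log_line(line):
    """Split one tab-separated log line into time, level, text and payload.

    Returns None for lines without a timestamp and level, such as the
    stack-trace continuation lines that follow a WARNING entry.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3 or fields[1] not in LEVELS:
        return None
    payload = None
    # Most entries end with a JSON object; a truncated or wrapped one will
    # fail to decode and is then left in the text part.
    if fields[-1].lstrip().startswith("{"):
        try:
            payload = json.loads(fields[-1])
            fields = fields[:-1]
        except json.JSONDecodeError:
            payload = None
    return {
        "time": fields[0],
        "level": fields[1],
        "text": " ".join(f.strip() for f in fields[2:]),
        "payload": payload,
    }


def warnings_for(lines, cluster_key):
    """Return WARNING/ERROR entries whose payload "key" equals cluster_key."""
    hits = []
    for line in lines:
        entry = parse_operator_log_line(line)
        if entry is None or entry["level"] not in {"WARNING", "ERROR"}:
            continue
        payload = entry["payload"]
        if isinstance(payload, dict) and payload.get("key") == cluster_key:
            hits.append(entry)
    return hits


# Usage, with `log_lines` being the lines of a saved operator log:
#     for entry in warnings_for(log_lines, "default/kl-my"):
#         print(entry["time"], entry["text"], entry["payload"])
```

Run against the paste above with the key `default/kl-my`, this should surface the single "cluster not found in Orchestrator" entry and skip the stack trace that follows it.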

jicki commented Apr 9, 2021

Same problem

@nigh8w0lf

Yes, and the Orchestrator logs show: Unable to determine cluster name

This is for a brand new cluster.

@nigh8w0lf

I deployed the cluster in a different namespace and also tried the same namespace as the operator, with the same result.
I also tried changing the name of the cluster, but it has the same issue as above.

@sagikazarmark

I have a similar issue: the cluster starts, but after some time MySQL becomes non-ready and I get the above log message in the operator logs.

browol commented Jul 23, 2021

I have this problem too

iefc commented Sep 29, 2021

Same problem

calind (Member) commented Oct 11, 2021

Please make sure you are not hitting #170 (see https://www.bitpoke.io/docs/mysql-operator/deploy-mysql-cluster/#note-1).

Also please try with v0.5.0.

tebaly commented Oct 5, 2022

Hello. In my case:

  • I used Kubespray
  • The hosting provider shut down several servers
    • Among them were one etcd node and one worker node
  • I tried adding new ones
  • Something went wrong
  • I DIDN'T NOTICE IT
  • The cluster crumbled
  • In a panic, I barely managed to fix it
  • I DID SOMETHING WRONG

EVERYTHING WORKED, BUT there were errors in the MySQL clusters only.

Naturally, I assumed the problem was mysql-operator, but no changes helped at all. Everything else worked, while the MySQL clusters gradually stopped working. Horror...

Then I ran the Kubespray upgrade-cluster.yml playbook.

An error occurred: a pod belonging to a MySQL cluster could not be deleted. I had seen the same error at the very beginning, when I first tried to fix the K8S cluster, and ignored it then. It happened at the "Drain node" stage:

fatal: [node1]: FAILED! => {"attempts": 3, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig", "/etc/kubernetes/admin.conf", "drain", "--force", "--ignore-daemonsets", "--grace-period", "300", "--timeout", "360s", "--delete-emptydir-data", "node1"], "delta": "0:06:01.760844", "end": "2022-10-05 02:44:14.018346", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2022-10-05 02:38:12.257502", "stderr": "WARNING: ignoring DaemonSet-managed Pods: default/netchecker-agent-hostnet-xvkjz, default/netchecker-agent-w282k, *** \nerror when evicting pods/"***-mysql-0" -n "***" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.\nerror when evicting pods/"***-mysql-0" -n "***"

Kubespray was unable to upgrade the cluster completely. In my case, that was the cause.
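
The drain error above comes from the pod disruption budget check: an eviction is refused when it would leave fewer healthy pods than the budget requires. Below is a simplified sketch of that arithmetic for an integer `minAvailable`. The class and function names are made up for this example, and the numbers in the usage comments are hypothetical, since the thread does not state the budget values of the affected cluster.

```python
from dataclasses import dataclass


@dataclass
class BudgetState:
    min_available: int    # healthy pods the budget requires (integer minAvailable)
    current_healthy: int  # pods covered by the budget that are currently Ready


def disruptions_allowed(state):
    """Number of pods that may be evicted right now, never below zero."""
    return max(0, state.current_healthy - state.min_available)


def can_evict(state):
    """An eviction is accepted only while at least one disruption is allowed."""
    return disruptions_allowed(state) > 0


# Hypothetical two-replica cluster with minAvailable = 1:
#     can_evict(BudgetState(min_available=1, current_healthy=2))  -> True
#     can_evict(BudgetState(min_available=1, current_healthy=1))  -> False
```

Under these assumptions, a MySQL cluster whose pods never become Ready has no disruptions left to give, so every drain attempt keeps retrying until it times out.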

Solution (in my case)

  1. Run the Kubespray upgrade-cluster.yml playbook
  2. Follow the process until it reaches the "Drain node" stage on each node
  3. The process will hang at this stage and wait for a long time
  4. Delete all MySQL pods from that node
  5. The process will move forward
  6. The K8S cluster will be upgraded without errors and everything will work
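
Step 4 of the list above can be sketched as follows, assuming the MySQL pods follow the `<cluster>-mysql-<ordinal>` naming seen in the drain error (`***-mysql-0`). The function names are made up for this example. The sketch only builds the `kubectl delete pod` commands from a pod list and does not run anything. Note that deleting a pod directly, unlike evicting it, is not checked against the disruption budget, so review the commands before running them.

```python
import re

# Assumed naming pattern, taken from the pod name in the drain error above.
MYSQL_POD_NAME = re.compile(r"-mysql-\d+$")


def mysql_pods_on_node(pod_list, node_name):
    """Pick (namespace, name) of MySQL-looking pods scheduled on node_name.

    pod_list is the parsed JSON from, for example:
        kubectl get pods --all-namespaces -o json
    """
    pods = []
    for item in pod_list.get("items", []):
        meta = item.get("metadata", {})
        if item.get("spec", {}).get("nodeName") != node_name:
            continue
        if MYSQL_POD_NAME.search(meta.get("name", "")):
            pods.append((meta.get("namespace", "default"), meta["name"]))
    return pods


def delete_commands(pods):
    """Build one kubectl delete command (as an argument list) per pod."""
    return [
        ["kubectl", "delete", "pod", name, "--namespace", namespace]
        for namespace, name in pods
    ]
```

Keeping the selection separate from the command building makes it easy to print and check the list of pods for the node being drained before deleting anything.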

oau-dev commented Feb 1, 2024

Hello,

Same problem here. 77 clusters deployed without problems, but one of them will not deploy its second node because of "cluster not found in Orchestrator". There are no other errors at all.

  • Latest operator version
  • k8s v1.24.2
  • The cluster name is just db, and the namespace name is shorter than all the others.

oau-dev commented Feb 7, 2024

Hello, I found that some data is still in the SQLite DB days after the cluster was deleted, in the tables database_instance_last_analysis, database_instance_tls, kv_store, and hostname_ips.

  • Could this prevent the cluster from being reinstalled and trigger the message "cluster not found in Orchestrator"?
  • Is it safe to clean those up?
  • Is there a cleanup process that can be used?

Thanks for your help, I'm really stuck here :(
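
For anyone who wants to check for the same leftovers, here is a read-only sketch that counts rows in the four tables named above. It does not answer whether deleting them is safe. The database path in the usage comment is a placeholder, the function names are made up for this example, and since the column layout of these tables is not given in this thread, the sketch only counts rows.

```python
import sqlite3

LEFTOVER_TABLES = (
    "database_instance_last_analysis",
    "database_instance_tls",
    "kv_store",
    "hostname_ips",
)


def open_read_only(path):
    """Open the SQLite file in read-only mode so nothing can be changed."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def count_rows(conn, tables=LEFTOVER_TABLES):
    """Return {table: row count}, or None for a table that does not exist."""
    counts = {}
    for table in tables:
        exists = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        if exists is None:
            counts[table] = None
            continue
        # Table names come from the fixed tuple above, not from user input.
        counts[table] = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    return counts


# Usage (the path is a placeholder for wherever the Orchestrator DB lives):
#     conn = open_read_only("/path/to/orchestrator.db")
#     print(count_rows(conn))
```

Comparing the counts before and after deleting a test cluster would show whether rows for that cluster are actually left behind.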

9 participants