
Operator in CrashLoop with panic #479

Closed
golonzovsky opened this issue Sep 30, 2022 · 2 comments · Fixed by #481

@golonzovsky

golonzovsky commented Sep 30, 2022

We are trying to switch to the Solr Operator, and after an image update the operator got into a crash loop:

2022-09-30T11:08:17.926Z        INFO    controller-runtime.manager.controller.solrcloud Starting EventSource    {"reconciler group": "solr.apache.org", "reconciler kind": "SolrCloud", "source": "kind source: /, Kind="}
2022-09-30T11:08:17.927Z        INFO    controller-runtime.manager.controller.solrcloud Starting Controller     {"reconciler group": "solr.apache.org", "reconciler kind": "SolrCloud"}
2022-09-30T11:08:17.927Z        INFO    controller-runtime.manager.controller.solrcloud Starting workers        {"reconciler group": "solr.apache.org", "reconciler kind": "SolrCloud", "worker count": 1}
2022-09-30T11:08:18.335Z        INFO    controller-runtime.manager.controller.solrprometheusexporter    Starting workers        {"reconciler group": "solr.apache.org", "reconciler kind": "SolrPrometheusExporter", "worker count": 1}
2022-09-30T11:08:18.430Z        INFO    controller-runtime.manager.controller.solrcloud Update required because field changed   {"reconciler group": "solr.apache.org", "reconciler kind": "SolrCloud", "name": "search-solr-test", "namespace": "search", "statefulSet": "search-solr-test-solrcloud", "kind": "statefulSet", "field": "Spec.Template.Spec.Volumes[1].VolumeSource", "from": {"secret":{"secretName":"gcp-search-solr-credentials-secret","defaultMode":420}}, "to": {"secret":{"secretName":"gcp-search-solr-credentials-secret"}}}
2022-09-30T11:08:18.432Z        INFO    controller-runtime.manager.controller.solrcloud Updating StatefulSet    {"reconciler group": "solr.apache.org", "reconciler kind": "SolrCloud", "name": "search-solr-test", "namespace": "search", "statefulSet": "search-solr-test-solrcloud"}
2022-09-30T11:08:18.549Z        INFO    controller-runtime.manager.controller.solrcloud.ManagedUpdateSelector   Pod update selection started.   {"reconciler group": "solr.apache.org", "reconciler kind": "SolrCloud", "name": "search-solr-test", "namespace": "search", "outOfDatePods": 2, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 0, "outOfDatePodsNotStarted": 0, "maxPodsToUpdate": 1}
2022-09-30T11:08:18.549Z        INFO    controller-runtime.manager.controller.solrcloud.ManagedUpdateSelector   Pod killed for update.  {"reconciler group": "solr.apache.org", "reconciler kind": "SolrCloud", "name": "search-solr-test", "namespace": "search", "pod": "search-solr-test-solrcloud-0", "reason": "Pod's replicas are safe to take down, adhering to the minimum active replicas per shard."}
2022-09-30T11:08:18.549Z        INFO    controller-runtime.manager.controller.solrcloud.ManagedUpdateSelector   Pod update selection complete. Maximum number of pods able to be updated reached.       {"reconciler group": "solr.apache.org", "reconciler kind": "SolrCloud", "name": "search-solr-test", "namespace": "search", "maxPodsToUpdate": 1}
E0930 11:08:18.550190       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 561 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x13dd140, 0x2249580})
        /go/pkg/mod/k8s.io/apimachinery@…/pkg/util/runtime/runtime.go:74 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000760b00})
        /go/pkg/mod/k8s.io/apimachinery@…/pkg/util/runtime/runtime.go:48 +0x75
panic({0x13dd140, 0x2249580})
        /usr/local/go/src/runtime/panic.go:1038 +0x215
github.com/apache/solr-operator/controllers/util.EvictReplicasForPodIfNecessary({0x176d458, 0xc0058552c0}, 0xc005393b60, 0x1c, {0x1787d28, 0xc00547e6e0})
        /workspace/controllers/util/solr_update_util.go:493 +0x67
github.com/apache/solr-operator/controllers.(*SolrCloudReconciler).Reconcile(0xc0002b3e60, {0x176d458, 0xc0058552c0}, {{{0xc000885770, 0x144a920}, {0xc000885760, 0xc0008e4380}}})
        /workspace/controllers/solrcloud_controller.go:428 +0x3167
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000121180, {0x176d3b0, 0xc00029c000}, {0x1427120, 0xc000760b00})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:298 +0x303
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000121180, {0x176d3b0, 0xc00029c000})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:253 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2({0x176d3b0, 0xc00029c000})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:216 +0x46
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
        /go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:213 +0x356
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x12b3627]

goroutine 561 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000760b00})
        /go/pkg/mod/k8s.io/apimachinery@…/pkg/util/runtime/runtime.go:55 +0xd8
panic({0x13dd140, 0x2249580})
        /usr/local/go/src/runtime/panic.go:1038 +0x215
github.com/apache/solr-operator/controllers/util.EvictReplicasForPodIfNecessary({0x176d458, 0xc0058552c0}, 0xc005393b60, 0x1c, {0x1787d28, 0xc00547e6e0})
        /workspace/controllers/util/solr_update_util.go:493 +0x67
github.com/apache/solr-operator/controllers.(*SolrCloudReconciler).Reconcile(0xc0002b3e60, {0x176d458, 0xc0058552c0}, {{{0xc000885770, 0x144a920}, {0xc000885760, 0xc0008e4380}}})
        /workspace/controllers/solrcloud_controller.go:428 +0x3167
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000121180, {0x176d3b0, 0xc00029c000}, {0x1427120, 0xc000760b00})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:298 +0x303
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000121180, {0x176d3b0, 0xc00029c000})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:253 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2({0x176d3b0, 0xc00029c000})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:216 +0x46
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
        /go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:185 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f9eb86e9250)
        /go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0010c22a0, {0x1746200, 0xc000bfa4b0}, 0x1, 0xc0003103c0)
        /go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0010ba6d0, 0x3b9aca00, 0x0, 0x0, 0xc0010c22d0)
        /go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x176d3b0, 0xc00029c000}, 0xc0006677e0, 0x0, 0x0, 0x0)
        /go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:185 +0x99
k8s.io/apimachinery/pkg/util/wait.UntilWithContext({0x176d3b0, 0xc00029c000}, 0x0, 0x0)
        /go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:99 +0x2b
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
        /go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:213 +0x356
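The trace puts the panic in EvictReplicasForPodIfNecessary (solr_update_util.go:493), right after search-solr-test-solrcloud-0 was selected for update. The later comments report that the crash goes away when the data PVC template has no custom name or is renamed to data. One pattern consistent with that is a lookup of a volume by the hard-coded name data whose nil result is then dereferenced. The Go sketch below reproduces that pattern with hypothetical names (volume, findVolume, claimNameUnsafe, claimNameSafe); it is not the operator's code, and the actual fix in #481 may look different.

```go
package main

import "fmt"

// volume is a hypothetical, simplified pod volume (not the operator's type).
type volume struct {
	Name      string
	ClaimName string // PVC backing this volume
}

// findVolume returns a pointer to the volume with the given name, or nil if absent.
func findVolume(vols []volume, name string) *volume {
	for i := range vols {
		if vols[i].Name == name {
			return &vols[i]
		}
	}
	return nil
}

// claimNameUnsafe mirrors the suspected bug: it assumes a volume named "data"
// exists. With a custom pvcTemplate name the lookup returns nil and the field
// access panics with "invalid memory address or nil pointer dereference".
func claimNameUnsafe(vols []volume) string {
	return findVolume(vols, "data").ClaimName
}

// claimNameSafe checks the lookup result and returns an error instead,
// so a reconciler could log it and requeue rather than crash.
func claimNameSafe(vols []volume, name string) (string, error) {
	v := findVolume(vols, name)
	if v == nil {
		return "", fmt.Errorf("no volume named %q on pod", name)
	}
	return v.ClaimName, nil
}

// panics reports whether f panicked, recovering so the program keeps running.
func panics(f func()) (didPanic bool) {
	defer func() { didPanic = recover() != nil }()
	f()
	return false
}

func main() {
	// StatefulSet PVCs are named <claimTemplateName>-<statefulSetName>-<ordinal>.
	vols := []volume{{Name: "search-solr-test", ClaimName: "search-solr-test-search-solr-test-solrcloud-0"}}

	fmt.Println("unsafe lookup panics:", panics(func() { claimNameUnsafe(vols) }))
	if _, err := claimNameSafe(vols, "data"); err != nil {
		fmt.Println("safe lookup:", err)
	}
}
```

Returning an error keeps one misconfigured SolrCloud from taking the whole controller process down, which is what turned this bug into a CrashLoop.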

SolrCloud resource definition:

apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  name: search-solr-test
  namespace: search
spec:
  busyBoxImage:
    repository: library/busybox
    tag: 1.28.0-glibc
  customSolrKubeOptions:
    podOptions:
      initContainers:
        - name: upload-zk-config
          image: xxx-docker.jfrog.io/xxx/search-solr:test-operator-3
          command: ["/var/solr-ricardo/scripts/load_configs_to_zookeeper.sh"]
          env:
            - name: ZK_HOST
              value: search-solr-test-solrcloud-zookeeper-0.search-solr-test-solrcloud-zookeeper-headless.search.svc.cluster.local:2181,search-solr-test-solrcloud-zookeeper-1.search-solr-test-solrcloud-zookeeper-headless.search.svc.cluster.local:2181,search-solr-test-solrcloud-zookeeper-2.search-solr-test-solrcloud-zookeeper-headless.search.svc.cluster.local:2181/
      imagePullSecrets:
        - name: xxx-docker-jfrog
      envVars:
        - name: ARTICLES_EXPIRATION_FIELD
          value: expiration_date
        - name: AUTO_DELETE_PERIOD_SECONDS
          value: "3600"
        - name: GCS_PROJECT_ID
          value: xxxxx
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: "/var/gcp-credentials/credentials.json"
      volumes:
        - name: gcp-credentials
          defaultContainerMount:
            name: gcp-credentials
            mountPath: /var/gcp-credentials
            readOnly: true
          source:
            secret:
              secretName: gcp-search-solr-credentials-secret
  dataStorage:
    persistent:
      pvcTemplate:
        metadata:
          annotations:
            volume.beta.kubernetes.io/storage-class: regional-ssd
          name: search-solr-test
        spec:
          resources:
            requests:
              storage: 20Gi
  replicas: 3
  solrAddressability:
    commonServicePort: 80
    podPort: 8983
  solrImage:
    repository: xxx-docker.jfrog.io/xxx/search-solr
    tag: test-operator-3
  solrJavaMem: -Xms2048m -Xmx4096m
  solrLogLevel: INFO
  updateStrategy:
    managed: { }
    method: Managed
  zookeeperRef:
    provided:
      chroot: /
      config: { }
      image:
        pullPolicy: IfNotPresent
        repository: pravega/zookeeper
      replicas: 3
      zookeeperPodPolicy:
        resources: { }
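The first "Update required" log line compares the live StatefulSet's secret volume, which carries defaultMode 420, with the volume generated from this spec, which leaves defaultMode unset. 420 is decimal for octal 0644, the mode Kubernetes applies to secret volumes when none is given, so both describe the same permissions. The Go sketch below shows one way to compare them after applying that default; secretVolume, modeOrDefault and sameSecretVolume are hypothetical names, not the operator's code. In this issue the StatefulSet needed an update anyway because of the image change.

```go
package main

import "fmt"

// defaultSecretMode is the Kubernetes default for a secret volume's
// defaultMode when unset: octal 0644, i.e. decimal 420.
const defaultSecretMode int32 = 0o644

// secretVolume is a hypothetical, trimmed-down stand-in for
// SecretVolumeSource, keeping only the fields shown in the log.
type secretVolume struct {
	SecretName  string
	DefaultMode *int32 // nil means "not set in the spec"
}

// modeOrDefault returns the effective mode, applying the API server default.
func modeOrDefault(v secretVolume) int32 {
	if v.DefaultMode == nil {
		return defaultSecretMode
	}
	return *v.DefaultMode
}

// sameSecretVolume compares two volumes after defaulting, so an unset
// defaultMode counts as equal to an explicit 420.
func sameSecretVolume(a, b secretVolume) bool {
	return a.SecretName == b.SecretName && modeOrDefault(a) == modeOrDefault(b)
}

func main() {
	mode := int32(420)
	live := secretVolume{SecretName: "gcp-search-solr-credentials-secret", DefaultMode: &mode}
	desired := secretVolume{SecretName: "gcp-search-solr-credentials-secret"}

	fmt.Printf("420 decimal = %o octal\n", 420)                            // 420 decimal = 644 octal
	fmt.Println("equal after defaulting:", sameSecretVolume(live, desired)) // true
}
```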

Status of the SolrCloud resource:

status:
  backupRestoreReady: false
  internalCommonAddress: http://search-solr-test-solrcloud-common.search
  podSelector: solr-cloud=search-solr-test,technology=solr-cloud
  readyReplicas: 2
  replicas: 3
  solrNodes:
  - internalAddress: http://search-solr-test-solrcloud-0.search-solr-test-solrcloud-headless.search:8983
    name: search-solr-test-solrcloud-0
    nodeName: gke-dev-cookie-e2-spoon-np-1fd433b7-l0z6
    ready: true
    specUpToDate: false
    version: test-operator
  - internalAddress: http://search-solr-test-solrcloud-1.search-solr-test-solrcloud-headless.search:8983
    name: search-solr-test-solrcloud-1
    nodeName: gke-dev-cookie-e2-fork-np-f4c2fe51-jkvi
    ready: true
    specUpToDate: false
    version: test-operator
  - internalAddress: http://search-solr-test-solrcloud-2.search-solr-test-solrcloud-headless.search:8983
    name: search-solr-test-solrcloud-2
    nodeName: gke-dev-cookie-e2-fork-np-3162ff26-q7vc
    ready: false
    specUpToDate: true
    version: test-operator-3
  targetVersion: test-operator-3
  upToDateNodes: 1
  version: test-operator
  zookeeperConnectionInfo:
    chroot: /
    externalConnectionString: N/A
    internalConnectionString: search-solr-test-solrcloud-zookeeper-0.search-solr-test-solrcloud-zookeeper-headless.search.svc.cluster.local:2181,search-solr-test-solrcloud-zookeeper-1.search-solr-test-solrcloud-zookeeper-headless.search.svc.cluster.local:2181,search-solr-test-solrcloud-zookeeper-2.search-solr-test-solrcloud-zookeeper-headless.search.svc.cluster.local:2181
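The counts in this status line up with the log: two nodes report specUpToDate: false, matching "outOfDatePods": 2 in the ManagedUpdateSelector line, one is up to date (upToDateNodes: 1), and two report ready: true (readyReplicas: 2). A small Go sketch deriving these numbers from the solrNodes list; solrNodeStatus and summarize are hypothetical names used only for illustration.

```go
package main

import "fmt"

// solrNodeStatus holds the subset of a status.solrNodes entry used here.
type solrNodeStatus struct {
	Name         string
	Ready        bool
	SpecUpToDate bool
}

// summarize counts ready, up-to-date and out-of-date nodes.
func summarize(nodes []solrNodeStatus) (ready, upToDate, outOfDate int) {
	for _, n := range nodes {
		if n.Ready {
			ready++
		}
		if n.SpecUpToDate {
			upToDate++
		} else {
			outOfDate++
		}
	}
	return
}

func main() {
	// Values copied from the status above.
	nodes := []solrNodeStatus{
		{"search-solr-test-solrcloud-0", true, false},
		{"search-solr-test-solrcloud-1", true, false},
		{"search-solr-test-solrcloud-2", false, true},
	}
	ready, upToDate, outOfDate := summarize(nodes)
	fmt.Println(ready, upToDate, outOfDate) // 2 1 2
}
```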

Let me know if I can provide any more information to help investigate the issue.

@HoustonPutman
Contributor

Thanks for finding this bug! You can work around it locally by not setting a custom name for your data PVC template (dataStorage.persistent.pvcTemplate.metadata.name), but we will get a fix in as soon as possible. I'm not sure when the next release will be, but the fix will definitely be included.

@golonzovsky
Author

golonzovsky commented Oct 19, 2022

Hey, yes, I also found that changing the PVC name to data stopped the crashing for us. We'll try removing the name as well. Thanks for the feedback and the fix! 👍🏻
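Both workarounds in this thread (no custom pvcTemplate name, or the name data) fit the idea that, without a custom name, the data volume ends up called data, which is what the crashing code appears to expect. The Go sketch below encodes that assumption; defaultDataVolumeName, dataVolumeName and lookupSucceeds are hypothetical names, not the operator's API.

```go
package main

import "fmt"

// defaultDataVolumeName is an assumption inferred from this thread: the data
// volume appears to be called "data" when the pvcTemplate has no custom name.
const defaultDataVolumeName = "data"

// dataVolumeName resolves the data volume name from the pvcTemplate name.
func dataVolumeName(pvcTemplateName string) string {
	if pvcTemplateName == "" {
		return defaultDataVolumeName
	}
	return pvcTemplateName
}

// lookupSucceeds reports whether code that hard-codes "data" would find the
// volume created from the given pvcTemplate name.
func lookupSucceeds(pvcTemplateName string) bool {
	return dataVolumeName(pvcTemplateName) == defaultDataVolumeName
}

func main() {
	// The custom name from the issue, then the two workarounds.
	for _, name := range []string{"search-solr-test", "data", ""} {
		fmt.Printf("pvcTemplate name %q -> volume %q, lookup ok: %v\n",
			name, dataVolumeName(name), lookupSucceeds(name))
	}
}
```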

@HoustonPutman HoustonPutman added the bug Something isn't working label Oct 21, 2022
@HoustonPutman HoustonPutman modified the milestones: main (v0.7.0), v0.6.1 Nov 1, 2022
@HoustonPutman HoustonPutman self-assigned this Nov 1, 2022
@HoustonPutman HoustonPutman modified the milestones: v0.6.1, main (v0.7.0) Apr 12, 2023