post-master-upgrade fails on 'Waiting for redis instance to be removed' #6065

Closed
adamwalach opened this issue Oct 18, 2019 · 1 comment
Labels
area/service-management Issues or PRs related to service management priority/critical Priority indication

adamwalach commented Oct 18, 2019

Description
post-master-kyma-gke-upgrade fails intermittently at the "Test Kyma end-to-end upgrade scenarios" stage.

Example: https://storage.googleapis.com/kyma-prow-logs/logs/post-master-kyma-gke-upgrade/1185157315163066369/build-log.txt

Actual result

{"level":"info","log":{"message":"Waiting for redis instance to be removed","taskID":"8cd2c541-6fb4-4a03-8546-3db703c2ac05","time":"2019-10-18T12:24:44.529Z"}}
{"level":"error","log":{"message":"timed out waiting for the condition","taskID":"8cd2c541-6fb4-4a03-8546-3db703c2ac05","time":"2019-10-18T12:26:44.538Z"}}

Cluster State:
fail-log.log

  • ServiceInstance status: All associated ServiceBindings must be removed before this ServiceInstance can be deleted
  • ServiceBinding status: Unbind request for ServiceBinding in-flight to Broker

Migration log:
migration-log.log

Steps to reproduce

Fails randomly

@adamwalach adamwalach added area/service-management Issues or PRs related to service management priority/critical Priority indication labels Oct 18, 2019
@jasiu001 jasiu001 added this to the Sprint_Gopher_28 milestone Oct 21, 2019
@jasiu001 jasiu001 self-assigned this Oct 21, 2019
jasiu001 commented Oct 23, 2019

As mentioned in the issue description, the error occurs because the ServiceBinding cannot be deleted.
During the unbind (remove) process, the ServiceBinding is set to the "in-flight" status. The process never finishes, so the ServiceBinding stays in that status.
While unbinding the ServiceBinding, the ServiceCatalog controller looks up the parent ServiceInstance and then checks its ServiceClass reference.
Logs from the ServiceCatalog controller:

I 2019-10-18T12:24:44.496966Z ServiceBinding "helmbrokerupgradetest/redis-credentials" v19161: Processing Delete
(...)
I 2019-10-18T12:24:44.655132Z Error syncing ServiceBinding helmbrokerupgradetest/redis-credentials (retry: 0/15): ClusterServiceClass reference for Instance has not been resolved yet
(...)
I 2019-10-18T12:26:07.070575Z Error syncing ServiceBinding helmbrokerupgradetest/redis-credentials (retry: 14/15): ClusterServiceClass reference for Instance has not been resolved yet

The code responsible for this check is located in controller binding.
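The behaviour seen in the controller logs can be illustrated with a simplified sketch. This is not the Service Catalog source: the types, the `checkUnbindPreconditions` function, and the `syncWithRetries` loop are hypothetical stand-ins, and the real controller requeues with backoff rather than looping in place. Only the error text and the retry limit of 15 are taken from the logs.

```go
package main

import (
	"errors"
	"fmt"
)

// ObjectRef and ServiceInstance are hypothetical, simplified stand-ins
// for the Service Catalog API types.
type ObjectRef struct{ Name string }

type ServiceInstance struct {
	Name                   string
	ClusterServiceClassRef *ObjectRef // nil until the class reference is resolved
}

// Error text copied from the controller log in this issue.
var errClassRefUnresolved = errors.New("ClusterServiceClass reference for Instance has not been resolved yet")

// checkUnbindPreconditions refuses to unbind while the parent instance
// has no resolved class reference.
func checkUnbindPreconditions(inst *ServiceInstance) error {
	if inst.ClusterServiceClassRef == nil {
		return errClassRefUnresolved
	}
	return nil
}

// syncWithRetries repeats the check up to maxRetries times and returns the
// number of attempts made together with the last error, if any.
func syncWithRetries(inst *ServiceInstance, maxRetries int) (int, error) {
	var err error
	for attempt := 1; attempt <= maxRetries; attempt++ {
		if err = checkUnbindPreconditions(inst); err == nil {
			return attempt, nil
		}
	}
	return maxRetries, err
}

func main() {
	// An instance restored without its class reference, as in this issue.
	broken := &ServiceInstance{Name: "redis"}
	attempts, err := syncWithRetries(broken, 15)
	fmt.Printf("attempts=%d err=%v\n", attempts, err)
}
```

Because nothing in the loop can populate the missing reference, every retry fails the same way, which matches the log running from retry 0/15 to 14/15 with an identical message.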

The ServiceInstance spec shows that the class reference is missing:

"spec": {
    "clusterServiceClassExternalName": "redis",
    "clusterServicePlanExternalName": "micro",
    "externalID": "f944b7ee-f19e-11e9-b995-be76fb442d49",
    "userInfo": {
        (...)
    },
    "updateRequests": 0
},
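A quick way to spot an instance in this state is to check the spec for the resolved reference fields. The sketch below assumes those fields are named `clusterServiceClassRef` and `clusterServicePlanRef`, following the Service Catalog v1beta1 API; the `missingRefs` helper is a hypothetical name, and the `userInfo` content is left empty because it is elided in the issue.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// missingRefs reports which of the assumed reference fields are absent
// from a ServiceInstance spec given as JSON.
func missingRefs(specJSON []byte) ([]string, error) {
	var spec map[string]json.RawMessage
	if err := json.Unmarshal(specJSON, &spec); err != nil {
		return nil, err
	}
	var missing []string
	// Field names assumed from the Service Catalog v1beta1 API.
	for _, key := range []string{"clusterServiceClassRef", "clusterServicePlanRef"} {
		if _, ok := spec[key]; !ok {
			missing = append(missing, key)
		}
	}
	return missing, nil
}

func main() {
	// Spec from the issue; userInfo is elided there, so it is empty here.
	spec := []byte(`{
		"clusterServiceClassExternalName": "redis",
		"clusterServicePlanExternalName": "micro",
		"externalID": "f944b7ee-f19e-11e9-b995-be76fb442d49",
		"userInfo": {},
		"updateRequests": 0
	}`)

	missing, err := missingRefs(spec)
	if err != nil {
		panic(err)
	}
	fmt.Println("missing reference fields:", missing)
}
```

The spec carries only the external names, so both reference fields are reported as missing.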

The missing reference comes from the migration process, more specifically from the ServiceInstance restore step.
Logs from the migration jobs show that restoring the ServiceInstance failed on the first attempt:

I 2019-10-18T12:11:54.589743Z Processing Service Instance: api-service-id
I 2019-10-18T12:11:55.181101Z Processing Service Instance: events-service-id
I 2019-10-18T12:11:55.782833Z Processing Service Instance: redis
I 2019-10-18T12:11:56.218256Z Retry 2
I 2019-10-18T12:11:56.218300Z Processing Service Instance: redis
I 2019-10-18T12:11:56.388871Z Resource already exists, deleting and recreating
I 2019-10-18T12:11:57.380901Z Applying 2 service bindings
I 2019-10-18T12:11:57.380944Z Removing owner referneces from secrets
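Instances that went through the restore step more than once can be picked out of such a log mechanically. The following is a rough sketch for that kind of triage; the `countProcessed` helper is hypothetical, and it relies only on the "Processing Service Instance: <name>" line format visible above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// marker is the log phrase that precedes the instance name.
const marker = "Processing Service Instance: "

// countProcessed returns how many times each ServiceInstance name appears
// in "Processing Service Instance" lines of a migration log.
func countProcessed(log string) map[string]int {
	counts := make(map[string]int)
	scanner := bufio.NewScanner(strings.NewReader(log))
	for scanner.Scan() {
		line := scanner.Text()
		if i := strings.Index(line, marker); i >= 0 {
			name := strings.TrimSpace(line[i+len(marker):])
			counts[name]++
		}
	}
	return counts
}

func main() {
	log := `I 2019-10-18T12:11:54.589743Z Processing Service Instance: api-service-id
I 2019-10-18T12:11:55.181101Z Processing Service Instance: events-service-id
I 2019-10-18T12:11:55.782833Z Processing Service Instance: redis
I 2019-10-18T12:11:56.218256Z Retry 2
I 2019-10-18T12:11:56.218300Z Processing Service Instance: redis
I 2019-10-18T12:11:56.388871Z Resource already exists, deleting and recreating`

	for name, n := range countProcessed(log) {
		if n > 1 {
			fmt.Printf("%s was processed %d times\n", name, n)
		}
	}
}
```

In this log only the redis instance is processed twice, which is the same instance that later cannot be unbound.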

The bug is described and fixed in this PR
