Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Over 10 minutes of delay for operator upgrade to initiate after CatalogSource image is updated. #947

Closed
sacharya opened this issue Jul 9, 2019 · 4 comments

Comments

@sacharya
Copy link

sacharya commented Jul 9, 2019

Start with a CatalogSource and Subscription that is working. Etcd ClusterServiceVersion will be created successfully.

$ kubectl get csvs --all-namespaces
NAMESPACE   NAME                    DISPLAY          VERSION   REPLACES              PHASE
kniops      etcdoperator.v0.9.2     etcd             0.9.2     etcdoperator.v0.9.0   Succeeded
olm         packageserver.v0.10.1   Package Server   0.10.1                          Succeeded
$ kubectl get installplans --all-namespaces
NAMESPACE   NAME            CSV                     SOURCE   APPROVAL    APPROVED
kniops      install-6smkz   etcdoperator.v0.9.2              Automatic   true
olm         install-d4qr5   packageserver.v0.10.1            Automatic   true

Change the image field on the CatalogSource to point to a new image, which has an additional CSV.

It takes over 10 minutes before OLM acts on the change, new upgrade is initiated and etcdoperator is actually upgraded. The installplans do not get updated during that time. Eventually:

$ kubectl get installplans --all-namespaces
NAMESPACE   NAME            CSV                     SOURCE   APPROVAL    APPROVED
kniops      install-6smkz   etcdoperator.v0.9.2              Automatic   true
kniops      install-p65sh   etcdoperator.v0.9.4              Automatic   true
olm         install-d4qr5   packageserver.v0.10.1            Automatic   true
$ kubectl get subscription kni -n kniops -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  creationTimestamp: "2019-07-09T17:02:29Z"
  generation: 1
  name: kni
  namespace: kniops
  ownerReferences:
  - apiVersion: kni.openshift.com/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: KNICluster
    name: example-knicluster
    uid: 55b7c141-a26b-11e9-8236-748511679c6e
  resourceVersion: "4677"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/kniops/subscriptions/kni
  uid: 55bdaebd-a26b-11e9-8236-748511679c6e
spec:
  channel: singlenamespace-alpha
  name: etcd
  source: demo-catalog
  sourceNamespace: olm
status:
  currentCSV: etcdoperator.v0.9.4
  installPlanRef:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-p65sh
    namespace: kniops
    resourceVersion: "4657"
    uid: 81a40300-a26d-11e9-8236-748511679c6e
  installedCSV: etcdoperator.v0.9.4
  installplan:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-p65sh
    uuid: 81a40300-a26d-11e9-8236-748511679c6e
  lastUpdated: "2019-07-09T17:18:04Z"
  state: AtLatestKnown

From the catalog-operator logs:

time="2019-07-09T17:04:38Z" level=info msg="building connection to registry" currentSource="{demo-catalog olm}" id=xhyGG source=demo-catalog
time="2019-07-09T17:04:38Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{demo-catalog olm}" id=xhyGG source=demo-catalog
time="2019-07-09T17:18:01Z" level=info msg=syncing id=B7fOH ip=install-d4qr5 namespace=olm phase=Complete
time="2019-07-09T17:18:01Z" level=info msg=syncing id=5o8En ip=install-6smkz namespace=kniops phase=Complete
time="2019-07-09T17:18:01Z" level=warning msg="no installplan found with matching manifests, creating new one" id=Iy1wN namespace=kniops
time="2019-07-09T17:18:02Z" level=info msg=syncing id=3Sdu4 ip=install-p65sh namespace=kniops phase=
time="2019-07-09T17:18:02Z" level=info msg="skip processing installplan without status - subscription sync responsible for initial status" id=3Sdu4 ip=install-p65sh namespace=kniops phase=
time="2019-07-09T17:18:02Z" level=info msg=syncing id=PIzX6 ip=install-p65sh namespace=kniops phase=Installing
time="2019-07-09T17:18:04Z" level=info msg=syncing id=+N0UQ ip=install-p65sh namespace=kniops phase=Complete
@ecordell
Copy link
Member

ecordell commented Jul 9, 2019

Thanks for the report @sacharya!

When a catalog is updated, we spend some time trying to connect and waiting for the connection to become healthy. If that doesn't happen fast enough, we back off and wait a while longer - this is the 10m delay you're seeing.

We already have a fix for this issue staged and ready to merge, which you can track here: #906 . This triggers OLM to check anytime the state of the grpc connection changes, so updates to a catalog should roll out as soon as the connection is healthy, instead of sometimes taking 10m.

@sacharya
Copy link
Author

sacharya commented Jul 9, 2019

@ecordell Does that solve #901 too?

@sacharya
Copy link
Author

sacharya commented Jul 15, 2019

@ecordell I pulled down #906 on top of master and I still see the delay during install and upgrade.

Before upgrade

$ kubectl get catalogsources -n local -o yaml
apiVersion: v1
items:
- apiVersion: operators.coreos.com/v1alpha1
  kind: CatalogSource
  metadata:
    creationTimestamp: "2019-07-15T21:16:49Z"
    generation: 1
    name: demo-catalog
    namespace: local
    resourceVersion: "1872"
    selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/local/catalogsources/demo-catalog
    uid: dbf4d538-a745-11e9-8d77-1ccefc66179f
  spec:
    displayName: KNI Operators
    icon:
      base64data: ""
      mediatype: ""
    image: quay.io/mhrivnak/demo-operator-registry:1.0
    publisher: kni.openshift.com
    sourceType: grpc
  status:
    lastSync: "2019-07-15T21:27:22Z"
    registryService:
      createdAt: "2019-07-15T21:16:49Z"
      port: "50051"
      protocol: grpc
      serviceName: demo-catalog
      serviceNamespace: local
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
$ kubectl get subscription kni -n kniops -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  creationTimestamp: "2019-07-15T21:16:49Z"
  generation: 1
  name: kni
  namespace: kniops
  ownerReferences:
  - apiVersion: kni.openshift.com/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: KNICluster
    name: example-knicluster
    uid: dbf1802e-a745-11e9-8d77-1ccefc66179f
  resourceVersion: "1941"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/kniops/subscriptions/kni
  uid: dbf619fb-a745-11e9-8d77-1ccefc66179f
spec:
  channel: singlenamespace-alpha
  name: etcd
  source: demo-catalog
  sourceNamespace: local
status:
  catalogHealth:
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: demo-catalog
      namespace: local
      resourceVersion: "1872"
      uid: dbf4d538-a745-11e9-8d77-1ccefc66179f
    healthy: true
    lastUpdated: "2019-07-15T21:27:22Z"
  conditions:
  - lastTransitionTime: "2019-07-15T21:27:22Z"
    message: all available catalogsources are healthy
    reason: AllCatalogSourcesHealthy
    status: "False"
    type: CatalogSourcesUnhealthy
  currentCSV: etcdoperator.v0.9.2
  installPlanRef:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-kpzpb
    namespace: kniops
    resourceVersion: "1878"
    uid: 551ae58b-a747-11e9-8d77-1ccefc66179f
  installedCSV: etcdoperator.v0.9.2
  installplan:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-kpzpb
    uuid: 551ae58b-a747-11e9-8d77-1ccefc66179f
  lastUpdated: "2019-07-15T21:27:28Z"
  state: AtLatestKnown

Now change the image on CatalogSource to a new image. The image and lastSync time gets updated.

$ kubectl get catalogsources -n local -o yaml
apiVersion: v1
items:
- apiVersion: operators.coreos.com/v1alpha1
  kind: CatalogSource
  metadata:
    creationTimestamp: "2019-07-15T21:16:49Z"
    generation: 2
    name: demo-catalog
    namespace: local
    resourceVersion: "2785"
    selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/local/catalogsources/demo-catalog
    uid: dbf4d538-a745-11e9-8d77-1ccefc66179f
  spec:
    displayName: KNI Operators
    icon:
      base64data: ""
      mediatype: ""
    image: quay.io/mhrivnak/demo-operator-registry:1.1
    publisher: kni.openshift.com
    sourceType: grpc
  status:
    lastSync: "2019-07-15T21:32:12Z"
    registryService:
      createdAt: "2019-07-15T21:31:58Z"
      port: "50051"
      protocol: grpc
      serviceName: demo-catalog
      serviceNamespace: local
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

The subscription doesn't change for a while.

Eventually:

$ kubectl get catalogsources -n local -o yaml
apiVersion: v1
items:
- apiVersion: operators.coreos.com/v1alpha1
  kind: CatalogSource
  metadata:
    creationTimestamp: "2019-07-15T21:16:49Z"
    generation: 2
    name: demo-catalog
    namespace: local
    resourceVersion: "4423"
    selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/local/catalogsources/demo-catalog
    uid: dbf4d538-a745-11e9-8d77-1ccefc66179f
  spec:
    displayName: KNI Operators
    icon:
      base64data: ""
      mediatype: ""
    image: quay.io/mhrivnak/demo-operator-registry:1.1
    publisher: kni.openshift.com
    sourceType: grpc
  status:
    lastSync: "2019-07-15T21:42:22Z"
    registryService:
      createdAt: "2019-07-15T21:31:58Z"
      port: "50051"
      protocol: grpc
      serviceName: demo-catalog
      serviceNamespace: local
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
$ kubectl get subscription kni -n kniops -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  creationTimestamp: "2019-07-15T21:16:49Z"
  generation: 1
  name: kni
  namespace: kniops
  ownerReferences:
  - apiVersion: kni.openshift.com/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: KNICluster
    name: example-knicluster
    uid: dbf1802e-a745-11e9-8d77-1ccefc66179f
  resourceVersion: "4460"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/kniops/subscriptions/kni
  uid: dbf619fb-a745-11e9-8d77-1ccefc66179f
spec:
  channel: singlenamespace-alpha
  name: etcd
  source: demo-catalog
  sourceNamespace: local
status:
  catalogHealth:
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: demo-catalog
      namespace: local
      resourceVersion: "1872"
      uid: dbf4d538-a745-11e9-8d77-1ccefc66179f
    healthy: true
    lastUpdated: "2019-07-15T21:27:22Z"
  conditions:
  - lastTransitionTime: "2019-07-15T21:27:22Z"
    message: all available catalogsources are healthy
    reason: AllCatalogSourcesHealthy
    status: "False"
    type: CatalogSourcesUnhealthy
  - lastTransitionTime: "2019-07-15T21:42:28Z"
    reason: InstallComponentFailed
    status: "True"
    type: InstallPlanFailed
  currentCSV: etcdoperator.v0.9.4
  installPlanRef:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-rgkpc
    namespace: kniops
    resourceVersion: "4432"
    uid: 6dfc884a-a749-11e9-8d77-1ccefc66179f
  installedCSV: etcdoperator.v0.9.4
  installplan:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-rgkpc
    uuid: 6dfc884a-a749-11e9-8d77-1ccefc66179f
  lastUpdated: "2019-07-15T21:42:28Z"
  state: AtLatestKnown

@ecordell
Copy link
Member

You shouldn't see this issue on master anymore - please re-open if you do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants