[regression] 1.3.2 regresses on "SinkBinding on KSVC results into ping-pong between operators #1936" #2184

Closed
maschmid opened this issue Mar 30, 2021 · 19 comments · Fixed by #2190

@maschmid

When deploying the splitter integration from the examples/knative dir with the latest Knative (verified on Serverless 1.13 and 1.14) and camel-k operator 1.3.2, the splitter ksvc is repeatedly reconciled back and forth between the two operators.

The same scenario works fine with camel-k operator 1.3.1, so it seems to be a regression in camel-k 1.3.2 vs 1.3.1.

See also #1936

Possibly introduced by https://github.com/apache/camel-k/pull/2042/files? (not verified)


NAME                                         READY   STATUS              RESTARTS   AGE
camel-k-kit-c1hmlapse357at2e5n40-1-build     0/1     Completed           0          2m45s
camel-k-kit-c1hmlapse357at2e5n40-builder     0/1     Completed           0          3m30s
camel-k-kit-c1hmlb1se357at2e5n4g-1-build     0/1     Completed           0          86s
camel-k-kit-c1hmlb1se357at2e5n4g-builder     0/1     Completed           0          2m5s
camel-k-kit-c1hmlb9se357at2e5n50-builder     1/1     Running             0          58s
splitter-00001-deployment-66c99c65c9-tzxlh   1/2     Running             1          58s
splitter-00002-deployment-646bd5c67b-5cwrb   2/2     Running             0          58s
splitter-00003-deployment-7f65746644-dr7xd   1/2     Running             2          56s
splitter-00004-deployment-6f8d7c8d48-v2zzl   2/2     Running             0          56s
splitter-00005-deployment-6fbd456fd5-vjj4l   1/2     Running             2          55s
splitter-00006-deployment-5c5cf58d79-442tj   2/2     Running             0          55s
splitter-00007-deployment-748bf46445-sg2cz   1/2     Running             1          54s
splitter-00008-deployment-777845d7d9-tmsr9   2/2     Running             0          53s
splitter-00009-deployment-6bf7f74475-fnc4x   0/2     Error               1          52s
splitter-00010-deployment-7ff6b966c-crlpw    1/2     Running             0          51s
splitter-00011-deployment-69c54f7c85-78dm7   1/2     Running             1          51s
splitter-00012-deployment-55757c7686-phvzx   2/2     Running             0          50s
splitter-00013-deployment-99d7fd48c-wgvss    1/2     Running             1          50s
splitter-00014-deployment-6c6b88b67d-5dcvj   1/2     Running             0          49s
splitter-00015-deployment-7f5d76496c-zqh6v   1/2     Running             1          48s
splitter-00016-deployment-79d7bcb6dc-fbdrp   2/2     Running             0          48s
splitter-00017-deployment-5d98c79ccd-5kg52   1/2     Running             1          47s
splitter-00018-deployment-7d7fbf5cb4-jxmj8   1/2     Running             0          47s
splitter-00019-deployment-56b79846d5-4w4dj   1/2     Running             1          46s
splitter-00020-deployment-c87644d94-r7s6m    2/2     Running             0          46s
splitter-00021-deployment-5db4cb69b5-rlq8j   0/2     Error               1          44s
splitter-00022-deployment-6d9bc8d86b-w4flv   1/2     Running             0          43s
splitter-00023-deployment-56f8dd4cf6-gmncl   0/2     Error               0          42s
splitter-00024-deployment-7d45b49459-cw854   2/2     Running             0          41s
splitter-00025-deployment-578977fc74-btvfg   1/2     Running             1          40s
splitter-00026-deployment-7d8fdb988c-6fgdp   2/2     Running             0          39s
splitter-00027-deployment-6dcccf67cc-sxtfl   1/2     Running             0          39s
splitter-00028-deployment-8684bbfdcb-4j8pg   1/2     Running             0          38s
splitter-00029-deployment-6c45f89db9-9l95g   1/2     Running             1          37s
splitter-00030-deployment-6dd8556d8f-gmqnf   1/2     Running             0          36s
splitter-00031-deployment-6758cfc687-kxfmh   1/2     Running             0          35s
splitter-00032-deployment-85854d4498-q4g7p   1/2     Running             0          34s
splitter-00033-deployment-7865b98856-qwttm   1/2     Running             1          32s
splitter-00034-deployment-7d8fc987f-kh6km    1/2     Running             0          31s
splitter-00035-deployment-86d7544c48-cf9w7   1/2     Running             0          30s
splitter-00036-deployment-66ff4bb7d6-hpsfl   1/2     Running             0          30s
splitter-00037-deployment-b5455569d-2fq4m    1/2     Running             0          27s
splitter-00038-deployment-6bfbf7595b-bj6hv   1/2     Running             0          26s
splitter-00039-deployment-6569dfd8b6-jp7x4   1/2     Running             0          22s
splitter-00040-deployment-674ff4797f-bhmx4   1/2     Running             0          21s
splitter-00041-deployment-64c556b7b6-jfwmm   1/2     Running             0          17s
splitter-00042-deployment-54bf57d879-kp6hh   1/2     Running             0          16s
splitter-00043-deployment-7db48ffb96-2gj6v   1/2     Running             0          14s
splitter-00044-deployment-549bbbd666-gb4nn   1/2     Running             0          13s
splitter-00045-deployment-554b8896db-jz26v   0/2     ContainerCreating   0          1s
splitter-00046-deployment-549d66cfc6-dgggf   0/2     ContainerCreating   0          1s
@astefanutti
Member

astefanutti commented Mar 30, 2021

@maschmid would it be possible for you to test whether the issue is present on the master branch?

I'm asking because #2042 is required to fix other issues, but it was a tactical fix specifically for 1.3.x, and another solution is currently present in the master branch, so if it works on master, we may have a solution ready to backport.

Also, we need to make sure this is covered by the e2e tests.

@maschmid
Author

Do we have a CI image of the operator I can use, or do I have to build it locally myself first?

@maschmid
Author

@astefanutti I still see the same issue with docker.io/maschmid/camel-k:1.4.0-SNAPSHOT, which I built from the current master:

{"level":"info","ts":1617188536.706038,"logger":"cmd","msg":"Go Version: go1.15.8"}
{"level":"info","ts":1617188536.7060597,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1617188536.706063,"logger":"cmd","msg":"Buildah Version: 1.14.0"}
{"level":"info","ts":1617188536.7060657,"logger":"cmd","msg":"Kaniko Version: 0.17.1"}
{"level":"info","ts":1617188536.7060683,"logger":"cmd","msg":"Camel K Operator Version: 1.4.0-SNAPSHOT"}
{"level":"info","ts":1617188536.706071,"logger":"cmd","msg":"Camel K Default Runtime Version: 1.7.0-SNAPSHOT"}
{"level":"info","ts":1617188536.7060733,"logger":"cmd","msg":"Camel K Git Commit: 18eb00851c92906cce668ba197c71176443495bd"}

@astefanutti
Member

@maschmid sorry for the delayed answer, as I was AFK. Thanks a lot for the testing!

I've tried to test it on my setup (Camel K master branch, Knative 0.20 with Kourier, OCP 4.6) and could not reproduce it. I have the feed, splitter and printer integrations from the examples/knative directory working :( I'll try to understand what's different in my setup...

In the meantime, could you please share the content of the KnativeService and the Deployment resources? That may help me understand which conflicting changes the Camel K operator and the Knative controller / webhook contend over.

On the master branch we moved to using server-side apply (#2039) to try to cope with the issue. I assumed the problem was with environment variables set on the KnativeService, and that Knative also uses server-side apply.

In the 1.3.x branch, #2042 is a fix on top of #1937, which was too aggressive and prevented some required updates to the resource metadata.

Still, the fact that I cannot reproduce it on my setup puzzles me, and I'd like to get to the bottom of all this.

@maschmid
Author

maschmid commented Apr 1, 2021

With OpenShift Serverless 1.13 (knative serving/eventing 0.19)

splitter ksvc: https://gist.github.com/maschmid/bcc929c842ca5c93ac650ff8f1ae3007

@maschmid
Author

maschmid commented Apr 1, 2021

All the deployments (there are lots of splitter deployments, as each revision has a new one...): https://gist.github.com/maschmid/ae0da49fa58283e2317043679e15f316

@astefanutti
Member

Thanks. I'm interested in the managed fields, which show all the controllers cooperating on the resource.

In my tests, I only have the camel-k-operator and the (Knative) controller as managers, but yours also shows the webhook as a manager, which seems to have ownership of the containers field.
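
As a rough illustration (a hypothetical excerpt, not the actual content of the gist), managed fields entries on a KnativeService look roughly like this, with one entry per manager and the field paths each one owns:

metadata:
  managedFields:
    - manager: camel-k-operator
      operation: Apply
      apiVersion: serving.knative.dev/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:template: {}
    - manager: controller
      operation: Update
      apiVersion: serving.knative.dev/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status: {}
    - manager: webhook
      operation: Update
      apiVersion: serving.knative.dev/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:template:
            f:spec:
              f:containers: {}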

@maschmid
Author

maschmid commented Apr 1, 2021

One theory I have is that this could be caused by a different SinkBinding configuration in Serverless... It uses SINK_BINDING_SELECTION_MODE=inclusion, which requires resources bound by a SinkBinding to have the bindings.knative.dev/include: "true" label (in this case, the ksvc).

Without the label, the eventing webhook will not modify the ksvc during creation, but it will try to replace it at runtime... I suppose this could be causing the issue.
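
As a minimal sketch (illustrative only, not the actual resource from the example), inclusion mode expects the bound ksvc to carry the label like this:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: splitter
  labels:
    bindings.knative.dev/include: "true"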

@maschmid
Author

maschmid commented Apr 1, 2021

Exclusion is the default upstream; OpenShift Serverless defaults to inclusion, because exclusion potentially causes any Deployment on the cluster to go through the eventing webhook, which could cause instability (when a SinkBinding bound to a Deployment exists, even system Deployments would go through the eventing webhook).

@astefanutti
Member

I've been able to reproduce it by editing the eventing webhook and setting SINK_BINDING_SELECTION_MODE=inclusion.

I can now analyse the issue and work out the best strategy.
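
For reference, a sketch of that setting as a container environment variable, assuming the upstream eventing webhook Deployment named eventing-webhook in the knative-eventing namespace:

# eventing-webhook Deployment in knative-eventing (assumed names), container env
env:
  - name: SINK_BINDING_SELECTION_MODE
    value: inclusion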

@astefanutti astefanutti self-assigned this Apr 2, 2021
@astefanutti astefanutti added the kind/bug Something isn't working label Apr 2, 2021
@astefanutti
Member

I'm realizing that adding the bindings.knative.dev/include: "true" label to the source object referenced in the SinkBinding may have been the solution all along.

The source objects are owned by Camel K, and we always want the eventing webhook to select them, irrespective of the Knative sink binding configuration (Inclusion or Exclusion).

I've tried that approach and it seems to be working well.

@astefanutti
Member

Current proposed fix: #2190.

@maschmid could you please review it, and maybe test it on your side?

Once it's validated, I'll adapt it for the 1.3.x branch.

@astefanutti astefanutti added the area/knative Related to Knative label Apr 2, 2021
@maschmid
Author

maschmid commented Apr 3, 2021

Works for me, and the fix looks good to me, too. I agree the "bindings.knative.dev/include=true" label should be safe to add with either setting.

@weimeilin79

weimeilin79 commented Apr 5, 2021

I am facing the same issue. Is there a workaround for it for now?
Currently I have just rolled back to 1.3.1. But where can I set this "bindings.knative.dev/include=true"?

@maschmid
Author

maschmid commented Apr 5, 2021

@weimeilin79 If you install via an Operator (Serverless Operator or Knative Operator), a possible workaround could be to switch the KnativeEventing sinkBindingSelectionMode to exclusion:

apiVersion: operator.knative.dev/v1alpha1
kind: KnativeEventing
metadata:
  name: knative-eventing
  namespace: knative-eventing
spec:
  sinkBindingSelectionMode: exclusion

@weimeilin79

Thanks @maschmid :) it works like a charm!!

@astefanutti
Member

@maschmid thanks a lot for the testing and reviewing. I've merged #2190, and I'm now preparing the backport to the 1.3.x branch.

@astefanutti
Member

In case changing the Serverless / Knative configuration is not an option, another workaround is to add the label manually, e.g.:

$ kamel run --label bindings.knative.dev/include=true -t owner.target-labels=bindings.knative.dev/include

@astefanutti
Member

1.3.x backport PR: #2191.
