Fix DLQ flake #814
Conversation
Codecov Report
@@            Coverage Diff             @@
##             main     #814      +/-   ##
==========================================
- Coverage   74.30%   73.96%   -0.34%
==========================================
  Files          39       39
  Lines        2506     2520      +14
==========================================
+ Hits         1862     1864       +2
- Misses        577      588      +11
- Partials       67       68       +1
Continue to review full report at Codecov.
Changes LGTM. Just a question about overriding ERL_MAX_PORTS.
containers:
- name: rabbitmq
  env:
  - name: ERL_MAX_PORTS
Is this change still needed?
mentioned it today during retro - rabbitmq/cluster-operator#959
But we're not experiencing a crash loop with the node failing to start, right? Our RMQ version is also newer (3.10, I believe), so maybe that's why we're bypassing the issue?
It doesn't depend on the RMQ version; it's a combination of host setup and memory limit. When Erlang starts, it tries to allocate memory structures for all available FDs. For example:
cat /proc/sys/fs/file-max
9223372036854775807
So the fact that it works locally and on GH is pure luck.
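As a rough illustration of the mismatch (a sketch only; it assumes Erlang is installed wherever you run it, and the values shown are just the ones quoted above):

```sh
# The FD ceilings the Erlang VM can see at startup.
cat /proc/sys/fs/file-max   # system-wide max open files; can be astronomically high
ulimit -n                   # per-process limit inherited by beam.smp

# With ERL_MAX_PORTS set, the VM sizes its port table from that value instead
# of deriving it from the limits above.
ERL_MAX_PORTS=4096 erl -noshell -eval 'io:format("~p~n", [erlang:system_info(port_limit)]), halt().'
```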
All of these run in Kind and looking at the control-plane there, I get:
root@knative-control-plane:/# ulimit -n
1048576
Is 1048576 too high? That seems fine for 1GB of RAM. Also doesn't 4096 feel too low?
I can just barely grok what's happening with Erlang here. I understand that with too high an FD limit, Erlang can cause the RMQ broker to OOM as it tries to allocate something per FD.
But it does look like this change isn't directly tied to the DLQ flake we're seeing in conformance tests. Maybe we can omit this change for this PR to unblock it and then open something separate to discuss this? We also have RMQClusters defined elsewhere in docs and setup instructions so we'll likely need a broader change than just the test clusters.
I can just barely grok what's happening with Erlang here. I understand that with too high an FD limit, Erlang can cause the RMQ broker to OOM as it tries to allocate something per FD.
You got it completely right. We have to limit the subset Erlang sees because we don't control the test environments; it's like restricting the max file size in an upload handler, etc.
Having a really high FD limit is not a problem, of course. The problem is this particular software pattern of preallocating things up front.
It's of course possible for me to put this change in a separate PR. However, on my system the tests can't run without capping ERL_MAX_PORTS, so that's why it goes as a package.
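A sketch of how the cap could be verified on a running broker (the namespace and pod names below are hypothetical, not taken from this repo's setup):

```sh
# Ask the Erlang VM inside the broker for its effective port limit; with
# ERL_MAX_PORTS=4096 in the pod spec this should print 4096.
kubectl -n rabbitmq-system exec rabbitmq-server-0 -- \
  rabbitmqctl eval 'erlang:system_info(port_limit).'
```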
- name: rabbitmq
  env:
  - name: ERL_MAX_PORTS
    value: "4096"
why this limit?
mentioned it today during retro - rabbitmq/cluster-operator#959
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: gabo1208, ikvmw.
Just a comment but seems good to me
removed ports fix @gab-satchi. will open another PR
/lgtm
/kind bug
Fixes #792