ci: fix flaky e2e tests by colocating registry & hccm #687

apricote · 2024-07-08T08:56:16Z

In our dev/test environment we deploy a container registry to the cluster and push our image there to avoid a dependency on external registries.

To pull from the registry, containerd is instructed to use the registry's Service IP. kube-proxy translates this IP to the registry Pod IP, and the CNI is then responsible for connecting to that Pod IP.

In our case, the CNI is configured to use Cloud Routes to reach Pod IPs on other nodes. These routes are only set up by HCCM, which is a problem when the routes are needed to pull the HCCM container image itself.

This PR fixes the circular dependency by running HCCM on the same node as the registry. That way no external routes are required and the image pull traffic stays on that node.
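Colocating two Pods on one node is usually expressed in Kubernetes as a required `podAffinity` term whose `topologyKey` is `kubernetes.io/hostname`. The sketch below builds such an affinity as a plain Python dict, the same shape one would put under a Helm value. Assumptions: the registry label `app: docker-registry` is a hypothetical placeholder, and the chart value being named `affinity` is only inferred from the title of #686; the actual manifests in this PR may differ.

```python
import json


def hccm_affinity(registry_labels: dict[str, str]) -> dict:
    """Build a podAffinity that only allows HCCM on a node running the registry Pod."""
    return {
        "podAffinity": {
            # Hard requirement: the scheduler will not place HCCM elsewhere.
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    # Selects the registry Pod by its labels (hypothetical here).
                    "labelSelector": {"matchLabels": dict(registry_labels)},
                    # Nodes sharing the same hostname label value are the same node.
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


if __name__ == "__main__":
    # Hypothetical label and value name; adjust to the real registry Deployment.
    values = {"affinity": hccm_affinity({"app": "docker-registry"})}
    print(json.dumps(values, indent=2))
```

A required (rather than preferred) term fits this case: with a soft preference the scheduler could still split the two Pods, which would bring the circular dependency back.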

This is not a problem on clusters where HCCM is pulled from an external registry or where the routes controller is not used.
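The conditions above can be summarized as a small reachability model. This is a simplified sketch, not code from the repository: it assumes that on a fresh cluster no Cloud Routes exist until HCCM runs, and that a CNI not using Cloud Routes can reach remote Pod IPs some other way. It also suggests one plausible reason the failures were flaky rather than constant: the outcome depends on whether the scheduler happened to place HCCM on the registry's node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    same_node: bool            # HCCM scheduled on the registry's node?
    in_cluster_registry: bool  # HCCM image pulled from the in-cluster registry?
    uses_cloud_routes: bool    # CNI relies on Cloud Routes for cross-node Pod IPs?


def can_pull_hccm_image(s: Setup, routes_exist: bool) -> bool:
    """Return True if the node scheduled to run HCCM can pull the HCCM image."""
    if not s.in_cluster_registry:
        return True  # external registry: Pod routing is not involved
    if s.same_node:
        return True  # Service IP resolves to a local Pod IP, traffic stays on the node
    if not s.uses_cloud_routes:
        return True  # assumption: the CNI reaches remote Pods some other way
    return routes_exist  # cross-node traffic needs routes that only HCCM creates


def bootstrap_deadlocks(s: Setup) -> bool:
    """On a fresh cluster HCCM has not run yet, so no Cloud Routes exist."""
    return not can_pull_hccm_image(s, routes_exist=False)


if __name__ == "__main__":
    for same_node in (False, True):
        s = Setup(same_node=same_node, in_cluster_registry=True, uses_cloud_routes=True)
        print(f"same_node={same_node}: deadlock={bootstrap_deadlocks(s)}")
```

Only the combination of an in-cluster registry, Cloud Routes, and HCCM on a different node deadlocks; forcing `same_node=True` removes it, which is what this PR does.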

codecov · 2024-07-08T08:59:11Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.26%. Comparing base (1a8ea95) to head (1a2eddf).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #687      +/-   ##
==========================================
+ Coverage   72.16%   72.26%   +0.09%     
==========================================
  Files          31       32       +1     
  Lines        2497     2668     +171     
==========================================
+ Hits         1802     1928     +126     
- Misses        523      552      +29     
- Partials      172      188      +16



apricote added the bug Something isn't working label Jul 8, 2024

apricote self-assigned this Jul 8, 2024

apricote had a problem deploying to e2e-robot July 8, 2024 08:56 — with GitHub Actions Error

apricote mentioned this pull request Jul 8, 2024

feat(helm): allow setting affinity for deployment #686

Merged

jooola approved these changes Jul 8, 2024


Base automatically changed from helm-affinity to main July 8, 2024 09:02

apricote force-pushed the dev-affinity branch from b738cba to 1a2eddf July 8, 2024 09:02

apricote had a problem deploying to e2e-robot July 8, 2024 09:03 — with GitHub Actions Error

apricote marked this pull request as ready for review July 8, 2024 09:03

apricote requested a review from a team as a code owner July 8, 2024 09:03

apricote merged commit 68db213 into main Jul 8, 2024
7 of 8 checks passed

apricote deleted the dev-affinity branch July 8, 2024 09:20
