Guard against possible race condition in new CNI #4560

slonka · 2022-07-07T08:05:07Z

Description

TLDR:

If an application pod starts before the CNI installer has installed the CNI plugin, the kubelet has no knowledge of the plugin. As a result, the application pod comes up without iptables rules.

The race condition is best described in the Istio design doc.

Istio first solved the issue by marking the broken pods with a specific label and then killing them (PR). However, this approach causes additional churn during node start. It also produces a number of errors in customer logs and lacks portability. So they switched to the taint controller approach implemented in this PR.

Possible solutions:

1. Reimplement the taint controller.
2. At DP startup, check whether the transparent proxy is set up and fail the start if it is not.
3. Try to revive "KEP: Node Readiness Gates #1003".
4. Base this on a pod health check.
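To make the taint controller option concrete, here is a minimal sketch of the decision such a controller could make for each node: keep a taint on the node while the CNI is not ready there, and drop it once the CNI is ready. The `Taint` and `Node` types, the `Reconcile` function, and the taint key `example.com/cni-not-ready` are hypothetical names for illustration only, not the API of Kuma, Istio, or Kubernetes. Whether a real controller also adds the taint, or only removes one set at node registration, is an assumption left open here.

```go
package main

// Taint mirrors the key/effect pair of a Kubernetes node taint.
// It is a simplified, hypothetical type for this sketch.
type Taint struct {
	Key    string
	Effect string
}

// notReadyTaint uses a made-up key; a real controller would define its own.
var notReadyTaint = Taint{Key: "example.com/cni-not-ready", Effect: "NoSchedule"}

// Node is a simplified view of a cluster node (hypothetical).
type Node struct {
	Name   string
	Taints []Taint
}

// Reconcile returns the taints the node should carry, given whether the CNI
// is ready on it, and reports whether the taint list changed.
func Reconcile(node Node, cniReady bool) (taints []Taint, changed bool) {
	has := false
	out := make([]Taint, 0, len(node.Taints)+1)
	for _, t := range node.Taints {
		if t == notReadyTaint {
			has = true
			if cniReady {
				continue // CNI is ready: drop the taint so pods can schedule
			}
		}
		out = append(out, t)
	}
	if !cniReady && !has {
		// CNI is not ready and the node is untainted: block scheduling.
		return append(out, notReadyTaint), true
	}
	// The list changed only if an existing taint was dropped.
	return out, cniReady && has
}
```

Applying the returned taints to the node is what would require the extra "update node" permission.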

WDYT?

jakubdyszkiewicz · 2022-07-11T14:06:44Z

Triage: Let's go with option 1, implemented in the CP. The extra K8s RBAC permission (update node) should be set only when CNI is enabled.

slonka · 2022-07-21T14:22:24Z

I started implementing this, and it seems that a default probe that does not become healthy unless traffic is redirected to Envoy would prevent affected workloads from being accessible (testserver in the test framework has such a probe; demo-client does not). If we couple that with the CNI listing pods on startup and injecting the rules into them, we should be safe against this race condition. WDYT?

Decided to stick with the controller.
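For illustration, the "is traffic redirected?" condition behind such a probe could be approximated by scanning the pod's NAT rules for a redirect target. The sketch below assumes the rules are available as text in `iptables-save` format; `HasRedirectRule` is a hypothetical helper, and it does not verify that the redirect actually points at Envoy.

```go
package main

import "strings"

// HasRedirectRule reports whether iptables-save style output contains at
// least one appended rule ("-A ...") whose jump target is REDIRECT.
// Hypothetical helper: it checks presence only, not ports or chains.
func HasRedirectRule(rules string) bool {
	for _, line := range strings.Split(rules, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 || fields[0] != "-A" {
			continue // skip table headers, chain policies, COMMIT, blanks
		}
		for i := 0; i+1 < len(fields); i++ {
			if fields[i] == "-j" && fields[i+1] == "REDIRECT" {
				return true
			}
		}
	}
	return false
}
```

A startup check or probe built on this would fail while the function returns false, which is the situation a pod ends up in when it starts before the CNI plugin is installed.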

slonka added the triage/pending and kind/feature labels Jul 7, 2022

jakubdyszkiewicz added the triage/accepted label and removed the triage/pending label Jul 11, 2022

lahabana assigned slonka Jul 11, 2022

slonka mentioned this issue Jul 27, 2022

feat(cni): taint controller #4650

Merged

4 tasks

slonka closed this as completed in #4650 Aug 4, 2022
