use xtables.lock #506

Closed
Daryltp opened this issue Aug 7, 2018 · 6 comments · Fixed by #884

Daryltp commented Aug 7, 2018

When deploying network policies to two namespaces in the same cluster on Azure Kubernetes Service simultaneously, we have observed the following error. It appears to be caused by resource contention on the iptables command. Both deployments complete "successfully", but the traffic flows the policies allow will fail, and the error is only apparent when you view the logs of the kube-router pod(s).

E0801 10:27:45.538722       1 network_policy_controller.go:182] Error syncing network policy for the update to network policy: <NAMESPACE_NAME>/<POLICY_NAME> Error: Aborting sync. Failed to sync network policy chains: Failed to run iptables command: running [/sbin/iptables -t filter -N KUBE-NWPLCY-YYN4V3F74PSONPOB --wait]: exit status 4: iptables: Resource temporarily unavailable.
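
The contention is easy to reproduce outside kube-router with something like the sketch below (the chain names and the two-goroutine setup are purely illustrative, and it needs root): two concurrent iptables invocations without --wait race for the xtables lock, and the loser exits immediately with status 4.

    package main

    import (
        "fmt"
        "os/exec"
        "sync"
    )

    func main() {
        var wg sync.WaitGroup
        for i := 0; i < 2; i++ {
            wg.Add(1)
            go func(n int) {
                defer wg.Done()
                // Without -w/--wait, iptables does not queue for the
                // xtables lock: if the other invocation holds it at this
                // moment, this one fails with exit status 4
                // ("Resource temporarily unavailable").
                chain := fmt.Sprintf("TEST-NWPLCY-%d", n)
                out, err := exec.Command("iptables", "-t", "filter", "-N", chain).CombinedOutput()
                if err != nil {
                    fmt.Printf("creating %s failed: %v: %s\n", chain, err, out)
                }
            }(i)
        }
        wg.Wait()
    }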
@murali-reddy (Member)

@Daryltp thanks for reporting the issue. kube-router needs to hold xtables.lock before trying to run iptables commands.

The kube-router manifests need to mount xtables.lock so that the go-iptables library kube-router uses can take the lock and prevent concurrency issues.


serbrech commented Oct 2, 2018

I took a quick look at this. iptables is updated through the go-iptables package in the different controllers.

We cannot guarantee that other services running on the host won't try to update iptables at the same time too.

Would the file lock implementation from the go-iptables package work across containers?

What would be the preferred approach in kube-router? Take the lock in each controller individually, or have a shared lock at the base of the service that the different controllers use?

It also seems that the go-iptables package does a no-op when the file is already locked (i.e. when taking the lock would block). How should that be handled?
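
On the cross-container question: recent iptables binaries take an exclusive flock(2) on /run/xtables.lock, and flock state lives with the file's inode, so the lock should work across containers as long as each container bind-mounts the same host file. A minimal sketch of taking that lock from Go; lockXtables is a hypothetical helper for illustration, not go-iptables' internal implementation:

    package main

    import (
        "log"
        "os"

        "golang.org/x/sys/unix"
    )

    // lockXtables takes an exclusive flock(2) on the xtables lock file,
    // the same lock the iptables binary contends on. flock state is
    // attached to the inode, so every container that bind-mounts the
    // host's /run/xtables.lock serializes on the same lock.
    func lockXtables(path string) (*os.File, error) {
        f, err := os.OpenFile(path, os.O_CREATE, 0600)
        if err != nil {
            return nil, err
        }
        // LOCK_EX blocks until the lock is free; use LOCK_EX|LOCK_NB
        // to fail fast instead of waiting.
        if err := unix.Flock(int(f.Fd()), unix.LOCK_EX); err != nil {
            f.Close()
            return nil, err
        }
        return f, nil
    }

    func main() {
        f, err := lockXtables("/run/xtables.lock")
        if err != nil {
            log.Fatal(err)
        }
        // Closing the file releases the lock.
        defer f.Close()
        // ... run iptables commands while the lock is held ...
    }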


serbrech commented Oct 2, 2018

I see now that the go-iptables library always runs with --wait, so would mounting the xtables.lock file be enough to resolve this? From the iptables man page:

> -w, --wait
> Wait for the xtables lock. To prevent multiple instances of the program from running concurrently, an attempt will be made to obtain an exclusive lock at launch. By default, the program will exit if the lock cannot be obtained. This option will make the program wait until the exclusive lock can be obtained.
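
If so, the Go side should already be covered. A minimal sketch of a go-iptables call (the chain name is illustrative): go-iptables adds --wait whenever the local iptables binary supports it, so with the lock file mounted, concurrent callers queue instead of failing.

    package main

    import (
        "log"

        "github.com/coreos/go-iptables/iptables"
    )

    func main() {
        // go-iptables probes the local iptables binary at construction
        // time and, when supported, appends --wait to every command it
        // runs, so a concurrent caller blocks on the xtables lock
        // instead of failing with exit status 4.
        ipt, err := iptables.New()
        if err != nil {
            log.Fatal(err)
        }
        if err := ipt.NewChain("filter", "KUBE-NWPLCY-EXAMPLE"); err != nil {
            log.Fatalf("creating chain: %v", err)
        }
        if err := ipt.AppendUnique("filter", "KUBE-NWPLCY-EXAMPLE", "-j", "RETURN"); err != nil {
            log.Fatalf("appending rule: %v", err)
        }
    }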

roffe (Collaborator) commented Oct 2, 2018

I've added the following to my kube-router deployments to allow use of the xtables lock:

        volumeMounts:
        ...
        - mountPath: /var/run/xtables.lock
          name: xtables
      volumes:
      ...
      - name: xtables
        hostPath:
          path: /run/xtables.lock
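
(Worth noting on the path mismatch above: on most images and systemd-based hosts, /var/run is a symlink to /run, so the container's /var/run/xtables.lock and the lock file iptables opens at /run/xtables.lock resolve to the same mounted host file.)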

murali-reddy changed the title "Failed to run iptables" error when deploying to two namespaces simultaneously → use xtables.lock on Apr 3, 2019
@soumeng09

The volume mount doesn't really solve the issue.
We see kube-router stuck in a "controller still performing bootup full-sync" loop for a while; it then restarts with the messages "Network Policy Controller heartbeat missed" and "Shutting down the controllers".

We also see kube-proxy on the same node giving errors like:
"Failed to ensure that filter chain KUBE-EXTERNAL-SERVICES exists: error creating chain "KUBE-EXTERNAL-SERVICES": exit status 4: Another app is currently holding the xtables lock. Stopped waiting after 5s."

From the messages it looks like a race condition between kube-proxy and kube-router on the iptables lock.

We've been seeing this issue persistently in our clusters (AKS, around 20 nodes).

@murali-reddy (Member)

> We see kube-router stuck in a "controller still performing bootup full-sync" loop for a while; it then restarts with the messages "Network Policy Controller heartbeat missed" and "Shutting down the controllers".

@soumeng09 Can you please share the kube-router logs? The lock is held by the iptables command only for as long as the command runs. There is a possibility of a race condition, but both kube-router and kube-proxy launch iptables with the -w option, so a command should rarely fail to execute.

It would be good to look at both the kube-router and kube-proxy logs to understand what sort of contention is happening. Also, is it possible that some other process is holding the lock?
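
(For context on the kube-proxy message quoted above: "Stopped waiting after 5s" is what iptables prints when it is invoked with a bounded wait, roughly as in the sketch below. The chain name is taken from the log and the 5-second timeout is kube-proxy's choice; this is not kube-proxy's actual code.)

    package main

    import (
        "fmt"
        "os/exec"
    )

    func main() {
        // -w 5 queues on the xtables lock for up to 5 seconds; if another
        // process (e.g. kube-router mid-sync) still holds it after that,
        // iptables gives up with exit status 4 and prints "Another app is
        // currently holding the xtables lock. Stopped waiting after 5s."
        out, err := exec.Command("iptables", "-w", "5", "-t", "filter",
            "-N", "KUBE-EXTERNAL-SERVICES").CombinedOutput()
        if err != nil {
            fmt.Printf("iptables failed: %v: %s\n", err, out)
        }
    }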

murali-reddy added a commit that referenced this issue on Apr 23, 2020: "used by iptables command when run by kube-router" (Fixes #506).
murali-reddy added a commit that referenced this issue on Apr 24, 2020 (…884): "used by iptables command when run by kube-router" (Fixes #506).
FabienZouaoui pushed a commit to SirDataFR/kube-router that referenced this issue on May 11, 2020.