Can't attach to pods in step 09-workload #223
Comments
Interesting observation. Indeed, we haven't experienced that in any of our recent deployments. The only traffic that should be going through the firewall is traffic influenced by the UDR (next-hop traffic leaving the subnet): traffic to the managed AKS master nodes, for example, but not traffic between nodes. We've seen this step fail occasionally and are thinking of just removing it. As for your experience with changing the firewall rules for this to work, that's what's got me scratching my head. You mentioned changing INBOUND rules as well as part of the solution. Can you say more about that? The FW is not responsible for gating inbound traffic.
Hi @ckittel, I have since managed to narrow this down to port 9000. The AKS egress rules docs list this as a requirement: "For tunneled secure communication between the nodes and the control plane. This is not required for private clusters." So the working firewall rule looks like this:
Rule Collection Group: DefaultNetworkRuleCollectionGroup
My latest understanding is that by default the managed AKS control plane runs separately from the node pools; we don't have access to it. So it would make sense to me that we need this rule to allow node traffic to leave the subnet and pass out through the firewall to wherever in Azure the control plane is running. Again, this still doesn't explain why it works for everyone else though :)
I understand that kubectl run is not something you generally want to advocate; however, it's still core k8s functionality that I would expect to work in this setup (same with kubectl logs). In my case it's not showing the right thing: I can't get the 403 or any other response, because the terminal can't even attach without the above rule in place. I'd appreciate any other thoughts you have on this, particularly around the port 9000 requirement. Happy to provide any other info as needed.
Your understanding is correct on the separation of the node pools and the managed portion of AKS. But this deployment already deploys the necessary firewall rules. Did all of the other firewall rules get deployed when you did the deployment? Are you by chance deploying on an older version of AKS? Port 9000 was used with tunnelfront, but your cluster should be using konnectivity at this point (which is all 443). One other question: is the cluster being deployed to the same region as the firewall? It's strange that the first failure in this process shows up this far into the instructions. I can't think of a reason you'd have successfully gotten past all of the prior validation steps before running into an error at this step. It's weird that this one step (vs. any of the other kubectl commands you've had to execute to get to this point) would be the one that trips it up. Something seems "off" here for sure. Please validate that the firewall's rules all got deployed properly and that the cluster is otherwise healthy before this validation step (this includes that flux was installed and is syncing, etc.). The comment I had about for
If I'm reading correctly I can see two firewall policies created within hub-regionA.json:
They all look to have been successfully deployed in my environment.
This is interesting, and quite possibly the issue. I haven't specified the cluster version anywhere so I'm just using whatever I get with these templates, which currently seems to be 1.21.2:
From what I can see konnectivity was introduced in k8s 1.18, though I don't see any reference to it in the AKS docs. It appears my cluster is using tunnelfront...
Yes, everything is being deployed in the West Europe region.
Well, that certainly explains what's happening. We removed the port 9000 requirement back in #199, as it was no longer required with new deployments of AKS. It's fascinating that you're getting tunnelfront on your 1.21.2 cluster. Azure/AKS#2452 was raised back in June by another customer, also in West Europe, who noticed the same thing. I wonder if, oddly, that rollout never fully completed? Any chance you can "me too" on the linked issue and see if there are any known updates on the konnectivity rollout? Maybe the rollout has only been done in certain regions (and the default regions of this repo happen to have that change, but other regions do not yet). I feel like we're getting somewhere on this, but we might need the AKS product team's input.
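Editor's sketch: the diagnosis above hinges on which tunnel component a cluster is actually running. A minimal way to classify it, assuming you have the pod names from the kube-system namespace (for example from `kubectl get pods -n kube-system -o name`), might look like the following. The pod-name prefixes and the function name are assumptions for illustration, not something confirmed in this thread.

```python
from typing import Iterable

# Assumed pod-name prefixes for each tunnel component (hypothetical mapping;
# verify against the pods you actually see in kube-system).
TUNNEL_PREFIXES = {
    "tunnelfront": "tunnelfront",
    "aks-link": "aks-link",
    "konnectivity": "konnectivity-agent",
}


def detect_tunnel_component(pod_names: Iterable[str]) -> str:
    """Return the first tunnel component whose prefix matches a pod name."""
    for raw_name in pod_names:
        # `kubectl ... -o name` prints "pod/<name>"; keep only the name part.
        name = raw_name.strip().split("/")[-1]
        for component, prefix in TUNNEL_PREFIXES.items():
            if name.startswith(prefix):
                return component
    return "unknown"
```

Passing the lines of the kubectl output to `detect_tunnel_component` returns one of the keys above, or `"unknown"` if no matching pod is present.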
I am experiencing the same issue as documented by brk3, with the following difference: there is no instance of tunnelfront to explain what is going on, and opening port 9000 does not resolve the issue. That is expected in this case, since it's not tunnelfront. In all other respects my results are exactly the same as brk3's.
Following along with https://docs.microsoft.com/en-us/azure/firewall/protect-azure-kubernetes-service as brk3 did, I've tried opening different ports and port combinations. Allowing UDP traffic on port 1194 restored my ability to attach to the pod and execute the curl query. It appears to me that 1194 is still required. My hub is in East US 2, if it matters.
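Editor's sketch: the ports reported so far in this thread can be summarized as a small lookup. This is only a summary of what commenters observed (tcp/9000 for tunnelfront, udp/1194 on a cluster without tunnelfront, tcp/443 for konnectivity), not an authoritative list of AKS egress requirements. Attributing udp/1194 to aks-link is an assumption, and the names `REPORTED_EGRESS` and `missing_egress_rules` are hypothetical.

```python
from typing import Iterable, List, Tuple

Rule = Tuple[str, int]  # (protocol, port)

# Egress ports as reported in this thread, keyed by tunnel component.
# Assumption: the udp/1194 report came from a cluster running aks-link.
REPORTED_EGRESS = {
    "tunnelfront": [("TCP", 9000)],
    "aks-link": [("UDP", 1194)],
    "konnectivity": [("TCP", 443)],
}


def missing_egress_rules(component: str, allowed: Iterable[Rule]) -> List[Rule]:
    """List the reported rules for `component` that are absent from `allowed`."""
    allowed_set = set(allowed)
    required = REPORTED_EGRESS.get(component, [])
    return [rule for rule in required if rule not in allowed_set]
```

For example, `missing_egress_rules("aks-link", [("TCP", 443)])` returns the udp/1194 rule, which matches the workaround described above.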
It looks like konnectivity is rolling out more broadly now. Since the egress affordances for aks-link have been replaced with the simplified egress rules found in this reference implementation for konnectivity, I'm going to close this issue. But if your region doesn't use konnectivity, then the conversation above will help. It's just a matter of timing between the two, unfortunately.
* Allow communication with API server via udp/1194. References: #223 https://docs.microsoft.com/en-us/azure/firewall/protect-azure-kubernetes-service
* Return IP address instead of res. ID (acc to doc)
* Minimal user feedback: echo variables to console.
* ifconfig.io to return IPv4 addr for access policy
* Notes for macOS users, having BSD sed.
* Improvement to comment. Co-authored-by: Chad Kittel
* Comment out firewall rule, but add hints.
* Enable FW rule in bicep; remove warning. Co-authored-by: Chad Kittel
Hi, thanks for putting this reference guide together, I've found it really useful.
I'm having an issue connecting to pods on the worker nodes as shown in step 9.4:
The result is the same with kubectl logs:
Looking at Microsoft's troubleshooting page I see this exact error, but it's unclear to me as of yet which NSG I may need to modify, or if this is indeed the actual problem. I'm assuming it would be the subnet 'snet-clusternodes' but the NSG attached to this is wide open...
Update: I managed to get this working by adding a network rule to the firewall opening all ports in both directions, effectively disabling the firewall. Narrowing it down to the kubelet port (10250) doesn't work, so I still have more questions than answers. Are certain internal cluster comms traversing the firewall? If so, why? I'm also curious as to why other people are not seeing this issue when using these templates.