Unable to get pod logs from AKS cluster #97
This issue seems to be related:
@nitinkhandelwal26 I see that you have closed this issue. Did you find the information that you need?
Thanks @neilpeterson for the support. No, I still didn't get that information, but my issue was resolved by removing the subnet-level NSG. I still need to find out why; if you have any information on that, it would be really helpful.
We don't document any subnet-level NSG-specific port requirements that I'm aware of outside of our general egress guidance. AKS applies NSG rules to the NICs in your cluster, but if you're applying rules at the subnet level as you said, it's your responsibility to ensure they don't interfere with normal healthy traffic. If you find a ruleset that works for you, please do share. I'm guessing your ruleset will look similar to the documented ruleset in the link above, plus any additional openings demanded by extra configuration added to the cluster. Out of curiosity, what specific problem are you looking to solve with added subnet-level NSG rules that the NIC-level NSG rules plus egress via the firewall don't already solve?
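If it helps with the comparison, here's a rough sketch of how you could inspect the NSG that AKS manages in the node resource group; the resource group and cluster names below are placeholders, not values from this reference implementation:
# Find the node resource group, then list the AKS-managed NSG and its rules (names are placeholders)
NODE_RG=$(az aks show -g <resource-group> -n <cluster-name> --query nodeResourceGroup -o tsv)
az network nsg list -g "$NODE_RG" -o table
az network nsg rule list -g "$NODE_RG" --nsg-name <nsg-name-from-the-list-above> -o table
Comparing that output against any subnet-level NSG you plan to add is a reasonable starting point.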
@ckittel Our network team, which provides the hub-spoke topology, implemented this.
I will surely share the ruleset once this is fixed; for now I've removed the NSGs to make it work. Thank you @ckittel for your support.
I also have this issue. Kubelet (port 10250) is not reachable from the kube-apiserver. I can get pods, but I can't access logs (the timeout error mentioned above). I added an inbound rule to the spoke-nodepools NSG for the ports mentioned here https://docs.microsoft.com/en-us/azure/aks/limit-egress-traffic and here https://kubernetes.io/docs/reference/ports-and-protocols/. It's still not working. What's wrong with my approach?
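Concretely, the kind of rule I'm describing looks roughly like this; the resource group, NSG name, and address prefixes are placeholders for my environment, not values from the reference implementation:
# Illustrative only: allow kubelet (10250) inbound on the node pool NSG (all names are placeholders)
az network nsg rule create -g <spoke-rg> --nsg-name <spoke-nodepools-nsg> \
  -n AllowKubeletInbound --priority 200 --direction Inbound --access Allow \
  --protocol Tcp --source-address-prefixes VirtualNetwork \
  --destination-address-prefixes VirtualNetwork --destination-port-ranges 10250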
@Cogax thanks for reporting. We have an FAQ entry on this error, but it's not very detailed. We've seen some inconsistency in network rules depending on whether your cluster is running konnectivity or not. It seems to depend on what region you deploy into (as far as we can tell so far). I'm curious what you see when you check whether your cluster is running konnectivity or tunnelfront (a sketch for that is below).
I just deployed this cluster two days ago and I can freely get logs across various pods in both node pools without that error. But I know we've run into the error you've seen before as well, so any help in triage will be appreciated. Let's start with konnectivity vs tunnelfront and see if that helps narrow this down. Also, what region did you deploy into? cc: @abossard
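To check which one your cluster is running, something like this should do it (a minimal sketch; konnectivity-agent, tunnelfront, and aks-link are the pod names AKS typically uses for these components):
# Look for the tunnel component in kube-system: konnectivity-agent vs tunnelfront/aks-link
kubectl get pods -n kube-system -o wide | grep -E 'konnectivity-agent|tunnelfront|aks-link'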
Thanks for your quick response. I found your FAQ article while researching this issue, but it didn't help me. I opened all inbound traffic (any source, destination, port, etc.) on all NSGs, but it had no effect. I removed the whole setup, so I can't give you exact answers. I will recreate it later and check if the issue still exists. Some information I have at the moment:
"agentPoolProfiles": [
{
"name": "npsystem",
"count": 1,
"vmSize": "Standard_DS2_v2",
"osDiskSizeGB": 80,
"osDiskType": "Ephemeral",
"osType": "Linux",
"minCount": 1,
"maxCount": 1,
"vnetSubnetID": "[variables('vnetNodePoolSubnetResourceId')]",
"enableAutoScaling": true,
"type": "VirtualMachineScaleSets",
"mode": "System",
"scaleSetPriority": "Regular",
"scaleSetEvictionPolicy": "Delete",
"orchestratorVersion": "[parameters('kubernetesVersion')]",
"enableNodePublicIP": false,
"maxPods": 30,
"availabilityZones": ["1", "2", "3"],
"upgradeSettings": {
"maxSurge": "33%"
},
"nodeTaints": ["CriticalAddonsOnly=true:NoSchedule"]
},
{
"name": "npuser01",
"count": 1,
"vmSize": "Standard_DS3_v2",
"osDiskSizeGB": 120,
"osDiskType": "Ephemeral",
"osType": "Linux",
"minCount": 1,
"maxCount": 1,
"vnetSubnetID": "[variables('vnetNodePoolSubnetResourceId')]",
"enableAutoScaling": true,
"type": "VirtualMachineScaleSets",
"mode": "User",
"scaleSetPriority": "Regular",
"scaleSetEvictionPolicy": "Delete",
"orchestratorVersion": "[parameters('kubernetesVersion')]",
"enableNodePublicIP": false,
"maxPods": 30,
"availabilityZones": ["1", "2", "3"],
"upgradeSettings": {
"maxSurge": "33%"
}
  }
]
kubectl apply -f kubernetes/new-aks/cluster-manifests/kube-system/container-azm-ms-agentconfig.yaml
kubectl apply -f kubernetes/new-aks/cluster-manifests/cluster-baseline-settings/kured.yaml
kubectl apply -f kubernetes/new-aks/cluster-manifests/cluster-baseline-settings/aad-pod-identity.yaml
kubectl apply -f kubernetes/new-aks/cluster-manifests/a0008/ingress-network-policy.yaml
Hope that helps. I will recreate the whole setup I did and update this issue later.
I think we're on to something. We made a change to this repo to migrate to konnectivity's network rulesets (basically everything over 443) instead of what tunnelfront/aks-link required (see #199, specifically the rules that were removed there). See the related conversation happening at #223, where @brk3 had a similar observation.
I encountered the problem getting pod logs until I allowed the node IPs to access port 9000 of the API server in the hub firewall's network rules. This is documented in the link below. I would suggest amending the hub-regionA.json ARM template for that. https://docs.microsoft.com/en-us/azure/aks/limit-egress-traffic#azure-global-required-network-rules
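For reference, something along these lines is the rule I mean, shown with the Azure CLI rather than the ARM template; the firewall name, resource group, region, and source range are placeholders to adapt to hub-regionA.json:
# Allow the node subnet to reach the API server tunnel on TCP 9000 (placeholders throughout)
az network firewall network-rule create -g <hub-rg> -f <hub-firewall-name> \
  --collection-name 'aks-tunnel' -n 'allow-api-tcp-9000' --priority 210 --action Allow \
  --protocols 'TCP' --source-addresses '<node-subnet-cidr>' \
  --destination-addresses 'AzureCloud.<region>' --destination-ports 9000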
@ccyflai -- can I ask what region you were deploying to? Just want to see if the pattern continues to emerge here. Glad you added that extra firewall rule to proceed. Don't forget to remove it once konnectivity is used within your cluster, as it won't be necessary anymore.
I deployed in southeastasia.
It looks like konnectivity is rolling out more broadly now. Since the egress affordances for aks-link have been replaced with the simplified egress rules found in this reference implementation for konnectivity, I'm going to close this issue. But if your region doesn't use konnectivity yet, the conversation above will help. It's just a matter of timing between the two, unfortunately.
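If you're triaging the same symptom, a quick sketch for checking whether a subnet-level NSG is attached to your node pool subnet and what it allows; the resource group, VNet, and subnet names are placeholders:
# Is an NSG attached to the node pool subnet, and do its rules allow kubelet traffic on 10250? (placeholders)
az network vnet subnet show -g <spoke-rg> --vnet-name <spoke-vnet> -n <nodepool-subnet> \
  --query networkSecurityGroup.id -o tsv
az network nsg rule list -g <spoke-rg> --nsg-name <subnet-nsg> -o table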
Hello Team,
I am trying to get a pod's logs using:
kubectl logs csi-secrets-store-9wr95 -n cluster-baseline-settings -c secrets-store
and I get the output below:
Error from server: Get https://aks-npuser01-42213062-vmss00000c:10250/containerLogs/cluster-baseline-settings/csi-secrets-store-9wr95/secrets-store: dial tcp 10.10.128.197:10250: i/o timeout
The portal shows something similar.
I don't know why I am unable to fetch logs from any pod. Can you please help me here?