plugin type="calico" failed (delete): error getting ClusterInformation: connection is unauthorized: Unauthorized #9295

ClaudZen · 2024-09-30T21:00:57Z

Hello, I have deployed a MicroK8s cluster with 3 master nodes and 3 worker nodes, using MicroCeph for persistence. Periodically, certain pods on the worker nodes cannot be created or deleted because of an authorization error in the Calico plugin.

Expected Behavior

Pods should be created and deleted on worker nodes at any time without encountering Calico-related authorization errors.

Current Behavior

Periodically, pods running on worker nodes cannot be evicted or scheduled due to the Calico authorization issue.

Possible Solution

The only workaround I've found is to delete the Calico pods on the affected worker nodes, which resolves the issue temporarily.
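A minimal sketch of that workaround, assuming the `k8s-app=calico-node` label and `kube-system` namespace used by the upstream Calico manifest on MicroK8s. The function name and the node name `worker-1` are hypothetical. It only builds the command, so it can be reviewed before being run; the calico-node DaemonSet is expected to recreate the deleted pod.

```python
import shlex
from typing import List, Sequence


def calico_node_delete_cmd(
    node: str, kubectl: Sequence[str] = ("microk8s", "kubectl")
) -> List[str]:
    """Build a kubectl command that deletes the calico-node pod on one node.

    Assumes the upstream manifest's `k8s-app=calico-node` label and the
    `kube-system` namespace; the DaemonSet should then recreate the pod.
    """
    return [
        *kubectl,
        "-n", "kube-system",
        "delete", "pod",
        "-l", "k8s-app=calico-node",
        # Restrict the deletion to the pod scheduled on the given node.
        "--field-selector", f"spec.nodeName={node}",
    ]


if __name__ == "__main__":
    # "worker-1" is a hypothetical node name; substitute the affected worker.
    print(shlex.join(calico_node_delete_cmd("worker-1")))
```

Pass `kubectl=("kubectl",)` when not going through the `microk8s` wrapper.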

Steps to Reproduce (for bugs)

I'm not sure how to consistently reproduce the bug.

Context

Here are the solutions I have tried so far:

I upgraded Calico from version 3.24.5 to 3.27.4 and updated the cluster role binding using this manifest https://github.com/ClaudZen/calico-problem/blob/main/cni.yaml, but the error still occurs periodically.

I also noticed that when the issue occurs, the /etc/cni/net.d/calico-kubeconfig file stops being updated. As a result, the token it contains becomes outdated, which causes the problem. When I restart the Calico pod on the worker node, the file is updated with a new token.
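To confirm that the token in `/etc/cni/net.d/calico-kubeconfig` is actually stale, one option is to decode it and compare its `exp` claim with the current time. This is a minimal, hypothetical helper, assuming the token is stored inline on a `token:` line and is a JWT. It reads the payload without verifying the signature, so it is only suitable for diagnosis; a `None` result means the token carries no `exp` claim.

```python
import base64
import json
import re
import time
from pathlib import Path
from typing import Optional

# Path of the CNI kubeconfig mentioned in this issue.
KUBECONFIG_PATH = Path("/etc/cni/net.d/calico-kubeconfig")


def extract_token(kubeconfig_text: str) -> str:
    """Return the first inline `token:` value in a kubeconfig.

    Assumption: the token is written inline as `token: <jwt>`; a real
    YAML parser would be more robust than this regex.
    """
    match = re.search(
        r"^\s*token:\s*\"?([A-Za-z0-9._-]+)\"?\s*$", kubeconfig_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no inline token field found")
    return match.group(1)


def jwt_claims(token: str) -> dict:
    """Decode (without verifying) the payload segment of a JWT."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> Optional[float]:
    """Seconds until `exp`; negative if already expired, None if no `exp` claim."""
    claims = jwt_claims(token)
    if "exp" not in claims:
        return None
    current = time.time() if now is None else now
    return claims["exp"] - current


if __name__ == "__main__":
    if KUBECONFIG_PATH.exists():
        remaining = seconds_until_expiry(extract_token(KUBECONFIG_PATH.read_text()))
        if remaining is None:
            print("token has no exp claim")
        elif remaining <= 0:
            print(f"token expired {-remaining:.0f}s ago")
        else:
            print(f"token valid for another {remaining:.0f}s")
    else:
        print(f"{KUBECONFIG_PATH} not found on this host")
```

Running it periodically on a worker node would show whether the token expires without being replaced.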

This is the microk8s inspect report for one affected worker node:
inspection-report-20240930_202905.tar.gz

These are my resources:

Your Environment

Calico version: 3.27.4
Orchestrator version (e.g. Kubernetes, Mesos, rkt): MicroK8s 1.29.9
Operating System and version: Ubuntu 22.04.4 LTS (Jammy Jellyfish)
MicroCeph is deployed on the 3 master nodes
Master node specs: 4 cores, 60 GB disk, 100 GB disk, 8 GB RAM
Worker node specs: 4 cores, 60 GB disk, 16 GB RAM

Request for Help

I would appreciate any guidance or help in understanding why this issue with Calico token authorization occurs periodically and how it might be permanently resolved. Specifically, I am interested in learning:

How to ensure the /etc/cni/net.d/calico-kubeconfig file remains up-to-date without needing to restart the Calico pods.
If there are any best practices for managing token updates with Calico to avoid outdated tokens.
Any known issues related to this behavior in Calico or MicroK8s, and if there are fixes or configurations I should be aware of.

Thank you in advance for any help or insights you can provide.

nicjohnson145 · 2024-10-03T01:34:35Z

I'm also pretty consistently running into this issue as well.

caseydavenport · 2024-10-04T16:09:54Z

For anyone hitting this, there are a number of potentially related issues (or at least issues with similar symptoms) to look at (e.g., #9271).

This comment summarizes some of the things to check: #8368 (comment)

caseydavenport · 2024-10-04T16:15:24Z

@ClaudZen based on your description, I think this might be something different than the resolution for the linked issue though - you don't seem to have any RBAC resources in the process of being deleted, and you are not installing using the tigera/operator.

In general, I'd highly recommend using the tigera/operator to manage Calico - it should provide a better experience and is where the majority of our testing and development is focused.

Some things to check:

Is Calico getting throttled by the apiserver? (calico-node cannot refresh expired serviceaccount token due to apiserver throttling #7694)
Is your cluster configured to use token projection for service account tokens? (https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#serviceaccount-token-volume-projection)
Have you made any changes in your calico.yaml from the upstream manifests?
Are there any error / warning logs from the calico-node pod on the affected node?
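For the token projection question, one rough way to tell token types apart is to inspect their claims. Bound tokens (issued through the TokenRequest API or projected volumes) include `exp` and a nested `kubernetes.io` claim, while legacy Secret-based tokens use flat `kubernetes.io/serviceaccount/...` claims and have no expiry. The helper below is a heuristic sketch under that assumption (the function names are hypothetical); it could be fed the token from a pod's `/var/run/secrets/kubernetes.io/serviceaccount/token` or from the CNI kubeconfig. It does not verify signatures.

```python
import base64
import json


def decode_claims(token: str) -> dict:
    """Decode (without verifying the signature) the payload of a JWT."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def classify_sa_token(token: str) -> str:
    """Heuristically classify a Kubernetes service account token.

    Returns "bound" for tokens with `exp` and a nested `kubernetes.io`
    claim, "legacy" for Secret-based tokens with flat claims, and
    "unknown" otherwise.
    """
    claims = decode_claims(token)
    if "exp" in claims and "kubernetes.io" in claims:
        return "bound"
    if "kubernetes.io/serviceaccount/namespace" in claims:
        return "legacy"
    return "unknown"
```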

caseydavenport · 2024-10-22T16:25:17Z

@ClaudZen @nicjohnson145 any additional info on your end?

tomastigera · 2024-11-05T17:34:34Z

Feel free to reopen if you managed to narrow the issue down and/or have additional information like tcpdumps, logs, iptables dumps etc.

caseydavenport added the kind/support label Oct 22, 2024

tomastigera closed this as completed Nov 5, 2024
