plugin type="calico" failed (delete): error getting ClusterInformation: connection is unauthorized: Unauthorized #9295

Closed
ClaudZen opened this issue Sep 30, 2024 · 5 comments

Comments

@ClaudZen

Hello, I have deployed a MicroK8s cluster with 3 master nodes and 3 worker nodes, using MicroCeph for persistence. Periodically, certain pods on worker nodes cannot be created or deleted because of an authorization error from the Calico plugin.


Expected Behavior

Pods should be created and deleted on worker nodes at any time without encountering Calico-related authorization errors.

Current Behavior

Periodically, pods running on worker nodes cannot be evicted or scheduled due to the Calico authorization issue.

Possible Solution

The only temporary solution I've found is to delete the Calico pods associated with the worker nodes, which resolves the issue temporarily.

Steps to Reproduce (for bugs)

I'm not sure how to consistently reproduce the bug.

Context

Here are the solutions I have tried so far:

I upgraded Calico from version 3.24.5 to 3.27.4 and updated the cluster role binding using this manifest https://github.com/ClaudZen/calico-problem/blob/main/cni.yaml, but the error still occurs periodically.

I also noticed that the /etc/cni/net.d/calico-kubeconfig file stops being updated. As a result, the token in it becomes outdated, which causes the error. When I restart the Calico pod on the worker node, the file is rewritten with a new token.
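To confirm that an outdated token is really the cause, the expiry time stored in the token can be read directly from the file. The following is a minimal sketch, not part of Calico or MicroK8s. It assumes the kubeconfig contains a `token:` field and that the token is a JWT carrying an `exp` claim; the helper names (`extract_token`, `jwt_claims`, `seconds_until_expiry`, `report`) are hypothetical. It only decodes the payload and does not verify the signature.

```python
import base64
import json
import re
import time
from typing import Optional


def extract_token(kubeconfig_text: str) -> Optional[str]:
    """Return the first `token:` value found in the kubeconfig text, or None.

    Assumption: the file stores the credential as a plain `token:` field.
    """
    match = re.search(
        r"^\s*token:\s*[\"']?([A-Za-z0-9_\-.=]+)[\"']?\s*$",
        kubeconfig_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None


def jwt_claims(token: str) -> dict:
    """Decode the (unverified) payload of a three-part JWT."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("token is not a three-part JWT")
    payload = parts[1]
    # base64url payloads omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> Optional[float]:
    """Return seconds until the `exp` claim; negative if already expired.

    Returns None when the token has no `exp` claim.
    """
    exp = jwt_claims(token).get("exp")
    if exp is None:
        return None
    return exp - (time.time() if now is None else now)


def report(path: str = "/etc/cni/net.d/calico-kubeconfig") -> None:
    """Print how long the token in the given kubeconfig remains valid."""
    with open(path, encoding="utf-8") as handle:
        token = extract_token(handle.read())
    if token is None:
        print("no token field found in", path)
        return
    remaining = seconds_until_expiry(token)
    if remaining is None:
        print("token has no exp claim")
    elif remaining <= 0:
        print(f"token expired {-remaining:.0f} seconds ago")
    else:
        print(f"token valid for another {remaining:.0f} seconds")
```

Calling `report()` as root on an affected worker node, once while the error is occurring and once after restarting the Calico pod, should show whether the failures line up with an expired token.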


This is the microk8s inspect report for one affected worker node:
inspection-report-20240930_202905.tar.gz

These are my resources:

Your Environment

  • Calico version: 3.27.4
  • Orchestrator version (e.g. Kubernetes, Mesos, rkt): MicroK8s 1.29.9
  • Operating System and version: Ubuntu 22.04.4 LTS (Jammy Jellyfish)
  • MicroCeph is deployed on the 3 master nodes
  • Master nodes specs: 4 cores, 60 GB disk, 100 GB disk, 8 GB RAM
  • Worker nodes specs: 4 cores, 60 GB disk, 16 GB RAM

Request for Help

I would appreciate any guidance or help in understanding why this issue with Calico token authorization occurs periodically and how it might be permanently resolved. Specifically, I am interested in learning:

  • How to ensure the /etc/cni/net.d/calico-kubeconfig file remains up-to-date without needing to restart the Calico pods.
  • If there are any best practices for managing token updates with Calico to avoid outdated tokens.
  • Any known issues related to this behavior in Calico or MicroK8s, and if there are fixes or configurations I should be aware of.

Thank you in advance for any help or insights you can provide.

@nicjohnson145

I'm running into this issue pretty consistently as well.

@caseydavenport
Member

For anyone hitting this, there are a number of potentially related issues (or at least issues with similar symptoms) to look at, e.g. #9271.

This comment summarizes some of the things to check: #8368 (comment)

@caseydavenport
Member

@ClaudZen based on your description, I think this might be a different problem from the one resolved in the linked issue - you don't seem to have any RBAC resources in the process of being deleted, and you are not installing using the tigera/operator.

In general, I'd highly recommend using the tigera/operator to manage Calico - it should provide a better experience and is where the majority of our testing and development is focused.

Some things to check:

@caseydavenport
Member

@ClaudZen @nicjohnson145 any additional info on your end?

@tomastigera
Contributor

Feel free to reopen if you manage to narrow the issue down and/or have additional information such as tcpdumps, logs, iptables dumps, etc.
