Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump Calico image version #2878

Merged
merged 12 commits into from
Feb 16, 2022
Merged

Bump Calico image version #2878

merged 12 commits into from
Feb 16, 2022

Conversation

joedborg
Copy link
Contributor

@joedborg joedborg commented Jan 24, 2022

Introduce a mechanism to upgrade the calico CNI and upgrade to 3.21.4.

Also in this PR:

  • Removal of the offloading the apiserver kicker does
  • Unit testing called by GH action -> tox -e unit

@joedborg joedborg self-assigned this Jan 24, 2022
@ktsakalozos
Copy link
Member

Thank you for this PR @joedborg.

With this PR we want to also remove the TCP offloading we have in place to mitigate the issue in [1]. We will need to update the apiserver-kicker not to do any offloading. ALso please investigate if we need to configure calico in a different way. We need to do this update because the code we have for offloading seems to be freezing the apiserver on arm64/rpi.

[1] projectcalico/calico#3145
[2] https://github.com/ubuntu/microk8s/blob/master/microk8s-resources/wrappers/apiservice-kicker

Copy link
Member

@ktsakalozos ktsakalozos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the comments.

@ktsakalozos
Copy link
Member

Also @joedborg please take the calico manifest from [1] and apply the changes described in [2]. There are new sections introduced in the calico manifest so just updating the image tags might be enough.

[1] https://projectcalico.docs.tigera.io/archive/v3.19/getting-started/kubernetes/self-managed-onprem/onpremises
[2] https://github.com/ubuntu/microk8s/blob/master/docs/build.md

@joedborg
Copy link
Contributor Author

Let's see if that builds.

@joedborg joedborg requested a review from ktsakalozos January 24, 2022 21:05
@ktsakalozos
Copy link
Member

This is better @joedborg, we need however to also take into account the clusters that get refreshed and are still running the the 3.19.1 calico. In such clusters the TCP offloading should be kept in place. Only new clusters should have the offloading disabled.

@joedborg
Copy link
Contributor Author

This is better @joedborg, we need however to also take into account the clusters that get refreshed and are still running the the 3.19.1 calico. In such clusters the TCP offloading should be kept in place. Only new clusters should have the offloading disabled.

In this case, should we keep offloading or check the calico version in the refresh hook (and upgrade calico to 3.19.3)?

@ktsakalozos
Copy link
Member

I think it would be best to try to upgrade calico, it is a patch release so there should not be any danger. The challenge in upgrading is how would we do this refresh as at the time the refresh (or configure hooks) kubernetes may be stopped (eg via microk8s stop). Maybe we could remove the ${SNAP_DATA}/var/lock/cni-loaded lock, replace the cni.yaml in ${SNAP_DATA}/args/cni-network/cni.yaml and let the apiservice-kicker to re-apply the cni manifest when the API server comes up. In this case however we would need to parse the old cni to see if we have set the "can-reach" [2] to the new cni manifest.

[1] https://github.com/ubuntu/microk8s/blob/master/microk8s-resources/wrappers/apiservice-kicker#L86
[2] https://github.com/ubuntu/microk8s/blob/master/scripts/cluster/common/utils.py#L273

@MBcom MBcom mentioned this pull request Jan 31, 2022
@joedborg joedborg marked this pull request as draft February 9, 2022 20:38
@joedborg
Copy link
Contributor Author

joedborg commented Feb 9, 2022

Have added upgrade-scripts/003-calico-3-19-3/patch-cni-yaml.py, which actually patches the cni.yaml file. Will need to add the shell scripts / hooks in place to remove the empty document (---\n---) and run this. @ktsakalozos, is there any docs for how these hooks work / which to use?

@ktsakalozos
Copy link
Member

@joedborg changing the tag numbers is not what we want to do for this upgrade. See #2878 (comment)

We have to start with the calico manifest from upstream and update the IP_AUTODETECTION_METHOD [1] to whatever is currently used.

Could we please also move directly to v3.latest so we match the version charmed k8s has?

The upgrade scripts are used during the ha-cluster upgrade [2].

[1] https://github.com/ubuntu/microk8s/blob/master/upgrade-scripts/000-switch-to-calico/resources/calico.yaml#L3941
[2] https://github.com/canonical/microk8s-addons/blob/main/addons/ha-cluster/enable#L33

@ktsakalozos ktsakalozos marked this pull request as ready for review February 15, 2022 09:53
Copy link
Contributor

@neoaggelos neoaggelos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calico-related work LGTM.

One code-related comment: upgrade-scripts (up to this point) use a specific format of "hooks" ({commit,prepare,rollback}-{node,master}.sh). I would strongly prefer that either 003-calico-upgrade follow the same scheme, or be moved to a separate location entirely.

Copy link
Contributor

@neoaggelos neoaggelos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Tested:

  • Refresh to this snap from 1.23 (single-node + cluster. for clusters, refreshing a worker node has no effect. calico is only updated when the first control plane node gets the update)
  • Fresh install

tox.ini Outdated
@@ -23,7 +23,7 @@ commands =
black --diff --check --exclude "/(\.eggs|\.git|\.tox|\.venv|\.build|dist|charmhelpers|mod)/" .

[testenv:unit]
commands = pytest -s upgrade-scripts/003-upgrade-calico/test-upgrade-calico-cni.py
commands = pytest -s scripts/upgrade-calico/test-upgrade-calico-cni.py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; The test file could be moved under a tests/unit folder (to avoid clutter in the result snap build).

@ktsakalozos ktsakalozos force-pushed the MK-174/calico-upgrade branch from 2877844 to c0bc33e Compare February 16, 2022 16:40
@ktsakalozos ktsakalozos merged commit 857a200 into master Feb 16, 2022
@ktsakalozos ktsakalozos deleted the MK-174/calico-upgrade branch February 16, 2022 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants