Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dual-stack support in host-local IPAM + Canal #5313

Merged

Conversation

manuelbuil
Copy link
Contributor

@manuelbuil manuelbuil commented Dec 20, 2021

Signed-off-by: Manuel Buil [email protected]

Description

This PR provides dual-stack support for canal. It requires three changes in the canal config/manifest:

Change 1

The ipam section of the cni_network_config for dual-stack must be defined like this:

          "ipam": {
              "type": "host-local",
              "ranges": [
                  [
                      {
                          "subnet": "usePodCidr"
                      }
                  ],
                  [
                      {
                          "subnet": "usePodCidrIPv6"
                      }
                  ]
             ]
          },

The effective changes are:
a) the "subnet": "usePodCidr" moves inside the ranges slice
b) we define a new subnet called usePodCidrIPv6

This PR does not break the current way of deploying canal for ipv4 single-stack. If you want dual-stack, you'll need to apply this config change. If not, you are fine with the current config.

Change 2

The flannel image of the kube-flannel container, must move to, at least, version 0.15 which support dual-stack. For example:
image: quay.io/coreos/flannel:v0.15.1

Change 3

We must activate the dual-stack feature in flannel. To do so, in the flannel net-conf.json, we must add the following fields:

"IPv6Network": "2001:dada:42:0::/56",
"EnableIPv6": true,

The IPv6Network can be anything. We will anyway read the CIDR assigned to the node

In the code, the essential change is that we are now asking for node.Spec.PodCIDRs instead of node.Spec.PodCIDR. As a consequence, we will receive a slice of strings, which will have length=1 in the single-stack case and length=2 in the dual-stack case. There is a new function getIPsByFamily that reads the podCidrs and returns the ipv4 and the ipv6 cidr. The former substitutes "subnet": "usePodCidr" in the host-local ipam config. The latter substitutes "subnet": "usePodCidrIPv6" in the host-local ipam config.

Moreover, we use a dummy ipv4 and ipv6 address to release the host-local IP and a test is included with usePodCidrIPv6

I did not change anything in UpdateHostLocalIPAMDataForWindows because dual-stack is not supported in k8s for vxlan scenarios in Windows: https://kubernetes.io/docs/setup/production-environment/windows/intro-windows-in-kubernetes/#ipv6-networking (added a comment about this)

Related issues/PRs

Todos

  • Tests
  • Documentation
  • Release note

Release Note

None required

@manuelbuil manuelbuil requested a review from a team as a code owner December 20, 2021 16:02
@marvin-tigera marvin-tigera added this to the Calico v3.22.0 milestone Dec 20, 2021
@marvin-tigera marvin-tigera added docs-not-required Docs not required for this change release-note-not-required Change has no user-facing impact labels Dec 20, 2021
@CLAassistant
Copy link

CLAassistant commented Dec 20, 2021

CLA assistant check
All committers have signed the CLA.

@caseydavenport caseydavenport self-assigned this Dec 20, 2021
@caseydavenport
Copy link
Member

/sem-approve

@caseydavenport
Copy link
Member

This looks lovely, thank you @manuelbuil!

Fixes #5188

@caseydavenport
Copy link
Member

Looks like a static check failure:

internal/pkg/utils/utils.go:358:3: ineffectual assignment to `subnet` (ineffassign)04:30
		subnet = ipv6Cidr04:30
		^04:30
../lib.Makefile:531: recipe for target 'golangci-lint' failed

@manuelbuil
Copy link
Contributor Author

Looks like a static check failure:

internal/pkg/utils/utils.go:358:3: ineffectual assignment to `subnet` (ineffassign)04:30
		subnet = ipv6Cidr04:30
		^04:30
../lib.Makefile:531: recipe for target 'golangci-lint' failed

Thanks, I see why

@manuelbuil
Copy link
Contributor Author

Is it normal for the CI to take more than 5 hours to report the result?

@fasaxc
Copy link
Member

fasaxc commented Dec 21, 2021

/sem-approve

@fasaxc
Copy link
Member

fasaxc commented Dec 21, 2021

Is it normal for the CI to take more than 5 hours to report the result?

CI only runs once a team member approves the PR to run.

@manuelbuil
Copy link
Contributor Author

Is it normal for the CI to take more than 5 hours to report the result?

CI only runs once a team member approves the PR to run.

Ah! Thanks for the info

@manuelbuil
Copy link
Contributor Author

I need some help to fix the unit tests because I don't know what is going on :(. 5 tests are failing with:

Unexpected error:
      <*types.Error | 0xc000c56420>: {
          Code: 999,
          Msg: "nodes \"localhost.localdomain\" not found",
          Details: "",
      }
      nodes "localhost.localdomain" not found
  occurred^

The 6th test that fails is the one I added. It fails because there is no dual-stack k8s cluster and thus there is no ipv6 cidr to replace "subnet": "usePodCidrIPv6". I tried adding an ipv6 cidr in --cluster-cidr= in https://github.com/projectcalico/calico/blob/master/cni-plugin/Makefile#L242 but I still get the error

@manuelbuil
Copy link
Contributor Author

After spending some hours digging the first error, I still did not find what is going on. I found that the error appears here:
https://github.com/projectcalico/calico/blob/master/cni-plugin/internal/pkg/testutils/utils_linux.go#L241, i.e.

r, err = invoke.ExecPluginWithResult(context.Background(), pluginPath, []byte(netconf), args, nil)

And the error is &types.Error{Code:0x3e7, Msg:"nodes \"localhost.localdomain\" not found", Details:""}

pluginPath is /go/src/github.com/projectcalico/calico/cni-plugin/bin/amd64/calico
netconf is:

"\n\t\t{\n\t\t  \"cniVersion\": \"0.3.1\",\n\t\t  \"name\": \"net1\",\n\t\t  \"type\": \"calico\",\n\t\t  \"etcd_endpoints\": \"http://192.168.1.14:2379\",\n\t\t  \"datastore_type\": \"etcdv3\",\n\t\t  \"ipam\": {\n\t\t    \"type\": \"host-local\",\n\t\t    \"subnet\": \"10.0.0.0/8\"\n\t\t  },\n\t\t  \"kubernetes\": {\n\t\t    \"kubeconfig\": \"/home/user/certs/kubeconfig\"\n\t\t  },\n\t\t  \"policy\": {\"type\": \"k8s\"},\n\t\t  \"nodename_file_optional\": true,\n\t\t  \"log_level\":\"info\"\n\t\t}"

args is:

&testutils.cniArgs{Env:[]string{"CNI_COMMAND=ADD", "CNI_IFNAME=eth0", "CNI_PATH=/go/src/github.com/projectcalico/calico/cni-plugin/bin/amd64", "CNI_CONTAINERID=cnitest-88", "CNI_NETNS=/var/run/netns/cnitest-88b68c82-2037-92e6-2887-1859ec3651d7", "CNI_ARGS=K8S_POD_NAME=run1570682536;K8S_POD_NAMESPACE=test;K8S_POD_INFRA_CONTAINER_ID=whatever"}}

I added a log in func cmdAdd(args *skel.CmdArgs) but I don't see it being executed. So somewhere in between fails. I lost track of the error in github.com/containernetworking/cni/pkg/invoke

@caseydavenport caseydavenport added release-note-required Change has user-facing impact (no matter how small) and removed release-note-not-required Change has no user-facing impact labels Dec 21, 2021
@caseydavenport
Copy link
Member

@manuelbuil I'll take a look and let you know what I find!

@@ -721,6 +721,52 @@ var _ = Describe("Kubernetes CNI tests", func() {
numIPv4IPs: 1,
numIPv6IPs: 1,
},
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This scenario is failing because the node created as part of this set of tests doesn't have dual-stack CIDRs present: https://github.com/projectcalico/calico/blob/master/cni-plugin/tests/calico_cni_k8s_test.go#L859-L865

We probably want a new section for dual-stack tests so that we don't reduce coverage for the legacy / single-stack case. i.e., we want tests for both PodCIDR and PodCIDRs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried something :)

@caseydavenport
Copy link
Member

@manuelbuil I think I figured out what is going on - stuck some comments on the PR in the relevant places. Let me know if you have any questions!

@manuelbuil manuelbuil force-pushed the canal-ipam-dualStack branch 2 times, most recently from a4c3cb0 to 2d96056 Compare December 22, 2021 15:27
@fasaxc
Copy link
Member

fasaxc commented Dec 22, 2021

/sem-approve

@manuelbuil
Copy link
Contributor Author

manuelbuil commented Dec 22, 2021

@manuelbuil I think I figured out what is going on - stuck some comments on the PR in the relevant places. Let me know if you have any questions!

Tests are finally running locally but in CI it fails with:

  �[91mUnexpected error:
      <*types.Error | 0xc002195740>: {
          Code: 999,
          Msg: "failed to allocate for range 0: 10.0.0.7 has been allocated to cnitest-51, duplicate allocation is not allowed",
          Details: "",
      }
      failed to allocate for range 0: 10.0.0.7 has been allocated to cnitest-51, duplicate allocation is not allowed
  occurred�[0m
�[91m�[1mSummarizing 1 Failure:�[0m

Any idea? Maybe I should change the range in the second set of tests? Let me try that

@caseydavenport
Copy link
Member

/sem-approve

@manuelbuil
Copy link
Contributor Author

@manuelbuil I think I figured out what is going on - stuck some comments on the PR in the relevant places. Let me know if you have any questions!

Tests are finally running locally but in CI it fails with:

  �[91mUnexpected error:
      <*types.Error | 0xc002195740>: {
          Code: 999,
          Msg: "failed to allocate for range 0: 10.0.0.7 has been allocated to cnitest-51, duplicate allocation is not allowed",
          Details: "",
      }
      failed to allocate for range 0: 10.0.0.7 has been allocated to cnitest-51, duplicate allocation is not allowed
  occurred�[0m
�[91m�[1mSummarizing 1 Failure:�[0m

Any idea? Maybe I should change the range in the second set of tests? Let me try that

It did not help. Same error again:

  �[91mUnexpected error:
      <*types.Error | 0xc000733d40>: {
          Code: 999,
          Msg: "failed to allocate for range 0: 10.0.0.10 has been allocated to cnitest-86, duplicate allocation is not allowed",
          Details: "",
      }
      failed to allocate for range 0: 10.0.0.10 has been allocated to cnitest-86, duplicate allocation is not allowed
  occurred�[0m

Any idea? Thanks!

@caseydavenport
Copy link
Member

@manuelbuil I'll take a look a this today - sorry for the delay, have been on vacation! Happy new year :)

@caseydavenport
Copy link
Member

@manuelbuil I think this might be a result of a flake in the gRPC test that is failing that has been fixed in master - could you try rebasing this PR against master and we'll see if the test passes?

@manuelbuil
Copy link
Contributor Author

@manuelbuil I'll take a look a this today - sorry for the delay, have been on vacation! Happy new year :)

No worries, I was on vacation as well :). Happy new year to you too!

@manuelbuil manuelbuil force-pushed the canal-ipam-dualStack branch from b5b491b to 44cd28c Compare January 5, 2022 08:56
@fasaxc
Copy link
Member

fasaxc commented Jan 5, 2022

/sem-approve

@manuelbuil manuelbuil force-pushed the canal-ipam-dualStack branch from 44cd28c to 32d008e Compare January 5, 2022 16:57
@fasaxc
Copy link
Member

fasaxc commented Jan 5, 2022

/sem-approve

@caseydavenport
Copy link
Member

Looks like the tests are happy now, hooray!

@caseydavenport caseydavenport merged commit 6aea48e into projectcalico:master Jan 6, 2022
@manuelbuil manuelbuil deleted the canal-ipam-dualStack branch January 7, 2022 07:57
@caseydavenport caseydavenport changed the title Add dual-stack support in Canal Add dual-stack support in host-local IPAM + Canal Jan 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-not-required Docs not required for this change release-note-required Change has user-facing impact (no matter how small)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants