
Add BGP capabilities to the NAT GW #4285

Merged · 15 commits · Jul 29, 2024
Conversation

@SkalaNetworks (Contributor) commented Jul 11, 2024

Pull Request

User-facing changes:

  • Introduces a flag that switches the BGP speaker into a "NAT GW mode", in which the speaker announces the EIPs linked to the gateway it is running in. This mode disables the announcement of Pod IPs, Services, and Subnets by the speaker. Note: this mode can only be set inside a NAT GW, as the pod looks for a specific new environment variable in the gateway to determine which one it is running in.
  • Introduces a new container in the NAT GW when the experimental BGP option is set in the NAT GW configmap. This container runs a BGP speaker that announces the EIPs added to the gateway if they carry the correct annotation.

Which issue(s) this PR fixes

Fixes #4217

This PR is highly experimental; do not merge into master.

To test:

We need a NAD for the interface that the speaker will use to talk to the apiserver.
For that, we follow the exact same pattern as for the vpc-dns.

Create a NAD with access to the kube-ovn default network:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ovn-nad
  namespace: default
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "kube-ovn",
      "server_socket": "/run/openvswitch/kube-ovn-daemon.sock",
      "provider": "ovn-nad.default.ovn"
    }'

Edit the ovn-default subnet to use the NAD as a provider.
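One way to do this (a sketch; it assumes the NAD above and that ovn-default is your default subnet):

kubectl patch subnet ovn-default --type merge \
  -p '{"spec":{"provider":"ovn-nad.default.ovn"}}'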

Create new permissions for the gateway to poll the apiserver:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:vpc-nat-gw
rules:
  - apiGroups:
    - ""
    resources:
    - services
    - pods
    verbs:
    - list
    - watch
  - apiGroups:
    - kubeovn.io
    resources:
    - iptables-eips
    - subnets
    - vpc-nat-gateways
    verbs:
    - list
    - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: vpc-nat-gw
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:vpc-nat-gw
subjects:
- kind: ServiceAccount
  name: vpc-nat-gw
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vpc-nat-gw
  namespace: kube-system

Launch a NAT GW and mark its EIP with the BGP annotation:
ovn.kubernetes.io/bgp: "true"
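For example (a sketch; the EIP and gateway names here are hypothetical):

apiVersion: kubeovn.io/v1
kind: IptablesEIP
metadata:
  name: eip-example
  annotations:
    ovn.kubernetes.io/bgp: "true"
spec:
  natGwDp: gw1   # the VpcNatGateway this EIP belongs to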

@SkalaNetworks (Contributor Author)

This is still in progress; I'll fix the conflicts in a few hours, but you can at least see how I got it working for now.

@zhangzujian (Member)

Great job! 🎉

@zhangzujian (Member)

Please add a Signed-off-by line to the commit message and rebase onto the master branch:

<COMMIT MESSAGE>

Signed-off-by: Author Name <[email protected]>
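For reference, a common way to do this (standard git usage, not from the PR thread; the remote name is an assumption):

git commit --amend --signoff          # append Signed-off-by to the latest commit
git fetch origin && git rebase origin/master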

@SkalaNetworks (Contributor Author)

Yeah, I've done that but haven't pushed yet. I'm seeing an awful lot of messages like:

E0712 04:50:36.827065 7 vpc_nat_gateway.go:719] failed to ExecuteCommandInContainer, errOutput: Error: any valid address is expected rather than "dev".
failed to exec command "ip route replace default via dev net1"

in the controller logs. I don't know whether my code broke something when I rebased onto the changes you made yesterday on the NAT gateway, or whether it's something else; I need to investigate.

Review comment on pkg/speaker/utils.go: fixed
@SkalaNetworks (Contributor Author)

Ah, you modified a bunch of stuff in the shell script of the nat-gateway image, and the pullPolicy is IfNotPresent, so I had an old version running. This is now fixed; everything seems to be working.

I'll add a few lines to make the image used for the speaker a parameter, and find a way to configure the BGP neighbors. Do you think we should make this a parameter of the VpcNatGateway CRD, or just something adjusted from the nat-gw configmap? @zhangzujian

@zhangzujian (Member)

CRD would be better.

@SkalaNetworks (Contributor Author) commented Jul 12, 2024

Edited the CRD to enable the BGP speaker per gateway and to override the parameters sent to kube-ovn-speaker.

You guys can look at it and tell me if it's worthy of a merge!
I'll write the documentation in the other repo for that feature.
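A sketch of what the per-gateway configuration could look like after this change (bgpSpeaker.enabled matches gw.Spec.BgpSpeaker.Enabled in the diff further down; the other fields and values are illustrative assumptions):

apiVersion: kubeovn.io/v1
kind: VpcNatGateway
metadata:
  name: gw1
spec:
  vpc: vpc1
  subnet: net1
  lanIp: 10.0.1.254
  bgpSpeaker:
    enabled: true
    asn: 65000          # local AS of the gateway speaker (assumed field)
    remoteAsn: 65500    # AS of the BGP neighbor (assumed field)
    neighbors:
      - 100.127.4.161   # router to peer with (assumed field)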

@zhangzujian added the feature (New network feature) label on Jul 14, 2024
Review comments (outdated, resolved): pkg/speaker/bgp.go (×2), pkg/speaker/eip.go, pkg/speaker/utils.go (×2)
@SkalaNetworks (Contributor Author)

Hi, thanks for the review, I'll make the commits.
Also linking this PR: https://github.com/kubeovn/docs/pull/181, as gofumpt is not installed by default.

@SkalaNetworks (Contributor Author)

Is the last test a flake?

@SkalaNetworks (Contributor Author)

@zhangzujian Hi, PR is now fully green!

@zhangzujian (Member)

E0720 08:07:53.667742      13 vpc_nat_gateway.go:294] failed to create statefulset 'vpc-nat-gw-gw1', err: StatefulSet.apps "vpc-nat-gw-gw1" is invalid: spec.template.annotations: Invalid value: ".kubernetes.io/routes": prefix part a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')

Please fix it.

@SkalaNetworks (Contributor Author)

The error is caused by your configuration lacking the NAD provider (the same setup as the one needed by the VPC DNS); I was relying on the same configmap as the VPC-DNS one by mistake. The VPC NAT GW configmap now has a field for providing a NetworkAttachmentDefinition so that the speaker can actually talk to the API server.

See the PR's first message for the complete setup of the NAD.

You need the field apiNadProvider: ovn-nad.default.ovn in your ovn-vpc-nat-config configmap (or any other value that points to a NAD connected to the default subnet).

I added an error message to make the problem more explicit.

@SkalaNetworks (Contributor Author)

I fixed the conflict introduced by the latest commit on main.

@SkalaNetworks (Contributor Author)

Well, apparently there's also a new linter now, so I need to make some changes to the fmt part.

@zhangzujian (Member)

I've been kind of busy recently. I will review the commits later.

@zhangzujian (Member)

k8s.v1.cni.cncf.io/networks: kube-system/ovn-vpc-external-network, default/

Still fails. Please fix it.

@@ -747,6 +790,18 @@ func (c *Controller) genNatGwStatefulSet(gw *kubeovnv1.VpcNatGateway, oldSts *v1
        util.LogicalSwitchAnnotation: gw.Spec.Subnet,
        util.IPAddressAnnotation:     gw.Spec.LanIP,
    }

    if gw.Spec.BgpSpeaker.Enabled { // Add an interface that can reach the API server
        defaultSubnet, err := c.subnetsLister.Get(c.config.DefaultLogicalSwitch)
Member:

Since we have introduced the configMap option apiNadProvider, we should use the NAD instead of the hard-coded default subnet.

Member:

We should use the function findSubnetByNetworkAttachmentDefinition instead of using the default subnet directly.

Contributor Author:

The Kubernetes apiserver runs in the default subnet, doesn't it? The NAD just happens to be connected to that subnet. Do you want me to:

  • Look up every subnet in the cluster
  • Determine which one has a provider equal to our NAD
  • Get its gateway

That can be an option.

Member:

The Kubernetes apiserver runs in the default subnet, doesn't it?

The K8s apiserver runs on control-plane nodes with host networking. It can be accessed from subnets in the default VPC.

Member:

The NAD can be pointed at ANY subnet running in the default VPC.

Contributor Author:

Cool. Should kube-ovn have a default NAD installed in the default VPC for kube-dns and the NAT GW? That would be extremely handy.

Member:

Here is an example I'm using:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ovn-nad
  namespace: default
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "kube-ovn",
      "server_socket": "/run/openvswitch/kube-ovn-daemon.sock",
      "provider": "ovn-nad.default.ovn"
    }'
---
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: vpc-apiserver-subnet
spec:
  protocol: IPv4
  cidrBlock: 100.100.100.0/24
  provider: ovn-nad.default.ovn

Member:

should kube-ovn have a default NAD installed in the default VPC for kube-dns and the NAT GW?

NAD resources can be created only when the NetworkAttachmentDefinition CRD is installed in the cluster, and that CRD has to be installed by the user.

@SkalaNetworks (Contributor Author)

k8s.v1.cni.cncf.io/networks: kube-system/ovn-vpc-external-network, default/

Still fails. Please fix it.

New ConfigMap option:

apiVersion: v1
data:
  apiNadName: ovn-nad
  apiNadProvider: ovn-nad.default.ovn
  bgpSpeakerImage: docker.io/kubeovn/vpc-nat-gateway:v1.13.0
  image: docker.io/kubeovn/vpc-nat-gateway:v1.13.0
kind: ConfigMap
metadata:
  name: ovn-vpc-nat-config
  namespace: kube-system

This should fix it.
I don't really understand why I need to provide the "networks" by their NAD name, but the configuration for routes etc. with the full provider name. Is there a specific reason?
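For illustration, the two notations in question, with values taken from the examples above:

k8s.v1.cni.cncf.io/networks: kube-system/ovn-vpc-external-network, default/ovn-nad    (NADs referenced as <namespace>/<name>)
provider: ovn-nad.default.ovn                                                         (subnet/route config keyed by <name>.<namespace>[.ovn])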

return errors.New("no NetworkAttachmentDefinition provided to access apiserver, check configmap ovn-vpc-nat-config and field 'apiNadName'")
}

nad := fmt.Sprintf("%s/%s, %s/%s", c.config.PodNamespace, externalNetwork, corev1.NamespaceDefault, vpcNatAPINadName)
Member:

If the NAD namespace is always default, there is no need to set both apiNadName and apiNadProvider in the configMap, since <apiNadProvider> equals <apiNadName>.default.ovn or <apiNadName>.default.

How about getting the NAD name and namespace from the NAD provider?
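A minimal sketch of that derivation (not the PR's actual code; the package and function names are hypothetical), assuming providers of the form <name>.<namespace> with an optional .ovn suffix:

package util

import (
    "fmt"
    "strings"
)

// nadFromProvider derives the NAD name and namespace from a provider string
// such as "ovn-nad.default.ovn" or "ovn-nad.default".
func nadFromProvider(provider string) (name, namespace string, err error) {
    parts := strings.Split(provider, ".")
    // Drop the optional ".ovn" suffix marking a Kube-OVN-managed NAD.
    if parts[len(parts)-1] == "ovn" {
        parts = parts[:len(parts)-1]
    }
    if len(parts) != 2 {
        return "", "", fmt.Errorf("invalid provider %q, expected <name>.<namespace>[.ovn]", provider)
    }
    return parts[0], parts[1], nil
}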

@zhangzujian (Member)

I don't really understand why I need to provide the "networks" by their NAD name, but the configuration for routes etc. with the full provider name. Is there a specific reason?

I'm not quite sure what you mean. What's the best implementation you have in mind?

@SkalaNetworks (Contributor Author)

I don't really understand why I need to provide the "networks" by their NAD name, but the configuration for routes etc. with the full provider name. Is there a specific reason?

I'm not quite sure what you mean. What's the best implementation you have in mind?

I probably don't understand what providers are from the point of view of Kube-OVN.
To me, they basically represent a subnet attached to a NAD (so, to a specific interface in the Pod). The Kube-OVN API reference says the value of the "provider" field in the subnet spec is how Kube-OVN finds the NetworkAttachmentDefinition:

Default value is ovn. In the case of multiple NICs, the value is <name>.<namespace> of the NetworkAttachmentDefinition; Kube-OVN will use this information to find the corresponding subnet resource.

But my NAD has a provider called "ovn-nad.default.ovn", not "ovn-nad.default", which makes me think Kube-OVN actually finds the NAD by the name of the provider, not by decomposing it into name + namespace? Or am I mistaken?

@SkalaNetworks (Contributor Author)

OK, it seems like the "dot ovn" in the provider is a special case handled by findSubnetByNetworkAttachmentDefinition.

@zhangzujian (Member)

But my NAD has a provider called "ovn-nad.default.ovn", not "ovn-nad.default", which makes me think Kube-OVN actually finds the NAD by the name of the provider, not by decomposing it into name + namespace? Or am I mistaken?

A NAD with the suffix .ovn means both the datapath and IPAM are provided by Kube-OVN. A NAD without that suffix means the subnet is used only to provide IPAM for another CNI plugin.

@zhangzujian merged commit 2884484 into kubeovn:master on Jul 29, 2024 (61 checks passed)
@oilbeater (Collaborator)

@SkalaNetworks can you add a doc to describe this feature?

@SkalaNetworks (Contributor Author)

I'll be doing it ASAP. I'm also adding the feature mentioned by @zhangzujian to make the NAD name/provider a single parameter.

@SkalaNetworks (Contributor Author)

Related PR: #4352

zbb88888 pushed a commit that referenced this pull request on Aug 13, 2024
Signed-off-by: SkalaNetworks <[email protected]>
Signed-off-by: bobz965 <[email protected]>
Labels
feature (New network feature)

Linked issues
[Feature Request] Announce NAT-GW EIP over BGP

3 participants