Support Egress using IPs from a separate subnet #5799

tnqn · 2023-12-14T16:56:47Z

By default, it's assumed that the IPs allocated from the pool are in the
same subnet as the Node IPs. In some cases, users want to use IPs in
different subnets as Egress IPs. Additionally, users may want to use
VLAN tagging to segment the Egress traffic and the Node traffic.

The commit implements the requirements by introducing an optional field,
subnetInfo, to the ExternalIPPool resource. The subnetInfo field
contains the subnet attributes of the IPs in this pool. When using a
different subnet:

* gateway and prefixLength must be set. Antrea will route Egress
  traffic to the specified gateway when the destination is not in the
  same subnet as the Egress IP, otherwise route it to the destination
  directly.
* Optionally, you can specify vlan if the underlying network is
  expecting it. Once set, Antrea will tag Egress traffic leaving the
  Egress Node with the specified VLAN ID. Correspondingly, it's
  expected that reply traffic towards these Egress IPs is also tagged
  with the specified VLAN ID when arriving at the Egress Node.

The implementation involves VLAN sub-interfaces and policy routing.

* For a given subnet with a VLAN ID, a separate VLAN sub-interface will
  be created to hold the Egress IPs allocated from it. Egress traffic
  and its reply traffic will be sent over and received from the VLAN
  sub-interface for proper tagging and untagging.
* For a given subnet, a separate route table will be created, routing
  the selected Egress traffic to the specified gateway, or to its
  neighbor.
* For multiple Egress IPs allocated from the same subnet, a
  separate IP rule will be created for each Egress IP, matching its
  packet mark and looking up the shared route table.

The feature is gated by the alpha "EgressSeparateSubnet" feature gate.

antoninbas · 2023-12-19T17:53:19Z

typos in PR description:
s/users may want to use VLAN taggaing/users may want to use VLAN tagging
s/when the destination is not in the same subnet of the Egress IP/when the destination is not in the same subnet as the Egress IP
s/For multiple Egress IPs associated allocated from the same subnet/For multiple Egress IPs allocated from the same subnet
s/reply traffic towards these Egress IPs are also tagged with the specified VLAN ID when arriving the Egress Node./reply traffic towards these Egress IPs is also tagged with the specified VLAN ID when arriving at the Egress Node.

antoninbas

initial review, I am not completely done yet

antoninbas · 2023-12-19T18:00:32Z

pkg/apis/crd/v1beta1/types.go

@@ -210,6 +210,8 @@ type ExternalIPPool struct {
 type ExternalIPPoolSpec struct {
 	// The IP ranges of this IP pool, e.g. 10.10.0.0/24, 10.10.10.2-10.10.10.20, 10.10.10.30-10.10.10.30.
 	IPRanges []IPRange `json:"ipRanges"`
+	// The Subnet info of this IP pool. If set, all IP ranges in the IP pool should share the same subnet attributes.
+	SubnetInfo *SubnetInfo `json:"subnetInfo,omitempty"`


Since ExternalIPPool is used by other features besides Egress (at least by ServiceExternalIP), will this field ever be relevant for these other features? For example, VLAN tagging / untagging for external Service traffic?

Should the documentation mention that this field is only used when an IP is allocated from the pool for the Egress feature, and is ignored otherwise?

I haven't verified whether it can be used by ServiceExternalIP, but it is likely infeasible because ServiceExternalIP requires that the IP not be physically assigned to the Node, which can't work with a VLAN sub-interface according to my experiments when implementing this feature.

I have added a comment to the field about this.

Do you think it would be better to reuse SubnetIPRange to be consistent with IPPool?

I considered this in the beginning but found some deficiencies in this way:

With a list of SubnetIPRange in a single pool, it's possible to have different subnets in a pool, which makes the implementation complex and inefficient: for example, when we want to know the subnet info of an IP, we have to loop over all ranges to figure it out; whenever there is a change in any of the IP ranges, we have to resync all resources associated with IPs allocated from it, increasing the span of a single pool.

It may be unclear to users that they should create a pool for each subnet or a pool for all subnets. From API's perspective, one object for one resource is clearer and simpler.

It's redundant to fill in the same subnet info when multiple IP ranges share the subnet.

I talked about this in the community meeting https://www.youtube.com/watch?v=q3vaVzzH6WM, starting from 20:42.

With your points, should we update IPPool to have a single SubnetInfo?

Yes. I discussed this with @gran-vmv; I think there was no specific reason why IPPool was designed that way, and we can update it when promoting it to beta.

antoninbas · 2023-12-19T18:10:41Z

pkg/apis/crd/v1beta1/types.go

+	// Gateway IP for this subnet, e.g. 10.10.1.1.
+	Gateway string `json:"gateway"`
+	// Prefix length for the subnet, e.g. 24.
+	PrefixLength int32 `json:"prefixLength"`


I was wondering if it would be more natural for people to explicitly have to provide a CIDR for the subnet and a separate IP address (part of the CIDR) for the gateway, as opposed to deducing the subnet from the gateway IP + prefix length. This may be more in line with how people are used to interacting with ip tools? But I don't have a strong opinion.

I referred to SubnetInfo in IPPool for this, though not completely. Let me check how the struct was determined and how such configuration is provided in other systems.

It seems that when IPPool was designed, it just asked users for the minimum information needed to construct a subnet. Asking for a subnet may be a bit tedious for users, as they may already have provided the same information in ipRanges. For example, the 2nd config below would look strange, and we would need one more validation to ensure the provided subnet matches the IP ranges.

ipRanges:
- cidr: 10.10.0.0/24
subnetInfo:
  gateway: 10.10.0.1
  prefixLength: 24
  vlan: 10

ipRanges:
- cidr: 10.10.0.0/24
subnetInfo:
  gateway: 10.10.0.1
  subnet: 10.10.0.0/24
  vlan: 10
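
To make the trade-off concrete, here is a minimal sketch of how a subnet could be derived from `gateway` + `prefixLength` and checked against a `cidr` entry of `ipRanges`. The helper names `subnetOf` and `cidrInSubnet` are hypothetical and only for illustration; the actual validation in pkg/controller/externalippool/validate.go may be structured differently.

```go
package main

import (
	"fmt"
	"net/netip"
)

// subnetOf (hypothetical) derives the subnet from the gateway IP and the
// prefix length, e.g. 10.10.0.1 + 24 gives 10.10.0.0/24.
func subnetOf(gateway string, prefixLength int) (netip.Prefix, error) {
	gw, err := netip.ParseAddr(gateway)
	if err != nil {
		return netip.Prefix{}, err
	}
	return gw.Prefix(prefixLength)
}

// cidrInSubnet (hypothetical) reports whether the whole CIDR range lies
// inside the subnet: it must be at least as specific and start within it.
func cidrInSubnet(cidr string, subnet netip.Prefix) (bool, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return false, err
	}
	p = p.Masked()
	return p.Bits() >= subnet.Bits() && subnet.Contains(p.Addr()), nil
}

func main() {
	subnet, err := subnetOf("10.10.0.1", 24)
	if err != nil {
		panic(err)
	}
	for _, cidr := range []string{"10.10.0.0/24", "10.10.1.0/24"} {
		ok, err := cidrInSubnet(cidr, subnet)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s in %s: %v\n", cidr, subnet, ok)
	}
}
```

With this shape, users never state the subnet twice, so there is no second value that can disagree with the first.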

antoninbas · 2023-12-19T18:16:03Z

build/charts/antrea/crds/externalippool.yaml

+                    prefixLength:
+                      type: integer
+                      minimum: 1
+                      maximum: 128


nit: should it be 127? I think you reject 128 for IPv6 addresses in the validation webhook

antoninbas · 2023-12-19T18:22:18Z

pkg/controller/externalippool/validate.go

+			start := net.ParseIP(ipRange.Start)
+			end := net.ParseIP(ipRange.End)


out of curiosity, do we ever validate that start <= end?

Guess no, will add it via another PR, I found some other issues in validation when implementing this.
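
For reference, a start <= end check could look like the sketch below. It is an assumption about what such a validation might do, not the code that was later added; `validateRange` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/netip"
)

// validateRange (hypothetical) checks that both ends parse, belong to the
// same address family, and that start does not come after end.
func validateRange(start, end string) error {
	s, err := netip.ParseAddr(start)
	if err != nil {
		return fmt.Errorf("invalid start IP %q: %w", start, err)
	}
	e, err := netip.ParseAddr(end)
	if err != nil {
		return fmt.Errorf("invalid end IP %q: %w", end, err)
	}
	if s.BitLen() != e.BitLen() {
		return fmt.Errorf("start %s and end %s are not in the same IP family", s, e)
	}
	if e.Less(s) {
		return fmt.Errorf("start %s is greater than end %s", s, e)
	}
	return nil
}

func main() {
	fmt.Println(validateRange("10.10.10.2", "10.10.10.20"))
	fmt.Println(validateRange("10.10.10.20", "10.10.10.2"))
}
```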

antoninbas · 2023-12-19T18:23:34Z

pkg/controller/externalippool/validate_test.go

+			},
+		},
+		{
+			name: "Adding unmatched SubnetInfo should not be allowed",


the name of this test case is invalid, this is a valid case as shown by the expectedResponse

thanks for catching it

antoninbas · 2023-12-19T18:26:20Z

pkg/agent/controller/egress/egress_controller.go

+	minEgressTable = 101
+	maxEgressTable = 120


nit: could we add Route to the name, e.g. minEgressRouteTable?

antoninbas · 2023-12-19T18:28:12Z

pkg/agent/controller/egress/egress_controller.go

+	// minEgressTable to maxEgressTable are the route table IDs that can be configured on a Node for Egress traffic.
+	// Each distinct subnet uses one route table. 20 subnets should be enough.


Do we allocate any other table ID for any other feature? I don't think we do, except maybe for policyOnly mode, which doesn't support Egress anyway. Just wondering if we should define this in a "central" place, like we do for packet marks, OVS registers, etc

Good idea, moved it to a common package. I will see if there are other usages and, if so, move them to the same file via another PR.

antoninbas · 2023-12-19T18:31:32Z

pkg/agent/controller/egress/egress_controller.go

@@ -70,6 +70,11 @@ const (
 	// maxEgressMark is the maximum mark of Egress IPs can be configured on a Node.
 	maxEgressMark = 255

+	// minEgressTable to maxEgressTable are the route table IDs that can be configured on a Node for Egress traffic.
+	// Each distinct subnet uses one route table. 20 subnets should be enough.


I know the limit is per Node theoretically, but I was wondering if we should mention in the Egress documentation that we recommend staying under 20 subnets across all ExternalIPPools (when explicitly providing subnetInfo).

Sure, added a note in egress.md:

Currently, the maximum number of different subnets that can be supported in a cluster is 20, which should be sufficient for most cases. If you need to have more subnets, please raise an issue with your use case, and we will consider revising the limit based on that.

antoninbas · 2023-12-19T18:32:40Z

docs/egress.md

+* Optionally, you can specify `vlan` if the underlying network is expecting it.
+Once set, Antrea will tag Egress traffic leaving the Egress Node with the
+specified VLAN ID. Correspondingly, it's expected that reply traffic towards
+these Egress IPs are also tagged with the specified VLAN ID when arriving the


s/are also tagged/is also tagged
s/when arriving the Egress Node/when arriving at the Egress Node

antoninbas · 2023-12-19T18:39:38Z

pkg/agent/controller/egress/egress_controller.go

+	if pool.Spec.SubnetInfo == nil {
+		return
+	}
+	c.queue.Add(pool.Name)


I feel like I may be asking a basic question, but how can we share the same workqueue between Egress resources and ExternalIPPool resources?

good catch, the code was added at the last minute and I haven't tested it.
Will add a unit test to cover it.

jianjuns · 2024-01-02T23:04:58Z

pkg/apis/crd/v1beta1/types.go

@@ -210,6 +210,8 @@ type ExternalIPPool struct {
 type ExternalIPPoolSpec struct {
 	// The IP ranges of this IP pool, e.g. 10.10.0.0/24, 10.10.10.2-10.10.10.20, 10.10.10.30-10.10.10.30.
 	IPRanges []IPRange `json:"ipRanges"`
+	// The Subnet info of this IP pool. If set, all IP ranges in the IP pool should share the same subnet attributes.
+	SubnetInfo *SubnetInfo `json:"subnetInfo,omitempty"`


Do you think it would be better to reuse SubnetIPRange to be consistent with IPPool?

jianjuns

Could you explain to me how stale rules are cleaned up after agent restarts?

tnqn · 2024-01-03T05:56:51Z

Could you explain to me how stale rules are cleaned up after agent restarts?

It's done by RestoreEgressRoutesAndRules, we simply remove all routes and rules associated with table IDs between 101 and 120. Later we can try a more graceful way, but we reallocate marks anyway, so there is no big difference before we can make marks permanent.
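
As a rough illustration of that cleanup, the sketch below selects the rules whose table ID falls in the Egress range so that they can be deleted on restart. The `ipRule` struct and `egressRules` function are hypothetical stand-ins; the real code operates on netlink rules and routes.

```go
package main

import "fmt"

const (
	minEgressTable = 101
	maxEgressTable = 120
)

// ipRule is a simplified, hypothetical view of an IP rule: packets carrying
// Mark are looked up in route table Table.
type ipRule struct {
	Mark  uint32
	Table int
}

// egressRules returns the rules that point at an Egress route table, i.e.
// the ones that would be removed after an agent restart.
func egressRules(rules []ipRule) []ipRule {
	var stale []ipRule
	for _, r := range rules {
		if r.Table >= minEgressTable && r.Table <= maxEgressTable {
			stale = append(stale, r)
		}
	}
	return stale
}

func main() {
	rules := []ipRule{{Mark: 1, Table: 101}, {Mark: 0, Table: 254}, {Mark: 2, Table: 120}}
	for _, r := range egressRules(rules) {
		fmt.Printf("would delete rule: mark=%d table=%d\n", r.Mark, r.Table)
	}
}
```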

antoninbas

I think one thing that came up during the community meeting 2 weeks ago was whether overlapping subnets (CIDRs) could exist for different VLAN IDs. IIRC, this is not supported (and perhaps some parts of the code make that assumption). Do we have any validation for this or any meaningful error log? Any need to mention it in the documentation?

antoninbas · 2024-01-03T19:03:32Z

ci/kind/kind-setup.sh

@@ -66,11 +69,17 @@ where:
  --subnets: a subnet creates a separate Docker bridge network (named 'antrea-<idx>') with the assigned subnet. A worker
    Node will be connected to one of those network. Default is empty: all worker Nodes connected to the default Docker
    bridge network created by kind.
+  --vlan-subnets: specifies the subnets of the VLAN to which all Nodes will be connected, in addition to the primary network.
+    The IP expression of the subnet will be used as the gateway IP. For example, '--vlan-subnets 10.100.100.1/24' means
+    10.100.100.1/24 will be assigned to the VLAN sub-interface of the network.


I find the last sentence a bit confusing (specifically "VLAN sub-interface of the network"). Maybe: "means that a VLAN sub-interface will be created on the primary Docker bridge, and it will be assigned the 10.100.100.1/24 address"

antoninbas · 2024-01-03T19:04:26Z

ci/kind/kind-setup.sh

+    The IP expression of the subnet will be used as the gateway IP. For example, '--vlan-subnets 10.100.100.1/24' means
+    10.100.100.1/24 will be assigned to the VLAN sub-interface of the network.
+  --vlan-id: specifies the ID of the VLAN to which all Nodes will be connected, in addition to the primary network. Note,
+    '--vlan-subnets' and '--vlan-id' must be specified together.


nit: maybe give an error if only one is specified (instead of silently ignoring it)

antoninbas · 2024-01-03T19:06:30Z

ci/kind/kind-setup.sh

+      docker_run_with_host_net iptables -t filter -D FORWARD -i $bridge_interface -o $interface_name -j ACCEPT || true
+      docker_run_with_host_net iptables -t filter -D FORWARD -o $bridge_interface -i $interface_name -j ACCEPT || true


logically, it would make more sense to delete the iptables rules before deleting the interface? I don't know what happens to iptables rules referencing an interface that doesn't exist.

done.
BTW, it's ok to reference a non-existing interface in iptables rules.

antoninbas · 2024-01-03T19:08:49Z

test/e2e/framework.go

@@ -466,6 +482,38 @@ func (data *TestData) RunCommandOnNodeExt(nodeName, cmd string, envs map[string]
 	return data.provider.RunCommandOnNodeExt(nodeName, cmd, envs, stdin, sudo)
 }

+func (data *TestData) collectExternalInfo() error {


the function never returns an error, I don't know if this was intended

added error when input is invalid.

antoninbas · 2024-01-03T19:13:05Z

test/e2e/framework.go

+	externalServerIPv4 string
+	externalServerIPv6 string


I know that for existing Egress e2e tests, we use a network namespace to simulate an external server (see getCommandInFakeExternalNetwork). It's too bad that we have 2 separate mechanisms, but I assume that:

the netns way doesn't work for VLAN testing?

the new way is specific to Kind, hence we want to keep the netns way as well for Jenkins tests?

yes

we can change the other test to leverage the external server, that's also why the external server is not made specific to vlan configuration, but I'd like to do it via another PR, the current PR is already too large. I feel it's not worth running the test in a hacky way on non-kind testbeds.

antoninbas · 2024-01-03T20:10:28Z

pkg/agent/ipassigner/ip_assigner_linux.go

+// It can be used to determine whether it's safe to delete an interface when it's no longer used.
+const vlanInterfacePrefix = "antrea-ext."
+
+// assignee is the unit that IPs are assigned to. All IPs from the same subnet share an assignee.


I don't know if the distinction is meaningful here, but I think there is one assignee per VLAN ID, not per "subnet". Maybe the term "subnet" is a bit misleading here.

antoninbas · 2024-01-03T20:10:55Z

pkg/agent/ipassigner/ip_assigner_linux.go

+	// The field must not be nil.
+	logicalInterface *net.Interface
+	// link is used for IP link management and IP address add/del operation. The field can be nil if IPs don't need to
+	// assigned to an interface physically.


s/to assigned/to be assigned

antoninbas · 2024-01-03T20:11:34Z

pkg/agent/ipassigner/ip_assigner_linux.go

+
+func (as *assignee) destroy() error {
+	if err := netlink.LinkDel(as.link); err != nil {
+		return fmt.Errorf("error deleting interface %v", as.link)


not wrapping the actual error is intentional here?

antoninbas · 2024-01-03T20:17:09Z

pkg/agent/ipassigner/ip_assigner_linux.go

-				}
-			}
-		}
+func (a *ipAssigner) InitIPs(desired map[string]*crdv1b1.SubnetInfo) error {


The previous version of InitIPs would acquire the mutex for the entire function execution, but now we acquire the mutex for individual calls to AssignIP / UnassignIP. I assume this is ok based on how the function is meant to be used (called once for initialization, and not concurrently with other calls), but I just wanted to double-check with you.

Your understanding is correct, added a comment to make it clear.

It's not thread-safe and should only be called once for initialization before calling other methods.

antoninbas · 2024-01-03T20:18:07Z

pkg/agent/ipassigner/ip_assigner_linux.go

 	return nil
 }

+func (a *ipAssigner) GetInterfaceID(subnetInfo *crdv1b1.SubnetInfo) (int, bool) {
+	as, _ := a.getAssignee(subnetInfo, false)


not protected by read lock?

my bad, missed it when adding the method
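
For context, a read-only accessor guarded by a read lock typically looks like the sketch below. The `registry` type is hypothetical and much simpler than `ipAssigner`; it only shows the `sync.RWMutex` pattern the comment refers to.

```go
package main

import (
	"fmt"
	"sync"
)

// registry (hypothetical) maps a VLAN ID to an interface index.
type registry struct {
	mu     sync.RWMutex
	ifaces map[int]int
}

func newRegistry() *registry {
	return &registry{ifaces: map[int]int{}}
}

// set takes the write lock because it mutates the map.
func (r *registry) set(vlan, ifIndex int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.ifaces[vlan] = ifIndex
}

// interfaceID only reads, so a read lock is enough and allows concurrent readers.
func (r *registry) interfaceID(vlan int) (int, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	id, ok := r.ifaces[vlan]
	return id, ok
}

func main() {
	r := newRegistry()
	r.set(10, 7)
	fmt.Println(r.interfaceID(10))
	fmt.Println(r.interfaceID(20))
}
```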

jianjuns · 2024-01-03T21:48:01Z

pkg/agent/route/interfaces.go

@@ -59,6 +59,21 @@ type Interface interface {
 	// DeleteSNATRule should delete rule to SNAT outgoing traffic with the mark.
 	DeleteSNATRule(mark uint32) error

+	// RestoreEgressRoutesAndRules restores the routes and rules configured on the system for Egress to the cache.


Egress -> Egresses?

jianjuns · 2024-01-03T21:48:18Z

pkg/agent/route/route_linux.go

@@ -980,6 +993,38 @@ func (c *Client) listIPRoutesOnGW() ([]netlink.Route, error) {
 	return routes, nil
 }

+// RestoreEgressRoutesAndRules simply deletes all IP routes and rules created for Egress for now.


jianjuns · 2024-01-03T21:50:43Z

docs/egress.md

@@ -198,6 +199,52 @@ The `ipRanges` field contains a list of IP ranges representing the available IPs
 of this IP pool. Each IP range may consist of a `cidr` or a pair of `start` and
 `end` IPs (which are themselves included in the range).

+### SubnetInfo
+
+By default, it's assumed that the IPs allocated from the pool are in the same


the pool -> an ExternalIPPool

jianjuns · 2024-01-03T21:52:40Z

docs/egress.md

+these Egress IPs is also tagged with the specified VLAN ID when arriving at the
+Egress Node.
+
+An example of ExternalIPPool using a different subnet is as below:


different -> non-default

jianjuns · 2024-01-03T21:53:50Z

docs/egress.md

+**Note**: Specifying different subnets is currently in alpha version. To use
+this feature, users should enable the `EgressSeparateSubnet` feature gate.
+Currently, the maximum number of different subnets that can be supported in a
+cluster is 20, which should be sufficient for most cases. If you need to have


The limit is per Node, not per cluster?

It's per Node. However, it would be risky for users to use more than 20 subnets in a cluster because they don't control how the Egress IPs are distributed, and in the worst case (e.g. most Egress candidate Nodes crash and the Egresses are scheduled to a few remaining Nodes) it may sometimes work and sometimes not. So I feel we should just ask users not to use more than 20 subnets in a cluster until we support it.

Another reason is that if we say 20 subnets per Node, we need to explain that they can't assume the cluster limit is 20*Nodes...

Ok, we can start from 20 per cluster. But I assume there can be cases that users configure different VLANs on different sets of Egress Nodes. This seems more likely to happen when Nodes are connected to physical network. We can see if users ask for more VLANs.

tnqn

@antoninbas @jianjuns thanks for your review. I have addressed all comments in the 2nd commit.

tnqn · 2024-01-04T05:01:31Z

I think one thing that came up during the community meeting 2 weeks ago was whether overlapping subnets (CIDRs) could exist for different VLAN IDs. IIRC, this is not supported (and perhaps some parts of the code make that assumption). Do we have any validation for this or any meaningful error log? Any need to mention it in the documentation?

Let me try if it can work with overlapping subnets, will add validation or documentation if it can't. But I assume this can be in another PR given it's not common to use overlapping external IPs.

tnqn · 2024-01-04T08:23:23Z

I think one thing that came up during the community meeting 2 weeks ago was whether overlapping subnets (CIDRs) could exist for different VLAN IDs. IIRC, this is not supported (and perhaps some parts of the code make that assumption). Do we have any validation for this or any meaningful error log? Any need to mention it in the documentation?

Let me try if it can work with overlapping subnets, will add validation or documentation if it can't. But I assume this can be in another PR given it's not common to use overlapping external IPs.

It can't work, created #5842 for enhancing the validation.

antoninbas

LGTM, only 2 minor comments / questions

antoninbas · 2024-01-04T15:23:34Z

build/charts/antrea/templates/webhooks/validating/crdvalidator.yaml

        apiGroups: ["crd.antrea.io"]
-        apiVersions: ["v1alpha2"]
+        apiVersions: ["v1alpha2", "v1beta1"]


I missed this change before, should it be in a separate PR?

I thought this was a bug and that it couldn't work without listing v1beta1 here. However, it turns out to work even without it because the matchPolicy defaults to Equivalent. So at least for now this works with or without v1beta1.
I kept the change because I need to add "CREATE" anyway.

Sounds good, no need for the separate PR then

antoninbas · 2024-01-04T15:37:40Z

pkg/agent/openflow/pipeline.go

 		fb = fb.Action().GotoTable(EgressQoSTable.GetID())
 	} else {
-		fb = fb.MatchCTStateNew(true).


I also didn't see this change before. I assume it's not specific to this PR. Was it just a redundant match?

It's related to the patch. Previously we only needed the first packet of a connection to be marked because SNAT applies to the whole connection. But for policy routing, we need all packets of a connection to be marked.

Makes sense

antoninbas

typo in PR description that can be fixed when merging: s/taggaing/tagging

LGTM

tnqn · 2024-01-04T15:52:41Z

typo in PR description that can be fixed when merging: s/taggaing/tagging

thanks, corrected description, will edit commit message when merging.

tnqn · 2024-01-05T02:18:48Z

/test-all
/test-ipv6-all
/test-ipv6-only-all

tnqn · 2024-01-05T02:19:17Z

/test-windows-all

In certain e2e tests (e.g., Egress, ServiceExternalIP, NodePort/LoadBalancer Service), an external client/server is required. Currently, we use either a network namespace or a Node deployed with Antrea for this purpose. However, a network namespace can only be created on a single Node to run the tests within that Node, and a Node deployed with Antrea has its network configuration affected by Antrea, potentially impacting related e2e tests. A similar functionality was introduced in antrea-io#5799: an external container is created after Kind cluster setup to serve as an external server/client for most e2e tests.

For more complex e2e tests (e.g., involving an FRR router), the requirements include:

* K8s-managed creation and deletion of the FRR router.
* A network environment for the FRR router unaffected by Antrea.
* Maximizing reuse of existing test framework code.

To meet these needs, this commit introduces an option to add an extra worker Node to the Kind cluster where Antrea will not be deployed. This allows deploying a host network Pod on that Node, ensuring a clean network environment unaffected by Antrea. Signed-off-by: Hongliang Liu <[email protected]>

luolanzone added this to the Antrea v1.15 release milestone Dec 15, 2023

tnqn force-pushed the egress-vlan branch 6 times, most recently from bbb649e to 524e43b Compare December 19, 2023 15:53

tnqn changed the title ~~Support tagging Egress traffic with VLAN~~ Support Egress using IPs from a separate subnet Dec 19, 2023

tnqn marked this pull request as ready for review December 19, 2023 16:15

tnqn requested review from antoninbas, wenqiq, jianjuns and xliuxu December 19, 2023 16:51

antoninbas reviewed Dec 19, 2023

tnqn force-pushed the egress-vlan branch 3 times, most recently from 7672b02 to f2d525a Compare December 20, 2023 04:53

luolanzone added the action/release-note Indicates a PR that should be included in release notes. label Dec 20, 2023

tnqn force-pushed the egress-vlan branch 4 times, most recently from c46f060 to c6024ac Compare December 27, 2023 11:46

tnqn force-pushed the egress-vlan branch 3 times, most recently from fe8cb8c to ad1fe76 Compare January 2, 2024 15:53

jianjuns reviewed Jan 2, 2024

tnqn force-pushed the egress-vlan branch from ad1fe76 to 31b44be Compare January 3, 2024 04:28

jianjuns reviewed Jan 3, 2024

tnqn force-pushed the egress-vlan branch from 31b44be to 6ac68e8 Compare January 3, 2024 05:49

tnqn force-pushed the egress-vlan branch from 6ac68e8 to 35612bd Compare January 3, 2024 05:55

antoninbas reviewed Jan 3, 2024

jianjuns reviewed Jan 3, 2024

tnqn commented Jan 4, 2024

tnqn force-pushed the egress-vlan branch from 507b48d to 34cb15e Compare January 4, 2024 06:18

Address comments

1f1f268

Signed-off-by: Quan Tian <[email protected]>

tnqn force-pushed the egress-vlan branch from 34cb15e to 1f1f268 Compare January 4, 2024 08:15

antoninbas reviewed Jan 4, 2024

antoninbas approved these changes Jan 4, 2024

jianjuns approved these changes Jan 4, 2024

tnqn merged commit 43427a1 into antrea-io:main Jan 5, 2024
52 of 60 checks passed

tnqn deleted the egress-vlan branch January 5, 2024 06:04

tnqn mentioned this pull request Jan 22, 2024

Enhance ExternalIPPool validation #5898

Merged

tnqn mentioned this pull request Feb 5, 2024

Unify the IPPool subnet and ExteranalIPPool subnet definition #5961

Closed

hongliangl mentioned this pull request Jun 26, 2024

Support deploying one FRR container in Kind network #6488

Merged
