-
Notifications
You must be signed in to change notification settings - Fork 368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Egress using IPs from a separate subnet #5799
Conversation
bbb649e
to
524e43b
Compare
typos in PR description: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initial review, I am not completely done yet
@@ -210,6 +210,8 @@ type ExternalIPPool struct { | |||
type ExternalIPPoolSpec struct { | |||
// The IP ranges of this IP pool, e.g. 10.10.0.0/24, 10.10.10.2-10.10.10.20, 10.10.10.30-10.10.10.30. | |||
IPRanges []IPRange `json:"ipRanges"` | |||
// The Subnet info of this IP pool. If set, all IP ranges in the IP pool should share the same subnet attributes. | |||
SubnetInfo *SubnetInfo `json:"subnetInfo,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since ExternalIPPool is used by other features besides Egress (at least by ServiceExternalIP
), will this field ever be relevant for these other features? For example, VLAN tagging / untagging for external Service traffic?
Should the documentation mention that this field is only used when an IP is allocated from the pool for the Egress feature, and is ignored otherwise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't verified whether it can be used by ServiceExternalIP but likely infeasible because ServiceExternalIP requires the IP to be not assigned to the Node physically, which can't work with VLAN sub-interface according to my experiments when implementing this feature.
I have added a comment to the field about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think better to reuse SubnetIPRange
to be consistent with IPPool
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered this in the beginning but found some deficiencies in this way:
- With a list of SubnetIPRange in a single pool, it‘s possible to have different subnets in a pool, which makes implementation complex and inefficient: for example, when we want to know the subnet info of an IP, we have to loop over all ranges to figure it out; whenever there is a change in any of the IP ranges, we have to resync all resources associated with IPs allocated from it, increasing the span of the a single pool.
- It may be unclear to users that they should create a pool for each subnet or a pool for all subnets. From API's perspective, one object for one resource is clearer and simpler.
- It's redundant to fill the same subnet info when multiple ip ranges sharing the subnet.
I talked about this in the community meeting https://www.youtube.com/watch?v=q3vaVzzH6WM, starting from 20:42.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With your points, should we update IPPool to have a single SubnetInfo
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I discussed with @gran-vmv, I think there was no specific reason why IPPool was made so, we can update it when bumping it up to beta.
// Gateway IP for this subnet, e.g. 10.10.1.1. | ||
Gateway string `json:"gateway"` | ||
// Prefix length for the subnet, e.g. 24. | ||
PrefixLength int32 `json:"prefixLength"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if it would be more natural for people to explicitly have to provide a CIDR for the subnet and a separate IP address (part of the CIDR) for the gateway, as opposed to deducing the subnet from the gateway IP + prefix length. This may be more in line with how people are used to interact with ip
tools? But I don't have a strong opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refered to SubnetInfo in IPPool
for this, though not completely. Let me check how the struct was determined and how such configuration was provided in other systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems when IPPool
was designed, it just asked the least information from users to construct a subnet. Asking a subnet may be a bit tedious for users as they may have provided the same information in ipRanges
once. For example, the 2nd config below would look strange, and we need have one more validation to ensure the provided subnet matches the ip ranges.
ipRanges:
- cidr: 10.10.0.0/24
subnetInfo:
gateway: 10.10.0.1
prefixLength: 24
vlan: 10
ipRanges:
- cidr: 10.10.0.0/24
subnetInfo:
gateway: 10.10.0.1
subnet: 10.10.0.0/24
vlan: 10
prefixLength: | ||
type: integer | ||
minimum: 1 | ||
maximum: 128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should it be 127? I think you reject 128 for IPv6 addresses is the validation webhook
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
start := net.ParseIP(ipRange.Start) | ||
end := net.ParseIP(ipRange.End) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity, do we ever validate that start <= end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess no, will add it via another PR, I found some other issues in validation when implementing this.
}, | ||
}, | ||
{ | ||
name: "Adding unmatched SubnetInfo should not be allowed", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the name of this test case is invalid, this is a valid case as shown by the expectedResponse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for catching it
minEgressTable = 101 | ||
maxEgressTable = 120 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could we add Route
to the name, e.g. minEgressRouteTable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// minEgressTable to maxEgressTable are the route table IDs that can be configured on a Node for Egress traffic. | ||
// Each distinct subnet uses one route table. 20 subnets should be enough. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we allocate any other table ID for any other feature? I don't think we do, except maybe for policyOnly mode, which doesn't support Egress anyway. Just wondering if we should define this in a "central" place, like we do for packet marks, OVS registers, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, moved it to a common package., will see if there are other usages and move them to the same file if yes via another PR.
@@ -70,6 +70,11 @@ const ( | |||
// maxEgressMark is the maximum mark of Egress IPs can be configured on a Node. | |||
maxEgressMark = 255 | |||
|
|||
// minEgressTable to maxEgressTable are the route table IDs that can be configured on a Node for Egress traffic. | |||
// Each distinct subnet uses one route table. 20 subnets should be enough. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the limit is per Node theoretically, but I was wondering if we should mention in the Egress documentation that we recommend staying under 20 subnets across all ExternalIPPools (when explicitly providing subnetInfo
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, added a note in egress.md:
Currently, the maximum number of different subnets that can be supported in a
cluster is 20, which should be sufficient for most cases. If you need to have
more subnets, please raise an issue with your use case, and we will consider
revising the limit based on that.
docs/egress.md
Outdated
* Optionally, you can specify `vlan` if the underlying network is expecting it. | ||
Once set, Antrea will tag Egress traffic leaving the Egress Node with the | ||
specified VLAN ID. Correspondingly, it's expected that reply traffic towards | ||
these Egress IPs are also tagged with the specified VLAN ID when arriving the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/are also tagged/is also tagged
s/when arriving the Egress Node/when arriving at the Egress Node
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if pool.Spec.SubnetInfo == nil { | ||
return | ||
} | ||
c.queue.Add(pool.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like I may be asking a basic question, but how can we share the same workqueue between Egress resources and ExternalIPPool resources?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, the code was added at the last minite and I haven't tested it.
Will add an unit test to cover it.
7672b02
to
f2d525a
Compare
c46f060
to
c6024ac
Compare
fe8cb8c
to
ad1fe76
Compare
@@ -210,6 +210,8 @@ type ExternalIPPool struct { | |||
type ExternalIPPoolSpec struct { | |||
// The IP ranges of this IP pool, e.g. 10.10.0.0/24, 10.10.10.2-10.10.10.20, 10.10.10.30-10.10.10.30. | |||
IPRanges []IPRange `json:"ipRanges"` | |||
// The Subnet info of this IP pool. If set, all IP ranges in the IP pool should share the same subnet attributes. | |||
SubnetInfo *SubnetInfo `json:"subnetInfo,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think better to reuse SubnetIPRange
to be consistent with IPPool
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain to me how stale rules are cleaned up after agent restarts?
By default, it's assumed that the IPs allocated from the pool are in the same subnet as the Node IPs. In some cases, users want to use IPs in different subnets as Egress IPs. Additionally, users may want to use VLAN taggaing to segment the Egress traffic and the Node traffic. The commit implements the requirements by introducing an optional field, `subnetInfo`, to the ExternalIPPool resource. The `subnetInfo` field contains the subnet attributes of the IPs in this pool. When using a different subnet: * `gateway` and `prefixLength` must be set. Antrea will route Egress traffic to the specified gateway when the destination is not in the same subnet of the Egress IP, otherwise route it to the destination directly. * Optionally, you can specify `vlan` if the underlying network is expecting it. Once set, Antrea will tag Egress traffic leaving the Egress Node with the specified VLAN ID. Correspondingly, it's expected that reply traffic towards these Egress IPs are also tagged with the specified VLAN ID when arriving the Egress Node. The implementation involves VLAN sub-interfaces and policy routing. * For a given subnet with a VLAN ID, a separate VLAN sub-interface will be created to hold the Egress IPs allocated from it. Egress traffic and its reply traffic will be sent over and received from the VLAN sub-interface for proper tagging and untagging. * For a given subnet, a separate route table will be created, routing the selected Egress traffic to the specified gateway, or to its neighbor. * For multiple Egress IPs associated allocated from the same subnet, a separate IP rule will be created for each Egress IP, matching its pkt mark and looking up the shared route table. The feature is gated by the alpha "EgressSeparateSubnet" feature gate. Signed-off-by: Quan Tian <[email protected]>
It's done by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one thing that came up during the community meeting 2 weeks ago was whether overlapping subnets (CIDRs) could exist for different VLAN IDs. IIRC, this is not supported (and perhaps some parts of the code make that assumption). Do we have any validation for this or any meaningful error log? Any need to mention it in the documentation?
ci/kind/kind-setup.sh
Outdated
@@ -66,11 +69,17 @@ where: | |||
--subnets: a subnet creates a separate Docker bridge network (named 'antrea-<idx>') with the assigned subnet. A worker | |||
Node will be connected to one of those network. Default is empty: all worker Nodes connected to the default Docker | |||
bridge network created by kind. | |||
--vlan-subnets: specifies the subnets of the VLAN to which all Nodes will be connected, in addition to the primary network. | |||
The IP expression of the subnet will be used as the gateway IP. For example, '--vlan-subnets 10.100.100.1/24' means | |||
10.100.100.1/24 will be assigned to the VLAN sub-interface of the network. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the last sentence a bit confusing (specifically "VLAN sub-interface of the network"). Maybe: "means that a VLAN sub-interface will be created on the primary Docker bridge, and it will be assigned the 10.100.100.1/24 address"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
The IP expression of the subnet will be used as the gateway IP. For example, '--vlan-subnets 10.100.100.1/24' means | ||
10.100.100.1/24 will be assigned to the VLAN sub-interface of the network. | ||
--vlan-id: specifies the ID of the VLAN to which all Nodes will be connected, in addition to the primary network. Note, | ||
'--vlan-subnets' and '--vlan-id' must be specified together. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe give an error if only one is specified (instead of silently ignoring it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
docker_run_with_host_net iptables -t filter -D FORWARD -i $bridge_interface -o $interface_name -j ACCEPT || true | ||
docker_run_with_host_net iptables -t filter -D FORWARD -o $bridge_interface -i $interface_name -j ACCEPT || true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logically, it would make more sense to delete the iptables rules before deleting the interface? I don't know what happens to iptables rules referencing an interface that doesn't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
BTW, it's ok to reference an non-existing interface in iptables rules.
@@ -466,6 +482,38 @@ func (data *TestData) RunCommandOnNodeExt(nodeName, cmd string, envs map[string] | |||
return data.provider.RunCommandOnNodeExt(nodeName, cmd, envs, stdin, sudo) | |||
} | |||
|
|||
func (data *TestData) collectExternalInfo() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the function never returns an error, I don't know if this was intended
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added error when input is invalid.
externalServerIPv4 string | ||
externalServerIPv6 string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that for existing Egress e2e tests, we use a network namespace to simulate an external server (see getCommandInFakeExternalNetwork
). It's too bad that we have 2 separate mechanisms, but I assume that:
- the netns way doesn't work for VLAN testing?
- the new way is specific to Kind, hence we want to keep the netns way as well for Jenkins tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- yes
- we can change the other test to lerverage the external server, that's also why the external server is not made specific to vlan configuration, but I'd like to do it via another PR, the current PR is already too large. I feel it's not worth to run the test in a hacky way on non-kind testbeds.
// It can be used to determine whether it's safe to delete an interface when it's no longer used. | ||
const vlanInterfacePrefix = "antrea-ext." | ||
|
||
// assignee is the unit that IPs are assigned to. All IPs from the same subnet share an assignee. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if the distinction is meaningful here, but I think there is one assignee per VLAN ID, not per "subnet". Maybe the term "subnet" is a bit misleading here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
// The field must not be nil. | ||
logicalInterface *net.Interface | ||
// link is used for IP link management and IP address add/del operation. The field can be nil if IPs don't need to | ||
// assigned to an interface physically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/to assigned/to be assigned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
||
func (as *assignee) destroy() error { | ||
if err := netlink.LinkDel(as.link); err != nil { | ||
return fmt.Errorf("error deleting interface %v", as.link) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not wrapping the actual error is intentional here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
} | ||
} | ||
} | ||
func (a *ipAssigner) InitIPs(desired map[string]*crdv1b1.SubnetInfo) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous version of InitIPs
would acquire the mutex for the entire function execution, but now we acquire the mutex for individual calls to AssignIP
/ UnassignIP
. I assume this is ok based on how the function is meant to be used (called once for initialization, and not concurrently with other calls), but I just wanted to double-check with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your understanding is correct, added a comment to make it clear.
It's not thread-safe and should only be called once for initialization before calling other methods.
return nil | ||
} | ||
|
||
func (a *ipAssigner) GetInterfaceID(subnetInfo *crdv1b1.SubnetInfo) (int, bool) { | ||
as, _ := a.getAssignee(subnetInfo, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not protected by read lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my bad, missed it when adding the method
pkg/agent/route/interfaces.go
Outdated
@@ -59,6 +59,21 @@ type Interface interface { | |||
// DeleteSNATRule should delete rule to SNAT outgoing traffic with the mark. | |||
DeleteSNATRule(mark uint32) error | |||
|
|||
// RestoreEgressRoutesAndRules restores the routes and rules configured on the system for Egress to the cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Egress -> Egresses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/agent/route/route_linux.go
Outdated
@@ -980,6 +993,38 @@ func (c *Client) listIPRoutesOnGW() ([]netlink.Route, error) { | |||
return routes, nil | |||
} | |||
|
|||
// RestoreEgressRoutesAndRules simply deletes all IP routes and rules created for Egress for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Egresses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
docs/egress.md
Outdated
@@ -198,6 +199,52 @@ The `ipRanges` field contains a list of IP ranges representing the available IPs | |||
of this IP pool. Each IP range may consist of a `cidr` or a pair of `start` and | |||
`end` IPs (which are themselves included in the range). | |||
|
|||
### SubnetInfo | |||
|
|||
By default, it's assumed that the IPs allocated from the pool are in the same |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the pool -> an ExternalIPPool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
docs/egress.md
Outdated
these Egress IPs is also tagged with the specified VLAN ID when arriving at the | ||
Egress Node. | ||
|
||
An example of ExternalIPPool using a different subnet is as below: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
different -> non-default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
**Note**: Specifying different subnets is currently in alpha version. To use | ||
this feature, users should enable the `EgressSeparateSubnet` feature gate. | ||
Currently, the maximum number of different subnets that can be supported in a | ||
cluster is 20, which should be sufficient for most cases. If you need to have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The limit is per Node, not per cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's per Node, however it will be risky for users to use more than 20 subnets in a cluster as they don't have control over how the Egress IPs are distributed, and in worst case (like most Egress candidate Nodes crash and Egress are scheduled to a few remaining Nodes) it may sometimes work and sometimes not. So I feel we should just ask users to not use 20 subnets in a cluster before we support it.
Another reason is if say 20 subnets per Node, we need to elaborate they can't assume the cluster limit is not 20*Nodes...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, we can start from 20 per cluster. But I assume there can be cases that users configure different VLANs on different sets of Egress Nodes. This seems more likely to happen when Nodes are connected to physical network. We can see if users ask for more VLANs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antoninbas @jianjuns thanks for your review. I have addressed all comments in the 2nd commit.
ci/kind/kind-setup.sh
Outdated
@@ -66,11 +69,17 @@ where: | |||
--subnets: a subnet creates a separate Docker bridge network (named 'antrea-<idx>') with the assigned subnet. A worker | |||
Node will be connected to one of those network. Default is empty: all worker Nodes connected to the default Docker | |||
bridge network created by kind. | |||
--vlan-subnets: specifies the subnets of the VLAN to which all Nodes will be connected, in addition to the primary network. | |||
The IP expression of the subnet will be used as the gateway IP. For example, '--vlan-subnets 10.100.100.1/24' means | |||
10.100.100.1/24 will be assigned to the VLAN sub-interface of the network. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
The IP expression of the subnet will be used as the gateway IP. For example, '--vlan-subnets 10.100.100.1/24' means | ||
10.100.100.1/24 will be assigned to the VLAN sub-interface of the network. | ||
--vlan-id: specifies the ID of the VLAN to which all Nodes will be connected, in addition to the primary network. Note, | ||
'--vlan-subnets' and '--vlan-id' must be specified together. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
docker_run_with_host_net iptables -t filter -D FORWARD -i $bridge_interface -o $interface_name -j ACCEPT || true | ||
docker_run_with_host_net iptables -t filter -D FORWARD -o $bridge_interface -i $interface_name -j ACCEPT || true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
BTW, it's ok to reference an non-existing interface in iptables rules.
externalServerIPv4 string | ||
externalServerIPv6 string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- yes
- we can change the other test to lerverage the external server, that's also why the external server is not made specific to vlan configuration, but I'd like to do it via another PR, the current PR is already too large. I feel it's not worth to run the test in a hacky way on non-kind testbeds.
@@ -466,6 +482,38 @@ func (data *TestData) RunCommandOnNodeExt(nodeName, cmd string, envs map[string] | |||
return data.provider.RunCommandOnNodeExt(nodeName, cmd, envs, stdin, sudo) | |||
} | |||
|
|||
func (data *TestData) collectExternalInfo() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added error when input is invalid.
pkg/agent/route/interfaces.go
Outdated
@@ -59,6 +59,21 @@ type Interface interface { | |||
// DeleteSNATRule should delete rule to SNAT outgoing traffic with the mark. | |||
DeleteSNATRule(mark uint32) error | |||
|
|||
// RestoreEgressRoutesAndRules restores the routes and rules configured on the system for Egress to the cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/agent/route/route_linux.go
Outdated
@@ -980,6 +993,38 @@ func (c *Client) listIPRoutesOnGW() ([]netlink.Route, error) { | |||
return routes, nil | |||
} | |||
|
|||
// RestoreEgressRoutesAndRules simply deletes all IP routes and rules created for Egress for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
docs/egress.md
Outdated
@@ -198,6 +199,52 @@ The `ipRanges` field contains a list of IP ranges representing the available IPs | |||
of this IP pool. Each IP range may consist of a `cidr` or a pair of `start` and | |||
`end` IPs (which are themselves included in the range). | |||
|
|||
### SubnetInfo | |||
|
|||
By default, it's assumed that the IPs allocated from the pool are in the same |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
docs/egress.md
Outdated
these Egress IPs is also tagged with the specified VLAN ID when arriving at the | ||
Egress Node. | ||
|
||
An example of ExternalIPPool using a different subnet is as below: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
**Note**: Specifying different subnets is currently in alpha version. To use | ||
this feature, users should enable the `EgressSeparateSubnet` feature gate. | ||
Currently, the maximum number of different subnets that can be supported in a | ||
cluster is 20, which should be sufficient for most cases. If you need to have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's per Node, however it will be risky for users to use more than 20 subnets in a cluster as they don't have control over how the Egress IPs are distributed, and in worst case (like most Egress candidate Nodes crash and Egress are scheduled to a few remaining Nodes) it may sometimes work and sometimes not. So I feel we should just ask users to not use 20 subnets in a cluster before we support it.
Another reason is if say 20 subnets per Node, we need to elaborate they can't assume the cluster limit is not 20*Nodes...
Let me try if it can work with overlapping subnets, will add validation or documentation if it can't. But I assume this can be in another PR given it's not common to use overlapping external IPs. |
Signed-off-by: Quan Tian <[email protected]>
It can't work, created #5842 for enhancing the validation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, only 2 minor comments / questions
apiGroups: ["crd.antrea.io"] | ||
apiVersions: ["v1alpha2"] | ||
apiVersions: ["v1alpha2", "v1beta1"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed this change before, should it be in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this is a bug and thought it couldn't work without listing v1beta1 here, however, it turns out working even without it because the matchPolicy
defaults to Equivalent
. So at least for now this could work with or without v1beta1.
I keep the change because I need to add "CREATE" anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, no need for the separate PR then
fb = fb.Action().GotoTable(EgressQoSTable.GetID()) | ||
} else { | ||
fb = fb.MatchCTStateNew(true). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also didn't see this change before. I assume it's not specific to this PR. Was it just a redundant match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's related to the patch. Previously we only need the first packet of a connection to be marked because SNAT applies to the whole connection. But for policy routing, we need all packets of a connection to be marked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo in PR description that can be fixed when merging: s/taggaing/tagging
LGTM
thanks, corrected description, will edit commit message when merging. |
/test-all |
/test-windows-all |
In certain e2e tests (e.g., Egress, ServiceExternalIP, NodePort/LoadBalancer Service), an external client/server is required. Currently, we use either a network namespace or a Node deployed with Antrea for this purpose. However, a network namespace can only be created on a single Node to do the tests within the Node, and using a Node deployed with Antrea, whose network configurations are affected by Antrea, potentially impacting related e2e tests. A similar functionality was introduced in antrea-io#5799, an external container is created after Kind cluster setup to serve as an external server/client for most e2e tests. For more complex e2e tests (e.g., involving an FRR router), the requirements include: - K8s-managed creation and deletion of the FRR router. - A network environment for the FRR router unaffected by Antrea. - Maximizing reuse of existing test framework code. To meet these needs, this commit introduces an option to add an extra worker Node to the Kind cluster where Antrea will not be deployed. This allows deploying a host network Pod on that Node, ensuring a clean network environment unaffected by Antrea. Signed-off-by: Hongliang Liu <[email protected]>
In certain e2e tests (e.g., Egress, ServiceExternalIP, NodePort/LoadBalancer Service), an external client/server is required. Currently, we use either a network namespace or a Node deployed with Antrea for this purpose. However, a network namespace can only be created on a single Node to do the tests within the Node, and using a Node deployed with Antrea, whose network configurations are affected by Antrea, potentially impacting related e2e tests. A similar functionality was introduced in antrea-io#5799, an external container is created after Kind cluster setup to serve as an external server/client for most e2e tests. For more complex e2e tests (e.g., involving an FRR router), the requirements include: - K8s-managed creation and deletion of the FRR router. - A network environment for the FRR router unaffected by Antrea. - Maximizing reuse of existing test framework code. To meet these needs, this commit introduces an option to add an extra worker Node to the Kind cluster where Antrea will not be deployed. This allows deploying a host network Pod on that Node, ensuring a clean network environment unaffected by Antrea. Signed-off-by: Hongliang Liu <[email protected]>
In certain e2e tests (e.g., Egress, ServiceExternalIP, NodePort/LoadBalancer Service), an external client/server is required. Currently, we use either a network namespace or a Node deployed with Antrea for this purpose. However, a network namespace can only be created on a single Node to do the tests within the Node, and using a Node deployed with Antrea, whose network configurations are affected by Antrea, potentially impacting related e2e tests. A similar functionality was introduced in antrea-io#5799, an external container is created after Kind cluster setup to serve as an external server/client for most e2e tests. For more complex e2e tests (e.g., involving an FRR router), the requirements include: - K8s-managed creation and deletion of the FRR router. - A network environment for the FRR router unaffected by Antrea. - Maximizing reuse of existing test framework code. To meet these needs, this commit introduces an option to add an extra worker Node to the Kind cluster where Antrea will not be deployed. This allows deploying a host network Pod on that Node, ensuring a clean network environment unaffected by Antrea. Signed-off-by: Hongliang Liu <[email protected]>
By default, it's assumed that the IPs allocated from the pool are in the
same subnet as the Node IPs. In some cases, users want to use IPs in
different subnets as Egress IPs. Additionally, users may want to use
VLAN tagging to segment the Egress traffic and the Node traffic.
The commit implements the requirements by introducing an optional field,
subnetInfo
, to the ExternalIPPool resource. ThesubnetInfo
fieldcontains the subnet attributes of the IPs in this pool. When using a
different subnet:
gateway
andprefixLength
must be set. Antrea will route Egresstraffic to the specified gateway when the destination is not in the
same subnet of the Egress IP, otherwise route it to the destination
directly.
Optionally, you can specify
vlan
if the underlying network isexpecting it. Once set, Antrea will tag Egress traffic leaving the
Egress Node with the specified VLAN ID. Correspondingly, it's
expected that reply traffic towards these Egress IPs are also tagged
with the specified VLAN ID when arriving the Egress Node.
The implementation involves VLAN sub-interfaces and policy routing.
For a given subnet with a VLAN ID, a separate VLAN sub-interface will
be created to hold the Egress IPs allocated from it. Egress traffic
and its reply traffic will be sent over and received from the VLAN
sub-interface for proper tagging and untagging.
For a given subnet, a separate route table will be created, routing
the selected Egress traffic to the specified gateway, or to its
neighbor.
For multiple Egress IPs associated allocated from the same subnet, a
separate IP rule will be created for each Egress IP, matching its pkt
mark and looking up the shared route table.
The feature is gated by the alpha "EgressSeparateSubnet" feature gate.