-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use reserved OVS controller ports for default Antrea ports #6202
Use reserved OVS controller ports for default Antrea ports #6202
Conversation
pkg/agent/config/node_config.go
Outdated
// that once a port number has been assigned, it will stay the same throughout the lifetime | ||
// of the port, but when using ofport_request it is not guaranteed. | ||
// See https://github.com/antrea-io/antrea/issues/6192 for more details. | ||
FirstControllerOVSPort = 32768 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like this one should be defined somewhere else, in pkg/ovs
. But then, the same can be said of BridgeOFPort
and AutoAssignedOFPort
. I can move the 3 of them. What do you think @tnqn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, moving non business specific ofports to it makes sense to me.
UplinkNetConfig: &config.AdapterNetConfig{ | ||
MAC: fakeUplinkMAC, | ||
OFPort: uint32(4), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this was some typo. Before this change, the uplink port was using 3
, but these unit tests were using 4
which is confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @wenyingd to check impact on Windows
pkg/agent/config/node_config.go
Outdated
// that once a port number has been assigned, it will stay the same throughout the lifetime | ||
// of the port, but when using ofport_request it is not guaranteed. | ||
// See https://github.com/antrea-io/antrea/issues/6192 for more details. | ||
FirstControllerOVSPort = 32768 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, moving non business specific ofports to it makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use these non-conflict port to request the control port, do we need the logic to allocate a port first and then use it in the functions? e.g., https://github.com/antrea-io/antrea/blob/main/pkg/agent/agent_linux.go#L117
BTW, do we need to customize a new reserve port for host interface, which is now used in AntreaIPAM, and it may be used on Windows in future to replace the bridge interface?
Another question is for Windows upgrade case in which OVS is running as native service. If antrea already configures the OVS ports using the original values, after a upgrade, the of port number in memory is changed, the Openflow is supposed to use the new port number; but for existing OVS ports, we don't have the logic to reset the number in this pr, it may cause conflict in the OVS port and OpenFlow entries.
@wenyingd these are good points. I need to improve the code so that we handle all ports like the tunnel port: if it already exists, keep using the existing port with no changes. If the port needs to be created, use the new ofport_request values.
I see what you mean now. But can you remind me why this is not |
cbe67e1
to
b35d85d
Compare
} else { | ||
// Antrea Interface type is not saved in OVS port external_ids in earlier Antrea versions, so we use | ||
// the old way to decide the interface type for the upgrade case. | ||
uplinkIfName := i.nodeConfig.UplinkNetConfig.Name | ||
var antreaIFType string | ||
switch { | ||
case port.OFPort == config.HostGatewayOFPort: | ||
intf = parseGatewayInterfaceFunc(port, ovsPort) | ||
antreaIFType = interfacestore.AntreaGateway |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this change and the v2 release, I think it makes sense to drop support for this legacy scenario. The interface type externalID was introduced by @wenyingd starting with Antrea v1.5 (#3027).
With the change to the default value of HostGatewayOFPort
, this switch case (port.OFPort == config.HostGatewayOFPort
) could no longer be supported.
cc @tnqn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me
// WithRequiredPortExternalIDs will ensure that whenever we create a port, the required | ||
// external ID (interface type) is provided. This is a sanity check to ensure code | ||
// correctness. | ||
ovsBridgeClient := ovsconfig.NewOVSBridge(o.config.OVSBridge, ovsDatapathType, ovsdbConnection, ovsconfig.WithRequiredPortExternalIDs(interfacestore.AntreaInterfaceTypeKey)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bringing reviewers' attention to this. I added WithRequiredPortExternalIDs
to guarantee that moving forward we will indeed always have "antrea-type" as an ExternalID for all ports. Right now we don't have any guarantee about this in the code, so we need to rely on code reviews. This could make it easier to catch errors in e2e tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, maybe add the info to PR description and commit message that the new ofPorts will only be used when newly creating ports, we should expect the legacy ofPorts for an existing cluster.
} else { | ||
// Antrea Interface type is not saved in OVS port external_ids in earlier Antrea versions, so we use | ||
// the old way to decide the interface type for the upgrade case. | ||
uplinkIfName := i.nodeConfig.UplinkNetConfig.Name | ||
var antreaIFType string | ||
switch { | ||
case port.OFPort == config.HostGatewayOFPort: | ||
intf = parseGatewayInterfaceFunc(port, ovsPort) | ||
antreaIFType = interfacestore.AntreaGateway |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me
Thanks @tnqn. I will squash the commits and update the message after I get a review from @wenyingd. I want to make sure there is nothing I have missed. In the mean time, I will run some Jenkins e2e tests. |
/test-all |
With AntreaIPAM and VM Agent scenario, we now create a different OVS port to take over the role of the uplink, so the latest implementation is to rename the uplink as This mode is also planned on Windows container case, @XinShuYang has a patch working on it. |
i.nodeConfig.UplinkNetConfig.OFPort = uint32(ofport) | ||
i.nodeConfig.HostInterfaceOFPort = config.BridgeOFPort | ||
i.nodeConfig.HostInterfaceOFPort = ovsconfig.BridgeOFPort |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, better to add compare between the existing OF port and the desired static value on uplink port. I would prefer to also add check for uplink on Linux, but it is not strong.
/test-vm-e2e |
4edd12c
to
be0ebab
Compare
After #6216 is merged, I will rebase, squash commits, and run tests again. |
/test-flexible-ipam-e2e |
pkg/agent/agent.go
Outdated
intf.InterfaceName != i.nodeConfig.DefaultTunName { | ||
klog.Infof("The discovered default tunnel interface name %s is different from the default value: %s", | ||
intf.InterfaceName, i.nodeConfig.DefaultTunName) | ||
if intf != nil && intf.InterfaceName != i.nodeConfig.DefaultTunName { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect because we also call this function on the ipsec tunnel port, only compare on the interface name may impact on the ipsec case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, the problem is that the function was using the port number as the condition for that, which is misleading and even potentially invalid. I will update that to use the interface type instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing it out, I fixed the logic
be0ebab
to
fdfc3d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
fdfc3d3
to
2ffad56
Compare
/test-all |
2ffad56
to
0733d0c
Compare
/test-all |
/test-windows-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/test-windows-all |
Windows CI failed due to
|
Thanks @XinShuYang, I was able to reproduce locally:
I will push a new commit to address this. Do you know if we upload the agent logs as artifacts for the Windows CI jobs? I couldn't find them. |
Tunnel port, gateway port and uplink port. These reserved port numbers (starting from 32,768) will never be auto-assigned by OVS. By using them, we can avoid surprising ofport changes in some edge cases. Note that if these ports already exist when the Agent starts, we will not re-create them or mutate their ofport. The new ofport values will only be used when creating the OVS ports for the first time (e.g., on a new K8s Node, or after a Node restart). That is the simplest solution and is consistent with existing behavior. It should also be sufficient for avoiding antrea-io#6192, which was typically triggered by creating a new tunnel port when updating the Antrea configuration in an existing cluster. After this change, we will also assume that br-int ports stored in OVSDB always have the "antrea-type" external ID. This external ID became required in Antrea v1.5, so this should be an acceptable change, especially considering that this change is destined for the Antrea v2 release, which is a major version bump. Users will not be able to upgrade from Antrea v1.5 to Antrea v2.0, but that this is not a supported upgrade path anyway (for other reasons). We also add some logic to prevent creating a port on the OVS bridge if a required external ID is missing. Fixes antrea-io#6192 Signed-off-by: Antonin Bas <[email protected]>
Signed-off-by: Antonin Bas <[email protected]>
0733d0c
to
c583b81
Compare
/test-windows-all |
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/test-flexible-ipam-e2e |
|
This is a pretty significant change and it will not be back-ported for several reasons. |
Use reserved OVS controller ports for default Antrea ports
Tunnel port, gateway port and uplink port.
These reserved port numbers (starting from 32,768) will never be
auto-assigned by OVS. By using them, we can avoid surprising ofport
changes in some edge cases.
Note that if these ports already exist when the Agent starts, we will
not re-create them or mutate their ofport. The new ofport values will
only be used when creating the OVS ports for the first time (e.g., on a
new K8s Node, or after a Node restart). That is the simplest solution
and is consistent with existing behavior. It should also be sufficient
for avoiding #6192, which was typically triggered by creating a new
tunnel port when updating the Antrea configuration in an existing
cluster.
After this change, we will also assume that br-int ports stored in OVSDB
always have the "antrea-type" external ID. This external ID became
required in Antrea v1.5, so this should be an acceptable change,
especially considering that this change is destined for the Antrea v2
release, which is a major version bump. Users will not be able to
upgrade from Antrea v1.5 to Antrea v2.0, but that this is not a
supported upgrade path anyway (for other reasons).
We also add some logic to prevent creating a port on the OVS bridge if a
required external ID is missing.
Fixes #6192