-
Notifications
You must be signed in to change notification settings - Fork 374
network: Use tc filtering rules in bridge mode #827
Conversation
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amshinde maybe a dumb question, but why did you replace the bridge mode, instead of creating a new internetworking model?
return nil | ||
} | ||
|
||
func addQdiscIngress(index int) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if index < 0
? Does netlink.QdiscAdd()
handle that? Is it a valid index? (might be given QdiscAttrs.Index
is an int
wheres all other members are uint32
?!?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jodh-intel netlink.QdiscAdd will handle the case where the network index is negative.
It will error out if it cannot find an interface with the specified index.
return nil | ||
} | ||
|
||
func addRedirectTCFilter(sourceIndex, destIndex int) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question - what if either of these two params are negative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
a735361
to
2e26a2b
Compare
@sboeuf I was planning to replace the bridge mode initially. I pushed the changes for testing. |
cc @mcastelino PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment. We need to undo the qdisk we added on ingress on the CNI/CNM created interface.
virtcontainers/network.go
Outdated
@@ -810,6 +934,40 @@ func unBridgeNetworkPair(endpoint Endpoint) error { | |||
return nil | |||
} | |||
|
|||
func tcRemoveNetwork(endpoint Endpoint) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amshinde you need to remove the ingress qdisc you added to the veth/incoming interface. As part of the cleanup you should remove that too.
So you need to undo what do you did before
if err := addQdiscIngress(attrs.Index); err != nil {
return err
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mcastelino Have addressed this. PTAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rename it removeTCFiltering()
Some more context for this review |
@bergwolf Can you take a look as well, I think you guys are using |
2e26a2b
to
7bd8355
Compare
/retest |
return err | ||
} | ||
|
||
if err := removeRedirectTCFilter(link); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amshinde you should not need to do this. Removing the qdisk removes the filters. But does not hurt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
7bd8355
to
658af9a
Compare
/retest |
Codecov Report
@@ Coverage Diff @@
## master #827 +/- ##
=========================================
Coverage ? 66.09%
=========================================
Files ? 87
Lines ? 10705
Branches ? 0
=========================================
Hits ? 7076
Misses ? 2897
Partials ? 732 |
@grahamwhaley Is the metrics failure here a valid failure? |
@amshinde probably not - we had a bunch of fails for the last 36h - I have tweaked the settings as per XXX to avoid false failures whilst I look at how to get things more stable.
but, as we discussed, you are better off running the network cpu test and/or other network tests locally to check if you have made any network impact. I'm half way through adding that test to the report gen btw. |
s/XXX/ kata-containers/ci#72 / |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amshinde This PR looks very good to me, just a few comments, but mostly related to the fact that you should add more comments to make easy the understanding of this code ;)
virtcontainers/network.go
Outdated
@@ -459,6 +467,12 @@ func xconnectVMNetwork(endpoint Endpoint, connect bool, numCPUs uint32, disableV | |||
return tapNetworkPair(endpoint, numCPUs, disableVhostNet) | |||
} | |||
return untapNetworkPair(endpoint) | |||
case NetXConnectTCFilterModel: | |||
netPair.NetInterworkingModel = NetXConnectTCFilterModel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed. The switch already checks this variable, there's no reason to reassign it.
virtcontainers/network.go
Outdated
@@ -723,6 +737,165 @@ func bridgeNetworkPair(endpoint Endpoint, numCPUs uint32, disableVhostNet bool) | |||
return nil | |||
} | |||
|
|||
func tcRedirectNetwork(endpoint Endpoint, numCPUs uint32, disableVhostNet bool) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a comment about naming here. I think it'd be clearer to call it setupTCFiltering()
.
virtcontainers/network.go
Outdated
@@ -810,6 +934,40 @@ func unBridgeNetworkPair(endpoint Endpoint) error { | |||
return nil | |||
} | |||
|
|||
func tcRemoveNetwork(endpoint Endpoint) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rename it removeTCFiltering()
virtcontainers/network.go
Outdated
@@ -75,6 +80,9 @@ func (n *NetInterworkingModel) SetModel(modelName string) error { | |||
case "enlightened": | |||
*n = NetXConnectEnlightenedModel | |||
return nil | |||
case "tcfilter": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's not only related to this PR, but we should define a constant for this. We don't want to play around with strings like this.
return nil | ||
} | ||
|
||
func addQdiscIngress(index int) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment about this function role would be nice (along those lines):
// addQdiscIngress creates a traffic rule for any incoming traffic to the interface
// designated by its index.
return nil | ||
} | ||
|
||
func addRedirectTCFilter(sourceIndex, destIndex int) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, a comment could be nice ;)
virtcontainers/network.go
Outdated
|
||
for _, qdisc := range qdiscs { | ||
ingress, ok := qdisc.(*netlink.Ingress) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove this blank line
@@ -230,6 +230,13 @@ path = "@NETMONPATH@" | |||
# - macvtap | |||
# Used when the Container network interface can be bridged using | |||
# macvtap. | |||
# | |||
# - tcfilter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we measured the perf impact before we take this decision?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sboeuf I ran some iperf tests to compare macvtap, bridge and tcfilter solutions on an Intel nuc.
With macvtap I got around 15Gbs throughtput, tcfiltering giving 13.5 Gbs while bridge gave around 12.9 to 13.5 Gbs.
So we get slightly better performance with tcfiltering as compared to bridge mode, but less performance as compared to macvtap. But I think, given that we get added functionality in terms of support for ipvlan, we can make this as the default. In case ipvlan support is not needed, users can choose to use macvtap instead.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that's cool, thanks for running those tests :)
I don't have any strong opinion on this, but we need the community feedback here. I think we need to decide if we want the default to be the most performant or the most functional.
/cc @bergwolf @jodh-intel @egernst @mcastelino @WeiZhang555 @grahamwhaley @kata-containers/runtime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amshinde have you measured if this impacts launch time latency? As this is simpler, we may be a bit faster to launch a kata-container with this. If @egernst can let us know how this performs when pushing 40Gbps, that may give us a better data point.
/cc @grahamwhaley
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mcastelino I did see the latency was about the same, but did see some improvements in jitter, but this was over a tiny data set. I'll rerun the tests few more times for latency and post my results.
658af9a
to
71e7fa6
Compare
@amshinde ping :) |
@sboeuf I have addressed all your comments. PTAL. |
/test |
@amshinde thanks! |
8152832
to
20774ea
Compare
/retest |
@sboeuf As discussed, I have dropped the commit to make this mode as the default. Will open another PR to address that. |
@amshinde I'll merge as soon as we get the CI all green! |
@sboeuf Can we merge this? |
@jodh-intel Can you take another look? |
20774ea
to
6e13e69
Compare
/retest |
Retriggering CI for code coverage. |
Introduce a new mode that uses tc filters to redirect traffic from the network interface created by the network plugin to a tap interface that we connect to the VM. This mode will help support ipvlan as well. Fixes kata-containers#144 Signed-off-by: Archana Shinde <[email protected]>
The test verifies tc filter setup by creating a test veth interface. Signed-off-by: Archana Shinde <[email protected]>
Introduce constants for the network model strings, so as to avoid using the strings directly at multiple places. Signed-off-by: Archana Shinde <[email protected]>
Document this mode for users to be able to use it. Signed-off-by: Archana Shinde <[email protected]>
6e13e69
to
c38792e
Compare
/retest |
@amshinde a long time ago I added a mode called enlighted interworking mode. The goal of that was to use it after we added tc support. The way I saw it implemented was
That way the user does not make the call. We do. |
A PR now needs *two* labels to be applied before it can be merged. One label must be a backport label from the list below and the other a forward port label: - backport labels: `needs-backport`, `no-backport-needed`, `backport`. - forward-port labels: `needs-forward-port`, `no-forward-port-needed`, `forward-port`. This is to make the maintainer think carefully before merging a PR and hopefully maximise efficient porting. Related: kata-containers/kata-containers#634 Fixes: kata-containers#827. Signed-off-by: James O. D. Hunt <[email protected]>
Instead of creating a bridge, use tc filter rules to redirect
traffic between veth and tap interface.
Signed-off-by: Archana Shinde [email protected]