Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

network: Use tc filtering rules in bridge mode #827

Merged
merged 4 commits into from
Oct 24, 2018

Conversation

amshinde
Copy link
Member

Instead of creating a bridge, use tc filter rules to redirect
traffic between veth and tap interface.

Signed-off-by: Archana Shinde [email protected]

@amshinde
Copy link
Member Author

/test

Copy link

@sboeuf sboeuf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amshinde maybe a dumb question, but why did you replace the bridge mode, instead of creating a new internetworking model?

return nil
}

func addQdiscIngress(index int) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if index < 0? Does netlink.QdiscAdd() handle that? Is it a valid index? (might be given QdiscAttrs.Index is an int wheres all other members are uint32?!?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jodh-intel netlink.QdiscAdd will handle the case where the network index is negative.
It will error out if it cannot find an interface with the specified index.

return nil
}

func addRedirectTCFilter(sourceIndex, destIndex int) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question - what if either of these two params are negative?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@amshinde amshinde force-pushed the tc-filtering branch 2 times, most recently from a735361 to 2e26a2b Compare October 17, 2018 22:24
@amshinde
Copy link
Member Author

@sboeuf I was planning to replace the bridge mode initially. I pushed the changes for testing.
I have now introduced a new mode. I plan to do some testing for different plugins out there, plan is to then deprecate bridged mode in a future PR.

@amshinde
Copy link
Member Author

cc @mcastelino PTAL.

@amshinde amshinde changed the title wip: network: Use tc filtering rules in bridge mode network: Use tc filtering rules in bridge mode Oct 17, 2018
Copy link
Contributor

@mcastelino mcastelino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment. We need to undo the qdisk we added on ingress on the CNI/CNM created interface.

@@ -810,6 +934,40 @@ func unBridgeNetworkPair(endpoint Endpoint) error {
return nil
}

func tcRemoveNetwork(endpoint Endpoint) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amshinde you need to remove the ingress qdisc you added to the veth/incoming interface. As part of the cleanup you should remove that too.

So you need to undo what do you did before

	if err := addQdiscIngress(attrs.Index); err != nil {
		return err
	}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mcastelino Have addressed this. PTAL.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename it removeTCFiltering()

@mcastelino
Copy link
Contributor

@amshinde
Copy link
Member Author

@bergwolf Can you take a look as well, I think you guys are using tc in runv?

@amshinde
Copy link
Member Author

/retest

return err
}

if err := removeRedirectTCFilter(link); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amshinde you should not need to do this. Removing the qdisk removes the filters. But does not hurt.

Copy link
Contributor

@mcastelino mcastelino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@amshinde
Copy link
Member Author

/retest

@codecov
Copy link

codecov bot commented Oct 19, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@5a8b738). Click here to learn what that means.
The diff coverage is 51.2%.

@@            Coverage Diff            @@
##             master     #827   +/-   ##
=========================================
  Coverage          ?   66.09%           
=========================================
  Files             ?       87           
  Lines             ?    10705           
  Branches          ?        0           
=========================================
  Hits              ?     7076           
  Misses            ?     2897           
  Partials          ?      732

@amshinde
Copy link
Member Author

@grahamwhaley Is the metrics failure here a valid failure?

@grahamwhaley
Copy link
Contributor

@amshinde probably not - we had a bunch of fails for the last 36h - I have tweaked the settings as per XXX to avoid false failures whilst I look at how to get things more stable.
The logs for your fail look like:

02:47:48 Report Summary:
02:47:48 +-----+----------------------+-------+--------+--------+-------+--------+--------+-------+------+-----+
02:47:48 | P/F |         NAME         |  FLR  |  MEAN  |  CEIL  |  GAP  |  MIN   |  MAX   |  RNG  | COV  | ITS |
02:47:48 +-----+----------------------+-------+--------+--------+-------+--------+--------+-------+------+-----+
02:47:48 | P   | boot-times           | 90.0% | 91.5%  | 110.0% | 20.0% | 88.7%  | 100.6% | 13.4% | 2.7% |  20 |
02:47:48 | *F* | memory-footprint     | 92.3% | 113.8% | 107.7% | 15.4% | 113.8% | 113.8% | 0.0%  | 0.0% |   1 |
02:47:48 | P   | memory-footprint-ksm | 89.3% | 92.4%  | 110.7% | 21.4% | 92.4%  | 92.4%  | 0.0%  | 0.0% |   1 |
02:47:48 +-----+----------------------+-------+--------+--------+-------+--------+--------+-------+------+-----+

but, as we discussed, you are better off running the network cpu test and/or other network tests locally to check if you have made any network impact. I'm half way through adding that test to the report gen btw.

@grahamwhaley
Copy link
Contributor

s/XXX/ kata-containers/ci#72 /

Copy link

@sboeuf sboeuf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amshinde This PR looks very good to me, just a few comments, but mostly related to the fact that you should add more comments to make easy the understanding of this code ;)

@@ -459,6 +467,12 @@ func xconnectVMNetwork(endpoint Endpoint, connect bool, numCPUs uint32, disableV
return tapNetworkPair(endpoint, numCPUs, disableVhostNet)
}
return untapNetworkPair(endpoint)
case NetXConnectTCFilterModel:
netPair.NetInterworkingModel = NetXConnectTCFilterModel
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed. The switch already checks this variable, there's no reason to reassign it.

@@ -723,6 +737,165 @@ func bridgeNetworkPair(endpoint Endpoint, numCPUs uint32, disableVhostNet bool)
return nil
}

func tcRedirectNetwork(endpoint Endpoint, numCPUs uint32, disableVhostNet bool) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment about naming here. I think it'd be clearer to call it setupTCFiltering().

@@ -810,6 +934,40 @@ func unBridgeNetworkPair(endpoint Endpoint) error {
return nil
}

func tcRemoveNetwork(endpoint Endpoint) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename it removeTCFiltering()

@@ -75,6 +80,9 @@ func (n *NetInterworkingModel) SetModel(modelName string) error {
case "enlightened":
*n = NetXConnectEnlightenedModel
return nil
case "tcfilter":
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's not only related to this PR, but we should define a constant for this. We don't want to play around with strings like this.

return nil
}

func addQdiscIngress(index int) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment about this function role would be nice (along those lines):

// addQdiscIngress creates a traffic rule for any incoming traffic to the interface
// designated by its index.

return nil
}

func addRedirectTCFilter(sourceIndex, destIndex int) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, a comment could be nice ;)


for _, qdisc := range qdiscs {
ingress, ok := qdisc.(*netlink.Ingress)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove this blank line

@@ -230,6 +230,13 @@ path = "@NETMONPATH@"
# - macvtap
# Used when the Container network interface can be bridged using
# macvtap.
#
# - tcfilter
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we measured the perf impact before we take this decision?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sboeuf I ran some iperf tests to compare macvtap, bridge and tcfilter solutions on an Intel nuc.
With macvtap I got around 15Gbs throughtput, tcfiltering giving 13.5 Gbs while bridge gave around 12.9 to 13.5 Gbs.
So we get slightly better performance with tcfiltering as compared to bridge mode, but less performance as compared to macvtap. But I think, given that we get added functionality in terms of support for ipvlan, we can make this as the default. In case ipvlan support is not needed, users can choose to use macvtap instead.
WDYT?

Copy link

@sboeuf sboeuf Oct 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's cool, thanks for running those tests :)
I don't have any strong opinion on this, but we need the community feedback here. I think we need to decide if we want the default to be the most performant or the most functional.

/cc @bergwolf @jodh-intel @egernst @mcastelino @WeiZhang555 @grahamwhaley @kata-containers/runtime

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amshinde have you measured if this impacts launch time latency? As this is simpler, we may be a bit faster to launch a kata-container with this. If @egernst can let us know how this performs when pushing 40Gbps, that may give us a better data point.

/cc @grahamwhaley

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mcastelino I did see the latency was about the same, but did see some improvements in jitter, but this was over a tiny data set. I'll rerun the tests few more times for latency and post my results.

@raravena80
Copy link
Member

@amshinde ping :)

@amshinde
Copy link
Member Author

@sboeuf I have addressed all your comments. PTAL.

@sboeuf
Copy link

sboeuf commented Oct 23, 2018

/test

@sboeuf
Copy link

sboeuf commented Oct 23, 2018

@amshinde thanks!
LGTM

@amshinde
Copy link
Member Author

/retest

@amshinde
Copy link
Member Author

@sboeuf As discussed, I have dropped the commit to make this mode as the default. Will open another PR to address that.

@sboeuf
Copy link

sboeuf commented Oct 23, 2018

@amshinde I'll merge as soon as we get the CI all green!

@amshinde
Copy link
Member Author

@sboeuf Can we merge this?

@amshinde
Copy link
Member Author

@jodh-intel Can you take another look?

@amshinde
Copy link
Member Author

/retest

@amshinde
Copy link
Member Author

Retriggering CI for code coverage.

@jodh-intel
Copy link
Contributor

jodh-intel commented Oct 24, 2018

lgtm

Although you have conflicts...

Approved with PullApprove

Introduce a new mode that uses tc filters to redirect traffic from
the network interface created by the network plugin to a
tap interface that we connect to the VM.
This mode will help support ipvlan as well.

Fixes kata-containers#144

Signed-off-by: Archana Shinde <[email protected]>
The test verifies tc filter setup by creating a test veth interface.

Signed-off-by: Archana Shinde <[email protected]>
Introduce constants for the network model strings, so as to
avoid using the strings directly at multiple places.

Signed-off-by: Archana Shinde <[email protected]>
Document this mode for users to be able to use it.

Signed-off-by: Archana Shinde <[email protected]>
@amshinde
Copy link
Member Author

/retest

@sboeuf sboeuf merged commit c7a9e45 into kata-containers:master Oct 24, 2018
@mcastelino
Copy link
Contributor

@amshinde a long time ago I added a mode called enlighted interworking mode. The goal of that was to use it after we added tc support.

The way I saw it implemented was

  1. If the interface is something macvtap can handle use macvtap mode
  2. If we know macvtap cannot handle it (or we are not familiar with) use the tc mode.

That way the user does not make the call. We do.

@amshinde amshinde deleted the tc-filtering branch July 11, 2019 22:27
egernst pushed a commit to egernst/runtime that referenced this pull request Feb 9, 2021
A PR now needs *two* labels to be applied before it can be merged. One
label must be a backport label from the list below and the other a
forward port label:

- backport labels: `needs-backport`, `no-backport-needed`, `backport`.
- forward-port labels: `needs-forward-port`, `no-forward-port-needed`, `forward-port`.

This is to make the maintainer think carefully before merging a PR and
hopefully maximise efficient porting.

Related: kata-containers/kata-containers#634

Fixes: kata-containers#827.

Signed-off-by: James O. D. Hunt <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants