Skip to content

Commit

Permalink
Ensure full functionality of AntreaProxy with proxyAll enabled when k…
Browse files Browse the repository at this point in the history
…ube-proxy presents

To ensure full functionality of AntreaProxy, except for handling ClusterIP from Nodes,
even when kube-proxy in iptables mode is present, certain key changes are implemented
when proxyAll is enabled:

The jump rules for the chains managed by Antrea, `ANTREA-PREROUTING` and `ANTREA-OUTPUT`
in nat table, are installed by inserting instead of appending to bypass the chain
`KUBE-SERVICES` performing Service DNAT managed by kube-proxy. Antrea ensures that
the jump rules take precedence over those managed by kube-proxy.

The iptables rules of nat table chain `ANTREA-PREROUTING` are like below, and they are
similar in chain `ANTREA-OUTPUT`.

```
-A ANTREA-PREROUTING -m comment --comment "Antrea: DNAT external to NodePort packets" -m set --match-set ANTREA-NODEPORT-IP dst,dst -j DNAT --to-destination 169.254.0.252
```

The rule is to DNAT NodePort traffic, bypassing chain `KUBE-SERVICES`.

The iptables rules of raw table chains ANTREA-PREROUTING / ANTREA-OUTPUT are like
below:

```
1. -A ANTREA-PREROUTING -m comment --comment "Antrea: do not track incoming encapsulation packets" -m udp -p udp --dport 6081 -m addrtype --dst-type LOCAL -j NOTRACK
2. -A ANTREA-PREROUTING -m comment --comment "Antrea: drop Pod multicast traffic forwarded via underlay network" -m set --match-set CLUSTER-NODE-IP src -d 224.0.0.0/4 -j DROP
3. -A ANTREA-PREROUTING -m comment --comment "Antrea: do not track external to external IP request packets" -m set --match-set ANTREA-EXTERNAL-IP dst -j NOTRACK
4. -A ANTREA-PREROUTING -m comment --comment "Antrea: do not track external to external IP reply packets" -m set --match-set ANTREA-EXTERNAL-IP src -j NOTRACK
5. -A ANTREA-OUTPUT -m comment --comment "Antrea: do not track local to external IP request packets" -m set --match-set ANTREA-EXTERNAL-IP dst -j NOTRACK
6. -A ANTREA-OUTPUT -m comment --comment "Antrea: do not track local to external IP reply packets" -m set --match-set ANTREA-EXTERNAL-IP src -j NOTRACK
```

- Rules 1-2 are not new rules.
- Rules 3-4 are to bypass conntrack and for LoadBalancerIP/externalIP traffic sourced
  from external network, which also results in bypassing the chains managed by Antrea
  Proxy and kube-proxy in nat table.
- Rules 5-6 are to bypass conntrack for LoadBalancerIP/externalIP traffic sourced
  from local Node, which also results in bypassing the chains managed by Antrea
  Proxy and kube-proxy in nat table.

The following are the benchmark results of a LoadBalancer Service configured with DSR mode.
The results of TCP_STREAM and TCP_RR (single TCP connection) are almost the same as that
before. The result of TCP_CRR (multiple TCP connections) performs better than before. One
reason should be that conntrack is skipped for LoadBalancer Services.

```
Test           v2.0 proxyAll     Dev proxyAll    Delta
TCP_STREAM     4933.97           4918.35         -0.32%
TCP_RR         8095.49           8032.4         -0.78%
TCP_CRR        1645.66           1888.93         +14.79%
```

Signed-off-by: Hongliang Liu <[email protected]>
  • Loading branch information
hongliangl committed Jul 9, 2024
1 parent 0d6f201 commit 1041cd6
Show file tree
Hide file tree
Showing 14 changed files with 731 additions and 421 deletions.
1 change: 1 addition & 0 deletions .github/workflows/kind.yml
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,7 @@ jobs:
--coverage \
--encap-mode encap \
--proxy-all \
--no-kube-proxy \
--feature-gates LoadBalancerModeDSR=true \
--load-balancer-mode dsr \
--node-ipam
Expand Down
10 changes: 8 additions & 2 deletions ci/kind/test-e2e-kind.sh
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ _usage="Usage: $0 [--encap-mode <mode>] [--ip-family <v4|v6|dual>] [--coverage]
--feature-gates A comma-separated list of key=value pairs that describe feature gates, e.g. AntreaProxy=true,Egress=false.
--run Run only tests matching the regexp.
--proxy-all Enables Antrea proxy with all Service support.
--no-kube-proxy Don't deploy kube-proxy.
--load-balancer-mode LoadBalancer mode.
--node-ipam Enables Antrea NodeIPAM.
--multicast Enables Multicast.
Expand Down Expand Up @@ -72,6 +73,7 @@ mode=""
ipfamily="v4"
feature_gates=""
proxy_all=false
no_kube_proxy=false
load_balancer_mode=""
node_ipam=false
multicast=false
Expand Down Expand Up @@ -106,6 +108,10 @@ case $key in
proxy_all=true
shift
;;
--no-kube-proxy)
no_kube_proxy=true
shift
;;
--load-balancer-mode)
load_balancer_mode="$2"
shift 2
Expand Down Expand Up @@ -299,7 +305,7 @@ function setup_cluster {
echoerr "invalid value for --ip-family \"$ipfamily\", expected \"v4\" or \"v6\""
exit 1
fi
if $proxy_all; then
if $no_kube_proxy; then
args="$args --no-kube-proxy"
fi
if $node_ipam; then
Expand Down Expand Up @@ -353,7 +359,7 @@ function run_test {
cat $CH_OPERATOR_YML | docker exec -i kind-control-plane dd of=/root/clickhouse-operator-install-bundle.yml
fi

if $proxy_all; then
if $no_kube_proxy; then
apiserver=$(docker exec -i kind-control-plane kubectl get endpoints kubernetes --no-headers | awk '{print $2}')
if $coverage; then
docker exec -i kind-control-plane sed -i.bak -E "s/^[[:space:]]*[#]?kubeAPIServerOverride[[:space:]]*:[[:space:]]*[a-z\"]+[[:space:]]*$/ kubeAPIServerOverride: \"$apiserver\"/" /root/antrea-coverage.yml /root/antrea-ipsec-coverage.yml
Expand Down
22 changes: 15 additions & 7 deletions docs/antrea-proxy.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,13 +42,21 @@ the introduction of `proxyAll`, Antrea relied on userspace kube-proxy, which is
no longer actively maintained by the K8s community and is slower than other
kube-proxy backends.

Note that on Linux, even when `proxyAll` is enabled, kube-proxy will usually
take priority and will keep handling NodePort Service traffic (unless the source
is a Pod, which is pretty unusual as Pods typically access Services by
ClusterIP). This is because kube-proxy rules typically come before the rules
installed by AntreaProxy to redirect traffic to OVS. When kube-proxy is not
deployed or is removed from the cluster, AntreaProxy will then handle all
Service traffic.
Note that on Linux, before Antrea v2.1, when `proxyAll` is enabled, kube-proxy
will usually take priority over AntreaProxy and will keep handling all kinds of
Service traffic (unless the source is a Pod, which is pretty unusual as Pods
typically access Services by ClusterIP). This is because kube-proxy rules typically
come before the rules installed by AntreaProxy to redirect traffic to OVS. When
kube-proxy is not deployed or is removed from the cluster, AntreaProxy will then
handle all Service traffic.

Starting with Antrea v2.1, when only `proxyAll` is enabled, AntreaProxy is capable
of handling all kinds of Service traffic except ClusterIP, even if kube-proxy in
iptables mode is present. This is accomplished by prioritizing the rules installed
by AntreaProxy redirecting Service traffic to OVS over those installed by kube-proxy.

For kube-proxy in IPVS mode, you can either configure it to use iptables mode or
remove it.

### Removing kube-proxy

Expand Down
79 changes: 17 additions & 62 deletions pkg/agent/proxy/proxier.go
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,6 @@ import (
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/selection"
apimachinerytypes "k8s.io/apimachinery/pkg/types"
"k8s.io/apimachinery/pkg/util/sets"
coreinformers "k8s.io/client-go/informers/core/v1"
discoveryinformers "k8s.io/client-go/informers/discovery/v1"
clientset "k8s.io/client-go/kubernetes"
Expand Down Expand Up @@ -121,13 +120,6 @@ type proxier struct {
serviceHealthServer healthcheck.ServiceHealthServer
numLocalEndpoints map[apimachinerytypes.NamespacedName]int

// serviceIPRouteReferences tracks the references of Service IP routes. The key is the Service IP and the value is
// the set of ServiceInfo strings. Because a Service could have multiple ports and each port will generate a
// ServicePort (which is the unit of the processing), a Service IP route may be required by several ServicePorts.
// With the references, we install a route exactly once as long as it's used by any ServicePorts and uninstall it
// exactly once when it's no longer used by any ServicePorts.
// It applies to ClusterIP and LoadBalancerIP.
serviceIPRouteReferences map[string]sets.Set[string]
// syncedOnce returns true if the proxier has synced rules at least once.
syncedOnce bool
syncedOnceMutex sync.RWMutex
Expand Down Expand Up @@ -569,10 +561,10 @@ func (p *proxier) installNodePortService(localGroupID, clusterGroupID binding.Gr
IsNested: false, // Unsupported for NodePort
IsDSR: false, // Unsupported because external traffic has been DNAT'd in host network before it's forwarded to OVS.
}); err != nil {
return fmt.Errorf("failed to install NodePort load balancing flows: %w", err)
return fmt.Errorf("failed to install NodePort load balancing OVS flows: %w", err)
}
if err := p.routeClient.AddNodePort(p.nodePortAddresses, svcPort, protocol); err != nil {
return fmt.Errorf("failed to install NodePort traffic redirecting rules: %w", err)
if err := p.routeClient.AddNodePortConfigs(p.nodePortAddresses, svcPort, protocol); err != nil {
return fmt.Errorf("failed to install NodePort traffic redirecting routing configurations: %w", err)
}
return nil
}
Expand All @@ -588,8 +580,8 @@ func (p *proxier) uninstallNodePortService(svcPort uint16, protocol binding.Prot
if err := p.ofClient.UninstallServiceFlows(svcIP, svcPort, protocol); err != nil {
return fmt.Errorf("failed to remove NodePort load balancing flows: %w", err)
}
if err := p.routeClient.DeleteNodePort(p.nodePortAddresses, svcPort, protocol); err != nil {
return fmt.Errorf("failed to remove NodePort traffic redirecting rules: %w", err)
if err := p.routeClient.DeleteNodePortConfigs(p.nodePortAddresses, svcPort, protocol); err != nil {
return fmt.Errorf("failed to remove NodePort traffic redirecting routing configurations: %w", err)
}
return nil
}
Expand Down Expand Up @@ -618,10 +610,10 @@ func (p *proxier) installExternalIPService(svcInfoStr string,
IsNested: false, // Unsupported for ExternalIP
IsDSR: features.DefaultFeatureGate.Enabled(features.LoadBalancerModeDSR) && loadBalancerMode == agentconfig.LoadBalancerModeDSR,
}); err != nil {
return fmt.Errorf("failed to install ExternalIP load balancing flows: %w", err)
return fmt.Errorf("failed to install ExternalIP load balancing OVS flows: %w", err)
}
if err := p.addRouteForServiceIP(svcInfoStr, ip, p.routeClient.AddExternalIPRoute); err != nil {
return fmt.Errorf("failed to install ExternalIP traffic redirecting routes: %w", err)
if err := p.routeClient.AddExternalIPConfigs(svcInfoStr, ip); err != nil {
return fmt.Errorf("failed to install ExternalIP load balancing routing configurations: %w", err)
}
}
return nil
Expand All @@ -631,10 +623,10 @@ func (p *proxier) uninstallExternalIPService(svcInfoStr string, externalIPString
for _, externalIP := range externalIPStrings {
ip := net.ParseIP(externalIP)
if err := p.ofClient.UninstallServiceFlows(ip, svcPort, protocol); err != nil {
return fmt.Errorf("failed to remove ExternalIP load balancing flows: %w", err)
return fmt.Errorf("failed to remove ExternalIP load balancing OVS flows: %w", err)
}
if err := p.deleteRouteForServiceIP(svcInfoStr, ip, p.routeClient.DeleteExternalIPRoute); err != nil {
return fmt.Errorf("failed to remove ExternalIP traffic redirecting routes: %w", err)
if err := p.routeClient.DeleteExternalIPConfigs(svcInfoStr, ip); err != nil {
return fmt.Errorf("failed to remove ExternalIP traffic redirecting routing configurations: %w", err)
}
}
return nil
Expand Down Expand Up @@ -665,71 +657,35 @@ func (p *proxier) installLoadBalancerService(svcInfoStr string,
IsNested: false, // Unsupported for LoadBalancerIP
IsDSR: features.DefaultFeatureGate.Enabled(features.LoadBalancerModeDSR) && loadBalancerMode == agentconfig.LoadBalancerModeDSR,
}); err != nil {
return fmt.Errorf("failed to install LoadBalancer load balancing flows: %w", err)
return fmt.Errorf("failed to install LoadBalancerIP load balancing OVS flows: %w", err)
}
if p.proxyAll {
if err := p.addRouteForServiceIP(svcInfoStr, ip, p.routeClient.AddExternalIPRoute); err != nil {
return fmt.Errorf("failed to install LoadBalancer traffic redirecting routes: %w", err)
if err := p.routeClient.AddExternalIPConfigs(svcInfoStr, ip); err != nil {
return fmt.Errorf("failed to install LoadBalancerIP traffic redirecting routing configurations: %w", err)
}
}
}
}
return nil
}

func (p *proxier) addRouteForServiceIP(svcInfoStr string, ip net.IP, addRouteFn func(net.IP) error) error {
ipStr := ip.String()
references, exists := p.serviceIPRouteReferences[ipStr]
// If the IP was not referenced by any Service port, install a route for it.
// Otherwise, just reference it.
if !exists {
if err := addRouteFn(ip); err != nil {
return err
}
references = sets.New[string](svcInfoStr)
p.serviceIPRouteReferences[ipStr] = references
} else {
references.Insert(svcInfoStr)
}
return nil
}

func (p *proxier) uninstallLoadBalancerService(svcInfoStr string, loadBalancerIPStrings []string, svcPort uint16, protocol binding.Protocol) error {
for _, ingress := range loadBalancerIPStrings {
if ingress != "" {
ip := net.ParseIP(ingress)
if err := p.ofClient.UninstallServiceFlows(ip, svcPort, protocol); err != nil {
return fmt.Errorf("failed to remove LoadBalancer load balancing flows: %w", err)
return fmt.Errorf("failed to remove LoadBalancerIP load balancing OVS flows: %w", err)
}
if p.proxyAll {
if err := p.deleteRouteForServiceIP(svcInfoStr, ip, p.routeClient.DeleteExternalIPRoute); err != nil {
return fmt.Errorf("failed to remove LoadBalancer traffic redirecting routes: %w", err)
if err := p.routeClient.DeleteExternalIPConfigs(svcInfoStr, ip); err != nil {
return fmt.Errorf("failed to remove LoadBalancerIP traffic redirecting routing configurations: %w", err)
}
}
}
}
return nil
}

func (p *proxier) deleteRouteForServiceIP(svcInfoStr string, ip net.IP, deleteRouteFn func(net.IP) error) error {
ipStr := ip.String()
references, exists := p.serviceIPRouteReferences[ipStr]
// If the IP was not referenced by this Service port, skip it.
if exists && references.Has(svcInfoStr) {
// Delete the IP only if this Service port is the last one referencing it.
// Otherwise, just dereference it.
if references.Len() == 1 {
if err := deleteRouteFn(ip); err != nil {
return err
}
delete(p.serviceIPRouteReferences, ipStr)
} else {
references.Delete(svcInfoStr)
}
}
return nil
}

func (p *proxier) installServices() {
for svcPortName, svcPort := range p.serviceMap {
svcInfo := svcPort.(*types.ServiceInfo)
Expand Down Expand Up @@ -1454,7 +1410,6 @@ func newProxier(
endpointsInstalledMap: types.EndpointsMap{},
endpointsMap: types.EndpointsMap{},
endpointReferenceCounter: map[string]int{},
serviceIPRouteReferences: map[string]sets.Set[string]{},
nodeLabels: map[string]string{},
serviceStringMap: map[string]k8sproxy.ServicePortName{},
groupCounter: groupCounter,
Expand Down
Loading

0 comments on commit 1041cd6

Please sign in to comment.