Add the advertise-service-cluster-ip-range parameter to summarize the announcement function of the service network segment, reduce the routing entries of the connected network devices, and support a larger BGP network. #920

Closed
wants to merge 36 commits into from
36 commits
352fa95
Add the advertise-cluster-subnet parameter to summarize the announcem…
Jun 4, 2020
c5a5977
modify img file
Jun 4, 2020
368dee1
Update large-networks03.md
cloudnativer Jun 4, 2020
7b5c657
Update large-networks03.md
cloudnativer Jun 4, 2020
a478274
Update large-networks03.md
cloudnativer Jun 4, 2020
534ef18
Update large-networks03.md
cloudnativer Jun 4, 2020
f4e127d
Update large-networks02.md
cloudnativer Jun 4, 2020
6765295
Update large-networks01.md
cloudnativer Jun 4, 2020
1a7d578
Update large-networks01.md
cloudnativer Jun 4, 2020
6a2d846
changed advertise-cluster-subnet to advertise-service-cluster-ip-range
Jun 12, 2020
15dedb4
Update large-networks01.md
cloudnativer Jun 18, 2020
b1e7cfa
Update large-networks03.md
cloudnativer Jun 18, 2020
c168e2f
Change advertise-cluster-subnet in the document to advertise-service-…
cloudnativer Jun 18, 2020
59220e7
Change advertise-cluster-subnet in the document to advertise-service-…
cloudnativer Jun 18, 2020
5f71bf5
Update large-networks02.md
cloudnativer Jun 19, 2020
f5f3789
Update large-networks02.md
cloudnativer Jun 19, 2020
8e31460
Update large-networks02.md
cloudnativer Jun 19, 2020
71f280f
Update large-networks03.md
cloudnativer Jun 19, 2020
0eb2243
Update large-networks03.md
cloudnativer Jun 19, 2020
908b760
Update large-networks03.md
cloudnativer Jun 19, 2020
5831fa7
Update large-networks03.md
cloudnativer Jun 19, 2020
9977b99
Update large-networks02.md
cloudnativer Jun 19, 2020
ecf8865
Update large-networks04.md
cloudnativer Jun 19, 2020
35aa65c
Update large-networks04.md
cloudnativer Jun 19, 2020
d83b15b
Update large-networks01.md
cloudnativer Jun 19, 2020
281dd47
Update large-networks01.md
cloudnativer Jun 19, 2020
47a5eaa
Update large-networks01.md
cloudnativer Jun 19, 2020
5d28d52
Update large-networks01.md
cloudnativer Jun 19, 2020
0bc2ed4
Update kube-router-daemonset-advertise-cluster-subnet.yaml
cloudnativer Jun 19, 2020
7d42a10
Update large-networks03.md
cloudnativer Jun 19, 2020
963a94b
Update large-networks03.md
cloudnativer Jun 19, 2020
3ef1b0d
Update large-networks03.md
cloudnativer Jun 19, 2020
6d0513f
Update large-networks03.md
cloudnativer Jun 19, 2020
42b3788
Separate code changes from documentation, leaving only code changes.
invalid-email-address Jun 30, 2020
67c6acf
Cut down the length of the description to a single line, and Add long…
invalid-email-address Jul 9, 2020
0bc222e
See large-networks03 documentation for details
cloudnativer Jul 9, 2020
3 changes: 3 additions & 0 deletions docs/user-guide.md
@@ -40,6 +40,7 @@ Usage of kube-router:
--advertise-external-ip Add External IP of service to the RIB so that it gets advertised to the BGP peers.
--advertise-loadbalancer-ip Add LoadbBalancer IP of service status as set by the LB provider to the RIB so that it gets advertised to the BGP peers.
--advertise-pod-cidr Add Node's POD cidr to the RIB so that it gets advertised to the BGP peers. (default true)
--advertise-service-cluster-ip-range string Add Cluster IP range of the service to the rib so that it advertises the IP range to BGP peers. (make sure that the "advertise-cluster-ip=true" flag is also set.)
--bgp-graceful-restart Enables the BGP Graceful Restart capability so that routes are preserved on unexpected restarts
--bgp-graceful-restart-deferral-time duration BGP Graceful restart deferral time according to RFC4724 4.1, maximum 18h. (default 6m0s)
--bgp-port uint16 The port open for incoming BGP connections and to use for connecting with other BGP peers. (default 179)
@@ -151,6 +152,8 @@ It does this by:
To set the default for all services use the `--advertise-cluster-ip`,
`--advertise-external-ip` and `--advertise-loadbalancer-ip` flags.

To advertise the service cluster IP range to BGP peers, and so reduce the number of routes on the network devices, set both `--advertise-cluster-ip=true` and `--advertise-service-cluster-ip-range=<ip_range_cidr>`. When this flag is set, kube-router adds the given service cluster IP range to the RIB and advertises that range to its BGP peers, which reduces the number of service route entries kube-router sends to the uplink network device. See <a href="large-networks03.md">large-networks03 documentation</a> for details.

To selectively enable or disable this feature per-service use the
`kube-router.io/service.advertise.clusterip`, `kube-router.io/service.advertise.externalip`
and `kube-router.io/service.advertise.loadbalancerip` annotations.
12 changes: 10 additions & 2 deletions pkg/controllers/routing/bgp_policies.go
@@ -45,9 +45,17 @@ func (nrc *NetworkRoutingController) AddPolicies() error {
// creates prefix set to represent all the advertisable IP associated with the services
advIPPrefixList := make([]config.Prefix, 0)
advIps, _, _ := nrc.getAllVIPs()
for _, ip := range advIps {
advIPPrefixList = append(advIPPrefixList, config.Prefix{IpPrefix: ip + "/32"})

// If advertise-service-cluster-ip-range is set, add that range to the prefix set; otherwise add each VIP as a /32, as before.
@murali-reddy (Member), Jul 8, 2020:

For services whose external traffic policy is set to Local, I would like both the service CIDR and the service VIP to be announced, so that an upstream router that has a /32 route to a VIP gives it precedence over the route to the service cluster IP range.
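
The precedence mentioned above comes from longest-prefix matching on the upstream router. The following is a minimal sketch of that lookup, not kube-router or router code: the `route` type, the `lookup` helper, the node names and the `10.96.0.0/12` range are all hypothetical values chosen for illustration.

```go
package main

import (
	"fmt"
	"net"
)

// route is a hypothetical routing-table entry: a destination prefix and a next hop.
type route struct {
	prefix  string
	nextHop string
}

// lookup returns the next hop of the most specific route that contains dst,
// or "" when no route matches (longest-prefix match).
func lookup(routes []route, dst string) string {
	ip := net.ParseIP(dst)
	if ip == nil {
		return ""
	}
	best, bestLen := "", -1
	for _, r := range routes {
		_, ipnet, err := net.ParseCIDR(r.prefix)
		if err != nil {
			continue // skip malformed entries
		}
		ones, _ := ipnet.Mask.Size()
		if ipnet.Contains(ip) && ones > bestLen {
			best, bestLen = r.nextHop, ones
		}
	}
	return best
}

func main() {
	routes := []route{
		{"10.96.0.0/12", "node-a"},  // aggregate service range (assumed value)
		{"10.96.0.10/32", "node-b"}, // host route from a node with a local endpoint
	}
	fmt.Println(lookup(routes, "10.96.0.10")) // node-b: the /32 wins
	fmt.Println(lookup(routes, "10.96.0.20")) // node-a: only the aggregate matches
}
```

With both routes present, traffic to the VIP that has a host route follows the /32, while every other address in the range falls back to the aggregate.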

Member

@cloudnativer Did you give any thought to the comment above? I see three scenarios:

  • If --advertise-service-cluster-ip-range is configured, advertise ONLY the service cluster IP range from the nodes and DO NOT advertise service VIPs from the node (which is what this PR intends to achieve), irrespective of whether the service has pods running on the node and is marked externalTrafficPolicy=Local.
  • If --advertise-service-cluster-ip-range is configured, advertise the service cluster IP range from the nodes AND advertise the /32 service VIP from a node if that node runs an endpoint pod for the service and the service is marked externalTrafficPolicy=Local.
  • If --advertise-service-cluster-ip-range is NOT configured, keep the original behaviour, i.e. advertise the /32 VIP from all nodes if the service is not marked externalTrafficPolicy=Local, and advertise the service VIP ONLY from nodes that run an endpoint pod for the service if it is marked externalTrafficPolicy=Local.

The problem with scenario 1 is that it can blackhole traffic for services set to externalTrafficPolicy=Local: traffic to the service VIP gets ECMP-ed to a node that runs no endpoint pod for the service, and the proxy on that node rejects the traffic.
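
The second and third scenarios can be sketched as a per-node decision function. This is only an illustration of the behaviour described in the comment, not code from the PR: the `service` type, its fields and `prefixesFor` are hypothetical names.

```go
package main

import "fmt"

// service is a hypothetical, simplified view of a Kubernetes Service.
type service struct {
	clusterIP   string
	policyLocal bool // true when externalTrafficPolicy=Local
}

// prefixesFor returns the prefixes a single node would advertise for svc.
// rangeCIDR stands for the value of --advertise-service-cluster-ip-range
// ("" when unset); hasLocalEndpoint reports whether this node runs an
// endpoint pod for svc.
func prefixesFor(rangeCIDR string, svc service, hasLocalEndpoint bool) []string {
	hostRoute := svc.clusterIP + "/32"
	if rangeCIDR == "" {
		// Scenario 3: original behaviour, /32 only.
		if svc.policyLocal && !hasLocalEndpoint {
			return nil
		}
		return []string{hostRoute}
	}
	// Scenario 2: always the aggregate, plus the /32 when the service is
	// Local and this node has an endpoint for it.
	out := []string{rangeCIDR}
	if svc.policyLocal && hasLocalEndpoint {
		out = append(out, hostRoute)
	}
	return out
}

func main() {
	svc := service{clusterIP: "10.96.0.10", policyLocal: true}
	fmt.Println(prefixesFor("10.96.0.0/12", svc, true))  // aggregate and /32
	fmt.Println(prefixesFor("10.96.0.0/12", svc, false)) // aggregate only
	fmt.Println(prefixesFor("", svc, false))             // nothing
}
```

Under this sketch a node without a local endpoint still announces the aggregate, but upstream routers prefer the more specific /32 from nodes that do have one, which avoids the blackhole described for scenario 1.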

@cloudnativer (Contributor, Author), Jul 23, 2020:

> The problem with scenario 1 is that it can blackhole traffic for services set to externalTrafficPolicy=Local: traffic to the service VIP gets ECMP-ed to a node that runs no endpoint pod for the service, and the proxy on that node rejects the traffic.

This problem does exist if externalTrafficPolicy=Local is set.


(1) advertise-service-cluster-ip-range is applicable under the following conditions:

  • externalTrafficPolicy only applies to services of type LoadBalancer or NodePort.
  • If the service is of type LoadBalancer or NodePort, also set externalTrafficPolicy=Cluster so that the advertise-service-cluster-ip-range parameter is meaningful.
  • In all other cases, advertise-service-cluster-ip-range can be set directly.

(2) In a large-scale environment, balancing the network traffic load is very important. Setting externalTrafficPolicy=Local leads to an unbalanced traffic load, so our production environment uses the default, externalTrafficPolicy=Cluster.

if len(nrc.advertiseServiceClusterIpRange) != 0 {
advIPPrefixList = append(advIPPrefixList, config.Prefix{IpPrefix: nrc.advertiseServiceClusterIpRange})
} else {
for _, ip := range advIps {
//housj add
advIPPrefixList = append(advIPPrefixList, config.Prefix{IpPrefix: ip + "/32"})
}
}

clusterIPPrefixSet, err := table.NewPrefixSet(config.PrefixSet{
PrefixSetName: "clusteripprefixset",
PrefixList: advIPPrefixList,
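
The effect of the branch added to AddPolicies can be shown in isolation. Below is a minimal sketch that uses plain strings instead of GoBGP's `config.Prefix`; `buildPrefixList` and the sample addresses are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// buildPrefixList mirrors the branch in the diff: when a service range is
// configured, the prefix list holds that single range; otherwise it holds
// one /32 entry per advertised VIP.
func buildPrefixList(rangeCIDR string, vips []string) []string {
	if rangeCIDR != "" {
		return []string{rangeCIDR}
	}
	prefixes := make([]string, 0, len(vips))
	for _, ip := range vips {
		prefixes = append(prefixes, ip+"/32")
	}
	return prefixes
}

func main() {
	vips := []string{"10.96.0.10", "10.96.0.11", "10.96.12.7"}
	fmt.Println(len(buildPrefixList("", vips)))             // one entry per VIP
	fmt.Println(len(buildPrefixList("10.96.0.0/12", vips))) // a single aggregate
}
```

The list length is what the PR title calls reducing the routing entries: it grows with the number of services in the first case and stays at one in the second.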
30 changes: 21 additions & 9 deletions pkg/controllers/routing/ecmp_vip.go
@@ -23,22 +23,34 @@ func (nrc *NetworkRoutingController) bgpAdvertiseVIP(vip string) error {
bgp.NewPathAttributeNextHop(nrc.nodeIP.String()),
}

glog.V(2).Infof("Advertising route: '%s/%s via %s' to peers", vip, strconv.Itoa(32), nrc.nodeIP.String())

_, err := nrc.bgpServer.AddPath("", []*table.Path{table.NewPath(nil, bgp.NewIPAddrPrefix(uint8(32),
vip), false, attrs, time.Now(), false)})
// If advertise-service-cluster-ip-range is set, advertise that range instead of the /32 VIP; otherwise advertise the VIP as before.
var svcSubnet = vip
var svcCidrLen = 32
if len(nrc.advertiseServiceClusterIpRange) != 0 {
svcCidrStr := strings.Split(nrc.advertiseServiceClusterIpRange, "/")
svcSubnet = svcCidrStr[0]
svcCidrLen, _ = strconv.Atoi(svcCidrStr[1])
}
glog.V(2).Infof("Advertising route: '%s/%s via %s' to peers", svcSubnet, strconv.Itoa(svcCidrLen), nrc.nodeIP.String())
_, err := nrc.bgpServer.AddPath("", []*table.Path{table.NewPath(nil, bgp.NewIPAddrPrefix(uint8(svcCidrLen),
svcSubnet), false, attrs, time.Now(), false)})

return err
}

// bgpWithdrawVIP unadvertises the service vip
func (nrc *NetworkRoutingController) bgpWithdrawVIP(vip string) error {
glog.V(2).Infof("Withdrawing route: '%s/%s via %s' to peers", vip, strconv.Itoa(32), nrc.nodeIP.String())

pathList := []*table.Path{table.NewPath(nil, bgp.NewIPAddrPrefix(uint8(32),
vip), true, nil, time.Now(), false)}

err := nrc.bgpServer.DeletePath([]byte(nil), 0, "", pathList)
// If advertise-service-cluster-ip-range is set, withdraw nothing; otherwise withdraw the /32 VIP as before.
var err error
if len(nrc.advertiseServiceClusterIpRange) == 0 {
glog.V(2).Infof("Withdrawing route: '%s/%s via %s' to peers", vip, strconv.Itoa(32), nrc.nodeIP.String())
pathList := []*table.Path{table.NewPath(nil, bgp.NewIPAddrPrefix(uint8(32),
vip), true, nil, time.Now(), false)}
err = nrc.bgpServer.DeletePath([]byte(nil), 0, "", pathList)
} else {
err = nil
}

return err
}
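
The added branch in bgpAdvertiseVIP splits the flag value on "/" and discards the error from strconv.Atoi, so a value without a slash would index past the end of the slice and a non-numeric length would silently become 0. One possible alternative, sketched here with the standard library only, is to validate the value with net.ParseCIDR. `parseServiceRange` is a hypothetical helper, and the IPv4-only check is an assumption based on the diff passing the result to bgp.NewIPAddrPrefix.

```go
package main

import (
	"fmt"
	"net"
)

// parseServiceRange validates an IPv4 CIDR and returns its network address
// and prefix length. Restricting to IPv4 is an assumption of this sketch.
func parseServiceRange(cidr string) (string, uint8, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", 0, fmt.Errorf("invalid service cluster IP range %q: %v", cidr, err)
	}
	if ipnet.IP.To4() == nil {
		return "", 0, fmt.Errorf("service cluster IP range %q is not IPv4", cidr)
	}
	ones, _ := ipnet.Mask.Size()
	return ipnet.IP.String(), uint8(ones), nil
}

func main() {
	prefix, length, err := parseServiceRange("10.96.0.0/12")
	fmt.Println(prefix, length, err)

	_, _, err = parseServiceRange("10.96.0.0") // missing "/len"
	fmt.Println(err != nil)
}
```

Parsing once at startup rather than on every advertisement would also let an invalid value fail early, though where to call such a helper is a design choice this sketch does not settle.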
2 changes: 2 additions & 0 deletions pkg/controllers/routing/network_routes_controller.go
@@ -78,6 +78,7 @@ type NetworkRoutingController struct {
enablePodEgress bool
hostnameOverride string
advertiseClusterIP bool
advertiseServiceClusterIpRange string
advertiseExternalIP bool
advertiseLoadBalancerIP bool
advertisePodCidr bool
@@ -885,6 +886,7 @@ func NewNetworkRoutingController(clientset kubernetes.Interface,
nrc.bgpServerStarted = false
nrc.disableSrcDstCheck = kubeRouterConfig.DisableSrcDstCheck
nrc.initSrcDstCheckDone = false
nrc.advertiseServiceClusterIpRange = kubeRouterConfig.AdvertiseServiceClusterIpRange

nrc.hostnameOverride = kubeRouterConfig.HostnameOverride
node, err := utils.GetNodeObject(clientset, nrc.hostnameOverride)
3 changes: 3 additions & 0 deletions pkg/options/options.go
@@ -13,6 +13,7 @@ const DEFAULT_BGP_PORT = 179

type KubeRouterConfig struct {
AdvertiseClusterIp bool
AdvertiseServiceClusterIpRange string
AdvertiseExternalIp bool
AdvertiseNodePodCidr bool
AdvertiseLoadBalancerIp bool
@@ -118,6 +119,8 @@ func (s *KubeRouterConfig) AddFlags(fs *pflag.FlagSet) {
"The delay between route updates and advertisements (e.g. '5s', '1m', '2h22m'). Must be greater than 0.")
fs.BoolVar(&s.AdvertiseClusterIp, "advertise-cluster-ip", false,
"Add Cluster IP of the service to the RIB so that it gets advertises to the BGP peers.")
fs.StringVar(&s.AdvertiseServiceClusterIpRange, "advertise-service-cluster-ip-range", s.AdvertiseServiceClusterIpRange,
"Add Cluster IP range of the service to the rib so that it advertises the IP range to BGP peers. (make sure that the \"advertise-cluster-ip=true\" flag is also set.)")
fs.BoolVar(&s.AdvertiseExternalIp, "advertise-external-ip", false,
"Add External IP of service to the RIB so that it gets advertised to the BGP peers.")
fs.BoolVar(&s.AdvertiseLoadBalancerIp, "advertise-loadbalancer-ip", false,
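
The flag description asks the user to also set advertise-cluster-ip=true, but nothing in the diff enforces that. A startup check along the following lines could make the dependency explicit. This is a sketch under stated assumptions: `validateAdvertiseFlags` is a hypothetical function that takes the two option values as plain arguments and is not part of kube-router.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// validateAdvertiseFlags checks the combination described in the flag help:
// a service range only makes sense together with --advertise-cluster-ip=true,
// and it must be a well-formed CIDR.
func validateAdvertiseFlags(advertiseClusterIP bool, rangeCIDR string) error {
	if rangeCIDR == "" {
		return nil // flag unset: original behaviour, nothing to check
	}
	if !advertiseClusterIP {
		return errors.New("--advertise-service-cluster-ip-range requires --advertise-cluster-ip=true")
	}
	if _, _, err := net.ParseCIDR(rangeCIDR); err != nil {
		return fmt.Errorf("--advertise-service-cluster-ip-range: %v", err)
	}
	return nil
}

func main() {
	fmt.Println(validateAdvertiseFlags(true, "10.96.0.0/12"))  // <nil>
	fmt.Println(validateAdvertiseFlags(false, "10.96.0.0/12")) // dependency error
}
```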