Favor EndpointSlice over Endpoints

Document EndpointSlice as the preferred and most appropriate mechanism to record the backing endpoints of a Service.

Co-authored-by: Rob Scott <[email protected]>
Co-authored-by: Shannon Kularathna <[email protected]>
kubernetes · Sep 14, 2022 · b12d675
1 parent 77b688c
commit b12d675
Showing 12 changed files with 172 additions and 113 deletions.
diff --git a/content/en/docs/concepts/architecture/cloud-controller.md b/content/en/docs/concepts/architecture/cloud-controller.md
@@ -107,11 +107,11 @@ routes appropriately. It requires Get access to Node objects.
 
 ### Service controller {#authorization-service-controller}
 
-The service controller listens to Service object Create, Update and Delete events and then configures Endpoints for those Services appropriately.
+The service controller listens to Service object Create, Update and Delete events and then configures EndpointSlices (and Endpoints) for those Services appropriately.
 
 To access Services, it requires List, and Watch access. To update Services, it requires Patch and Update access.
 
-To set up Endpoints resources for the Services, it requires access to Create, List, Get, Watch, and Update.
+To set up EndpointSlice / Endpoints resources for the Services, it requires access to Create, List, Get, Watch, and Update.
 
 `v1/Service`:
 

diff --git a/content/en/docs/concepts/overview/components.md b/content/en/docs/concepts/overview/components.md
@@ -53,8 +53,8 @@ Some types of these controllers are:
   * Node controller: Responsible for noticing and responding when nodes go down.
   * Job controller: Watches for Job objects that represent one-off tasks, then creates
     Pods to run those tasks to completion.
-  * Endpoints controller: Populates the Endpoints object (that is, joins Services & Pods).
-  * Service Account & Token controllers: Create default accounts and API access tokens for new namespaces.
+  * EndpointSlice controller: Populates EndpointSlice objects (to provide a link between Services and Pods).
+  * ServiceAccount controller: Creates default ServiceAccounts for new namespaces.
 
 ### cloud-controller-manager
 

diff --git a/content/en/docs/concepts/services-networking/connect-applications-service.md b/content/en/docs/concepts/services-networking/connect-applications-service.md
@@ -92,10 +92,14 @@ my-nginx   ClusterIP   10.0.162.149   <none>        80/TCP    21s
 ```
 
 As mentioned previously, a Service is backed by a group of Pods. These Pods are
-exposed through `endpoints`. The Service's selector will be evaluated continuously
-and the results will be POSTed to an Endpoints object also named `my-nginx`.
-When a Pod dies, it is automatically removed from the endpoints, and new Pods
-matching the Service's selector will automatically get added to the endpoints.
+exposed through
+{{<glossary_tooltip term_id="endpoint-slice" text="EndpointSlices">}}.
+The Service's selector will be evaluated continuously and the results will be POSTed
+to an EndpointSlice that is connected to the Service using a
+{{< glossary_tooltip text="label" term_id="label" >}}.
+When a Pod dies, it is automatically removed from the EndpointSlices that contain it
+as an endpoint. New Pods that match the Service's selector will automatically get added
+to an EndpointSlice for that Service.
 Check the endpoints, and note that the IPs are the same as the Pods created in
 the first step:
 
@@ -116,11 +120,11 @@ Session Affinity:    None
 Events:              <none>
 ```
 ```shell
-kubectl get ep my-nginx
+kubectl get endpointslices -l kubernetes.io/service-name=my-nginx
 ```
 ```
-NAME       ENDPOINTS                     AGE
-my-nginx   10.244.2.5:80,10.244.3.4:80   1m
+NAME             ADDRESSTYPE   PORTS   ENDPOINTS               AGE
+my-nginx-7vzhx   IPv4          80      10.244.2.5,10.244.3.4   21s
 ```
 
 You should now be able to curl the nginx Service on `<CLUSTER-IP>:<PORT>` from

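The label-based lookup in the change above (`kubectl get endpointslices -l kubernetes.io/service-name=my-nginx`) can be sketched in a few lines of Python. This is only an illustration that operates on plain dictionaries shaped like `discovery.k8s.io/v1` EndpointSlice manifests; the function name `addresses_for_service` and the `other-abcde` slice are hypothetical, and no Kubernetes client library is involved.

```python
SERVICE_NAME_LABEL = "kubernetes.io/service-name"


def addresses_for_service(slices, service_name):
    """Collect (address, port) pairs from every slice labelled for service_name."""
    pairs = set()
    for s in slices:
        labels = s.get("metadata", {}).get("labels", {})
        if labels.get(SERVICE_NAME_LABEL) != service_name:
            continue  # this slice belongs to a different Service
        ports = [p["port"] for p in s.get("ports", []) if p.get("port") is not None]
        for endpoint in s.get("endpoints", []):
            for address in endpoint.get("addresses", []):
                for port in ports:
                    pairs.add((address, port))
    return sorted(pairs)


# Sample data modelled on the kubectl output in the diff above.
# The second slice is a made-up example for an unrelated Service.
sample_slices = [
    {
        "metadata": {"name": "my-nginx-7vzhx",
                     "labels": {SERVICE_NAME_LABEL: "my-nginx"}},
        "addressType": "IPv4",
        "ports": [{"port": 80}],
        "endpoints": [{"addresses": ["10.244.2.5"]},
                      {"addresses": ["10.244.3.4"]}],
    },
    {
        "metadata": {"name": "other-abcde",
                     "labels": {SERVICE_NAME_LABEL: "other"}},
        "addressType": "IPv4",
        "ports": [{"port": 8080}],
        "endpoints": [{"addresses": ["10.244.9.9"]}],
    },
]

if __name__ == "__main__":
    print(addresses_for_service(sample_slices, "my-nginx"))
```

Because a Service can own several EndpointSlices, the sketch matches on the label rather than on the object name, which is the main difference from looking up a single Endpoints object named after the Service.
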
diff --git a/content/en/docs/concepts/services-networking/dns-pod-service.md b/content/en/docs/concepts/services-networking/dns-pod-service.md
@@ -183,8 +183,8 @@ the same namespace, the Pod will see its own FQDN as
 A or AAAA record at that name, pointing to the Pod's IP. Both Pods "`busybox1`" and
 "`busybox2`" can have their distinct A or AAAA records.
 
-The Endpoints object can specify the `hostname` for any endpoint addresses,
-along with its IP.
+An {{<glossary_tooltip term_id="endpoint-slice" text="EndpointSlice">}} can specify
+the DNS hostname for any endpoint addresses, along with its IP.
 
 {{< note >}}
 Because A or AAAA records are not created for Pod names, `hostname` is required for the Pod's A or AAAA

diff --git a/content/en/docs/concepts/services-networking/endpoint-slices.md b/content/en/docs/concepts/services-networking/endpoint-slices.md
@@ -19,24 +19,7 @@ Endpoints.
 
 <!-- body -->
 
-## Motivation
-
-The Endpoints API has provided a simple and straightforward way of
-tracking network endpoints in Kubernetes. Unfortunately as Kubernetes clusters
-and {{< glossary_tooltip text="Services" term_id="service" >}} have grown to handle and
-send more traffic to more backend Pods, limitations of that original API became
-more visible.
-Most notably, those included challenges with scaling to larger numbers of
-network endpoints.
-
-Since all network endpoints for a Service were stored in a single Endpoints
-resource, those resources could get quite large. That affected the performance
-of Kubernetes components (notably the master control plane) and resulted in
-significant amounts of network traffic and processing when Endpoints changed.
-EndpointSlices help you mitigate those issues as well as provide an extensible
-platform for additional features such as topological routing.
-
-## EndpointSlice resources {#endpointslice-resource}
+## EndpointSlice API {#endpointslice-resource}
 
 In Kubernetes, an EndpointSlice contains references to a set of network
 endpoints. The control plane automatically creates EndpointSlices
@@ -48,7 +31,7 @@ Service name.
 The name of a EndpointSlice object must be a valid
 [DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
 
-As an example, here's a sample EndpointSlice resource for the `example`
+As an example, here's a sample EndpointSlice object that's owned by the `example`
 Kubernetes Service.
 
 ```yaml
@@ -81,8 +64,7 @@ flag, up to a maximum of 1000.
 
 EndpointSlices can act as the source of truth for
 {{< glossary_tooltip term_id="kube-proxy" text="kube-proxy" >}} when it comes to
-how to route internal traffic. When enabled, they should provide a performance
-improvement for services with large numbers of endpoints.
+how to route internal traffic.
 
 ### Address types
 
@@ -92,6 +74,10 @@ EndpointSlices support three address types:
 * IPv6
 * FQDN (Fully Qualified Domain Name)
 
+Each `EndpointSlice` object represents a specific IP address type. If you have
+a Service that is available via IPv4 and IPv6, there will be at least two
+`EndpointSlice` objects (one for IPv4, and one for IPv6).
+
 ### Conditions
 
 The EndpointSlice API stores conditions about endpoints that may be useful for consumers.
@@ -241,11 +227,44 @@ getting replaced.
 
 Due to the nature of EndpointSlice changes, endpoints may be represented in more
 than one EndpointSlice at the same time. This naturally occurs as changes to
-different EndpointSlice objects can arrive at the Kubernetes client watch/cache
-at different times. Implementations using EndpointSlice must be able to have the
-endpoint appear in more than one slice. A reference implementation of how to
-perform endpoint deduplication can be found in the `EndpointSliceCache`
-implementation in `kube-proxy`.
+different EndpointSlice objects can arrive at the Kubernetes client watch / cache
+at different times.
+
+{{< note >}}
+Clients of the EndpointSlice API must be able to handle the situation where
+a particular endpoint address appears in more than one slice.
+
+You can find a reference implementation for how to perform this endpoint deduplication
+as part of the `EndpointSliceCache` code within `kube-proxy`.
+{{< /note >}}
+
+## Comparison with Endpoints {#motivation}
+
+The original Endpoints API provided a simple and straightforward way of
+tracking network endpoints in Kubernetes. As Kubernetes clusters
+and {{< glossary_tooltip text="Services" term_id="service" >}} grew to handle
+more traffic and to send more traffic to more backend Pods, the
+limitations of that original API became more visible.
+Most notably, those included challenges with scaling to larger numbers of
+network endpoints.
+
+Since all network endpoints for a Service were stored in a single Endpoints
+object, those Endpoints objects could get quite large. For Services that stayed
+stable (the same set of endpoints over a long period of time) the impact was
+acceptable; however, some use cases of Kubernetes weren't well served.
+
+When a Service had a lot of endpoints and the service was either scaling
+frequently, or rolling out new changes frequently, each update to the single
+Endpoints object for that Service meant a lot of traffic between Kubernetes 
+cluster components (within the control plane, and also between nodes and the
+API server). This extra traffic also had a cost in terms of CPU use.
+
+With EndpointSlices, adding or removing a single Pod triggers the same _number_
+of updates to clients that are watching for changes, but the size of those
+update messages is much smaller at large scale.
+
+EndpointSlices also enabled innovation around new features such as topology-aware
+routing.
 
 ## {{% heading "whatsnext" %}}
 

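The note added above says that clients must cope with one endpoint address appearing in more than one slice. A minimal sketch of that kind of deduplication follows. It is not the `EndpointSliceCache` code from `kube-proxy`; the function name `dedupe_endpoints` is hypothetical, and the tie-break (a ready copy wins over a not-ready copy) and the handling of a missing `ready` condition are assumptions made for this illustration.

```python
def dedupe_endpoints(slices):
    """Merge endpoints from several slices into one entry per address.

    Returns a dict mapping (addressType, address) to a ready flag.
    """
    merged = {}
    for s in slices:
        address_type = s["addressType"]
        for endpoint in s.get("endpoints", []):
            ready = endpoint.get("conditions", {}).get("ready")
            # Assumption: a missing or unset "ready" is treated as ready here.
            ready = True if ready is None else bool(ready)
            for address in endpoint.get("addresses", []):
                key = (address_type, address)
                # Assumption: if any copy of the endpoint is ready, keep it as ready.
                merged[key] = merged.get(key, False) or ready
    return merged


# 10.1.2.3 appears in both slices, as can happen while an endpoint
# is being moved from one slice to another.
overlapping = [
    {"addressType": "IPv4",
     "endpoints": [{"addresses": ["10.1.2.3"], "conditions": {"ready": False}}]},
    {"addressType": "IPv4",
     "endpoints": [{"addresses": ["10.1.2.3"], "conditions": {"ready": True}},
                   {"addresses": ["10.4.5.6"]}]},
]

if __name__ == "__main__":
    for key, ready in sorted(dedupe_endpoints(overlapping).items()):
        print(key, ready)
```

Keying on the address type as well as the address keeps IPv4 and IPv6 slices for the same Service separate, matching the rule that each EndpointSlice represents a single address type.
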
diff --git a/content/en/docs/concepts/services-networking/service.md b/content/en/docs/concepts/services-networking/service.md
@@ -61,7 +61,8 @@ The Service abstraction enables this decoupling.
 
 If you're able to use Kubernetes APIs for service discovery in your application,
 you can query the {{< glossary_tooltip text="API server" term_id="kube-apiserver" >}}
-for Endpoints, that get updated whenever the set of Pods in a Service changes.
+for matching EndpointSlices. Kubernetes updates the EndpointSlices for a Service
+whenever the set of Pods in a Service changes.
 
 For non-native applications, Kubernetes offers ways to place a network port or load
 balancer in between your application and the backend Pods.
@@ -159,8 +160,12 @@ Each port definition can have the same `protocol`, or a different one.
 ### Services without selectors
 
 Services most commonly abstract access to Kubernetes Pods thanks to the selector,
-but when used with a corresponding Endpoints object and without a selector, the Service can abstract other kinds of backends,
-including ones that run outside the cluster. For example:
+but when used with a corresponding set of
+{{<glossary_tooltip term_id="endpoint-slice" text="EndpointSlices">}}
+and without a selector, the Service can abstract other kinds of backends,
+including ones that run outside the cluster.
+
+For example:
 
 * You want to have an external database cluster in production, but in your
   test environment you use your own databases.
@@ -184,73 +189,94 @@ spec:
       targetPort: 9376
 ```
 
-Because this Service has no selector, the corresponding Endpoints object is not
-created automatically. You can manually map the Service to the network address and port
-where it's running, by adding an Endpoints object manually:
+Because this Service has no selector, the corresponding EndpointSlice (and
+legacy Endpoints) objects are not created automatically. You can map the Service
+to the network address and port where it's running by adding an EndpointSlice
+object manually. For example:
 
 ```yaml
-apiVersion: v1
-kind: Endpoints
+apiVersion: discovery.k8s.io/v1
+kind: EndpointSlice
 metadata:
-  # the name here should match the name of the Service
-  name: my-service
-subsets:
+  name: my-service-1 # by convention, use the name of the Service
+                     # as a prefix for the name of the EndpointSlice
+  labels:
+    # You should set the "kubernetes.io/service-name" label.
+    # Set its value to match the name of the Service
+    kubernetes.io/service-name: my-service
+addressType: IPv4
+ports:
+  - name: '' # empty because port 9376 is not assigned as a well-known
+             # port (by IANA)
+    appProtocol: http
+    protocol: TCP
+    port: 9376
+endpoints:
   - addresses:
-      - ip: 192.0.2.42
-    ports:
-      - port: 9376
+      - "10.4.5.6" # the IP addresses in this list can appear in any order
+      - "10.1.2.3"
 ```
 
-The name of the Endpoints object must be a valid
-[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
-
-When you create an [Endpoints](/docs/reference/kubernetes-api/service-resources/endpoints-v1/)
-object for a Service, you set the name of the new object to be the same as that
-of the Service.
+When you create an [EndpointSlice](#endpointslices) object for a Service, you can
+use any name for the EndpointSlice. Each EndpointSlice in a namespace must have a
+unique name. You link an EndpointSlice to a Service by setting the
+`kubernetes.io/service-name` {{< glossary_tooltip text="label" term_id="label" >}}
+on that EndpointSlice.
 
 {{< note >}}
 The endpoint IPs _must not_ be: loopback (127.0.0.0/8 for IPv4, ::1/128 for IPv6), or
 link-local (169.254.0.0/16 and 224.0.0.0/24 for IPv4, fe80::/64 for IPv6).
 
-Endpoint IP addresses cannot be the cluster IPs of other Kubernetes Services,
+The endpoint IP addresses cannot be the cluster IPs of other Kubernetes Services,
 because {{< glossary_tooltip term_id="kube-proxy" >}} doesn't support virtual IPs
 as a destination.
 {{< /note >}}
 
 Accessing a Service without a selector works the same as if it had a selector.
-In the example above, traffic is routed to the single endpoint defined in
-the YAML: `192.0.2.42:9376` (TCP).
-
-{{< note >}}
-The Kubernetes API server does not allow proxying to endpoints that are not mapped to
-pods. Actions such as `kubectl proxy <service-name>` where the service has no
-selector will fail due to this constraint. This prevents the Kubernetes API server
-from being used as a proxy to endpoints the caller may not be authorized to access.
-{{< /note >}}
+In the example above, traffic is routed to one of the two endpoints defined in
+the EndpointSlice manifest: a TCP connection to 10.1.2.3 or 10.4.5.6, on port 9376.
 
 An ExternalName Service is a special case of Service that does not have
 selectors and uses DNS names instead. For more information, see the
 [ExternalName](#externalname) section later in this document.
 
-### Over Capacity Endpoints
-If an Endpoints resource has more than 1000 endpoints then a Kubernetes v1.22 (or later)
-cluster annotates that Endpoints with `endpoints.kubernetes.io/over-capacity: truncated`.
-This annotation indicates that the affected Endpoints object is over capacity and that
-the endpoints controller has truncated the number of endpoints to 1000.
-
 ### EndpointSlices
 
 {{< feature-state for_k8s_version="v1.21" state="stable" >}}
 
-EndpointSlices are an API resource that can provide a more scalable alternative
-to Endpoints. Although conceptually quite similar to Endpoints, EndpointSlices
-allow for distributing network endpoints across multiple resources. By default,
-an EndpointSlice is considered "full" once it reaches 100 endpoints, at which
-point additional EndpointSlices will be created to store any additional
-endpoints.
+[EndpointSlices](/docs/concepts/services-networking/endpoint-slices/) are objects that
+represent a subset (a _slice_) of the backing network endpoints for a Service.
+
+Your Kubernetes cluster tracks how many endpoints each EndpointSlice represents.
+If there are so many endpoints for a Service that a threshold is reached, then
+Kubernetes adds another empty EndpointSlice and stores new endpoint information
+there.
+By default, Kubernetes makes a new EndpointSlice once the existing EndpointSlices
+all contain at least 100 endpoints. Kubernetes does not make the new EndpointSlice
+until an extra endpoint needs to be added.
+
+See [EndpointSlices](/docs/concepts/services-networking/endpoint-slices/) for more
+information about this API.
+
+### Endpoints
+
+In the Kubernetes API, an
+[Endpoints](/docs/reference/kubernetes-api/service-resources/endpoints-v1/)
+(the resource kind is plural) defines a list of network endpoints, typically
+referenced by a Service to define which Pods the traffic can be sent to.
+
+The EndpointSlice API is the recommended replacement for Endpoints.
+
+#### Over-capacity endpoints
+
+If an Endpoints object has more than 1000 endpoints then a Kubernetes v1.22 (or later)
+cluster annotates that Endpoints with `endpoints.kubernetes.io/over-capacity: truncated`.
+This {{< glossary_tooltip text="annotation" term_id="annotation" >}} indicates that the
+affected Endpoints object is over capacity and that the endpoints controller has truncated
+the number of endpoints to 1000.
 
-EndpointSlices provide additional attributes and functionality which is
-described in detail in [EndpointSlices](/docs/concepts/services-networking/endpoint-slices/).
+Traffic is still sent to backends, but any load balancing mechanism that relies on the
+legacy Endpoints API only sends traffic to at most 1000 of the available backing endpoints.
 
 ### Application protocol
 
@@ -571,19 +597,22 @@ selectors defined:
 
 ### With selectors
 
-For headless Services that define selectors, the endpoints controller creates
-`Endpoints` records in the API, and modifies the DNS configuration to return
-A records (IP addresses) that point directly to the `Pods` backing the `Service`.
+For headless Services that define selectors, the Kubernetes control plane creates
+EndpointSlice objects in the Kubernetes API, and modifies the DNS configuration to return
+A or AAAA records (IPv4 or IPv6 addresses) that point directly to the Pods backing
+the Service.
 
 ### Without selectors
 
-For headless Services that do not define selectors, the endpoints controller does
-not create `Endpoints` records. However, the DNS system looks for and configures
+For headless Services that do not define selectors, the control plane does
+not create EndpointSlice objects. However, the DNS system looks for and configures
 either:
 
-* CNAME records for [`ExternalName`](#externalname)-type Services.
-* A records for any `Endpoints` that share a name with the Service, for all
-  other types.
+* DNS CNAME records for [`type: ExternalName`](#externalname) Services.
+* DNS A / AAAA records for all IP addresses of the Service's ready endpoints,
+  for all Service types other than `ExternalName`.
+  * For IPv4 endpoints, the DNS system creates A records.
+  * For IPv6 endpoints, the DNS system creates AAAA records.
 
 ## Publishing Services (ServiceTypes) {#publishing-services-service-types}
 

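The note in the change above lists address ranges that endpoint IPs in a manually created EndpointSlice must not fall into. A pre-flight check for those ranges could look like the sketch below, which uses only Python's standard `ipaddress` module. The function name `is_allowed_endpoint_ip` is hypothetical, the ranges are copied from the note as written, and the check does not cover the separate rule about cluster IPs of other Services, because that requires knowledge of the cluster.

```python
import ipaddress

# Ranges taken from the note: loopback and link-local, for IPv4 and IPv6.
FORBIDDEN_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8",     # IPv4 loopback
        "::1/128",         # IPv6 loopback
        "169.254.0.0/16",  # IPv4 link-local
        "224.0.0.0/24",    # IPv4 link-local multicast
        "fe80::/64",       # IPv6 link-local
    )
]


def is_allowed_endpoint_ip(text):
    """Return True if the address is outside every forbidden range."""
    ip = ipaddress.ip_address(text)
    return not any(
        ip.version == network.version and ip in network
        for network in FORBIDDEN_NETWORKS
    )


if __name__ == "__main__":
    for candidate in ("10.4.5.6", "10.1.2.3", "127.0.0.1", "fe80::1"):
        print(candidate, is_allowed_endpoint_ip(candidate))
```

Running a check like this before applying a manifest only catches the static ranges; the API server remains the authority on what it accepts.
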
diff --git a/content/en/docs/concepts/services-networking/topology-aware-hints.md b/content/en/docs/concepts/services-networking/topology-aware-hints.md
@@ -13,7 +13,7 @@ weight: 45
 
 _Topology Aware Hints_ enable topology aware routing by including suggestions
 for how clients should consume endpoints. This approach adds metadata to enable
-consumers of EndpointSlice and / or Endpoints objects, so that traffic to
+consumers of EndpointSlice (or Endpoints) objects, so that traffic to
 those network endpoints can be routed closer to where it originated.
 
 For example, you can route traffic within a locality to reduce

diff --git a/content/en/docs/concepts/workloads/pods/pod-lifecycle.md b/content/en/docs/concepts/workloads/pods/pod-lifecycle.md
@@ -461,7 +461,7 @@ An example flow:
       order. If the order of shutdowns matters, consider using a `preStop` hook to synchronize.
       {{< /note >}}
 1. At the same time as the kubelet is starting graceful shutdown, the control plane removes that
-   shutting-down Pod from Endpoints (and, if enabled, EndpointSlice) objects where these represent
+   shutting-down Pod from EndpointSlice (and Endpoints) objects where these represent
    a {{< glossary_tooltip term_id="service" text="Service" >}} with a configured
    {{< glossary_tooltip text="selector" term_id="selector" >}}.
    {{< glossary_tooltip text="ReplicaSets" term_id="replica-set" >}} and other workload resources