- Issue: nginx#1566
- Status: Implementable
NGINX is highly configurable and offers rich features that can benefit our users. We want to expose this native NGINX configuration to our users through Gateway API extension points -- such as Policies and Filters. This Enhancement Proposal aims to identify the set of NGINX directives and parameters we will expose first, group them according to Gateway API role(s), NGINX contexts, and use cases, and propose the type of extension point for each group.
- Extensions for NGINX Features
- Summary
- Table of Contents
- Goals
- Non-Goals
- Gateway API Extensions
- Prioritized NGINX Features
- Grouping the Features
- API
- Testing
- Security Considerations
- Alternatives Considered
- References
- Identify the set of NGINX features to deliver to users.
- Group these features to reduce the number of APIs and improve user experience.
- For each group, identify the type of Gateway API extension to use and the applicable Gateway API role(s).
- Design the API of every extension. This design work will be completed for each extension before implementation.
- Design an API that will allow the user to insert raw NGINX config into the NGINX config that NGINX Gateway Fabric generates (e.g. snippets in NGINX Ingress Controller).
The Gateway API provides many extensions that implementations can leverage to deliver features that the general-purpose API cannot address. This section provides a summary of these extensions.
Role(s): Infrastructure Provider or Cluster Operator. For NGINX Gateway Fabric, this will be the responsibility of the Cluster Operator, as the Infrastructure Provider role only applies to cloud or PaaS providers.
Resource(s): GatewayClass
Status: Completed
Channel: Standard
Conformance Level: Implementation-specific
Example(s): EnvoyProxy CRD
First-class API field on the GatewayClass.spec
. The field refers to a resource containing the configuration parameters corresponding to the GatewayClass. The resource referenced can be a standard Kubernetes resource, a CRD, or a ConfigMap. While GatewayClasses are cluster-scoped, resources referenced by GatewayClass.spec.parametersRef
can be namespaced-scoped or cluster-scoped. The configuration parameters contained in the referenced resource are applied to all Gateways attached to the GatewayClass.
Example:
kind: GatewayClass
metadata:
name: internet
spec:
controllerName: "example.net/gateway-controller"
parametersRef:
group: example.net/v1alpha1
kind: Config
name: internet-gateway-config
---
apiVersion: example.net/v1alpha1
kind: Config
metadata:
name: internet-gateway-config
spec:
ip-address-pool: internet-vips
GEP-1867 raises the following operational challenges with parameterRefs
:
- Permissions: To make declarative changes to a Gateway, the Gateway owner (who has RBAC permissions to a specific Gateway) must have access to GatewayClass, a cluster-scoped resource.
- Scope: If a change is made on a GatewayClass, all Gateways are affected by that change. This will become an issue once we support multiple Gateways.
- Dynamic Changes: GatewayClasses are meant to be templates, so changes made to the GatewayClass are not expected to change deployed Gateways. This means the configuration is not dynamic. However, this is just a recommendation by the spec and is not required. If we propagate the changes from a GatewayClass to existing Gateways, we must clearly document this behavior.
Role(s): Cluster Operator, and Infrastructure Provider
Resource(s): Gateway, GatewayClass (planned)
Status: Experimental
Channel: Experimental
Conformance Level: Core
First-class infrastructure API on Gateway and GatewayClass. Several GEPs are related to this API: GEP-1867, GEP-1762, and GEP-1651.
Gateways represent a piece of infrastructure that often needs vendor-specific config (e.g., size or version of the infrastructure to provide). The goal of the infrastructure API is to provide a way to define both implementation-specific and standard attributes on a specific Gateway.
So far, the infrastructure API has been implemented on the Gateway, and there are only two fields: annotations
and labels
:
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
name: my-gateway
spec:
infrastructure:
labels:
foo: bar
annotations:
name: my-annotation
Infrastructure labels and annotations should be applied to all resources created in response to the Gateway. This only applies to automated deployments (i.e., provisioner mode), implementations that automatically deploy the data plane based on a Gateway. Other use cases for this API are Service type, Service IP, CPU memory requests, affinity rules, and Gateway routability (public, private, and cluster).
Role(s): Cluster Operator
Resource(s): Gateway
Status: Completed
Channel: Standard
Conformance Level: Implementation-specific
Example(s): GKE pre-shared certs
TLS options are a list of key/value pairs to enable extended TLS configuration for an implementation. This field is part of the GatewayTLSConfig defined on a Gateway Listener. Currently, there are no standard keys for TLS options, but the API may define a set of standard keys in the future.
Possible use cases: minimum TLS version and supported cipher suites.
Example:
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
name: my-gateway
spec:
listeners:
- name: https
port: 443
protocol: HTTPS
tls:
mode: Terminate
certificateRefs:
- kind: Secret
name: my-secret
namespace: certificate
options:
example.com/my-custom-option: custom-value
Role(s): Application Developer
Resource(s): HTTPRoute, GRPCRoute
Status: Completed
Channel: Standard
Conformance Level: Implementation-specific
Example(s): Easegress Filters.
Filters define processing steps that must be completed during the request or response lifecycle. They can be applied on the route.rule
or the route.rule.backendRef
. If applied on the backendRef
, the filter should be executed if and only if the request is being forwarded.
Possible use cases: request/response modification, authentication, rate-limiting, and traffic shaping.
Example:
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: custom
spec:
hostnames:
- "example.com"
rules:
- matches:
- path:
type: PathPrefix
value: /
filters:
- type: ExtensionRef
extensionRef:
group: my.group.io
kind: MyCustomFilter
name: example
backendRefs:
- name: backend
port: 80
---
apiVersion: example.com/v1alpha1
kind: MyCustomFilter
metadata:
name: example
spec:
// some values here that are applied during the request/response lifecycle
Role(s): Application Developer
Resource(s): xRoute
Status: Completed
Channel: Standard
Conformance Level: Implementation-specific
Example(s): ServiceImport GKE, ServiceImport Envoy Gateway.
BackendRefs defines the backend(s) where matching requests should be sent. The Gateway API supports BackendRefs of type Kubernetes Service. An implementation can add support for other types of backends. This extension point should be used for forwarding traffic to network endpoints other than Kubernetes Services.
Possible use cases: S3 bucket, lambda function, file-server.
Example:
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: custom-backend
spec:
hostnames:
- "example.com"
rules:
- matches:
- path:
type: PathPrefix
value: /
backendRefs:
- group: my.group.io
kind: MyKind
name: backend
port: 80
Role(s): All roles
Resource(s): All resources
Status: Experimental
Channel: Experimental
Conformance Level: Extended/Implementation-specific
Example(s): Envoy Gateway BackendTrafficPolicy, GKE HealthCheckPolicy, BackendTLSPolicy
Policies are a Kubernetes object that augments the behavior of an object in a standard way. Policies can be attached to one object ("Direct Policy Attachment") or objects in a hierarchy ("Inherited Policy Attachment"). In both cases, Policies are implemented as CRDs, and they must include a single TargetRef
struct in the spec
to identify how and where to apply the Policy. The Policy GEP may be extended to supported multiple TargetRefs
and/or label selectors.
Policies do not need to support attaching to all resource Kinds. Implementations can choose which resources the Policy can be attached to.
A Direct Policy Attachment is a Policy that references a single object -- such as a Gateway or HTTPRoute. It is tightly bound to one instance of a particular Kind within a single Namespace or an instance of a single Kind at the cluster-scope. It affects only the object specified in its TargetRef.
A Direct Policy may target a subsection of a resource using the sectionName
field of the targetRef
. This allows the Policy to target specific matches within nested objects, such as a Listener within a Gateway or a specific Port on a Service.
Example of a TargetRef that targets an entire Gateway:
targetRef:
group: gateway.networking.k8s.io
kind: Gateway
name: my-gateway
namespace: default
Example of a TargetRef that targets the http
listener on a Gateway:
targetRef:
group: gateway.networking.k8s.io
kind: Gateway
name: my-gateway
sectionName: http
namespace: default
There's also an open GEP issue to add a name field to HTTPRouteRule and HTTPRouteMatch structs to allow users to identify different routes. Once implemented, Policies can target a specific rule or match in an HTTPRoute.
The BackendTLSPolicy is an example of a Direct Policy Attachment:
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: BackendTLSPolicy
metadata:
name: backend-tls
spec:
targetRef:
group: '' #
kind: Service
name: secure-app
namespace: default
tls:
caCertRefs:
- name: backend-cert
group: '' # Empty string means core - this is a standard convention
kind: ConfigMap
hostname: secure-app.example.com
This example targets the secure-app
Service in the default
Namespace. The TLS configuration in the backend-tls
Policy will be applied for all Routes that reference the secure-app
Service.
When to use Direct Policy Attachment:
- The number or scope of objects that need to be modified is limited or singular.
- No transitive information. The modifications only affect the single object that the Policy is bound to.
- The status should be reasonably easy to set in 1-2 Conditions.
- SHOULD only be used to target objects in the same Namespace as the Policy.
Direct Policy Attachment is simple. A Policy references the resource it wants to apply to. Access is granted with RBAC; anyone who has access to the Policy in the given Namespace can attach it to any resource within that Namespace.
The Policy defines which resource Kinds it can be attached to.
Inherited Policy Attachment is designed to allow settings to flow down a hierarchy. Inherited Policies may have the following fields in their specs:
defaults
: set the default value. Can be overridden by lower objects. Ex. A connection timeout default policy on a Gateway may be overridden by a connection timeout policy on an HTTPRoute.overrides
: cannot be overridden by lower objects. Ex. Setting a max client timeout to some non-infinite value at the Gateway to stop HTTPRoute owners from leaking connections over time.
When to use an Inherited Policy:
- The settings are bound to one object but affect other objects attached to it. For example, it affects HTTPRoutes attached to a single Gateway.
- The settings need to be able to be defaulted but can be overridden on a per-object basis.
- The settings must be enforced by one persona and not modifiable or removable by a lesser-privileged persona.
- Reporting (in terms of status), if the Policy is attached, is easy, but reporting what resources the Policy is being applied to is not and will require careful design.
Example:
kind: CDNCachingPolicy
metadata:
name: gw-policy
spec:
override:
cdn:
enabled: true
default:
cdn:
cachePolicy:
includeHost: true
includeProtocol: true
includeQueryString: true
targetRef:
kind: Gateway
name: example
---
kind: CDNCachingPolicy
metadata:
name: route-policy
spec:
default:
cdn:
cachePolicy:
includeQueryString: false
targetRef:
kind: HTTPRoute
name: example
The Policy attached to the Gateway has the following effects:
- It requires cdn to be enabled for all routes connected to the Gateway. This setting can not be overridden by the CDNCachingPolicy attached to the HTTPRoute.
- It provides a default configuration for the cdn. These settings can be overridden by the CDNCachingPolicy attached to the HTTPRoute.
The policy attached to the HTTPRoute changes the value of cdn.cachePolicy.includeQueryString
. All other default cdn configuration set on the Gateway policy remain unchanged. The effective policy on the HTTPRoute is:
cdn:
enabled: true
cachePolicy:
includeHost: true
includeProtocol: true
includeQueryString: false
The following image depicts the hierarchy of defaults
and overrides
:
Override values are given precedence from the top down. An override value attached to a GatewayClass will have the highest precedence overall. A lower object cannot override it. This provides users with a mechanism to enforce/require settings.
Default values are given precedence from the bottom up. A default attached to a Backend will have the highest precedence among default values.
Inherited Policies do not need to support policy attachment to each resource shown in the hierarchy. Implementations can choose which resources the Policy can attach to.
Direct | Indirect |
---|---|
Affects ONLY the object it targets | Affects more objects that the targeted object |
Requires no extra knowledge beyond the Policy and target object | Requires knowledge of resources other than the Policy and target object to understand the state of the system |
Does not include defaults or overrides | May include defaults and/or overrides |
Should only support attaching to a single Kind | Can support attaching to multiple Kinds |
May target a subsection of a resource using the sectionName field of the targetRef struct |
Does not target a subsection of a resource |
If a Policy can be used as an Inherited Policy, it MUST be treated as an Inherited Policy, regardless of whether a specific instance of the Policy is only affecting a single object.
Status and the Discoverability problem
Policy Attachment adds a new persona -- Policy Admin, whose responsibilities cut across all levels of the Gateway API. The Policy Admins actions need to be discoverable by all other Gateway API personas. For example, the Application Developer needs to be able to look at their routes and know what Policy settings are applied to them. The Cluster Operator needs to be able to determine the set of objects affected by a Policy so they can diagnose and fix cluster-wide issues. Finally, the Policy Admin needs to be able to validate their Policies and understand where the Policies are applied and what the effects are.
Ideally, implementations can write the complete resultant set of Policy available in the status of objects that the Policy affects. Unfortunately, this introduces a new set of challenges around writing status, which include status getting too large, the difficulty of building a common data representation for Policy, and the fan-out problem where updating one Policy object could trigger a waterfall of status updates for all affected objects.
The Metaresources and Policy Attachment GEP proposes a range of solutions that help address this problem but does not eliminate it.
Hard to implement
Policies are complex to implement because they can affect many objects in various ways -- i.e., all objects in a Namespace or all Routes attached to a Gateway. Inherited Policies introduce more complexity through the hierarchical nature of overrides and defaults. Computing the effective set of policy for a given object can be computationally challenging. In addition, we will need to define how different Policy types interact.
Subject to change
The Metaresources and Policy Attachment GEP is experimental. There's no guarantee that Policy Attachment will stay the same as the Gateway API receives feedback from implementors.
To identify the set of NGINX directives and parameters NGINX Gateway Fabric should implement first, we considered the features that the directives and parameters delivered, using the NGINX Ingress Controller's features as a guide. Once we had a list of features, we prioritized them into four categories: high-priority, medium-priority, low-priority, and active GEPs.
Features | Requires NGINX Plus |
---|---|
Log level and format | |
Passive health checks | |
Slow start. Prevent a recently recovered server from being overwhelmed on connections by restart | X |
Active health checks | X |
JSON Web Token (JWT) authentication | X |
OpenID Connect (OIDC) authentication. Allow users to authenticate with single-sign-on (SSO) | X |
Customize client settings. For example, client_max_body_size |
|
Upstream keepalives | |
Client keepalives | |
TLS settings. For example, TLS protocols and server ciphers | |
OpenTelemetry tracing | |
Connection timeouts |
Features | Requires NGINX Plus |
---|---|
Backup server | |
Load-balancing method | |
Load-balancing method (least time) | X |
Limit connections to a server | |
Request queuing. Queue requests if servers are unavailable | X |
Basic authentication. User/Password | |
PROXY protocol | |
Upstream zone size | |
Custom response. Return a custom response for a given path | |
External routing. Route to services outside the cluster | |
HTTP Strict Transport Security (HSTS) | |
API Key authentication | |
Next upstream retries. NGINX retries by trying the next server in the upstream group | |
Proxy Buffering |
Features | Requires NGINX Plus |
---|---|
Pass/hide headers to/from the client | |
Custom error pages | |
Access control | |
Windows Technology LAN Manager (NTLM) | X |
Backend client certificate verification | |
NGINX worker settings | |
HTTP/2 client requests | |
Set server name size | |
JSON Web Token (JWT) content-based routing | X |
QUIC/HTTP3 | |
Remote authentication request | |
Timeout for unresponsive clients |
The status field in the table describes the status of the GEP using the following terms:
- Experimental: GEP is implemented and in the experimental channel.
- Provisional: GEP is written but has yet to be implemented.
- Open: An issue exists, but a GEP still needs to be written.
Functionality | Status | GEP/Issue |
---|---|---|
Per-request timeouts | Experimental | GEP-1742 |
Retries. This is different from NGINX retries | Open | GEP-1731 |
Session persistence | Provisional | GEP-1619 |
Rate-limiting | Open | Issue 326 |
Cross-Origin Resource Sharing (CORS) | Open | Issue 1767 |
Gateway client certificate verification | Provisional | GEP-91 |
Authentication | Open | Issue 1494 |
To reduce the number of CRDs NGINX Gateway Fabric must maintain and users have to create, we grouped the high and medium-priority features into configuration categories.
For each group, we will begin implementation by focusing on the high-priority features of each group. Then, once the high-priority features are complete, we can move on to the medium-priority features.
The low-priority features are out of scope for the rest of the Enhancement Proposal but may be revisited once we make progress on the higher-priority features. In addition, most features with active GEPs are not included in the groups as we want to help move the GEPs forward toward standardization instead of creating our bespoke solutions.
When grouping the features, we considered the following factors:
- Use cases. Are there features that can be grouped by the use case they satisfy? For example, authentication or observability.
- Gateway API roles. Which features are the domain of the Cluster Operator? Which features are Application Developers responsible for?
- The NGINX contexts. Which contexts are the NGINX directives available in? For example, http, server, location, upstream, etc.
The following picture shows the nine groups we came up with:
The following section proposes an extension type (Policy, Filter, etc.), extension point (GatewayClass, HTTPRoute, etc.), and Gateway API role for all the groups. The goal of this Enhancement Proposal is not to design each extension but to propose a strategy for delivering critical NGINX features to our customers.
Extension type: ParametersRef
Resource type: CRD
Role(s): Cluster Operator
Extension point: GatewayClass
NGINX Context(s): main, http, stream
NGINX OSS features:
- Error log level
- Access log settings: format and disable
- PROXY protocol
- OTel Tracing: Exporter configuration which includes endpoint, interval, batch size, and batch count.
NGINX OSS directives:
proxy_protocol
access_log
error_log
log_format
otel_exporter
otel_service_name
otel_span_attr
: set global span attributes that will be merged with the span attributes set in the Observability extension.
NGINX Plus features:
- External Routing: enable/disable routing to ExternalName Services and set DNS resolver addresses.
NGINX Plus directives:
These features are grouped because they are all the responsibility of the Cluster Operator and should not be set or changed by Application Developers.
All the Gateways attached to the GatewayClass will inherit these settings. This is acceptable since NGINX Gateway Fabric supports a single Gateway per GatewayClass. However, once we support multiple Gateways, this could become an issue. It may force users to create multiple GatewayClasses in order to create Gateways with different settings.
Add support for:
- Stream access log settings: format and disable
- Worker settings: number of worker processes, connections, etc.
- Resolver settings: timeout, ipv6 on/off, cache timeout, etc.
- ParametersRef with ConfigMap: A ConfigMap is another resource type where a user can provide configuration options. However, unlike CRDs, ConfigMaps do not have built-in schema validation, versioning, or conversion webhooks.
- Direct Policy: A Direct Policy may also work for Gateway Settings. It can be attached to a Gateway and scoped to Cluster Operators through RBAC. It will Cluster Operators to apply settings for specific Gateways, instead of all Gateways.
Extension type: Filter
Resource type: CRD
Role(s): Application Developer
Extension point: HTTPRoute
NGINX Context(s): location
Features:
- Custom Responses. Return a preconfigured response.
NGINX directives:
- We could contribute this filter to the Gateway API if it is portable among enough implementations.
- Allow attaching to GRPCRouteRule once we add support for GRPCRoutes.
- Pursue standardization in Gateway API through the GEP process before implementing our API. While it's possible the Gateway API would accept a GEP for this Filter, it will take a significant amount of time and is not guaranteed. After implementing it ourselves, we can always contribute this Filter back to the API.
Extension type: TLS Options
Resource type: N/A. We will need to document supported key-values.
Role(s): Cluster Operator
Extension point: Gateway
NGINX Context(s): server
Features:
- TLS Settings: SSL protocols, preferred ciphers, DH parameters file
- HSTS
NGINX Directives:
These features are grouped because they are all TLS-related settings. TLS options are a good fit because they allow configuring these settings for Gateway Listeners and are restricted to the Cluster Operator role.
- Contribute back to Gateway API by standardizing the keys.
- Policy: A TLS or Security Policy that attaches to Gateway listeners may also work. However, that would involve maintaining an additional CRD, whereas with TLS options, we only need to document and validate a few key-value pairs.
- ParametersRef: While ParametersRef is also scoped to the Cluster Operator, it applies to all Gateways attached to the GatewayClass. Gateways can have multiple listeners with different protocols, and the TLS settings may not apply to all listeners. In addition, Cluster Operators may want to enforce different TLS settings for different TLS listeners.
Extension type: Inherited Policy
Resource type: CRD
Role(s): Cluster Operator
Extension point: Gateway, HTTPRoute
NGINX context(s): http, server, location
Features:
- Client max body size
- Client keepalive
NGINX directives:
These features are grouped because they all deal with client traffic.
An Inherited Policy fits this group best for the following reasons:
- A Cluster Operator may want to set defaults for client max body size or client keepalives.
- An Application Developer may want to override these defaults because of the unique attributes of their applications.
- Since these settings are available in the http, server, and location contexts, there is already inheritance involved. For example, setting the client max body size in the http context, sets the client max body size of all the servers and locations. While setting the client max body size of server
example.com
will override the size set in the http context. - If this policy is applied to a Gateway is will affect all the Routes attached to the Gateway, which is one of the traits of an Inherited Policy.
- Can add support for more
client_*
directives:client_body_buffer_size
,client_header_buffer_size
,keepalive_disable
, etc.
- Direct Policy: A Direct Policy isn't a good fit because the NGINX directives included in this policy are available at the http, server, and location contexts. NGINX's inheritance behavior among these contexts does not suit Direct Policy attachment. A client settings policy attached to a Gateway will affect all the Routes attached to the Gateway. This violates the Direct Policy requirement that the policy should only affect the object it attaches to. If we only support attaching this policy to an HTTPRoute, we could use a Direct Policy. However, we want to allow the Cluster Operator to configure defaults for these client settings, which means we need to support attaching to Gateways as well as HTTPRoutes.
Extension type: Direct Policy
Resource type: CRD
Role(s): Application Developer
Extension point: Backend
NGINX Context(s): upstream, location (for active health checks)
OSS Features:
- Upstream zone size
- Upstream keepalives
- Load-balancing method (all except
least_time
) - Limit connections to server
- Passive health checks
- Backup server
OSS NGINX directives/parameters:
zone
keepalive
keepalive_requests
keepalive_time
keepalive_timeout
ip_hash
,least_conn
,random
max_conns
fail_timeout
max_fails
backup
NGINX Plus Features:
- Request queueing
- Slow start
- Active health checks
- Load-balancing method (
least_time
)
NGINX Plus directives/parameters:
These features are grouped because they all apply to the upstream context and make sense when attached to a Backend.
- Split out health checks into a separate Policy: The upstream settings group is large and could be split into two Policies based on use cases. However, we want to limit the number of CRDs we maintain and reduce the overlap between CRDs. Two Policies that set similar directives and parameters in the upstream context could be complex to implement.
Extension type: Filter
Resource type: CRD
Role(s): Application Developer
Extension point: HTTPRoute Rule
NGINX context(s): location
OSS Features:
- Basic Authentication (User/Password)
- API Key Authentication
OSS NGINX directives:
NGINX Plus Features:
- JSON Web Token (JWT) Authentication
- OIDC Authentication
NGINX Plus directives:
While there might be a strong use case for defaults and overrides for authentication, we should begin by adding an authentication Filter, instead of jumping right to an Inherited Policy. This would allow us to roll out an authentication solution quickly to support the core use case for Application Developers. The use case for Cluster Operators applying authentication Policies is less well-known than the Application Developer's use case. Instead of leading with the Cluster Operator use case, we can begin with a Filter and wait for user feedback. Later, we can add a Policy and define how the Filter and Policy interact.
In addition, since a Filter is referenced from a routing rule, it makes it clear to the Application Developer that authentication is applied to the route, whereas due to the discoverability issues around Policies, it wouldn't be clear to the Application Developer that authentication exists for their routes. Furthermore, if the referenced Filter does not exist, NGINX will return a 502 and the protected path will not be exposed. This isn't the case for a Policy, since a missing Policy would have no effect on routes and the protected path will be exposed to all users.
- Add support for remote authentication or other authentication strategies.
- Consider interoperability between authentication and rate limiting, so that we can rate limit on per-user basis.
- Inherited Policy that attaches to Gateways and HTTPRoutes. Implementing authentication with an Inherited Policy will allow Cluster Operators to apply global authentication policies for applications and Application Developers to override that policy on a per-route basis. This would be more convenient for applications that use one authentication method for all of their routes. However, it may not be straightforward to apply the inheritance concepts (overrides/defaults) to some authentication methods. For example, JWT authentication may contain multiple fields: secret, realm, and token. These fields are interdependent and should not be overridden separately, but that may be difficult to enforce under the current Inheritance Policy guidelines.
- Wait for Gateway API GEP on authentication: There is an open GEP issue for authentication in the Gateway API repository. Instead of rolling out our own authentication solution, we could wait or even champion this GEP. The advantages of waiting are that if the GEP is implemented, we won't have to deprecate our bespoke authentication Policy in favor of the GEP's implementation, and we won't have to maintain an additional Policy. However, we don't know how long it will take the GEP to move from open to implementable. Authentication is a critical feature, and it may not be prudent to wait.
Extension type: Direct Policy
Resource type: CRD
Role(s): Application Developer
Extension point: HTTPRoute, HTTPRoute rule
NGINX context(s): http, server, location
Features:
- OTel Tracing: enable tracing, set sampler rate, span name, attributes, and context.
NGINX directives:
otel_trace
: enable tracing and set sampler rateotel_trace_context
: export, inject, propagate, ignore.otel_span_name
otel_span_attr
Tracing will be disabled by default. The Application Developer will be able to use this Policy to enable and configure tracing for their routes. This Policy will only be applied if the OTel endpoint as been set by the Cluster Operator on the Gateway Settings. We will also need to document that the collector architecture we support, where there is a single collector (receiver) but there can be many exporters and processors.
Add support for:
- OTel logging
- Metrics
- Allow attaching to Namespaces. With the current approach, every HTTPRoute will need its own Observability Policy. In other words, there is no way for multiple HTTPRoutes to share the same Observability Policy. If we allow attaching to Namespaces, all HTTPRoutes in a Namespace can share an Observability Policy.
- Combine with OTel settings in Gateway Settings for one OTel Policy: Rather than splitting tracing across two Policies, we could create a single tracing Policy. The issue with this approach is that some tracing settings -- such as exporter endpoint -- should be restricted to Cluster Operators, while settings like attributes should be available to Application Developers. If we combine these settings, RBAC will not be sufficient to restrict access across the settings. We will have to disallow certain fields based on the resource the Policy is attached to. This is a bad user experience.
- Inherited Policy: An Inherited Policy would be useful if there is a use case for the Cluster Operator enforcing or defaulting the OTel tracing settings included in this policy.
Extension type: Inherited Policy
Resource type: CRD
Role(s): Cluster Operator, Application Developer
Extension point: Gateway, HTTPRoute
NGINX context(s): http, server, location
Features:
- Proxy buffering
- Connection timeouts
- Next upstream retries
NGINX directives:
proxy_connect_timeout
proxy_read_timeout
proxy_send_timeout
proxy_next_upstream
proxy_next_upstream_timeout
proxy_next_upstream_retries
proxy_buffering
An Inherited Policy fits this group best for the following reasons (same reasons as Client Settings:
- A Cluster Operator may want to set defaults for proxy settings.
- An Application Developer may want to override these defaults because of the unique attributes of their application.
- Since these settings are available in the http, server, and location contexts, there is already inheritance involved. For example, setting the proxy connect timeout in the http context, sets the proxy connect timeout of all the servers and locations. While setting the proxy connect timeout of server
example.com
will override the size set in the http context. - If this policy is applied to a Gateway is will affect all the Routes attached to the Gateway, which is one of the traits of an Inherited Policy.
- Add other
proxy_*
directives
- Direct Policy: If there's no strong use case for the Cluster Operator setting sane defaults for these settings, then we can use a Direct Policy. The Direct Policy could attach to an HTTPRoute or HTTPRoute Rule, and the NGINX contexts would be server and location.
Each extension will be tested with a combination of unit and system tests. The details of the tests are out of scope for this Enhancement Proposal.
At a minimum, all extension points will need validation to prevent malicious or invalid NGINX configuration. In addition, each extension will have unique security considerations based on its behavior and design. The details of validation and other security concerns are out of scope for this Enhancement Proposal and will be provided in future Enhancement Proposals on a per-extension basis.
-
Data Plane CRD or ConfigMap: Rather than adding nine different extensions APIs, we could add a single data plane CRD or ConfigMap that contains all the NGINX features described in this proposal. This would allow us to quickly expose NGINX functionality to the user and simplify our API. However, this object would become large and unfocused, a laundry list of NGINX directives and parameters. Additionally, with a single resource, there's no way to restrict access to certain fields with RBAC, which doesn't fit well with the role-based nature of the Gateway API.
-
Add support for NGINX configuration injection: Adding nine extension APIs will take time. If we first add support for the injection of any NGINX configuration (see NGINX Ingress Controller snippets and templates for examples), then we would immediately support all the NGINX functionality included in this proposal. However, this feature comes with a couple of problems:
- Security. The configuration cannot be validated before it is applied. This means that invalid or even malicious configuration can be injected. As a result, not every Cluster Operator would want this functionality turned on. So, for some users, this functionality would not be available.
- Requires NGINX knowledge. Without a first-class API exposing and abstracting away NGINX configuration, the users must be knowledgeable about NGINX. This will be great for NGINX power users but much more complicated for those unfamiliar with NGINX.
- Lack of status. No helpful status is set, and users must parse NGINX error messages—potential for frustration and bad user experiences.
Problems aside, this feature will still be useful for our users and is something we should implement. Still, it does not eliminate the need for the extension APIs proposed in this document and should not be a higher priority.