Add tracing how-to guide #2026

Merged
merged 9 commits into from
May 30, 2024
Changes from 2 commits
1 change: 1 addition & 0 deletions .yamllint.yaml
@@ -7,6 +7,7 @@ ignore:
   - charts/nginx-gateway-fabric/templates
   - config/crd/bases/
   - deploy/crds.yaml
+  - site/static
 
 rules:
   braces: enable
2 changes: 1 addition & 1 deletion site/content/how-to/maintenance/_index.md
@@ -1,7 +1,7 @@
 ---
 title: "Maintenance and Upgrades"
 description:
-weight: 400
+weight: 500
 linkTitle: "Maintenance and Upgrades"
 menu:
   docs:
2 changes: 1 addition & 1 deletion site/content/how-to/monitoring/_index.md
@@ -1,7 +1,7 @@
 ---
 title: "Monitoring and Troubleshooting"
 description:
-weight: 500
+weight: 400
 linkTitle: "Monitoring and Troubleshooting"
 menu:
   docs:
2 changes: 1 addition & 1 deletion site/content/how-to/monitoring/dashboard.md
@@ -1,7 +1,7 @@
 ---
 title: "NGINX Plus Dashboard"
 description: "Learn how to view the NGINX Plus dashboard to see real-time metrics."
-weight: 200
+weight: 300
 toc: true
 docs: "DOCS-1417"
 ---
2 changes: 1 addition & 1 deletion site/content/how-to/monitoring/prometheus.md
@@ -66,7 +66,7 @@ In the Grafana UI menu, go to `Connections` then `Data sources`. Add your Promet

 Download the following sample dashboard and Import as a new Dashboard in the Grafana UI.

-{{< download "grafana-dashboard.json" "ngf-grafana-dashboard.json" >}}
+- {{< download "grafana-dashboard.json" "ngf-grafana-dashboard.json" >}}

 ## Available metrics in NGINX Gateway Fabric

335 changes: 335 additions & 0 deletions site/content/how-to/monitoring/tracing.md
@@ -0,0 +1,335 @@
---
title: "Tracing"
description: "Learn how to configure tracing in NGINX Gateway Fabric."
weight: 200
toc: true
docs: "DOCS-000"
---

{{<custom-styles>}}

## Overview

NGINX Gateway Fabric supports tracing using [OpenTelemetry](https://opentelemetry.io/). The official [NGINX OpenTelemetry Module](https://github.com/nginxinc/nginx-otel) instruments the NGINX data plane to export traces to a configured collector. Tracing data can be exported to an OpenTelemetry Protocol (OTLP) exporter, such as the [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector). This collector can then export data to one or more upstream collectors like [Jaeger](https://www.jaegertracing.io/), [DataDog](https://docs.datadoghq.com/tracing/), and many others. This particular model is called the [Agent model](https://opentelemetry.io/docs/collector/deployment/agent/).

In this guide, we are going to enable tracing on our HTTPRoutes using NGINX Gateway Fabric. We will use the OpenTelemetry Collector and Jaeger to process and collect our traces.

## Installing the Collectors

The first step is to install the collectors. NGINX Gateway Fabric will be configured to export to the OpenTelemetry Collector, which is configured to export to Jaeger. This model allows us to easily swap out the visualization collector (Jaeger) for something else if we want to, or add more collectors without needing to reconfigure NGINX Gateway Fabric. It is also possible to configure NGINX Gateway Fabric to export directly to Jaeger, if desired.

First, create the namespace:

```shell
kubectl create namespace monitoring
```

Download the following files containing the configurations for the collectors:

- {{< download "otel-collector.yaml" "otel-collector.yaml" >}}
- {{< download "jaeger.yaml" "jaeger.yaml" >}}

{{< note >}}These collectors are for demo purposes and are not tuned for production use.{{< /note >}}

Then install them:

```shell
kubectl apply -f otel-collector.yaml -f jaeger.yaml -n monitoring
```

Ensure that the Pods are running:

```shell
kubectl -n monitoring get pods
```

```text
NAME                             READY   STATUS    RESTARTS   AGE
jaeger-8469f69b86-bfpk9          1/1     Running   0          9s
otel-collector-f786b7dfd-h2x9l   1/1     Running   0          9s
```
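
If you would rather script this check than read the table by eye, the following is a minimal sketch. The helper name `all_pods_running` is hypothetical (it is not part of kubectl), and it assumes the default column layout shown above, where `STATUS` is the third column.

```shell
# Hypothetical helper: reads `kubectl get pods` table output on stdin and
# succeeds only if at least one Pod is listed and every Pod's STATUS column
# (third column in the default layout) is "Running".
all_pods_running() {
  awk 'BEGIN { total = 0; running = 0 }
       NR > 1 { total++; if ($3 == "Running") running++ }
       END { exit !(total > 0 && running == total) }'
}

# Usage (requires access to the cluster):
#   kubectl -n monitoring get pods | all_pods_running && echo "collectors are running"
```

The header row is skipped with `NR > 1`, so an empty namespace is reported as not ready rather than as a success.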

Once running, you can access the Jaeger dashboard by using port-forwarding in the background:

```shell
kubectl port-forward -n monitoring svc/jaeger 16686:16686 &
```

Visit [http://127.0.0.1:16686](http://127.0.0.1:16686) to view the dashboard.

## Enabling Tracing

Enabling tracing requires two pieces of configuration.

- `NginxProxy`: This resource contains global settings relating to the NGINX data plane. It is created and managed by the [cluster operator](https://gateway-api.sigs.k8s.io/concepts/roles-and-personas/), and is referenced in the `parametersRef` field of the GatewayClass. This resource can be created and linked when we install NGINX Gateway Fabric using its Helm chart, or it can be added later. In this guide we will install the resource using the Helm chart, but will also show what it looks like in case you want to add it after installation.

The `NginxProxy` resource contains configuration for the collector, and applies to all Gateways and routes under the GatewayClass. It does not enable tracing, but is a prerequisite to the next piece of configuration.

- `ObservabilityPolicy`: This resource is a [Policy](https://gateway-api.sigs.k8s.io/reference/policy-attachment/) that targets HTTPRoutes or GRPCRoutes. It is created by the [application developer](https://gateway-api.sigs.k8s.io/concepts/roles-and-personas/) and enables tracing for a specific route or routes. It requires the `NginxProxy` resource to exist in order to complete the tracing configuration.

TODO(sberman): link to reference docs

### Installing NGINX Gateway Fabric with global tracing config

{{< note >}}Ensure that you've already [installed the Gateway API resources]({{< relref "installation/installing-ngf/helm.md#installing-the-gateway-api-resources" >}}).{{< /note >}}

Based on the collector we deployed above, we'll create the following `values.yaml` file to install NGINX Gateway Fabric:

```yaml
cat <<EOT > values.yaml
nginx:
  config:
    telemetry:
      exporter:
        endpoint: otel-collector.monitoring.svc:4317
      spanAttributes:
      - key: cluster-attribute-key
        value: cluster-attribute-value
EOT
```

This sets the collector endpoint and defines a demo span attribute that is added to every tracing span.
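
The endpoint follows the usual in-cluster DNS form `<service>.<namespace>.svc:<port>`, so the namespace part must match the namespace the collector was installed into (`monitoring` in this guide). As a small sketch, the hypothetical helper below builds that string. It assumes the collector Service is named `otel-collector` and listens on 4317, the default OTLP gRPC port.

```shell
# Hypothetical helper: build an in-cluster OTLP endpoint of the form
# <service>.<namespace>.svc:<port>. The port defaults to 4317 (OTLP over gRPC).
otlp_endpoint() {
  printf '%s.%s.svc:%s\n' "$1" "$2" "${3:-4317}"
}

# Usage:
#   otlp_endpoint otel-collector monitoring
```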

To install:

```shell
helm install ngf oci://ghcr.io/nginxinc/charts/nginx-gateway-fabric --create-namespace -n nginx-gateway -f values.yaml
```

As a result, we should see the following configurations:

```shell
kubectl get nginxproxies.gateway.nginx.org ngf-proxy-config -o yaml
```

```yaml
apiVersion: gateway.nginx.org/v1alpha1
kind: NginxProxy
metadata:
  name: ngf-proxy-config
spec:
  telemetry:
    exporter:
      endpoint: otel-collector.monitoring.svc:4317
    spanAttributes:
    - key: cluster-attribute-key
      value: cluster-attribute-value
```

```shell
kubectl get gatewayclasses.gateway.networking.k8s.io nginx -o yaml
```

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: nginx
spec:
  controllerName: gateway.nginx.org/nginx-gateway-controller
  parametersRef:
    group: gateway.nginx.org
    kind: NginxProxy
    name: ngf-proxy-config
status:
  conditions:
  - lastTransitionTime: "2024-05-22T15:18:35Z"
    message: GatewayClass is accepted
    observedGeneration: 1
    reason: Accepted
    status: "True"
    type: Accepted
  - lastTransitionTime: "2024-05-22T15:18:35Z"
    message: Gateway API CRD versions are supported
    observedGeneration: 1
    reason: SupportedVersion
    status: "True"
    type: SupportedVersion
  - lastTransitionTime: "2024-05-22T15:18:35Z"
    message: parametersRef resource is resolved
    observedGeneration: 1
    reason: ResolvedRefs
    status: "True"
    type: ResolvedRefs
```

If you already have NGINX Gateway Fabric installed, you can create the `NginxProxy` resource and link it in the GatewayClass `parametersRef` as shown above, using:

```shell
kubectl edit gatewayclasses.gateway.networking.k8s.io nginx
```
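
If you prefer a non-interactive alternative to `kubectl edit`, the same link can be made with a JSON merge patch. This is a sketch under the assumptions of this guide: the GatewayClass is named `nginx`, and the `NginxProxy` resource already exists. The function names are hypothetical.

```shell
# Hypothetical helper: build the JSON merge patch that points a GatewayClass
# at the NginxProxy resource whose name is passed as the first argument.
parameters_ref_patch() {
  printf '{"spec":{"parametersRef":{"group":"gateway.nginx.org","kind":"NginxProxy","name":"%s"}}}' "$1"
}

# Apply the patch to the "nginx" GatewayClass used in this guide.
link_nginx_proxy() {
  kubectl patch gatewayclasses.gateway.networking.k8s.io nginx \
    --type=merge -p "$(parameters_ref_patch "$1")"
}

# Usage (requires access to the cluster):
#   link_nginx_proxy ngf-proxy-config
```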

Next, [expose NGINX Gateway Fabric]({{< relref "installation/expose-nginx-gateway-fabric.md" >}}) and save its public IP address and port in shell variables:

```text
GW_IP=XXX.YYY.ZZZ.III
GW_PORT=<port number>
```

Now we can create our application, route, and tracing policy.

### Create the application and route

Create the basic **coffee** application:

```yaml
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coffee
spec:
  replicas: 2
  selector:
    matchLabels:
      app: coffee
  template:
    metadata:
      labels:
        app: coffee
    spec:
      containers:
      - name: coffee
        image: nginxdemos/nginx-hello:plain-text
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: coffee
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: coffee
EOF
```

Next we'll create the Gateway resource and HTTPRoute for our app:

```yaml
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: cafe
spec:
  gatewayClassName: nginx
  listeners:
  - name: http
    port: 80
    protocol: HTTP
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: coffee
spec:
  parentRefs:
  - name: cafe
  hostnames:
  - "cafe.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /coffee
    backendRefs:
    - name: coffee
      port: 80
EOF
```

Let's ensure that traffic can flow to our application.

{{< note >}}If you have a DNS record allocated for `cafe.example.com`, you can send the request directly to that hostname, without needing to resolve.{{< /note >}}

```shell
curl --resolve cafe.example.com:$GW_PORT:$GW_IP http://cafe.example.com:$GW_PORT/coffee
```

We should see a response from the coffee Pod.

```text
Server address: 10.244.0.69:8080
Server name: coffee-6b8b6d6486-k5w5w
URI: /coffee
```
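
Because the Deployment runs two replicas, repeated requests may be answered by different Pods. The sketch below extracts the `Server name` field from a response in the format shown above. The helper name `server_name` is hypothetical.

```shell
# Hypothetical helper: read a coffee response on stdin and print the value of
# its "Server name:" line, which is the name of the Pod that answered.
server_name() {
  awk -F': ' '/^Server name:/ { print $2 }'
}

# Usage (requires GW_IP and GW_PORT to be set as described above):
#   curl --silent --resolve cafe.example.com:$GW_PORT:$GW_IP \
#     http://cafe.example.com:$GW_PORT/coffee | server_name
```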

Assuming that you have access to the [Jaeger dashboard](http://127.0.0.1:16686) from earlier in the guide, you shouldn't see any tracing information yet. This means we need to create our `ObservabilityPolicy`.

### Create the ObservabilityPolicy

To enable tracing for our coffee HTTPRoute, we create the following policy:

```yaml
kubectl apply -f - <<EOF
apiVersion: gateway.nginx.org/v1alpha1
kind: ObservabilityPolicy
metadata:
  name: coffee
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: coffee
  tracing:
    strategy: ratio
    ratio: 75
    spanAttributes:
    - key: coffee-key
      value: coffee-value
EOF
```

This policy attaches to the coffee HTTPRoute and enables ratio-based tracing, where 75% of requests will be sampled. We've also included a span attribute to add extra data to the spans.
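
To get a feel for what a ratio of 75 means in practice, the sketch below computes the expected number of sampled requests. The helper name is hypothetical, and the result is only an average: sampling is probabilistic, so the actual number of traces for a small batch of requests will vary.

```shell
# Hypothetical helper: expected number of sampled requests, given a request
# count and a sampling ratio expressed as a percentage (integer arithmetic).
expected_sampled() {
  requests="$1"
  ratio="$2"
  echo $(( requests * ratio / 100 ))
}

# Usage: with ratio 75, about 15 of 20 requests are expected to produce traces.
#   expected_sampled 20 75
```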

Let's check the status of the policy:

```shell
kubectl describe observabilitypolicies.gateway.nginx.org coffee
```

```text
Status:
  Ancestors:
    Ancestor Ref:
      Group:      gateway.networking.k8s.io
      Kind:       HTTPRoute
      Name:       coffee
      Namespace:  default
    Conditions:
      Last Transition Time:  2024-05-23T18:13:03Z
      Message:               Policy is accepted
      Observed Generation:   1
      Reason:                Accepted
      Status:                True
      Type:                  Accepted
    Controller Name:         gateway.nginx.org/nginx-gateway-controller
```
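
The same check can be scripted with a JSONPath query instead of reading the `describe` output. This is a sketch: the function name is hypothetical, and it assumes the policy has a single ancestor, as in the status shown above.

```shell
# Hypothetical helper: succeed only if the named ObservabilityPolicy reports
# an Accepted condition with status "True" on its first ancestor.
policy_accepted() {
  status="$(kubectl get observabilitypolicies.gateway.nginx.org "$1" \
    -o jsonpath='{.status.ancestors[0].conditions[?(@.type=="Accepted")].status}')"
  [ "$status" = "True" ]
}

# Usage (requires access to the cluster):
#   policy_accepted coffee && echo "policy is accepted"
```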

The policy is accepted, so now let's send some more traffic. Run the following command multiple times.

```shell
curl --resolve cafe.example.com:$GW_PORT:$GW_IP http://cafe.example.com:$GW_PORT/coffee
```
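
Rather than re-running the command by hand, you could wrap it in a loop. The following is a minimal sketch. The function name and the default of 20 requests are arbitrary choices, and it assumes `GW_IP` and `GW_PORT` are set as described earlier.

```shell
# Hypothetical helper: send COUNT requests (default 20) to the coffee route.
# Response bodies are discarded; only the resulting traces matter here.
send_coffee_requests() {
  count="${1:-20}"
  i=0
  while [ "$i" -lt "$count" ]; do
    curl --silent --output /dev/null \
      --resolve "cafe.example.com:${GW_PORT}:${GW_IP}" \
      "http://cafe.example.com:${GW_PORT}/coffee"
    i=$((i + 1))
  done
}

# Usage:
#   send_coffee_requests 20
```

With the 75% ratio configured in the policy, not every request in the batch is expected to produce a trace.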

Once complete, let's refresh the Jaeger dashboard. We should now see a service entry called `ngf:default:cafe`, and a few traces. The service name by default is `ngf:<gateway-namespace>:<gateway-name>`.
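
If you run several Gateways, it can help to compute the service name to look for in Jaeger. The sketch below applies the default naming pattern described above. The helper name is hypothetical.

```shell
# Hypothetical helper: build the default tracing service name,
# ngf:<gateway-namespace>:<gateway-name>.
ngf_service_name() {
  printf 'ngf:%s:%s\n' "$1" "$2"
}

# Usage: the cafe Gateway in the default namespace.
#   ngf_service_name default cafe
```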

{{<img src="img/jaeger-trace-overview.png" alt="">}}

<br></br>

If we click into one of the traces, we can see the attributes.

{{<img src="img/jaeger-trace-attributes.png" alt="">}}

As you can see, the trace includes the attribute from the global NginxProxy resource, set by the cluster operator, as well as the attribute from the ObservabilityPolicy, set by the application developer.

## Further Reading

TODO(sberman): link to reference docs again
3 changes: 1 addition & 2 deletions site/content/how-to/monitoring/troubleshooting.md
@@ -1,7 +1,6 @@
 ---
 title: "Troubleshooting"
-
-weight: 300
+weight: 400
 toc: true
 docs: "DOCS-1419"
 ---
Binary file added site/static/img/jaeger-trace-attributes.png
Binary file added site/static/img/jaeger-trace-overview.png