## Introduction

This workshop aims to introduce how to make a Java application fully observable with:
* Logs with insightful information
* Metrics with [Prometheus](https://prometheus.io/)
* [Distributed Tracing](https://blog.touret.info/2023/09/05/distributed-tracing-opentelemetry-camel-artemis/)

The "infrastructure stack" is composed of the following components:
* One [Configuration server](https://docs.spring.io/spring-cloud-config/) is also used to centralise the configuration of our microservices.
* The following microservices: API Gateway, Merchant BO, Fraud Detect, Smart Bank Gateway

If you run your application on GitPod, the following steps are automatically executed at startup.

Otherwise, to run it on your desktop, execute the following commands:

``` bash
$ bash scripts/download-agent.sh
```

``` bash
$ ./gradlew tasks
```

``` bash
$ docker compose up -d --build --remove-orphans
```

To check if all the services are up, you can run this command:

``` bash
$ docker compose ps -a
```
And check the status of every service.
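
If one of the services fails to start, you can inspect its logs directly. For instance, for the configuration server (assuming the Compose service is named ``config-server``, as referenced later in the Prometheus configuration):

```bash
$ docker compose logs -f config-server
```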

### Start the rest of our microservices

You can now start the application with the following commands.
For each of them, you must open a new terminal in VSCode.

#### The REST Easy Pay Service
Run the following command:

```bash
$ ./gradlew :easypay-service:bootRun -x test
```
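
The other microservices (API Gateway, Fraud Detect, Smart Bank Gateway, ...) are started the same way, each one in its own terminal. As a sketch, assuming a Gradle module named ``api-gateway``:

```bash
$ ./gradlew :api-gateway:bootRun -x test
```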

#### Validation

Open the [Eureka](https://cloud.spring.io/spring-cloud-netflix/) website started during the infrastructure setup.

If you run this workshop on your desktop, you can go to this URL: http://localhost:8761.
If you run it on GitPod, you can go to the corresponding URL (e.g., https://8761-worldline-observability-w98vrd59k5h.ws-eu114.gitpod.io) instead.
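
You can also check a service directly from the command line through its Spring Boot Actuator health endpoint. This is only a sketch: it assumes the service you want to check exposes port ``8080`` locally (the port used later in this workshop for the ``/actuator/prometheus`` call).

```bash
$ curl http://localhost:8080/actuator/health
```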
Restart the application with the ``mdc`` profile activated and see how the logs look.
> aside positive
>
> You can verify the MDC profile is applied by checking the presence of this log message:
> ``The following 2 profiles are active: "default", "mdc"``
>
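
If you started ``easypay-service`` with Gradle as above, one way to restart it with the additional profile is to use the standard ``SPRING_PROFILES_ACTIVE`` environment variable (a sketch; any other Spring Boot profile-activation mechanism works as well):

```bash
$ SPRING_PROFILES_ACTIVE=default,mdc ./gradlew :easypay-service:bootRun -x test
```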

### Adding more content in our logs
Run the following command:

```bash
$ k6 run -u 5 -d 5s k6/01-payment-only.js
```

Then check the logs to pinpoint some exceptions.

### Personally Identifiable Information (PII) obfuscation
For compliance reasons and to prevent personal data leakage, we will obfuscate the card number in the logs.

In the Alloy configuration file (``docker/alloy/config.alloy``), add the [luhn stage](https://grafana.com/docs/alloy/latest/reference/components/loki.process/#stageluhn-block) into the ``jsonlogs`` Loki process block:

```
stage.luhn {
    replacement = "**DELETED**"
}
```

We will then have the following configuration for processing the JSON logs:

```
loki.process "jsonlogs" {
  forward_to = [loki.write.endpoint.receiver]

  stage.luhn {
    replacement = "**DELETED**"
  }

  stage.json {
    expressions = {
      // timestamp = "timestamp",
      application = "context.properties.applicationName",
      instance    = "context.properties.instance",
      trace_id    = "mdc.trace_id",
    }
  }

  stage.labels {
    values = {
      application = "application",
      instance    = "instance",
      trace_id    = "trace_id",
    }
  }
}
```


Then restart Alloy:

```bash
$ docker restart collector
```
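
To verify that the obfuscation works, you can generate a few payments again and check the logs in Grafana: the card numbers should now appear as ``**DELETED**``.

```bash
$ k6 run -u 5 -d 5s k6/01-payment-only.js
```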
### Logs Correlation
> aside positive
>
Check out the Logging configuration in the ``docker/alloy/config.alloy`` file:

```
////////////////////
// (1) LOGS
////////////////////

// CLASSIC LOGS FILES
local.file_match "logs" {
path_targets = [{"__path__" = "/logs/*.log", "exporter" = "LOGFILE"}]
}

loki.source.file "logfiles" {
targets = local.file_match.logs.targets
forward_to = [loki.write.endpoint.receiver]
}
```

Now get the Prometheus metrics using this command:

```bash
http :8080/actuator/prometheus
```

You can also get an overview of all the scraped metrics endpoints on the Prometheus dashboard.

Go to ``http://localhost:9090`` and explore the different endpoints in ``eureka-discovery``.
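
You can also query Prometheus from the command line through its HTTP API. For instance, the built-in ``up`` metric is a quick sanity check showing which discovered targets are currently scraped successfully:

```bash
$ curl 'http://localhost:9090/api/v1/query?query=up'
```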


### How are the metrics scraped?

Check out the Prometheus configuration file (``docker/prometheus/prometheus.yml``).
All the scrapers' definitions are configured there.

For instance, here is the scrape configuration for the configuration server:

```yaml
- job_name: prometheus-config-server
scrape_interval: 5s
scrape_timeout: 5s
metrics_path: /actuator/prometheus
static_configs:
- targets:
- config-server:8890
```

You can see that, under the hood, it uses the same actuator endpoint we looked at earlier.
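
You can even fetch that endpoint yourself to see the raw text format Prometheus ingests. This is only a sketch: it assumes port ``8890`` of the configuration server is also mapped on ``localhost``; otherwise, run the command from inside the Docker network.

```bash
$ curl http://localhost:8890/actuator/prometheus | head -n 20
```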

Prometheus first reaches Eureka to discover which servers to scrape.
It then scrapes all the registered instances in the same way:

```yaml
# Discover targets from Eureka and scrape metrics from them (Whitebox monitoring)
- job_name: eureka-discovery
scrape_interval: 5s
scrape_timeout: 5s
eureka_sd_configs:
- server: http://discovery-server:8761/eureka (1)
refresh_interval: 5s
relabel_configs: (2)
- source_labels: [__meta_eureka_app_instance_metadata_metrics_path]
target_label: __metrics_path__
```
1. We plug Prometheus into Eureka to discover all the underlying systems and their metrics.
2. To pinpoint each service and its metrics endpoint, and to build the final metrics path stored in Prometheus, we set up this relabel configuration.
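
To see what Prometheus will actually discover, you can list the instances registered in Eureka through its REST API (a sketch; the standard Spring Cloud Netflix endpoint returns an XML listing by default):

```bash
$ curl http://localhost:8761/eureka/apps
```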

### Let's explore the metrics

Then go to Grafana and start an ``Explore`` dashboard again.

Select the ``Prometheus`` datasource.
Explore the dashboard, especially the Garbage collector and CPU statistics.

Then look around the JDBC dashboard and see what happens on the database connection pool.
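
If you want to graph the pool usage yourself, you can try the query below in a Prometheus Explore panel. It is a sketch: it assumes the services use the default HikariCP connection pool, whose Micrometer metrics are exported with the ``hikaricp_`` prefix.

```
hikaricp_connections_active
```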


Now, let's go back to the Loki explore dashboard and see what happens:

Create a query with the following parameters:
Explore the corresponding SQL queries and their response times.

Finally, check the traces from different services (e.g., ``api-gateway``).

### Sampling

To avoid storing useless data into Tempo, we can sample the data in two ways:
* [Head Sampling](https://opentelemetry.io/docs/concepts/sampling/#head-sampling)
* [Tail Sampling](https://opentelemetry.io/docs/concepts/sampling/#tail-sampling)

In this workshop, we will implement the latter.

In the Alloy configuration file (``docker/alloy/config.alloy``), put this configuration just after the ``SAMPLING`` comment:
```
// SAMPLING
//
otelcol.processor.tail_sampling "actuator" {
policy {
name = "filter_http_url"
type = "string_attribute"
string_attribute {
key = "http.url"
values = ["/actuator/health", "/actuator/prometheus"]
enabled_regex_matching = true
invert_match = true
}
}

policy {
name = "filter_url_path"
type = "string_attribute"
string_attribute {
key = "url.path"
values = ["/actuator/health", "/actuator/prometheus"]
enabled_regex_matching = true
invert_match = true
}
}
}
```

This configuration will filter out the [spans](https://opentelemetry.io/docs/concepts/signals/traces/#spans) created by ``/actuator`` API calls.
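
Note that an ``otelcol.processor`` component in Alloy also needs an ``output`` block telling it where to forward the sampled traces. If it is not already wired further down in ``config.alloy``, it would look roughly like the sketch below, inside the ``tail_sampling`` block (the exporter name ``otelcol.exporter.otlp.tempo`` is an assumption; reuse the one already defined in your file):

```
  output {
    // hypothetical exporter reference: use the exporter already declared in config.alloy
    traces = [otelcol.exporter.otlp.tempo.input]
  }
```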

Then restart Alloy:

```bash
$ docker compose restart collector
```

## Correlate Traces and Logs
Duration: 0:15:00


Let's go back to the Grafana Explore dashboard:
* Select the ``Loki`` datasource.
* As a label filter, select ``easypay-service``.
* Run a query and select a log entry.

Now check that you have an ``mdc`` JSON element which includes both [``trace_id``](https://www.w3.org/TR/trace-context/#trace-id) and [``span_id``](https://www.w3.org/TR/trace-context/#parent-id).
These identifiers will help us correlate the logs and traces of our different requests.
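
For illustration, such a log entry carries a fragment like the one below (the identifier values are the example values from the W3C specification, not real ones):

```json
{
  "mdc": {
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "span_id": "00f067aa0ba902b7"
  }
}
```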

> aside positive
>
> These notions are part of the [W3C Trace Context Specification](https://www.w3.org/TR/trace-context/).

Now, go down to the ``Fields`` section.
You should see a ``Links`` sub-section with a ``View Trace`` button.

Click on it.
You will see the corresponding trace of this log.

Now you can correlate logs and traces!
If you have any exceptions in your error logs, you can now check where they happen and see the big picture of the transaction (from the customer's point of view).

### How was it done?

When you enable MDC on your logs, the ``trace_id`` field is always filled in.

Then to enable the link, we added the following configuration into the Alloy configuration file:

```
stage.json { (1)
expressions = {
// timestamp = "timestamp",
application = "context.properties.applicationName",
instance = "context.properties.instance",
trace_id = "mdc.trace_id",
}
}

stage.labels { (2)
values = {
application = "application",
instance = "instance",
trace_id = "trace_id",
}
}
```

1. The first stage extracts the ``trace_id`` field from the JSON log.
2. The corresponding label is then created so it can later be used in a Grafana dashboard.
3. _Et voilà!_
