## Introduction

This workshop aims to introduce how to make a Java application fully observable with:
* Logs with insightful information
* Metrics with [Prometheus](https://prometheus.io/)
* [Distributed Tracing](https://blog.touret.info/2023/09/05/distributed-tracing-opentelemetry-camel-artemis/)

The "infrastructure stack" is composed of the following components:
* One [Configuration server](https://docs.spring.io/spring-cloud-config/) is also used to centralise the configuration of our microservices.
* The following microservices: API Gateway, Merchant BO, Fraud Detect, Smart Bank Gateway

If you run your application on GitPod, the following steps are automatically executed at startup.

Otherwise, to run it on your desktop, execute the following commands:

``` bash
$ bash scripts/download-agent.sh
```

``` bash
$ ./gradlew tasks
```

``` bash
$ docker compose up -d --build --remove-orphans
```

To check if all the services are up, you can run this command:

``` bash
$ docker compose ps -a
```
And check the status of every service.
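
If one of the services fails to start, you can inspect its logs directly. For instance, for the configuration server (assuming the Compose service is named ``config-server``, as referenced later in the Prometheus configuration):

```bash
$ docker compose logs -f config-server
```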

### Start the rest of our microservices

You can now start the application with the following commands.
For each of them, you must open a new terminal in VSCode.

#### The REST Easy Pay Service
Run the following command:

```bash
$ ./gradlew :easypay-service:bootRun -x test
```
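
The other microservices (API Gateway, Fraud Detect, Smart Bank Gateway, ...) are started the same way, each one in its own terminal. As a sketch, assuming a Gradle module named ``api-gateway``:

```bash
$ ./gradlew :api-gateway:bootRun -x test
```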

#### Validation

Open the [Eureka](https://cloud.spring.io/spring-cloud-netflix/) website started during the infrastructure setup.

If you run this workshop on your desktop, you can go to this URL: http://localhost:8761.
If you run it on GitPod, you can go to the corresponding URL (e.g., https://8761-worldline-observability-w98vrd59k5h.ws-eu114.gitpod.io) instead.
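
You can also check a service directly from the command line through its Spring Boot Actuator health endpoint. This is only a sketch: it assumes the service you want to check exposes port ``8080`` locally (the port used later in this workshop for the ``/actuator/prometheus`` call).

```bash
$ curl http://localhost:8080/actuator/health
```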
Restart the application with the ``mdc`` profile activated and see how the logs look.
> aside positive
>
> You can verify the MDC profile is applied by checking the presence of this log message:
> ``The following 2 profiles are active: "default", "mdc"``
>
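
If you started ``easypay-service`` with Gradle as above, one way to restart it with the additional profile is to use the standard ``SPRING_PROFILES_ACTIVE`` environment variable (a sketch; any other Spring Boot profile-activation mechanism works as well):

```bash
$ SPRING_PROFILES_ACTIVE=default,mdc ./gradlew :easypay-service:bootRun -x test
```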

### Adding more content in our logs
Run the following command:

```bash
$ k6 run -u 5 -d 5s k6/01-payment-only.js
```

Then check the logs to pinpoint some exceptions.

### Personally Identifiable Information (PII) obfuscation
For compliance reasons and to prevent personal data leakage, we will obfuscate the card number in the logs.

In the Alloy configuration file (``docker/alloy/config.alloy``), add the [luhn stage](https://grafana.com/docs/alloy/latest/reference/components/loki.process/#stageluhn-block) into the ``jsonlogs`` Loki process block:

```
stage.luhn {
    replacement = "**DELETED**"
}
```

We will then have the following configuration for processing the JSON logs:

```
loki.process "jsonlogs" {
  forward_to = [loki.write.endpoint.receiver]

  stage.luhn {
    replacement = "**DELETED**"
  }

  stage.json {
    expressions = {
      // timestamp = "timestamp",
      application = "context.properties.applicationName",
      instance    = "context.properties.instance",
      trace_id    = "mdc.trace_id",
    }
  }

  stage.labels {
    values = {
      application = "application",
      instance    = "instance",
      trace_id    = "trace_id",
    }
  }
}
```


Then restart Alloy:

```bash
$ docker restart collector
```
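
To verify that the obfuscation works, you can generate a few payments again and check the logs in Grafana: the card numbers should now appear as ``**DELETED**``.

```bash
$ k6 run -u 5 -d 5s k6/01-payment-only.js
```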
### Logs Correlation
> aside positive
>
Check out the Logging configuration in the ``docker/alloy/config.alloy`` file:

```
////////////////////
// (1) LOGS
////////////////////

// CLASSIC LOGS FILES
local.file_match "logs" {
path_targets = [{"__path__" = "/logs/*.log", "exporter" = "LOGFILE"}]
}

loki.source.file "logfiles" {
targets = local.file_match.logs.targets
forward_to = [loki.write.endpoint.receiver]
}
```

Now get the Prometheus metrics using this command:

```bash
http :8080/actuator/prometheus
```

You can also get an overview of all the scraped metrics endpoints on the Prometheus dashboard.

Go to ``http://localhost:9090`` and explore the different endpoints in ``eureka-discovery``.
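
You can also query Prometheus from the command line through its HTTP API. For instance, the built-in ``up`` metric is a quick sanity check showing which discovered targets are currently scraped successfully:

```bash
$ curl 'http://localhost:9090/api/v1/query?query=up'
```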


### How are the metrics scraped?

Check out the Prometheus configuration file (``docker/prometheus/prometheus.yml``).
All the scrapers' definitions are configured there.

For instance, here is the scrape configuration for the configuration server:

```yaml
- job_name: prometheus-config-server
scrape_interval: 5s
scrape_timeout: 5s
metrics_path: /actuator/prometheus
static_configs:
- targets:
- config-server:8890
```

You can see that, under the hood, it uses the same actuator endpoint we looked at earlier.
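
You can even fetch that endpoint yourself to see the raw text format Prometheus ingests. This is only a sketch: it assumes port ``8890`` of the configuration server is also mapped on ``localhost``; otherwise, run the command from inside the Docker network.

```bash
$ curl http://localhost:8890/actuator/prometheus | head -n 20
```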

Prometheus first reaches Eureka to discover which servers to scrape.
It then scrapes all the registered instances in the same way:

```yaml
# Discover targets from Eureka and scrape metrics from them (Whitebox monitoring)
- job_name: eureka-discovery
scrape_interval: 5s
scrape_timeout: 5s
eureka_sd_configs:
- server: http://discovery-server:8761/eureka (1)
refresh_interval: 5s
relabel_configs: (2)
- source_labels: [__meta_eureka_app_instance_metadata_metrics_path]
target_label: __metrics_path__
```
1. We plug Prometheus into Eureka to discover all the underlying systems and their metrics.
2. To pinpoint each service and its metrics endpoint, and to build the final metrics path stored in Prometheus, we set up this relabel configuration.
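
To see what Prometheus will actually discover, you can list the instances registered in Eureka through its REST API (a sketch; the standard Spring Cloud Netflix endpoint returns an XML listing by default):

```bash
$ curl http://localhost:8761/eureka/apps
```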

### Let's explore the metrics

Then go to Grafana and start an ``Explore`` dashboard again.

Select the ``Prometheus`` datasource.
Explore the dashboard, especially the Garbage collector and CPU statistics.

Then look around the JDBC dashboard and see what happens on the database connection pool.
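
If you want to graph the pool usage yourself, you can try the query below in a Prometheus Explore panel. It is a sketch: it assumes the services use the default HikariCP connection pool, whose Micrometer metrics are exported with the ``hikaricp_`` prefix.

```
hikaricp_connections_active
```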


Now, let's go back to the Loki explore dashboard and see what happens:

Create a query with the following parameters:
Explore the corresponding SQL queries and their response times.

Finally, check the traces from different services (e.g., ``api-gateway``).

### Sampling

To avoid storing useless data into Tempo, we can sample the data in two ways:
* [Head Sampling](https://opentelemetry.io/docs/concepts/sampling/#head-sampling)
* [Tail Sampling](https://opentelemetry.io/docs/concepts/sampling/#tail-sampling)

In this workshop, we will implement the latter.

In the Alloy configuration file (``docker/alloy/config.alloy``), put this configuration just after the ``SAMPLING`` comment:
```
// SAMPLING
//
otelcol.processor.tail_sampling "actuator" {
policy {
name = "filter_http_url"
type = "string_attribute"
string_attribute {
key = "http.url"
values = ["/actuator/health", "/actuator/prometheus"]
enabled_regex_matching = true
invert_match = true
}
}

policy {
name = "filter_url_path"
type = "string_attribute"
string_attribute {
key = "url.path"
values = ["/actuator/health", "/actuator/prometheus"]
enabled_regex_matching = true
invert_match = true
}
}
}
```

This configuration will filter out the [spans](https://opentelemetry.io/docs/concepts/signals/traces/#spans) created by ``/actuator`` API calls.
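
Note that an ``otelcol.processor`` component in Alloy also needs an ``output`` block telling it where to forward the sampled traces. If it is not already wired further down in ``config.alloy``, it would look roughly like the sketch below, inside the ``tail_sampling`` block (the exporter name ``otelcol.exporter.otlp.tempo`` is an assumption; reuse the one already defined in your file):

```
  output {
    // hypothetical exporter reference: use the exporter already declared in config.alloy
    traces = [otelcol.exporter.otlp.tempo.input]
  }
```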

Then restart Alloy:

```bash
$ docker compose restart collector
```

## Correlate Traces and Logs
Duration: 0:15:00


Let's go back to the Grafana Explore dashboard:
* Select the ``Loki`` datasource.
* As a label filter, select ``easypay-service``.
* Run a query and select a log entry.

Now check that you have an ``mdc`` JSON element which includes both [``trace_id``](https://www.w3.org/TR/trace-context/#trace-id) and [``span_id``](https://www.w3.org/TR/trace-context/#parent-id).
These identifiers will help us correlate the logs and traces of our different requests.
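
For illustration, such a log entry carries a fragment like the one below (the identifier values are the example values from the W3C specification, not real ones):

```json
{
  "mdc": {
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "span_id": "00f067aa0ba902b7"
  }
}
```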

> aside positive
>
> These notions are part of the [W3C Trace Context Specification](https://www.w3.org/TR/trace-context/).

Now, go down to the ``Fields`` section.
You should see a ``Links`` sub-section with a ``View Trace`` button.

Click on it.
You will see the corresponding trace of this log.

Now you can correlate logs and traces!
If you have any exceptions in your error logs, you can now check where they happen and see the big picture of the transaction (from the customer's point of view).

### How was it done?

When you enable MDC on your logs, the ``trace_id`` field is always filled in.

Then to enable the link, we added the following configuration into the Alloy configuration file:

```
stage.json { (1)
expressions = {
// timestamp = "timestamp",
application = "context.properties.applicationName",
instance = "context.properties.instance",
trace_id = "mdc.trace_id",
}
}

stage.labels { (2)
values = {
application = "application",
instance = "instance",
trace_id = "trace_id",
}
}
```

1. The first stage extracts the ``trace_id`` field from the JSON log.
2. The corresponding label is then created so it can later be used in a Grafana dashboard.
3. _Et voilà!_
