WIP implement MicroProfile Fault Tolerance 4.1
This includes support for OpenTelemetry Metrics. For testing, this
commit uses SmallRye OpenTelemetry.
Ladicek committed Oct 3, 2024
1 parent b9afadb commit ff5192c
Showing 29 changed files with 923 additions and 145 deletions.
4 changes: 2 additions & 2 deletions doc/antora.yml
@@ -10,7 +10,7 @@ asciidoc:
smallrye-fault-tolerance-version: '6.4.1'

microprofile-fault-tolerance: MicroProfile Fault Tolerance
microprofile-fault-tolerance-version: '4.0.2'
microprofile-fault-tolerance-url: https://download.eclipse.org/microprofile/microprofile-fault-tolerance-4.0/microprofile-fault-tolerance-spec-4.0.html
microprofile-fault-tolerance-version: '4.1'
microprofile-fault-tolerance-url: https://download.eclipse.org/microprofile/microprofile-fault-tolerance-4.1/microprofile-fault-tolerance-spec-4.1.html

vertx4-version: '4.5.8'
15 changes: 13 additions & 2 deletions doc/modules/ROOT/pages/integration/metrics.adoc
@@ -1,10 +1,10 @@
= Metrics

{smallrye-fault-tolerance} provides support for MicroProfile Metrics and Micrometer.
{smallrye-fault-tolerance} provides support for MicroProfile Metrics, OpenTelemetry and Micrometer.
Alternatively, metrics may be completely disabled at the integration level.

As usual, this integration is based on CDI.
{smallrye-fault-tolerance} includes an internal interface `MetricsProvider` and 3 different implementations.
{smallrye-fault-tolerance} includes an internal interface `MetricsProvider` and 4 different implementations.
Exactly 1 bean of type `MetricsProvider` must exist.
An instance of that bean is used to interact with the metrics system.

@@ -21,6 +21,7 @@ In addition to a zero-parameter constructor, there's a constructor that takes a
`MetricsIntegration` is an enum with these values:

* `MICROPROFILE_METRICS`: use MicroProfile Metrics integration
* `OPENTELEMETRY`: use OpenTelemetry (MicroProfile Telemetry) integration
* `MICROMETER`: use Micrometer integration
* `NOOP`: no metrics

@@ -34,6 +35,7 @@ The integrator may select the metrics provider by making sure that the correct i
The existing metrics providers are:

* `io.smallrye.faulttolerance.metrics.MicroProfileMetricsProvider`
* `io.smallrye.faulttolerance.metrics.OpenTelemetryProvider`
* `io.smallrye.faulttolerance.metrics.MicrometerProvider`
* `io.smallrye.faulttolerance.metrics.NoopProvider`

@@ -56,6 +58,15 @@ If MicroProfile Metrics are used, the integrator must ensure that the following
* `org.eclipse.microprofile.metrics:microprofile-metrics-api`;
* some implementation of MicroProfile Metrics.

=== OpenTelemetry

If OpenTelemetry is used, the integrator must ensure that the following artifact is present:

* `io.opentelemetry:opentelemetry-api`.

Further, a bean of type `io.opentelemetry.api.metrics.Meter` must exist.
This bean is used to emit the actual metrics.

=== Micrometer

If Micrometer is used, the integrator must ensure that the following artifact is present:
7 changes: 4 additions & 3 deletions doc/modules/ROOT/pages/integration/programmatic-api.adoc
@@ -40,7 +40,8 @@ After `StandaloneFaultTolerance.shutdown()`, it is not possible to reinitialize
=== Metrics

In the standalone implementation, MicroProfile Metrics make no sense, as that is exclusively based on CDI.
It is however possible to integrate with Micrometer.
It is however possible to integrate with OpenTelemetry or Micrometer.

The `Configuration.metricsAdapter()` method must be implemented and return an instance of `io.smallrye.faulttolerance.standalone.MicrometerAdapter`.
The constructor of `MicrometerAdapter` accepts the Micrometer registry (`MeterRegistry`) to which metrics shall be emitted.
The `Configuration.metricsAdapter()` method must be implemented and return an instance of `io.smallrye.faulttolerance.standalone.OpenTelemetryAdapter` or `io.smallrye.faulttolerance.standalone.MicrometerAdapter`.
The constructor of `OpenTelemetryAdapter` accepts the `Meter` to which metrics shall be emitted.
The constructor of `MicrometerAdapter` accepts the `MeterRegistry` to which metrics shall be emitted.
35 changes: 28 additions & 7 deletions doc/modules/ROOT/pages/reference/bulkhead.adoc
@@ -67,7 +67,10 @@ Bulkhead exposes the following metrics:
[cols="1,5"]
|===
| Name | `ft.bulkhead.calls.total`
| Type | `Counter`
| Type
a| * MP Metrics: `Counter`
* OpenTelemetry: `LongCounter`
* Micrometer: `Counter`
| Unit | None
| Description | The number of times the bulkhead logic was run. This is usually once per method call, but may be zero times if the circuit breaker or rate limit prevented execution or more than once if the method call was retried.
| Tags
@@ -78,7 +81,10 @@ a| * `method` - the fully qualified method name
[cols="1,5"]
|===
| Name | `ft.bulkhead.executionsRunning`
| Type | `Gauge<Long>`
| Type
a| * MP Metrics: `Gauge<Long>`
* OpenTelemetry: `LongUpDownCounter`
* Micrometer: `Gauge`
| Unit | None
| Description | Number of currently running executions.
| Tags
@@ -88,7 +94,10 @@ a| * `method` - the fully qualified method name
[cols="1,5"]
|===
| Name | `ft.bulkhead.executionsWaiting`
| Type | `Gauge<Long>`
| Type
a| * MP Metrics: `Gauge<Long>`
* OpenTelemetry: `LongUpDownCounter`
* Micrometer: `Gauge`
| Unit | None
| Description | Number of executions currently waiting in the queue.
| Tags
@@ -99,8 +108,14 @@ a| * `method` - the fully qualified method name
[cols="1,5"]
|===
| Name | `ft.bulkhead.runningDuration`
| Type | `Histogram`
| Unit | Nanoseconds
| Type
a| * MP Metrics: `Histogram`
* OpenTelemetry: `DoubleHistogram` with explicit bucket boundaries `[0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10]`
* Micrometer: `Timer`
| Unit
a| * MP Metrics: nanoseconds
* OpenTelemetry: seconds
* Micrometer: nanoseconds
| Description | Histogram of the time that method executions spent running.
| Tags
a| * `method` - the fully qualified method name
@@ -109,8 +124,14 @@ a| * `method` - the fully qualified method name
[cols="1,5"]
|===
| Name | `ft.bulkhead.waitingDuration`
| Type | `Histogram`
| Unit | Nanoseconds
| Type
a| * MP Metrics: `Histogram`
* OpenTelemetry: `DoubleHistogram` with explicit bucket boundaries `[0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10]`
* Micrometer: `Timer`
| Unit
a| * MP Metrics: nanoseconds
* OpenTelemetry: seconds
* Micrometer: nanoseconds
| Description | Histogram of the time that method executions spent waiting in the queue.
| Tags
a| * `method` - the fully qualified method name
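The duration metrics above use nanoseconds for MP Metrics and Micrometer but seconds for OpenTelemetry, with the explicit bucket boundaries listed in the table. The following is a minimal sketch in plain Java (no OpenTelemetry dependency) of how a nanosecond duration maps to one of those buckets. It assumes upper-inclusive bucket bounds, as OpenTelemetry explicit-bucket histograms use; the names `DurationBuckets`, `toSeconds` and `bucketIndex` are hypothetical and not part of {smallrye-fault-tolerance}.

```java
final class DurationBuckets {
    // Explicit bucket boundaries in seconds, copied from the metric tables above.
    static final double[] BOUNDARIES = {
        0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10
    };

    private DurationBuckets() {
    }

    // Converts a duration measured in nanoseconds to seconds.
    static double toSeconds(long nanos) {
        return nanos / 1_000_000_000.0;
    }

    // Returns the index of the bucket a value falls into, assuming bucket i
    // covers (BOUNDARIES[i - 1], BOUNDARIES[i]]. The last index,
    // BOUNDARIES.length, is the overflow bucket for values above 10 seconds.
    static int bucketIndex(double seconds) {
        for (int i = 0; i < BOUNDARIES.length; i++) {
            if (seconds <= BOUNDARIES[i]) {
                return i;
            }
        }
        return BOUNDARIES.length;
    }
}
```

For instance, an execution that ran for 250,000,000 ns is recorded as 0.25 s and lands in the bucket whose upper bound is `0.25`.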
25 changes: 20 additions & 5 deletions doc/modules/ROOT/pages/reference/circuit-breaker.adoc
@@ -131,7 +131,10 @@ Circuit breaker exposes the following metrics:
[cols="1,5"]
|===
| Name | `ft.circuitbreaker.calls.total`
| Type | `Counter`
| Type
a| * MP Metrics: `Counter`
* OpenTelemetry: `LongCounter`
* Micrometer: `Counter`
| Unit | None
| Description | The number of times the circuit breaker logic was run. This is usually once per method call, but may be more than once if the method call is retried.
| Tags
@@ -145,8 +148,14 @@ a| * `method` - the fully qualified method name
[cols="1,5"]
|===
| Name | `ft.circuitbreaker.state.total`
| Type | `Gauge<Long>`
| Unit | Nanoseconds
| Type
a| * MP Metrics: `Gauge<Long>`
* OpenTelemetry: `LongCounter`
* Micrometer: `TimeGauge`
| Unit
a| * MP Metrics: nanoseconds
* OpenTelemetry: nanoseconds
* Micrometer: nanoseconds
| Description | Amount of time the circuit breaker has spent in each state
| Tags
a| * `method` - the fully qualified method name
@@ -157,7 +166,10 @@ a| * `method` - the fully qualified method name
[cols="1,5"]
|===
| Name | `ft.circuitbreaker.opened.total`
| Type | `Counter`
| Type
a| * MP Metrics: `Counter`
* OpenTelemetry: `LongCounter`
* Micrometer: `Counter`
| Unit | None
| Description | Number of times the circuit breaker has moved from closed state to open state
| Tags
@@ -169,7 +181,10 @@ a| * `method` - the fully qualified method name
| Name | `ft.circuitbreaker.state.current`
2+a|
include::partial$srye-feature.adoc[]
| Type | `Gauge<Long>` (`0` or `1`)
| Type
a| * MP Metrics: `Gauge<Long>`
* OpenTelemetry: `LongUpDownCounter`
* Micrometer: `Gauge`
| Unit | None
| Description | Whether the circuit breaker is currently in given state (`1`) or not (`0`)
| Tags
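The relationship between `ft.circuitbreaker.state.total`, `ft.circuitbreaker.state.current` and `ft.circuitbreaker.opened.total` can be illustrated with a small bookkeeping sketch in plain Java. This is not how {smallrye-fault-tolerance} implements these metrics; the class `CircuitBreakerStateTimes`, its `State` enum and all method names are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```java
import java.util.EnumMap;
import java.util.Map;

final class CircuitBreakerStateTimes {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final Map<State, Long> totalNanos = new EnumMap<>(State.class);
    private State current = State.CLOSED;
    private long enteredAtNanos;
    private long openedTotal;

    // A circuit breaker starts in the closed state at the given time.
    CircuitBreakerStateTimes(long nowNanos) {
        for (State state : State.values()) {
            totalNanos.put(state, 0L);
        }
        enteredAtNanos = nowNanos;
    }

    // Adds the time spent in the state being left, then switches state.
    void transition(State next, long nowNanos) {
        totalNanos.merge(current, nowNanos - enteredAtNanos, Long::sum);
        if (current == State.CLOSED && next == State.OPEN) {
            openedTotal++; // counts only closed -> open transitions
        }
        current = next;
        enteredAtNanos = nowNanos;
    }

    // Nanoseconds spent in the state, counting completed stays only.
    long stateTotal(State state) {
        return totalNanos.get(state);
    }

    // 1 if the circuit breaker is currently in the state, 0 otherwise.
    long stateCurrent(State state) {
        return state == current ? 1 : 0;
    }

    long openedTotal() {
        return openedTotal;
    }
}
```

Because the per-state time only ever grows, it maps to a monotonic `LongCounter` in OpenTelemetry, whereas the 0/1 current-state value goes up and down and so maps to a `LongUpDownCounter`.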
103 changes: 11 additions & 92 deletions doc/modules/ROOT/pages/reference/metrics.adoc
@@ -1,6 +1,6 @@
= Metrics

{smallrye-fault-tolerance} exposes metrics to MicroProfile Metrics, as {microprofile-fault-tolerance-url}#_integration_with_microprofile_metrics[specified] by {microprofile-fault-tolerance}.
{smallrye-fault-tolerance} exposes metrics, as {microprofile-fault-tolerance-url}#_integration_with_microprofile_metrics_and_microprofile_telemetry[specified] by {microprofile-fault-tolerance}.
[[general]]
== General Metrics
@@ -10,7 +10,10 @@ For all methods guarded with some fault tolerance strategy, the following metric
[cols="1,5"]
|===
| Name | `ft.invocations.total`
| Type | `Counter`
| Type
a| * MP Metrics: `Counter`
* OpenTelemetry: `LongCounter`
* Micrometer: `Counter`
| Unit | None
| Description | The number of times the method was called.
| Tags
@@ -42,7 +45,10 @@ The behavior of the timer thread can be observed through the following metrics:
[cols="1,5"]
|===
| Name | `ft.timer.scheduled`
| Type | `Gauge<Integer>`
| Type
a| * MP Metrics: `Gauge<Integer>`
* OpenTelemetry: `LongUpDownCounter`
* Micrometer: `Gauge`
| Unit | None
| Description | The number of tasks that are currently scheduled (for future execution) on the timer.
| Tags
@@ -51,95 +57,8 @@ a| * `id` - the ID of the timer, to distinguish multiple timers in a multi-appli

== Micrometer Support

In addition to the MicroProfile Metrics support, {smallrye-fault-tolerance} also provides support for https://micrometer.io/[Micrometer].
The set of metrics emitted to Micrometer is the same as the set of metrics emitted to MicroProfile Metrics, using the same metric names and tags.
Metric types are mapped as closely as possible:

|===
| Name | MicroProfile Metrics | Micrometer | Note

| `ft.invocations.total`
| counter
| counter
|

| `ft.retry.calls.total`
| counter
| counter
|

| `ft.retry.retries.total`
| counter
| counter
|

| `ft.timeout.calls.total`
| counter
| counter
|

| `ft.timeout.executionDuration`
| histogram
| timer
|

| `ft.circuitbreaker.calls.total`
| counter
| counter
|

| `ft.circuitbreaker.state.total`
| gauge
| time gauge
|

| `ft.circuitbreaker.state.current`
| gauge
| gauge
| *

| `ft.circuitbreaker.opened.total`
| counter
| counter
|

| `ft.bulkhead.calls.total`
| counter
| counter
|

| `ft.bulkhead.executionsRunning`
| gauge
| gauge
|

| `ft.bulkhead.executionsWaiting`
| gauge
| gauge
|

| `ft.bulkhead.runningDuration`
| histogram
| timer
|

| `ft.bulkhead.waitingDuration`
| histogram
| timer
|

| `ft.ratelimit.calls.total`
| counter
| counter
| *

| `ft.timer.scheduled`
| gauge
| gauge
| *
|===

{empty}* This is a {smallrye-fault-tolerance} feature, not specified by {microprofile-fault-tolerance}.
In addition to the MicroProfile Metrics and OpenTelemetry support (as specified by {microprofile-fault-tolerance}), {smallrye-fault-tolerance} also provides support for https://micrometer.io/[Micrometer].
The set of metrics emitted to Micrometer is the same, using the same metric names and tags.

Note that distribution summaries in Micrometer, including timers, do not emit quantiles by default.
Micrometer recommends that libraries should not configure them out of the box, so if you need them, you should use a `MeterFilter`.
2 changes: 1 addition & 1 deletion doc/modules/ROOT/pages/reference/programmatic-api.adoc
@@ -350,7 +350,7 @@ private static final FaultTolerance<String> guarded = FaultTolerance.<String>cre
<1> A description of `hello` is set, it will be used as a value of the `method` tag in all metrics.

It is possible to create multiple `FaultTolerance` objects with the same description.
In this case, it won't be possbile to distinguish the different `FaultTolerance` objects in metrics; their values will be aggregated.
In this case, it won't be possible to distinguish the different `FaultTolerance` objects in metrics; their values will be aggregated.

If no description is provided, a random UUID is used.
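The aggregation behavior described above follows from metrics being keyed by the `method` tag. The sketch below illustrates this in plain Java under that assumption; it is not the library's implementation, and `MethodTagCounters`, `methodTag`, `recordInvocation` and `invocationsTotal` are hypothetical names.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class MethodTagCounters {
    // One counter per distinct value of the `method` tag.
    private final Map<String, LongAdder> invocations = new ConcurrentHashMap<>();

    // The tag value is the description, or a random UUID if none was given.
    static String methodTag(String description) {
        return description != null ? description : UUID.randomUUID().toString();
    }

    void recordInvocation(String methodTag) {
        invocations.computeIfAbsent(methodTag, key -> new LongAdder()).increment();
    }

    long invocationsTotal(String methodTag) {
        LongAdder adder = invocations.get(methodTag);
        return adder == null ? 0L : adder.sum();
    }
}
```

Two guards that both use the description `hello` increment the same counter, so their invocations cannot be told apart, while guards without a description each get their own UUID-tagged series.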

5 changes: 4 additions & 1 deletion doc/modules/ROOT/pages/reference/rate-limit.adoc
@@ -133,7 +133,10 @@ Rate limit exposes the following metrics:
[cols="1,5"]
|===
| Name | `ft.ratelimit.calls.total`
| Type | `Counter`
| Type
a| * MP Metrics: `Counter`
* OpenTelemetry: `LongCounter`
* Micrometer: `Counter`
| Unit | None
| Description | The number of times the rate limit logic was run. This is usually once per method call, but may be zero times if the circuit breaker prevented execution or more than once if the method call was retried.
| Tags
10 changes: 8 additions & 2 deletions doc/modules/ROOT/pages/reference/retry.adoc
@@ -100,7 +100,10 @@ Retry exposes the following metrics:
[cols="1,5"]
|===
| Name | `ft.retry.calls.total`
| Type | `Counter`
| Type
a| * MP Metrics: `Counter`
* OpenTelemetry: `LongCounter`
* Micrometer: `Counter`
| Unit | None
| Description | The number of times the retry logic was run. This will always be once per method call.
| Tags
@@ -112,7 +115,10 @@ a| * `method` - the fully qualified method name
[cols="1,5"]
|===
| Name | `ft.retry.retries.total`
| Type | `Counter`
| Type
a| * MP Metrics: `Counter`
* OpenTelemetry: `LongCounter`
* Micrometer: `Counter`
| Unit | None
| Description | The number of times the method was retried
| Tags
15 changes: 12 additions & 3 deletions doc/modules/ROOT/pages/reference/timeout.adoc
@@ -49,7 +49,10 @@ Timeout exposes the following metrics:
[cols="1,5"]
|===
| Name | `ft.timeout.calls.total`
| Type | `Counter`
| Type
a| * MP Metrics: `Counter`
* OpenTelemetry: `LongCounter`
* Micrometer: `Counter`
| Unit | None
| Description | The number of times the timeout logic was run. This is usually once per method call, but may be zero times if the circuit breaker or rate limit prevents execution or more than once if the method is retried.
| Tags
@@ -60,8 +63,14 @@ a| * `method` - the fully qualified method name
[cols="1,5"]
|===
| Name | `ft.timeout.executionDuration`
| Type | `Histogram`
| Unit | Nanoseconds
| Type
a| * MP Metrics: `Histogram`
* OpenTelemetry: `DoubleHistogram` with explicit bucket boundaries `[0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10]`
* Micrometer: `Timer`
| Unit
a| * MP Metrics: nanoseconds
* OpenTelemetry: seconds
* Micrometer: nanoseconds
| Description | Histogram of execution times for the method
| Tags
a| * `method` - the fully qualified method name