Adding EventMeter Trigger #3812

Merged
merged 23 commits into from
Mar 15, 2023
Changes from 17 commits
Commits
7e3a455
In progress of integrating SDM collection rule
kkeirstead Feb 8, 2023
5ab78d7
In progress of integrating SDM collection rule
kkeirstead Feb 8, 2023
d2ee379
Building - not passing tests yet.
kkeirstead Feb 9, 2023
3c76bf1
Trigger is working and tests are passing
kkeirstead Feb 14, 2023
aac2249
Converting to a single value histogram design
kkeirstead Feb 21, 2023
fb6b854
Fixing up docs, tweaks/fixes for tests, switched percentile to int fr…
kkeirstead Feb 22, 2023
5067204
Cleaning up for PR
kkeirstead Feb 24, 2023
6883809
Docs fixes
kkeirstead Feb 28, 2023
b5b9edd
Fixed merge conflict
kkeirstead Feb 28, 2023
39e5d71
Using MeterName/InstrumentName as defaults, with ProviderName/Counter…
kkeirstead Mar 1, 2023
92c94ea
Docs update
kkeirstead Mar 1, 2023
12085a3
PR Feedback
kkeirstead Mar 7, 2023
23ff8f0
Merge branch 'main' into kkeirstead/SDM_CR_New
kkeirstead Mar 8, 2023
f47c33f
Experimenting with test instability.
kkeirstead Mar 10, 2023
8f016d2
Merge branch 'kkeirstead/SDM_CR_New' of https://github.com/kkeirstead…
kkeirstead Mar 10, 2023
02194ae
Still experimenting with test failures.
kkeirstead Mar 10, 2023
2f51c65
Switching back to only using .net 7 tfm for the problematic tests.
kkeirstead Mar 10, 2023
a729f01
Switched over to using EventMeter instead of SystemDiagnosticsMetrics…
kkeirstead Mar 14, 2023
0a07fa8
Update documentation/configuration/collection-rule-configuration.md
kkeirstead Mar 15, 2023
e02870f
Update documentation/configuration/collection-rule-configuration.md
kkeirstead Mar 15, 2023
153f92d
Update documentation/collectionrules/collectionrules.md
kkeirstead Mar 15, 2023
ef47c21
Update OptionsDisplayStrings.resx
kkeirstead Mar 15, 2023
a017074
Get the current TFM for a test.
kkeirstead Mar 15, 2023
81 changes: 81 additions & 0 deletions documentation/collectionrules/collectionruleexamples.md
@@ -224,6 +224,87 @@ This rule, named "LargeGCHeapSize", will trigger when the GC Heap Size exceeds 1

This rule, named "HighCpuUsage", will trigger when a process named "MyProcessName" causes CPU usage to exceed 60% for more than 10 seconds. If the rule is triggered, a CPU trace will be collected for the default duration (30 seconds), and egressed to the specified `Egress` provider (in this case, `artifacts` has been configured to save the trace to the local filesystem). There is a default `ActionCount` limit stating that this rule may only be triggered 5 times.

## Collect Logs - Custom Histogram Metrics (`SystemDiagnosticsMetrics` Trigger) (7.1+)

<details>
<summary>JSON</summary>

```json
{
  "HighHistogramValues": {
    "Trigger": {
      "Type": "SystemDiagnosticsMetrics",
      "Settings": {
        "MeterName": "MyCustomMeter",
        "InstrumentName": "MyCustomHistogram",
        "HistogramPercentile": 95,
        "GreaterThan": 175
      }
    },
    "Actions": [
      {
        "Type": "CollectLogs",
        "Settings": {
          "Egress": "artifacts",
          "DefaultLevel": "Warning",
          "UseAppFilters": false,
          "Duration": "00:00:30"
        }
      }
    ]
  }
}
```
</details>

<details>
<summary>Kubernetes ConfigMap</summary>

```yaml
CollectionRules__HighHistogramValues__Trigger__Type: "SystemDiagnosticsMetrics"
CollectionRules__HighHistogramValues__Trigger__Settings__MeterName: "MyCustomMeter"
CollectionRules__HighHistogramValues__Trigger__Settings__InstrumentName: "MyCustomHistogram"
CollectionRules__HighHistogramValues__Trigger__Settings__HistogramPercentile: "95"
CollectionRules__HighHistogramValues__Trigger__Settings__GreaterThan: "175"
CollectionRules__HighHistogramValues__Actions__0__Type: "CollectLogs"
CollectionRules__HighHistogramValues__Actions__0__Settings__Egress: "artifacts"
CollectionRules__HighHistogramValues__Actions__0__Settings__DefaultLevel: "Warning"
CollectionRules__HighHistogramValues__Actions__0__Settings__UseAppFilters: "false"
CollectionRules__HighHistogramValues__Actions__0__Settings__Duration: "00:00:30"
```
</details>

<details>
<summary>Kubernetes Environment Variables</summary>

```yaml
- name: DotnetMonitor_CollectionRules__HighHistogramValues__Trigger__Type
  value: "SystemDiagnosticsMetrics"
- name: DotnetMonitor_CollectionRules__HighHistogramValues__Trigger__Settings__MeterName
  value: "MyCustomMeter"
- name: DotnetMonitor_CollectionRules__HighHistogramValues__Trigger__Settings__InstrumentName
  value: "MyCustomHistogram"
- name: DotnetMonitor_CollectionRules__HighHistogramValues__Trigger__Settings__HistogramPercentile
  value: "95"
- name: DotnetMonitor_CollectionRules__HighHistogramValues__Trigger__Settings__GreaterThan
  value: "175"
- name: DotnetMonitor_CollectionRules__HighHistogramValues__Actions__0__Type
  value: "CollectLogs"
- name: DotnetMonitor_CollectionRules__HighHistogramValues__Actions__0__Settings__Egress
  value: "artifacts"
- name: DotnetMonitor_CollectionRules__HighHistogramValues__Actions__0__Settings__DefaultLevel
  value: "Warning"
- name: DotnetMonitor_CollectionRules__HighHistogramValues__Actions__0__Settings__UseAppFilters
  value: "false"
- name: DotnetMonitor_CollectionRules__HighHistogramValues__Actions__0__Settings__Duration
  value: "00:00:30"
```
</details>

### Explanation

This rule, named "HighHistogramValues", will trigger when the 95th-percentile value of the custom histogram `MyCustomHistogram` stays above the specified threshold (175) throughout the default sliding window duration (1 minute). If the rule is triggered, logs will be collected for 30 seconds at the `Warning` default level and egressed to the specified `Egress` provider (in this case, `artifacts` has been configured to save the logs to the local filesystem). By default, the `ActionCount` limit allows this rule to be triggered at most 5 times.
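
For reference, the `artifacts` egress provider that this rule relies on might be configured roughly as follows. This is a sketch rather than part of this change: the `/artifacts` directory is an assumed path, and the `Egress`, `FileSystem`, and `directoryPath` names follow the dotnet-monitor file system egress configuration as best understood here, so verify them against the version in use.

```json
{
  "Egress": {
    "FileSystem": {
      "artifacts": {
        "directoryPath": "/artifacts"
      }
    }
  }
}
```

The provider name (`artifacts`) is the value that the `Egress` setting of the `CollectLogs` action refers to.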

## Collect Dump - 4xx Response Status (`AspNetResponseStatus` Trigger)

<details>
1 change: 1 addition & 0 deletions documentation/collectionrules/collectionrules.md
@@ -46,6 +46,7 @@ The following are the currently available triggers:
| [AspNetRequestDuration](../configuration/collection-rule-configuration.md#aspnetrequestduration-trigger) | Event Pipe | Satisfied when the number of HTTP requests that have response times longer than the threshold duration is above the specified threshold. |
| [AspNetResponseStatus](../configuration/collection-rule-configuration.md#aspnetresponsestatus-trigger) | Event Pipe | Satisfied when the number of HTTP responses that have status codes matching the pattern list is above the specified threshold. |
| [EventCounter](../configuration/collection-rule-configuration.md#eventcounter-trigger) | Event Pipe | Satisfied when the value of a counter is above, below, or between the specified threshold values. |
| [SystemDiagnosticsMetrics](../configuration/collection-rule-configuration.md#systemdiagnosticsmetrics-trigger-71) | Event Pipe | Satisfied when the value of an instrument is above, below, or between the specified threshold values. |

## Actions

106 changes: 106 additions & 0 deletions documentation/configuration/collection-rule-configuration.md
@@ -12,6 +12,7 @@ Collection rules are specified in configuration as a named item under the `Colle
- [AspNetRequestDuration](#aspnetrequestduration-trigger)
- [AspNetResponseStatus](#aspnetresponsestatus-trigger)
- [EventCounter](#eventcounter-trigger)
- [SystemDiagnosticsMetrics](#systemdiagnosticsmetrics-trigger-71)
- [Trigger shortcuts](../collectionrules/triggershortcuts.md)
- [`Actions`](#actions) - The action to be performed
- [CollectDump](#collectdump-action)
@@ -415,6 +416,111 @@ Usage that is satisfied when the CPU usage of the application is higher than 70%
```
</details>

### `SystemDiagnosticsMetrics` Trigger (7.1+)

A trigger that has its condition satisfied when the value of an instrument stays above, below, or between the specified threshold values for a duration of time. Supported instruments include [Gauges](https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.metrics.observablegauge-1), [Counters](https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.metrics.counter-1), and [Histograms](https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.metrics.histogram-1).

#### Properties

| Name | Type | Required | Description | Default Value | Min Value | Max Value |
|---|---|---|---|---|---|---|
| `MeterName` | string | true | The name of the meter that provides the instrument information. | | | |
| `InstrumentName` | string | true | The name of the instrument to monitor. | | | |
| `GreaterThan` | double? | false | The threshold level the instrument must maintain (or higher) for the specified duration. Either `GreaterThan` or `LessThan` (or both) must be specified for non-histogram instruments. | `null` | | |
| `LessThan` | double? | false | The threshold level the instrument must maintain (or lower) for the specified duration. Either `GreaterThan` or `LessThan` (or both) must be specified for non-histogram instruments. | `null` | | |
| `SlidingWindowDuration` | TimeSpan? | false | The sliding time window in which the instrument must maintain its value as specified by the threshold levels in `GreaterThan` and/or `LessThan`. | `"00:01:00"` (one minute) | `"00:00:01"` (one second) | `"1.00:00:00"` (1 day) |
| `HistogramPercentile` | int? | false | The histogram percentile to evaluate. It should be one of the instrument's published percentiles (by default: 50, 95, and 99) and is only specified when the instrument is a histogram. The value at this percentile is compared against `GreaterThan` and/or `LessThan`. | | 0 | 100 |
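
Because `GreaterThan` and `LessThan` may both be specified, a range ("between") condition can be sketched as follows. The meter and instrument names and the values are illustrative assumptions, and it is assumed that `GreaterThan` must be lower than `LessThan` for the range to be meaningful.

```json
{
  "MeterName": "MyMeterName",
  "InstrumentName": "MyGaugeName",
  "GreaterThan": 20,
  "LessThan": 50,
  "SlidingWindowDuration": "00:00:30"
}
```

Under these settings, the condition would be satisfied when the gauge stays between 20 and 50 for a 30-second window.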

#### Example

Usage that is satisfied when the target application's custom gauge is greater than 20 for a 10-second window.

<details>
<summary>JSON</summary>

```json
{
  "MeterName": "MyMeterName",
  "InstrumentName": "MyGaugeName",
  "GreaterThan": 20,
  "SlidingWindowDuration": "00:00:10"
}
```
</details>

<details>
<summary>Kubernetes ConfigMap</summary>

```yaml
CollectionRules__RuleName__Trigger__Settings__MeterName: "MyMeterName"
CollectionRules__RuleName__Trigger__Settings__InstrumentName: "MyGaugeName"
CollectionRules__RuleName__Trigger__Settings__GreaterThan: "20"
CollectionRules__RuleName__Trigger__Settings__SlidingWindowDuration: "00:00:10"
```
</details>

<details>
<summary>Kubernetes Environment Variables</summary>

```yaml
- name: DotnetMonitor_CollectionRules__RuleName__Trigger__Settings__MeterName
  value: "MyMeterName"
- name: DotnetMonitor_CollectionRules__RuleName__Trigger__Settings__InstrumentName
  value: "MyGaugeName"
- name: DotnetMonitor_CollectionRules__RuleName__Trigger__Settings__GreaterThan
  value: "20"
- name: DotnetMonitor_CollectionRules__RuleName__Trigger__Settings__SlidingWindowDuration
  value: "00:00:10"
```
</details>

#### Example

Usage that is satisfied when the 50th percentile of the target application's custom histogram is greater than 200 for a 10-second window.

<details>
<summary>JSON</summary>

```json
{
  "MeterName": "MyMeterName",
  "InstrumentName": "MyHistogramName",
  "GreaterThan": 200,
  "HistogramPercentile": 50,
  "SlidingWindowDuration": "00:00:10"
}
```
</details>

<details>
<summary>Kubernetes ConfigMap</summary>

```yaml
CollectionRules__RuleName__Trigger__Settings__MeterName: "MyMeterName"
CollectionRules__RuleName__Trigger__Settings__InstrumentName: "MyHistogramName"
CollectionRules__RuleName__Trigger__Settings__GreaterThan: "200"
CollectionRules__RuleName__Trigger__Settings__HistogramPercentile: "50"
CollectionRules__RuleName__Trigger__Settings__SlidingWindowDuration: "00:00:10"
```
</details>

<details>
<summary>Kubernetes Environment Variables</summary>

```yaml
- name: DotnetMonitor_CollectionRules__RuleName__Trigger__Settings__MeterName
  value: "MyMeterName"
- name: DotnetMonitor_CollectionRules__RuleName__Trigger__Settings__InstrumentName
  value: "MyHistogramName"
- name: DotnetMonitor_CollectionRules__RuleName__Trigger__Settings__GreaterThan
  value: "200"
- name: DotnetMonitor_CollectionRules__RuleName__Trigger__Settings__HistogramPercentile
  value: "50"
- name: DotnetMonitor_CollectionRules__RuleName__Trigger__Settings__SlidingWindowDuration
  value: "00:00:10"
```
</details>

### Built-In Default Triggers

These [trigger shortcuts](../collectionrules/triggershortcuts.md) simplify configuration for several common `EventCounter` providers.
68 changes: 68 additions & 0 deletions documentation/schema.json
@@ -724,6 +724,19 @@
            }
          }
        },
        {
          "required": [
            "Settings"
          ],
          "properties": {
            "Type": {
              "const": "SystemDiagnosticsMetrics"
            },
            "Settings": {
              "$ref": "#/definitions/SystemDiagnosticsMetricsOptions"
            }
          }
        },
        {
          "properties": {
            "Type": {
@@ -2197,6 +2210,7 @@
"CPUUsage",
"GCHeapSize",
"ThreadpoolQueueLength",
"SystemDiagnosticsMetrics",
"Startup"
]
},
@@ -2492,6 +2506,60 @@
        }
      }
    },
    "SystemDiagnosticsMetricsOptions": {
      "type": "object",
      "additionalProperties": false,
      "required": [
        "MeterName",
        "InstrumentName"
      ],
      "properties": {
        "MeterName": {
          "type": "string",
          "description": "The name of the meter that provides the instrument information.",
          "minLength": 1
        },
        "InstrumentName": {
          "type": "string",
          "description": "The name of the instrument to monitor.",
          "minLength": 1
        },
        "GreaterThan": {
          "type": [
            "null",
            "number"
          ],
          "description": "The threshold level the instrument must maintain (or higher) for the specified duration. Either GreaterThan or LessThan (or both) must be specified.",
          "format": "double"
        },
        "LessThan": {
          "type": [
            "null",
            "number"
          ],
          "description": "The threshold level the instrument must maintain (or lower) for the specified duration. Either GreaterThan or LessThan (or both) must be specified.",
          "format": "double"
        },
        "SlidingWindowDuration": {
          "type": [
            "null",
            "string"
          ],
          "description": "The sliding time window in which the instrument must maintain its value as specified by the threshold levels in GreaterThan and LessThan.",
          "format": "duration"
        },
        "HistogramPercentile": {
          "type": [
            "integer",
            "null"
          ],
          "description": "When monitoring a histogram, this dictates which percentile to compare against using the value in GreaterThan/LessThan - by default, the percentile can be 50, 95, or 99.",
          "format": "int32",
          "maximum": 100.0,
          "minimum": 0.0
        }
      }
    },
    "JsonConsoleFormatterOptions": {
      "type": "object",
      "additionalProperties": false,

@@ -807,6 +807,30 @@
<value>The maximum number of time series that can be tracked. Each unique combination of provider name, metric name, and dimension values counts as one time series. Tracking more time series uses more memory in the target process so this bound guards against unintentional high memory use.</value>
<comment>The description provided for the MaxTimeSeries parameter on MetricsOptions.</comment>
</data>
<data name="DisplayAttributeDescription_SystemDiagnosticsMetricsOptions_InstrumentName" xml:space="preserve">
<value>The name of the instrument to monitor.</value>
<comment>The description provided for the InstrumentName parameter on SystemDiagnosticsMetricsOptions.</comment>
</data>
<data name="DisplayAttributeDescription_SystemDiagnosticsMetricsOptions_GreaterThan" xml:space="preserve">
<value>The threshold level the instrument must maintain (or higher) for the specified duration. Either GreaterThan or LessThan (or both) must be specified.</value>
<comment>The description provided for the GreaterThan parameter on SystemDiagnosticsMetricsOptions.</comment>
</data>
<data name="DisplayAttributeDescription_SystemDiagnosticsMetricsOptions_HistogramPercentile" xml:space="preserve">
<value>When monitoring a histogram, this dictates which percentile to compare against using the value in GreaterThan/LessThan - by default, the percentile can be 50, 95, or 99.</value>
<comment>The description provided for the HistogramPercentile parameter on SystemDiagnosticsMetricsOptions.</comment>
</data>
<data name="DisplayAttributeDescription_SystemDiagnosticsMetricsOptions_LessThan" xml:space="preserve">
<value>The threshold level the instrument must maintain (or lower) for the specified duration. Either GreaterThan or LessThan (or both) must be specified.</value>
<comment>The description provided for the LessThan parameter on SystemDiagnosticsMetricsOptions.</comment>
</data>
<data name="DisplayAttributeDescription_SystemDiagnosticsMetricsOptions_MeterName" xml:space="preserve">
<value>The name of the meter that provides the instrument information.</value>
<comment>The description provided for the MeterName parameter on SystemDiagnosticsMetricsOptions.</comment>
</data>
<data name="DisplayAttributeDescription_SystemDiagnosticsMetricsOptions_SlidingWindowDuration" xml:space="preserve">
<value>The sliding time window in which the instrument must maintain its value as specified by the threshold levels in GreaterThan and LessThan.</value>
<comment>The description provided for the SlidingWindowDuration parameter on SystemDiagnosticsMetricsOptions.</comment>
</data>
<data name="DisplayAttributeDescription_GlobalCounterOptions_Providers" xml:space="preserve">
<value>Dictionary of provider names and their global configuration.</value>
</data>