[Metricbeat] Add aggregation aligner as a config param for stackdriver metricset in GCP #17719

Merged: 19 commits merged into elastic:master from gcp_collection_period on Apr 24, 2020

Conversation

kaiyan-sheng
Contributor

@kaiyan-sheng kaiyan-sheng commented Apr 15, 2020

What does this PR do?

  • Adds an aligner config parameter for the stackdriver metricset under metrics.
  • Adds a suffix to metric names to show which aligner is used. For example: cpu.utilization.avg for ALIGN_MEAN, cpu.utilization.sum for ALIGN_SUM, and cpu.utilization.value for ALIGN_NONE.
  • Takes into account that GCP has an ingest delay before monitoring metrics show up in Stackdriver.
  • Allows the user to specify a collection period as low as 1 minute instead of 5 minutes.
  • Fixes metricbeat/docs/fields.asciidoc to include mappings from the googlecloud module.
  • Removes data-*.json files that are not generated by the TestData integration test for each metricset.
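The suffix mapping above can be sketched as follows. This is a minimal hypothetical helper to illustrate the naming scheme, not the actual Metricbeat implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// alignerSuffix sketches the metric-name suffix mapping described above:
// ALIGN_MEAN -> "avg", ALIGN_SUM -> "sum", ALIGN_NONE -> "value".
// (Hypothetical helper, not the actual Metricbeat code.)
func alignerSuffix(aligner string) string {
	switch aligner {
	case "ALIGN_NONE":
		return "value"
	case "ALIGN_MEAN":
		return "avg"
	default:
		// Other aligners keep their lowercased name, e.g. ALIGN_SUM -> "sum".
		return strings.ToLower(strings.TrimPrefix(aligner, "ALIGN_"))
	}
}

func main() {
	fmt.Println("cpu.utilization." + alignerSuffix("ALIGN_MEAN")) // cpu.utilization.avg
	fmt.Println("cpu.utilization." + alignerSuffix("ALIGN_SUM"))  // cpu.utilization.sum
	fmt.Println("cpu.utilization." + alignerSuffix("ALIGN_NONE")) // cpu.utilization.value
}
```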

Why is it important?

The ListMetricDescriptors API is used once at the start of this module to get the sample period and ingest delay metadata for each metric type. If the sample period is smaller than the collection period, aggregation will be used in the ListTimeSeries API. By default, the aligner is ALIGN_NONE. This means that if the user specifies a Metricbeat collection period of 5m and the metric type's sample period is 60s, Metricbeat will return 5 raw data points (1 for each minute) in one ListTimeSeries API call. This saves cost significantly if the user does not mind the extra delay. If the user wants only one aggregated metric per collection period, an aligner can be specified, such as ALIGN_MEAN, ALIGN_SUM, etc.

Monitoring collects one measurement each minute (the sampling rate), but it can take up to 4 minutes before the data can be retrieved (latency). To make sure collection is successful, we shift the collection startTime and endTime back by the number of minutes defined by the ingest delay every time. Instead of hardcoding the ingest delay to 4 minutes, it is obtained from the ListMetricDescriptors API for each metric type.

Assume ingest delay = 4 minutes, sample period = 1 minute, and collection period = 1 minute. When querying the GCP timeSeries.list API, the time interval changes to:

| current timestamp | startTime | endTime |
|-------------------|-----------|---------|
| 01:00             | 00:55     | 00:56   |
| 01:01             | 00:56     | 00:57   |
| 01:02             | 00:57     | 00:58   |
| 01:03             | 00:58     | 00:59   |
| 01:04             | 00:59     | 01:00   |

Therefore, data collection will always have a delay. This is consistent with monitoring in GCP portal.

Assume ingest delay = 4 minutes, sample period = 5 minutes, aggregation aligner = ALIGN_MEAN, and collection period = 5 minutes. When querying the GCP timeSeries.list API, the time interval changes to:

| current timestamp | startTime | endTime |
|-------------------|-----------|---------|
| 01:00             | 00:51     | 00:56   |
| 01:05             | 00:56     | 01:01   |
| 01:10             | 01:01     | 01:06   |
| 01:15             | 01:06     | 01:11   |
| 01:20             | 01:11     | 01:16   |

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Two test cases here:

Use config below and you should see 5 metrics every 5 minutes:

- module: googlecloud
  metricsets:
    - stackdriver
  zone: "europe-west1-c"
  project_id: elastic-observability
  credentials_file_path: "/Users/kaiyansheng/Downloads/elastic-observability.json"
  exclude_labels: false
  period: 300s
  stackdriver:
    service: compute
    metrics:
      - aligner: ALIGN_MEAN
        metric_types:
          - "compute.googleapis.com/instance/cpu/usage_time"
          - "compute.googleapis.com/instance/cpu/utilization"
      - aligner: ALIGN_SUM
        metric_types:
          - "compute.googleapis.com/instance/uptime"

The output event looks like this:

{
  "_source": {
    "@timestamp": "2020-04-21T22:18:48.000Z",
    "googlecloud": {
      "stackdriver": {
        "instance": {
          "uptime": {
            "sum": 300
          },
          "cpu": {
            "usage_time": {
              "avg": 136.33796208361164
            },
            "utilization": {
              "avg": 0.5680748420150485
            }
          }
        }
      }
    }
  }
}

Use config below and you should see 1 metric every 1 minute:

- module: googlecloud
  metricsets:
    - stackdriver
  zone: "europe-west1-c"
  project_id: elastic-observability
  credentials_file_path: "/Users/kaiyansheng/Downloads/elastic-observability.json"
  exclude_labels: false
  period: 60s
  stackdriver:
    service: compute
    metrics:
      - metric_types:
          - "compute.googleapis.com/instance/cpu/usage_time"
          - "compute.googleapis.com/instance/cpu/utilization"
          - "compute.googleapis.com/instance/uptime"

The output event looks like this:

{
  "_source": {
    "@timestamp": "2020-04-21T22:31:00.000Z",
    "cloud.availability_zone": "europe-west1-c",
    "googlecloud": {
      "stackdriver": {
        "instance": {
          "uptime": {
            "raw": 60
          },
          "cpu": {
            "usage_time": {
              "raw": 148.0562956505455
            },
            "utilization": {
              "raw": 0.616901231877273
            }
          }
        }
      }
    }
  }
}

Related issues

TODOs

This PR is getting too big, so I will list things that need to be done in separate PRs:

- module: googlecloud
  metricsets:
    - stackdriver
  zone: "europe-west1-c"
  project_id: elastic-observability
  credentials_file_path: "/Users/kaiyansheng/Downloads/elastic-observability.json"
  exclude_labels: false
  period: 60s
  metrics:
    - service: compute
      metric_types:
        - "compute.googleapis.com/instance/cpu/usage_time"
        - "compute.googleapis.com/instance/cpu/utilization"
        - "compute.googleapis.com/instance/uptime"
  • Improve TestData to generate data.json files for different metric types inside each metricset.
  • Pagination for ListTimeSeries API results.

@kaiyan-sheng kaiyan-sheng self-assigned this Apr 15, 2020
@kaiyan-sheng kaiyan-sheng added Metricbeat Metricbeat needs_backport PR is waiting to be backported to other branches. Team:Platforms Label for the Integrations - Platforms team test-plan Add this PR to be manual test plan labels Apr 15, 2020
@elasticmachine
Collaborator

Pinging @elastic/integrations-platforms (Team:Platforms)

@kaiyan-sheng kaiyan-sheng added the in progress Pull request is currently in progress. label Apr 15, 2020
@exekias
Contributor

exekias commented Apr 16, 2020

I like the new approach! Is it working as expected? The PR description will need to be updated.

@kaiyan-sheng kaiyan-sheng changed the title [Metricbeat] Collect one metric per collection period for GCP [Metricbeat] Add aggregation aligner as a config param for stackdriver metricset in GCP Apr 21, 2020
@kaiyan-sheng kaiyan-sheng removed the in progress Pull request is currently in progress. label Apr 21, 2020

// stackDriverConfig holds a configuration specific for the stackdriver metricset.
type stackDriverConfig struct {
	MetricTypes []string `config:"metric_types" validate:"required"`
Contributor

these are not actually types but metric names right? how about using names here?

Contributor Author

Hmmmm, to be consistent with GCP, it is metric.type. metric.type is part of the metric selector, which includes its DNS name prefix. Here is an example:

metric.type = "compute.googleapis.com/instance/cpu/usage_time" AND
    metric.labels.instance_name = "my-instance-name"

@kaiyan-sheng
Contributor Author

CI failures do not seem to be related.

@kaiyan-sheng kaiyan-sheng merged commit 98f02e1 into elastic:master Apr 24, 2020
@kaiyan-sheng kaiyan-sheng deleted the gcp_collection_period branch April 24, 2020 14:11
@kaiyan-sheng kaiyan-sheng added v7.8.0 and removed needs_backport PR is waiting to be backported to other branches. labels Apr 24, 2020
kaiyan-sheng added a commit that referenced this pull request Apr 25, 2020
…r metricset in GCP (#17719) (#17979)

* Add metricDescriptor to get sample period and ingest delay time
* add aggregation for ListTimeSeriesRequest
* Add aligner into metric name suffix (eg: .avg, .sum)

(cherry picked from commit 98f02e1)
@andresrc andresrc added the test-plan-added This PR has been added to the test plan label May 3, 2020
Labels
Metricbeat Metricbeat review Team:Platforms Label for the Integrations - Platforms team test-plan Add this PR to be manual test plan test-plan-added This PR has been added to the test plan v7.8.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Metricbeat] Investigate collection period=1m for googlecloud module
4 participants