Performance issue of Cloudwatch output plugin #4317

david7482 · 2018-06-19T03:59:10Z

Relevant telegraf.conf:

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "30s"
  flush_jitter = "0s"
  precision = ""
  debug = true
  quiet = false
  logfile = "/tmp/telegraf.log"
  hostname = ""
  omit_hostname = true

[[outputs.cloudwatch]]
  region = "us-east-1"
  namespace = "influxdata/telegraf"

[[inputs.socket_listener]]
  service_address = "udp://:8094"
  data_format = "influx"

System info:

Telegraf v1.7.0 (git: release-1.7 f4d22dd)
Linux: Ubuntu 16.04
Kernel: 4.4.0-1022-aws

Steps to reproduce:

1. Run telegraf with cloudwatch output plugin and enable log file
2. Send lots of metrics (single field) to telegraf via socket listener
3. Check telegraf logs:

2018-06-19T03:35:00Z D! Attempting connection to output: cloudwatch
2018-06-19T03:35:01Z D! Successfully connected to output: cloudwatch
2018-06-19T03:35:01Z I! Starting Telegraf v1.7.0
2018-06-19T03:35:01Z I! Loaded inputs: inputs.socket_listener
2018-06-19T03:35:01Z I! Loaded aggregators: 
2018-06-19T03:35:01Z I! Loaded processors: 
2018-06-19T03:35:01Z I! Loaded outputs: cloudwatch
2018-06-19T03:35:01Z I! Tags enabled: 
2018-06-19T03:35:01Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"", Flush Interval:30s 
2018-06-19T03:39:54Z D! Output [cloudwatch] wrote batch of 1000 metrics in 4m34.193655997s
2018-06-19T03:39:54Z D! Output [cloudwatch] buffer fullness: 101 / 10000 metrics. 
2018-06-19T03:40:22Z D! Output [cloudwatch] wrote batch of 101 metrics in 28.176575449s
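
For anyone trying to reproduce this, the second reproduction step (sending many single-field metrics to the socket listener) might look roughly like the sketch below. It is not the sender used for the logs above: the measurement names, the field name `value`, the count of 1000, and the `127.0.0.1` host are all assumptions for illustration. Only the port matches `service_address` in the config.

```go
package main

import (
	"fmt"
	"log"
	"net"
)

// formatLine builds one influx line-protocol record with a single integer
// field. The field name "value" is an assumption for illustration.
func formatLine(measurement string, value int) string {
	return fmt.Sprintf("%s value=%di\n", measurement, value)
}

func main() {
	// Port 8094 matches service_address = "udp://:8094" in the config;
	// the local host address is assumed.
	conn, err := net.Dial("udp", "127.0.0.1:8094")
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	// Send 1000 distinct single-field metrics, one per datagram.
	for i := 0; i < 1000; i++ {
		line := formatLine(fmt.Sprintf("test_metric_%d", i), i)
		if _, err := conn.Write([]byte(line)); err != nil {
			log.Printf("write %d: %v", i, err)
		}
	}
}
```

Each datagram carries one metric with one field, so every metric reaches the output plugin as a separate single-field metric.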

Expected behavior:

The cloudwatch output plugin should not take 4 minutes to send 1000 metrics.

Actual behavior:

The cloudwatch output plugin takes about 4 minutes to send 1000 metrics (4m34s in the log above).

Additional info:

The CloudWatch API can accept metrics in batches, and the cloudwatch output plugin already uses this to send the multiple fields of one metric in a single call. However, it still sends metrics one at a time. For a batch of 1000 metrics with one field each, it invokes PutMetricData() 1000 times, which is inefficient and also incurs extra cost.

I think we should generate all cloudwatch.MetricDatum values up front, then partition them and send them to CloudWatch in batches. If this approach is acceptable, I can submit a PR.
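
A minimal sketch of the partitioning idea, not the actual plugin code: `datum` is a hypothetical stand-in for `cloudwatch.MetricDatum`, and `maxDatumsPerCall = 20` is an assumed per-request limit that should be checked against the current PutMetricData documentation. The real send call is omitted; each batch would go out in one PutMetricData() request.

```go
package main

import "fmt"

// datum is a hypothetical stand-in for cloudwatch.MetricDatum.
type datum struct {
	Name  string
	Value float64
}

// maxDatumsPerCall is an assumed per-request limit, not taken from the issue.
const maxDatumsPerCall = 20

// partition splits datums into consecutive chunks of at most size elements.
// The chunks share the backing array of the input slice.
func partition(datums []datum, size int) [][]datum {
	if size <= 0 {
		return nil
	}
	var batches [][]datum
	for start := 0; start < len(datums); start += size {
		end := start + size
		if end > len(datums) {
			end = len(datums)
		}
		batches = append(batches, datums[start:end])
	}
	return batches
}

func main() {
	// Build 1000 single-field datums, as in the scenario described above.
	datums := make([]datum, 0, 1000)
	for i := 0; i < 1000; i++ {
		datums = append(datums, datum{Name: fmt.Sprintf("metric_%d", i), Value: float64(i)})
	}

	batches := partition(datums, maxDatumsPerCall)
	// Each batch would be sent with a single PutMetricData call.
	fmt.Printf("%d datums -> %d requests\n", len(datums), len(batches))
}
```

Under the assumed limit of 20, the 1000 single-field metrics from the scenario above would need 50 requests instead of 1000.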


danielnelson · 2018-06-19T18:22:03Z

Yes, we would love a pull request for this.

glinton added the "performance" label (problems with decreased performance or enhancements that improve performance) Jun 19, 2018

danielnelson added the "area/aws" label (AWS plugins including cloudwatch, ecs, kinesis) Jun 19, 2018

david7482 pushed a commit to david7482/telegraf that referenced this issue Jun 19, 2018

Improve cloudwatch output performance (influxdata#4317)

be3e0e8

david7482 added a commit to david7482/telegraf that referenced this issue Jun 19, 2018

Improve cloudwatch output performance (influxdata#4317)

f4ab3f1

david7482 mentioned this issue Jun 19, 2018

Improve cloudwatch output performance (#4317) #4320

Merged

3 tasks

glinton closed this as completed in #4320 Jul 23, 2018
