Performance issue of Cloudwatch output plugin #4317

Closed
david7482 opened this issue Jun 19, 2018 · 1 comment
Labels
area/aws (AWS plugins including cloudwatch, ecs, kinesis), performance (problems with decreased performance or enhancements that improve performance)

Comments

@david7482
Contributor

Relevant telegraf.conf:

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "30s"
  flush_jitter = "0s"
  precision = ""
  debug = true
  quiet = false
  logfile = "/tmp/telegraf.log"
  hostname = ""
  omit_hostname = true

[[outputs.cloudwatch]]
  region = "us-east-1"
  namespace = "influxdata/telegraf"

[[inputs.socket_listener]]
  service_address = "udp://:8094"
  data_format = "influx"

System info:

Telegraf v1.7.0 (git: release-1.7 f4d22dd)
Linux: Ubuntu 16.04
Kernel: 4.4.0-1022-aws

Steps to reproduce:

  1. Run Telegraf with the cloudwatch output plugin and a log file enabled
  2. Send a large number of single-field metrics to Telegraf via the socket listener
  3. Check the Telegraf log:
2018-06-19T03:35:00Z D! Attempting connection to output: cloudwatch
2018-06-19T03:35:01Z D! Successfully connected to output: cloudwatch
2018-06-19T03:35:01Z I! Starting Telegraf v1.7.0
2018-06-19T03:35:01Z I! Loaded inputs: inputs.socket_listener
2018-06-19T03:35:01Z I! Loaded aggregators: 
2018-06-19T03:35:01Z I! Loaded processors: 
2018-06-19T03:35:01Z I! Loaded outputs: cloudwatch
2018-06-19T03:35:01Z I! Tags enabled: 
2018-06-19T03:35:01Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"", Flush Interval:30s 
2018-06-19T03:39:54Z D! Output [cloudwatch] wrote batch of 1000 metrics in 4m34.193655997s
2018-06-19T03:39:54Z D! Output [cloudwatch] buffer fullness: 101 / 10000 metrics. 
2018-06-19T03:40:22Z D! Output [cloudwatch] wrote batch of 101 metrics in 28.176575449s

Expected behavior:

The cloudwatch output plugin should not take 4 minutes to send 1000 metrics.

Actual behavior:

The cloudwatch output plugin takes about 4 minutes to send 1000 metrics (4m34s for the first batch in the log above).

Additional info:

The CloudWatch API can accept metrics in batches, and the cloudwatch output plugin already uses this to send all fields of a single metric in one request. However, it still sends the metrics themselves one by one. With a batch of 1000 metrics that each have only one field, the plugin calls PutMetricData() 1000 times, which is slow and also incurs extra cost.
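A rough comparison of request counts for one flush, as a sketch. The helper `callsBatched` is hypothetical, and the limit of 20 datums per PutMetricData request is an assumption about the API at the time; check the current PutMetricData documentation before relying on it.

```go
package main

import "fmt"

// callsBatched (hypothetical helper) returns how many PutMetricData requests
// are needed to send `datums` items when each request carries at most
// maxPerCall of them.
func callsBatched(datums, maxPerCall int) int {
	if datums <= 0 {
		return 0
	}
	if maxPerCall <= 0 {
		maxPerCall = 1 // guard against division by zero
	}
	return (datums + maxPerCall - 1) / maxPerCall // ceiling division
}

func main() {
	const batch = 1000 // metric_batch_size from the telegraf.conf above
	// Assumption: at most 20 datums per request.
	const maxPerCall = 20

	// Current behaviour: one request per single-field metric.
	fmt.Println("one request per metric:", batch)
	// Proposed behaviour: all datums sent in chunks.
	fmt.Println("batched requests:      ", callsBatched(batch, maxPerCall))
}
```

Under that assumed limit, the same 1000 single-field metrics would need 50 requests instead of 1000.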

I think we should generate all cloudwatch.MetricDatum values for the batch up front, then partition them and send them to CloudWatch in batches. If this approach is acceptable, I can help with a PR.
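The proposed "build everything, then partition" step could look roughly like this sketch. It is not the code from the eventual pull request: `datum` is a hypothetical stand-in for `cloudwatch.MetricDatum` so the example runs without the AWS SDK, and the chunk size of 20 is an assumed per-request limit.

```go
package main

import "fmt"

// datum is a hypothetical stand-in for cloudwatch.MetricDatum; only the
// fields needed for this illustration are included.
type datum struct {
	Name  string
	Value float64
}

// partition splits datums into consecutive chunks of at most size elements,
// preserving order. Each chunk would become one PutMetricData request.
func partition(datums []datum, size int) [][]datum {
	if size <= 0 {
		size = 1
	}
	var chunks [][]datum
	for start := 0; start < len(datums); start += size {
		end := start + size
		if end > len(datums) {
			end = len(datums)
		}
		chunks = append(chunks, datums[start:end])
	}
	return chunks
}

func main() {
	// Build every datum for the flush first, as the issue proposes.
	all := make([]datum, 0, 45)
	for i := 0; i < 45; i++ {
		all = append(all, datum{Name: fmt.Sprintf("metric_%d", i), Value: float64(i)})
	}
	for i, c := range partition(all, 20) {
		// In the plugin, each chunk would be passed to a single PutMetricData call.
		fmt.Printf("request %d: %d datums\n", i+1, len(c))
	}
}
```

Slicing the original slice avoids copying, and keeping the chunk size as a parameter makes it easy to follow the API limit if it changes.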

@glinton added the performance label (problems with decreased performance or enhancements that improve performance) Jun 19, 2018
@danielnelson
Contributor

Yes, we would love a pull request for this.

@danielnelson added the area/aws label (AWS plugins including cloudwatch, ecs, kinesis) Jun 19, 2018
david7482 pushed a commit to david7482/telegraf that referenced this issue Jun 19, 2018
david7482 added a commit to david7482/telegraf that referenced this issue Jun 19, 2018