in_statsd: new input plugin #1546

Closed
talawahtech opened this issue Sep 1, 2019 · 10 comments

Comments

@talawahtech

talawahtech commented Sep 1, 2019

Is your feature request related to a problem? Please describe.
I am currently interested in collecting and storing performance logs from cadvisor using its ability to export data to statsd. This would also be a general solution for storing performance logs from other systems that support statsd as an output format.

Describe the solution you'd like
A new input plugin that accepts data in the statsd format. Similar in concept to in_collectd.

Additional context
There are cases where it is preferable to store performance related data in a "logging" backend instead of a traditional metrics backend. Oftentimes this allows for better correlation against related data and more sophisticated querying.

@edsiper edsiper added this to the Fluent Bit v1.4 milestone Oct 1, 2019
@edsiper edsiper changed the title New input plugin: in_statsd in_statsd: new input plugin Oct 21, 2019
@fujimotos
Member

I started working on this issue today, and will post status updates
as it progresses.

@fujimotos
Member

@eduardo I have reached the point where I can receive and parse incoming statsd
packets. For instance, I can parse an input metric foo:99|c into:

{"bucket":"foo","value":99,"type":"c"}
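A minimal sketch of this parsing step, assuming the usual statsd line format `bucket:value|type`. It is written in Python for illustration only (the plugin itself is not shown in this thread), and the function name `parse_statsd_line` is hypothetical. Any trailing fields such as a sampling rate are ignored here.

```python
def parse_statsd_line(line):
    """Parse one statsd line of the form bucket:value|type into a dict.

    Illustrative only: this is not the in_statsd implementation.
    """
    bucket, sep, rest = line.strip().partition(":")
    if not sep or not bucket:
        raise ValueError(f"missing bucket or ':' in {line!r}")

    fields = rest.split("|")
    if len(fields) < 2 or not fields[1]:
        raise ValueError(f"missing '|type' in {line!r}")
    raw_value, metric_type = fields[0], fields[1]

    if metric_type == "s":
        # Set members are arbitrary strings, so keep them unparsed.
        value = raw_value
    else:
        number = float(raw_value)
        # Report whole numbers as ints so that 99 stays 99, not 99.0.
        value = int(number) if number.is_integer() else number

    return {"bucket": bucket, "value": value, "type": metric_type}


print(parse_statsd_line("foo:99|c"))
# {'bucket': 'foo', 'value': 99, 'type': 'c'}
```

Note that this sketch drops the leading sign of a relative gauge update such as `+1`; a real parser would need to record it separately to avoid losing information.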

Now I see two possible ways to go. One way is to emit each received
metric as a separate record. In this case, we'll get records like the
one below:

{"bucket":"foo","value":99,"type":"c"}

Another way is to aggregate records in in_statsd and emit the resulting
statistics at a certain interval (just as a normal statsd server would do).
In this case, we'll get records like the following:

{ counters:
   { 'statsd.bad_lines_seen': 0,
     'statsd.packets_received': 1,
     'statsd.metrics_received': 1,
     'foo': 99},
  timers: {},
  gauges: { 'statsd.timestamp_lag': 0 },
  timer_data: {},
  counter_rates:
   { 'statsd.bad_lines_seen': 0,
     'statsd.packets_received': 0.1,
     'statsd.metrics_received': 0.1,
     'foo': 9.9 },
  sets: {},
  pctThreshold: [ 90 ] }

Which way do you think is appropriate? The easier route is the former
(= emitting incoming records as is), but I think it depends on how the new
plugin is supposed to behave.
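To make the second option concrete, here is a rough Python sketch of interval-based counter aggregation. The class name `CounterAggregator` and the 10-second flush interval are assumptions; the interval is inferred from the sample output above, where a count of 99 yields a rate of 9.9. The sketch covers counters only and clears them on flush, which is a simplification of what a full statsd server does.

```python
from collections import defaultdict


class CounterAggregator:
    """Accumulate counter metrics and report totals and rates on flush."""

    def __init__(self, flush_interval=10.0):
        # Seconds between flushes (assumed value, see the note above).
        self.flush_interval = flush_interval
        self.counters = defaultdict(float)

    def add(self, bucket, value):
        # Counters simply sum up between two flushes.
        self.counters[bucket] += value

    def flush(self):
        counters = dict(self.counters)
        # Rate = total count divided by the length of the interval.
        rates = {b: v / self.flush_interval for b, v in counters.items()}
        self.counters.clear()
        return {"counters": counters, "counter_rates": rates}


agg = CounterAggregator()
agg.add("foo", 99)
print(agg.flush())
```

With the first option, no such state is needed in the plugin: each metric is forwarded as its own record and the summing is left to the backend.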

@fujimotos
Member

After some thought, I've come to the conclusion that we should process
incoming metrics in in_statsd, since many messages in statsd do not
make much sense without some aggregation.

For instance, according to the spec, foo:+99|g means "add 99 to the
latest gauge value of foo". So naturally it is expected of in_statsd to
keep the state of gauges and produce the consolidated results.

Indeed, this is what many statsd-compatible servers do. For example,
if a client sends the following datagrams:

baa:99|g
baa:+1|g
baa:-10|g

a statsd-compatible server is expected to produce the following output:

stats.gauges.baa 90
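The gauge handling described above can be sketched in a few lines of Python. This is only an illustration of the semantics, not the plugin code; the helper name `apply_gauge` is hypothetical, and the sketch assumes that a relative update to an unseen gauge starts from 0.

```python
def apply_gauge(gauges, bucket, raw_value):
    """Apply one gauge update to the in-memory state and return the new value.

    A value with a leading '+' or '-' is relative to the current gauge;
    any other value replaces it.
    """
    if raw_value.startswith(("+", "-")):
        gauges[bucket] = gauges.get(bucket, 0.0) + float(raw_value)
    else:
        gauges[bucket] = float(raw_value)
    return gauges[bucket]


gauges = {}
for datagram in ["baa:99|g", "baa:+1|g", "baa:-10|g"]:
    bucket, rest = datagram.split(":", 1)
    raw_value = rest.split("|", 1)[0]
    apply_gauge(gauges, bucket, raw_value)

print(gauges)  # {'baa': 90.0}
```

One consequence of these semantics is that a gauge cannot be set directly to a negative number, since `-10` is always read as a decrement.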

I have spent much of today implementing in_statsd to emit events as
described above (and am almost done). I think I can post a PR shortly...

@edsiper
Member

edsiper commented Nov 14, 2019

From a consumer's perspective, which is easier: processing the metrics in the plugin, or aggregating them later in a database? What do you see in real use cases?

@fujimotos
Member

@edsiper I did a bit of research on this matter and posted a PR at #1741.
I think this is the solution most compatible with statsd, and it should put
Fluent Bit on par with other products in this area.

I'd greatly appreciate it if I could get some feedback from you on that patch.

@fujimotos
Member

@talawahtech If possible, can you check #1741 and give us some feedback?
I'd like to hear your opinion on how the plugin is expected to work.

@fujimotos
Member

From a consumer's perspective, which is easier: processing the metrics in the plugin, or aggregating them later in a database? What do you see in real use cases?

@edsiper After re-reading your comment, I posted a simpler patch at #1756,
which focuses on handling statsd protocol messages.

It can emit incoming metrics as msgpack records without loss of information,
and should meet the following requirement from @talawahtech well:

There are cases where it is preferable to store performance related data in a "logging" backend instead of a traditional metrics backend. Oftentimes this allows for better correlation against related data and more sophisticated querying.

I'm looking forward to your review and feedback!

@edsiper
Member

edsiper commented Nov 26, 2019

FYI: PR #1756 reviewed and commented

@talawahtech
Author

@fujimotos yes, I think the approach in #1756 is more in line with how I plan to use it. I haven't gotten a chance to test it out yet, but I will try and provide feedback when I do. Thanks for working on this!

@edsiper
Member

edsiper commented Jan 23, 2020

Implemented by @fujimotos on #1756


3 participants