adds an alternate windows performance counter input plugin #1629

dzrw · 2016-08-12T08:58:20Z

After getting Telegraf installed as a Windows Service earlier today, I noticed that the win_perf_counters plugin was generating a large number of difficult-to-query series. So, I built this alternate input plugin that works to minimize the number of series generated and simplify queries.

From the README.md,

wpc vs win_perf_counters

The win_perf_counters plugin generates tags and fields using native Windows names. This can make it difficult to compare common measurements across heterogeneous environments because Windows names tend towards complexity. For example, on Windows the performance counter "\Processor(*)\%% User Time" is equivalent to the Linux metric "cpu.usage_user" - good luck displaying both series on the same plot in Grafana.

Additionally, win_perf_counters can generate a large number of series in an InfluxDB database due to the inclusion of the Windows Performance Counter Object Name (e.g. Processor, Processor Information, Memory, etc.) in the tag list. According to the Hardware Sizing Guidelines, series cardinality strongly affects the amount of RAM required by the InfluxDB server. Therefore, there is a risk that heavily instrumented Windows machines can unduly impact the provisioning requirements of the InfluxDB server simply due to the use of win_perf_counters.

The wpc plugin mitigates these two potential issues by making the field names of Performance Counter queries explicit, and by transparently regrouping fully-qualified Performance Counter queries by instance to minimize the number of points generated.
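The regrouping idea can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the `Query` type and the `instanceOf` and `groupByInstance` helpers are hypothetical names, and the parsing assumes simple counter paths of the form `\Object(Instance)\Counter`.

```go
package main

import (
	"fmt"
	"strings"
)

// Query pairs an explicit field name with a fully-qualified counter path.
// (Hypothetical type, used only for this illustration.)
type Query struct {
	Field string
	Path  string
}

// instanceOf extracts the instance from a path such as
// `\Processor(_Total)\Interrupts/sec`. Paths whose object has no
// parenthesized instance (e.g. `\Memory\Available Bytes`) yield "".
func instanceOf(path string) string {
	object := strings.SplitN(strings.TrimPrefix(path, `\`), `\`, 2)[0]
	open := strings.Index(object, "(")
	if open < 0 || !strings.HasSuffix(object, ")") {
		return ""
	}
	return object[open+1 : len(object)-1]
}

// groupByInstance ignores the object name and collects, per instance,
// the field names that could share a single point.
func groupByInstance(queries []Query) map[string][]string {
	groups := make(map[string][]string)
	for _, q := range queries {
		inst := instanceOf(q.Path)
		groups[inst] = append(groups[inst], q.Field)
	}
	return groups
}

func main() {
	groups := groupByInstance([]Query{
		{"interrupts", `\Processor(_Total)\Interrupts/sec`},
		{"frequency", `\Processor Information(_Total)\Processor Frequency`},
		{"interrupts_cpu0", `\Processor(0)\Interrupts/sec`},
	})
	// Two different objects with the same instance land in one group.
	fmt.Println("_Total:", groups["_Total"])
	fmt.Println("0:", groups["0"])
}
```

In this sketch the Processor and Processor Information counters for `_Total` end up in one group, which is the effect described above: fewer points, each carrying more fields.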

I'm open to suggestions for improving it further.

Required for all PRs:

CHANGELOG.md updated
Sign CLA (if not already signed)
README.md updated (if adding a new plugin)

sparrc · 2016-08-12T13:24:41Z

@politician thank you for the contribution but I don't think I can merge this in its current state.

If you would like to change the windows perf counters, we will need to do a straight replacement. Windows support is still in an "experimental" state so currently it's OK to make breaking changes to the measurement schema.

FWIW, I completely support the idea behind this PR, but I was under the impression that windows users prefer the verbose and complicated names. This being said, we will need to open up a discussion and get input from other win_perf_counters users.

cc @TheFlyingCorpse @butitsnotme @ricardclau @steverweber @elvarb @cwegener @G-regL please let us know your thoughts on normalizing the win_perf_counters field and measurement names for simplicity of querying, and to be more similar to the schema of the linux plugins.

G-regL · 2016-08-12T16:10:50Z

> ...but I was under the impression that windows users prefer the verbose and complicated names.

I'm inclined to agree with you Cam. I'm a user of both platforms in my environment, and I'm happy to use the names provided by each.

I can't speak to the use of Influx as a TSDB, but with Graphite, I use a relay to rewrite metrics names I don't like into ones I do. I also use tagexclude on the agent end to drop some of the more redundant or useless tags.

I actually rather like the current plugin.

That said, if I had to make a change, it would be geared towards the source of the metrics.
Instead of the current library, which uses the Performance Data Helper, I'd move to the StackExchange WMI library so that you can collect the output of raw WMI calls to any class. Sadly I don't have sufficient Go-fu to change the current code-base to use that. Besides, it would be a huge breaking change and I'm not sure the value of such a large change is really there.

sparrc · 2016-08-12T16:18:30Z

@G-regL if you use the regular plugins (inputs.cpu, inputs.mem, etc.), those should use WMI.

The reason I decided to default to win_perf_counters on windows is that many other windows users told me (and there is plenty to read on the internet about this) that WMI is notorious for using large amounts of system resources itself.

G-regL · 2016-08-12T16:52:07Z

@sparrc
I'll admit that I haven't even tried the regular plugins on Windows, but it still doesn't offer the level of control over which metrics, from which classes are being pulled.

win_perf_counters does that better, but being able to build your own WQL queries to be run against the system would be the ultimate.

dzrw · 2016-08-12T16:58:49Z

@G-regL have you tried using a PowerShell script from the exec plugin?

G-regL · 2016-08-12T17:01:22Z

@politician, no. I hadn't thought of it, and now that I have, I think it would be slower than having something built-in. Something to test though I suppose.

steverweber · 2016-08-12T20:33:52Z

We run a mixed environment of Mac, Windows, and Linux systems... it would be best if the metric names were uniform across all OS types. This will simplify queries to display the data... +1

butitsnotme · 2016-08-12T20:36:39Z

I am inclined to believe that telegraf should produce as close to the same set of metrics across all platforms as possible, including using the same names. This means less re-work of the data to be able to compare across platforms.

@politician I have a pull request in progress which will remove carriage returns allowing the data from a powershell script (or other program) to be processed on Windows. See pull request #1606.

dzrw · 2016-08-12T20:52:03Z

Thanks for the quick replies - it seems like there are two camps forming: folks who like origin names or can post-process metrics, and folks that prefer uniform names or don't want to post-process metrics.

This discussion might be raising the need for general transform plugins that can alter/regroup metrics before they're emitted to an output plugin. But short of that sort of large change, here are a couple of other ideas that I was playing around with before settling on the wpc approach:

a dedicated iis plugin with support for a PreserveWindowsNames boolean (default: false)
a win_lua plugin that exposes PDH to Lua scripts. I tend to agree with @G-regL's observation that invoking PowerShell or WSH every 10s is probably really slow; however, Lua contexts can be cached.
writing a server that gathers and transforms PDH counters and makes them available to a telegraf TCP input (which kind of usurps the role of telegraf, but provides maximum flexibility to support my needs).

That said, I suppose I could make the field rewriting aspect optional. It doesn't sound like the series minimization code is contentious.

steverweber · 2016-08-12T21:09:00Z

https://github.com/mozilla-services/heka
seems to have most of that already... perhaps we should re-evaluate who is reinventing the wheel.

dzrw · 2016-08-12T21:10:46Z

@steverweber I'll admit to never getting heka to actually work. On Windows or Linux, even with the default "Hello, World" example. On the other hand, telegraf worked right out of the box.

steverweber · 2016-08-12T21:14:25Z

I also found heka kinda frustrating to get working... that's why I'm here :) Telegraf seemed simpler.
However, it does seem telegraf is starting to be reworked to support some of the more advanced things that heka has.

elvarb · 2016-08-12T21:40:10Z

Graphite powershell https://github.com/MattHodge/Graphite-PowerShell-Functions has this feature of renaming metrics, it's just in the powershell code but very easy to modify there.

It is one way of doing this, have telegraf rename metrics before they are sent.

Regarding unified naming conventions between platforms, I would be extremely cautious. Not all platforms report basic metrics in the same format, CPU load for example.

sparrc · 2016-08-15T14:39:24Z

@steverweber please keep on-topic, your opinion about merging telegraf & heka has been heard many times by the telegraf committee (of one).....I think you can guess by now that it's not going to happen.

sparrc · 2016-08-15T14:50:06Z

@politician what is an iis plugin?

I would support having the ability to specify arbitrary WQL statements

and lastly, remember that most of the regular system plugins work and produce the same names as the linux plugins (inputs.cpu, inputs.mem, etc). These were not made the default because WMI is resource-intensive.

if anyone has time & expertise to rewrite the code behind these to use windows perf counters instead of WMI, I'm sure that @shirou would appreciate it a lot: https://github.com/shirou/gopsutil

dzrw · 2016-08-15T14:54:11Z

> @politician what is an iis plugin?

@sparrc A primary use case is monitoring the standard Windows HTTP server, IIS. So, I briefly considered building a dedicated plugin for it (cf. nginx, etc).

sparrc · 2016-08-15T17:19:30Z

My preference is for the format suggested here, but we would need to replace win_perf_counters rather than maintaining two plugins doing essentially the same thing.

Having a plugin-like interface for modifying metrics as they pass through the system is in the pipeline, and a high priority.
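For readers wondering what such an interface might look like, here is a rough sketch. Nothing like this existed in Telegraf at the time of this thread, so every name below (`Metric`, `Transform`, `RenameFields`) is hypothetical, and the native field name `Percent_User_Time` is assumed only for illustration.

```go
package main

import "fmt"

// Metric is a simplified stand-in for a metric flowing through the agent.
// (Hypothetical type; not Telegraf's own.)
type Metric struct {
	Name   string
	Tags   map[string]string
	Fields map[string]float64
}

// Transform is a hypothetical hook applied to each metric after an
// input produces it and before any output sees it.
type Transform interface {
	Apply(m Metric) Metric
}

// RenameFields maps native field names to uniform ones.
type RenameFields map[string]string

// Apply returns a copy of m with matching fields renamed.
// Unmatched fields pass through unchanged; tags are shared, not copied.
func (r RenameFields) Apply(m Metric) Metric {
	out := Metric{
		Name:   m.Name,
		Tags:   m.Tags,
		Fields: make(map[string]float64, len(m.Fields)),
	}
	for name, value := range m.Fields {
		if renamed, ok := r[name]; ok {
			name = renamed
		}
		out.Fields[name] = value
	}
	return out
}

func main() {
	var t Transform = RenameFields{"Percent_User_Time": "usage_user"}
	in := Metric{
		Name:   "win_cpu",
		Tags:   map[string]string{"instance": "_Total"},
		Fields: map[string]float64{"Percent_User_Time": 12.5},
	}
	fmt.Println(t.Apply(in).Fields)
}
```

A hook of this shape would let users who prefer native Windows names leave it unconfigured, while users who want uniform names apply a rename table independently of the output plugin.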

ricardclau · 2016-08-15T18:45:53Z

Sorry about the delay answering here

In our case, we never compare Linux and Windows metrics as they do completely different things in our setup so win_perf_counters is totally fine for us. I agree the names are a bit cumbersome but this is just how Windows stores them.

On the other hand, I agree, it is very difficult to show the same metric (even something as simple as Free Memory) for both Win and Linux hosts in the same Grafana dashboard.

If you ask me, I am happy with the way win_perf_counters plugin works but if you go ahead with this new plugin (which makes total sense, as Windows support is experimental) I would appreciate some comments with an easy migration guide for the telegraf.conf files.

We have hundreds of servers reporting metrics to our Grafana / InfluxDB setups and CI/CD pipelines to generate and install dashboards and this change can be a bit tricky for us :)

toni-moreno · 2016-08-16T08:54:10Z

Hi to everybody.

I would like to contribute in this discussion.

We are currently working with Graphite Powershell https://github.com/MattHodge/Graphite-PowerShell-Functions . We are now renaming metric names to something more user friendly.

We would very much like to have this capability in telegraf as well. (I think it is really important to users like us who will need to migrate from Graphite Powershell to telegraf in the future.)

We also need a way to get data from other Windows sources, like WMI. For example, we need to get the total physical memory in the system (not available in the native performance counters on Windows 2008/2012 servers).

Thank you very much.

shirou · 2016-08-16T09:39:39Z

Hi all, gopsutil author here.

I noticed that lxn/win no longer uses cgo. Since gopsutil has a "pure golang" policy, I could not use lxn/win before, but that now appears to have changed.

I am thinking about changing gopsutil to use lxn/win. But if someone makes a PR, I would really appreciate it.

(Sorry not directly related to telegraf itself)

dzrw · 2016-08-16T19:21:03Z

@sparrc It sounds like there is a general consensus in favor of the following:

We want an optional mechanism for rewriting metrics in an output plugin independent way.
We want win_perf_counters to remain as a general tool for querying Windows Performance Counters.
We want a different windows plugin that supports querying WMI.

There hasn't been enough discussion to develop a consensus around the following questions:

Should we add support to coalesce related points?
Should we change the configuration of win_perf_counters to use fully-qualified counter queries or leave it as is?

> Having a plugin-like interface for modifying metrics as they pass through the system is in the pipeline, and a high priority.

I'd love to take a look at the progress on this - can you point me to any commits?

sparrc · 2016-08-17T11:27:01Z

> Should we add support to coalesce related points?

Can't this be done already via configuration?

> Should we change the configuration of win_perf_counters to use fully-qualified counter queries or leave it as is?

I'm not sure....what would be the benefit? can you provide an example of how that would look vs. the current plugin?

> I'd love to take a look at the progress on this - can you point me to any commits?

there is none so far

dzrw · 2016-08-19T01:56:29Z

> coalesce related points

I should have been more specific. The current plugin will coalesce points by objectname via configuration, but wpc goes further by discarding objectname and compressing queries based on instance alone. In this way, I can jam more data into each metric yet reduce series cardinality by, in some cases, 20%. That's important for me because I'm using InfluxDB.
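To make the cardinality argument concrete, here is a small sketch that counts distinct series with and without the objectname tag. It assumes a series is identified by measurement name plus tag set; the `Point` type and the `seriesKey` and `countSeries` helpers are hypothetical, and the sample points are made up rather than taken from a real host.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Point carries only what matters for series identity in this sketch.
type Point struct {
	Measurement string
	Tags        map[string]string
}

// seriesKey builds a canonical "measurement,tag=value,..." key,
// skipping any tag listed in dropTags.
func seriesKey(p Point, dropTags ...string) string {
	drop := make(map[string]bool, len(dropTags))
	for _, t := range dropTags {
		drop[t] = true
	}
	keys := make([]string, 0, len(p.Tags))
	for k := range p.Tags {
		if !drop[k] {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // tag order must not affect identity
	parts := []string{p.Measurement}
	for _, k := range keys {
		parts = append(parts, k+"="+p.Tags[k])
	}
	return strings.Join(parts, ",")
}

// countSeries returns the number of distinct series keys among points.
func countSeries(points []Point, dropTags ...string) int {
	seen := make(map[string]struct{})
	for _, p := range points {
		seen[seriesKey(p, dropTags...)] = struct{}{}
	}
	return len(seen)
}

func main() {
	points := []Point{
		{"win_perf_counters", map[string]string{"objectname": "Processor", "instance": "_Total"}},
		{"win_perf_counters", map[string]string{"objectname": "Processor Information", "instance": "_Total"}},
		{"win_perf_counters", map[string]string{"objectname": "Processor", "instance": "0"}},
	}
	fmt.Println("with objectname:   ", countSeries(points))
	fmt.Println("without objectname:", countSeries(points, "objectname"))
}
```

The actual saving depends on how many objects share instance names on a given machine, so the 20% figure above should be read as one user's observation rather than something this sketch reproduces.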

> Should we change the configuration of win_perf_counters to use fully-qualified counter queries or leave it as is?
> I'm not sure....what would be the benefit? can you provide an example of how that would look vs. the current plugin?

The proposed wpc plugin uses fully-qualified queries as a means to jam more metrics into fewer series. The current win_perf_counters plugin issues queries by Object Name which means that the same metrics are spread out over a larger number of series. This potentially complicates comparisons or requires post-processing to ameliorate. Fixing the issue at the source seemed like a cheap win.

The sample below is slightly modified from the README.md.

 # A plugin to collect stats from Windows Performance Counters
 [[inputs.wpc]]
  ## If the system being polled for data does not have a particular Counter at startup 
  ## of the Telegraf agent, it will not be gathered.
  # Prints all matching performance counters (useful for debugging)
  # PrintValid = false

  [[inputs.wpc.template]]
    # Processor usage, alternative to native.
    Counters = [
      # Use double-backslashes to work around a TOML parsing issue.
      [ "usage_idle", "\\Processor(_Total)\\%% Idle Time" ],
      [ "usage_user", "\\Processor(_Total)\\%% User Time" ],
      [ "usage_system", "\\Processor(_Total)\\%% Processor Time" ],
      [ "available_bytes", "\\Memory\\Available Bytes" ]
    ]
    Measurement = "win_system"
    # Print out when the performance counter is missing from object, counter or instance.
    # WarnOnMissing = false

The current win_perf_counters plugin cannot mix Memory and Processor objects into the same metric (without post-processing). There are more substantial gains to be had when querying the IIS & .NET performance counters. The number of series is negatively correlated with InfluxDB performance (in fact, the documentation says that it's exponential), so that seems like something to avoid.

sparrc · 2016-08-31T10:22:52Z

> The current win_perf_counters plugin cannot mix Memory and Processor objects into the same metric (without post-processing). There are more substantial gains to be had when querying the IIS & .NET performance counters. The number of series is negatively correlated with InfluxDB performance (in fact, the documentation says that it's exponential), so that seems like something to avoid.

@politician putting all of your fields into a single measurement is not an encouraged way to set up your influxdb schema, and in fact fields do contribute to cardinality in InfluxDB. I believe the documentation might be a bit inaccurate because a "measurement" is considered the combination of the "measurement name" + "field name"

It's important to note that it's exponential, but with an exponent between one and two.

So adding just a few more series to separate CPU and memory usage shouldn't have any significant impact.
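As a rough worked example, assume resource cost grows as N^k for N series with k somewhere between one and two. The constant factor and the exact exponent are not given in this thread, so the `relativeCost` helper below is only an illustration of the bounds.

```go
package main

import (
	"fmt"
	"math"
)

// relativeCost returns how much the cost changes when the series count
// is multiplied by seriesRatio, assuming cost is proportional to N^k.
func relativeCost(seriesRatio, k float64) float64 {
	return math.Pow(seriesRatio, k)
}

func main() {
	// 25% more series: lower bound (k=1) and upper bound (k=2).
	fmt.Printf("k=1: x%.4f\n", relativeCost(1.25, 1))
	fmt.Printf("k=2: x%.4f\n", relativeCost(1.25, 2))
}
```

Under this assumption, 25% more series costs between 1.25x and about 1.56x as much, which is why a handful of extra series for separate CPU and memory measurements is a small change in absolute terms.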

The other consideration is that if the fields are part of the same series, then they can never be differentiated from each other, meaning that you can't separate out CPU usage of the various CPUs. It works if you are only differentiating based on the hostname, but it falls apart if you want any more granularity beyond that.

sparrc · 2016-09-05T13:27:29Z

I'm closing this for now as I don't want to merge duplicate plugins.

If there is something lacking in the current win_perf_counters plugin, the proper way to go about requesting/discussing changes would be to open an issue. Then we can come to a consensus over whether we can introduce breaking changes if they would be of use to the community.

dzrw · 2016-09-05T19:56:41Z

@sparrc Is there a contrib repository for plugins like this?

sparrc · 2016-09-05T20:08:31Z

not at the moment, no, Go doesn't have a very good facility for doing this unfortunately.

adds an alternate windows performance counter input plugin

2934f86

sparrc closed this Sep 5, 2016
