Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use perflib for cpu collector #335

Merged
merged 12 commits into from
Aug 4, 2019
Merged

Use perflib for cpu collector #335

merged 12 commits into from
Aug 4, 2019

Conversation

carlpett
Copy link
Collaborator

@carlpett carlpett commented May 6, 2019

This is a draft implementation of how we could start switching collectors over to perflib. Main structural change is introducing a "scrape context" which for now just carries the perflib data, but could later be used for cancellations etc.
This scrape context is populated with a "Global" perflib reading at the start of the scrape, and collectors can then pull from this reading.
I have implemented a simple tag-based unmarshaller of perflib data, so the collector merely builds a struct with tags matching the counters, calls unmarshal and can then get down to pushing metrics back the channel.

@carlpett carlpett requested a review from martinlindhe May 6, 2019 18:32
}

const (
perfCounterRawcountHex uint32 = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would write these numbers in hex as there's less symmetry in their decimal representation

@martinlindhe
Copy link
Collaborator

Looks good, anything left to work out for the cpu collection over perflib?

Also looks like CI chokes on perflib not vendored

@carlpett
Copy link
Collaborator Author

carlpett commented May 7, 2019

Yeah, haven't vendored yet because I'm waiting for a PR to be merged, figured I could get some feedback first :)
If you agree about the design I don't think there is much more work needed.

@carlpett
Copy link
Collaborator Author

... Except docs, I just realized as I was about to merge. This adds a bunch new metrics.

return indexed, nil
}

const (
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I was only looking at the perflib package.

@theimdal
Copy link

Hi @carlpett.

Will these perflib metrics be completely new or will they replace the old WMI metrics? I ask this question because I think there will be some work to transition from the old wmi metrics to these new ones, dashboards, alerts and so on. This is a transition we must do, because of loosing a lot of metrics during peek loads.

@carlpett
Copy link
Collaborator Author

@theimdal They will be identical to the current metrics, so you should not need to make any changes at all!

@poblahblahblah
Copy link
Contributor

This looks great. We've got a fairly large windows install base and have been hitting a lot of WMI issues lately, so for us this is definitely a good direction to take this project.

@nbellowe
Copy link
Contributor

I'm excited about this concept!
I've seen other collectors besides the cpu collector experiencing wmi timeout issues when the system is loaded. Are there plans/desire to read raw perfcounters in other collectors as well?

@carlpett
Copy link
Collaborator Author

@nbellowe Yes, I would expect the majority of collectors to be converted once we feel confident in the results. There are a few collectors which don't have any direct raw counter, eg cs, so we'll see how we deal with those.

For now, we'd really appreciate it if you have time to test out the prerelease (here) on a few systems and see if it makes the situation better for you! Especially the new --scrape.max-duration flag to allow the exporter to return data even when WMI has stalled seems relevant to your problems.

@petedishman
Copy link

@carlpett
Hi, I've just tried out this build but am actually seeing a big performance drop compared with 0.6.0?
Requests with 0.6.0 are taking around 500ms, whereas 0.7.999-preview2 is taking around 1750-2000ms

I'm running both instances with these arguments:

 --collector.textfile.directory="C:\Application\Monitoring\WmiExporter" --telemetry.addr="127.0.0.1:9182" --telemetry.path="/wmi/metrics" --collectors.enabled="cpu,iis,logical_disk,cs,memory,net,os,process,system" --collector.process.processes-where="Name LIKE 'php-cgi%' OR Name LIKE 'w3wp%' OR Name Like 'mysqld'"

I wondered if the process filtering could be an issue, but didn't really see any difference after removing that (and the process collector)

Is there any extra information I can provide?

@carlpett
Copy link
Collaborator Author

@petedishman Thanks for trying it out! I've opened a new issue (#345) where we can investigate further.

@poblahblahblah
Copy link
Contributor

How is this looking overall? Is this a path wmi_exporter will want to continue down?

@carlpett
Copy link
Collaborator Author

@poblahblahblah Yes, I believe so. It fixes a lot of issues we've been seeing over the years, so I definitely want to move ahead with it. I've been somewhat swamped the last few weeks/months though, and haven't had time to put the finishing touches in on this. Hope to have a free weekend soon.

@carlpett carlpett changed the title WIP: Use perflib for cpu collector Use perflib for cpu collector Aug 3, 2019
@carlpett
Copy link
Collaborator Author

carlpett commented Aug 3, 2019

Sigh, thought I was ready to merge, but managed to find a data race.

@carlpett
Copy link
Collaborator Author

carlpett commented Aug 3, 2019

This now passes quite a long period of the Go race detector at least, and is probably easier to reason about than the previous nest of goroutines and channels.

@carlpett
Copy link
Collaborator Author

carlpett commented Aug 4, 2019

@martinlindhe Please have a look when you have time

Copy link
Collaborator

@martinlindhe martinlindhe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, looking forward to see how this can improve the exporter!

@leoluk
Copy link

leoluk commented Aug 4, 2019

🎉 Glad to see perflib being used!

@carlpett carlpett merged commit e880889 into master Aug 4, 2019
@carlpett
Copy link
Collaborator Author

carlpett commented Aug 4, 2019

🎉

@carlpett carlpett deleted the perflib-cpu branch August 4, 2019 12:30
@carlpett carlpett mentioned this pull request Aug 8, 2019
anubhavg-icpl pushed a commit to anubhavg-icpl/windows_exporter that referenced this pull request Sep 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants