wmi_exporter footprint #375

Closed
szymon3 opened this issue Aug 8, 2019 · 6 comments


szymon3 commented Aug 8, 2019

Hi,

Recently I've started using wmi_exporter on Windows Server 2016, with the following specification:
[screenshot: server specification]

After checking the performance impact of monitoring, I realized that it's using quite a lot of CPU; please see the chart below (10 minutes, samples every 2 seconds; wmi_exporter.exe running with only one collector enabled, textfile):
[screenshot: CPU usage chart for the wmi_exporter.exe process]

In my opinion, its footprint is not as low as it should be. It doesn't matter whether it runs as an .exe or as a service, and I also tested with a higher number of collectors enabled; CPU usage was similar.
Do you have any ideas on how I can lower the footprint? Is it possible to lower the metrics scrape interval to, e.g., 15 seconds?

Here's a dump of metrics collected:

Metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0.0011394
go_gc_duration_seconds_sum 0.0053104
go_gc_duration_seconds_count 138
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 13
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.156316e+07
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 6.56148712e+08
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.515449e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 576160
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 1.311744e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 1.156316e+07
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 1.8243584e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.4983168e+07
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 165553
# HELP go_memstats_heap_released_bytes_total Total number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes_total counter
go_memstats_heap_released_bytes_total 1.4336e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 3.3226752e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.5652489989488754e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 741713
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 13632
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 174528
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 180224
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 1.8465584e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.333567e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 327680
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 327680
# HELP go_memstats_sys_bytes Number of bytes obtained by system. Sum of all system allocations.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 3.79118e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds counter
process_start_time_seconds 1.565248979e+09
# HELP wmi_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which wmi_exporter was built.
# TYPE wmi_exporter_build_info gauge
wmi_exporter_build_info{branch="master",goversion="go1.12.3",revision="d01c66986cec25928693b123d0b2155220fdd540",version="0.8.0"} 1
# HELP wmi_exporter_collector_duration_seconds wmi_exporter: Duration of a collection.
# TYPE wmi_exporter_collector_duration_seconds gauge
wmi_exporter_collector_duration_seconds{collector="textfile"} 0
# HELP wmi_exporter_collector_success wmi_exporter: Whether the collector was successful.
# TYPE wmi_exporter_collector_success gauge
wmi_exporter_collector_success{collector="textfile"} 1
# HELP wmi_exporter_collector_timeout wmi_exporter: Whether the collector timed out.
# TYPE wmi_exporter_collector_timeout gauge
wmi_exporter_collector_timeout{collector="textfile"} 0
# HELP wmi_exporter_perflib_snapshot_duration_seconds Duration of perflib snapshot capture
# TYPE wmi_exporter_perflib_snapshot_duration_seconds gauge
wmi_exporter_perflib_snapshot_duration_seconds 0.320124
# HELP wmi_object_size Size of specified object in bytes.
# TYPE wmi_object_size gauge
wmi_object_size{object="xxxx1"} 1.03079215104e+11
wmi_object_size{object="xxxx2"} 2.966421504e+09
wmi_object_size{object="xxxx3"} 0
wmi_object_size{object="xxxx4"} 3.028287488e+09
# HELP wmi_textfile_mtime_seconds Unixtime mtime of textfiles successfully read.
# TYPE wmi_textfile_mtime_seconds gauge
wmi_textfile_mtime_seconds{file="wmi_object_size.prom"} 1.565232472e+09
wmi_textfile_mtime_seconds{file="wmi_top_cpu_processes.prom"} 1.565188008e+09
wmi_textfile_mtime_seconds{file="wmi_top_memory_processes.prom"} 1.565188019e+09
# HELP wmi_textfile_scrape_error 1 if there was an error opening or reading a file, 0 otherwise
# TYPE wmi_textfile_scrape_error gauge
wmi_textfile_scrape_error 0
# HELP wmi_top_cpu_processes Name of process and CPU usage.
# TYPE wmi_top_cpu_processes gauge
wmi_top_cpu_processes{ProcessName="dwm"} 1.56206590188587
wmi_top_cpu_processes{ProcessName="mmc"} 15.6206590188587
wmi_top_cpu_processes{ProcessName="rdpclip"} 3.12413180377173
wmi_top_cpu_processes{ProcessName="taskmgr"} 6.24826360754346
# HELP wmi_top_memory_processes Name of process, PID and Memory usage.
# TYPE wmi_top_memory_processes gauge
wmi_top_memory_processes{PID="127044",ProcessName="msmdsrv"} 4.9071099904e+10
wmi_top_memory_processes{PID="140640",ProcessName="Ssms"} 2.4948736e+08
wmi_top_memory_processes{PID="161916",ProcessName="sqlservr"} 5.3298229248e+10
wmi_top_memory_processes{PID="210592",ProcessName="wmi_exporter"} 3.76348672e+08
wmi_top_memory_processes{PID="256192",ProcessName="ReportingServicesService"} 1.245364224e+09
wmi_top_memory_processes{PID="263548",ProcessName="RSPortal"} 5.99683072e+08
wmi_top_memory_processes{PID="2872",ProcessName="MsMpEng"} 2.41373184e+08
wmi_top_memory_processes{PID="287648",ProcessName="ReportingServicesService"} 9.19687168e+08
wmi_top_memory_processes{PID="3788",ProcessName="RSHostingService"} 1.174986752e+09
wmi_top_memory_processes{PID="90780",ProcessName="Ssms"} 3.33795328e+08

I appreciate your help!

carlpett (Collaborator) commented Aug 8, 2019

Hi @szymon3!
A scrape interval of two seconds is very aggressive. The default Prometheus setting is 1 minute. Even 15 seconds would be much less intensive.

Still, we might need to make some improvements. As of version 0.8, on each scrape, the exporter will fetch a snapshot of all performance counters in the system, which does have a slight overhead. We could try to make this more targeted.

So for your specific situation right now, I'd increase the interval (quite a bit).
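
For reference, that's a one-line change in prometheus.yml. A minimal sketch (the job name and target are placeholders for your setup; 9182 is the exporter's default port):

```yaml
scrape_configs:
  - job_name: "wmi"                      # placeholder job name
    scrape_interval: 1m                  # the Prometheus default; 15s also works if you want more resolution
    static_configs:
      - targets: ["windows-host:9182"]   # placeholder target; 9182 is wmi_exporter's default port
```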

szymon3 (Author) commented Aug 8, 2019

Hi @carlpett, thanks a lot for the prompt reply. I probably explained it the wrong way: the 2-second interval was set in the performance tracking tool (the red chart, the process's CPU usage in Windows). Prometheus is scraping the wmi_exporter endpoint every 15 seconds, as you suggested.

Do you see any options for improving this situation right now? Downgrading to a version < 0.8?

carlpett (Collaborator) commented Aug 8, 2019

Ah, ok, sorry for the misunderstanding! 15 seconds is more reasonable :)
I'm still somewhat surprised at how it could go so high. Let me run a few tests and see if I can reproduce it.
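
For anyone who wants to reproduce this, the test is just a tight loop against the metrics endpoint; a minimal Go sketch (the target URL assumes a local exporter on the default port 9182):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	const target = "http://localhost:9182/metrics" // assumes a local exporter on the default port
	for {
		start := time.Now()
		resp, err := http.Get(target)
		if err != nil {
			fmt.Println("scrape failed:", err)
			time.Sleep(time.Second)
			continue
		}
		n, _ := io.Copy(io.Discard, resp.Body) // read the full payload, as Prometheus would
		resp.Body.Close()
		fmt.Printf("scraped %d bytes in %s\n", n, time.Since(start))
	}
}
```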

carlpett (Collaborator) commented Aug 8, 2019

The first oddity I can see is that the perfmon value looks quite high on my machine too, but it doesn't match other measurements.
I'm running an intense test now where I have a loop scraping as fast as it can. Perfmon says the wmi_exporter process averages 68% "% Processor Time". But Task Manager says the entire system runs at ~40%, and the CPU column for wmi_exporter in the Details tab is ~16%. I'm not sure which is most correct, but at least they aren't consistent.
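
A likely explanation for part of that gap: perfmon's \Process(*)\% Processor Time counter is not normalized by the number of logical processors, while Task Manager's CPU column is. On a machine with, say, 4 logical processors (an assumed figure, just for illustration), 68% in perfmon corresponds to roughly 68 / 4 ≈ 17% in Task Manager, which lines up with the ~16% above.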

There's another thing, though. wmi_exporter is showing up in your "top memory users", which is worrying. We shouldn't need several hundred megabytes of data. A memory leak often means higher processor usage, so I'll have a look at why this could happen.
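
If anyone wants to dig into the memory side on their own build, the usual tool for a Go process is a heap profile via net/http/pprof. The sketch below is generic and, as far as I know, not an endpoint the exporter exposes out of the box, so treat the wiring (and the port) as an assumption:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Expose the profiling endpoints on a side port; a heap profile can then be
	// pulled from the running process with:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```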

carlpett (Collaborator) commented Aug 8, 2019

See #376 for the memory part.

github-actions bot commented Nov 25, 2023

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

github-actions bot added the Stale label on Nov 25, 2023
github-actions bot closed this as not planned on Dec 26, 2023