-
Notifications
You must be signed in to change notification settings - Fork 706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add Terminal service & RemoteFx Collector #491
add Terminal service & RemoteFx Collector #491
Conversation
Nice! I'm especially looking forward to using the terminal services collector at work. From a brief look at the metrics available in perflib, I think the session metrics could be queried via perflib rather than WMI (though not the |
I am making required changed to use |
The preference to use perflib is due to performance and stability reasons. WMI calls are expensive, and sometimes return inconsistent metric data (#89 is a good example). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! I've added a few comments for some minor changes.
Let me know if you need any clarification.
collector/remote_fx.go
Outdated
ch <- prometheus.MustNewConstMetric( | ||
c.BaseTCPRTT, | ||
prometheus.GaugeValue, | ||
float64(d.BaseTCPRTT), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove the float64()
cast on all metrics as the perflib metrics are already defined as float64 in perflibRemoteFxNetwork
and perflibRemoteFxGraphics
.
collector/remote_fx.go
Outdated
d.Name, | ||
) | ||
ch <- prometheus.MustNewConstMetric( | ||
c.TCPReceivedRate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've run the exporter on my local machine, and a number of the metrics are monotonically increasing, and should be changed to CounterValue
rather than the current GaugeValue
.
I noted that this was the case for net_total_received_rate
, net_total_sent_rate
, tcp_sent_rate
, tcp_received_rate
, net_total_sent_bytes
and net_total_received_bytes
. There's likely more metrics than need to be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to investigate this further but when i checked not all counters are increasing. (exceptions are services
and console
sessions. Since these 2 are not actually remote sessions I have added filters for them. )
I will do further testing on this my side. If you have some time I whould appreciate if you can help me with this.
collector/remote_fx.go
Outdated
//gfx | ||
AverageEncodingTime: prometheus.NewDesc( | ||
prometheus.BuildFQName(Namespace, subsystem, "gfx_average_encoding_time"), | ||
"Average frame encoding time.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What unit does this metric represent? Seconds? Milliseconds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Milliseconds updating help.
collector/remote_fx.go
Outdated
), | ||
FECRate: prometheus.NewDesc( | ||
prometheus.BuildFQName(Namespace, subsystem, "net_fec_rate"), | ||
"Forward Error Correction (FEC) percentage", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an error percentage for all traffic? I'm not sure what this metric is representing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
documentation is not clear but all these counters are /session
collector/remote_fx.go
Outdated
), | ||
LossRate: prometheus.NewDesc( | ||
prometheus.BuildFQName(Namespace, subsystem, "net_loss_rate"), | ||
"Loss percentage", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a loss percentage for all traffic in the session?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
documentation is not clear but all these counters are /session
docs/collector.terminal_services.md
Outdated
||| | ||
-|- | ||
Metric name prefix | `terminal_services` | ||
Data source | Perflib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will need to be Perflib/WMI as both sources are queried for this collector.
collector/terminal_services.go
Outdated
ch <- prometheus.MustNewConstMetric( | ||
c.HandleCount, | ||
prometheus.GaugeValue, | ||
float64(d.HandleCount), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant float64()
cast can be removed, as variables are initialized as float64
.
@asiyani thanks for the changes! I'll investigate further on which metrics should be changed to counters. I'll try to compile a list of metrics that can then be reviewed by others. |
Further investigation on counters. I will change counter type to following values. Please let me know if you find it different in your testing.
|
I've done some testing and can clarify some of the metrics:
I'm uncertain of @asiyani the only change left to make would be to rename the counter metrics. E.G. change @carlpett this looks almost complete, did you want to review? |
@breed808 when you got time please review this last change and let me know if any other issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Thank you for this work @asiyani, really nice! And thanks @breed808 for reviewing! Apologies for slow reactions from me.
|
Regarding This will remove confusion about |
Nice! 👏 |
|
@carlpett when you have a time, Please review this for me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey,
Sorry, bit of a lack of time this week.
I took another look now, and came upon some things I missed last round. Mainly naming and units, tried to leave suggestions for those so you can hopefully batch that up to skip the more tedious parts if you agree with them.
collector/remote_fx.go
Outdated
nil, | ||
), | ||
TotalReceivedBytes: prometheus.NewDesc( | ||
prometheus.BuildFQName(Namespace, subsystem, "net_total_received_bytes"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhat nitty, but by convention the _total
should be at the end
prometheus.BuildFQName(Namespace, subsystem, "net_total_received_bytes"), | |
prometheus.BuildFQName(Namespace, subsystem, "net_received_bytes_total"), |
collector/remote_fx.go
Outdated
nil, | ||
), | ||
TotalSentBytes: prometheus.NewDesc( | ||
prometheus.BuildFQName(Namespace, subsystem, "net_total_sent_bytes"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move _total
here too
prometheus.BuildFQName(Namespace, subsystem, "net_total_sent_bytes"), | |
prometheus.BuildFQName(Namespace, subsystem, "net_sent_bytes_total"), |
collector/remote_fx.go
Outdated
|
||
//gfx | ||
AverageEncodingTime: prometheus.NewDesc( | ||
prometheus.BuildFQName(Namespace, subsystem, "gfx_average_encoding_time"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have the unit as a suffix
prometheus.BuildFQName(Namespace, subsystem, "gfx_average_encoding_time"), | |
prometheus.BuildFQName(Namespace, subsystem, "gfx_average_encoding_time_seconds"), |
Are we sure the raw data is actually the average and not a counter of total seconds, by the way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, its a GaugeValue
collector/terminal_services.go
Outdated
nil, | ||
), | ||
PageFaultsPersec: prometheus.NewDesc( | ||
prometheus.BuildFQName(Namespace, subsystem, "page_fault_per_sec"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually per_sec or is the raw data a counter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it looks like total and counter. i will change it.
collector/terminal_services.go
Outdated
nil, | ||
), | ||
PercentPrivilegedTime: prometheus.NewDesc( | ||
prometheus.BuildFQName(Namespace, subsystem, "privileged_time_total"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per the conventions, we should have the unit here as well
prometheus.BuildFQName(Namespace, subsystem, "privileged_time_total"), | |
prometheus.BuildFQName(Namespace, subsystem, "privileged_time_seconds_total"), |
collector/remote_fx.go
Outdated
return &RemoteFxCollector{ | ||
// net | ||
BaseTCPRTT: prometheus.NewDesc( | ||
prometheus.BuildFQName(Namespace, subsystem, "net_base_tcp_rtt"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing a unit
prometheus.BuildFQName(Namespace, subsystem, "net_base_tcp_rtt"), | |
prometheus.BuildFQName(Namespace, subsystem, "net_base_tcp_rtt_seconds"), |
collector/remote_fx.go
Outdated
nil, | ||
), | ||
BaseUDPRTT: prometheus.NewDesc( | ||
prometheus.BuildFQName(Namespace, subsystem, "net_base_udp_rtt"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prometheus.BuildFQName(Namespace, subsystem, "net_base_udp_rtt"), | |
prometheus.BuildFQName(Namespace, subsystem, "net_base_udp_rtt_seconds"), |
collector/remote_fx.go
Outdated
nil, | ||
), | ||
CurrentTCPRTT: prometheus.NewDesc( | ||
prometheus.BuildFQName(Namespace, subsystem, "net_current_tcp_rtt"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prometheus.BuildFQName(Namespace, subsystem, "net_current_tcp_rtt"), | |
prometheus.BuildFQName(Namespace, subsystem, "net_current_tcp_rtt_seconds"), |
collector/remote_fx.go
Outdated
nil, | ||
), | ||
CurrentUDPRTT: prometheus.NewDesc( | ||
prometheus.BuildFQName(Namespace, subsystem, "net_current_udp_rtt"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prometheus.BuildFQName(Namespace, subsystem, "net_current_udp_rtt"), | |
prometheus.BuildFQName(Namespace, subsystem, "net_current_udp_rtt_seconds"), |
collector/remote_fx.go
Outdated
FramesSkippedPerSecondInsufficientClientResources: prometheus.NewDesc( | ||
prometheus.BuildFQName(Namespace, subsystem, "gfx_frames_skipped_insufficient_clt_res_total"), | ||
"Number of frames skipped per second due to insufficient client resources.", | ||
[]string{"session_name"}, | ||
nil, | ||
), | ||
FramesSkippedPerSecondInsufficientNetworkResources: prometheus.NewDesc( | ||
prometheus.BuildFQName(Namespace, subsystem, "gfx_frames_skipped_insufficient_net_res_total"), | ||
"Number of frames skipped per second due to insufficient network resources.", | ||
[]string{"session_name"}, | ||
nil, | ||
), | ||
FramesSkippedPerSecondInsufficientServerResources: prometheus.NewDesc( | ||
prometheus.BuildFQName(Namespace, subsystem, "gfx_frames_skipped_insufficient_srv_res_total"), | ||
"Number of frames skipped per second due to insufficient server resources.", | ||
[]string{"session_name"}, | ||
nil, | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to merge these into a gfx_frames_skipped_total
with a reason
label? From an alerting perspective that might make it easier to write a query?
@carlpett I have made changes as you suggested. Intead of commit suggestions, I have done same changes in last commit as it it requires changes to the docs as well. |
Fantastic work @asiyani, thank you so very much for your contribution and patience! 🎉 |
Added collector for
Terminal service performance matrics
Win32_PerfRawData_LocalSessionManager_TerminalServices & Win32_PerfRawData_TermService_TerminalServicesSession
Connection Broker Performance matrics, Total session counts in RDS farm (Only if host is a connection Broker)
Win32_PerfRawData_RemoteDesktopConnectionBrokerPerformanceCounterProvider_RemoteDesktopConnectionBrokerCounterset
RemoteFx Network and Graphics Performance matrics
Win32_PerfRawData_Counters_RemoteFXNetwork & Win32_PerfRawData_Counters_RemoteFXGraphics metrics