What's the query to calculate Triton model latency per request? Is it `nv_inference_request_duration_us / nv_inference_exec_count + nv_inference_queue_duration_us`?
#7692 · Open
jayakommuru opened this issue on Oct 11, 2024 · 1 comment
We are benchmarking Triton with different backends, but we are unable to work out which metrics to use to calculate the latency of each request (assume each request has a batch size of b).
Is request latency = `rate(nv_inference_request_duration_us[1m]) / rate(nv_inference_exec_count[1m]) + nv_inference_queue_duration_us`?
Does `nv_inference_request_duration_us` include the queuing duration as well? The documentation says it is cumulative. Can anyone confirm?
Are the compute_input and compute_output durations also included in `nv_inference_request_duration_us`?
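For concreteness, here is the query from the title written out as PromQL. This is only a sketch of what we are trying: `model="my_model"` is a placeholder label, and whether `nv_inference_exec_count` is the correct denominator (and whether queue time should be added at all) is exactly what this issue is asking.

```promql
# Candidate: average latency = total cumulative request time / number of
# executions. rate() converts each cumulative counter into a per-second
# increase over the window, so the ratio gives an average duration (in us).
rate(nv_inference_request_duration_us{model="my_model"}[1m])
  / rate(nv_inference_exec_count{model="my_model"}[1m])

# Average queue time per execution, computed the same way. If
# nv_inference_request_duration_us already includes queueing, then adding
# this to the candidate above would double-count the queue time.
rate(nv_inference_queue_duration_us{model="my_model"}[1m])
  / rate(nv_inference_exec_count{model="my_model"}[1m])
```

Note that adding a raw cumulative counter (`nv_inference_queue_duration_us` without `rate()`) to a per-request average, as in the title's formula, mixes units, so at minimum the queue term would also need the `rate(...)/rate(...)` treatment.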