Added client processing time #433
Conversation
Signed-off-by: saimedhi <[email protected]>
Overall this looks clean!
Does the project document these metrics? If not, we should add documentation.
It's unclear to me whether client time includes serialization/deserialization of data. Basically, if I/O and server processing were instant, would client time be the entire duration of a request?
I am not familiar with this codebase and am not a maintainer here, so leaving the PR to others.
@@ -1706,6 +1706,7 @@ def __call__(self):
        task.operation.name,
        self.summary_stats("throughput", t, op_type),
        self.single_latency(t, op_type),
+       self.single_latency(t, op_type, metric_name="client_processing_time"),
Does it make sense to call this client_time? Or does this not include client waiting?
Here, client_processing_time is calculated as (total request time - service time). Based on my understanding, it does not include client waiting.
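A minimal sketch of that computation, with hypothetical variable names standing in for the timestamps OSB records (this is not the actual instrumentation code):

```python
def client_processing_time(client_request_start, client_request_end,
                           request_start, request_end):
    """Sketch of the metric added in this PR: total request time minus
    service time, leaving only the time spent inside the client."""
    total_request_time = client_request_end - client_request_start  # runner sent request -> got response
    service_time = request_end - request_start                      # the measured service-time window
    return total_request_time - service_time
```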
@@ -1995,12 +1998,15 @@ def op_metrics(op_item, key, single_value=False):
    def v(self, d, k, default=None):
        return d.get(k, default) if d else default

-   def add_op_metrics(self, task, operation, throughput, latency, service_time, processing_time, error_rate, duration, meta):
+   def add_op_metrics(self, task, operation, throughput, latency,
This is ripe for refactoring into an object passed around called Metrics. For a future/separate PR.
    processing_end = time.perf_counter()
    service_time = request_end - request_start
+   client_processing_time = (client_request_end - client_request_start)-service_time
Add some spaces around -. Open an issue to add a formatter/linter.
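For reference, the added line with the suggested spacing (behavior unchanged):

```python
client_processing_time = (client_request_end - client_request_start) - service_time
```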
Documentation is not available in the opensearch-benchmark repo but can be found on the OpenSearch documentation website: OpenSearch Benchmark User Guide.
@gkamat, @IanHoang As we discussed, I am posting the client processing time calculated in this PR vs. the diff of the existing processing time and service time metrics, along with the % difference between the two client processing time calculations, which is around 10%.
[comparison table image]
Based on this data, OSB's calculations are always higher than the requests context. Do you see similarities (where OSB's client processing time has a ~10% diff compared to these changes) when you run with http_logs and nyc_taxis (more commonly run workloads and the workloads used for nightly runs)?
workload: http_logs [results table image]
workload: nyc_taxis [results table image]
@saimedhi this is useful data. We will carry out some additional runs, and if the magnitude of the difference is consistently more than 7% or so, as your runs seem to indicate, we'll look into changing the processing time computation to measure client processing time instead. There is likely no need to have two entries for similar metrics.
@gkamat Are you planning to merge this? What remains to be done to get these metrics in?
@dblock To provide some context: in our discussion with @saimedhi last week, we discovered another approach where "client processing time" can be derived from data OSB already captures (as shown in the sketch below) without introducing all the changes proposed by this PR:
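A sketch of that derivation, assuming per-operation processing_time and service_time samples are already available in the metrics store (field names hypothetical):

```python
# Hypothetical derivation from metrics OSB already captures:
# client processing time ~ existing processing time - existing service time.
def derived_client_processing_time(sample):
    return sample["processing_time"] - sample["service_time"]

sample = {"processing_time": 12.5, "service_time": 11.25}  # made-up values, in ms
print(derived_client_processing_time(sample))  # 1.25
```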
To see if there's a difference between OSB's approach and Sai's approach from this PR, we asked Sai to perform a few runs to determine if there are any differences in the resulting numbers. Before we can merge this in, we'd like to run some additional tests this week to determine if we can replicate those differences across all workloads and testing setups. If we can reproduce Sai's numbers and decide to opt for those, it might be better to remove the redundant metric.
Isn't it useful to isolate client processing time, server processing time, and network delays as three separate numbers?
Yes. I am after metrics that support claims in issues such as opensearch-project/opensearch-py#446. |
I am late to this, but a few questions/comments:
- This PR is about client processing time; are we also aiming to publish metrics for various operations like bulk, search, etc. from the client side? This would help catch bottlenecks for different APIs when they occur on the client side.
- Are we making it a standard to add these metrics in opensearch-benchmark, or are we still debating whether they should be added in the client vs. in opensearch-benchmark?
We have been delayed in reviewing this PR due to OSB 1.2.0 release activities, but should have updates later this week. @VachaShah, OSB already publishes latency and service time metrics to the metrics data store and the standard output for each operation like bulk and query; see the sample output in https://opensearch.org/docs/latest/benchmark/quickstart. All this data is from the client (OSB) side. This PR relates only to exposing the fraction of the latency numbers that is expended in the backend client being used (currently opensearch-py, but other clients will be supported in the future). Another aspect is exposing server processing time, which will be addressed in a separate issue and PR.
Description
Introduced a new metric: Client Processing Time.
Client Processing Time: The delta between total request time and service time.
Total Request Time: Defined as the duration between the runner sending a request to the client (opensearch-py) and receiving the response.
Service Time: Represents the interval from the server receiving the request to the server sending the response. Note: There was a discrepancy in the documentation regarding Service Time, and I have clarified my observations in the associated PR.
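As a rough illustration of how these intervals relate under the definitions above (names and placement are illustrative only, and the Service Time caveat noted above applies):

```python
import time

client_request_start = time.perf_counter()  # runner hands the request to opensearch-py
request_start = time.perf_counter()         # service-time window opens
time.sleep(0.01)                            # stand-in for the server executing the operation
request_end = time.perf_counter()           # service-time window closes
client_request_end = time.perf_counter()    # runner receives the parsed response

total_request_time = client_request_end - client_request_start
service_time = request_end - request_start
client_processing_time = total_request_time - service_time  # the new metric
print(f"client processing time: {client_processing_time * 1000:.3f} ms")
```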
Issues Resolved
#432