[BUG] Improve measurement calculation in benchmark #413
Comments
Thanks, Vishnu! For context, the issue I noticed is that measurements are captured for pods from a previous run when those pods are cleaned up during a following run, which means latency measurements for a run can include pods from a previous run that was not garbage collected. This appears to be one use of the "latency < 0" check, because the measurements I see all have latencies v1=0 and v2>0. I agree with the approach of filtering by the UUID of the current run, but (what I thought might be) a simple re-order of newMeasurementFactory and Cleanup would also suffice. https://github.com/cloud-bulldozer/kube-burner/blob/master/pkg/burner/job.go#L87-L119
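A minimal, self-contained sketch of the re-ordering being suggested here, assuming a much simplified run loop; the function names and signatures below are illustrative stand-ins for kube-burner's real code, not the actual implementation:

```go
package main

import "fmt"

// Illustrative stand-ins for kube-burner's real functions; names follow the
// discussion above, but signatures and bodies are hypothetical.
func cleanupNamespaces(labelSelector string) {
	fmt.Println("deleting leftover namespaces matching", labelSelector)
}

func newMeasurementFactory() (stop func()) {
	fmt.Println("starting pod-latency watchers")
	return func() { fmt.Println("stopping watchers, indexing measurements") }
}

func createObjects() {
	fmt.Println("creating this run's objects")
}

func main() {
	// Suggested order: garbage-collect objects left over from any previous
	// run first, so the measurement watchers never register stale pods.
	cleanupNamespaces("kube-burner-job=mybenchmark")
	stop := newMeasurementFactory()
	createObjects()
	stop()
}
```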
Yes, Andrew! We just need to pick the labels correctly.
Makes sense. As discussed internally, this case only happens when one of the previous workloads is not garbage collected. It's curious that some pods, like the one in the report, are left around. Also, if we reuse the UUID by chance, the issue could still happen, but I'd say that is very unlikely and not recommended.
Acknowledged! Thinking of generating a unique runID for that specific run at program runtime and using it to label run-specific resources.
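A rough sketch of that runID idea, assuming github.com/google/uuid for generating the identifier; the label key is hypothetical and only meant to illustrate the labelling:

```go
package main

import (
	"fmt"

	"github.com/google/uuid"
)

func main() {
	// Generate an identifier once at startup; it is unique per process even
	// if the user-supplied UUID is reused across runs.
	runID := uuid.NewString()

	// Hypothetical label stamped on every object the run creates.
	labels := map[string]string{
		"kube-burner.io/run-id": runID,
	}
	fmt.Println("labels applied to created resources:", labels)

	// Measurements would then list pods with a selector built from the same
	// runID, so pods from a previous run in the same namespace are ignored.
	fmt.Printf("label selector for measurements: kube-burner.io/run-id=%s\n", runID)
}
```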
Yeah - I would prefer not using the UUID, as we can mimic the same behavior you call out here by setting
If a user uses the same UUID for two runs and gets wrong results because of this edge case, I think the bad measurements are on them. :) We expect the UUID to be unique for a run (that's the second 'U').
I would argue the same thing if they had
Argue that gc=false is an edge case, so I deserve bad measurements? Running with gc=false is valid: I occasionally want to check the state of a workload before it gets cleaned up, so I disable GC. I want the UUID to be random, so I let the tool generate it for me. We can assume that the UUID is the unique identifier for a run; no need for an additional UID. Reset: my initial complaint in this issue wasn't about filtering pods for the run, but about changing the order of operations. The bug I wanted to highlight was creating podMeasurements before cleaning up workloads, and that point is getting lost in this discussion.
This is all I wanted: 68dae0f
No, but in this situation if you forget to delete the namespace, it is on you.
No argument there.
Well, with that statement, we should remove the user-provided UUID, so we don't get into the situation of the user providing the same UUID for both runs (similar to if they ran with
ack - ok.
I wish it were this simple, but doing a cleanup early would impact other job types like
Related PR: #421
Bug Description
Version: latest
Git Commit: ffa2d6a
Build Date: 2023-08-08-21:35:14
Go Version: go1.20.4
OS/Arch: linux amd64
Describe the bug
In our current implementation, we register measurements on all the pods present in the namespace specific to a job in our benchmark run. The problem with this approach is that the namespace is not always unique, so we land in situations where pods from a previous run, sitting in a namespace with the same name, are also considered for latency calculations.
To Reproduce
Execute an initial run with
Initial run logs: https://gist.github.com/vishnuchalla/efdac4f963d9a1292bf9fadd0e4ec039
Execute a follow-up run with
Follow up run logs: https://gist.github.com/vishnuchalla/1e41f9f501ac71a3218514a6904aafb9
Now, in both runs, we should be able to find the same pod being considered for measurements. For example:
Pod perfapp-1-0-7c449cc684-h5v2c is ready
Expected behavior
Measurements should only be calculated on the resources that are specific to that benchmark run.
Screenshots or output
Initial run logs: https://gist.github.com/vishnuchalla/efdac4f963d9a1292bf9fadd0e4ec039
Follow up run logs: https://gist.github.com/vishnuchalla/1e41f9f501ac71a3218514a6904aafb9
Additional context
I think we should ideally have a UUID label attached to all the resources created in a benchmark, so that we can easily distinguish them from others and perform any kind of action on them programmatically.
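For illustration, here is a sketch of the difference between measuring every pod in the job's namespace and filtering by a per-run UUID label, using client-go's fake clientset so it runs without a cluster; the kube-burner-uuid label key, pod names, and namespace are assumptions made for the example:

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes/fake"
)

func main() {
	// A stale pod left behind by a previous run and a pod from the current
	// run, both living in a namespace with the same name.
	stale := &corev1.Pod{ObjectMeta: metav1.ObjectMeta{
		Name: "perfapp-old", Namespace: "benchmark-ns",
		Labels: map[string]string{"kube-burner-uuid": "run-1"},
	}}
	fresh := &corev1.Pod{ObjectMeta: metav1.ObjectMeta{
		Name: "perfapp-new", Namespace: "benchmark-ns",
		Labels: map[string]string{"kube-burner-uuid": "run-2"},
	}}
	client := fake.NewSimpleClientset(stale, fresh)

	// Current behavior: every pod in the namespace is measured.
	all, _ := client.CoreV1().Pods("benchmark-ns").List(context.TODO(), metav1.ListOptions{})
	fmt.Println("pods measured by namespace only:", len(all.Items)) // 2, includes the stale pod

	// Expected behavior: only pods labeled with the current run's UUID.
	current, _ := client.CoreV1().Pods("benchmark-ns").List(context.TODO(), metav1.ListOptions{
		LabelSelector: "kube-burner-uuid=run-2",
	})
	fmt.Println("pods measured for the current run:", len(current.Items)) // 1
}
```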