Scalability and Performance Test Cases for Ephemeral Volumes #601
Conversation
Signed-off-by: Mucahit Kurt <[email protected]>
Welcome @mucahitkurt!
Hi @mucahitkurt. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
cc @wojtek-t
@mucahitkurt - thanks a lot for this PR /assign
{{$GROUP := "pod-with-ephemeral-volume-startup-latency"}}

#Variables
# TODO think about this!
Ownerless/issue-less todo, please fix
It's not clear from the comments, but I added this and the other TODOs because I need feedback about these parameters and their values.
See my comment above.
I actually believe that we need two separate tests:
- first focus only on the first two cases from the KEP (which by design are running on a very small cluster)
- separately address the third case, which I think has to be written a bit differently
Assuming we're focusing on (1) here, I would vote for simplifying this case and not thinking about namespaces, etc. at all. It would be much easier to understand then.
I'll delete the parameters that I need for the (2) tests
name: pod-with-ephemeral-volume-startup-latency | ||
automanagedNamespaces: {{$namespaces}} | ||
tuningSets: | ||
# TODO different options? |
Same here
uniformqps makes sense to me
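For reference, a minimal sketch of what a uniform-QPS tuning set looks like in a clusterloader2 config; the tuning set name and the qps value are illustrative, not taken from this PR:

tuningSets:
- name: UniformQPS       # illustrative name
  qpsLoad:
    qps: 10              # illustrative value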
Params:
  desiredPodCount: {{$totalPods}}
  labelSelector: group = {{$GROUP}}
  # TODO how can i determine this?
Same here
A good approach is to start with something big enough, run the test a few times, and look at the distribution of pod-with-storage startup latency; then adjust the timeout so that it is big enough that it should never time out when everything goes well, but small enough that it would catch issues in a reasonable time.
I think this one will also be determined during the test rollout process.
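To make the advice concrete, here is a sketch of how the wait step from the diff above might look with a deliberately generous starting timeout, assuming the block belongs to a WaitForRunningPods-style measurement; the identifier name and the 15m value are placeholders to be tightened after a few test runs:

- Identifier: WaitForPodsWithEphemeralVolumes   # illustrative name
  Method: WaitForRunningPods
  Params:
    desiredPodCount: {{$totalPods}}
    labelSelector: group = {{$GROUP}}
    timeout: 15m   # start big; tighten once the startup-latency distribution is known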
@@ -0,0 +1,97 @@
#Constants
{{$NODES_PER_NAMESPACE := DefaultParam .NODES_PER_NAMESPACE 1}}
QQ: should the test scale with the size of the cluster, and what cluster size did you have in mind when designing this test?
We're running scalability tests on clusters of up to 5K nodes; if this test were run on such a cluster as is, it would create 5K namespaces with 1 pod in each namespace. Is that what you wanted to achieve?
Actually, there will be three different cluster usage scenarios: two of them will use 1 node and one of them will be cluster-wide.
There will be a test rollout phase before these tests are run regularly, to determine the thresholds, but I don't know how that rollout process will be executed (before the PR is merged or after?); maybe @msau42 knows. I assumed these parameters can be changed manually during the rollout phase, so I didn't create different override files for the 1-node case and the cluster-wide case, because the same thing also applies to parameters such as how many volumes will be created per pod or how many volumes will be created per node. Of course, I can create different override files with the max limits for each volume type.
There will be a test rollout phase before these tests are run regularly, to determine the thresholds, but I don't know how that rollout process will be executed, before PR merge or after?
We should mark this test as more experimental (maybe under an experimental/ directory), and we can do that after the PR is merged. But I'm fine either way too.
@mm4tt - the first two cases are more about performance than scalability, and will be run against a very small (1-node?) cluster. That is probably an important part of the answer to the question.
@mucahitkurt - I think that we should focus on the first 2 cases first (as the use case there is completely different), and leave the third case (where the tests may potentially be visibly different) as a second thing. WDYT?
Covering only the first 2 cases makes sense to me; I can create a different PR for the 3rd case after this PR is merged. I'll put these test cases under an experimental directory and a 1_node directory.
Method: APIResponsiveness
Params:
  action: reset
{{ if $PROVISION_VOLUME }}
Why do you want to gate volume provisioning?
I added the gate because for some volume types, like ConfigMap and Secret, I have to pre-provision the volumes to use them inside the pod, but for others, like DownwardAPI and emptyDir, I don't need volume pre-provisioning; the volume settings are written directly inside the pod template.
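A sketch of how the gate reads when stitched together from the fragments in this diff; the phase fields other than the object bundle (namespaceRange, replicasPerNamespace, tuningSet) and the $volumesPerPod parameter name are assumptions about the surrounding config, not literal lines from the PR:

{{ if $PROVISION_VOLUME }}
# Pre-provision API-object volumes (ConfigMap, Secret);
# emptyDir and DownwardAPI skip this phase entirely.
- phases:
  - namespaceRange:
      min: 1
      max: {{$namespaces}}
    replicasPerNamespace: {{$volumesPerPod}}   # hypothetical parameter name
    tuningSet: UniformQPS
    objectBundle:
    - basename: vol-{{$APP_NAME}}
      objectTemplatePath: {{$VOLUME_TYPE}}.yaml
{{ end }}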
- basename: vol-{{$APP_NAME}}
  objectTemplatePath: {{$VOLUME_TYPE}}.yaml
{{ end }}
# Start measurements
I vote for moving this up and doing measurement operations together.
I'm not sure about this, because the KEP says to assume that the volumes have already been provisioned. If I move this up, then the measurement is started before volume provisioning.
I think we want to separate out provisioning time measurements from pod startup measurements because provisioning is generally a one-time bootstrapping operation (for an application).
If I move this up, then the measurement is started before volume provisioning.
Yes, but the measurement only measures pod startup time; the fact that you create some other objects doesn't affect that at all.
We want to separate out provisioning time, but that means creating those volumes before we create a pod, which happens in this test no matter when you start the measurement.
When does the pod startup measurement begin? For ephemeral volumes, kubelet cannot mount API volumes until the API object is created, and containers cannot start until all volumes are mounted. For PVCs, the scheduler cannot schedule the pod until the PVs are provisioned and bound to the PVCs.
The "measurement start" action in the config basically means "start watching for pod creations and measure pod startup time for those pods".
Measuring pod startup time for an individual pod starts when that particular pod is created, which in the case of this config happens after all volumes already exist.
I moved it up, thanks!
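The resulting ordering, as a sketch (the measurement identifier is illustrative): starting PodStartupLatency before the provisioning phase is safe because it only records pods that are created afterwards.

# Start measurements
- measurements:
  - Identifier: PodWithEphemeralVolumeStartupLatency   # illustrative name
    Method: PodStartupLatency
    Params:
      action: start
      labelSelector: group = {{$GROUP}}
      threshold: {{$podStartupTimeout}}s
{{ if $PROVISION_VOLUME }}
# ... provision ConfigMap/Secret volumes ...
{{ end }}
# ... create the pods that mount the volumes ...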
type: Opaque
data:
  username: dXNlcm5hbWVfCg==
  password: cGFzc3dvcmRfCg==
nit: can you add end-of-line characters at the end of all files?
- name: {{.Name}}
  image: k8s.gcr.io/pause:3.1
  imagePullPolicy: IfNotPresent
  ports:
nit: do we need ports?
no, I'll delete
mountPath: /usr/share/{{$volumeIndex}}
{{ end }}
volumes:
{{ range $volumeIndex, $vol := Seq .VolumesPerPod }}
I'm wondering if this really makes it easier to understand.
I agree that it would result in some duplication of code (or I should say config), but wouldn't having pod_with_secrets, pod_with_configmaps, and pod_with_emptydirs as separate yamls be simpler?
Longer-term, I filed #619, but I don't know how easy that is to do.
I'll do it, and I'll also move the volume-specific yaml files under the specific volume directories.
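For illustration, a sketch of what one of the split-out templates (e.g. configmap/pod_with_configmap.yaml) might look like, reusing the fragments visible in this diff; the label value and the ConfigMap naming inside the range are assumptions:

apiVersion: v1
kind: Pod
metadata:
  name: {{.Name}}
  labels:
    group: pod-with-ephemeral-volume-startup-latency
spec:
  containers:
  - name: {{.Name}}
    image: k8s.gcr.io/pause:3.1
    imagePullPolicy: IfNotPresent
    volumeMounts:
{{ range $volumeIndex, $vol := Seq .VolumesPerPod }}
    - name: vol-{{$volumeIndex}}
      mountPath: /usr/share/{{$volumeIndex}}
{{ end }}
  volumes:
{{ range $volumeIndex, $vol := Seq .VolumesPerPod }}
  - name: vol-{{$volumeIndex}}
    configMap:
      name: vol-{{$.Name}}-{{$volumeIndex}}   # assumed naming; actual wiring may differ
{{ end }}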
@wojtek-t thanks a lot for your comments! I tried to reply & fix them, PTAL.
/test pull-perf-tests-clusterloader2
…or different volume types Signed-off-by: Mucahit Kurt <[email protected]>
Params:
  action: start
  labelSelector: group = {{$GROUP}}
  threshold: {{$podStartupTimeout}}s
I mostly have a couple more cosmetic comments, but I'm generally happy with how it looks now.
{{ if $PROVISION_VOLUME }}
# Provision volumes
- phases:
  - namespaceRange:
@krzysied - I was thinking about it in the past, but I don't know if we ever did that (and if I ever shared with you that I was thinking about it).
In the majority of cases, we're setting "namespaceRange" to basically all automanaged namespaces.
What I would like to do is to remove all of those and say:
"if you don't specify namespaceRange, we default to all automanaged namespaces"
Is that done? If not, would you be able to add that (it should be quite simple to do)?
perf-tests/clusterloader2/pkg/test/simple_test_executor.go, lines 312 to 315 in 459486e:

if namespaceRange == nil {
    // Returns "" which represents cluster level.
    return []string{""}
}
If namespaceRange is unset, a cluster-level object will be created. IIRC we are not currently using this feature. The question is whether we want to create non-namespaced objects in the future...
Ahh, OK. We definitely would like to be able to create non-namespaced objects (e.g. a CRD is non-namespaced).
We should probably come up with some "allNamespaces" thing to reduce the amount of boilerplate...
labelSelector: group = {{$GROUP}}
# TODO decide this after test roll-out phase
timeout: 15m
- measurements:
Similarly to starting the measurement, can you also move this to the bottom?
[As I mentioned somewhere else, medium-term we will simply bake all the necessary thresholds into the measurement itself and bundle all SLO-related ones together to limit boilerplate. It will just be simpler to do then :)]
OK. I think deleting pods before the gather action of the PodStartupLatency measurement has no effect?
The PodStartupLatency measurement can only change when pods are being created, so yes, that will have no effect.
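So the gather step can simply sit at the very end of the test, after pod deletion; a sketch, with the same illustrative identifier as above:

- measurements:
  - Identifier: PodWithEphemeralVolumeStartupLatency   # illustrative name
    Method: PodStartupLatency
    Params:
      action: gather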
@@ -0,0 +1,3 @@
PROVISION_VOLUME: true
POD_TEMPLATE_PATH: "configmap/pod_with_ConfigMaps.yaml"
VOLUME_TEMPLATE_PATH: "configmap/ConfigMap.yaml"
Do you anywhere rely on the assumption that the file names are capitalized as ConfigMap? (I don't see that, but maybe I'm missing something.)
If not, it would be more natural to have:
configmap/config_map.yaml
configmap/pod_with_config_map.yaml
[Same for secrets and others]
There was, but no longer; I'll rename them. However, volume names are generally not written as separate words; usages are like ConfigMap, EmptyDir, etc. I'll rename them to configmap/configmap.yaml, configmap/pod_with_configmap.yaml, etc.
Sure that sounds fine, thanks!
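After the rename, the ConfigMap override file from the diff above would read (values unchanged, only the paths lower-cased):

PROVISION_VOLUME: true
POD_TEMPLATE_PATH: "configmap/pod_with_configmap.yaml"
VOLUME_TEMPLATE_PATH: "configmap/configmap.yaml"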
Signed-off-by: Mucahit Kurt <[email protected]>
/lgtm
I think we should create a test-suite similar to this one: Can someone from sig-storage pick it? /approve Holding to get agreement on next steps - feel free to cancel if you ~agree with the above.
I have some questions about the periodic job configuration. We have 2 test cases: one of them is the max-volumes-per-pod case, the other is the max pod-volume (1-1) per node case. Currently I have one config file; it seems to be for the first test case, and the volumes-per-pod value is 4 (I added it randomly). First question: should I create two different override files for the 2 different test cases? And the periodic job definitions will have two different override files, one for the test case (max volumes per pod or max volumes per node), the other for the volume type? So with 4 ephemeral volume types and 2 test cases, there will be 8 periodic job definitions? Second question: what should my initial values be for the max-volumes-per-pod case (the current value is 4) and the max pod-volume (1-1) per node case? The max limit is 100 pods per node; can I use this for both scenarios?
This plan sounds good to me. For max pod-volume, 100 pods with 1 volume each sounds good to me. For max volumes in 1 pod, 100 volumes also sounds fine. Although, for the 2nd case, how is it measured? Does the test only measure 1 pod? Or does it measure multiple pods serially?
I think that sounds most reasonable for now.
+1
Yes, currently it requires 8 periodic job definitions. Overrides are passed to the clusterloader as flags, so they apply to the job scope, not the test scope.
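A rough sketch of the flag portion of one of the eight periodic job definitions (2 test cases x 4 volume types). The job name and file paths are made up for illustration; only the --testconfig and (repeatable) --testoverrides flags are assumed from clusterloader2's CLI:

- name: ci-perf-tests-ephemeral-volumes-configmap-max-volumes-per-pod   # hypothetical name
  args:
  - --testconfig=testing/experimental/storage/pod-startup/config.yaml                     # hypothetical path
  - --testoverrides=testing/experimental/storage/pod-startup/max_volumes_per_pod.yaml     # test-case override
  - --testoverrides=testing/experimental/storage/pod-startup/configmap/override.yaml      # volume-type override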
…ault value for volumes per pod Signed-off-by: Mucahit Kurt <[email protected]>
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: msau42, mucahitkurt, wojtek-t. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@mucahitkurt - next time please squash the commits before merging
OK, sorry about that, I forgot.
Related issue: #600
Different override files are used to test different ephemeral volume types.
For test cases and parameters, you can check the README file.
/sig storage
/assign @msau42