
Scalability and Performance Test Cases for Ephemeral Volumes #601

Conversation

mucahitkurt
Contributor

Related issue: #600

Different override files are used to test different ephemeral volume types.
For test cases and parameters, you can check the README file.

/sig storage
/assign @msau42
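For reviewers skimming the thread: each override file in this PR is a small set of variables consumed by the shared test config. A minimal sketch of the configmap variant (mirroring the override reviewed further down, with the lowercase file names agreed on later in the review) looks roughly like this:

```yaml
# configmap override (sketch): selects the ephemeral volume type under test.
PROVISION_VOLUME: true                                  # ConfigMaps must exist before the pod references them
POD_TEMPLATE_PATH: "configmap/pod_with_configmap.yaml"  # pod template mounting the ConfigMap volumes
VOLUME_TEMPLATE_PATH: "configmap/configmap.yaml"        # template for the ConfigMap objects themselves
```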

@k8s-ci-robot
Contributor

Welcome @mucahitkurt!

It looks like this is your first PR to kubernetes/perf-tests 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/perf-tests has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Jul 3, 2019
@k8s-ci-robot
Contributor

Hi @mucahitkurt. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 3, 2019
@k8s-ci-robot k8s-ci-robot requested review from mborsz and mm4tt July 3, 2019 23:33
@msau42
Member

msau42 commented Jul 3, 2019

/ok-to-test
cc @hantaowang

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 3, 2019
@msau42
Member

msau42 commented Jul 3, 2019

cc @wojtek-t

@wojtek-t
Member

wojtek-t commented Jul 5, 2019

@mucahitkurt - thanks a lot for this PR
Sorry, I'm a bit overloaded - I will take a look at it early next week. Sorry for the delay.

/assign

{{$GROUP := "pod-with-ephemeral-volume-startup-latency"}}

#Variables
# TODO think about this!
Contributor

Ownerless/issue-less todo, please fix

Contributor Author

It's not clear from the comments, but I added this and the other TODOs because I need feedback about these parameters and their values.

Member

See my comment above.
I actually believe that we need two separate tests:

  1. first focus only on the first two cases from the KEP (which by design are running on a very small cluster)
  2. separately address the third case, which I think has to be written a bit differently

Assuming we're focusing on (1) here, I would vote for simplifying this case and not thinking about namespaces, etc. at all. It would be much easier to understand then.

Contributor Author

I'll delete the parameters that I only need for the (2) tests.

name: pod-with-ephemeral-volume-startup-latency
automanagedNamespaces: {{$namespaces}}
tuningSets:
# TODO different options?
Contributor

Same here

Member

uniformqps makes sense to me
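For context, a minimal sketch of a UniformQPS tuning set as it typically appears in a clusterloader2 config; the qps value here is illustrative and not taken from this PR:

```yaml
tuningSets:
- name: UniformQPS
  qpsLoad:
    qps: 10   # illustrative rate; the value used by the test was still under discussion here
```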

Params:
desiredPodCount: {{$totalPods}}
labelSelector: group = {{$GROUP}}
# TODO how can i determine this?
Contributor

Same here

A good approach is to start with something big enough, run the test a few times, and see what the distribution of podWithStorage startup latency is, then adjust the timeout: big enough that it should never time out if everything goes well, but small enough that it would catch issues in a reasonable time.

Contributor Author

I think this one will also be determined during the test roll-out process.
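To make the suggestion above concrete, a sketch of the params under discussion, assuming this block belongs to a WaitForRunningPods-style measurement (the Identifier is hypothetical); the generous 15m value mirrors the timeout that appears later in this config and would be tightened after the roll-out runs:

```yaml
- Identifier: WaitForPodsWithEphemeralVolume   # hypothetical identifier
  Method: WaitForRunningPods
  Params:
    desiredPodCount: {{$totalPods}}
    labelSelector: group = {{$GROUP}}
    # Start generous; tighten once the startup-latency distribution from a few runs is known.
    timeout: 15m
```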

@@ -0,0 +1,97 @@
#Constants
{{$NODES_PER_NAMESPACE := DefaultParam .NODES_PER_NAMESPACE 1}}
Contributor

QQ: should the test scale with the size of the cluster, and what cluster size did you have in mind when designing this test?

We're running scalability tests on clusters of up to 5K nodes. If this test was run on such a cluster as is, it would create 5K namespaces with 1 pod in each namespace - is that what you wanted to achieve?

Contributor Author

@mucahitkurt mucahitkurt Jul 5, 2019

Actually there will be three different cluster usage scenarios: 2 of them will use 1 node and one of them will be cluster-wide.

There will be a test roll-out phase before these tests are run regularly, to determine the thresholds, but I don't know how that roll-out process will be executed - before the PR merge or after? Maybe @msau42 knows. I assumed these parameters can be changed manually during the roll-out phase, so I didn't create different override files for the 1-node case and the cluster-wide case; the same is true for parameters like how many volumes will be created per pod or per node. Of course I can create different override files with the max limits for each volume type.

Member

> There will be a test roll-out phase before these tests are run regularly, to determine the thresholds, but I don't know how that roll-out process will be executed - before the PR merge or after?

We should mark this test as more experimental (maybe under an experimental/ directory), and we can do that after the PR is merged. But I'm fine either way too.

@mm4tt - the first two cases are more about performance than scalability, and will be run against a very small (1-node?) cluster. That is probably an important part of the answer to the question.

@mucahitkurt - I think that we should focus on the first 2 cases first (as the use case there is completely different) and leave the third case (where the tests may potentially be visibly different) as a second step. WDYT?

Contributor Author

Covering only the first 2 cases makes sense to me; I can create a different PR for the 3rd case after this PR is merged. I'll put these test cases under an experimental directory and a 1_node directory.

Method: APIResponsiveness
Params:
action: reset
{{ if $PROVISION_VOLUME }}
Member

Why do you want to gate provisioning volumes?

Contributor Author

I added the gate because for some volume types, like configmap and secret, I have to pre-provision the volumes in order to use them inside the pod, but for others, like downwardapi and emptyDir, I don't need volume pre-provisioning - the volume settings are written directly inside the pod template.
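A sketch of how such a gate can look in the config, assembled from the fragments visible in this review; the namespace range and replicasPerNamespace value are illustrative, not taken verbatim from the PR:

```yaml
{{ if $PROVISION_VOLUME }}
# Provision volumes first - only needed for types such as ConfigMap and Secret,
# which must exist as API objects before a pod can mount them.
- phases:
  - namespaceRange:
      min: 1
      max: {{$namespaces}}
    replicasPerNamespace: {{$volumesPerPod}}   # illustrative count
    tuningSet: UniformQPS
    objectBundle:
    - basename: vol-{{$APP_NAME}}
      objectTemplatePath: {{$VOLUME_TYPE}}.yaml
{{ end }}
```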

- basename: vol-{{$APP_NAME}}
objectTemplatePath: {{$VOLUME_TYPE}}.yaml
{{ end }}
# Start measurements
Member

I vote for moving this up and doing measurement operations together.

Contributor Author

I'm not sure about this, because the KEP says to assume that the volumes have already been provisioned.

If I move this up, then the measurement is started before volume provisioning.

Member

I think we want to separate out provisioning time measurements from pod startup measurements because provisioning is generally a one-time bootstrapping operation (for an application).

Member

> If I move this up, then the measurement is started before volume provisioning.

Yes - but the measurement only measures pod startup time. The fact that you create some other objects doesn't affect that at all.

We want to separate out provisioning time - but that means creating those volumes before we create a pod - which is happening in this test no matter when you start the measurement.

Member

When does the pod startup measurement begin? For ephemeral volumes, the kubelet cannot mount API volumes until the API object is created, and containers cannot start until all volumes are mounted. For PVCs, the scheduler cannot schedule the pod until the PVs are provisioned and bound to the PVC.

Member

The "measurement start" action in the config is basically "start watching for pod creations and for those measure pod startup time".
Measuring pod startup time for an individual pod starts when that particular pod is created, and in this config the pods are created after all the volumes already exist.

Contributor Author

I moved it up, thanks!
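For readers following the thread, the ordering that was settled on: the start action only begins watching for pod creations, so it can safely precede volume provisioning, and per-pod startup time is measured from each pod's own creation. A sketch of the measurement pair, with params taken from the config fragments in this review (the Identifier is hypothetical):

```yaml
# Placed before volume provisioning and pod creation:
- measurements:
  - Identifier: PodWithEphemeralVolumeStartupLatency   # hypothetical identifier
    Method: PodStartupLatency
    Params:
      action: start
      labelSelector: group = {{$GROUP}}
      threshold: {{$podStartupTimeout}}s

# ... volume provisioning and pod creation phases go here ...

# Placed after the pods are running (per the later discussion, deleting pods before this has no effect):
- measurements:
  - Identifier: PodWithEphemeralVolumeStartupLatency
    Method: PodStartupLatency
    Params:
      action: gather
```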

type: Opaque
data:
username: dXNlcm5hbWVfCg==
password: cGFzc3dvcmRfCg==
Member

nit: can you add trailing newlines at the end of all files?

- name: {{.Name}}
image: k8s.gcr.io/pause:3.1
imagePullPolicy: IfNotPresent
ports:
Member

nit: do we need ports?

Contributor Author

No, I'll delete them.

mountPath: /usr/share/{{$volumeIndex}}
{{ end }}
volumes:
{{ range $volumeIndex, $vol := Seq .VolumesPerPod }}
Member

I'm wondering if this really makes it easier to understand.

I agree that it would result in some duplication of code (or I should say config), but wouldn't having pod_with_secrets, pod_with_configmaps, and pod_with_emptydirs as separate yamls be simpler?

Longer-term, I filed: #619, but I don't know how easy that is to do.

Contributor Author

I'll do it, and I'll also move the volume-specific yaml files under the specific volume directories.
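A sketch of what one of the split-out templates (say, configmap/pod_with_configmap.yaml) could look like, assembled from the pod-template fragments visible in this review; the configMap volume stanza, the group label, and the ConfigMap object naming are my assumptions rather than lines from the PR:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: {{.Name}}
  labels:
    group: pod-with-ephemeral-volume-startup-latency
spec:
  containers:
  - name: {{.Name}}
    image: k8s.gcr.io/pause:3.1
    imagePullPolicy: IfNotPresent
    volumeMounts:
    {{ range $volumeIndex, $vol := Seq .VolumesPerPod }}
    - name: vol-{{$volumeIndex}}
      mountPath: /usr/share/{{$volumeIndex}}
    {{ end }}
  volumes:
  {{ range $volumeIndex, $vol := Seq .VolumesPerPod }}
  - name: vol-{{$volumeIndex}}
    configMap:
      # Assumed to match the basenames of the pre-provisioned ConfigMap objects.
      name: vol-{{$.Name}}-{{$volumeIndex}}
  {{ end }}
```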

@mucahitkurt mucahitkurt force-pushed the storage/scalability-performance-tests-pod-startup branch from 4d2c1e9 to 6c39713 Compare July 8, 2019 21:18
@mucahitkurt
Contributor Author

@wojtek-t thanks a lot for your comments! I tried to reply & fix them, PTAL.

@mucahitkurt
Contributor Author

/test pull-perf-tests-clusterloader2

…or different volume types

Signed-off-by: Mucahit Kurt <[email protected]>
@mucahitkurt mucahitkurt force-pushed the storage/scalability-performance-tests-pod-startup branch from 6c39713 to 185997a Compare July 9, 2019 04:18
Params:
action: start
labelSelector: group = {{$GROUP}}
threshold: {{$podStartupTimeout}}s
Member

Medium-term (or maybe even not that medium), we should merge the logic of detecting whether the pod is stateful or stateless into the measurement and use thresholds automatically.

@mm4tt @oxddr - do you know if we have an issue opened for that?

Member

@wojtek-t wojtek-t left a comment

I mostly have a couple more cosmetic comments, but I'm generally happy with how it looks now.

{{ if $PROVISION_VOLUME }}
# Provision volumes
- phases:
- namespaceRange:
Member

@krzysied - I was thinking about it in the past, but I don't know if we ever did that (and if I ever shared with you that I was thinking about it).
In the majority of cases, we're setting "namespaceRange" to basically all automanaged namespaces.
What I would like to do is to remove all of those and say:
"if you don't specify namespaceRange, we default to all automanaged namespaces"

Is that done? If not, would you be able to add that (it should be quite simple to do)?

Contributor

if namespaceRange == nil {
	// Returns "" which represents cluster level.
	return []string{""}
}

If namespaceRange is unset, a cluster-level object will be created. IIRC we are not currently using this feature. The question is whether we want to create non-namespaced objects in the future...

Member

Ahh ok - we definitely would like to be able to create non-namespaced objects (e.g. CRDs are non-namespaced).

We should probably come up with some "allNamespaces" thing to reduce the amount of boilerplate...

labelSelector: group = {{$GROUP}}
# TODO decide this after test roll-out phase
timeout: 15m
- measurements:
Member

Similarly to the start measurement, can you also move this one to the bottom?

[As I mentioned somewhere else, medium term, we will simply bake all the necessary thresholds into the measurement itself and we will bundle all SLO-related ones together to limit boilerplate. It will just be simpler to do then :)]

Contributor Author

OK. I think deleting pods before the gather action of the PodStartupLatency measurement has no effect?

Member

The PodStartupLatency measurement can only change when pods are being created - so yes, that will have no effect.

@@ -0,0 +1,3 @@
PROVISION_VOLUME: true
POD_TEMPLATE_PATH: "configmap/pod_with_ConfigMaps.yaml"
VOLUME_TEMPLATE_PATH: "configmap/ConfigMap.yaml"
Member

Do you rely anywhere on the assumption that the file names are capitalized, like ConfigMap? (I don't see that, but maybe I'm missing something.)
If not, it would be more natural to have:
configmap/config_map.yaml
configmap/pod_with_config_map.yaml

[Same for secrets and others]

Contributor Author

@mucahitkurt mucahitkurt Jul 10, 2019

There was, but not any longer. I'll rename them, though volume type names are generally not written as separate words; the usual forms are ConfigMap, EmptyDir, etc. I'll rename them to configmap/configmap.yaml, configmap/pod_with_configmap.yaml, etc.

Member

Sure that sounds fine, thanks!

@mucahitkurt
Contributor Author

mucahitkurt commented Jul 12, 2019

Hi @wojtek-t, @msau42,

Is there anything else I should do with this PR, including the test roll-out phase? I don't know how that will be executed and whether there is anything I can contribute.

I'll open another PR for the ephemeral volume test cases that will be run cluster-wide. Thanks!

@msau42
Member

msau42 commented Jul 12, 2019

/lgtm
Thanks for taking this on! @wojtek-t once this merges, how do you want to proceed on establishing a baseline?

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 12, 2019
@wojtek-t
Member

@msau42

> Thanks for taking this on! @wojtek-t once this merges, how do you want to proceed on establishing a baseline?

I think we should create a test-suite similar to this one:
https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes/sig-scalability/sig-scalability-periodic-jobs.yaml#L2
[It should probably live in the same place; we may want to add it under the experimental tab for sig-scalability.]
After a couple of days, we should go over the tests and see how the results look (it may be useful to expose them in perf-dash; @krzysied may help with that).

Can someone from sig-storage pick it up?

/approve
/hold

Holding to get agreement on next steps - feel free to cancel if you ~agree with the above.

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jul 15, 2019
@mucahitkurt
Contributor Author

mucahitkurt commented Jul 15, 2019

> @msau42
>
> > Thanks for taking this on! @wojtek-t once this merges, how do you want to proceed on establishing a baseline?
>
> I think we should create a test-suite similar to this one:
> https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes/sig-scalability/sig-scalability-periodic-jobs.yaml#L2
> [It should probably live in the same place; we may want to add it under the experimental tab for sig-scalability.]
> After a couple of days, we should go over the tests and see how the results look (it may be useful to expose them in perf-dash; @krzysied may help with that).
>
> Can someone from sig-storage pick it up?
>
> /approve
> /hold
>
> Holding to get agreement on next steps - feel free to cancel if you ~agree with the above.

@wojtek-t, @msau42

I have some questions about the periodic job configuration. We have 2 test cases: one is the max-volumes-per-pod case, the other is the max pod-volume (1-1) per node case. Currently I have one config file, which seems to cover the first test case, and the volumes-per-pod value is 4 (I picked it randomly).

First question: should I create two different override files for the 2 different test cases? Then the periodic job definitions would each use two override files, one specific to the test case (max volumes per pod or max volumes per node) and one for the volume type. So with 4 ephemeral volume types and 2 test cases, there would be 8 periodic job definitions?

Second question: what should my initial values be for the max-volumes-per-pod case (current value is 4) and the max pod-volume (1-1) per node case? The max limit is 100 pods per node - can I use this for both scenarios?

@msau42
Member

msau42 commented Jul 15, 2019

This plan sounds good to me. For max pod-volume, 100 pods with 1 volume each sounds good to me. For max volumes in 1 pod, 100 volumes also sounds fine. Although, for the 2nd case, how is it measured? Does the test only measure 1 pod? Or does it measure multiple pods serially?

@wojtek-t
Member

> First question: should I create two different override files for the 2 different test cases? Then the periodic job definitions would each use two override files, one specific to the test case (max volumes per pod or max volumes per node) and one for the volume type. So with 4 ephemeral volume types and 2 test cases, there would be 8 periodic job definitions?

I think that sounds most reasonable for now.
[Medium term (assuming we don't already - @krzysied, what is the current state?), we should allow running the same test with different overrides as part of the same test job.]

> This plan sounds good to me. For max pod-volume, 100 pods with 1 volume each sounds good to me. For max volumes in 1 pod, 100 volumes also sounds fine. Although, for the 2nd case, how is it measured? Does the test only measure 1 pod? Or does it measure multiple pods serially?

+1
I think that for now we only create 1 pod and measure startup for it. This has the drawback that it's not statistically meaningful. We should try changing it to "create+delete" N times and compute percentiles from that. But I'm fine with moving those changes to a subsequent PR.

@krzysied
Contributor

> First question: should I create two different override files for the 2 different test cases? Then the periodic job definitions would each use two override files, one specific to the test case (max volumes per pod or max volumes per node) and one for the volume type. So with 4 ephemeral volume types and 2 test cases, there would be 8 periodic job definitions?
>
> I think that sounds most reasonable for now.
> [Medium term (assuming we don't already - @krzysied, what is the current state?), we should allow running the same test with different overrides as part of the same test job.]

Yes, currently it requires 8 periodic job definitions. Overrides are passed to the clusterloader as flags, so they apply to the job scope, not the test scope.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 16, 2019
@mucahitkurt
Contributor Author

@wojtek-t @msau42 I added a new override file for the max volumes per node case, and also updated the default volume number to 100.

I'll open a PR to test-infra for the 8 periodic job definitions.
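Since each periodic job pairs a test-case override with a volume-type override (per the discussion above), each job definition in test-infra would end up invoking clusterloader2 with two --testoverrides flags. A rough, hypothetical sketch of the relevant args for one of the 8 jobs - all file paths here are placeholders, not the actual ones from the test-infra PR:

```yaml
# Fragment of a hypothetical periodic job spec (one of the 8 combinations):
args:
- --testconfig=testing/experimental/storage/pod-startup/config.yaml                    # placeholder path
- --testoverrides=testing/experimental/storage/pod-startup/max_volumes_per_pod.yaml    # test-case override (placeholder)
- --testoverrides=testing/experimental/storage/pod-startup/configmap/override.yaml     # volume-type override (placeholder)
```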

…ault value for volumes per pod

Signed-off-by: Mucahit Kurt <[email protected]>
@msau42
Member

msau42 commented Jul 16, 2019

/lgtm
/hold cancel

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jul 16, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msau42, mucahitkurt, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 934323e into kubernetes:master Jul 16, 2019
@wojtek-t
Member

@mucahitkurt - next time please squash the commits before merging
[other than that lgtm]

@mucahitkurt
Contributor Author

> @mucahitkurt - next time please squash the commits before merging
> [other than that lgtm]

OK, sorry about that, I forgot.
