
add persistent volume scale tests #657

Merged
merged 1 commit into kubernetes:master from hantaowang:pv-storage-tests on Jul 30, 2019

Conversation

hantaowang

@hantaowang hantaowang commented Jul 19, 2019

Experimental storage tests that measure the startup latency of pods booting with PVs. I tried to stay as close to @msau42's KEP as possible.

This PR depends on functionality from #656. I have it open so we can parallelize the review for both.

1_node_scale_by_pod
This test runs on 1 node, 100 pods per node, and 1 PV per pod. This stresses # of PVs per node and # of pods per node.

1_node_scale_by_pvcs_per_pod
This test runs on 1 node, 1 pod per node, and 100 PVs per pod. This stresses # of PVs per node and # of PVs per pod.

cluster_load_scale_by_nodes
This test runs on X nodes (X from 1 to 5000+). It has a fixed 10 pods per node and 1 PV per pod. This test simulates a realistic large cluster with a normal amount of PVs per node.
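
For illustration, each variant would be driven by a small overrides file layered on a shared config; a hypothetical sketch for 1_node_scale_by_pod, using the DefaultParam variable names quoted in the review below (PODS_PER_NODE is an assumed key name, not taken from the merged file):

# 1_node_scale_by_pod overrides (sketch, not the merged file)
PODS_PER_NODE: 100      # assumed key name
PVC_PER_POD: 1
PVC_SIZE: "8Gi"
POD_STARTUP_SLO: 300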

TODO (In future PR):

  • Implement the ability to boot the GCE PD CSI Driver to use as the storage backend.
  • Write a test that boots a large cluster with a large number of PVs, and then boots a few more pods with PVs in this saturated cluster.

/assign @wojtek-t @davidz627
cc @msau42 @jingxu97 @verult

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 19, 2019
@k8s-ci-robot
Contributor

Hi @hantaowang. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 19, 2019
@k8s-ci-robot k8s-ci-robot requested review from oxddr and wojtek-t July 19, 2019 00:15
@davidz627

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 19, 2019
@hantaowang
Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 19, 2019
@hantaowang
Author

Which tuning set do you recommend? And for the parallelism-limited tuning set, what parallelism limit should I set? I wonder if setting it equal to the number of namespaces is reasonable.
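
For context, a parallelism-limited tuning set in clusterloader2 is declared roughly as below; this is only a sketch of the idea floated above (a limit equal to the number of namespaces), and the $namespaces variable name is assumed:

tuningSets:
- name: ParallelismLimited
  parallelismLimitedLoad:
    parallelismLimit: {{$namespaces}}  # set equal to the number of namespaces, per the question above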

@davidz627

/assign @msau42

{{$PVC_PER_POD := DefaultParam .PVC_PER_POD 2}}
{{$ENABLE_CHAOSMONKEY := DefaultParam .ENABLE_CHAOSMONKEY false}}
{{$TEST_NAME := DefaultParam .TEST_NAME "generic-test"}}
{{$PVC_SIZE := DefaultParam .PVC_SIZE "8Gi"}}


Why did we pick 8Gi?

Author

It's a nice factor of 2 :)

Honestly, there is no particular reason behind it other than it seeming like a reasonable default for a single mid-sized PV.

{{$TEST_NAME := DefaultParam .TEST_NAME "generic-test"}}
{{$PVC_SIZE := DefaultParam .PVC_SIZE "8Gi"}}
{{$PER_PVC_BOUND_SLO := DefaultParam .PVC_BOUND_SLO 2}}
{{$POD_STARTUP_SLO := DefaultParam .POD_STARTUP_SLO 100}}


Why default to 100 and then set it to 300 in every test?

Author

I changed it, but this number is not final until I figure out a reasonable SLO for each test. I override it to 300 because that seemed like a reasonable upper bound for testing purposes.

{{$ENABLE_CHAOSMONKEY := DefaultParam .ENABLE_CHAOSMONKEY false}}
{{$TEST_NAME := DefaultParam .TEST_NAME "generic-test"}}
{{$PVC_SIZE := DefaultParam .PVC_SIZE "8Gi"}}
{{$PER_PVC_BOUND_SLO := DefaultParam .PVC_BOUND_SLO 2}}


Is this in seconds? 2 seconds for binding seems low. I'm not sure whether this includes provisioning time, though.

Author
@hantaowang hantaowang Jul 23, 2019

Provisioning is done in parallel to a degree. 2 seconds per PVC is low if you are only provisioning a few, which is why I use a PVCBoundTime of at least 60 seconds.


Sorry, I'm still confused about the meaning of this parameter and why it is set to 2.

Author

It just means that, because the backend can only provision a certain number of PVs at a time and is therefore partly serial, I set the per-PV provision SLO at 2 seconds. So 100 PVs have a total SLO of 2 * 100 = 200 seconds.
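
That arithmetic could be expressed directly in the config; a minimal sketch, assuming clusterloader2's MultiplyInt and MaxInt template helpers are available, and using $PER_PVC_BOUND_SLO and $PVC_PER_POD from the snippets quoted in this review ($PODS_PER_NODE, $totalPVCs, and $pvcBoundTimeout are assumed names):

{{$totalPVCs := MultiplyInt $PODS_PER_NODE $PVC_PER_POD}}
# 60 second floor for small counts, otherwise per-PVC budget times the PVC count (e.g. 2 * 100 = 200s)
{{$pvcBoundTimeout := MaxInt 60 (MultiplyInt $totalPVCs $PER_PVC_BOUND_SLO)}}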

Member

I think 2 seconds is too fast, especially while we're in an experimental phase and are testing our limits. Let's do something like 30 seconds?

We're also limited by the QPS of the k8s API server and the GCE API.

templateFillMap:
PVCsPerPod: {{$PVC_PER_POD}}
- measurements:
- Identifier: WaitForRunningPVCOwningPods


naming nit: s/PVCOwningPods/PodsWithPVC?

Author

ack

Params:
action: start
nodeMode: {{$NODE_MODE}}
# Create PVCs
Member

I'm wondering if we really need this file.
This test is extremely similar to: https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/experimental/storage/pod-startup/ephemeral-volumes/cluster-wide/config.yaml

The only difference is that instead of creating ephemeral storage we create PVCs and wait for them.
But it should be possible to unify them (if you wait for PVCs in the ephemeral-storage scenario, it will be a no-op anyway).

So instead of having two separate files to maintain, I would much rather unify the two - I don't think the differences justify maintaining a separate one.

@mucahitkurt - as the one who added the test I mentioned above.

Author

I would be happy to merge the two configs (in a separate PR). There definitely are similarities, although I think the PV config will be a little more involved due to WaitForBoundPVCs and such.

Member

I think the WaitForBoundPVCs is the only difference. And just adding it won't break ephemeral volumes, because there simply won't be anything to wait for.
So unless I'm missing something, I'm against having two separate configs and merging them later - we should just do the right thing from the beginning.
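
A rough sketch of that unification as a guarded step in the shared config; WAIT_FOR_PVC and WaitForBoundPVCs are names that appear elsewhere in this review, while the Params shown and the $PVC_BOUND_TIMEOUT variable are assumptions, not the merged file:

{{if $WAIT_FOR_PVC}}
- measurements:
  - Identifier: WaitForBoundPVCs
    Method: WaitForBoundPVCs
    Params:
      labelSelector: group = volume      # assumed selector
      timeout: {{$PVC_BOUND_TIMEOUT}}s   # assumed variable name
{{end}}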

Author

I would much prefer the base config be the one I wrote. There is a difference in that my config has an option to scale automatically based on # of nodes and overrides is used for different test variations (defined by the KEP), whereas @mucahitkurt’s work operates similar to configs for load and density where overrides is used to define the size of the cluster.

It's OK IMO to not merge them yet, but it would take some time to work together and refactor how the configs work. Ideally, it should be the same base, with maybe a flag for PV vs ephemeral.

Contributor
@mucahitkurt mucahitkurt Jul 24, 2019

@mucahitkurt’s work operates similar to configs for load and density where overrides is used to define the size of the cluster.

Sorry if I got it wrong, but I actually don't use overrides for the cluster size. For the 1_node and cluster-wide cases I use different configs under 1_node and cluster-wide; I started with one config but got feedback on the related PR to separate the configs for those cases for the sake of simplicity over duplication. But of course we can consider that the cluster-wide config already covers the 1-node case, as you already did. I think it would be better to use the same convention: one config or two configs.

Contributor

I'm not sure whether it will cause some limitations, but maybe it would be better if we could get rid of the root ephemeral-volumes directory and use the PVC as a different volume type.

Member

I'm not sure whether it will cause some limitations, but maybe it would be better if we could get rid of the root ephemeral-volumes directory and use the PVC as a different volume type.

Yes - we probably should.

I would much prefer the base config be the one I wrote. There is a difference in that my config has an option to scale automatically based on # of nodes and overrides is used for different test variations (defined by the KEP), whereas @mucahitkurt’s work operates similar to configs for load and density where overrides is used to define the size of the cluster.

I'm not trying to say you can't modify @mucahitkurt's config - we definitely need to, and I'm completely fine with modifying things there.
I'm also fine with replacing the one added by @mucahitkurt with this one (though I would need to review it more carefully at that point). I just don't see any reason to have both at this point.

Member

With #688 merged, I think this file is no longer needed - just that one requires changes. Can you please update this PR?

Author

ack

{{$PVC_SIZE := DefaultParam .PVC_SIZE "8Gi"}}
{{$PER_PVC_BOUND_SLO := DefaultParam .PVC_BOUND_SLO 2}}
{{$POD_STARTUP_SLO := DefaultParam .POD_STARTUP_SLO 300}}
{{$OBJ_CREATION_QPS := DefaultParam .OBJ_CREATION_QPS 10}}


What is this for?

Author

OBJ_CREATION_QPS is a configurable variable that controls how fast API objects are created. The default value of 10 QPS means objects are created at 10 queries per second. It is used to configure the UniformQPS tuning set (the component responsible for pacing the creation of k8s objects).

@msau42 @wojtek-t any thoughts on what the most appropriate tuning set to test with is? I am actually not sure whether the QPS applies within a single namespace or across all namespaces.


So it limits how many objects in total are created per second, including PVCs/PVs, Pods, etc.?

Author

Yes, that's right.
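
Concretely, this knob feeds the tuning set definition; a minimal sketch of a UniformQPS tuning set wired to OBJ_CREATION_QPS (the qpsLoad form is my understanding of clusterloader2's config, so treat this as a sketch rather than the merged file):

tuningSets:
- name: UniformQPS
  qpsLoad:
    qps: {{$OBJ_CREATION_QPS}}  # total object-creation rate across PVCs, Pods, etc.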

@hantaowang hantaowang force-pushed the pv-storage-tests branch 2 times, most recently from ac6bbab to e47a113 Compare July 25, 2019 00:58
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 25, 2019
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 25, 2019
@hantaowang
Author

Once #656 is merged, please look at #688 so I can rebase this PR to use the same config as the ephemeral tests (although I expect there will be one more PR after #688 to change that config into something that is more of a middle ground between the current one and mine).

@hantaowang
Author

hantaowang commented Jul 25, 2019

rebased and ready for review
/hold cancel
I incorporated some aspects of my original config that I felt were improvements.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 25, 2019
Member
@wojtek-t wojtek-t left a comment

High-level comment: we don't want to expose too many knobs - it's just much harder to follow, and if no one will be using them it simply doesn't make sense - we can always extend the test.

{{$ENABLE_CHAOSMONKEY := DefaultParam .ENABLE_CHAOSMONKEY false}}
{{$TEST_NAME := DefaultParam .TEST_NAME "generic-test"}}
{{$GROUP := "pod-with-volumes-startup-latency"}}
{{$TUNING_SET := DefaultParam .TUNING_SET "UniformQPS"}}
Member

Let's remove this - I don't think we want to allow such significant configuration

Author

Most are removed; I left test-name, but I can remove that too.

{{$VOLUMES_PER_POD := .VOLUMES_PER_POD}}
{{$VOLUME_TEMPLATE_PATH := .VOLUME_TEMPLATE_PATH}}
{{$PROVISION_VOLUME := DefaultParam .PROVISION_VOLUME false}}
{{$VOL_SIZE := DefaultParam .VOL_SIZE "8Gi"}}
Member

Do we really want to expose an option for it? I vote for being opinionated and setting it just to 8G.

Author

Spoke with the team, and we want to keep this knob because different storage options have different minimum sizes (provisioning will fail if you try to provision less). Also, different sizes have different provision times.

Zonal PV: min 5G
Regional PV: min 200G
Filestore: min 1T
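
Given those minimums, the knob would be set per backend via overrides; a hypothetical sketch (values taken from the minimums listed above, file split assumed):

# zonal PD overrides (sketch)
VOL_SIZE: "8Gi"
# regional PD overrides (sketch)
VOL_SIZE: "200Gi"
# Filestore overrides (sketch)
VOL_SIZE: "1Ti"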

Member

This discussion also brought up an interesting point. We are not measuring provisioning time because we said that is bootstrapping behavior that only happens the first time.

We also format the filesystem the first time we mount the volume on any node and that is something this test will currently measure during pod startup. Is it something we want to include? If not, then we would need to bring up pods twice and only measure the second time.

Member

Great point - I opened #706 to not block this PR on it.

{{$GROUP := "pod-with-ephemeral-volume-startup-latency"}}
# SLO Variables
{{$POD_STARTUP_SLO := DefaultParam .POD_STARTUP_SLO 300}}
{{$PER_PVC_BOUND_SLO := DefaultParam .PVC_BOUND_SLO 2}}
Member

We don't have any SLO for PVC binding - let's not call it that.
PVC_BINDING_TIME maybe?

The other comment - do we really need it? Can't we hardcode it?

Author

Depending on the PV storage option, hard coding will break the tests. Filestore takes forever to provision.

{{$PARALLELISM_LIMITED := DefaultParam .PARALLELISM_LIMITED 1}}
{{$ENABLE_CHAOSMONKEY := DefaultParam .ENABLE_CHAOSMONKEY false}}
{{$TEST_NAME := DefaultParam .TEST_NAME "generic-test"}}
{{$GROUP := "pod-with-volumes-startup-latency"}}
Member

Let's not introduce unnecessary things.
Please use a shorter name - group = volume seems good enough - and hardcode it where necessary.

Author

ack

{{$appName := .AppName}}
{{$volumesPerPod := .VolumesPerPod}}
apiVersion: v1
kind: Pod
Contributor

I think the same concern about using raw volumes instead of a controller is valid for this one.

Author

can you wait on that PR until this one is merged?

Member

I agree - let's have this one merged and then migrate to deployments all in one shot.


name: pod-with-ephemeral-volume-startup-latency

name: pv-storage-{{$TEST_NAME}}-{{$VOLUME_TYPE}}
Contributor

Since pv doesn't cover ephemeral, it might be better to just start with storage- or volume-.

Author

ack, set to storage-

@hantaowang
Author

will squash later

{{$WAIT_FOR_PVC := DefaultParam .WAIT_FOR_PVC false}}
{{$POD_STARTUP_TIMEOUT := DefaultParam .POD_STARTUP_TIMEOUT "15m"}}
{{$POD_STARTUP_SLO := DefaultParam .POD_STARTUP_SLO 300}}
{{$PVC_BINDING_TIME := DefaultParam .PVC_BOUND_SLO 2}}
Member

Do we really need to expose this?

Author

I can remove this knob for now, but we may need it later if we do other types of PD.

VolumesPerPod: {{$VOLUMES_PER_POD}}
AppName: {{$APP_NAME}}
AppName: Sequence
Member

What is this AppName used for?

Member

Can't we just use Name instead of it?

Author

ack, this shouldn't even be here

{{$appName := .AppName}}
{{$volumesPerPod := .VolumesPerPod}}
apiVersion: v1
kind: Pod
Member

I agree - let's have this one merged and then migrate to deployments all in one shot.

requests:
storage: {{.VolSize}}
persistentVolumeReclaimPolicy: Delete
volumeBindingMode: Immediate
Member

nit: add a trailing newline

Member

Reclaim policy and binding mode are fields on the storageclass, not PVC.

Until we have a way to measure and subtract out provisioning time, a prerequisite of running this test using default storageclass is that the default storageclass needs to have binding mode = immediate.
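
For reference, a minimal StorageClass satisfying that prerequisite could look like the following (the name is hypothetical; the provisioner shown is GCE PD, since that is the backend discussed in this PR):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: immediate-binding-pd   # hypothetical name
provisioner: kubernetes.io/gce-pd
reclaimPolicy: Delete
volumeBindingMode: Immediate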

Author

I removed those two lines and added an assumption to the top.

@hantaowang
Author

@wojtek-t I've squashed the commits because I think this is in a good place to merge now. Please give your final comments or approval.

@hantaowang
Author

/retest

@wojtek-t
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 30, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hantaowang, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 30, 2019
@k8s-ci-robot k8s-ci-robot merged commit c7ed37a into kubernetes:master Jul 30, 2019