add persistent volume scale tests #657
Conversation
Hi @hantaowang. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
/hold
Force-pushed from e01d378 to b6f9857
Which tuning set do you recommend? And for the parallelism-limited tuning set, what parallelism limit should I set? I wonder if setting it equal to the number of namespaces is reasonable.
/assign @msau42
{{$PVC_PER_POD := DefaultParam .PVC_PER_POD 2}}
{{$ENABLE_CHAOSMONKEY := DefaultParam .ENABLE_CHAOSMONKEY false}}
{{$TEST_NAME := DefaultParam .TEST_NAME "generic-test"}}
{{$PVC_SIZE := DefaultParam .PVC_SIZE "8Gi"}} |
Why did we pick 8Gi?
It's a nice power of 2 :)
Honestly, there is no reason behind it other than it seeming like a reasonable default for a single mid-sized PV.
{{$TEST_NAME := DefaultParam .TEST_NAME "generic-test"}}
{{$PVC_SIZE := DefaultParam .PVC_SIZE "8Gi"}}
{{$PER_PVC_BOUND_SLO := DefaultParam .PVC_BOUND_SLO 2}}
{{$POD_STARTUP_SLO := DefaultParam .POD_STARTUP_SLO 100}} |
Why default to 100 and then set it to 300 in every test?
I changed it, but this number is not final until I figure out a reasonable SLO for each test. I override it to 300 because that seemed reasonable enough as an upper bound for testing purposes.
{{$ENABLE_CHAOSMONKEY := DefaultParam .ENABLE_CHAOSMONKEY false}}
{{$TEST_NAME := DefaultParam .TEST_NAME "generic-test"}}
{{$PVC_SIZE := DefaultParam .PVC_SIZE "8Gi"}}
{{$PER_PVC_BOUND_SLO := DefaultParam .PVC_BOUND_SLO 2}} |
Is this in seconds? 2 seconds for binding seems low. Not sure if this includes provisioning time or not, though.
Provisioning is done in parallel to a degree. 2 seconds is low if you are not provisioning that many, which is why I use a PVCBoundTime with a minimum of 60 seconds.
Sorry, I'm still confused about the meaning of this parameter and why it is set to 2.
It just means that because the backend can only provision a certain number of PVs at a time, provisioning is serial to a degree, so I set an SLO for the per-PV provision time of 2 seconds. So 100 PVs have a total SLO of 2 * 100 = 200 seconds.
I think 2 seconds is too fast, especially while we're in an experimental phase and are testing our limits. Let's do something like 30 seconds?
We're also limited by the QPS of the k8s API server and the GCE API.
templateFillMap:
PVCsPerPod: {{$PVC_PER_POD}}
- measurements:
- Identifier: WaitForRunningPVCOwningPods |
Naming nit: s/PVCOwningPods/PodsWithPVC?
ack
Params:
action: start
nodeMode: {{$NODE_MODE}}
# Create PVCs |
I'm wondering if we really need this file.
This test is extremely similar to: https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/experimental/storage/pod-startup/ephemeral-volumes/cluster-wide/config.yaml
The only difference is that instead of creating ephemeral storage we create PVCs and wait for them.
But it should be possible to unify them (if you wait for PVCs in the ephemeral-storage scenario, it will be a no-op anyway).
So instead of having two separate files to maintain, I would much prefer unifying those two - I don't think the differences justify having a separate one.
@mucahitkurt - cc'ing you as the one who added the test I mentioned above.
I would be happy to merge the two configs (in a separate PR). There definitely are similarities, although I think the PV config will be a little more complex due to the WaitForBoundPVCs step and such.
I think the WaitForBoundPVCs is the only difference. And just adding it won't break ephemeral volumes, because there simply won't be anything to wait for there.
So unless I'm missing something, I'm against having two separate configs and merging them later - we should just do the right thing from the beginning.
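For illustration only - a minimal sketch of what such a shared wait step could look like, following the measurement style quoted elsewhere in this PR. The identifier comes from the discussion above, but the parameter names and template variables here are assumptions, not the exact ones in the config:

```yaml
# Sketch: wait for PVCs to bind; with zero PVCs (ephemeral variant) this is a no-op.
- measurements:
  - Identifier: WaitForBoundPVCs
    Method: WaitForBoundPVCs
    Params:
      desiredPVCCount: {{$totalPVCs}}     # 0 in the ephemeral-volume variant
      labelSelector: group = volume
      timeout: {{$PVC_BOUND_TIME}}s
```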
I would much prefer the base config to be the one I wrote. There is a difference in that my config has an option to scale automatically based on the number of nodes, and overrides are used for different test variations (defined by the KEP), whereas @mucahitkurt's work operates similarly to the load and density configs, where overrides are used to define the size of the cluster.
It's OK, in my opinion, not to merge them yet, but it would take some time to work together and refactor how the configs work. Ideally, it should be the same base, with maybe a flag for PV vs. ephemeral.
@mucahitkurt's work operates similarly to the load and density configs, where overrides are used to define the size of the cluster.
Sorry if I got it wrong, but I actually don't use overrides for the cluster size. For the 1_node and cluster-wide cases, I use different configs under 1_node and cluster-wide. I started with one config but got feedback on the related PR to separate the configs for those cases for the sake of simplicity over duplication. But of course we can consider that the cluster-wide config already covers the 1-node case, as you did. I think it would be better to use the same convention: either one config or two configs.
I'm not sure if it will cause some limitations, but maybe it would be better if we could get rid of the root ephemeral-volumes directory and use the PVC as a different volume type.
I'm not sure if it will cause some limitations, but maybe it would be better if we could get rid of the root ephemeral-volumes directory and use the PVC as a different volume type.
Yes - we probably should.
I would much prefer the base config to be the one I wrote. There is a difference in that my config has an option to scale automatically based on the number of nodes, and overrides are used for different test variations (defined by the KEP), whereas @mucahitkurt's work operates similarly to the load and density configs, where overrides are used to define the size of the cluster.
I'm not trying to say you can't modify @mucahitkurt's config - we definitely need to, and I'm completely fine with modifying things there.
I'm also fine with replacing the one added by @mucahitkurt with this one (though I would need to review it more carefully at that point). I just don't see any reason to have both at this point.
With #688 merged, I think this file is no longer needed - that one just requires changes. Can you please update this PR?
ack
Force-pushed from b6f9857 to 202ef25
{{$PVC_SIZE := DefaultParam .PVC_SIZE "8Gi"}}
{{$PER_PVC_BOUND_SLO := DefaultParam .PVC_BOUND_SLO 2}}
{{$POD_STARTUP_SLO := DefaultParam .POD_STARTUP_SLO 300}}
{{$OBJ_CREATION_QPS := DefaultParam .OBJ_CREATION_QPS 10}} |
What is this for?
OBJ_CREATION_QPS is a configurable variable that describes how fast to create API objects. The default value of 10 means objects are created at 10 queries per second. It is used to configure the UniformQPS tuning set (the component responsible for creating k8s objects).
@msau42 @wojtek-t any thoughts on what the most appropriate tuning set to test with is? I am actually not sure whether the QPS applies within a single namespace or across all namespaces.
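For context, a minimal sketch of how a tuning set consumes this value - the UniformQPS form plus a parallelism-limited alternative. The field names follow common clusterloader2 configs and are assumptions here, not a quote from this PR's file:

```yaml
tuningSets:
# Create objects at a flat rate for the whole phase.
- name: UniformQPS
  qpsLoad:
    qps: {{$OBJ_CREATION_QPS}}
# Alternative: bound how many objects are being created concurrently.
- name: ParallelismLimited
  parallelismLimitedLoad:
    parallelismLimit: {{$PARALLELISM_LIMITED}}
```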
So it limits how many objects in total are created in one second, including PVCs/PVs, Pods, etc.?
Yes, that's right.
Force-pushed from ac6bbab to e47a113
Force-pushed from e47a113 to 67c668d
Force-pushed from 67c668d to 9fb4f66
Rebased and ready for review.
A high-level comment is that we don't want to expose too many knobs - it's just much harder to follow, and if no one will be using them it simply doesn't make sense; we can always extend the test.
{{$ENABLE_CHAOSMONKEY := DefaultParam .ENABLE_CHAOSMONKEY false}}
{{$TEST_NAME := DefaultParam .TEST_NAME "generic-test"}}
{{$GROUP := "pod-with-volumes-startup-latency"}}
{{$TUNING_SET := DefaultParam .TUNING_SET "UniformQPS"}} |
Let's remove this - I don't think we want to allow such significant configuration
Most are removed; I left test-name, but I can remove that too.
{{$VOLUMES_PER_POD := .VOLUMES_PER_POD}}
{{$VOLUME_TEMPLATE_PATH := .VOLUME_TEMPLATE_PATH}}
{{$PROVISION_VOLUME := DefaultParam .PROVISION_VOLUME false}}
{{$VOL_SIZE := DefaultParam .VOL_SIZE "8Gi"}} |
Do we really want to expose an option for it? I vote for being opinionated and setting it just to 8G.
I spoke with the team, and we want to keep this knob because different storage options have different minimum sizes (and will fail if you try to provision less). Also, different sizes have different provision times (see the sketch after this list):
Zonal PV: min 5G
Regional PV: min 200G
Filestore: min 1T
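As a hypothetical example of how the knob would be used, an overrides file for a regional-PD run could simply raise the size above that backend's minimum (the file name and value are illustrative assumptions, not part of this PR):

```yaml
# overrides/regional_pd.yaml (hypothetical)
# Regional PDs reject volumes below their minimum, so override the 8Gi default.
VOL_SIZE: "200Gi"
```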
This discussion also brought up an interesting point. We are not measuring provisioning time because we said that is bootstrapping behavior that only happens the first time.
We also format the filesystem the first time we mount the volume on any node and that is something this test will currently measure during pod startup. Is it something we want to include? If not, then we would need to bring up pods twice and only measure the second time.
Great point - I opened #706 to not block this PR on it.
{{$GROUP := "pod-with-ephemeral-volume-startup-latency"}}
# SLO Variables
{{$POD_STARTUP_SLO := DefaultParam .POD_STARTUP_SLO 300}}
{{$PER_PVC_BOUND_SLO := DefaultParam .PVC_BOUND_SLO 2}} |
We don't have any SLO for PVC binding - let's not call it that.
PVC_BINDING_TIME, maybe?
The other comment - do we really need it? Can't we hardcode it?
Depending on the PV storage option, hard coding will break the tests. Filestore takes forever to provision.
{{$PARALLELISM_LIMITED := DefaultParam .PARALLELISM_LIMITED 1}}
{{$ENABLE_CHAOSMONKEY := DefaultParam .ENABLE_CHAOSMONKEY false}}
{{$TEST_NAME := DefaultParam .TEST_NAME "generic-test"}}
{{$GROUP := "pod-with-volumes-startup-latency"}} |
Let's not introduce unnecessary things.
Please use a shorter name - group = volume seems good enough - and hardcode it where necessary.
ack
{{$appName := .AppName}}
{{$volumesPerPod := .VolumesPerPod}}
apiVersion: v1
kind: Pod |
I think the same concern about using raw pods instead of a controller is valid for this one.
can you wait on that PR until this one is merged?
I agree - let's have this one merged and then migrate to deployments all in one shot.
name: pod-with-ephemeral-volume-startup-latency
name: pv-storage-{{$TEST_NAME}}-{{$VOLUME_TYPE}}
Since pv doesn't cover ephemeral, it might be better to just start with storage- or volume-.
ack, set to storage-
Will squash later.
{{$WAIT_FOR_PVC := DefaultParam .WAIT_FOR_PVC false}}
{{$POD_STARTUP_TIMEOUT := DefaultParam .POD_STARTUP_TIMEOUT "15m"}}
{{$POD_STARTUP_SLO := DefaultParam .POD_STARTUP_SLO 300}}
{{$PVC_BINDING_TIME := DefaultParam .PVC_BOUND_SLO 2}} |
Do we really need to expose this?
I can remove this knob for now, but we may need it later if we do other types of PD.
VolumesPerPod: {{$VOLUMES_PER_POD}}
AppName: {{$APP_NAME}}
AppName: Sequence |
What is this AppName used for?
Can't we just use Name instead of it?
Ack, this shouldn't even be here.
{{$appName := .AppName}}
{{$volumesPerPod := .VolumesPerPod}}
apiVersion: v1
kind: Pod |
I agree - let's have this one merged and then migrate to deployments all in one shot.
requests:
storage: {{.VolSize}}
persistentVolumeReclaimPolicy: Delete
volumeBindingMode: Immediate |
nit: add a newline at end of file
Reclaim policy and binding mode are fields on the StorageClass, not the PVC.
Until we have a way to measure and subtract out provisioning time, a prerequisite for running this test with the default StorageClass is that the default StorageClass has binding mode = Immediate.
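To illustrate where those fields actually live, a StorageClass satisfying the stated prerequisite might look like the minimal sketch below; the class name and provisioner are assumptions, not part of this PR:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: immediate-binding-pd       # hypothetical name
provisioner: kubernetes.io/gce-pd  # in-tree GCE PD provisioner
reclaimPolicy: Delete              # set on the StorageClass, not the PVC
volumeBindingMode: Immediate       # prerequisite called out above
```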
I removed those two lines and added an assumption to the top.
Force-pushed from 48eef14 to 69472bc
Force-pushed from 69472bc to 166fffe
@wojtek-t I've squashed the commits because I think this is in a good place to merge now. Please give your final comments or approval.
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: hantaowang, wojtek-t
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Experimental storage tests that measure the startup latency of pods booted with PVs. I tried to stay as close to @msau42's KEP as possible.
This PR depends on functionality from #656. I have it open so we can parallelize the review for both.
1_node_scale_by_pod
This test runs on 1 node, 100 pods per node, and 1 PV per pod. This stresses # of PVs per node and # of pods per node.
1_node_scale_by_pvcs_per_pod
This test runs on 1 node, 1 pod per node, and 100 PVs per pod. This stresses # of PVs per node and # of PVs per pod.
cluster_load_scale_by_nodes
This test runs on X nodes (X from 1 to 5000+). It has a fixed 10 pods per node and 1 PV per pod. This test simulates a realistic large cluster with a normal amount of PVs per node.
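As a rough illustration, an overrides file for this variant might pin the per-node load while the node count comes from the cluster itself; the variable names below are assumptions based on the config snippets quoted earlier, not the exact keys in this PR:

```yaml
# overrides/cluster_load_scale_by_nodes.yaml (hypothetical)
PODS_PER_NODE: 10      # fixed pod density per node
VOLUMES_PER_POD: 1     # one PV-backed volume per pod
POD_STARTUP_SLO: 300   # seconds; placeholder until a real SLO is chosen
```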
TODO (In future PR):
/assign @wojtek-t @davidz627
cc @msau42 @jingxu97 @verult