
Storage perf tests decrease volumes and pod number #686

Conversation

@mucahitkurt (Contributor) commented Jul 24, 2019

We want to run our experimental tests with new volume and pod counts:

  • 10 volumes per pod, for the max-volumes-per-pod test case,
  • 25 pods per node, for the max-volumes-per-node test case,

and compare the results with the current test cases' pod_startup times.

Currently, the p90 pod_startup time ranges between 70 and 90 seconds for the max-volumes-per-node case.

For the max-volumes-per-pod case, the p90 pod_startup time ranges between 22 and 24 seconds for configMap and secret volumes, and between 2.5 and 3.75 seconds for emptyDir and Downward API volumes.

/sig storage
cc @wojtek-t
cc @msau42

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 24, 2019
@k8s-ci-robot (Contributor):

Hi @mucahitkurt. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jul 24, 2019
@k8s-ci-robot k8s-ci-robot requested review from krzysied and mborsz July 24, 2019 21:34
@msau42 (Member) commented Jul 24, 2019

/ok-to-test

IIUC:

  • 1 volume per pod, 100 pods: 70-90 seconds on average to start up?
  • 1 pod with 100 volumes: 20-30 seconds to start up?

I'm very surprised that the second case is faster than the multiple-pods case.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 24, 2019
@msau42 (Member) commented Jul 24, 2019

I just looked at the results at http://perf-dash.k8s.io/#/?jobname=storage-max-emptydir-vol-per-pod&metriccategoryname=E2E&metricname=PodStartup&Metric=pod_startup

So those numbers above are for configmap and secret volume types.

Downward API and emptydir are closer to:

  • 70-80 seconds for 100 pods with 1 volume each
  • 2-3 seconds for 1 pod with 100 volumes

@@ -1,2 +1,2 @@
-TOTAL_PODS: 100
+TOTAL_PODS: 25
(Member):

Should we have different configs for different numbers of volumes/pods, so we can compare side by side?

(Contributor Author):

Yes, that would be better; it needs additional override files and job definitions. I can do it if it won't cause any side effects for the test infra. cc @wojtek-t

(Member):

I actually think total pods = 100 should stay, because it's roughly how many pods per node we support. This test case is fairly useful in my opinion.

If we want to test 25 too, this should be a separate config.

(Member):

Agree, I mainly want to test lower numbers for debugging purposes and to find the threshold where pod startup latency starts to fall off the cliff.
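For illustration, a minimal sketch of keeping both pod counts as separate override files, as suggested above (the file names are hypothetical; only the TOTAL_PODS override under review is shown):

```yaml
# Hypothetical file: max_volumes_per_node_100_pods.yaml (current case)
TOTAL_PODS: 100
---
# Hypothetical file: max_volumes_per_node_25_pods.yaml (experimental case)
TOTAL_PODS: 25
```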

@mucahitkurt (Contributor Author) commented Jul 25, 2019

> I just looked at the results at http://perf-dash.k8s.io/#/?jobname=storage-max-emptydir-vol-per-pod&metriccategoryname=E2E&metricname=PodStartup&Metric=pod_startup
>
> So those numbers above are for configmap and secret volume types.
>
> Downward API and emptydir are closer to:
>
>   • 70-80 seconds for 100 pods with 1 volume each
>   • 2-3 seconds for 1 pod with 100 volumes

@msau42 Sorry, I read it wrong for the Downward API and emptyDir; they are under 5 seconds for the 100-volumes-per-pod case, and the tests are green. I edited the description.

@@ -1,2 +1,2 @@
 TOTAL_PODS: 1
-VOLUMES_PER_POD: 100
+VOLUMES_PER_POD: 10
(Member):

I don't have a strong opinion about this one. I would like to avoid having dozens of jobs and growing this matrix of scenarios, but I don't know if we should have 10 or 100 here.

(Member):

@msau42 - WDYT?

(Member):

Realistically, I don't think a single pod will have 100 volumes. 5-10 seems more realistic, but the initial purpose of this experiment is to find our current scaling thresholds.

I think emptydir and downward API volume performance is acceptable. I'm unsure about secrets and configmaps. Again, I want to find the threshold where performance starts to fall off.

(Member):

> I think emptydir and downward API volume performance is acceptable. I'm unsure about secrets and configmaps. Again, I want to find the threshold where performance starts to fall off.

I think at least part of the problem is in QPS limits - we need to GET each secret/configmap from the apiserver before we even start the pod. So with a 5 QPS limit, 100 volumes alone mean ~20 s at start.

I think what we should seriously consider (at least as an experiment) is to bump it to something large (say 100+) and see what happens. I would even say that it's probably the first thing to do.
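To make the arithmetic concrete: at the default client QPS of 5, a pod with 100 secret/configmap volumes needs roughly 100 / 5 = 20 seconds of GET calls before it can start. As a rough sketch of the knob being discussed, assuming it is the kubelet's apiserver client rate limit (the exact mechanism the experiment uses to bump it is not shown in this thread):

```yaml
# KubeletConfiguration sketch - assumes the QPS limit above is the kubelet's
# API client rate limit; values are illustrative, not the tests' actual config.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeAPIQPS: 100    # default was 5; bump to something large (say 100+) as suggested
kubeAPIBurst: 100  # assumption: burst raised alongside QPS
```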

(Contributor Author):

Is it meaningful to include the wait time for configMap and secret volumes to become available in the pod startup time? Should we add some barrier that waits for them to be available before starting pod creation?

@msau42 (Member) commented Jul 25, 2019

@mucahitkurt this initial data is very useful! As we start to change/add new thresholds, can we keep a spreadsheet that summarizes the results?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 25, 2019
@wojtek-t (Member):

> @mucahitkurt this initial data is very useful! As we start to change/add new thresholds, can we keep a spreadsheet that summarizes the results?

+100
I'm not sure what the best format is - it can even be some .md file somewhere (maybe even in this repo) - I guess I can live with anything.
But before we make this change, we definitely need to have this data somewhere.

@mucahitkurt (Contributor Author):

> @mucahitkurt this initial data is very useful! As we start to change/add new thresholds, can we keep a spreadsheet that summarizes the results?

> +100
> I'm not sure what the best format is - it can even be some .md file somewhere (maybe even in this repo) - I guess I can live with anything.
> But before we make this change, we definitely need to have this data somewhere.

I'm looking into how I can do this

@mucahitkurt force-pushed the storaga/scalability/decrease-pod-vol-number-ephemeral-vol-tests branch from 2b06b07 to 672bfd7 on July 26, 2019 23:35
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 26, 2019
@mucahitkurt (Contributor Author):

@wojtek-t, @msau42

For the test results data, I wrote a script like this and fetched the test results for the ephemeral volume pod_startup latencies. I created a JSON file with the results and added it to the PR; I thought we could use it easily for further analysis. Its size is 1 MB - I'm not sure whether that's a problem or not.

I created a different override file for the PODS_PER_NODE: 25 case.

I'm not sure about the final decision on the VOLUMES_PER_POD: 10 case - should we add a different override file or change the existing one, or should we test the 100+ volumes case first, as @wojtek-t mentioned?

@mucahitkurt force-pushed the storaga/scalability/decrease-pod-vol-number-ephemeral-vol-tests branch from 672bfd7 to fd6d4fd on July 31, 2019 20:38
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 31, 2019
number-of-node: 1
volume-types:
volume-type: configmap
before-qps-limit-bump: true
(Member):

Instead of "before-qps-limit-bump" I would add a specific value for qps-limit (the default is 5; I bumped it to 100).

The reason is that we may want to experiment with other values too.

(Contributor Author):

100 or 10?

perc50: 22240.349315
perc90: 22240.349315
perc99: 22240.349315
test-artifacts: https://gcsweb.k8s.io/gcs/kubernetes-jenkins/logs/ci-kubernetes-storage-scalability-max-configmap-vol-per-pod/1156085407868260353/artifacts/
(Member):

This one is tricky, because this dir (depending on the time) contains results from before the QPS bump, from after it, and (once new changes appear) from additional configurations.

So technically it would be more correct to move it to the level of "volume-type". So what I have in mind is this structure:

volume-types:
- volume-type: configmap
  test-artifacts: ...
  results:
    param1: value1 (e.g. qps)
    param2: value2
    pod-startup:

(Contributor Author):

Do you mean the test-artifacts link should point to all test run artifacts for the volume type? Actually, I take a specific test run for the percentiles, because it seems that most of them have almost identical results, and the artifact link points to that build, i.e. a specific test run.

So if taking one test run instead of an average of many is not correct, could it look like the below?

- volume type: configmap
  results:
    test-artifacts: ...
      param1: value1 (e.g. qps)
      param2: value2
      pod-startup:

@mucahitkurt (Contributor Author), Aug 1, 2019:

I think I got it wrong. I'll take the volume type as the root and add different test scenario results under it, like below:

pod-with-volumes-startup-latency-scenarios:
  volume-type: configmap
    test-artifacts: https://gcsweb.k8s.io/gcs/kubernetes-jenkins/logs/ci-kubernetes-storage-scalability-max-configmap-vol-per-pod/
      results:
        scenario-1:
          number-of-volumes-per-pod: 100
          number-of-pods-per-node: 1
          number-of-node: 1
          qps-limit: 5
          pod-startup:
            perc50: 22240.349315
            perc90: 22240.349315
            perc99: 22240.349315
        scenario-2:
          number-of-volumes-per-pod: 100
          number-of-pods-per-node: 1
          number-of-node: 1
          qps-limit: 10
          pod-startup:
            perc50: 2205.065107
            perc90: 2205.065107
            perc99: 2205.065107

(Member):

Yes - that's exactly what I had on my mind - sorry for not being clear

@mucahitkurt force-pushed the storaga/scalability/decrease-pod-vol-number-ephemeral-vol-tests branch from fd6d4fd to 753a6a6 on August 1, 2019 20:46
number-of-volumes-per-pod: 100
number-of-pods-per-node: 1
number-of-node: 1
qps-limit: 10
(Member):

100

Let me clarify, because in fact I was changing two different things separately:

These two things are mixed together in this file.

@@ -0,0 +1,170 @@
+pod-with-volumes-startup-latency-scenarios:
+volume-type: configmap
(Member):

nit: is this valid YAML?

I think that for slices/arrays you have to use "-" in YAML, i.e. "- volume-type".

[not that this is particularly important]

(Contributor Author):

Yes, that's not valid; I'll try to create a valid one. Thanks for the heads-up!
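For reference, one way the same data could be expressed as valid YAML, with sequences introduced by "-" (a sketch only; field names and values are copied from the draft above, and the scenario keys are turned into list entries):

```yaml
pod-with-volumes-startup-latency-scenarios:
- volume-type: configmap
  test-artifacts: https://gcsweb.k8s.io/gcs/kubernetes-jenkins/logs/ci-kubernetes-storage-scalability-max-configmap-vol-per-pod/
  results:
  - scenario: 1
    number-of-volumes-per-pod: 100
    number-of-pods-per-node: 1
    number-of-node: 1
    qps-limit: 5
    pod-startup:
      perc50: 22240.349315
      perc90: 22240.349315
      perc99: 22240.349315
  - scenario: 2
    number-of-volumes-per-pod: 100
    number-of-pods-per-node: 1
    number-of-node: 1
    qps-limit: 10
    pod-startup:
      perc50: 2205.065107
      perc90: 2205.065107
      perc99: 2205.065107
```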

@mucahitkurt force-pushed the storaga/scalability/decrease-pod-vol-number-ephemeral-vol-tests branch from 753a6a6 to 9754eef on August 2, 2019 19:13
- number-of-volumes-per-pod: 1
number-of-pods-per-node: 100
number-of-node: 1
kube-api-qps-limit: 100
(Member):

We also switched from generating pods at ~100 pods/s to generating them at 10 pods/s.

I think this should also be captured in this file.

(Contributor Author):

Agreed, I added that info as qps-pod-throughput.
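A sketch of what one scenario entry might look like with the pod-creation throughput recorded next to the QPS limit (values are taken from the excerpt and comments above; the final schema is whatever landed in the PR):

```yaml
- number-of-volumes-per-pod: 1
  number-of-pods-per-node: 100
  number-of-node: 1
  kube-api-qps-limit: 100
  qps-pod-throughput: 10   # pods/s; the jobs previously generated ~100 pods/s
```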

@mucahitkurt force-pushed the storaga/scalability/decrease-pod-vol-number-ephemeral-vol-tests branch from 9754eef to 34aa066 on August 5, 2019 19:01
@@ -0,0 +1,3 @@
+PODS_PER_NODE: 25
(Member):

(Contributor Author):

Added

@wojtek-t (Member) commented Aug 6, 2019

one last comment - lgtm once applied

thanks!

@mucahitkurt force-pushed the storaga/scalability/decrease-pod-vol-number-ephemeral-vol-tests branch from 34aa066 to 8f4d242 on August 6, 2019 14:56
@mucahitkurt (Contributor Author):

@wojtek-t Should I wait for the test-suite implementation before adding this test case to the periodic jobs? Otherwise I'd need to define 4 additional jobs (I think the PVC side is not activated yet).

@wojtek-t (Member) commented Aug 6, 2019

> @wojtek-t Should I wait for the test-suite implementation before adding this test case to the periodic jobs? Otherwise I'd need to define 4 additional jobs (I think the PVC side is not activated yet).

Let's wait - I think we're pretty close.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 6, 2019
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mucahitkurt, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 6, 2019
@k8s-ci-robot k8s-ci-robot merged commit 86e6632 into kubernetes:master Aug 6, 2019
@mucahitkurt mucahitkurt deleted the storaga/scalability/decrease-pod-vol-number-ephemeral-vol-tests branch August 7, 2019 17:58