
Can the CSI RBD plugin catch up in terms of performance to native CLI calls? #449

Closed
ShyamsundarR opened this issue Jun 27, 2019 · 42 comments
Assignees
Labels
component/rbd (Issues related to RBD), enhancement (New feature or request), Priority-0 (highest priority issue), wontfix (This will not be worked on)

Comments

@ShyamsundarR
Contributor

As part of the discussion in PR #443, it was noted by @dillaman that native CLI calls outperform the plugin by a large factor.

As part of that discussion I noted that the plugin does more work than just an image create, and that there would be Kubernetes factors to consider.

As a result, this issue is opened to help with the analysis of native calls versus calls made by the plugin, to understand and possibly improve the performance of the plugin.

Various experiments may need to be conducted to arrive at an answer, and this issue can hopefully help track the progress.

@ShyamsundarR
Contributor Author

First test: a comparison of the numbers generated using PR #443 (provided in this comment) against the script below, which issues the same nature and order of calls as the plugin.

Script to mimic what the plugin does as of commit hash 59d3365d3b13f37a9b914d59d931c71269a21eec on master: raw-rbd-perf.sh.txt

Note: The script may be a bit rough but works. Execute a create like so: # time ./raw-rbd-perf.sh create 20, follow with a delete like so: # time ./raw-rbd-perf.sh delete 20, and repeat for multiple runs.

Note: The following numbers were generated on the same setup as the numbers in this comment. Further, the script was executed from within the csi-rbdplugin container in the csi-rbdplugin-provisioner-0 pod, to keep conditions as close to the plugin's as possible.

Script times:
Executed 20 creates using the script, Run1 real 0m32.030s Run2 real 0m38.861s
Executed 20 deletes using the script, Run1 real 0m29.651s Run2 real 0m28.819s

Further, the script was modified to not run the creates/deletes in parallel and instead perform them serially (just for a baseline); the following are the results:
Executed 20 serial creates using the script, Run1 real 2m0.101s Run2 real 2m48.814s
Executed 20 serial deletes using the script, Run1 real 4m25.355s Run2 real 3m56.247s

Observations:

  • Delete using the plugin was taking an average of 32 seconds, and using the native CLI/script takes about 29 seconds, so we are very close to what we can achieve for delete
  • Create shows a gap: the plugin takes about 55 seconds for 20 PVC creates while the native CLI/script takes 35 seconds. This possibly needs further analysis. The first order of analysis would be the average time per create by the plugin versus per create by the script, to remove any Kubernetes overheads

@dillaman

RBD deletes are highly dependent on the image size, since they require issuing RADOS deletes for every possible backing object (regardless of whether or not the objects actually exist). For example, with the default 4 MiB object size a 1 GiB image has 256 possible backing objects while a 10 GiB image has 2560. Therefore, a 1 GiB image deletion could be an order of magnitude faster than a 10 GiB image.

Longer term, that is why I am proposing moving long-running operations to a new MGR call so that the CSI can "fire and forget".

In terms of creation, I think if you re-implemented your script using the rados/rbd Python bindings, you would see a huge drop in time. This just brings us back to the previous talks about eventually replacing all CLI calls with golang API binding calls. For larger clusters, just the bootstrap time required for each CLI call (connect to MON, exchange keys, pull maps, etc.) can be the vast majority of the runtime.
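To illustrate the per-call bootstrap cost being discussed, here is a minimal Go sketch that shells out to the rbd CLI once per image, the way the script and the current plugin do. The pool name, monitor address and key file path are placeholders, not values from the tests above.

```go
// Hedged sketch: time N serial `rbd create` CLI calls. Every invocation spawns a
// new process and a new cluster session (MON connect, auth, map fetch) before it
// does any useful work, which is the overhead API bindings can avoid.
package main

import (
	"fmt"
	"log"
	"os/exec"
	"time"
)

func main() {
	const (
		pool     = "replicapool"        // placeholder pool name
		monitors = "192.168.121.5:6789" // placeholder monitor address
		keyfile  = "/etc/ceph/keyfile"  // placeholder key file path
		count    = 20
	)

	start := time.Now()
	for i := 0; i < count; i++ {
		name := fmt.Sprintf("%s/test-img-%d", pool, i)
		cmd := exec.Command("rbd", "create", name, "--size", "1G",
			"--id", "admin", "-m", monitors, "--keyfile", keyfile)
		if out, err := cmd.CombinedOutput(); err != nil {
			log.Fatalf("rbd create %s failed: %v: %s", name, err, out)
		}
	}
	fmt.Printf("%d CLI creates took %s\n", count, time.Since(start))
}
```

Each loop iteration pays the full client bootstrap cost again, which is why per-call time grows with cluster size even though the image create itself is cheap.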

@ShyamsundarR
Contributor Author

With a golang program ceph-golib.go.txt that uses go-ceph, on a Vagrant-based kube+Rook (Ceph) setup on a laptop, the following measurements were taken:

NOTE: timing is coarse as it is based on date output before and after executing the command/script
NOTE: only serial operations were performed (i.e., create/delete one at a time), as I am now going to add goroutine parallelism to this

Test details: 3 runs per test, each run is an iteration of 25 creates or deletes (IOW 25 RBD images created and deleted in the end, including their associated RADOS OMap updates)

go-ceph based times for 3 runs:
Example command: ceph-golib -key=<key> -monitors=<mons> -operation delete -count 25
Create: 5/5/5 (seconds)
Average: 5 seconds
Delete: 26/29/29 (seconds)
Average: 28 seconds

Using the script as provided earlier, invoking the Ceph CLIs:
Example command: raw-rbd-perf.sh create 25 2>&1 > /dev/null
Create: 30/29/33 (seconds)
Average: 31 seconds
Delete: 57/59/56 (seconds)
Average: 57 seconds
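For context, a minimal sketch of the kind of go-ceph flow a test program like ceph-golib.go can use: one connection is opened up front, then images are created serially along with an OMap update. This is not the attached program itself; the pool, OMap object and key names are placeholders.

```go
// Hedged sketch: serial go-ceph image creates plus an OMap bookkeeping update,
// all over a single cluster connection, so per-operation cost excludes bootstrap.
package main

import (
	"fmt"
	"log"

	"github.com/ceph/go-ceph/rados"
	"github.com/ceph/go-ceph/rbd"
)

func main() {
	conn, err := rados.NewConn()
	if err != nil {
		log.Fatal(err)
	}
	// Monitors and keyring come from /etc/ceph/ceph.conf here; alternatively use
	// conn.SetConfigOption("mon_host", ...) and conn.SetConfigOption("key", ...).
	if err := conn.ReadDefaultConfigFile(); err != nil {
		log.Fatal(err)
	}
	if err := conn.Connect(); err != nil {
		log.Fatal(err)
	}
	defer conn.Shutdown()

	ioctx, err := conn.OpenIOContext("replicapool") // placeholder pool
	if err != nil {
		log.Fatal(err)
	}
	defer ioctx.Destroy()

	const count = 25
	for i := 0; i < count; i++ {
		name := fmt.Sprintf("csi-test-%d", i)
		// 1 GiB image, order 22 (4 MiB objects), no extra features.
		if _, err := rbd.Create(ioctx, name, 1<<30, 22); err != nil {
			log.Fatalf("create %s: %v", name, err)
		}
		// Mimic the plugin's RADOS OMap bookkeeping with a single key update.
		omap := map[string][]byte{"csi.volume." + name: []byte(name)}
		if err := ioctx.SetOmap("csi.volumes.default", omap); err != nil {
			log.Fatalf("omap update for %s: %v", name, err)
		}
	}
	// Deletes would use rbd.GetImage(ioctx, name).Remove() plus removal of the
	// matching OMap key.
	fmt.Printf("created %d images\n", count)
}
```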

@ShyamsundarR
Contributor Author

On the same setup as the above comment, here are some more times based on parallel invocation of creates and deletes from the golang ceph-golib.go version of the program.

NOTE: All tests are run for 2 iterations for 25 objects (PVCs/creates/deletes) and average per run is mentioned below,

PVC creates/deletes in parallel when using the CSI drivers (which invoke the various CLIs):
34 seconds for 25 creates in parallel
44 seconds for 25 deletes in parallel

golib program in parallel:
Create: 25 images : 2 seconds
Delete: 25 images : 10 seconds

Script in parallel:
create: 25 images : 24 seconds
delete: 25 images : 37 seconds


NOTE: Further tests would be based on testing in a real cluster, to understand the contribution of PV create/delete/attach times and whether improving this aspect would yield the best value for the effort.
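A rough sketch of the parallel variant used for numbers like the above: the same single connection and IOContext are shared by one goroutine per image create. Names and the pool are placeholders, and error handling is reduced to logging for brevity.

```go
// Hedged sketch: parallel go-ceph image creates over one shared connection. The
// comments in this thread describe reusing a single connection and IOContext; if
// concurrent use of one IOContext is a concern for a given librados version, an
// IOContext per goroutine over the same connection is a small change.
package main

import (
	"fmt"
	"log"
	"sync"

	"github.com/ceph/go-ceph/rados"
	"github.com/ceph/go-ceph/rbd"
)

func main() {
	conn, err := rados.NewConn()
	if err != nil {
		log.Fatal(err)
	}
	if err := conn.ReadDefaultConfigFile(); err != nil {
		log.Fatal(err)
	}
	if err := conn.Connect(); err != nil {
		log.Fatal(err)
	}
	defer conn.Shutdown()

	ioctx, err := conn.OpenIOContext("replicapool") // placeholder pool
	if err != nil {
		log.Fatal(err)
	}
	defer ioctx.Destroy()

	const count = 25
	var wg sync.WaitGroup
	for i := 0; i < count; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			name := fmt.Sprintf("csi-test-%d", i)
			if _, err := rbd.Create(ioctx, name, 1<<30, 22); err != nil {
				log.Printf("create %s: %v", name, err)
			}
		}(i)
	}
	wg.Wait()
	fmt.Printf("issued %d parallel creates\n", count)
}
```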

@Madhu-1
Collaborator

Madhu-1 commented Aug 1, 2019

I have modified the ceph-csi code to use the go-ceph library.

Tested in kube; I was able to create and delete 100 RBD PVCs in less than 2 minutes:
100 PVC creation took 58 seconds
100 PVC deletion took 60 seconds

Note: These are parallel PVC operations.

@humblec @ShyamsundarR @dillaman FYI

@ShyamsundarR
Contributor Author

> I have modified the ceph-csi code to use the go-ceph library.
>
> Tested in kube; I was able to create and delete 100 RBD PVCs in less than 2 minutes:
> 100 PVC creation took 58 seconds
> 100 PVC deletion took 60 seconds
>
> Note: These are parallel PVC operations.
>
> @humblec @ShyamsundarR @dillaman FYI

@Madhu-1 is it an improvement over running it without the ceph-go library on the same setup? Currently, with the data above, I am not able to figure out if it is an improvement or the numbers are the same (or worse), as our setups would be different.

@Madhu-1
Collaborator

Madhu-1 commented Aug 2, 2019

Using the cephcsi:canary image, tested in kube:
100 PVC creation took 60 seconds
100 PVC deletion took 62 seconds

I don't see any performance improvement; we need to do the same testing on other setups to check whether we gain any performance improvement from using the go-ceph library.

cirros-create-with-pvc-without-io (copy).txt
attaching the script used for testing. @nehaberry Thanks for providing the script.

@ShyamsundarR
Contributor Author

Update:
Went through the code changes used for the go-ceph bindings in CSI and, between @Madhu-1 and me, revisited them to use a single connection and IOContext (sample code here). Briefly tested this on the setup used to get the numbers in this comment; the volume create time reported is on average 18 seconds (down from 34 seconds prior to this) for 25 parallel creates.

Need to do some further benchmarking on a non-laptop setup and also using gRPC to provide data on how this can help CSI operations when operating on parallel requests.

@humblec
Collaborator

humblec commented Aug 7, 2019

> Update:
> Went through the code changes used for the go-ceph bindings in CSI and, between @Madhu-1 and me, revisited them to use a single connection and IOContext (sample code here). Briefly tested this on the setup used to get the numbers in this comment; the volume create time reported is on average 18 seconds (down from 34 seconds prior to this) for 25 parallel creates.
>
> Need to do some further benchmarking on a non-laptop setup and also using gRPC to provide data on how this can help CSI operations when operating on parallel requests.

@Madhu-1 @ShyamsundarR if we are seeing a performance improvement on parallel requests, and if we think the go-ceph client is stable enough for at least the happy path of create and delete volume, let's bring it into the repo. I am fine with the same.

@ShyamsundarR
Contributor Author

ShyamsundarR commented Aug 8, 2019

> Need to do some further benchmarking on a non-laptop setup and also using gRPC to provide data on how this can help CSI operations when operating on parallel requests.

Completed the test and here are the results:

Legend:

<key>: <value>
Operation: time in seconds per operation
NOTE: For the node plugins, operation count and time are collected from 3 nodes, and hence represented as total-time/#calls-on-node + ... + ...; the average is that total divided by 3

1) Test results with existing code and added Prometheus gRPC metrics:

Actions in parallel: n = 25

NodePublishVolume: 1.27/10 + 0.76/7 + 0.73/8 = 0.11
NodeStageVolume: 18.59/10 + 25.20/7 + 26.43/8 = 2.9
NodeUnpublishVolume: 0.26/10 + 0.18/7 + 0.24/8 = 0.03
NodeUnstageVolume: 1.16/10 + 0.88/7 + 1.05/8 = 0.12
CreateVolume: 262.25/25 = 10.5
DeleteVolume: 424.11/25 = 17

Actions in parallel: n = 1
CreateVolume: 1.225128874

Actions in parallel: n = 10
CreateVolume: 53.89/10 = 5.4

NOTE: to test improvements, CreateVolume was picked as it shows increasing latency as the number of parallel requests goes up (DeleteVolume shows the same trend)

2) Test results using this code, which uses the go-ceph bindings for just the happy path in CreateVolume:

Actions in parallel: n = 25
CreateVolume: 31.60/25 = 1.26

So, with the go-ceph bindings, when dealing with parallel requests there is a big improvement in per-RPC response time. Based on this, we should pursue getting this change into the code base.

@Madhu-1 you wanted to further understand how this impacts overall PVC creation time, how it would benefit from the change, and whether throttling and other mechanisms will trip this up. Let me know how you want to proceed with testing this, and any help required towards it.
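For reference, per-RPC handling times like the ones above can be collected with the go-grpc-prometheus interceptors; the sketch below shows the general pattern only, it is not the exact instrumentation used for these numbers, and the socket path and metrics port are placeholders.

```go
// Hedged sketch: wire Prometheus gRPC metrics into a gRPC server so average
// seconds per CreateVolume, NodeStageVolume, etc. can be read from /metrics.
package main

import (
	"net"
	"net/http"

	grpc_prometheus "github.com/grpc-ecosystem/go-grpc-prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"google.golang.org/grpc"
)

func main() {
	// Record handling-time histograms, not just request counters.
	grpc_prometheus.EnableHandlingTimeHistogram()

	srv := grpc.NewServer(
		grpc.UnaryInterceptor(grpc_prometheus.UnaryServerInterceptor),
		grpc.StreamInterceptor(grpc_prometheus.StreamServerInterceptor),
	)
	// The CSI controller/node services would be registered on srv here.
	grpc_prometheus.Register(srv)

	// Expose the metrics for scraping on a placeholder port.
	http.Handle("/metrics", promhttp.Handler())
	go http.ListenAndServe(":8080", nil)

	lis, err := net.Listen("unix", "/tmp/csi.sock") // placeholder CSI socket
	if err != nil {
		panic(err)
	}
	srv.Serve(lis)
}
```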

@Madhu-1
Collaborator

Madhu-1 commented Aug 12, 2019

> Need to do some further benchmarking on a non-laptop setup and also using gRPC to provide data on how this can help CSI operations when operating on parallel requests.
>
> Completed the test and here are the results:
>
> Legend:
>
> <key>: <value>
> Operation: time in seconds per operation
> NOTE: For the node plugins, operation count and time are collected from 3 nodes, and hence represented as total-time/#calls-on-node + ... + ...; the average is that total divided by 3
>
> 1) Test results with existing code and added Prometheus gRPC metrics:
>
> Actions in parallel: n = 25
>
> NodePublishVolume: 1.27/10 + 0.76/7 + 0.73/8 = 0.11
> NodeStageVolume: 18.59/10 + 25.20/7 + 26.43/8 = 2.9
> NodeUnpublishVolume: 0.26/10 + 0.18/7 + 0.24/8 = 0.03
> NodeUnstageVolume: 1.16/10 + 0.88/7 + 1.05/8 = 0.12
> CreateVolume: 262.25/25 = 10.5
> DeleteVolume: 424.11/25 = 17
>
> Actions in parallel: n = 1
> CreateVolume: 1.225128874
>
> Actions in parallel: n = 10
> CreateVolume: 53.89/10 = 5.4
>
> NOTE: to test improvements, CreateVolume was picked as it shows increasing latency as the number of parallel requests goes up (DeleteVolume shows the same trend)
>
> 2) Test results using this code, which uses the go-ceph bindings for just the happy path in CreateVolume:

Could not test with the above-mentioned code; facing an issue in PVC create:

I0812 04:03:10.152532       1 utils.go:110] GRPC request: {"capacity_range":{"required_bytes":2147483648},"name":"pvc-a0ae0ee6-4ba0-4ecf-a04d-0922543271e7","parameters":{"clusterID":"abcd","imageFeatures":"layering","imageFormat":"2","pool":"replicapool"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4","mount_flags":["discard"]}},"access_mode":{"mode":1}}]}
I0812 04:03:10.156928       1 rbd_util.go:482] setting disableInUseChecks on rbd volume to: false
I0812 04:03:10.240855       1 cephcmds.go:158] GetOMapValueGo: error (rados: No such file or directory)
E0812 04:03:10.247148       1 utils.go:113] GRPC error: rpc error: code = Internal desc = rados: No such file or directory
I0812 04:03:10.973198       1 utils.go:109] GRPC call: /csi.v1.Controller/CreateVolume
I0812 04:03:10.973344       1 utils.go:110] GRPC request: {"capacity_range":{"required_bytes":2147483648},"name":"pvc-bd5d0796-741e-417d-9a0a-13b13182a30f","parameters":{"clusterID":"abcd","imageFeatures":"layering","imageFormat":"2","pool":"replicapool"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4","mount_flags":["discard"]}},"access_mode":{"mode":1}}]}
I0812 04:03:10.987897       1 rbd_util.go:482] setting disableInUseChecks on rbd volume to: false
I0812 04:03:11.173766       1 cephcmds.go:158] GetOMapValueGo: error (rados: No such file or directory)
E0812 04:03:11.178812       1 utils.go:113] GRPC error: rpc error: code = Internal desc = rados: No such file or directory

> Actions in parallel: n = 25
> CreateVolume: 31.60/25 = 1.26
>
> So, with the go-ceph bindings, when dealing with parallel requests there is a big improvement in per-RPC response time. Based on this, we should pursue getting this change into the code base.
>
> @Madhu-1 you wanted to further understand how this impacts overall PVC creation time, how it would benefit from the change, and whether throttling and other mechanisms will trip this up. Let me know how you want to proceed with testing this, and any help required towards it.

@Madhu-1
Collaborator

Madhu-1 commented Aug 12, 2019

@ShyamsundarR I have created a new image as per our discussion on reusing the connection; I don't see any major overall performance improvement.

On 100 PVC creations, it took around 57 seconds to bind all PVCs.

@ShyamsundarR
Contributor Author

> Could not test with the above-mentioned code; facing an issue in PVC create

The initial csi.volumes.default OMap needs to be created by hand, as error detection is not added there; once that is done, the existing code works as expected.

> @ShyamsundarR I have created a new image as per our discussion on reusing the connection; I don't see any major overall performance improvement.
> On 100 PVC creations, it took around 57 seconds to bind all PVCs.

Can you please share the code and the test? Also, did you measure gRPC metrics for the same tests, pre- and post-change?
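For completeness, a hedged sketch of creating the initial csi.volumes.default object by hand with go-ceph; the replicapool pool name is taken from the logs above, and this is not what ceph-csi itself writes, just enough to let OMap lookups find the object.

```go
// Sketch only: create the (empty) csi.volumes.default object so that subsequent
// OMap lookups do not fail with "rados: No such file or directory" for the whole
// object. Error handling is minimal.
package main

import (
	"log"

	"github.com/ceph/go-ceph/rados"
)

func main() {
	conn, err := rados.NewConn()
	if err != nil {
		log.Fatal(err)
	}
	if err := conn.ReadDefaultConfigFile(); err != nil { // monitors + keyring from ceph.conf
		log.Fatal(err)
	}
	if err := conn.Connect(); err != nil {
		log.Fatal(err)
	}
	defer conn.Shutdown()

	ioctx, err := conn.OpenIOContext("replicapool")
	if err != nil {
		log.Fatal(err)
	}
	defer ioctx.Destroy()

	// Writing zero bytes is enough to create the object that the OMap keys hang off.
	if err := ioctx.WriteFull("csi.volumes.default", []byte{}); err != nil {
		log.Fatal(err)
	}
}
```

The CLI equivalent would be something along the lines of rados -p replicapool create csi.volumes.default.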

@mykaul
Contributor

mykaul commented Sep 3, 2019

@Madhu-1, @ShyamsundarR - any updates?

@ShyamsundarR
Contributor Author

@mykaul It was decided to pick up this work after the 1.2.0 release, for the following reasons:

  • go-ceph does not yet have a stable release branch for use
  • There is at least one feature still needed in go-ceph
  • Finally, we need to settle on the gains when operations are performed in parallel as per commentary in this PR

@nixpanic was interested in taking this forward; he may have further comments to provide.

humblec added the release-2.0.0 (v2.0.0 release), Priority-0 (highest priority issue) and enhancement (New feature or request) labels on Oct 4, 2019
@humblec
Collaborator

humblec commented Oct 4, 2019

@nixpanic assigning this task to you as per the discussion. This is a very high priority item for v2.0.0, so we need immediate attention and priority here. Please let me know if you need any help on this.

@humblec
Collaborator

humblec commented Oct 4, 2019

#557

nixpanic self-assigned this on Oct 9, 2019
@humblec
Collaborator

humblec commented Nov 25, 2019

@nixpanic can you please update this issue with the PR or WIP patch?

@nixpanic
Member

nixpanic commented Dec 3, 2019

@humblec
Collaborator

humblec commented Dec 12, 2019

@nixpanic as the PR (ceph/go-ceph#111) is merged now, maybe we could make some more progress here :). Thanks!

@nixpanic
Member

Older versions of Ceph's librbd do not support fetching the watchers on an image. That was introduced with Mimic (not Nautilus, as mentioned in earlier comments). The rbd command in Luminous can get the watchers, but it uses some internal Ceph functions that are not exposed in the public libraries.

So, the question is: does Ceph-CSI want to keep support for older Ceph versions (Luminous), or can we move on to a newer version as the minimal dependency?

There is a way to implement a fallback in ceph-csi: for functionality missing from the libraries, it can still call the rbd executable instead.
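A sketch of that fallback idea, assuming a go-ceph release that exposes OpenImageReadOnly and Image.ListWatchers; the helper name and the JSON handling of the rbd status output are illustrative, not ceph-csi code.

```go
// Sketch only: prefer the librbd binding for listing watchers, and fall back to
// the rbd executable where the binding (or the cluster) does not support it.
package watchers

import (
	"encoding/json"
	"os/exec"

	"github.com/ceph/go-ceph/rados"
	"github.com/ceph/go-ceph/rbd"
)

// imageHasWatchers reports whether pool/name currently has any watchers.
func imageHasWatchers(ioctx *rados.IOContext, pool, name string) (bool, error) {
	// First try the go-ceph binding (requires a release exposing ListWatchers).
	if img, err := rbd.OpenImageReadOnly(ioctx, name, rbd.NoSnapshot); err == nil {
		defer img.Close()
		if watchers, werr := img.ListWatchers(); werr == nil {
			return len(watchers) > 0, nil
		}
	}

	// Fallback: shell out to the rbd CLI and parse its JSON status output.
	out, err := exec.Command("rbd", "status", pool+"/"+name, "--format", "json").Output()
	if err != nil {
		return false, err
	}
	var status struct {
		Watchers []struct {
			Address string `json:"address"`
		} `json:"watchers"`
	}
	if err := json.Unmarshal(out, &status); err != nil {
		return false, err
	}
	return len(status.Watchers) > 0, nil
}
```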

nixpanic added a commit to nixpanic/ceph-csi that referenced this issue Feb 20, 2020
This is the initial step for improving performance during provisioning
of CSI volumes backed by RBD.

While creating a volume, an existing connection to the Ceph cluster is
used from the ConnPool. This should speed up the creation of a batch of
volumes significantly.

Updates: ceph#449
Signed-off-by: Niels de Vos <[email protected]>
@mykaul
Contributor

mykaul commented Feb 25, 2020

> Older versions of Ceph's librbd do not support fetching the watchers on an image. That was introduced with Mimic (not Nautilus, as mentioned in earlier comments). The rbd command in Luminous can get the watchers, but it uses some internal Ceph functions that are not exposed in the public libraries.
>
> So, the question is: does Ceph-CSI want to keep support for older Ceph versions (Luminous), or can we move on to a newer version as the minimal dependency?
>
> There is a way to implement a fallback in ceph-csi: for functionality missing from the libraries, it can still call the rbd executable instead.

I think we should move on.

@Madhu-1
Collaborator

Madhu-1 commented Feb 25, 2020

@nixpanic as discussed, we want to support Mimic+ versions in ceph-csi; the ceph-csi support matrix is here: https://github.com/ceph/ceph-csi#ceph-csi-features-and-available-versions

mergify bot pushed a commit that referenced this issue Mar 11, 2020
This is the initial step for improving performance during provisioning
of CSI volumes backed by RBD.

While creating a volume, an existing connection to the Ceph cluster is
used from the ConnPool. This should speed up the creation of a batch of
volumes significantly.

Updates: #449
Signed-off-by: Niels de Vos <[email protected]>
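As a rough illustration of the ConnPool idea described in the commit message above (not the actual ceph-csi implementation), a reference-counted pool keyed by monitors and user might look like this:

```go
// Sketch only: reuse connections to the same cluster instead of re-establishing
// one for every CSI request; entries are reference counted and shut down once
// the last user releases them.
package connpool

import (
	"sync"

	"github.com/ceph/go-ceph/rados"
)

type pooledConn struct {
	conn *rados.Conn
	refs int
}

type connPool struct {
	mu    sync.Mutex
	conns map[string]*pooledConn
}

func newConnPool() *connPool {
	return &connPool{conns: make(map[string]*pooledConn)}
}

// Get returns an existing connection for monitors/user, or dials a new one.
func (p *connPool) Get(monitors, user, key string) (*rados.Conn, error) {
	id := monitors + "|" + user
	p.mu.Lock()
	defer p.mu.Unlock()

	if pc, ok := p.conns[id]; ok {
		pc.refs++
		return pc.conn, nil
	}

	conn, err := rados.NewConnWithUser(user)
	if err != nil {
		return nil, err
	}
	if err := conn.SetConfigOption("mon_host", monitors); err != nil {
		return nil, err
	}
	if err := conn.SetConfigOption("key", key); err != nil {
		return nil, err
	}
	if err := conn.Connect(); err != nil {
		return nil, err
	}
	p.conns[id] = &pooledConn{conn: conn, refs: 1}
	return conn, nil
}

// Put drops a reference and shuts the connection down once it is unused.
func (p *connPool) Put(monitors, user string) {
	id := monitors + "|" + user
	p.mu.Lock()
	defer p.mu.Unlock()
	if pc, ok := p.conns[id]; ok {
		pc.refs--
		if pc.refs == 0 {
			pc.conn.Shutdown()
			delete(p.conns, id)
		}
	}
}
```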
@nixpanic
Member

nixpanic commented Apr 8, 2020

There are many things that need to be done for this issue. I have created https://github.com/ceph/ceph-csi/projects/3 so that tracking the dependencies is a little easier. Don't hesitate to add more issues/PRs/cards to the project.

nixpanic added the component/rbd (Issues related to RBD) label on Apr 17, 2020
@stale

stale bot commented Oct 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix (This will not be worked on) label on Oct 4, 2020
@stale

stale bot commented Oct 12, 2020

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

stale bot closed this as completed on Oct 12, 2020