rbd: add volume locks for reclaimspace operations #4641
Conversation
Force-pushed from 0d019f8 to 88f490b
/test ci/centos/mini-e2e/k8s-1.28
Force-pushed from 88f490b to 993a7c2
Can we use the already created controller and node servers?
internal/rbd/driver/driver.go (Outdated)
@@ -213,7 +210,7 @@ func (r *Driver) setupCSIAddonsServer(conf *util.Config) error {
 	r.cas.RegisterService(is)

 	if conf.IsControllerServer {
-		rs := casrbd.NewReclaimSpaceControllerServer()
+		rs := casrbd.NewReclaimSpaceControllerServer(NewControllerServer(r.cd))
Suggested change:
-		rs := casrbd.NewReclaimSpaceControllerServer(NewControllerServer(r.cd))
+		rs := casrbd.NewReclaimSpaceControllerServer(r.cs)
Yes, we should use the same already created Controller and Node Server structs, as they will share the same set of locks. Since the ReplicationServer uses a new ControllerServer struct, I decided to follow the same approach for the ReclaimSpace ControllerServer/NodeServer.
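For readers outside the codebase, here is a minimal, self-contained sketch of the per-volume locking idea being discussed. ceph-csi's real util.VolumeLocks is more featureful; the stand-in type and main function below are only illustrative assumptions, not the project's actual code.

package main

import (
	"fmt"
	"sync"
)

// volumeLocks is a simplified stand-in for ceph-csi's util.VolumeLocks:
// it allows at most one in-flight operation per volume ID.
type volumeLocks struct {
	mu     sync.Mutex
	locked map[string]struct{}
}

func newVolumeLocks() *volumeLocks {
	return &volumeLocks{locked: make(map[string]struct{})}
}

// TryAcquire returns false if another operation already holds the volume.
func (v *volumeLocks) TryAcquire(id string) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if _, busy := v.locked[id]; busy {
		return false
	}
	v.locked[id] = struct{}{}
	return true
}

// Release frees the volume so the next operation can acquire it.
func (v *volumeLocks) Release(id string) {
	v.mu.Lock()
	defer v.mu.Unlock()
	delete(v.locked, id)
}

func main() {
	locks := newVolumeLocks()
	fmt.Println(locks.TryAcquire("vol-1")) // true: first operation wins
	fmt.Println(locks.TryAcquire("vol-1")) // false: a second sparsify/fstrim on the same volume is rejected
	locks.Release("vol-1")
	fmt.Println(locks.TryAcquire("vol-1")) // true: available again after release
}

Reusing the existing servers means reusing one such lock set, so reclaimspace calls would also exclude the regular CSI operations on the same volume; creating new servers gives reclaimspace its own, independent lock set.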
I forgot about sharing the locks with the cephcsi ControllerServer and NodeServer.
Let's just add new volume locks for both the reclaimspace node and controller servers in this PR, and open a new issue to discuss whether we should share them with the default servers or not.
Both sparsify and fstrim are blocking operations, which is why we may need further discussion.
cc @Madhu-1
Let's discuss this one. We need to see what happens when nodeserver operations like expand/unmount run while fstrim is in progress, and whether that can lead to corruption; based on that we can decide on central/shared locking. Similarly, is it allowed to expand or remove the image while the ControllerReclaimSpace operation is going on?
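To make that concern concrete, a hedged sketch reusing the volumeLocks stand-in from the earlier snippet; the handler names and signatures here are illustrative, not the actual ceph-csi RPC handlers.

// If both handlers are given the same volumeLocks instance, fstrim and
// filesystem expansion on one volume are mutually exclusive; with separate
// instances (as this PR does), they may run concurrently, which is the
// open question about possible corruption.
func nodeReclaimSpace(locks *volumeLocks, volumeID string) error {
	if !locks.TryAcquire(volumeID) {
		return fmt.Errorf("an operation is already in progress for volume %s", volumeID)
	}
	defer locks.Release(volumeID)
	// ... run fstrim on the mounted filesystem ...
	return nil
}

func nodeExpandVolume(locks *volumeLocks, volumeID string) error {
	if !locks.TryAcquire(volumeID) {
		return fmt.Errorf("an operation is already in progress for volume %s", volumeID)
	}
	defer locks.Release(volumeID)
	// ... resize the block device and filesystem ...
	return nil
}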
Force-pushed from 993a7c2 to 5fd00ce
internal/rbd/driver/driver.go (Outdated)
@@ -224,7 +224,9 @@ func (r *Driver) setupCSIAddonsServer(conf *util.Config) error {
 	}

 	if conf.IsNodeServer {
-		rs := casrbd.NewReclaimSpaceNodeServer()
+		rs := casrbd.NewReclaimSpaceNodeServer(&rbd.NodeServer{
+			VolumeLocks: util.NewVolumeLocks(),
Same here as well, can you please check?
You might need to pass NewNodeServer(r.cd).
NewNodeServer() needs other params to be passed, so instead I created a NodeServer struct with VolumeLocks:
ceph-csi/internal/rbd/driver/driver.go, lines 69 to 81 in 9574fa8:

func NewNodeServer(
	d *csicommon.CSIDriver,
	t string,
	nodeLabels, topology, crushLocationMap map[string]string,
) (*rbd.NodeServer, error) {
	cliReadAffinityMapOptions := util.ConstructReadAffinityMapOption(crushLocationMap)
	ns := rbd.NodeServer{
		DefaultNodeServer: csicommon.NewDefaultNodeServer(d, t, cliReadAffinityMapOptions, topology, nodeLabels),
		VolumeLocks:       util.NewVolumeLocks(),
	}
	return &ns, nil
}
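For reference, the resulting wiring can be sketched as below. It mirrors the diff above with explanatory comments added; any detail beyond the diff (such as the RegisterService call for rs) should be treated as an assumption.

// In setupCSIAddonsServer: NewNodeServer needs the CSI driver, topology,
// and node-label parameters that are not available here, so a minimal
// rbd.NodeServer carrying only a fresh VolumeLocks set is handed to the
// CSI-Addons reclaimspace node server instead.
if conf.IsNodeServer {
	ns := &rbd.NodeServer{
		VolumeLocks: util.NewVolumeLocks(),
	}
	rs := casrbd.NewReclaimSpaceNodeServer(ns)
	r.cas.RegisterService(rs)
}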
Got it. The aim looks to be new locks instead of sharing, as per #4641 (comment); let's do this one.
LGTM. Please test this change manually and share the results to ensure nothing breaks, as we don't have any E2E tests for this one.
Created a PVC of 1Gi, wrote 500M of dummy data, and deleted it.
Before the ReclaimSpace operation -
Created a ReclaimSpaceJob for the PVC.
/NodeReclaimSpace -
/ControllerReclaimSpace -
@Madhu-1, will this be enough for the test?
/test ci/centos/k8s-e2e-external-storage/1.28
/test ci/centos/k8s-e2e-external-storage/1.29
/test ci/centos/k8s-e2e-external-storage/1.28
/test ci/centos/k8s-e2e-external-storage/1.28
/test ci/centos/k8s-e2e-external-storage/1.29
/test ci/centos/k8s-e2e-external-storage/1.30
/test ci/centos/k8s-e2e-external-storage/1.30
/test ci/centos/k8s-e2e-external-storage/1.27
/test ci/centos/upgrade-tests-cephfs
/test ci/centos/mini-e2e-helm/k8s-1.27
/test ci/centos/k8s-e2e-external-storage/1.28
/test ci/centos/k8s-e2e-external-storage/1.29
/test ci/centos/mini-e2e-helm/k8s-1.28
/test ci/centos/mini-e2e/k8s-1.28
/test ci/centos/k8s-e2e-external-storage/1.30
/test ci/centos/mini-e2e-helm/k8s-1.30
/test ci/centos/mini-e2e/k8s-1.30
@Mergifyio requeue
multi-arch-build failed to download some Ceph packages
✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically.
This commit adds locks on reclaimspace operations to prevent multiple processes from executing rbd sparsify/fstrim on the same volume.
Signed-off-by: Praveen M <[email protected]>
Force-pushed from fbd2de5 to 960e6c8
/test ci/centos/upgrade-tests-cephfs
/test ci/centos/upgrade-tests-rbd
/test ci/centos/k8s-e2e-external-storage/1.30
/test ci/centos/k8s-e2e-external-storage/1.28
/test ci/centos/mini-e2e-helm/k8s-1.30
/test ci/centos/k8s-e2e-external-storage/1.27
/test ci/centos/mini-e2e-helm/k8s-1.28
/test ci/centos/mini-e2e/k8s-1.30
/test ci/centos/k8s-e2e-external-storage/1.29
/test ci/centos/mini-e2e/k8s-1.28
/test ci/centos/mini-e2e-helm/k8s-1.27
/test ci/centos/mini-e2e-helm/k8s-1.29
/test ci/centos/mini-e2e/k8s-1.27
/test ci/centos/mini-e2e/k8s-1.29
@Mergifyio requeue
✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically.
Describe what this PR does
This commit adds VolumeLocks on reclaimspace operations to prevent multiple processes from executing rbd sparsify/fstrim on the same volume.
Related issues
Closes: #4637, #4648
Checklist:
Show available bot commands
These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:
/retest ci/centos/<job-name>: retest the <job-name> after unrelated failure (please report the failure too!)