Tracker issue for intree rbd to ceph csi online migration #2509

Closed
12 of 14 tasks
humblec opened this issue Sep 15, 2021 · 8 comments
Labels: dependency/k8s (depends on Kubernetes features), wontfix (This will not be worked on)

Collaborator

humblec commented Sep 15, 2021

Describe the bug

This issue tracks the changes required to support migrating in-tree RBD volumes to Ceph CSI (project: https://github.com/ceph/ceph-csi/projects/18). It allows an in-tree RBD user to migrate online without changing any existing templates. The feature is tracked in Kubernetes as alpha for v1.23 under kubernetes/enhancements#2963.

Demo: part 1 shows PVCs created before enabling migration, and part 2 shows PVCs created after migration.
https://drive.google.com/file/d/1yfcngCEyrOPDG5tOg7gva0wVJqNELRGc/view

As discussed in the CSI calls, the changes identified so far are:

  • Get the clusterID from the mon string (util: get clusterID for the passed in mon string #2512).
    The clusterID field is required for CSI, but the in-tree SC has the monitors field as its required one, so we have to work out the clusterID from the monitors passed in the request. The node stage and unstage operations should work without issues, as these volumes will be tracked as static volumes by the CSI driver.
  • Use the volume handle passed in for operations on existing volumes (rbd: detect migrated volhandle in DeleteVolume and delete rbd image #2523).
    The volume handle includes the migration version, mon, pool and image, which can be used for operations like delete. Once this information is available, it becomes possible to delete the existing in-tree PVs.
    The current format of the volume handle looks like this:
mig_mons-<monitors-hash>_image-<imagename>_<pool-hash>
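
To make the handle format above concrete, here is a minimal Python sketch of splitting such a handle into its parts. It is an illustration only, not the driver's code: the names `MigrationHandle` and `parse_migration_handle` are hypothetical, the pool component is treated as an opaque string, and it assumes none of the components contains an underscore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationHandle:
    """Fields carried by a migration volume handle (hypothetical type)."""
    mons_hash: str   # hash of the monitors string
    image_name: str  # RBD image name
    pool_hash: str   # pool component, kept opaque in this sketch


def parse_migration_handle(handle: str) -> MigrationHandle:
    """Split mig_mons-<monitors-hash>_image-<imagename>_<pool-hash>.

    Assumption: no component contains an underscore, so a plain split works.
    """
    parts = handle.split("_")
    if len(parts) != 4 or parts[0] != "mig":
        raise ValueError(f"not a migration volume handle: {handle!r}")
    _, mons_part, image_part, pool_hash = parts
    if not mons_part.startswith("mons-") or not image_part.startswith("image-"):
        raise ValueError(f"malformed migration volume handle: {handle!r}")
    mons_hash = mons_part[len("mons-"):]
    image_name = image_part[len("image-"):]
    if not (mons_hash and image_name and pool_hash):
        raise ValueError(f"empty component in volume handle: {handle!r}")
    return MigrationHandle(mons_hash, image_name, pool_hash)
```

A handle that does not start with `mig` is rejected, which is roughly how a driver could tell migrated volumes apart from regular CSI volume IDs before taking the migration path.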

The admin is expected to create a clusterID based on the hash (md5sum) of the monitors in the CSI config map, and to keep the monitors under that entry, before enabling migration. When the CSI driver receives the volume handle, it looks at the config map to find the mons to use for the operation.
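
The lookup described here can be sketched in Python as follows. This is a rough illustration under stated assumptions: the hash is taken over the monitors string exactly as passed in (the precise string that gets hashed is not specified in this issue), the config is assumed to be a JSON list of entries with `clusterID` and `monitors` keys, and both function names are hypothetical.

```python
import hashlib
import json
from typing import List


def cluster_id_from_monitors(monitors: str) -> str:
    """Derive a clusterID as the md5 hex digest of the monitors string.

    Assumption: the string is hashed as-is, e.g. "10.0.0.1:6789,10.0.0.2:6789".
    """
    return hashlib.md5(monitors.encode("utf-8")).hexdigest()


def monitors_for_cluster_id(config_json: str, cluster_id: str) -> List[str]:
    """Return the monitors stored under cluster_id in the config.

    Assumption: config_json is a JSON list of objects shaped like
    {"clusterID": "...", "monitors": ["...", "..."]}.
    """
    for entry in json.loads(config_json):
        if entry.get("clusterID") == cluster_id:
            return list(entry.get("monitors", []))
    raise KeyError(f"clusterID {cluster_id!r} not found in config")
```

With this shape, the admin-side step amounts to computing `cluster_id_from_monitors(...)` once and adding an entry with that clusterID and the same monitors to the config map, so that the driver-side lookup succeeds.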

The initial plan was to ask the admin to adjust the secret as a pre-upgrade step for migration, which would have avoided this change in the CSI driver. However, removing admin burden is the main ask from the Kubernetes community, so this change was introduced.
The value of the "key" field in the migration secret is copied to the "UserKey" field, and the value of the "adminId" field is copied to the "UserID" field. If adminId is nil or not set, UserID is filled with the default value, i.e. admin. This logic is activated only when the secret is a migration secret; otherwise the normal workflow is followed as it is today.

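The secret translation described above can be sketched like this in Python. The field names "key", "adminId", "UserKey" and "UserID" come from the description; everything else is an assumption made for illustration, in particular the function names and the rule that a secret counts as a migration secret when it carries a "key" field (the actual detection rule is not spelled out in this issue).

```python
from typing import Dict

DEFAULT_USER_ID = "admin"  # default stated in the description


def is_migration_secret(secret: Dict[str, str]) -> bool:
    # Assumption: a migration secret is recognised by the presence of "key".
    return "key" in secret


def translate_secret(secret: Dict[str, str]) -> Dict[str, str]:
    """Map a migration secret to the UserID/UserKey fields.

    Non-migration secrets are returned unchanged (as a copy), which stands in
    for "the normal workflow" in this sketch.
    """
    if not is_migration_secret(secret):
        return dict(secret)
    return {
        # Fall back to the default when adminId is missing or empty.
        "UserID": secret.get("adminId") or DEFAULT_USER_ID,
        "UserKey": secret["key"],
    }
```

Keeping the check and the translation in one small helper means the regular code path never sees the in-tree field names.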
ToDo:

Release 3.6 Items:

  • Validate or support inline volumes
  • CephFS online migration
@humblec humblec added the dependency/k8s depends on Kubernetes features label Sep 15, 2021
@humblec humblec self-assigned this Sep 15, 2021
@humblec humblec added this to the release-3.5.0 milestone Sep 15, 2021
Collaborator

Madhu-1 commented Sep 15, 2021

@humblec any specific reason not to use/suggest the CSI migration tool which is getting developed by @Yuggupta27 and @subhamkrai?

> get clusterid from mon
> use a volume handle passed in and support static volume mounting, unmounting, and delete operations on existing volumes.

IMO this is an extra burden from the cephcsi supportability point of view.

can you please provide more details with examples of why we are taking this approach?

Collaborator Author

humblec commented Sep 15, 2021

> @humblec any specific reason not to use/suggest the CSI migration tool which is getting developed by @Yuggupta27 and @subhamkrai?

@Madhu-1 The tool can be used for offline migration (taking down the app, detaching it from the PVC, and attaching the old PV to a new PVC after renaming). As mentioned in the issue, this is an online migration for existing in-tree users: they can continue using their templates as they do today, and Kubernetes takes care of everything. They don't want to change existing templates such as the SC or PVC, and they will be unaware that a migration is happening. That is where the migration framework in Kubernetes (kubernetes/kubernetes#95361) comes in handy. Most of the Kubernetes in-tree drivers have already been migrated, and more are on the migration path.

If at any point they want to do an offline migration, they can use the tool as well. Users are left with a good set of choices: both online and offline migration are available as routes to Ceph CSI.

> get clusterid from mon
> use a volume handle passed in and support static volume mounting, unmounting, and delete operations on existing volumes.
>
> IMO This is an extra burden to the cephcsi supportability point of view

I don't think so. These helpers will be called ONLY for migrated volumes while the CSI driver processes them, and only for very limited operations like mount and unmount. It is the same as the way we support static volumes today. None of the existing functionality would be broken by this change, so it shouldn't be a burden. Also, snapshot, cloning, etc. are not supported on these static kinds of volumes.

> can you please provide more details with examples of why we are taking this approach?

The in-tree driver has the monitor field available, so our configmap can be read to get the clusterID if it is a migration request, much like we read the clusterID mapping for replication cases.
Once the clusterID is available, the rest of the information needed to keep supporting it as a static volume is available too.
The volume handle would be the same as in our static volume case, with just a couple of strings appended that carry the monitor and pool: the bare minimum, without any fancy operations on it.

Collaborator

Madhu-1 commented Sep 15, 2021

> I dont think so, these helpers will be ONLY called for migrated volumes and while csi driver process those , that too with very limited usage like mount , unmount ..etc. Its is same as we currently support static volumes today. None of the existing functionality would be broken with this change , so it shouldnt be a burden. Also snapshot , cloning ..etc are not supported on these static kind of volumes.

Yes, if we go for offline tools we can support all these features.

> I dont think so, these helpers will be ONLY called for migrated volumes and while csi driver process those , that too with very limited usage like mount , unmount ..etc. Its is same as we currently support static volumes today. None of the existing functionality would be broken with this change , so it shouldnt be a burden. Also snapshot , cloning ..etc are not supported on these static kind of volumes.

We don't support DeleteVolume for static PVCs, but you have added it to the list. As I already mentioned in #1725 (comment), I'm not sure it's worth supporting this one with extra effort.

If we really need/want to support this one, please make sure we support only the Ceph version mentioned in the README.

Collaborator Author

humblec commented Sep 15, 2021

> > I dont think so, these helpers will be ONLY called for migrated volumes and while csi driver process those , that too with very limited usage like mount , unmount ..etc. Its is same as we currently support static volumes today. None of the existing functionality would be broken with this change , so it shouldnt be a burden. Also snapshot , cloning ..etc are not supported on these static kind of volumes.
>
> yes if we go for offline tools we can support all these features.

Yep. In other words, at any point in time (even after online migration has been enabled), if users want to go with offline migration (now, later, or only for a few volumes), they can take that route based on the production downtime and the feasibility of making other changes for the consumers of the current volumes and their templates.

Online can be an intermediate path as well for many deployments.

> > I dont think so, these helpers will be ONLY called for migrated volumes and while csi driver process those , that too with very limited usage like mount , unmount ..etc. Its is same as we currently support static volumes today. None of the existing functionality would be broken with this change , so it shouldnt be a burden. Also snapshot , cloning ..etc are not supported on these static kind of volumes.
>
> we don't support DeleteVolume for the static PVC but you have added it to the list. as I already mentioned in #1725 (comment) not sure it's worth supporting this one with extra effort.

Since this delete applies only to "migration volumes", a very thin helper function is enough to handle it.

> If we really need/want to support this one please make sure we support only the ceph version mentioned in the readme

Sure @Madhu-1, that's the plan 👍

humblec added a commit to humblec/ceph-csi that referenced this issue Sep 16, 2021
as part of migration support, the clusterID has to be fetched
from passed in mon. This adds a helper function to retrieve
clusterID from passed in mon string.

Updates ceph#2509

Signed-off-by: Humble Chirammal <[email protected]>
humblec added a commit to humblec/ceph-csi that referenced this issue Sep 16, 2021
as part of migration support, the clusterID has to be fetched
from passed in mon. Because the intree RBD storage class only
got monitor and not `clusterID` parameter support. However, in
CSI, SC has the `clusterID` parameter support but not mon. Due
to that we have to fetch the clusterID from config file for the
passed in mon and use it in our operations. This adds a helper
function to retrieve clusterID from passed in mon string.

Updates ceph#2509

Signed-off-by: Humble Chirammal <[email protected]>
mergify bot pushed a commit that referenced this issue Sep 20, 2021
as part of migration support, the clusterID has to be fetched
from passed in mon. Because the intree RBD storage class only
got monitor and not `clusterID` parameter support. However, in
CSI, SC has the `clusterID` parameter support but not mon. Due
to that we have to fetch the clusterID from config file for the
passed in mon and use it in our operations. This adds a helper
function to retrieve clusterID from passed in mon string.

Updates #2509

Signed-off-by: Humble Chirammal <[email protected]>
humblec added a commit to humblec/ceph-csi that referenced this issue Nov 2, 2021
This commit adds migration design doc which carry information about
the required changes and design for rbd intree to csi migration.

Updates ceph#2509
Signed-off-by: Humble Chirammal <[email protected]>
humblec added a commit to humblec/ceph-csi that referenced this issue Nov 2, 2021
This commit adds migration design doc which carry information about
the required changes and design for rbd intree to csi migration.

Fixes ceph#2596
Updates ceph#2509

Signed-off-by: Humble Chirammal <[email protected]>
humblec added a commit to humblec/ceph-csi that referenced this issue Nov 22, 2021
This commit adds the migration secret request validation to expand,
create controller functions.

Ref # ceph#2509

Signed-off-by: Humble Chirammal <[email protected]>
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the wontfix This will not be worked on label Dec 19, 2021
@humblec humblec removed the wontfix This will not be worked on label Dec 20, 2021
Collaborator

Madhu-1 commented Dec 20, 2021

@humblec I have added one more todo: remove the E2E hacks and do valid migration testing with Kubernetes 1.23, as it has been released.

mergify bot pushed a commit that referenced this issue Dec 20, 2021
This commit adds the migration secret request validation to expand,
create controller functions.

Ref # #2509

Signed-off-by: Humble Chirammal <[email protected]>
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/ceph-csi that referenced this issue Dec 22, 2021
This commit adds the migration secret request validation to expand,
create controller functions.

Ref # ceph#2509

Signed-off-by: Humble Chirammal <[email protected]>
humblec added a commit to humblec/ceph-csi that referenced this issue Jan 4, 2022
This commit make recreateCSIRBDPods function to be a general one
so that it can be consumed by more clients.

Updates ceph#2509

Signed-off-by: Humble Chirammal <[email protected]>
mergify bot pushed a commit that referenced this issue Jan 4, 2022
This commit make recreateCSIRBDPods function to be a general one
so that it can be consumed by more clients.

Updates #2509

Signed-off-by: Humble Chirammal <[email protected]>
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/ceph-csi that referenced this issue Jan 12, 2022
This commit make recreateCSIRBDPods function to be a general one
so that it can be consumed by more clients.

Updates ceph#2509

Signed-off-by: Humble Chirammal <[email protected]>
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the wontfix This will not be worked on label Apr 15, 2022
@github-actions

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

2 participants