Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/disconnect volume #29

Merged
merged 19 commits into from
Nov 10, 2021
Merged

Conversation

plaffitt
Copy link
Contributor

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This PR fix some stability issues related to multipathd by adding multiple consistency checks and prevent filesystem corruption. It also adds a lot of tests.

Which issue(s) this PR fixes:

Special notes for your reviewer:

This code is already being used in production by enix/san-iscsi-csi and Seagate/seagate-exos-x-csi.

Does this PR introduce a user-facing change?:

connector.Multipath has been removed, multipath is automatically used if multiple targets are given.
connector.TargetIqn and connector.TargetPortals has been replace by connector.Targets.
Connect is now a member function of connector.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Jun 28, 2021
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 28, 2021
@plaffitt plaffitt force-pushed the fix/disconnect-volume branch from 2b2b08d to 04852eb Compare June 29, 2021 15:51
@plaffitt plaffitt force-pushed the fix/disconnect-volume branch from 9074942 to de0f8d9 Compare August 17, 2021 10:20
@humblec
Copy link
Contributor

humblec commented Aug 26, 2021

Thanks @paullaffitte 👍 , in a highlevel, the proposed code looks good to me, however havent tested this manually and verified myself on various code path changes. I am inclined to take this PR in though

@humblec
Copy link
Contributor

humblec commented Aug 26, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 26, 2021
@humblec
Copy link
Contributor

humblec commented Aug 26, 2021

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 26, 2021
TargetIqn: *iqn,
// List of portals must be >= 1 (>1 signals multipath/mpio)
TargetPortals: tgtp,
// List of targets must be >= 1 (>1 signals multipath/mpio)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have just one item in the targets can we consider it as "multipath" enabled share ? previously the decision was based on portals which always true and point us that its a multipath share.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current changes, there is a function Connector.IsMultipathEnabled that returns true if Connector.MountTargetDevice.Type == "mpath". So it doesn't depends on how many targets we have but on the fact that the devices is actually handled by multipathd or not. This information is retrieved through lsblk.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the changes proposed, but if at all possible, it would be great to not add new dependencies on host binaries such as lsblk and blockdev. Each new dependency makes it more challenging to support older OS releases. I wish I could tell customers to upgrade to the latest tool versions, but this is often very challenging in some manufacturing or highly secure environments.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also found a few issues with the current https://github.com/kubernetes-csi/csi-lib-iscsi version, as I tested under a few different operating systems (Ubuntu, CentOS, RHEL), so my interest is to either work with the Enix PR changes or submit a PR here to address some small changes to handle multipath with iSCSI storage arrays.

One test case of interest, is testing these PR changes with multipathd.conf using 'user_friendly_names: "yes"' as I experienced issues when a customer was using this setting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jskazinski indeed very valuable points, it differently behaved for me as well with different distros.. Regarding user_friendly_names its indeed a good test to have.

iscsi/iscsi.go Outdated Show resolved Hide resolved
iscsi/multipath.go Outdated Show resolved Hide resolved
iscsi/iscsi.go Show resolved Hide resolved
iscsi/iscsi.go Outdated Show resolved Hide resolved
iscsi/multipath.go Show resolved Hide resolved
example/main.go Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 6, 2021
@humblec
Copy link
Contributor

humblec commented Sep 14, 2021

Just a general update here, we ( + @paullaffitte ) are having discussions on this PR to get this library in a good shape. Some of the fixes which I have made at my end conflict with this PR, but the plan is to get all the fixes in place for consumers to work correctly. Will update once we have all the fixes in place.

@humblec
Copy link
Contributor

humblec commented Sep 23, 2021

[status update]

We had a discussion on this PR ( @paullaffitte ) and beyond, the plan is to make sure the exported functions are not breaking while we fix other issues in this library as in this PR. This PR will be refreshed keeping backward compatibility is maintained and consumers are not broken. Once confirmed, we will get this PR in and start consuming in csi iscsi driver...etc.

iscsi/iscsi.go Outdated Show resolved Hide resolved
@humblec
Copy link
Contributor

humblec commented Oct 6, 2021

@paullaffitte any luck with the bit of refactoring of this PR based on above comments ?

@plaffitt
Copy link
Contributor Author

@humblec, I just pushed a commit that should remove breaking changes from the point of view of csi-driver-iscsi. I will continue to address comments above and try to make those changes ready in the course of next week.

@humblec
Copy link
Contributor

humblec commented Oct 15, 2021

@humblec, I just pushed a commit that should remove breaking changes from the point of view of csi-driver-iscsi.

Thanks @paullaffitte 👍

I will continue to address comments above and try to make those changes ready in the course of next week.

That would be awesome, so that I can refresh my patches for iscsi driver and be ready for a release of the same.

@humblec
Copy link
Contributor

humblec commented Oct 27, 2021

@paullaffitte can you please list down whats pending in this PR wrt to the comments or discussions we had so far? asking it explictly as this is a big PR with many changes.

@plaffitt
Copy link
Contributor Author

@humblec Sure, here is a list of changes present in this PR:

  • Disconnect volume following recommandations from Red Hat Customer Portal: Removing a storage device.
  • List devices using lsblk instead of relying on system files.
  • Relying on /sys/class/scsi_device/h:c:t:l/device instead of /sys/block/sdx/device to manage devices.
  • Keep track of all devices in connector instead of only the mounted one in order to handle disconnection properly.
  • Don't re-login when a session already exists.
  • Wait for paths to exists even when a session exists, since the device may have been added after the session was established in case of several devices on the same session.
  • Remove dead code.
  • Check multipath device consistency on connection and disconnection.
  • Added many tests totaling 62.5% of statements covered.
  • In order to know if multipath is enabled, the function Connector.IsMultipathEnabled returns true if Connector.MountTargetDevice.Type == "mpath".
  • Try to do iscsi discovery in case of session re-scan failed.

After our discussions, those changes have been added too:

  • Code involving breaking changes has been refactored in order to remove those breaking changes. Tested against csi-driver-iscsi.
  • We now use lsblk in raw mode in order to support older versions of this software and compatibility with a wider range of linux distributions.
  • Comments on newly introduced public functions.

Feel free to ask if you need more details on some points.

@humblec
Copy link
Contributor

humblec commented Oct 27, 2021

@humblec Sure, here is a list of changes present in this PR:

* Disconnect volume following recommandations from Red Hat Customer Portal: [Removing a storage device](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide/removing_devices).

* List devices using lsblk instead of relying on system files.

* Relying on `/sys/class/scsi_device/h:c:t:l/device` instead of `/sys/block/sdx/device` to manage devices.

* Keep track of all devices in connector instead of only the mounted one in order to handle disconnection properly.

This was one of the bug which we want to get rid of for some time , thanks 👍

* Don't re-login when a session already exists.

Indeed good to have.

* Wait for paths to exists even when a session exists, since the device may have been added after the session was established in case of several devices on the same session.

Good 👍

* Remove dead code.

👍

* Check multipath device consistency on connection and disconnection.

I am not sure I got it completely, can you please share some more details on this?

* Added many tests totaling 62.5% of statements covered.

* In order to know if multipath is enabled, the function `Connector.IsMultipathEnabled` returns true if `Connector.MountTargetDevice.Type == "mpath"`.

This is one area we have to revisit later, the reason being, if user has set user_friendly_names or alias we may fail to detect the multipath device properly.

* Try to do iscsi discovery in case of session re-scan failed.

ooc, in which scenarios the discovery could be success but rescan fail?

After our discussions, those changes have been added too:

* Code involving breaking changes has been refactored in order to remove those breaking changes. Tested against csi-driver-iscsi.

This was the very important one and thanks for making sure we are not regressing any more 👍

* We now use lsblk in raw mode in order to support older versions of this software and compatibility with a wider range of linux distributions.

Yep, good to have !

* Comments on newly introduced public functions.

Helps in all extents.

Feel free to ask if you need more details on some points.

Just a couple of clarifications as mentioned above.

@humblec
Copy link
Contributor

humblec commented Oct 27, 2021

@paullaffitte one final comment on this PR, that said, it would be appreciated if you can squash/group the commits which cover the functionalities and tests together , atm there are 50 commits which I think that, bit overly segregated and make it difficult to review or pick a specific change in a workflow manner. Appreciated if you can squash and refresh. Please let me know if you need any help on this.. Thanks for the great work 👍

- fix example/main.go
- add go.mod to simplify local development
@plaffitt plaffitt force-pushed the fix/disconnect-volume branch from 1c519ba to b0ce999 Compare November 2, 2021 11:13
@plaffitt
Copy link
Contributor Author

plaffitt commented Nov 2, 2021

I applied the requested changes, there is just one point to clarify #29 (comment).

debug.Printf("Couldn't find dm-* path for path: %s, found non dm-* path: %s", path, devicePath)
return "", fmt.Errorf("couldn't find dm-* path for path: %s, found non dm-* path: %s", path, devicePath)

if multipathDevice.Type != "mpath" {
Copy link
Contributor

@humblec humblec Nov 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paullaffitte it is good to have a tracker issue to check how this code block in effect if user_friendly_names is enabled in a multipath system. We can also add a todo here so that we wont forget to keep track of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be better to have an issue for this as it make it more visible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I posted an issue about it: #32

@humblec
Copy link
Contributor

humblec commented Nov 2, 2021

I applied the requested changes, there is just one point to clarify #29 (comment).

This version looks good to me 👍 , I have one minor comment to have a tracker for user_friendly_names enabled multipath system setup. Rest looks good.

@j-griffith @jsafrane @msau42 @xing-yang requesting your review too.

Copy link
Contributor

@j-griffith j-griffith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the defaults on those flags being removed and hte conversation you have going regarding the introduction of lsblk I'm LGTM

I don't think my comment around the flag defaults is crticial or needs to be done to merge this, but I am confused on where you're landing on the lsblk question.

password = flag.String("password", "eJBDC7Bt7WE3XFDq", "")
lun = flag.Int("lun", 1, "")
debug = flag.Bool("debug", false, "enable logging")
portals = flag.String("portals", "192.168.1.112:3260", "Comma delimited. Eg: 1.1.1.1,2.2.2.2")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point all these defaults should be removed and we should a mandatory check that the flags are set. These dfaults are nice for an example but they won't mean anything to to someone that's not using the openstack test setup I had 3 years ago :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure @j-griffith , I will get this cleared while working on some other optimization on this. Opening a tracker issue for this as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#33 tracks this.

@humblec
Copy link
Contributor

humblec commented Nov 2, 2021

Other than the defaults on those flags being removed and hte conversation you have going regarding the introduction of lsblk I'm LGTM

I don't think my comment around the flag defaults is crticial or needs to be done to merge this, but I am confused on where you're landing on the lsblk question.

Thanks @j-griffith 👍 , for now, we will stick with the current version of lsblk raw format parsing to support few versions of util-linux . in short, we are good for now.

@plaffitt
Copy link
Contributor Author

plaffitt commented Nov 4, 2021

With all recent changes, csi-driver-iscsi seems to still work properly. I just noted that there still is one breaking change: Connector.Multipath doesn't exists anymore and is set in csi-driver-iscsi in iscsi.buildISCSIConnector. Removing this line in the driver is the only required change.

@humblec
Copy link
Contributor

humblec commented Nov 8, 2021

With all recent changes, csi-driver-iscsi seems to still work properly. I just noted that there still is one breaking change: Connector.Multipath doesn't exists anymore and is set in csi-driver-iscsi in iscsi.buildISCSIConnector. Removing this line in the driver is the only required change.

@paullaffitte thanks for retesting it with latest refresh version of this patch. Regarding the Connector.Multipath breakage, I think its better if we can address it here. The reason being, csi-driver-iscsi is just one consumer, we dont know about other consumers of this library. If at all there is no way to fix the backward compatibility break here, we have to think about other options.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: j-griffith, paullaffitte

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 8, 2021
@humblec
Copy link
Contributor

humblec commented Nov 10, 2021

[Update]
Discussed the possibility of avoiding the connector.Multipath in the clients, we do see some mechanism , however considering all the logic has been moved to the library, it looks like we can survive with this change. Also , to not worry much about the other possibilities of breakage due to this, we dont have a release yet for this library, we see there are consumers , however we havent released a 'supported' version of this library yet, even consumers like csi-iscsi-driver still be to be released or GA, so we are more inclined to make this change here and carry on.

Considering this has gone through good amount of testing and review ( also an existing approval) so far, lets get this in.

If at all any alarm for the breakage side, we will release a version of this library with a fix accordingly.

Thanks @paullaffitte @j-griffith @jskazinski for the contribution and discussions on this. 👍

@humblec
Copy link
Contributor

humblec commented Nov 10, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 10, 2021
@humblec
Copy link
Contributor

humblec commented Nov 10, 2021

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 10, 2021
@k8s-ci-robot k8s-ci-robot merged commit 5c802c4 into kubernetes-csi:master Nov 10, 2021
@plaffitt
Copy link
Contributor Author

@humblec Do you have a release date in mind? Maybe just an estimation?

Thanks @humblec @j-griffith and @jskazinski for your collaboration!

@humblec
Copy link
Contributor

humblec commented Nov 11, 2021

@humblec Do you have a release date in mind? Maybe just an estimation?

Thanks @humblec @j-griffith and @jskazinski for your collaboration!

We are targetting a release around kube 1.23 release @paullaffitte ..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants