xWaitForDisk | xDisk: Disk number changed between reboots #81
Comments
The disks are shared through iSCSI. Here is the relevant config.
|
This is being worked on in xFailOverCluster; see dsccommunity/FailoverClusterDsc#26. Maybe the same logic could be brought to xDisk and xWaitForDisk? |
Hi @johlju - sorry for not looking at this earlier! That does look like the Disk Number isn't static. I'll check out what is being done in xFailOverCluster and see if we can port it over here. May need to still support Disk Number to prevent breaking legacy code, but also possibly support an alternate disk identifier? |
Absolutely agree that the disk number should be kept as well. If another identifier of some sort is used instead, that could "override" the disk number. I also read that this could occur when using iSCSI, but then only with multiple paths (which I do not have, only one NIC), which led me to believe that when using, for example, FC storage, the disk number is (more) static. After doing another reboot a few days later the disks jumped back to disk numbers 1-5. Another thing is that the bug could be me; maybe it is me that has not configured Windows/iSCSI correctly for using iSCSI disks? 😄 |
@PlagueHO and @johlju I've now hit this issue. The disk numbers are changing and causing havoc with this resource. Below are all the properties for a disk. Maybe a unique identifier that could be used instead is SerialNumber/UniqueId or even GUID (used in the pull request for xFailOverCluster @johlju mentioned).
|
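The candidate identifiers mentioned above can be compared side by side with `Get-Disk`. This is only a minimal sketch, assuming the Storage module cmdlets that ship with Windows Server 2012 and later; the choice of columns is illustrative and not taken from the original comment.

```powershell
# List every disk with its number (which may change between reboots)
# next to the identifiers suggested in this thread.
Get-Disk |
    Sort-Object -Property Number |
    Select-Object -Property Number, FriendlyName, BusType, SerialNumber, UniqueId, Guid |
    Format-Table -AutoSize
```

Running this before and after a reboot should make it visible whether `Number` moves while `UniqueId` stays the same for a given disk.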
Hi @Zuldan. I'm going to try and get the work done on this over the Easter break/this weekend. I'm going to start by seeing if I can find a way to effectively test for the problem so I can be sure I'm fixing it 😁 TDD FTW! |
@PlagueHO your timing couldn't be better. I've got the config for all our SQL AOAG clusters in the lab working perfectly (except for this disk issue) and was going to start pushing out clusters into the Dev environment next week, but was forced to delay that because of this problem. I only noticed the problem when I added multiple controllers, and multiple disks on those controllers, to the VM. Usually, if you have a single controller and a couple of disks, the order you put the disks in is the order they appear within the VM; however, in the scenario described above, the disk order is scrambled. I definitely think disk numbers need to be kept, but for me it's not a legacy thing: 90% of our servers only have a single controller, so the disk number identifier works really well, and I wouldn't want to have to record the serial/GUID for every disk on every VM and then inject that back into their DSC config. So having the ability to switch over to another identifier for our clustered servers would be great. If you want to go through some ideas over the weekend or need some testing done, I'm more than happy to help. FYI.
|
To diagnose the issue I slapped together some dirty code to provide all the stats from 'Disk' to 'Access Path'.
It appears that there isn't always a GUID for every disk so it would be better to use UniqueId?
I came up with a couple of ways to implement it. See MOFs below. Would love a discussion and/or more ideas.
Idea 1 (most obvious one)
Idea 2
Idea 3 (the idea here is that generally if you have a small number of disks you assign drive letters, and if you have a large number of disks you use DiskAccess paths [letter limit of 25 excluding 'C'])
|
Cool! Great info and fantastic suggestions. I personally like Idea 1 because of not being a breaking change. The primary key thing is a pain, but I think it is acceptable - I've used similar patterns in xNetworking and xCertificate - so I don't think it is a terrible idea. We'd need to also change the DiskNumber from a I'd also suggest using DiskUniqueId to make it clear this is a parameter of the disk. I'd also add a parameter validation function in that would validate the parameter combination is valid - if the combo was invalid (none or both of DiskNumber and DiskUniqueId were set) then an exception would be logged. So:
[ClassVersion("1.0.0.0"), FriendlyName("xDisk")]
class MSFT_xDisk : OMI_BaseResource
{
[Key, Description("Specifies the identifier for which disk to modify.")] String DriveLetter;
[Write, Description("Specifies the disk number for which disk to modify.")] Uint32 DiskNumber;
[Write, Description("Specifies the uniqueid for which disk to modify.")] String DiskUniqueId;
[Write, Description("Specifies the size of new volume.")] Uint64 Size;
[Write, Description("Define volume label if required.")] String FSLabel;
[Write, Description("Specifies the allocation unit size to use when formatting the volume.")] uint32 AllocationUnitSize;
[Write, Description("Specifies the file system format of the new volume."), ValueMap{"NTFS","ReFS"}, Values{"NTFS","ReFS"}] String FSFormat;
};
[ClassVersion("1.0.0.0"), FriendlyName("xDiskAccessPath")]
class MSFT_xDiskAccessPath : OMI_BaseResource
{
[Key, Description("Specifies the access path folder to assign the disk volume to.")] String AccessPath;
[Write, Description("Specifies the disk number for which disk to modify.")] Uint32 DiskNumber;
[Write, Description("Specifies the uniqueid for which disk to modify.")] String DiskUniqueId;
[Write, Description("Specifies the size of new volume.")] Uint64 Size;
[Write, Description("Define volume label if required.")] String FSLabel;
[Write, Description("Specifies the allocation unit size to use when formatting the volume.")] uint32 AllocationUnitSize;
[Write, Description("Specifies the file system format of the new volume."), ValueMap{"NTFS","ReFS"}, Values{"NTFS","ReFS"}] String FSFormat;
};
@TravisEz13, @johlju - do you guys have any thoughts on this as a solution? |
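The parameter validation described in the comment above (exactly one of DiskNumber and DiskUniqueId must be set) could be sketched roughly as follows. The function name and error messages are hypothetical and not taken from the resource's code; `$PSBoundParameters` is used so that disk number 0 still counts as "set".

```powershell
# Hypothetical helper: the name and messages are assumptions for
# illustration, not the resource's actual implementation.
function Assert-DiskIdentifierIsValid
{
    [CmdletBinding()]
    param
    (
        [Parameter()]
        [System.UInt32]
        $DiskNumber,

        [Parameter()]
        [System.String]
        $DiskUniqueId
    )

    # Check which parameters the caller actually passed, so that a
    # legitimate DiskNumber of 0 is not mistaken for "not set".
    $hasNumber   = $PSBoundParameters.ContainsKey('DiskNumber')
    $hasUniqueId = $PSBoundParameters.ContainsKey('DiskUniqueId')

    if ($hasNumber -and $hasUniqueId)
    {
        throw 'Specify either DiskNumber or DiskUniqueId, not both.'
    }

    if (-not $hasNumber -and -not $hasUniqueId)
    {
        throw 'One of DiskNumber or DiskUniqueId must be specified.'
    }
}

# Usage:
# Assert-DiskIdentifierIsValid -DiskNumber 0    # passes silently
# Assert-DiskIdentifierIsValid                  # throws
```

A helper like this would presumably be called at the top of each of the Get/Set/Test functions, so an invalid combination fails early.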
It looks good to me! Maybe just change |
My bad, xDiskAccessPath is for "mount points" and xDisk is for "drive letters". I didn't realize there were two almost identical resources. |
Cooleo! I'll get this out over the weekend. |
@johlju , @Zuldan - I've nearly finished the changes to However, with regards
So if we decided Idea 2 is acceptable for One other alternative would be to add a new resource Would love your guys thoughts before I proceed too far down one track or another... |
@PlagueHO I'm happy with idea 2 for all resources. If we change one then they all should be aligned. As for the breaking change, there's nothing stopping people from downloading previous versions of the resources to keep compatibility. DSC is all about continuous deployment, make changes and move forward ;-) |
@Zuldan - I'm in agreement. I think Idea 2 is the way to go if we're going to do a breaking change because I think consistency is very important. But I'd like some feedback from @johlju and @TravisEz13 because of being a breaking change. @TravisEz13 - a summary of the problem is that DiskNumber in some scenarios cannot be used to uniquely identify a disk. Our proposal is to replace the old 'DiskNumber' key with a 'DiskId' key that can be used as either Disk Number (the default) or as a Disk Unique Id. A new 'DiskIdType' parameter would be added that would allow the DiskId to either be 'Number' (default) or 'UniqueId'. The resources would be changed like this:
xDisk:
[ClassVersion("1.0.0.0"), FriendlyName("xDisk")]
class MSFT_xDisk : OMI_BaseResource
{
[Key, Description("Specifies the drive letter to assign to the volume.")] String DriveLetter;
[Write, Description("Specifies the disk identifier for the disk to modify.")] String DiskId;
[Write, Description("Specifies the identifier type that is used to identify the disk."), ValueMap{"Number","UniqueId"}, Values{"Number","UniqueId"}] String DiskIdType;
[Write, Description("Specifies the size of new volume.")] Uint64 Size;
[Write, Description("Define volume label if required.")] String FSLabel;
[Write, Description("Specifies the allocation unit size to use when formatting the volume.")] uint32 AllocationUnitSize;
[Write, Description("Specifies the file system format of the new volume."), ValueMap{"NTFS","ReFS"}, Values{"NTFS","ReFS"}] String FSFormat;
};
xDiskAccessPath:
[ClassVersion("1.0.0.0"), FriendlyName("xDiskAccessPath")]
class MSFT_xDiskAccessPath : OMI_BaseResource
{
[Key, Description("Specifies the access path folder to assign the disk volume to.")] String AccessPath;
[Write, Description("Specifies the disk identifier for the disk to modify.")] String DiskId;
[Write, Description("Specifies the identifier type that is used to identify the disk."), ValueMap{"Number","UniqueId"}, Values{"Number","UniqueId"}] String DiskIdType;
[Write, Description("Specifies the size of new volume.")] Uint64 Size;
[Write, Description("Define volume label if required.")] String FSLabel;
[Write, Description("Specifies the allocation unit size to use when formatting the volume.")] uint32 AllocationUnitSize;
[Write, Description("Specifies the file system format of the new volume."), ValueMap{"NTFS","ReFS"}, Values{"NTFS","ReFS"}] String FSFormat;
};
[ClassVersion("1.0.0.0"), FriendlyName("xWaitForDisk")]
class MSFT_xWaitForDisk : OMI_BaseResource
{
[Key, Description("Specifies the disk identifier for the disk to wait for.")] String DiskId;
[Write, Description("Specifies the identifier type that is used to identify the disk."), ValueMap{"Number","UniqueId"}, Values{"Number","UniqueId"}] String DiskIdType;
[Write, Description("Specifies the number of seconds to wait for the disk to become available.")] Uint32 RetryIntervalSec;
[Write, Description("The number of times to loop the retry interval while waiting for the disk.")] Uint32 RetryCount;
};
@Zuldan - I know you're waiting on this change, but can you wait a little bit longer so I can get confirmation about the proposal? |
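To illustrate how the proposed DiskId / DiskIdType pair might map onto a disk lookup, and how xWaitForDisk's RetryIntervalSec and RetryCount could drive a polling loop, here is a rough sketch. Both function names and the default retry values are hypothetical. Only `Get-Disk` with its `-Number` and `-UniqueId` parameters is an existing cmdlet.

```powershell
# Hypothetical helper: resolve a disk from the proposed DiskId/DiskIdType pair.
function Get-DiskByIdentifier
{
    [CmdletBinding()]
    param
    (
        [Parameter(Mandatory = $true)]
        [System.String]
        $DiskId,

        [Parameter()]
        [ValidateSet('Number', 'UniqueId')]
        [System.String]
        $DiskIdType = 'Number'
    )

    if ($DiskIdType -eq 'Number')
    {
        # DiskId is a String in the schema, so convert it before the lookup.
        return Get-Disk -Number ([System.UInt32] $DiskId) -ErrorAction SilentlyContinue
    }

    return Get-Disk -UniqueId $DiskId -ErrorAction SilentlyContinue
}

# Hypothetical helper: poll until the disk appears or the retries run out.
# The default values here are assumptions, not taken from the resource.
function Wait-ForDiskByIdentifier
{
    [CmdletBinding()]
    param
    (
        [Parameter(Mandatory = $true)]
        [System.String]
        $DiskId,

        [Parameter()]
        [ValidateSet('Number', 'UniqueId')]
        [System.String]
        $DiskIdType = 'Number',

        [Parameter()]
        [System.UInt32]
        $RetryIntervalSec = 10,

        [Parameter()]
        [System.UInt32]
        $RetryCount = 60
    )

    for ($attempt = 1; $attempt -le $RetryCount; $attempt++)
    {
        if (Get-DiskByIdentifier -DiskId $DiskId -DiskIdType $DiskIdType)
        {
            return $true
        }

        Start-Sleep -Seconds $RetryIntervalSec
    }

    return $false
}
```

Keeping DiskId as a string and converting only in the 'Number' branch is what lets one key property carry either kind of identifier.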
@PlagueHO happy to wait mate. Better to have everyone agree to the changes. |
Cheers @Zuldan - it shouldn't take long to make the change as I've already done a reasonable amount on this. So as soon as I get the go-ahead I'll get it done! |
@PlagueHO Awesome work as always! Idea 1 Idea 2 Summary |
@Zuldan, @johlju I'm going to repeat this for xDiskAccessPath and xWaitForDisk tomorrow and then submit the PR. Once that PR is through I'll start work on the new xDiskEx, xWaitForDiskEx, xPartition and xVolume. |
I will help you review this one :) |
@johlju - I was hoping you would 😁 Your reviews are as good as they get. Right - onto the next changes. |
Hi @Zuldan, @johlju - I've submitted a PR with these changes. I've also converted xDisk to use the xDiskAccessPath pattern, which is a more robust pattern (and has fewer bugs). Once this PR is through I'll look into creating the new resources as I still feel the current resources won't work in many scenarios. |
@PlagueHO Once you've fixed the comments on the PR I will use the code in my lab to verify the function as well. 😄 |
Awesome! Thanks @johlju. I'm also going to update https://github.com/PlagueHO/LabBuilder to use the new version as well and use that to run some tests. @Zuldan - do you have any test environments you can run this version through? |
@PlagueHO, I can report back that the new code is working well. I can scramble the disks around on multiple VMs and it's picking up the correct disks every time when using DiskUniqueId. My old configurations using Disk Number (with updated config) are also still working. Very happy! |
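For readers landing here, a configuration using the DiskId / DiskIdType schema proposed earlier in this thread might look roughly like the sketch below. The module name `xStorage`, the configuration and resource instance names, and the placeholder UniqueId are all assumptions for illustration; substitute the UniqueId reported by `Get-Disk` for the target disk.

```powershell
Configuration DiskByUniqueId
{
    # Assumption: the resources are published in a module named xStorage.
    Import-DscResource -ModuleName xStorage

    Node 'localhost'
    {
        # Wait for the disk, identified by UniqueId rather than disk number.
        xWaitForDisk DataDisk
        {
            DiskId           = '<UniqueId of the disk>'   # placeholder value
            DiskIdType       = 'UniqueId'
            RetryIntervalSec = 60
            RetryCount       = 60
        }

        # Partition and format the same disk once it is available.
        xDisk DataVolume
        {
            DiskId      = '<UniqueId of the disk>'        # placeholder value
            DiskIdType  = 'UniqueId'
            DriveLetter = 'S'
            FSLabel     = 'Data'
            DependsOn   = '[xWaitForDisk]DataDisk'
        }
    }
}
```

Under the proposal, leaving out DiskIdType (or setting it to 'Number') would keep the disk-number behaviour, with DiskId holding the number as a string.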
Disk number seems to be unreliable for selecting the right disk. Or might it be that I'm doing something wrong? :)
/cc @PlagueHO
I configured a cluster successfully with the disks.
After a reboot of the server I tried to apply the configuration again, and it cannot find the disks any more.
Running diskpart shows that the disks no longer have the same disk numbers.
The cluster is happy at least.
Disk number 6 should be disk number 1.