auto-add-quorum-tie-breaker not honoured for containers (only) #45

Open
acidrop opened this issue Jul 21, 2021 · 6 comments

Comments

acidrop commented Jul 21, 2021

I'm doing some tests regarding the quorum auto tie breaker and I noticed that even though it's honoured for QEMU VMs, it is not for LXC containers. Not sure if this is the right place to post this, but since it occurs on Proxmox, I thought it might be related to its plugin.

Here's a sequence of the commands...

root@pve1:~# linstor rg l -p
+-------------------------------------------------------------------+
| ResourceGroup | SelectFilter | VlmNrs | Description |
|===================================================================|
| DfltRscGrp | PlaceCount: 2 | | |
|-------------------------------------------------------------------|
| drbd-rg01 | PlaceCount: 2 | 0 | |
| | StoragePool(s): drbdpool | | |
| | DisklessOnRemaining: False | | |
| | ProviderList: ['ZFS_THIN'] | | |
|-------------------------------------------------------------------|
| drbd-rg02 | PlaceCount: 2 | 0 | ThinLVM |
| | StoragePool(s): thinpool01 | | |
| | DisklessOnRemaining: False | | |
| | ProviderList: ['LVM_THIN'] | | |
+-------------------------------------------------------------------+

root@pve1:~# linstor c lp|grep quorum
| DrbdOptions/Resource/quorum | majority |
| DrbdOptions/auto-add-quorum-tiebreaker | true |
| DrbdOptions/auto-quorum | io-error |

root@pve1:~# linstor rd lp vm-109-disk-1 -p
+-----------------------------------------------------------+
| Key | Value |
|===========================================================|
| DrbdOptions/Net/allow-two-primaries | yes |
| DrbdOptions/Resource/quorum | off |
| DrbdOptions/auto-add-quorum-tiebreaker | False |
| DrbdOptions/auto-verify-alg | crct10dif-pclmul |
| DrbdPrimarySetOn | PVE3 |
+-----------------------------------------------------------+

Shouldn't the property be inherited from the controller-level properties? Anyway, I added it manually...

root@pve1:~# linstor rd sp vm-109-disk-1 DrbdOptions/auto-add-quorum-tiebreaker True
SUCCESS:
Successfully set property key(s): DrbdOptions/auto-add-quorum-tiebreaker
SUCCESS:
Description:
Resource definition 'vm-109-disk-1' modified.
Details:
Resource definition 'vm-109-disk-1' UUID is: e9847acd-efb8-42bc-9bf4-7f7becf126d5
SUCCESS:
(pve2) Resource 'vm-109-disk-1' [DRBD] adjusted.
SUCCESS:
(pve3) Resource 'vm-109-disk-1' [DRBD] adjusted.

root@pve1:~# linstor rd lp vm-109-disk-1 -p
+-----------------------------------------------------------+
| Key | Value |
|===========================================================|
| DrbdOptions/Net/allow-two-primaries | yes |
| DrbdOptions/Resource/quorum | off |
| DrbdOptions/auto-add-quorum-tiebreaker | true |
| DrbdOptions/auto-verify-alg | crct10dif-pclmul |
| DrbdPrimarySetOn | PVE3 |
+-----------------------------------------------------------+

root@pve1:~# linstor r l|grep 109
| vm-109-disk-1 | pve2 | 7005 | InUse | Ok | UpToDate | 2021-07-21 09:24:58 |
| vm-109-disk-1 | pve3 | 7005 | Unused | Ok | UpToDate | 2021-07-21 09:24:56 |

root@pve1:~# ssh pve2 'pct migrate 109 pve4 --restart'
2021-07-21 09:34:57 shutdown CT 109
2021-07-21 09:35:01 use dedicated network address for sending migration traffic (10.10.13.4)
2021-07-21 09:35:02 starting migration of CT 109 to node 'pve4' (10.10.13.4)
2021-07-21 09:35:02 volume 'linstor-thinlvm:vm-109-disk-1' is on shared storage 'linstor-thinlvm'
2021-07-21 09:35:02 start final cleanup
2021-07-21 09:35:03 start container on target node
2021-07-21 09:35:03 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve4' [email protected] pct start 109
2021-07-21 09:35:08 migration finished successfully (duration 00:00:11)

root@pve1:~# linstor r l|grep 109
| vm-109-disk-1 | pve2 | 7005 | Unused | Ok | UpToDate | 2021-07-21 09:24:58 |
| vm-109-disk-1 | pve3 | 7005 | Unused | Ok | UpToDate | 2021-07-21 09:24:56 |
| vm-109-disk-1 | pve4 | 7005 | InUse | Ok | Diskless | 2021-07-21 09:35:05 |

root@pve1:~# ssh pve4 'pct migrate 109 pve2 --restart'
2021-07-21 09:36:23 shutdown CT 109
2021-07-21 09:36:26 use dedicated network address for sending migration traffic (10.10.13.2)
2021-07-21 09:36:27 starting migration of CT 109 to node 'pve2' (10.10.13.2)
2021-07-21 09:36:27 volume 'linstor-thinlvm:vm-109-disk-1' is on shared storage 'linstor-thinlvm'

NOTICE
Intentionally removing diskless assignment (vm-109-disk-1) on (pve4).
It will be re-created when the resource is actually used on this node.
2021-07-21 09:36:28 start final cleanup
2021-07-21 09:36:30 start container on target node
2021-07-21 09:36:30 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve2' [email protected] pct start 109
2021-07-21 09:36:37 migration finished successfully (duration 00:00:14)

root@pve1:~# linstor r l|grep 109
| vm-109-disk-1 | pve2 | 7005 | InUse | Ok | UpToDate | 2021-07-21 09:24:58 |
| vm-109-disk-1 | pve3 | 7005 | Unused | Ok | UpToDate | 2021-07-21 09:24:56 |

root@pve1:~# linstor advise r -p|grep 109
| vm-109-disk-1 | Resource has 2 replicas but no tie-breaker, could lead to split brain. | linstor rd ap -d --place-count 1 vm-109-disk-1 |

root@pve1:~# linstor rd ap -d --place-count 1 vm-109-disk-1
usage: linstor [-h] [--version] [--no-color] [--no-utf8] [--warn-as-error]
[--curl] [--controllers CONTROLLERS] [-m]
[--output-version {v0,v1}] [--verbose] [-t TIMEOUT]
[--disable-config] [--user USER] [--password PASSWORD]
[--certfile CERTFILE] [--keyfile KEYFILE] [--cafile CAFILE]
[--allow-insecure-auth]
{advise, controller, drbd-proxy, encryption, error-reports,
exos, help, interactive, list-commands, node, physical-storage,
resource, resource-connection, resource-definition,
resource-group, snapshot, sos-report, space-reporting,
storage-pool, volume, volume-definition, volume-group} ...
linstor: error: unrecognized arguments: -d

Not sure if the above advice is correct in the first place?

@acidrop acidrop changed the title auto-add-quorum-tie-braker not honoured on containers (only) auto-add-quorum-tie-breaker not honoured on containers (only) Jul 21, 2021
@acidrop acidrop changed the title auto-add-quorum-tie-breaker not honoured on containers (only) auto-add-quorum-tie-breaker not honoured for containers (only) Jul 21, 2021
rck (Member) commented Jul 21, 2021

The "unrecognized argument" looks like a client bug (@rp- ).
The rest is "something else", there is no difference between VMs and containers in the plugin. @ghernadi might see what happens here quicker, he knows the quorum/tiebreaker rules ways better. Might be even as intended, I don't know. Other guess: lvm vs. zfs, but probably unlikely.

rp- commented Jul 21, 2021

Well, the advise command seems just wrong here; I guess it should be --drbd-diskless instead of -d, @WanzenBug?
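
For reference, that would make the suggested command look roughly like this (a sketch based on the guess above; whether rd ap actually accepts --drbd-diskless depends on the client version):

# sketch: the command from the advise output, with --drbd-diskless in place of the unrecognized -d
linstor rd ap --drbd-diskless --place-count 1 vm-109-disk-1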

ghernadi commented Jul 21, 2021

Shouldn't the property be inherited from the controller-level properties? Anyway, I added it manually...

Well, yes, unless the ResourceDefinition overrules the otherwise inherited controller property. The same property can be set on multiple levels (Controller, ResourceGroup, ResourceDefinition, etc.). As a rule of thumb: the closer the property is to the actual volume (LVM / ZFS / ...), the higher its priority. In this case the False from the ResourceDefinition had a higher priority than the True from the Controller.
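
To see on which level a value is set, you can list the property on each level with the commands already used in this issue (a sketch; drbd-rg02 is only assumed to be the resource group backing vm-109-disk-1 here):

linstor c lp -p | grep auto-add-quorum-tiebreaker                 # Controller level (lowest priority)
linstor rg lp drbd-rg02 -p | grep auto-add-quorum-tiebreaker      # ResourceGroup level (drbd-rg02 assumed)
linstor rd lp vm-109-disk-1 -p | grep auto-add-quorum-tiebreaker  # ResourceDefinition level (wins over the levels above)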

Regarding the rest: please update the linstor-client.
We recently changed linstor r l so that it now shows all resources, including the tie-breaker resources that were "hidden by default" in previous versions and needed a linstor r l -a to be shown.

That means I assume you indeed had your tie-breaking resource deployed as expected, but it was simply hidden by the mentioned client behavior. I suspect this because of these logs:

root@pve1:~# linstor r l|grep 109
| vm-109-disk-1 | pve2 | 7005 | InUse | Ok | UpToDate | 2021-07-21 09:24:58 |
| vm-109-disk-1 | pve3 | 7005 | Unused | Ok | UpToDate | 2021-07-21 09:24:56 |

only 2 resources shown here

root@pve1:~# ssh pve2 'pct migrate 109 pve4 --restart'
...
root@pve1:~# linstor r l|grep 109
| vm-109-disk-1 | pve2 | 7005 | Unused | Ok | UpToDate | 2021-07-21 09:24:58 |
| vm-109-disk-1 | pve3 | 7005 | Unused | Ok | UpToDate | 2021-07-21 09:24:56 |
| vm-109-disk-1 | pve4 | 7005 | InUse | Ok | Diskless | 2021-07-21 09:35:05 |

Suddenly you have 3 resources. A tie-breaking resource that gets promoted (DRBD primary) immediately loses its TIE_BREAKER flag and gets "degraded" from a linstor-managed TIE_BREAKER to a user-managed DISKLESS resource.
The only difference between a tiebreaker and a diskless resource is that Linstor is brave enough to automatically remove the tiebreaker in case it is no longer needed. A diskless resource is never touched - it is only taken over in case you try to delete it but Linstor is configured to keep it as a tiebreaker. You can overrule this "takeover" logic by deleting the now-tiebreaker resource. However, in that case Linstor assumes that you do not want a tiebreaker on this ResourceDefinition at all, which means that Linstor also sets the DrbdOptions/auto-add-quorum-tiebreaker property to False, which might explain why it was set in your setup.
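
In other words, roughly (a sketch reusing commands from this thread; node and resource names are the ones from this setup):

# deleting a plain DISKLESS resource while a tiebreaker is still wanted: LINSTOR takes it over as TIE_BREAKER
linstor r d pve4 vm-109-disk-1
# deleting it again while it is a TIE_BREAKER: LINSTOR removes it and remembers the decision by setting
# DrbdOptions/auto-add-quorum-tiebreaker to False on the resource definition. To undo that later:
linstor rd sp vm-109-disk-1 DrbdOptions/auto-add-quorum-tiebreaker True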

acidrop (Author) commented Jul 21, 2021

In this case the RD properties are set automatically during the VM/CT creation process via the LINSTOR Proxmox plugin.
The only difference I can see between a VM and a CT is that the first is live migrated while the second needs to be restarted (shutdown/start). Could that play a role in this?

There are no properties set at the RG level...

root@pve1:~# linstor rg lp drbd-rg01 -p
+-------------+
| Key | Value |
|=============|
+-------------+
root@pve1:~# linstor rg lp drbd-rg02 -p
+-------------+
| Key | Value |
|=============|
+-------------+

Below are some more tests, creating both a QEMU VM and an LXC container...
As you will notice, the tiebreaker resource is created correctly for the VM (vm-109-disk-1) but not for the CT (vm-110-disk-1).
I also added the -a parameter to the linstor r l command, but the tiebreaker does not seem to be visible even with that.

QEMU VM=vm-109-disk-1
LXC CT=vm-110-disk-1

# Resource definition properties as created by Proxmox during the VM/CT creation process…

root@pve1:~# linstor rd lp vm-109-disk-1 -p
+--------------------------------------------------------+
| Key                                 | Value            |
|========================================================|
| DrbdOptions/Net/allow-two-primaries | yes              |
| DrbdOptions/Resource/quorum         | off              |
| DrbdOptions/auto-verify-alg         | crct10dif-pclmul |
| DrbdPrimarySetOn                    | PVE3             |
+--------------------------------------------------------+


root@pve1:~# linstor rd lp vm-110-disk-1 -p
+--------------------------------------------------------+
| Key                                 | Value            |
|========================================================|
| DrbdOptions/Net/allow-two-primaries | yes              |
| DrbdOptions/Resource/quorum         | off              |
| DrbdOptions/auto-verify-alg         | crct10dif-pclmul |
| DrbdPrimarySetOn                    | PVE2             |
+--------------------------------------------------------+


root@pve1:~# linstor r l -a|grep vm-109
| vm-109-disk-1 | pve2 | 7005 | Unused | Ok    |   UpToDate | 2021-07-21 11:01:10 |
| vm-109-disk-1 | pve3 | 7005 | InUse  | Ok    |   UpToDate | 2021-07-21 11:01:07 |

root@pve1:~# ssh pve3 'qm migrate 109 pve4 --online'
2021-07-21 11:06:45 use dedicated network address for sending migration traffic (10.10.13.4)
2021-07-21 11:06:45 starting migration of VM 109 to node 'pve4' (10.10.13.4)
2021-07-21 11:06:45 starting VM 109 on remote node 'pve4'
2021-07-21 11:06:50 start remote tunnel
2021-07-21 11:06:51 ssh tunnel ver 1
2021-07-21 11:06:51 starting online/live migration on tcp:10.10.13.4:60000
2021-07-21 11:06:51 set migration capabilities
2021-07-21 11:06:52 migration downtime limit: 100 ms
2021-07-21 11:06:52 migration cachesize: 256.0 MiB
2021-07-21 11:06:52 set migration parameters
2021-07-21 11:06:52 start migrate command to tcp:10.10.13.4:60000
2021-07-21 11:06:53 average migration speed: 2.0 GiB/s - downtime 16 ms
2021-07-21 11:06:53 migration status: completed
2021-07-21 11:06:56 migration finished successfully (duration 00:00:11)

root@pve1:~# linstor r l -a|grep vm-109
| vm-109-disk-1 | pve2 | 7005 | Unused | Ok    |   UpToDate | 2021-07-21 11:01:10 |
| vm-109-disk-1 | pve3 | 7005 | Unused | Ok    |   UpToDate | 2021-07-21 11:01:07 |
| vm-109-disk-1 | pve4 | 7005 | InUse  | Ok    |   Diskless | 2021-07-21 11:06:47 |

root@pve1:~# ssh pve4 'qm migrate 109 pve3 --online'
2021-07-21 11:07:51 use dedicated network address for sending migration traffic (10.10.13.3)
2021-07-21 11:07:51 starting migration of VM 109 to node 'pve3' (10.10.13.3)
2021-07-21 11:07:51 starting VM 109 on remote node 'pve3'
2021-07-21 11:07:55 start remote tunnel
2021-07-21 11:07:56 ssh tunnel ver 1
2021-07-21 11:07:56 starting online/live migration on tcp:10.10.13.3:60000
2021-07-21 11:07:56 set migration capabilities
2021-07-21 11:07:56 migration downtime limit: 100 ms
2021-07-21 11:07:56 migration cachesize: 256.0 MiB
2021-07-21 11:07:56 set migration parameters
2021-07-21 11:07:56 start migrate command to tcp:10.10.13.3:60000
2021-07-21 11:07:57 average migration speed: 2.0 GiB/s - downtime 6 ms
2021-07-21 11:07:57 migration status: completed

NOTICE
  Intentionally removing diskless assignment (vm-109-disk-1) on (pve4).
  It will be re-created when the resource is actually used on this node.
2021-07-21 11:08:01 migration finished successfully (duration 00:00:10)

root@pve1:~# linstor r l -a|grep vm-109
| vm-109-disk-1 | pve2 | 7005 | Unused | Ok    |   UpToDate | 2021-07-21 11:01:10 |
| vm-109-disk-1 | pve3 | 7005 | InUse  | Ok    |   UpToDate | 2021-07-21 11:01:07 |
| vm-109-disk-1 | pve4 | 7005 | Unused | Ok    | TieBreaker | 2021-07-21 11:06:47 |

root@pve1:~# linstor rd lp vm-109-disk-1 -p
+--------------------------------------------------------+
| Key                                 | Value            |
|========================================================|
| DrbdOptions/Net/allow-two-primaries | yes              |
| DrbdOptions/Resource/on-no-quorum   | io-error         |
| DrbdOptions/Resource/quorum         | majority         |
| DrbdOptions/auto-verify-alg         | crct10dif-pclmul |
| DrbdPrimarySetOn                    | PVE3             |
+--------------------------------------------------------+


root@pve1:~# linstor r l -a|grep vm-110
| vm-110-disk-1 | pve2 | 7010 | InUse  | Ok    |   UpToDate | 2021-07-21 11:04:55 |
| vm-110-disk-1 | pve3 | 7010 | Unused | Ok    |   UpToDate | 2021-07-21 11:04:57 |

root@pve1:~# ssh pve2 'pct migrate 110 pve4 --restart'
2021-07-21 11:16:53 shutdown CT 110
2021-07-21 11:16:57 use dedicated network address for sending migration traffic (10.10.13.4)
2021-07-21 11:16:57 starting migration of CT 110 to node 'pve4' (10.10.13.4)
2021-07-21 11:16:57 volume 'linstor-thinlvm:vm-110-disk-1' is on shared storage 'linstor-thinlvm'
2021-07-21 11:16:58 start final cleanup
2021-07-21 11:16:59 start container on target node
2021-07-21 11:16:59 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve4' [email protected] pct start 110
2021-07-21 11:17:04 migration finished successfully (duration 00:00:11)

root@pve1:~# linstor r l -a|grep vm-110
| vm-110-disk-1 | pve2 | 7010 | Unused | Ok    |   UpToDate | 2021-07-21 11:04:55 |
| vm-110-disk-1 | pve3 | 7010 | Unused | Ok    |   UpToDate | 2021-07-21 11:04:57 |
| vm-110-disk-1 | pve4 | 7010 | InUse  | Ok    |   Diskless | 2021-07-21 11:17:01 |

root@pve1:~# linstor rd lp vm-110-disk-1 -p
+-----------------------------------------------------------+
| Key                                    | Value            |
|===========================================================|
| DrbdOptions/Net/allow-two-primaries    | yes              |
| DrbdOptions/Resource/on-no-quorum      | io-error         |
| DrbdOptions/Resource/quorum            | majority         |
| DrbdOptions/auto-add-quorum-tiebreaker | False            |
| DrbdOptions/auto-verify-alg            | crct10dif-pclmul |
| DrbdPrimarySetOn                       | PVE2             |
+-----------------------------------------------------------+

root@pve1:~# ssh pve4 'pct migrate 110 pve2 --restart'
2021-07-21 11:20:17 shutdown CT 110
2021-07-21 11:20:21 use dedicated network address for sending migration traffic (10.10.13.2)
2021-07-21 11:20:21 starting migration of CT 110 to node 'pve2' (10.10.13.2)
2021-07-21 11:20:21 volume 'linstor-thinlvm:vm-110-disk-1' is on shared storage 'linstor-thinlvm'
2021-07-21 11:20:21 start final cleanup
2021-07-21 11:20:23 start container on target node
2021-07-21 11:20:23 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve2' [email protected] pct start 110
2021-07-21 11:20:30 migration finished successfully (duration 00:00:13)

root@pve1:~# linstor rd lp vm-110-disk-1 -p
+-----------------------------------------------------------+
| Key                                    | Value            |
|===========================================================|
| DrbdOptions/Net/allow-two-primaries    | yes              |
| DrbdOptions/Resource/quorum            | off              |
| DrbdOptions/auto-add-quorum-tiebreaker | False            |
| DrbdOptions/auto-verify-alg            | crct10dif-pclmul |
| DrbdPrimarySetOn                       | PVE2             |
+-----------------------------------------------------------+

root@pve1:~# linstor r l -a|grep vm-110
| vm-110-disk-1 | pve2 | 7010 | InUse  | Ok    |   UpToDate | 2021-07-21 11:04:55 |
| vm-110-disk-1 | pve3 | 7010 | Unused | Ok    |   UpToDate | 2021-07-21 11:04:57 |


acidrop (Author) commented Jul 21, 2021

Ok, after some further testing it looks like this is not a LINSTOR issue: when I create an RD/VD/Resource directly via linstor on the command line, the tiebreaker is automatically created once I delete the Diskless resource from the respective node. The property is correctly inherited from the Controller as expected in this case.

The "issue" seems to be related to the Proxmox/LINSTOR plugin and how it handles "live migration" and "shutdown/start" actions, no matter whether that's a VM or a CT.
When executing a Live Migrate action on a VM to a Diskless node and then migrating it back to a Diskful node, Linstor correctly marks the Diskless resource as a quorum tiebreaker (i.e. it does not delete it).
When executing a shutdown action on a VM or CT which is located on a Diskless node, the plugin deletes its Diskless resource from that node (which makes sense). All in all, this looks like expected rather than strange behaviour.

So to summarise, in order for the auto tiebreaker resource to be created in Proxmox, there are 2 options:

  1. For QEMU VMs: Live Migrate the VM from a Diskful node to a Diskless node and then migrate it back to the Diskful node.

  2. For LXC CTs: Manually create a Diskless resource on a node (e.g. linstor r c -d pve4 vm-110-disk-1) and then delete it (e.g. linstor r d pve4 vm-110-disk-1), as shown in the sketch after this list. In this way "The given resource will not be deleted but will be taken over as a linstor managed tiebreaker resource."
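
A rough sketch of option 2 with the commands from above (node and resource names are the ones from this setup):

linstor r c -d pve4 vm-110-disk-1   # create a diskless assignment on pve4
linstor r d pve4 vm-110-disk-1      # delete it again; LINSTOR takes it over as a tiebreaker
linstor r l -a | grep vm-110        # the pve4 entry should now show TieBreaker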

rck (Member) commented Jul 21, 2021

Hm, yes, without thinking it through completely, such things could happen. The "when to create a diskless and when to remove it" logic is currently in the plugin: if there is no resource on the node whatsoever, create a diskless one; if the guest has moved away and the resource is diskless, just delete it. That, in combination with the auto-tiebreaker, might have funny consequences.

LINSTOR can now handle that on its own: there is a "make available" API that does the right thing and handles more complicated storage situations. The plugin has not switched to that API yet, so let's keep this open as a tracking issue.
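
For reference, the client-side counterpart of that API would look roughly like this (a sketch; the make-available subcommand is only present in newer linstor-client versions):

# sketch: let LINSTOR itself decide whether a (diskless) assignment is needed on the node,
# instead of the plugin creating and deleting diskless resources directly
linstor resource make-available pve4 vm-110-disk-1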
