-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add troubleshooting of node maintenance mode #619
base: main
Are you sure you want to change the base?
Conversation
|
3c2d88d
to
7487be3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from the Longhorn side!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial review done
3515efd
to
642b3ab
Compare
@jillian-maroket Thanks. I have updated the comments, and also change the |
@w13915984028 apart from the minor rephrasing the doc looks good to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also a little confused about what is correct. In some places we use 'Node Maintenance'. Shouldn't the term 'Maintenance Mode' also be used here?
@jillian-maroket As already mentioned in individual comments, we should clarify whether words such as “StorageClass” or “Maintenance Mode” are ALWAYS specified in inline code.
| Harvester version | Embedded Longhorn version | Default value | | ||
| --- | --- | --- | | ||
| v1.3.1 | v1.6.0 | `true` | | ||
| v1.4.0 | v1.7.0 | `false` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to mention 1.4.0 in the versioned documentation of 1.3.x? It makes sense for 1.4 (dev)
, but 1.3 (stable)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no strong option to keep/remove it. This is a later update to v1.3 document, now we have a detailed plan about v1.4, add the section to v1.4 seems no harm.
But this more depends on the suggestion from @jillian-maroket . thanks.
@votdev Many of the text blocks that you flagged were recently added. I have yet to figure out which parts are new and which suggestions were applied/rejected. That's why the wording is so inconsistent. StorageClass is a regular K8s concept so I never format it as inline code. We do not have a convention for official feature/function names yet. Using title case is enough in most situations. We can bold or italicize them for emphasis. Backticks are usually reserved for commands, file paths, and whatever can be classified as code phrases/blocks. |
Signed-off-by: Jian Wang <[email protected]>
Thank for the review from @votdev and the explanation from @jillian-maroket . I just updated the document per Volker's suggstion, please take a new look. Btw, we have:
It needs a bit brain burning to select a proper matching word in different context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@w13915984028 You misunderstood. @votdev made wrong assumptions about the markup that we use, and you were NOT supposed to implement his suggestions.
Apply the suggestions in this review so we can finally merge the updates. If you continue to make changes that introduce language issues, I will not approve this PR.
cc: @bk201
docs/advanced/storageclass.md
Outdated
@@ -44,6 +44,12 @@ The number of replicas created for each volume in Longhorn. Defaults to `3`. | |||
|
|||
![](/img/v1.2/storageclass/create_storageclasses_replicas.png) | |||
|
|||
:::info important | |||
|
|||
When the value is `1`, the created volume from this StorageClass has only one replica, it may block the [Node Maintenance](../host/host.md#node-maintenance), check the section [Single-Replica Volumes](../troubleshooting/host.md#single-replica-volumes) and set a proper global option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the value is `1`, the created volume from this StorageClass has only one replica, it may block the [Node Maintenance](../host/host.md#node-maintenance), check the section [Single-Replica Volumes](../troubleshooting/host.md#single-replica-volumes) and set a proper global option. | |
Configuring Longhorn to create only one replica for each volume (**Number of Replicas**: `1`) may cause [node maintenance issues](../troubleshooting/host.md#single-replica-volumes). **Number of Replicas** is a global setting, so specify a value that makes sense for your implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Number of Replicas is a global setting
==
a global setting
is not accurate. It only affects the volumes created from this StorageClass
.
I guess your comment has a wrong assumption.
User can create many StorageClasses
, and when creating Volume
he can/may further select which StorageClass
to base on, and the default
StorgeClass will be used by default. In this context, only those StorageClasses
which have replica count 1
matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first part of your draft mentions a replica count value of 1. The last part tells the user to "set a proper global option".
So the global option that you mentioned was the StorageClass itself and not the replica count? You want users to specify which StorageClass will be used to create volumes by default? If this is correct, include these details in the note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems some concepts are mixed.
(1) StorageClass is like a model, user can create as many as he wants;
(2) When creating volume, user may refer any one of his StorageClasses; the parameters from SC are converted to paramters of volume
(3) When LH sees the volumes, it will do many tasks, including create the expected number of replicas
for this volume.
(4) Replicas are scheduled to a group of target node(s)
(5) When node maintenance
happens, the replica
on this node may affect if this node can be successfully drained.
(6) To give user more flexibility, LH has a global option Node Drain Policy to control node drain
strategy.
(1) (2) touch StorageClass
, (3)~(6) touch LH internal mechanism; they have the connection via a global setting Node Drain Policy
.
Back to here, suppose a reader has some k8s backgrounds, he reads this block of text, and click the link to troubleshooting section, he will know what we try to descibe.
docs/advanced/storageclass.md
Outdated
@@ -44,6 +44,12 @@ The number of replicas created for each volume in Longhorn. Defaults to `3`. | |||
|
|||
![](/img/v1.2/storageclass/create_storageclasses_replicas.png) | |||
|
|||
:::info important | |||
|
|||
When the value is `1`, the created volume from this `StorageClass` has only one replica, it may block the [Node Maintenance](../host/host.md#node-maintenance), check the section [Single-Replica Volumes](../troubleshooting/host.md#single-replica-volumes) and set a proper global option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the value is `1`, the created volume from this `StorageClass` has only one replica, it may block the [Node Maintenance](../host/host.md#node-maintenance), check the section [Single-Replica Volumes](../troubleshooting/host.md#single-replica-volumes) and set a proper global option. | |
Configuring Longhorn to create only one replica for each volume (**Number of Replicas**: **1**) may prevent you from enabling [Maintenance Mode](../host/host.md#maintenance-mode). The replica count is a global setting, so specify a value that makes sense for your implementation. For troubleshooting information, see [Single-Replica Volumes](../troubleshooting/host.md#single-replica-volumes). |
I will follow @jillian-maroket 's suggestions to rework the document, thanks. @votdev When you have different ideas about the markdown format, please negotiate with @jillian-maroket and leave suggestions on the review, instead of |
Signed-off-by: Jian Wang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@w13915984028 Please implement the pending changes so we can finally merge. If I find any more wording issues, I'll just fix them later.
cc: @bk201
For the remaining comments, please refer this reply: #619 (comment). As many different concepts/objects are involved in this document PR, it is normal that it is a bit hard to understand, please still read more of the existing documents of those different parts to get a full picture. thanks. |
@w13915984028 I have reviewed this PR several times. In my last comment, I asked you to implement whatever changes are necessary to make the document correct. I even said that I'm willing to fix wording issues later just so we can finally merge this PR. Instead of just fixing the incorrect parts, you chose to explain at length what I misunderstood. @bk201 I have reached my limit. As you know, his drafts are always difficult to understand and review. The PR is massive and it changed a lot over time because of reviewer feedback and because he decided to add more content in different places. Please evaluate if what we currently have is good enough to merge. I am supposed to be working on the doc conversion and Longhorn v1.7.1 deliverables this week. |
Signed-off-by: Jian Wang <[email protected]>
A third commit is added to remove the description of |
issue:
harvester/harvester#6264
harvester/harvester#6266
related PR:
harvester/harvester-installer#788