Set master scores also on the unavailable members #56
Conversation
Hi @YanChii,

This has already been raised and discussed by @blogh in a previous pull request. Unfortunately, we cannot use it here.

I was contemplating the idea of using the notify variables to record the cluster nodes in a cluster attribute... I have to dig a bit in the Pacemaker code first to see how it finds the nodes given in the notify vars. No idea yet whether it is a good approach or not.
Hi @ioguix, thank you for your response. I see the problem now.

After reading the thread with your question, I'm starting to think that a list of nodes as a configuration parameter would be the best option. It would also solve another issue I'm encountering: observer nodes in the cluster.

I'm building 3-node PAF clusters with the 3rd node as an observer. The observer is a light node without any datadir (and ideally without any PostgreSQL install) with only one purpose: to be a quorum tie breaker. The current PAF implementation expects every node to be a full database member, therefore it fails probes on this node and also spams the logs with the warning that the node is not connected. So far I've patched it directly in the RA code (to ignore this node). I was considering implementing an observer_nodes parameter, but a database_nodes parameter (or something like that) would solve both problems at once.

What do you think about that?

Jan
Another source of the node list (for both problems) could be the list of (positive) location constraints. But getting this information could be equally problematic.
Hi @YanChii,

There are already two ways to solve your issue: ban the resource from the observer node using a location constraint, or simply do not run Pacemaker (and the resource) on that node.

So we end up with three different solutions to explore.
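For illustration, a hedged sketch of the first workaround; `pgsql-ha` and `observer` are example names for the multi-state resource and the quorum-only node, not names taken from this thread:

```bash
# Keep the resource away from the observer node entirely.
pcs constraint location pgsql-ha avoids observer

# Where the pcs version supports it, resource-discovery=never also stops Pacemaker
# from probing the resource on that node.
pcs constraint location add ban-pgsql-on-observer pgsql-ha observer -INFINITY resource-discovery=never
```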
Hi @ioguix, thank you for the explanation.

It is true that banning the resource from some node or disabling Pacemaker there will prevent the cluster from running probes. But it will not prevent pgsqlms from flooding the system logs with the "not connected" warning.

Also, if you want to run two independent PostgreSQL HA resources using one Pacemaker cluster (e.g. 3 Pacemaker nodes with one PG resource on nodes 1+3 and a second PG resource on nodes 2+3), you will get the above "not connected" message even if everything is working correctly.

To be honest, I really hate the idea of … What is your opinion on these things?

Jan
I didn't like the idea of a … I don't like the idea of an optional parameter because the security of the data shouldn't be an option.

The WARNING flood can be controlled by moving the log message into the node_score test at the end of `_check_locations` (at least that's what I did when I tested the …).
I am not sure I follow your example of two clusters on one Pacemaker (I never tried it, but will asap). I think that since the configuration would be per database, we would have one node list per database.

Benoit

EDIT: changed "per cluster" to "per database", my sentence didn't make sense.
Hi @blogh, the two-DB cluster setup is used in some cases because you do not need two standby nodes (node3 is standby for both DBs). To explain more, the setup is: one PG resource on nodes 1+3 and a second PG resource on nodes 2+3, with node3 acting as standby for both databases.

The node_list parameter (as every parameter) would be set per HA resource, as sketched below.

If node_list is implemented as mandatory, there will be no need to alter the WARNING.

Jan
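To make the per-resource idea concrete, a hypothetical sketch; `node_list` is only the parameter being discussed in this thread (it does not exist in pgsqlms), and the resource names are made up:

```bash
# DB "A" runs on node1 + node3, DB "B" on node2 + node3; node3 is standby for both.
pcs resource update pgsqld-A node_list="node1 node3"
pcs resource update pgsqld-B node_list="node2 node3"
```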
Ok, I understand your points, we are on the same page. My bad.

Benoit
Trying to resume this discussion: auto-detecting nodes using the notify vars is not enough to help PAF define what nodes host the PostgreSQL instances taking part in the HA cluster. Defining which PostgreSQL instances take part in the cluster is needed in the scenario described by @YanChii above.

The only idea that comes to mind to answer this problem would be to compare the database system identifier, which is supposed to be common to all instances in a cluster.
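For instance, a rough sketch of such a comparison (the data directory path is an example):

```bash
# The "Database system identifier" is generated at initdb time and is inherited by
# every standby built from the primary, so all members of one database cluster
# report the same value while unrelated instances differ.
pg_controldata /var/lib/pgsql/9.6/data | grep "Database system identifier"
```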
So, to compare both solutions: …
I still feel the second solution is more elegant, but it requires much more dev effort. I would be OK to go with the first solution, but I'm still not clear on whether it should be mandatory or not.

Thoughts?
Hi guys!

Regarding solution (1): I feel it should be mandatory; the score is not reliable as is (in this corner case). We have to fix it one way or another.

Regarding solution (2): less configuration is always good :)

If we remove a node from the cluster, do we leave it to the user to delete the list (and wait for one monitor to re-detect all the nodes)? It feels hacky :/

Benoit

[EDIT: moderated my "not reliable" comment]
Hmm, it doesn't work if we have connections in pg_stat_replication for reasons other than the standby replication (pg_receivexlog etc.). We could do the registration: …
It's late... beer time.

Benoit
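A quick way to see the ambiguity @blogh mentions: pg_stat_replication lists every walsender, including pg_receivexlog or pg_basebackup clients that are not standbys. A minimal check, to be run on the primary:

```bash
# Non-standby clients typically never report a replay location, which is one rough
# way to tell them apart from real standbys (the column is replay_lsn on PostgreSQL 10+).
psql -AtX -c "SELECT application_name, client_addr, state, replay_location
              FROM pg_stat_replication;"
```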
The main problem with solution (2) is the administrative node removal. If it requires some admin commands to inform the cluster, it sounds easier to just edit the node list parameter.

Now, going back to the original subject of this issue: the very first problem is the master score not being removed (or set to -1000) when a master (or standby) crashes.

To be exhaustive, we have another solution that I believe should be rejected. We discussed solution (3) yesterday night with @blogh. It appeared to me this master score problem could probably be handled during the post-stop notification from one existing node. This sounds quite appealing, but in fact it would break the current Pacemaker transition because of the attribute update, and we don't want that. Postponing this master score update to the next monitor is trickier than the …

So I guess we have a consensus here to say that we don't like the … I will work with @blogh's patch to move forward with this.

Regards,
First of all: since I messed up my repo while trying to do a merge (I am very bad at git ;/), I did a reset and had to use a file I saved to redo the branch (hence the date of the commits).

I noticed two problems already: …

Benoit.
I'm closing this PR as this solution will not be merged, but we can keep discussing here if needed.
I was playing with @blogh's patch, doing some tests. I was able to defeat the whole purpose of this new parameter during the second test: when removing a node from the node list, its stale master score is left behind. We would have to add some more code complexity to track what node was removed from the list.

OK, this is a corner case during an administrative task, we could document this, etc. But in fact, I feel like …

The only clean and safe solution would be to set the master_score right when the resource is gone, during the pre- or post-stop notification. But as explained, it would break the Pacemaker transition, forbidding us to detect a recovery or move transition.

Solutions to your problems exist: …
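As an illustration of what is left behind in that corner case, a hedged sketch of the manual cleanup an administrator would have to do; the resource and node names are examples, and the lifetime depends on how the score was set:

```bash
# The promotion score lives in the node attribute "master-<resource>", e.g.
# "master-pgsqld". Remove the stale one for the node that left the list:
crm_attribute --node removed-node --name master-pgsqld --lifetime forever --delete
```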
The last simple solution that struck me today was to use a private node attribute (outside of the CIB) to set the master score. I will study this tomorrow.

Cheers,
...and we cannot set a master score as a private attribute.
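For context, a small sketch of what "private attribute" means here (the attribute name is an example): attrd_updater can keep an attribute in the attribute manager only, never writing it to the CIB, which is precisely why it cannot carry a master score, since the scheduler reads promotion scores from the CIB.

```bash
# Set a node attribute that is never written to the CIB (where --private is supported):
attrd_updater --name some-private-flag --update 1 --private
# Read it back from the attribute manager:
attrd_updater --name some-private-flag --query
```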
Hi.
When the master crashes and is kicked out of the cluster, its master score remains set to 1001. This can be dangerous if you shut down the whole cluster and then bring the old master online first. It is most dangerous with a 2-node cluster.
My proposal is to set the master score to -1000 also on the unavailable cluster members, because they obviously cannot become masters until resynced.
Simply said, I've replaced `crm_node --partition` by `crm_node --nodes` here: https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L548
This was also mentioned in #26.
Jan
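To make the proposal concrete, a hedged sketch of what "set the master score to -1000" amounts to at the Pacemaker level; the resource name `pgsqld` and node name `srv2` are examples, the forever lifetime matches the fact that the stale score survives a full cluster restart, and pgsqlms sets scores from inside the agent rather than from the command line:

```bash
# The master score of a multi-state resource is the node attribute "master-<resource>".
# Setting it to -1000 on an unavailable member prevents that node from being promoted
# until the agent raises its score again after a resync.
crm_attribute --node srv2 --name master-pgsqld --lifetime forever --update -1000
# Check the resulting value:
crm_attribute --node srv2 --name master-pgsqld --lifetime forever --query
```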