Add parameter to track highest reached timeline #57
Hello @YanChii,

This is something I have had in mind for a while as well. However, setting attributes in the CIB during a transition breaks that transition. That means we would lose the recovery and move detection we get from the notify vars. We should be fairly safe with your patch though, as you set the global TL in post-promote, so after most of the job has been done. A new transition should have almost nothing left to do but constraint- and colocation-related tasks, I guess. Did you try your patch with a recovery/move scenario? What did you find in the logs?

I had some other approaches in mind to achieve this goal. I'll try to work on this tomorrow.

@YanChii, sorry for my last patch conflicting with #59 and the long wait for my answers :/

Cheers,
About your auto-failback project, the best way to detect that a failed master cannot catch up with the new master is to compare its TL+LSN with the TL history. If the old master's TL+LSN is greater than the LSN of the TL fork, then you must use pg_rewind on it.
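For the record, the fork point can be read from the timeline history file the new master wrote when it was promoted. Something like this (a rough sketch only; the timeline numbers, the data directory and the pg_wal/pg_xlog name depend on your setup):

# assuming the old master was on TL 2 and the new master is on TL 3:
# print the LSN at which TL 2 was abandoned and TL 3 forked off it
awk '$1 == 2 {print $2}' /var/lib/pgsql/data/pg_wal/00000003.history
# if the old master wrote WAL past that LSN while still on TL 2, it needs pg_rewind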
Hi @ioguix,
But it doesn't affect the functionality. What do you think about it?

Actually, I think the CIB is not the best place for this parameter, but I don't know any other location that is persistent across reboots and also persistent against node disconnect/rejoin (and thus can be queried by any node at any time).

Regarding the TL+LSN (you probably meant automatic rewind+join of the old master to the cluster, not auto-failback): I'm not sure I understand. Before you start the DB on a node, you simply check whether the local TL is lower than the TL written in the CIB. You don't have access to other nodes' LSNs if you are not in the promote phase. Pg_rewind should be run in the start or pre-start phase.

Jan
As I wrote, this happens during the post-promote action, so most of the work has already been done; it shouldn't be a problem.
This is kind of a hack, but Ken Gaillot gave a solution to create private attributes using a node that will never exist in the cluster, see: https://www.mail-archive.com/[email protected]/msg03683.html I suppose you could use a non-existent node name known by all your nodes, where you could set this private attribute. Because this attribute will not survive a reboot, you will have to create it upon cluster startup. Did you study this solution?

Another way would be to use the "Storable" module to store this information locally on each node, in a file. This was used in the v1.1 release; if you need a code sample, see: 5741b74
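The trick would look something like this (an untested sketch; "ghost-node" and the attribute name are just placeholders, and the exact behavior depends on the Pacemaker version):

# set the highest reached TL on a node name that will never join the cluster
attrd_updater --node ghost-node --name highest_master_tl --update 42
# read it back from any real node
attrd_updater --node ghost-node --name highest_master_tl --query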
OK, I didn't mean auto-failback all the way to getting the old master back as master, just getting it back into the cluster as a standby. You are right.
Checking whether the TL of the old master is lower than the current "cluster TL" is not enough to decide if we need pg_rewind or not. For instance, during a switchover, ALL the other standbies have a TL lower than the new master's, and we don't need to run pg_rewind on them because, before the promotion, they were all lagging behind or in sync with the new master. But anyway, I suppose you could always call pg_rewind whenever the master changes and remove the dead code in PAF regarding the switchover logic, to keep things clean and simple. I suppose pg_rewind doesn't hurt if the node being brought back into replication with the new master was already lagging behind it before the TL fork.
Hi @ioguix,
Yes, I've seen this conversation.
A few thoughts on this, correct me if I'm wrong:
Conclusion: using a nonexistent node name also seems a bit hacky to me (and a bit fragile), and I still consider a permanently stored CRM attribute the safer and easier solution. What's your opinion?
You don't need to call pg_rewind on master change. The only place to call pg_rewind is in the pre-start phase of the resource, on the node that has the lower TL. Not in the check/probe phase, not after a master change. After the pre-start -> pg_rewind -> start sequence, you are sure that the slave will catch up, and you don't need to do anything else until you are about to start the resource again.
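The rewind step itself would be a single call along these lines (a sketch only; the data directory and connection string are placeholders, not PAF code):

pg_rewind --target-pgdata=/var/lib/pgsql/data --source-server="host=new-master user=postgres dbname=postgres"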
Hi,

I started a new branch in my repo to work on the TL check during failover, for safer election code. You can find a diff of my current code here: dalibo/PAF@master...ioguix:check_tl_during_election

As you will notice, I don't track the highest TL existing in the cluster; I only compare the TLs of the remaining standbies during the election to make sure the one selected for promotion has the highest existing TL. It seems to me we don't need to track the highest TL in PAF to achieve this goal. Considering your point about making sure the old master cannot be promoted by mistake, I'm still on the fence about whether it's a real threat or not.

Last, you wrote:
This is not true. A clean standby can have a lower TL and will not need pg_rewind to catch up with the new master.
Yes, I know, that's how PAF works today :)
Wrong. After a switchover, all standbies have a lower TL until they catch up with the new master (either by WAL shipping or streaming replication).
I'm not sure I understand you. What I was trying to point out is that maybe you can call pg_rewind every time, even when everything looks clean. I guess it will not hurt if there is no rewind work to do, and your code will just be simpler. Mind you, maybe pg_rewind can actually check whether the local instance needs a rewind using a dry run?

Anyway, I'll try to be on IRC this week in the #clusterlabs chatroom on freenode.net if you need to discuss this.
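If I remember correctly, pg_rewind does have a dry-run switch for exactly that; something like the following, with the same placeholder paths and connection string as above:

pg_rewind --dry-run --target-pgdata=/var/lib/pgsql/data --source-server="host=new-master user=postgres dbname=postgres"

It should stop before modifying anything and report whether a rewind is actually needed.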
Hi @YanChii,

Following our discussion on IRC, I started a branch in my repository to allow more than one PAF resource in the same cluster. My limited tests so far look good. I plan to merge in a few days. See: dalibo/PAF@master...ioguix:rsc_name_in_attr

Regards,
Your solution to rename the variables is quite elegant and simple.
Hi,
I have a proposal for an additional data consistency measure that will help prevent wrong promotion decisions.
The proposal is to create a permanent parameter that stores the highest timeline number ever reached in this database cluster. The parameter is saved in the post-promote phase and consulted in pre-promote. It ensures that a failed master will never be promoted.
Details:
post-promote: save the new timeline value to the crm_config database. Why crm_config and not a private attribute: a crm parameter is permanent across reboots/crashes, it is node-independent, and it is consistently reachable from any node within the quorum partition. Format:
crm_attribute --lifetime forever --type crm_config --name "$name" --update "$val"
pre-promote: get the timeline value of the local database and compare it to the global highest timeline value. If the local timeline is lower than the global highest, abort the promotion (set an attribute to abort).
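For illustration, a rough standalone sketch of that pre-promote check (the attribute name, the data directory and the abort handling are placeholders, not the agent's real code):

# compare the local TL with the highest TL recorded in crm_config
local_tl=$(pg_controldata /var/lib/pgsql/data | grep -m1 "TimeLineID:" | awk '{print $NF}')
highest_tl=$(crm_attribute --type crm_config --name highest_master_tl --query --quiet)
if [ "$local_tl" -lt "$highest_tl" ]; then
    # this node was left behind by a newer promotion: refuse to become master
    exit 1    # the real agent would set its abort attribute here instead
fi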
Why it is needed:
I'm halfway through implementing the global timeline check, and I've opened this issue to ask whether this sounds desirable to you (my aim is to integrate as many changes as possible back into your project).
Jan