Fix when we switch synchronous_standby_names to '*'. #488
Because we assign both SECONDARY and PRIMARY in the same monitor ProceedGroupState call, the primary might fetch its new synchronous_standby_names from the monitor before the secondary has reached its new state. With the previous code, the primary would then retrieve '', meaning sync replication is disabled, when what's needed is '*'. We add test coverage for this situation and fix it in the monitor by returning '*' as soon as the other node is assigned SECONDARY, rather than only after it has reported that state.
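To make the race concrete, here is a minimal Python sketch of the two decision rules described above. It is not the monitor's actual code: the `Node` class, both function names, and the lowercase state strings (including the `catchingup` value used for the standby's stale reported state) are assumptions made up for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of a standby as tracked by the monitor."""
    name: str
    reported_state: str  # last state the node itself reported
    goal_state: str      # state the monitor has assigned to the node


def sync_standby_names_before(standby: Node) -> str:
    # Rule before the fix, as described: only trust the reported state.
    return "*" if standby.reported_state == "secondary" else ""


def sync_standby_names_after(standby: Node) -> str:
    # Rule after the fix, as described: the assignment alone is enough.
    return "*" if standby.goal_state == "secondary" else ""


# The monitor has just assigned SECONDARY, but the standby has not yet
# reported reaching it (the stale state value here is an assumption).
standby = Node("node_2", reported_state="catchingup", goal_state="secondary")

before = sync_standby_names_before(standby)
after = sync_standby_names_after(standby)
print(repr(before), repr(after))
```

In this sketch, a primary that asks during the window between assignment and report gets '' from the first rule and '*' from the second, which is the behavior change the description claims.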
e29295d to 7d3960d
With this PR branch, I still get intermittent failures as in #481:
I'm not sure that the work here is related, because |
This looks good, and I verified that the race condition is gone. (And wow, it was almost always setting the sync standbys to `''` before the patch.)
> With this PR branch, I still get intermittent failures as in #481:

As we discussed in a private chat, we might need to track down the cause of #481 separately, because that issue involves multiple standbys, which do not go through this code path.
As discussed through chat, some test ordering should still be changed. Other than that this looks good to me.