
Bug Report: VTOrc runs an unnecessary PRS #12018

Closed
GuptaManan100 opened this issue Dec 28, 2022 · 3 comments · Fixed by #12019
Labels
Component: VTorc Vitess Orchestrator integration Type: Bug

Comments

@GuptaManan100
Member

Overview of the Issue

VTOrc ran a PRS (PlannedReparentShard) even though a primary had already been successfully promoted.

The following sequence of events occurred:

  1. Tablet A was promoted to primary, but fetching its MySQL information failed. As a result, the database_instance table has no row for this instance.
  2. VTOrc then runs a PRS that tries to promote a different tablet to primary.

Reproduction Steps

  1. Start a new cluster and promote a primary via vtctld.
  2. Repeat multiple times until VTOrc runs an unnecessary PRS.

Binary Version

main

Operating System and Environment details

main

Log Fragments

No response

GuptaManan100 added the Type: Bug and Component: VTorc (Vitess Orchestrator integration) labels on Dec 28, 2022
@GuptaManan100
Member Author

This issue was introduced in #9409, when the LEFT JOIN with database_instance was converted to an inner JOIN. The reasoning behind that change is discussed in #9409 (comment).

However, this scenario was overlooked. With an inner JOIN, a primary tablet that has no row in database_instance is dropped from the query result, so VTOrc concludes that the shard has no primary.
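A minimal Go sketch of why the inner JOIN hides the primary. It assumes simplified in-memory stand-ins for the tablet records and database_instance rows; the `Tablet`, `Instance`, `innerJoin`, and `findPrimary` names and the tablet aliases are hypothetical and are not VTOrc's actual code or schema.

```go
package main

import "fmt"

// Tablet is a hypothetical, simplified stand-in for a tablet record.
type Tablet struct {
	Alias    string
	Keyspace string
	Shard    string
	Type     string // "PRIMARY" or "REPLICA"
}

// Instance is a hypothetical stand-in for a database_instance row,
// keyed by tablet alias for simplicity.
type Instance struct {
	Alias    string
	ReadOnly bool
}

// innerJoin keeps only tablets that have a matching instance row,
// mirroring the semantics of an SQL inner JOIN.
func innerJoin(tablets []Tablet, instances map[string]Instance) []Tablet {
	var out []Tablet
	for _, t := range tablets {
		if _, ok := instances[t.Alias]; ok {
			out = append(out, t)
		}
	}
	return out
}

// findPrimary returns the alias of the first PRIMARY tablet in the shard,
// or "" if no primary is visible in the rows it is given.
func findPrimary(rows []Tablet, keyspace, shard string) string {
	for _, t := range rows {
		if t.Keyspace == keyspace && t.Shard == shard && t.Type == "PRIMARY" {
			return t.Alias
		}
	}
	return ""
}

func main() {
	tablets := []Tablet{
		{Alias: "zone1-100", Keyspace: "ks", Shard: "0", Type: "PRIMARY"},
		{Alias: "zone1-101", Keyspace: "ks", Shard: "0", Type: "REPLICA"},
	}
	// Fetching MySQL information for the newly promoted primary failed,
	// so only the replica has a database_instance row.
	instances := map[string]Instance{
		"zone1-101": {Alias: "zone1-101", ReadOnly: true},
	}
	primary := findPrimary(innerJoin(tablets, instances), "ks", "0")
	fmt.Printf("primary seen through inner join: %q\n", primary)
}
```

Running it prints `primary seen through inner join: ""`: the primary is filtered out before the primary check ever sees it, which is the state in which VTOrc decides a PRS is needed.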

@GuptaManan100
Member Author

The fix is to convert the JOIN back to a LEFT JOIN. The problem described in #9409 (comment) will then have to be solved in a different way.
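A minimal Go sketch of the LEFT JOIN behavior, under the same assumptions as any in-memory model of these tables: the `Tablet`, `Instance`, `Row`, `leftJoin`, and `findPrimary` names are hypothetical illustrations, not VTOrc's real code. Every tablet is kept, and a missing database_instance row shows up as a nil `Instance` (the analogue of NULL columns), so the primary remains visible.

```go
package main

import "fmt"

// Tablet is a hypothetical, simplified stand-in for a tablet record.
type Tablet struct {
	Alias    string
	Keyspace string
	Shard    string
	Type     string // "PRIMARY" or "REPLICA"
}

// Instance is a hypothetical stand-in for a database_instance row.
type Instance struct {
	Alias    string
	ReadOnly bool
}

// Row pairs a tablet with its instance data. Instance is nil when no
// database_instance row exists, mirroring NULL columns from a LEFT JOIN.
type Row struct {
	Tablet   Tablet
	Instance *Instance
}

// leftJoin keeps every tablet, attaching instance data when it exists.
func leftJoin(tablets []Tablet, instances map[string]Instance) []Row {
	rows := make([]Row, 0, len(tablets))
	for _, t := range tablets {
		row := Row{Tablet: t}
		if inst, ok := instances[t.Alias]; ok {
			row.Instance = &inst // inst is a fresh variable on each iteration
		}
		rows = append(rows, row)
	}
	return rows
}

// findPrimary returns the primary's alias and whether instance data is present.
func findPrimary(rows []Row, keyspace, shard string) (string, bool) {
	for _, r := range rows {
		t := r.Tablet
		if t.Keyspace == keyspace && t.Shard == shard && t.Type == "PRIMARY" {
			return t.Alias, r.Instance != nil
		}
	}
	return "", false
}

func main() {
	tablets := []Tablet{
		{Alias: "zone1-100", Keyspace: "ks", Shard: "0", Type: "PRIMARY"},
		{Alias: "zone1-101", Keyspace: "ks", Shard: "0", Type: "REPLICA"},
	}
	// No database_instance row for the newly promoted primary.
	instances := map[string]Instance{
		"zone1-101": {Alias: "zone1-101", ReadOnly: true},
	}
	alias, hasInstance := findPrimary(leftJoin(tablets, instances), "ks", "0")
	fmt.Printf("primary seen through left join: %q (instance data present: %v)\n", alias, hasInstance)
}
```

The trade-off is that callers must now handle a nil `Instance`, which is why anything read from database_instance (such as the cluster name) can no longer be assumed to be present.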

@GuptaManan100
Member Author

Before the actual bug fix, the use of cluster_name needs to be cleaned up. It is currently stored in database_instance, so as soon as we switch to a LEFT JOIN, the cluster name will be empty for any tablet that has no database_instance row. We should use the keyspace and shard instead. This refactor was done in #12012.
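A minimal Go sketch of why keyspace and shard are a safer grouping key than cluster_name once the LEFT JOIN is in place. The `Tablet`, `Instance`, `clusterName`, and `shardKey` names, the aliases, and the `"ks-0"` cluster name are all hypothetical; the point is only that the keyspace and shard come from the tablet record itself, while cluster_name depends on a row that may be missing.

```go
package main

import "fmt"

// Tablet is a hypothetical stand-in for a tablet record; keyspace and
// shard are always known from the tablet itself.
type Tablet struct {
	Alias    string
	Keyspace string
	Shard    string
}

// Instance is a hypothetical stand-in for a database_instance row that
// carries the cluster name.
type Instance struct {
	ClusterName string
}

// clusterName reads the cluster name from instance data; it is empty when
// the LEFT JOIN produced no database_instance row (nil pointer).
func clusterName(inst *Instance) string {
	if inst == nil {
		return ""
	}
	return inst.ClusterName
}

// shardKey derives a grouping key from the tablet record alone.
func shardKey(t Tablet) string {
	return t.Keyspace + "/" + t.Shard
}

func main() {
	tablets := []Tablet{
		{Alias: "zone1-100", Keyspace: "ks", Shard: "0"},
		{Alias: "zone1-101", Keyspace: "ks", Shard: "0"},
	}
	// Only the replica has a database_instance row.
	instances := map[string]*Instance{
		"zone1-101": {ClusterName: "ks-0"},
	}
	for _, t := range tablets {
		fmt.Printf("%s: cluster_name=%q keyspace/shard=%q\n",
			t.Alias, clusterName(instances[t.Alias]), shardKey(t))
	}
}
```

The primary reports an empty cluster_name but the same `ks/0` key as its replica, so grouping by keyspace and shard keeps both tablets in the same shard.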
