bugfix: fix nil pointer in vtgate on topo connection error #13010

Merged
merged 3 commits into from
May 10, 2023

Conversation

@olyazavr olyazavr commented May 2, 2023

Description

We see vtgate container restarts whenever there's a problem connecting to zk:

2023-01-18T22:12:28.921215833Z E0118 22:12:28.921183       6 keyspace_events.go:352] error while watching keyspace "SyncStore": zk: could not connect to a server
2023-01-18T22:12:28.921252801Z E0118 22:12:28.921232       6 keyspace_events.go:352] error while watching keyspace "SyncStore": zk: could not connect to a server
2023-01-18T22:12:28.923405294Z panic: runtime error: invalid memory address or nil pointer dereference
2023-01-18T22:12:28.923409629Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0xe4d153]
2023-01-18T22:12:28.923411603Z 
2023-01-18T22:12:28.923413422Z goroutine 113 [running]:
2023-01-18T22:12:28.923415237Z vitess.io/vitess/go/vt/topo/topoproto.SrvKeyspaceGetPartition(...)
2023-01-18T22:12:28.923417145Z 	vitess.io/vitess/go/vt/topo/topoproto/srvkeyspace.go:54
2023-01-18T22:12:28.923419039Z vitess.io/vitess/go/vt/discovery.(*keyspaceState).ensureConsistentLocked(0x1b9a040?)
2023-01-18T22:12:28.923421892Z 	vitess.io/vitess/go/vt/discovery/keyspace_events.go:222 +0x33
2023-01-18T22:12:28.923423694Z vitess.io/vitess/go/vt/discovery.(*keyspaceState).onHealthCheck(0xc029529d10, 0xc0295bbdb0)
2023-01-18T22:12:28.923437929Z 	vitess.io/vitess/go/vt/discovery/keyspace_events.go:328 +0x1ba
2023-01-18T22:12:28.923443219Z vitess.io/vitess/go/vt/discovery.(*KeyspaceEventWatcher).processHealthCheck(0xc0006b5fa8?, 0xc0295bbdb0)
2023-01-18T22:12:28.923445196Z 	vitess.io/vitess/go/vt/discovery/keyspace_events.go:403 +0x3c
2023-01-18T22:12:28.923450507Z vitess.io/vitess/go/vt/discovery.(*KeyspaceEventWatcher).run.func1()
2023-01-18T22:12:28.92345682Z 	vitess.io/vitess/go/vt/discovery/keyspace_events.go:194 +0x7b
2023-01-18T22:12:28.923459099Z created by vitess.io/vitess/go/vt/discovery.(*KeyspaceEventWatcher).run
2023-01-18T22:12:28.923460969Z 	vitess.io/vitess/go/vt/discovery/keyspace_events.go:183 +0xf8

This is not desirable: the connection failure should surface as an error, but it should not kill the whole vtgate process.

The panicking method is called from the healthcheck code, which does a nil check earlier but then proceeds to this method without another nil check. If the method I have modified returns nil, the healthcheck assumes the kss is in some sort of transient state and returns early, so it is safe to return nil here.
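
To make the reasoning above concrete, here is a minimal, self-contained sketch of the pattern. It is not the actual diff: `srvKeyspace`, `partition`, `getPartition`, and `ensureConsistent` are hypothetical stand-ins for the real Vitess types and methods. The assumption is that the lookup helper returns nil for a nil input and that the caller treats a nil result as a transient state.

```go
package main

import "fmt"

// partition and srvKeyspace are hypothetical stand-ins for the topo records.
type partition struct{ shards []string }

type srvKeyspace struct {
	partitions map[string]*partition // keyed by tablet type
}

// getPartition returns nil when sk is nil instead of dereferencing it.
// Without the guard, sk.partitions would panic on a nil pointer.
func getPartition(sk *srvKeyspace, tabletType string) *partition {
	if sk == nil {
		return nil
	}
	return sk.partitions[tabletType]
}

// keyspaceState is a hypothetical, simplified version of the watcher state.
type keyspaceState struct {
	lastKeyspace *srvKeyspace // stays nil if the topo watch never delivered a value
	consistent   bool
}

// ensureConsistent treats a nil partition as a transient state and returns
// early, leaving the consistency flag untouched.
func (kss *keyspaceState) ensureConsistent() {
	primary := getPartition(kss.lastKeyspace, "primary")
	if primary == nil {
		return
	}
	kss.consistent = len(primary.shards) > 0
}

func main() {
	kss := &keyspaceState{} // simulates a failed topo connection: no keyspace yet
	kss.ensureConsistent()  // does not panic
	fmt.Println("consistent after topo error:", kss.consistent)

	kss.lastKeyspace = &srvKeyspace{
		partitions: map[string]*partition{"primary": {shards: []string{"-80", "80-"}}},
	}
	kss.ensureConsistent()
	fmt.Println("consistent after keyspace arrives:", kss.consistent)
}
```

Putting the guard in the lookup helper rather than in each caller means any other call site that passes a nil keyspace is protected as well.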

The error logged just before the nil pointer panic comes from this same file.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on the CI
  • Documentation was added or is not required

Deployment Notes

@vitess-bot vitess-bot bot added NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says labels May 2, 2023

vitess-bot bot commented May 2, 2023

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a test is added or modified, there should be documentation on top of the test explaining the expected behavior and what the test does.

If a new flag is being introduced:

  • Is it really necessary to add this flag?
  • Flag names should be clear and intuitive (as far as possible)
  • Help text should be descriptive.
  • Flag names should use dashes (-) as word separators rather than underscores (_).

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow should be required, the maintainer team should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should include a link to an issue that describes the bug.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from VTop, if used there.

@github-actions github-actions bot added this to the v17.0.0 milestone May 2, 2023

@harshit-gangal harshit-gangal left a comment

Changes look good, but a test is needed here.

@harshit-gangal harshit-gangal added Type: Bug Component: Query Serving and removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says labels May 8, 2023
Signed-off-by: Olga Shestopalova <[email protected]>
@olyazavr olyazavr requested a review from frouioui as a code owner May 9, 2023 18:39

olyazavr commented May 9, 2023

@harshit-gangal added a test to ensure the error is not thrown if newKeyspace is nil. I also verified that the test fails without the fix.
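
For readers who want to reproduce the "fails without the fix" check in isolation, a rough sketch of the idea follows. It is not the test added in this PR: `srvKeyspace`, `unguarded`, `guarded`, and `panics` are hypothetical names. It uses `recover` to show that a lookup without the nil check panics on a nil keyspace while the guarded version does not.

```go
package main

import "fmt"

// srvKeyspace is a hypothetical stand-in for the topo record.
type srvKeyspace struct{ shards []string }

// unguarded dereferences its argument and panics when sk is nil.
func unguarded(sk *srvKeyspace) int { return len(sk.shards) }

// guarded checks for nil first and reports zero shards in that case.
func guarded(sk *srvKeyspace) int {
	if sk == nil {
		return 0
	}
	return len(sk.shards)
}

// panics reports whether calling f panics.
func panics(f func()) (didPanic bool) {
	defer func() {
		if r := recover(); r != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println("unguarded panics:", panics(func() { unguarded(nil) }))
	fmt.Println("guarded panics:", panics(func() { guarded(nil) }))
}
```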

Signed-off-by: Olga Shestopalova <[email protected]>
@frouioui frouioui merged commit 2cd5cac into vitessio:main May 10, 2023
@olyazavr olyazavr deleted the fix-nil-vtgate-zk-error branch May 15, 2023 18:33
timvaillancourt pushed a commit to slackhq/vitess that referenced this pull request May 28, 2024
…13010)

* fix nil pointer in vtgate on topo connection error

Signed-off-by: Olga Shestopalova <[email protected]>

* add test

Signed-off-by: Olga Shestopalova <[email protected]>

* fix year

Signed-off-by: Olga Shestopalova <[email protected]>

---------

Signed-off-by: Olga Shestopalova <[email protected]>
Co-authored-by: Olga Shestopalova <[email protected]>
timvaillancourt added a commit to slackhq/vitess that referenced this pull request May 28, 2024
* bugfix: fix nil pointer in vtgate on topo connection error (vitessio#13010)

* fix nil pointer in vtgate on topo connection error

Signed-off-by: Olga Shestopalova <[email protected]>

* add test

Signed-off-by: Olga Shestopalova <[email protected]>

* fix year

Signed-off-by: Olga Shestopalova <[email protected]>

---------

Signed-off-by: Olga Shestopalova <[email protected]>
Co-authored-by: Olga Shestopalova <[email protected]>

* VTGate Buffering: Use a more accurate heuristic for determining if we're doing a reshard (vitessio#13856)

Signed-off-by: Matt Lord <[email protected]>

---------

Signed-off-by: Olga Shestopalova <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Co-authored-by: Olga Shestopalova <[email protected]>
Co-authored-by: Olga Shestopalova <[email protected]>
Co-authored-by: Matt Lord <[email protected]>