Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[docdb] Make gflag update safe in production #5892

Open
bmatican opened this issue Oct 1, 2020 · 0 comments
Open

[docdb] Make gflag update safe in production #5892

bmatican opened this issue Oct 1, 2020 · 0 comments
Assignees
Labels
area/docdb YugabyteDB core features kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue

Comments

@bmatican
Copy link
Contributor

bmatican commented Oct 1, 2020

Jira Link: DB-2070
Currently, we have a yb-ts-cli set_flag command that we use to update gflags in production, without restarting servers. Some gflags are runtime safe that way, allowing us to dynamically enable/disable certain features, or tweak various parameters.

However, while set_flag uses a SetAtomic primitive for updating the gflags, the rest of the code does not use the corresponding concurrency safe GetAtomic, thus leaving us with race conditions in prod, when updating flags this way!

We should fix that, either by patching the gflag library itself somehow, or at least introducing a macro to be used across the codebase, when reading gflags, which uses GetAtomic under the hood.

cc @mbautin @spolitov @robertsami

Note: this is the proper fix over the short term, test-only #5800

@bmatican bmatican added the area/docdb YugabyteDB core features label Oct 1, 2020
@bmatican bmatican self-assigned this Oct 1, 2020
bmatican added a commit that referenced this issue Oct 1, 2020
Summary:
Since we should never write to gflags from the main code, we might as well ignore all test
race conditions on gflags.

Currently, the main DB code should not be updating gflags, so all race conditions we find in testing should be strictly due to tests writing, while the server side code does reads. We should be safe to ignore those!

There are two caveats to this:
- Despite test-only, race conditions on updating `string` gflags can still cause issues, and technically, a TSAN report on those races would be easier to figure out, than say, a SEGV
- For production code, while the DB itself does not normally update gflags, we do have a `yb-ts-cli set_flag` command, through which we can runtime update flags, in memory. We frequently use that, for modifying various settings, without restarting servers. While the RPC itself uses `SetAtomic`, the rest of the code does not, so that use case can generate are legitimate races! Have filed #5892 for fixing that in the future.

Test Plan:
jenkins and also tried out on a test that would give races at a 1/100 rate:
`ybd tsan --cxx-test client_backup-txn-test --gtest_filter BackupTxnTest.DeleteTable -n 500`

Reviewers: sergei, timur, mikhail

Reviewed By: mikhail

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D9454
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 9, 2022
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Jul 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docdb YugabyteDB core features kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

4 participants