Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(test): fix flakiness of TestPersistLFDiscardStats #1963

Merged
merged 7 commits into from
Jun 12, 2023

Conversation

billprovince
Copy link
Contributor

Problem

Fixes DGRAPHCORE-234

The purpose of TestPersistLFDiscardStats is to make sure that if you make changes to the file such that we need to maintain the discardStats in the value log, and if you close the DB, then when you reopen the DB, you remember the
same status.

Note that the DiscardStats are updated when we perform compaction, and in our test cases we have 4 compactors
concurrently running (with ids 0, 1, 2, and 3). Most of the time, we were fine when we captured a single compaction
cycle -- and then closed the DB and then reopened. However, there was a race condition wherein we waited for some
arbitrary amount of time, then captured the current discardStats, then closed the DB. The problem is that between
the time that we capture the discardStatus and close the DB, another compaction cycle may have started!

Hence, every once in a while, we would find that the saved copy of the discard stats would not match what we
picked up later when we reopened the file.

Solution

As noted in the problem statement, we ended up waiting an "arbitrary amount of time" instead of waiting for specific
events and then reacting to those events. Specifically, we wanted to wait until at least "some stats" had been generated,
and then we waited for compaction to complete. Unfortunately, there did not exist a clear way (previously) to capture the
event of "some stats having been generated", nor was there a way to capture the discardStats upon closure of the
database.

The solution then was to add in two things: first, a test channel in the database where we can log messages to this
channel, but only when the channel has been specified. Second, we add in a means (via options.go) of specifying
an "onCloseDiscardCapture" map. This map will be populated (assuming it was initialized and is not nil) when we
close the db and specifically when we close the valueLog.

We no longer rely on time.Sleep, but instead rely on specific events.

options.go Outdated Show resolved Hide resolved
options.go Outdated
// opt.onCloseDiscardCapture so that when we Close the DB, we make sure to
// copy the contents of the discardStats to the map c.
// This is strictly for testing purposes.
func (opt Options) withOnCloseDiscardCapture(c map[uint64]uint64) Options {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same response

db.go Outdated Show resolved Hide resolved
db.go Outdated Show resolved Hide resolved
options.go Outdated Show resolved Hide resolved
Copy link
Contributor

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work Bill. This looks a lot cleaner to me and a good balance of what we can achieve for testing. My major concerns are two:

  1. I am thinking we don't need to do this dance of having options, with functions, setters and then the actual field to hold test extensions. We can skip all of that and let the test directly set the field that it needs.
  2. Instead of using strings for comparison, would it be better to use an enum instead?

value.go Outdated Show resolved Hide resolved
test_only.go Outdated
@@ -0,0 +1,97 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could rename this file to test_extensions.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

db.go Outdated Show resolved Hide resolved
test_only.go Outdated Show resolved Hide resolved
test_util.go Outdated Show resolved Hide resolved
@billprovince
Copy link
Contributor Author

Thanks for the work Bill. This looks a lot cleaner to me and a good balance of what we can achieve for testing. My major concerns are two:

  1. I am thinking we don't need to do this dance of having options, with functions, setters and then the actual field to hold test extensions. We can skip all of that and let the test directly set the field that it needs.
  2. Instead of using strings for comparison, would it be better to use an enum instead?

Thanks, Aman.
Due to the need to capture messages that occur during DB* Open(opts), we need to specify the syncChan on the options. Waiting unitl after Open returns will have us lose out on the opportunity. However, we could skip the onCloseDiscardCapture option, as this can be updated after Open safely.

With respect to strings vs. enums, I still prefer strings, but I made them string constants instead of just hard-coded strings. This way, the compiler can help us determine if there was an error or not. (Which is one of the main advantages of using enums).

Enums would definitely confer an advantage: however, the advantage of strings is that it is easier to log them and generate messages with them! I actually used the channel to get debug information during testing, which you may be aware is a bit difficult to obtain using standard go test. Logged messages don't get logged unless they are hooked up to a testing.T instance. Of course, I removed my ugly debugging messages, but having the ability to add them back when desired is clearly helpful.

test_extensions.go Outdated Show resolved Hide resolved
// withSyncChan returns a new Options value with syncChan set to the given value.
// If not specified, any operations that would otherwise occur with the syncChan
// will be silently skipped.
func (opt Options) withSyncChan(ch chan string) Options {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this func

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

test_extensions.go Outdated Show resolved Hide resolved
util_test.go Outdated
@@ -0,0 +1,58 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this code to db_test.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@mangalaman93 mangalaman93 changed the title fix(Test): Fixes flakiness of TestPersistLFDiscardStats fix(test): fix flakiness of TestPersistLFDiscardStats Jun 12, 2023
Copy link
Contributor

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing all the comments, looks much better now.

@mangalaman93 mangalaman93 merged commit 3e4a25d into main Jun 12, 2023
@mangalaman93 mangalaman93 deleted the FixDiscardStatsTest branch June 12, 2023 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants