
Verify checkpoint files are completely written out when checking if they exist #1086

Merged: 11 commits merged into master on Oct 16, 2018


richardartoul (Contributor) commented Oct 15, 2018

The existing implementation assumed that if a checkpoint file existed on disk, everything was fine. However, we found that in some cases it was possible to end up with a checkpoint file of size 0 (no digest). In that case, we want to proceed as if the checkpoint file did not exist, since it was not completely written out.

codecov bot commented Oct 15, 2018

Codecov Report

Merging #1086 into master will increase coverage by 2.02%.
The diff coverage is 95.23%.


@@            Coverage Diff             @@
##           master    #1086      +/-   ##
==========================================
+ Coverage    75.3%   77.32%   +2.02%     
==========================================
  Files         569      578       +9     
  Lines       48213    48462     +249     
==========================================
+ Hits        36307    37475    +1168     
+ Misses       9640     8627    -1013     
- Partials     2266     2360      +94

Flag          Coverage Δ
#aggregator   81.59% <ø> (ø)            ⬆️
#collector    59.23% <ø> (ø)            ⬆️
#dbnode       81.37% <95.23%> (+3.86%)  ⬆️
#m3em         73.21% <ø> (ø)            ⬆️
#m3ninx       75.33% <ø> (+4.1%)        ⬆️
#m3nsch       51.19% <ø> (ø)            ⬆️
#msg          75.11% <ø> (+0.12%)       ⬆️
#query        63.67% <ø> (-1.65%)       ⬇️
#x            75.1% <ø> (+5.74%)        ⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2eb71f7...b47f7b0.

// CompleteCheckpointFileExists returns whether a checkpoint file exists, and if so,
// whether it is complete.
func CompleteCheckpointFileExists(filePath string) (bool, error) {
	f, err := os.Stat(filePath)

Collaborator

should this have the same strings.Contains check?

Contributor Author

What would the check be? This one is written specifically for checkpoint files, and if it's not a checkpoint file you'll probably get an error anyway, because it's unlikely that the file you're checking is exactly 4 bytes.

Collaborator

I think Prateek just means: should you check that the file path being checked is definitely a checkpoint file (i.e., make sure it contains the checkpoint suffix)? I think that makes sense.

Contributor Author

Ah, OK. I guess I just wasn't worried about it because this function is harder to misuse, but I'll add it.

Contributor Author

done
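One way to add the guard discussed in this thread, sketched with a hypothetical suffix constant (the real checkpoint file-naming scheme is not shown in this PR): reject any path that does not contain the checkpoint marker before stat-ing it. `completeCheckpointFileExistsGuarded` and `checkpointFileSuffix` are illustrative names, and the 4-byte size is the same assumption as above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// checkpointFileSuffix is a HYPOTHETICAL marker for checkpoint file names.
const checkpointFileSuffix = "checkpoint"

// checkpointFileSizeBytes assumes a checkpoint holds only a 4-byte digest.
const checkpointFileSizeBytes = 4

// completeCheckpointFileExistsGuarded is an illustrative variant that refuses
// non-checkpoint paths instead of silently size-checking them.
func completeCheckpointFileExistsGuarded(filePath string) (bool, error) {
	// Guard suggested in review: callers must pass a checkpoint file path.
	if !strings.Contains(filePath, checkpointFileSuffix) {
		return false, fmt.Errorf("%s is not a checkpoint file", filePath)
	}
	info, err := os.Stat(filePath)
	if os.IsNotExist(err) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	// Only a checkpoint of exactly the digest length counts as complete.
	return info.Size() == checkpointFileSizeBytes, nil
}
```

Note that `strings.Contains` matches the marker anywhere in the path, including directory names, so it works as a cheap sanity check against misuse rather than strict validation.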

start := time.Now()
shardDir := ShardDataDirPath(dir, testNs1ID, shard)
err := os.MkdirAll(shardDir, defaultNewDirectoryMode)
var (

Collaborator

Mind making a new test that only tests the checkpointFileSizeBytes condition we're worried about, and adding a little blurb about why it exists?

Contributor Author

Added a test and put a comment in the function itself

@@ -40,6 +40,10 @@ import (
xtime "github.com/m3db/m3x/time"
)

const (
checkpointFileSizeBytes = 4

Collaborator

Maybe export this and digest/digestLen, and ensure they're the same in a test.

Contributor Author

done

func TestCheckpointFileSizeBytesSize(t *testing.T) {
	// These values need to match so that the logic for determining whether
	// a checkpoint file is complete or not remains correct.
	require.Equal(t, digest.DigestLenBytes, CheckpointFileSizeBytes)
}

robskillington (Collaborator), Oct 15, 2018

Instead of this, why don't you just make CheckpointFileSizeBytes = digest.DigestLenBytes? I actually don't mind either way though, whatever works.

richardartoul (Contributor Author), Oct 16, 2018

Yeah, I discussed this with Prateek, and we like the idea that the test would break if you changed one and not the other, just to make the person think about the change more.
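For comparison, the alternative robskillington suggested ties the constant directly to the digest length so the two can never drift. This sketch assumes the digest is a 4-byte Adler-32 checksum and uses the standard library's `adler32.Size` as a stand-in for `digest.DigestLenBytes`; `encodeCheckpoint` and its little-endian byte order are hypothetical, not taken from this PR.

```go
package main

import (
	"encoding/binary"
	"hash/adler32"
)

// Option suggested in review: derive the checkpoint size from the digest
// length. adler32.Size (4) stands in for digest.DigestLenBytes here.
const checkpointFileSizeBytes = adler32.Size

// encodeCheckpoint is a HYPOTHETICAL writer: the checkpoint body is only the
// checksum of some payload, so its length always equals the digest length.
// The little-endian byte order is an assumption for illustration.
func encodeCheckpoint(payload []byte) []byte {
	buf := make([]byte, checkpointFileSizeBytes)
	binary.LittleEndian.PutUint32(buf, adler32.Checksum(payload))
	return buf
}
```

With this approach the equality test in TestCheckpointFileSizeBytesSize becomes redundant. The PR instead kept two separate constants plus the test, so that changing one without the other fails loudly and forces a deliberate decision.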

robskillington (Collaborator) left a comment

LGTM, other than the comment about strings.Contains, which I think would be good to add (to make sure callers are calling this for a checkpoint file).


3 participants