-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use lucene directory abstraction to compute file size #649
Conversation
14eda48
to
530cb6a
Compare
This comment was marked as off-topic.
This comment was marked as off-topic.
530cb6a
to
0ec848a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good - we should update the PR description with content about motivation on this PR.
Let's go ahead and add a test for this as well - if we make the calculateDirectorySize()
static we only need to pass in a FSDirectory, which we can use Mockito to stub out the two methods in question (listAll
, fileLength
)
This should help quite a bit on getting more consistent chunk sizes - it looks like by the time we go to close a chunk the size can vary quite a bit. This is the size as seen by the cache node when downloading them:
|
0ec848a
to
26007d3
Compare
Summary
When we use the regular file abstractions to calculate the size of the lucene index, lucene might be merging segments. This means the function has a race condition between the time it fetches the list and the time the calculation is done.
This is harmless in practice as we just try again and keep the old value. But it leads to noisy logs. When we use Lucene's FS abstraction it has a list of pendingDeletes which helps avoid this race condition.
Another advantage of this PR is during the happy case i.e when the java file abstraction computes the size it can take the pending deletes into account. So the disk size is smaller than what we want to. This will address that problem as well as noted by Bryan