Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add onDiskSize to core status API #223

Merged
merged 8 commits into from
Oct 1, 2024

Conversation

nginthfs
Copy link
Collaborator

rebase of #212

Copy link
Collaborator

@magibney magibney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments about semantics -- only 1 case I think that might actually have practical implications for our deployment.

Comment on lines +175 to +178
if (directory instanceof OnDiskSizeDirectory) {
return onDiskSizeOfDirectory(directory);
}
return sizeOfDirectory(directory);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: at the Directory level it totally makes sense to fallback to sizeOfDirectory() here. This is a bit counterintuitive considering that falling back was the source of the "drift" bug, but it's different at the Directory level:

Any Directory impl that makes no distinction between "onDisk" and "logical" does so because it's writing everything to disk. It also makes sense to continue to refer to this as "onDisk" as opposed to "compressed" or whatever, because compression is orthogonal. Also, the fact that the access copy in our case is also technically "on disk" -- albeit an ephemeral local ssd -- is an implementation detail. For semantic purposes the access copy should be considered a cache.

Comment on lines 44 to 51
NamedList<Object> nodeStatus = new SimpleOrderedMap<>();
if (it.handler.coreContainer != null) {
Path solrDataHome = it.handler.coreContainer.getCoreRootDirectory();
File dataDir = solrDataHome.toFile();
nodeStatus.add("indexDiskSizeTotal", dataDir.getTotalSpace());
nodeStatus.add("indexDiskSizeAvailable", dataDir.getUsableSpace());
}
it.rsp.add("nodeStatus", nodeStatus);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I'd be fine leaving this, if you like. Either way.

This might even be upstreamable. Serves other directory impls, and gives data even when indexInfo=false (more efficient).


private static class Sizes implements Accountable {
private static final long RAM_BYTES_USED =
RamUsageEstimator.shallowSizeOfInstance(SizeAccountingIndexOutput.class);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should instead be RamUsageEstimator.shallowSizeOfInstance(SizeAccountingIndexOutput.Sizes.class), no?

Comment on lines 130 to 131
// backing IndexOutput does not allow us to get onDiskSize
return 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gah -- this doesn't matter for us practically, but I think we may need a conditional here. We should return 0 iff backing IndexOutput instanceof SizeReportingIndexOutput AND backing directory instanceof OnDiskSizeDirectory. Otherwise, in keeping with how non-OnDiskSizeDirectory onDiskSize is handled, we should indeed fall back to returning sizeOf (as before).

return ((DirectoryFactory.OnDiskSizeDirectory) in).onDiskFileLength(name);
} else {
// directory does not implement onDiskSize
return 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too I suspect that for consistency we should actually fallback to return simple sizeOf().

Comment on lines 375 to 378
long onDiskSize = this.getFilePointer();
if (backing instanceof CompressingDirectory.SizeReportingIndexOutput) {
onDiskSize = ((CompressingDirectory.SizeReportingIndexOutput) backing).getBytesWritten();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same issue as the accounting drift, but it only happens on init, so it's harder to notice. I think we want to do the same here as we compute onDiskSize:

if (backing instanceof SizeReportingIndexOutput)
  onDiskSize = getBytesWritten()
else if (backingDirectory instanceof OnDiskSizeDirectory)
  onDiskSize = 0
else
  onDiskSize = getFilePointer()

Comment on lines 390 to 398
long onDiskSize = 0;
if (backing instanceof CompressingDirectory.SizeReportingIndexOutput) {
long finalBytesWritten = getBytesWritten(backing);
onDiskSize = finalBytesWritten;
// logical size should already be set through writeByte(s), but we need to finalize the
// on-disk size here
sizeWriter.apply(0, finalBytesWritten - lastBytesWritten, name);
}
fileSizeMap.put(name, new Sizes(backing.getFilePointer(), onDiskSize));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too I think we want a 3-way condition, as above

if (backing instanceof SizeReportingIndexOutput)
  onDiskSize = getBytesWritten()
else if (backingDirectory instanceof OnDiskSizeDirectory)
  onDiskSize = 0
else
  onDiskSize = getFilePointer()

Comment on lines +463 to +467
if (secondary instanceof CompressingDirectory.SizeReportingIndexOutput) {
return ((CompressingDirectory.SizeReportingIndexOutput) secondary).getBytesWritten();
} else {
return getFilePointer();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this instanceof fallback is ok, as far as semantics of getBytesWritten()? wdyt?

@nginthfs
Copy link
Collaborator Author

@magibney I've updated this PR with all of your suggestions. It passes unit tests and my local integration test. Mind taking another look?

Copy link
Collaborator

@magibney magibney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@nginthfs nginthfs merged commit b8f0b76 into fs/branch_9x Oct 1, 2024
7 checks passed
nginthfs added a commit that referenced this pull request Oct 1, 2024
* add onDiskSize to core status API

* tidy

* add back indexDiskSize

* fix typo in Sizes RamUsageEstimator

* add some needed conditionals to onDiskSize calcs

* tidy

* add backingDirectory to SizeAccountingIndexInput

* tidy
nginthfs added a commit that referenced this pull request Oct 1, 2024
* add onDiskSize to core status API (#223)

* add onDiskSize to core status API

* tidy

* add back indexDiskSize

* fix typo in Sizes RamUsageEstimator

* add some needed conditionals to onDiskSize calcs

* tidy

* add backingDirectory to SizeAccountingIndexInput

* tidy

* tidy
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants