Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TieredStorage] Add capacity() API and limit file size to 16GB #401

Merged
merged 2 commits into from
Mar 24, 2024

Conversation

yhchiang-sol
Copy link

@yhchiang-sol yhchiang-sol commented Mar 22, 2024

Problem

The TieredStorage has not yet implemented the AccountsFile::capacity()
API.

Summary of Changes

Implement capacity() API for TieredStorage and limit file size to 16GB,
same as the append-vec file.

@yhchiang-sol yhchiang-sol changed the title [TieredStorage] Add capacity() API and limit hot-storage file size to 16GB [TieredStorage] Add capacity() API and limit file size to 16GB Mar 22, 2024
Comment on lines 378 to 376
if self.is_empty() {
return MAX_HOT_STORAGE_FILE_SIZE;
}
self.len() as u64

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this impl not just?

self.len() as u64

If we have a reader, we shouldn't be able to write any more data, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I think it should be simply self.len() as u64.

If we have a reader, we shouldn't be able to write any more data, right?

Right. In that case, the capacity should be equal to len(), or should we return 0 instead?

Btw, is there a reason why capacity() is u64 but len is usize() in AccountsFile / AppendVec API?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. In that case, the capacity should be equal to len(), or should we return 0 instead?

The capacity of a container is the number of elements currently allocated. So for a file-based container like tiered storage, its capacity is the file size. Returning zero would only be correct if the file size was zero.


Btw, is there a reason why capacity() is u64 but len is usize() in AccountsFile / AppendVec API?

I'm not sure. I would imagine not a strong one.

accounts-db/src/tiered_storage.rs Show resolved Hide resolved

pub fn capacity(&self) -> u64 {
self.reader()
.map_or(MAX_TIERED_STORAGE_FILE_SIZE, |reader| reader.capacity())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this impl makes sense. We don't want to have to specify the max size constant in multiple places (as is currently). So the one in HotStorageReader::capacity() can hopefully be removed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here we need MAX_TIERED_STORAGE_FILE_SIZE. Because the or case means we don't have a reader yet, meaning we don't know whether it is cold or hot yet.

But yep, under the HotStorageReader it should return MAX_HOT_STORAGE_FILE_SIZE.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't know whether it is cold or hot yet.

I think this is another example of the API being inverted/backwards. The caller does have a tiered storage though, that's how they can even call capacity(). But the API does not go down into a specific hot/cold impl to ask for its capacity.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is another example of the API being inverted/backwards.

My original thinking is that: the accounts-db layer manages temperature (i.e., hot / cold / warm .. etc) as it has the whole picture of active / inactive accounts and other related information like shrink, and the accounts-file layer's job is to determine the best suitable file format based on the temperature assigned by the accounts-db. (and this is the reason why it was TieredHot previously as we eventually need an enum to determine the temperature.)

So back to the main topic, if we think determining temperature is accounts-db layer's job, then we should have some temperature information here even without constructing a reader.

We can revisit this later as we currently don't have plans for cold format. For now I think MAX_TIERED_STORAGE_FILE_SIZE is good enough.

@codecov-commenter
Copy link

codecov-commenter commented Mar 22, 2024

Codecov Report

Attention: Patch coverage is 0% with 10 lines in your changes are missing coverage. Please review.

Project coverage is 81.8%. Comparing base (e9cc9f8) to head (a960163).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master     #401     +/-   ##
=========================================
- Coverage    81.9%    81.8%   -0.1%     
=========================================
  Files         840      840             
  Lines      227913   227923     +10     
=========================================
- Hits       186672   186635     -37     
- Misses      41241    41288     +47     

Copy link

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@yhchiang-sol yhchiang-sol merged commit 6024712 into anza-xyz:master Mar 24, 2024
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants