
Overhead expectations #171

Closed
terror opened this issue Feb 13, 2022 · 11 comments · Fixed by #175

Comments

@terror

terror commented Feb 13, 2022

We're currently working on performance benchmarks, and we noticed that inserting about 150K bytes into the database required a database size of 1 MiB. How much overhead should we be expecting in this case?

Here's the current state of the database code and the corresponding benchmark

@casey
Contributor

casey commented Feb 13, 2022

For reference, we're inserting 1000 blocks with one transaction per block, which amounts to around 150kb.

@cberner
Owner

cberner commented Feb 14, 2022

That seems a little high, but only like a factor of 2 too high. See here for an approximate formula: https://github.com/casey/ord/pull/20#issuecomment-1003267185

I've fixed at least one issue on master where space was getting wasted.

@casey
Contributor

casey commented Feb 14, 2022

Ah, nice, I missed that. Is that formula still correct when value sizes are large? I.e., if I insert a 1 MiB value, should I expect database usage to increase by 3 MiB after the insert?

@cberner
Owner

cberner commented Feb 14, 2022

No, I think the expectation would be more like 1.5x in that case. For a 1 MiB value it will allocate the next power-of-two page size and store your value in that. For small values it's more like 3x due to overhead.
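
A minimal sketch of that rounding rule, for intuition only (this is not redb's actual allocation code, just the next-power-of-two behavior described above):

```rust
// Not redb's allocator: just the "round the value up to the next
// power-of-two page size" rule described above.
fn estimated_allocation(value_len: u64) -> u64 {
    value_len.next_power_of_two()
}

fn main() {
    let one_mib: u64 = 1024 * 1024;
    // 1 MiB is already a power of two, so the value itself fits exactly;
    // the ~1.5x expectation comes from keys and tree overhead on top of it.
    assert_eq!(estimated_allocation(one_mib), one_mib);
    // A value just over 1 MiB would round up to 2 MiB under this rule.
    assert_eq!(estimated_allocation(one_mib + 1), 2 * one_mib);
}
```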

@cberner
Owner

cberner commented Feb 14, 2022

So, when I run that benchmark it prints "Blockfile is 151871 bytes", but that's only the size of the blockfile, not the size of all your inserts. There are also 1000 entries each in the hashes-to-heights and hashes-to-children tables. I forget how big the hashes are, but probably 64 bytes? That would increase the total data inserted to more like 400 KB, so 1 MiB of required space seems about right (maybe even slightly low).
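
A back-of-envelope version of that estimate; the per-entry sizes and table layout are illustrative assumptions based on the 64-byte guess above, not numbers taken from the benchmark:

```rust
// All per-entry sizes here are guesses for illustration, not measurements.
fn main() {
    let blockfile_bytes: u64 = 151_871; // printed by the benchmark
    let entries: u64 = 1_000; // one block, one transaction per block
    let assumed_hash_bytes: u64 = 64; // the guess above; really 32-byte SHA-256

    // Hash keys for the blockfile table and the two index tables, plus
    // hash-sized values in the children table (assumed layout).
    let key_bytes = 3 * entries * assumed_hash_bytes;
    let value_bytes = entries * assumed_hash_bytes;

    let total = blockfile_bytes + key_bytes + value_bytes;
    // Roughly 400 KB of raw data, so ~1 MiB on disk matches the 2-3x
    // small-value overhead discussed above.
    println!("approximate inserted data: {total} bytes");
}
```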

@casey
Contributor

casey commented Feb 14, 2022

> So, when I run that benchmark it prints "Blockfile is 151871 bytes", but that's only the size of the blockfile, not the size of all your inserts.

Ah yes, all that other stuff 😅

> I forget how big the hashes are, but probably 64 bytes?

They're all SHA-256 (or double SHA-256, because Satoshi was paranoid), so 32 bytes.

Is it a reasonable strategy to try to allocate values in close-to-power-of-two sizes? I could store a bunch of blocks together, say just under 128 MiB, which would reduce the space overhead, at least for that part of the data.
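
A sketch of what that batching strategy could look like; `BlockBatcher` and the `write_chunk` callback are hypothetical names for illustration, not ord's actual indexing code:

```rust
// `BlockBatcher` and `write_chunk` are hypothetical, illustrative names.
struct BlockBatcher {
    buf: Vec<u8>,
    target: usize, // e.g. just under 128 MiB
}

impl BlockBatcher {
    fn new(target: usize) -> Self {
        Self { buf: Vec::new(), target }
    }

    /// Add one serialized block, flushing the current batch first if the
    /// new block would push it past the target size.
    fn push(&mut self, block: &[u8], write_chunk: &mut impl FnMut(&[u8])) {
        if !self.buf.is_empty() && self.buf.len() + block.len() > self.target {
            write_chunk(self.buf.as_slice());
            self.buf.clear();
        }
        self.buf.extend_from_slice(block);
    }

    /// Flush whatever is left at the end of indexing.
    fn finish(&mut self, write_chunk: &mut impl FnMut(&[u8])) {
        if !self.buf.is_empty() {
            write_chunk(self.buf.as_slice());
            self.buf.clear();
        }
    }
}

fn main() {
    let mut chunks: Vec<Vec<u8>> = Vec::new();
    let mut write_chunk = |chunk: &[u8]| chunks.push(chunk.to_vec());

    // Batch serialized blocks into values just under 128 MiB.
    let mut batcher = BlockBatcher::new(127 * 1024 * 1024);
    batcher.push(&[0u8; 1024], &mut write_chunk);
    batcher.finish(&mut write_chunk);
    assert_eq!(chunks.len(), 1);
}
```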

@casey
Contributor

casey commented Feb 14, 2022

I think this issue can be closed, although maybe documenting the expected overhead in the readme or design doc would be useful.

@cberner
Owner

cberner commented Feb 15, 2022

> I think this issue can be closed, although maybe documenting the expected overhead in the readme or design doc would be useful.

I think I'll add some stats that people can query to see the actual overhead. I'm not sure I want to document the behavior, at least not right now, since the overhead depends on heuristics about when to merge pages.

@cberner
Owner

cberner commented Feb 15, 2022

> Is it a reasonable strategy to try to allocate values in close-to-power-of-two sizes? I could store a bunch of blocks together, say just under 128 MiB, which would reduce the space overhead, at least for that part of the data.

Doesn't really feel worth it to me, but yeah, if you're tight on space then I would suggest a power of two minus half a page (2 KiB). There's no way that redb's overhead will exceed half a page :)
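
A minimal sketch of that sizing rule, assuming 4 KiB pages so that half a page is 2 KiB (the function below is illustrative, not part of redb's API):

```rust
// Illustrative only: pick the largest "power of two minus half a page"
// chunk size that fits within `limit`, assuming 4 KiB pages.
fn suggested_chunk_size(limit: u64) -> u64 {
    let half_page = 2 * 1024;
    // Largest power of two that does not exceed `limit`...
    let pow2 = if limit.is_power_of_two() {
        limit
    } else {
        limit.next_power_of_two() / 2
    };
    // ...minus the half-page safety margin suggested above.
    pow2 - half_page
}

fn main() {
    // For the ~128 MiB batches discussed earlier: 128 MiB minus 2 KiB.
    assert_eq!(
        suggested_chunk_size(128 * 1024 * 1024),
        128 * 1024 * 1024 - 2048
    );
}
```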

@cberner
Owner

cberner commented Feb 17, 2022

I added some methods to retrieve the storage stats. Looks like it checks out against the formula I gave:

    stored_leaf_bytes: 376871,
    overhead_bytes: 50922,
    fragmented_bytes: 510191

@casey
Contributor

casey commented Feb 17, 2022

Sweet, this is awesome.
