-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
low memory population inquiry #25
Comments
@thomasmueller do you want to answer @slimsag ? |
I am sure that @thomasmueller will chime in at some point. The short story is that it is possible to do various tradeoffs with respect to temporary memory usage. We published what we feel is the best default. We expect that it should work very well. For the time being, we decided not to push alternative constructions. |
I also need the low memory population. |
I'm sorry for the delay! Yes, memory usage can be reduced a lot, to the point where you basically only need 64 bits per key. There are two ways I know of: (A) Split the set into 16 subsets (chunks), using 4 bits of the key (or hash of the key). This idea is very old, and once you understand how it works, it is quite easy (I think)... So, instead of 1 filter, you have 16 smaller filters. You can construct them one-by-one so that you need less temporary memory (basically reduce the memory usage by a factor of 16). Disadvantage: lookup may be a tiny bit slower, because you now have 16 filters instead of just 1. Well, it's might not actually be slower - you have to test it. A second advantage: you can construct multiple smaller filters concurrently, if you have enough memory. (B) Using a special construction algorithm that preserves memory, at the expense of being slower. For the binary fuse filter, construction will be maybe half as fast than now (I don't have the exact numbers). Lookup is unchanged. I did work on a prototype implementation. Approach (A) sounds more useful. It is relatively simple, and has the added advantage of the parallel construction. Let me know if you have any question, or if you are really interested in approach (B). Regards, |
Do I understand correctly that approach A above also requires no change to the xorfilter code, it merely requires constructing 16 independent xor filters (with smaller bit size)? That's smart |
Yes, approach A requires no change. The 16 independent xor filters would have have the same "bits per entry" size as one large filter. |
Hi Thomas, |
Hello - super interesting work on the binary fuse filters. I'm enjoying them a lot. I noticed the readme says:
I'm exploring application of binary fuse filters in search engines (using them as an ngram set lookup). I previously was investigating moving the memory allocations from the population code ot instead utilize mmaped system memory as a means to reduce the physical memory requirements of populating filters - but to hear you think it may be possible to accept a slower construction time with almost no temporary memory is extremely interesting to me.
I am curious if anyone has had more in-depth thoughts about how this would be approached, or tried an implementation of this? If not I will likely try my hand at it, just figured I'd ask in case anyone already had and I might be spared a few hours :)
The text was updated successfully, but these errors were encountered: