Batch get option #491
Depending on what your keys look like, you could do a … @juliangruber Any other ideas?
- If the keys are adjacent, then …
- If there are small gaps (say you have keys …), …
- If there are big gaps (you have keys …), …
If you can design your keys in such a way (making use of the lexicographical sort order) that they are indeed adjacent, that's the preferred way. You wouldn't need to skip anything.
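The key-design point above can be sketched as follows. The `user!…!post!…` scheme and all names are invented for illustration, and a plain sorted array stands in for the store's key space:

```javascript
// Illustrative key scheme (names invented for this sketch): because
// keys sort lexicographically, all posts of one user share a prefix
// and are therefore adjacent on disk.
function postKey (userId, postId) {
  return "user!" + userId + "!post!" + postId;
}

// Range covering every post of one user; "\xff" sorts after any
// printable byte used in these keys.
function postRange (userId) {
  var prefix = "user!" + userId + "!post!";
  return { gte: prefix, lt: prefix + "\xff" };
}

// A plain sorted array stands in for the store's key space
var keys = [postKey(1, "a"), postKey(1, "b"), postKey(2, "a")].sort();
var range = postRange(1);
var hits = keys.filter(function (k) { return k >= range.gte && k < range.lt; });
console.log(hits); // only user 1's keys
```

With keys laid out like this, a single range read (one iterator with `gte`/`lt`) replaces many point gets.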
Retrieving a "bag of keys" is a pretty valid and common scenario which is not totally straightforward due to the async-ness of the API. Just now the API handles batches of noncontiguous writes, so it seems reasonable for it to handle batches of reads. If you want to fetch noncontiguous values, at the moment you have to carefully read the whole README just to confirm that there isn't a builtin, and then you have to work out some sort of boilerplate incantation to wire up a sequence of gets and package up the result (slightly tricky owing to the async nature of the API). Changing the API would be a bit drastic at this stage in the 2.x.x release, but I think the lib would be a lot more accessible if there was at least a quick novice-friendly copy-paste example for "batch get" in the docs.
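A copy-paste "batch get" of the kind asked for above might look like this. The `batchGet` helper is hypothetical (not part of levelup), and the in-memory stub below only mimics levelup's callback-style `get(key, cb)` and its `err.notFound` convention:

```javascript
// Hypothetical batch-get helper: wrap each callback-style get in a
// promise and run them together. Missing keys resolve to undefined
// instead of rejecting the whole batch.
function batchGet (db, keys) {
  return Promise.all(keys.map(function (key) {
    return new Promise(function (resolve, reject) {
      db.get(key, function (err, value) {
        if (err && err.notFound) return resolve(undefined);
        if (err) return reject(err);
        resolve(value);
      });
    });
  }));
}

// In-memory stub standing in for a levelup instance (illustration only)
var store = new Map([["a", 1], ["b", 2]]);
var db = {
  get: function (key, cb) {
    if (store.has(key)) return process.nextTick(cb, null, store.get(key));
    var err = new Error("NotFound");
    err.notFound = true;
    process.nextTick(cb, err);
  }
};

batchGet(db, ["a", "missing", "b"]).then(function (values) {
  console.log(values); // logs [ 1, undefined, 2 ]
});
```

Note that N parallel gets give no snapshot guarantee across the batch; for that you'd want a single iterator instead.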
@fergiemcdowall Can you share a use case? I'd try to shift the thinking to storing a bag of keys. Key design is important. If the keys are contiguous on disk, reading is too. In other words, if you need to read a lot of keys that "belong together", shouldn't they be stored together? Indexes are another option. A "batch get" is doable to implement BTW, with …
@vweevers I get why you are saying this: one of the great things about leveldb is that it handles ranges really well, and doing clever things with sorted indexes is where leveldb really shines. That said, there are plenty of situations where you have to grab a set of keys that are spread out all over an index. For example, I use … Although I use the hell out of …
Good point. I've had similar use cases. There's a difference though between needing a few keys (quick, brown, fox) and 15K. So I'm interested in the specifics of your use case too @NirmalVatsyayan. I think, rather than a batch get, I'd prefer to officially expose iterators with a … For the simple use cases, "batch get" can be a userland module on top of iterators.
It's also doable in …
@vweevers I am using the cache for re-ranking of results from a support vector machine. I get the N most similar items for some input vector and need to re-rank them based upon some business logic (fuzzy match + noun intersection etc.), so the keys could be anything among the data set. Basically I need to cache 2-3 million objects and want to access any random set of 15K from them.
@dominictarr this sounds like something more up your alley. Any advice for @NirmalVatsyayan? |
I was thinking about this too. For those *-down implementations without a native seek, something like:

```js
function noop () {}

Iterator.prototype.seek = function (target) {
  // Native implementation
  if (typeof this._iterator.seek === "function") {
    this._iterator.seek(target);
    return this;
  }

  // Polyfill
  // 1. End the internal iterator
  this._iterator.end(noop);

  // 2. New options: move the range start (or end, when reversed)
  //    to the seek target
  var options = Object.assign({}, this._options);
  options[this._options.reverse === true ? "lte" : "gte"] = target;
  // ... what about 'limit'?

  // 3. Swap!
  this._iterator = this._db.iterator(options);
  return this;
};
```
@peakji what was your use case for seeking again? Just so we have a full picture (and another argument for exposing iterators 😉). Also, do you need snapshot guarantees? Because with this proposed polyfill, …
@vweevers I'm using …

If we want to get the intersection (089 and 091), we only need two iterators (a:* and b:*) and several skips: …

This is much faster than naive scanning on large datasets, because we are able to skip a lot of keys (a:002 to a:070).
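The skip-based intersection described above can be sketched with mock iterators. Plain sorted arrays stand in for the a:* and b:* ranges, and `seek()` positions the iterator at the first entry >= target, as with seekable *-down iterators:

```javascript
// Mock iterator over a sorted array, mimicking next()/seek(target)
// (illustration only, not the *-down API itself).
function arrayIterator (sorted) {
  let i = 0;
  return {
    next: () => (i < sorted.length ? sorted[i++] : null),
    seek: (target) => {
      i = sorted.findIndex(v => v >= target);
      if (i < 0) i = sorted.length;
    }
  };
}

// Classic leapfrog intersection: whichever side is behind seeks
// directly to the other side's current key, skipping the gap.
function intersect (a, b) {
  const out = [];
  let x = a.next();
  let y = b.next();
  while (x !== null && y !== null) {
    if (x === y) { out.push(x); x = a.next(); y = b.next(); }
    else if (x < y) { a.seek(y); x = a.next(); }
    else { b.seek(x); y = b.next(); }
  }
  return out;
}

const docsA = ["002", "070", "089", "091"]; // doc ids under a:*
const docsB = ["089", "090", "091"];        // doc ids under b:*
console.log(intersect(arrayIterator(docsA), arrayIterator(docsB))); // [ '089', '091' ]
```

One seek jumps over "002" through "070" in a single step, which is the saving on large datasets.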
Ahh... I forgot the snapshot thing... 👍 for throwing an error.
A little history: …
Hi, I'm currently trying to implement a multi get with …

So for our use case (a random sparse list of keys), I have an implementation a bit like below:

```js
const iterator = level.iterator();

// wrap next() in a promise
const next = async () => {
  return new Promise((resolve, reject) => {
    iterator.next((err, key, value) => {
      if (err) return reject(err);
      resolve([key, value]);
    });
  });
};

for (let key of sortedKeys) {
  iterator.seek(key);
  const result = await next();
  // do something
}
```

Or should I just use …? Also I have a question about …

Does this mean after calling the …?

I guess for my use case, the most ideal method would be the proposed multiGet feature.
Where is it documented like that? We should update that, because in …
Thanks for the quick response, the doc is in https://github.com/Level/leveldown
Potentially, yes. With leveldown it's recommended to use …
I'm actually using rocksdb in my project, and was passing …
rocksdb also has the highWaterMark option. If you're expecting gaps between the keys (within your gte/lte range) then it can significantly boost perf.
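The tuning mentioned above would look something like the fragment below. This is a sketch only: the option name and byte-based semantics are assumed from the leveldown/rocksdb bindings, and the fragment is not runnable without them:

```javascript
// Hypothetical range read with a raised buffer size: the native side
// can then hand larger batches of entries to JS per internal round
// trip, which helps when matching keys are sparse within the range.
const it = db.iterator({
  gte: "a!",
  lt: "a!\xff",
  highWaterMark: 64 * 1024 // bytes per batch (assumed option name)
});
```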
Thanks, that'll be really useful. I've investigated a bit around the …
That doesn't ring a bell. Can you link to batchNext in the source code?
Nah, sorry to bother you. I found it in a third-party library and made a bad assumption that it was calling an internal method of …
Hi,

Is there any batch get option, or a trick to get values for multiple keys in a minimal number of passes? We have a use case where we need to get the values of 15K keys, and firing that many individual get requests is probably not the proper way.

Kindly assist.