Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added BITPOS command to return offsets of set bits in a key. #1295

Open
wants to merge 1 commit into
base: unstable
Choose a base branch
from

Conversation

jonahharris
Copy link
Contributor

This pull request adds a command, BITPOS key [start] [end] [limit], which returns a bulk reply containing the offsets of set bits in the given key.

@dim
Copy link

dim commented Oct 1, 2013

Hey,

I'm not 100% sure about this. I also attempted the standard approach of replylen = addDeferredMultiBulkLength(c); first but then switched to using a 'helper array', simply for performance.

Here's a little integration benchmark I've written: https://gist.github.com/dim/6775254

# my patch
--> Benchmarking sparse: 10 bits set out of 10000
    Completed in 0.043s returning 10
--> Benchmarking medium: 941 bits set out of 10000
    Completed in 1.197s returning 941
--> Benchmarking dense: 6270 bits set out of 10000
    Completed in 11.614s returning 6270

# your patch
--> Benchmarking sparse: 10 bits set out of 10000
    Completed in 0.069s returning 10
--> Benchmarking medium: 941 bits set out of 10000
    Completed in 1.967s returning 941
--> Benchmarking dense: 6270 bits set out of 10000
    Completed in 12.928s returning 6270

@jonahharris
Copy link
Contributor Author

Thank you for your benchmark. Because it requires an excess amount of memory to store the bit positions in an intermediate array, your patch only works for small bitsets, which it appears you're testing with. Our bitsets range from 10 to 100 million bits, which is where we encountered crashes caused by your patch. For example, set bit 1048576 and give it a try. You'll get:

Program received signal SIGSEGV, Segmentation fault.
bitposCommand (c=0x7fff7dcc4000) at bitops.c:480
480                 res[bitcount++] = start*8 + pos;
(gdb) bt
#0  bitposCommand (c=0x7fff7dcc4000) at bitops.c:480
#1  0x00000000004198af in call (c=0x7fff7dcc4000, flags=7) at redis.c:1675
#2  0x000000000041b7ed in processCommand (c=0x7fff7dcc4000) at redis.c:1850
#3  0x0000000000423d9f in processInputBuffer (c=0x7fff7dcc4000) at networking.c:1017
#4  0x0000000000423eb3 in readQueryFromClient (el=<value optimized out>, fd=33, privdata=0x7fff7dcc4000, mask=<value optimized out>) at networking.c:1080
#5  0x00000000004157f5 in aeProcessEvents (eventLoop=0x7ffff686b150, flags=3) at ae.c:382
#6  0x0000000000415acb in aeMain (eventLoop=0x7ffff686b150) at ae.c:425
#7  0x00000000004148f3 in main (argc=<value optimized out>, argv=<value optimized out>) at redis.c:2798
(gdb) quit

Upon inspection, this is because res is 0x0. Your patch assumes the allocation will be successful, which it is on your smaller bitsets. But since the patch isn't checking the allocation, when it fails to allocate the array, it crashes when attempting to store the first set bit.

While you should always verify an allocation was successful, the method you chose to use to dynamically allocate the array was inappropriate. This is because C99 defines variable length arrays to be deallocated automatically when the array goes out of scope. To reduce the variability in performance and simplify implementation, most compilers implement this by allocating VLAs from the stack rather than from the heap, which means the size of your array is limited to the maximum allocation from the stack.

You can specifically allocate from the heap by changing line 471 from:

long res[bytecount*8];

to

long *res = (long *) zmalloc (sizeof(long) * bytecount);

And, accordingly, freeing it on line 497:

zfree(res);

This works fine. But the allocation isn't without cost. On large, especially sparse, arrays, it takes more time and memory to allocate the memory for the intermediate array than it does to return the bits which are set.

@jonahharris
Copy link
Contributor Author

I'm wondering if anyone actually looks at these pull requests. This functionality is used quite a bit by multiple people, but no response or comment has been made.

@badboy
Copy link
Contributor

badboy commented Apr 3, 2014

We have BITPOS now in Redis. I'm wondering if it has enough functionality compared to your proposed solution?

@jonahharris
Copy link
Contributor Author

It's not the same command we had been proposing. Whereas the proposed function returns all set bits, BITPOS, as implemented, really only returns the first bit.

@dim
Copy link

dim commented Apr 3, 2014

Yep, an option to pass a limit or range would make it complete.
On 3 Apr 2014 19:39, "Jonah H. Harris" [email protected] wrote:

It's not the same command we had been proposing. Whereas the proposed
function returns all set bits, BITPOS, as implemented, really only returns
the first bit.


Reply to this email directly or view it on GitHubhttps://github.com//pull/1295#issuecomment-39488541
.

@mattsta
Copy link
Contributor

mattsta commented Apr 3, 2014

FYI: I added your command (renamed to BITALLPOS) as a loadable module that can be used with my pluggable Redis: https://github.com/mattsta/krmt

Usage example:

127.0.0.1:6379> config set module-add /Users/matt/repos/krmt/bitallpos.so
OK
127.0.0.1:6379> set abc 3
OK
127.0.0.1:6379> bitallpos abc
1) "2"
2) "3"
3) "6"
4) "7"
127.0.0.1:6379> setbit try 0 1
(integer) 0
127.0.0.1:6379> bitallpos try
1) "0"
127.0.0.1:6379> setbit try 72 1
(integer) 0
127.0.0.1:6379> bitallpos try
1) "0"
2) "72"
127.0.0.1:6379> setbit try 954 1
(integer) 0
127.0.0.1:6379> bitallpos try
1) "0"
2) "72"
3) "954"
127.0.0.1:6379> setbit try 8000001 1
(integer) 0
127.0.0.1:6379> bitallpos try
1) "0"
2) "72"
3) "954"
4) "8000001"

@jonahharris
Copy link
Contributor Author

I like the loadable module. I haven't played with writing one before.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Jonah H. Harris seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants