sql: add bitwise operation for varbit with variable length #107863
Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR. Before a member of our team reviews your PR, I have some potential action items for you:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Thank you for implementing the variable-length bitwise operations! This looks really nice, I only have a few minor suggestions.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cty123)
-- commits
line 4 at r1:
Could you add the following text to the bottom of your commit message? Thanks!
Epic: None
Informs: #107821
Release Notes(sql): Adds built-in functions `varbit_or_unsigned` and `varbit_and_unsigned` for variable-length input bitwise OR and AND operations, respectively.
pkg/sql/sem/builtins/builtins.go
line 8193 at r1 (raw file):
}, types.VarBit, "Calculates bitwise OR value of bit array 'a' and 'b' that may have different lengths.",
nit: s/array/arrays here and in all overload infos.
pkg/sql/sem/builtins/builtins.go
line 8200 at r1 (raw file):
}, types.VarBit, "Calculates bitwise OR value of bit array 'a' and 'b' that may have different lengths.",
Could you add to the description here and for all other functions that a and b are considered unsigned?
pkg/sql/sem/builtins/builtins.go
line 11064 at r1 (raw file):
} // Perform bitwise AND operation 2 bit strings that may have different lengths. The function applies left padding implicitly.
It's worth mentioning here that the padding is unsigned, i.e., only pads 0s.
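To make the unsigned padding concrete, here is a minimal Go sketch that works on plain strings of '0' and '1' rather than on the real bit array type. The names `padLeftZeros` and `bitwise` are hypothetical and are not taken from builtins.go; the sketch only illustrates that the shorter operand is extended on the left with 0s before the operation is applied.

```go
package main

import (
	"fmt"
	"strings"
)

// padLeftZeros left-pads s with '0' until it is width characters long.
// This is the "unsigned" padding discussed in the review: only 0s are added.
func padLeftZeros(s string, width int) string {
	if len(s) >= width {
		return s
	}
	return strings.Repeat("0", width-len(s)) + s
}

// bitwise applies op to each pair of bits after zero-padding the shorter
// operand on the left. Inputs are assumed to contain only '0' and '1'.
func bitwise(a, b string, op func(x, y bool) bool) string {
	width := len(a)
	if len(b) > width {
		width = len(b)
	}
	a, b = padLeftZeros(a, width), padLeftZeros(b, width)
	out := make([]byte, width)
	for i := 0; i < width; i++ {
		if op(a[i] == '1', b[i] == '1') {
			out[i] = '1'
		} else {
			out[i] = '0'
		}
	}
	return string(out)
}

func or(x, y bool) bool  { return x || y }
func and(x, y bool) bool { return x && y }

func main() {
	fmt.Println(bitwise("1010010", "0101", or))  // 1010111
	fmt.Println(bitwise("1010010", "0101", and)) // 0000000
}
```

The two calls in `main` use the same operands as the logic tests in this PR, so the printed values can be compared against the expected test output.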
pkg/sql/sem/builtins/builtins.go
line 11064 at r1 (raw file):
} // Perform bitwise AND operation 2 bit strings that may have different lengths. The function applies left padding implicitly.
nit: s/2/to here and below.
pkg/sql/sem/builtins/builtins.go
line 11064 at r1 (raw file):
} // Perform bitwise AND operation 2 bit strings that may have different lengths. The function applies left padding implicitly.
nit: Do you mind line-wrapping the comment to 80 chars?
pkg/sql/logictest/testdata/logic_test/builtin_function
line 4045 at r1 (raw file):
SELECT varbit_or('1010010', '0101');
----
1010101
This should be 1010111, I think.
pkg/sql/logictest/testdata/logic_test/builtin_function
line 4091 at r1 (raw file):
SELECT varbit_and('1010010'::varbit, '0101'::varbit);
----
0000000
Could you also add the following test cases for edge cases?
# Test for invalid inputs.
statement error pgcode 22P02 could not parse string as bit array: "n" is not a valid binary digit
SELECT varbit_or('not binary', '111')
statement error pgcode 22P02 could not parse string as bit array: "n" is not a valid binary digit
SELECT varbit_or('111', 'not binary')
statement error pgcode 22P02 could not parse string as bit array: "n" is not a valid binary digit
SELECT varbit_and('not binary', '111')
statement error pgcode 22P02 could not parse string as bit array: "n" is not a valid binary digit
SELECT varbit_and('111', 'not binary')
# Accept hex as well as binary inputs.
query T
SELECT varbit_or('xfe8c', '1111')
----
1111111010001111
query T
SELECT varbit_and('1111', 'xfe8c')
----
0000000000001100
# Test on a large (>64 bit) input.
query T
SELECT varbit_or('xffffffffffffffffff', '010')
----
111111111111111111111111111111111111111111111111111111111111111111111111
pkg/sql/sem/builtins/builtins_test.go
line 786 at r1 (raw file):
} func TestVarbitOrAnd(t *testing.T) {
Thank you for the unit test! This is great.
I'm a tad concerned that the name of these new builtins, which are not aggregate functions, are similar to names of aggregate functions, |
Agree that the names could be improved. Would |
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rharding6373)
Previously, rharding6373 (Rachael Harding) wrote…
Could you add the following text to the bottom of your commit message? Thanks!
Epic: None
Informs: #107821
Release Notes(sql): Adds built-in functions `varbit_or_unsigned` and `varbit_and_unsigned` for variable-length input bitwise OR and AND operations, respectively.
Should this be
... Adds built-in functions `varbit_or` and `varbit_and` for...
Because in functions.md the 2 functions I added are `varbit_or` and `varbit_and`. Or are you suggesting I should rename them to `varbit_or_unsigned` and `varbit_and_unsigned`? It would be a bit weird if this is the case, because Postgresql and CRDB both have `bit_or` and `bit_and` functions rather than `bit_or_unsigned` or `bit_and_unsigned`.
Or are they just written this way in the commit message to indicate they are only for unsigned varbit?
pkg/sql/sem/builtins/builtins.go
line 8200 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
Could you add to the description here and for all other functions that a and b are considered unsigned?
I am fine with this, but it feels a little bit weird. Why would varbit have signs? From my perspective, a varbit is just an array of 0s and 1s, and it's really up to the user to interpret it as 2s complement, 1s complement, or an unsigned integer. At least in the Postgresql documentation, I didn't see any emphasis on the operands being unsigned:
https://www.postgresql.org/docs/9.4/functions-bitstring.html
https://www.postgresql.org/docs/9.1/functions-aggregate.html
I am not 100% sure about the naming, but meanwhile I don't have better idea than |
7028ed1
to
b28b638
Compare
Thank you for updating your pull request. Before a member of our team reviews your PR, I have some potential action items for you:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rharding6373)
pkg/sql/sem/builtins/builtins.go
line 11064 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
It's worth mentioning here that the padding is unsigned, i.e., only pads 0s.
Done.
pkg/sql/logictest/testdata/logic_test/builtin_function
line 4045 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
This should be 1010111, I think.
Done.
pkg/sql/logictest/testdata/logic_test/builtin_function
line 4091 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
Could you also add the following test cases for edge cases?
# Test for invalid inputs.
statement error pgcode 22P02 could not parse string as bit array: "n" is not a valid binary digit
SELECT varbit_or('not binary', '111')
statement error pgcode 22P02 could not parse string as bit array: "n" is not a valid binary digit
SELECT varbit_or('111', 'not binary')
statement error pgcode 22P02 could not parse string as bit array: "n" is not a valid binary digit
SELECT varbit_and('not binary', '111')
statement error pgcode 22P02 could not parse string as bit array: "n" is not a valid binary digit
SELECT varbit_and('111', 'not binary')
# Accept hex as well as binary inputs.
query T
SELECT varbit_or('xfe8c', '1111')
----
1111111010001111
query T
SELECT varbit_and('1111', 'xfe8c')
----
0000000000001100
# Test on a large (>64 bit) input.
query T
SELECT varbit_or('xffffffffffffffffff', '010')
----
111111111111111111111111111111111111111111111111111111111111111111111111
Done.
229156e
to
31ff17c
Compare
Thanks for your continued effort on these builtins! I just got back from vacation and will review your most recent changes this week. |
As raised in cockroachdb#107821, the existing bitwise operations in CRDB follow the same standard as Postgresql that require the two operands of varbit type to have the same length. The restriction makes it easier for the implementation but harder for users. As of today, we need to apply casting and left padding to make sure the operands have the same length. Instead, it might be useful to have CRDB built-in functions that can handle this. Here I have implemented 3 basic bitwise operation functions that are tailored for varbit typed data of arbitrary length.
Epic: None
Informs: cockroachdb#107821
Release note (sql change): Adds built-in functions `bitmask_or`, `bitmask_and` and `bitmask_xor` for variable-length input bitwise OR, AND, and XOR operations, respectively.
31ff17c
to
fc0b06f
Compare
Thanks for the changes! I made minor changes to the commit message to reflect the new builtin names and our format requirements. Everything else looks good to me.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cty123)
Previously, cty123 (cty) wrote…
Should this be
... Adds built-in functions `varbit_or` and `varbit_and` for...
Because in functions.md the 2 functions I added are `varbit_or` and `varbit_and`. Or are you suggesting I should rename them to `varbit_or_unsigned` and `varbit_and_unsigned`? It would be a bit weird if this is the case, because Postgresql and CRDB both have `bit_or` and `bit_and` functions rather than `bit_or_unsigned` or `bit_and_unsigned`.
Or are they just written this way in the commit message to indicate they are only for unsigned varbit?
You're right, I forgot to edit this comment before publishing my initial review. I updated the commit message with the new names.
pkg/sql/sem/builtins/builtins.go
line 8200 at r1 (raw file):
Previously, cty123 (cty) wrote…
I am fine with this, but it feels a little bit weird. Why would varbit have signs? From my perspective, a varbit is just an array of 0s and 1s, and it's really up to the user to interpret it as 2s complement, 1s complement, or an unsigned integer. At least in the Postgresql documentation, I didn't see any emphasis on the operands being unsigned:
https://www.postgresql.org/docs/9.4/functions-bitstring.html
https://www.postgresql.org/docs/9.1/functions-aggregate.html
Thanks for this change. For context, we had some internal discussion about whether we should support a signed version in the future that pads with '1' if the most significant bit of the shorter input is '1' instead of '0'. The rationale being that even though varbits aren't a signed type themselves, you have to decide how to interpret them once you have to extend the length of one of them in order to do logical bitwise operations on them. Both the postgres builtins and the bit string operators require inputs to be the same length which forces the user to interpret them.
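For illustration only, the difference between the two interpretations can be sketched in Go as below. The signed mode is a possible future variant mentioned in this thread, not something this PR implements, and `extend` and `andBits` are hypothetical names operating on plain '0'/'1' strings.

```go
package main

import (
	"fmt"
	"strings"
)

// extend left-pads s to width. With signed=true the pad bit copies the most
// significant bit of s (sign extension); otherwise the pad bit is always '0'.
func extend(s string, width int, signed bool) string {
	if len(s) >= width {
		return s
	}
	pad := "0"
	if signed && len(s) > 0 && s[0] == '1' {
		pad = "1"
	}
	return strings.Repeat(pad, width-len(s)) + s
}

// andBits ANDs two bit strings after extending the shorter one to the
// length of the longer one using the chosen interpretation.
func andBits(a, b string, signed bool) string {
	width := len(a)
	if len(b) > width {
		width = len(b)
	}
	a, b = extend(a, width, signed), extend(b, width, signed)
	out := make([]byte, width)
	for i := range out {
		if a[i] == '1' && b[i] == '1' {
			out[i] = '1'
		} else {
			out[i] = '0'
		}
	}
	return string(out)
}

func main() {
	// Unsigned: "1111" becomes "00001111" before the AND.
	fmt.Println(andBits("1111", "11110000", false)) // 00000000
	// Signed: "1111" becomes "11111111" before the AND.
	fmt.Println(andBits("1111", "11110000", true)) // 11110000
}
```

The same pair of inputs gives different results under the two modes, which is why the function descriptions state which interpretation is used.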
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cty123)
bors r=rharding6373 |
Build failed (retrying...): |
Build succeeded: |
As raised in #107821, the existing bitwise operations in CRDB follow the same standard as Postgresql that require the two operands of varbit type to have the same length. The restriction makes it easier for the implementation but harder for users. As of today, we need to apply casting and left padding to make sure the operands have the same length. Instead, it might be useful to have CRDB built-in functions that can handle this. Here I have implemented 2 basic bitwise operation functions that are tailored for varbit typed data of arbitrary length.