Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add bitwise operation for varbit with variable length #107863

Merged
merged 1 commit into from
Aug 17, 2023

Conversation

cty123
Copy link
Contributor

@cty123 cty123 commented Jul 30, 2023

As raised in #107821, the existing bitwise operations in CRDB follow the same standard as Postgresql that require the two operands of varbit type to have the same length. The restriction makes it easier for the implementation but harder for users. As of today, we need to apply casting and left padding to make sure the operands have the same length. Instead, it might be useful to have CRDB built-in functions that can handle this. Here I have implemented 2 basic bitwise operation functions that are tailored for varbit typed data of arbitrary length.

@cty123 cty123 requested a review from a team as a code owner July 30, 2023 12:21
@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Jul 30, 2023

CLA assistant check
All committers have signed the CLA.

@blathers-crl
Copy link

blathers-crl bot commented Jul 30, 2023

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

Before a member of our team reviews your PR, I have some potential action items for you:

  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added the O-community Originated from the community label Jul 30, 2023
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich requested review from a team, yuzefovich and rharding6373 and removed request for a team and yuzefovich July 31, 2023 18:35
Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for implementing the variable-length bitwise operations! This looks really nice, I only have a few minor suggestions.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cty123)


-- commits line 4 at r1:
Could you add the following text to the bottom of your commit message? Thanks!

Epic: None
Informs: #107821 

Release Notes(sql): Adds built-in functions `varbit_or_unsigned` and `varbit_and_unsigned` for variable-length input bitwise OR and AND operations, respectively.

pkg/sql/sem/builtins/builtins.go line 8193 at r1 (raw file):

			},
			types.VarBit,
			"Calculates bitwise OR value of bit array 'a' and 'b' that may have different lengths.",

nit: s/array/arrays here and in all overload infos.


pkg/sql/sem/builtins/builtins.go line 8200 at r1 (raw file):

			},
			types.VarBit,
			"Calculates bitwise OR value of bit array 'a' and 'b' that may have different lengths.",

Could you add to the description here and for all other functions that a and b are considered unsigned?


pkg/sql/sem/builtins/builtins.go line 11064 at r1 (raw file):

}

// Perform bitwise AND operation 2 bit strings that may have different lengths. The function applies left padding implicitly.

It's worth mentioning here that the padding is unsigned, i.e., only pads 0s.


pkg/sql/sem/builtins/builtins.go line 11064 at r1 (raw file):

}

// Perform bitwise AND operation 2 bit strings that may have different lengths. The function applies left padding implicitly.

nit: s/2/to here and below.


pkg/sql/sem/builtins/builtins.go line 11064 at r1 (raw file):

}

// Perform bitwise AND operation 2 bit strings that may have different lengths. The function applies left padding implicitly.

nit: Do you mind line-wrapping the comment to 80 chars?


pkg/sql/logictest/testdata/logic_test/builtin_function line 4045 at r1 (raw file):

SELECT varbit_or('1010010', '0101');
----
1010101

This should be 1010111, I think.


pkg/sql/logictest/testdata/logic_test/builtin_function line 4091 at r1 (raw file):

SELECT varbit_and('1010010'::varbit, '0101'::varbit);
----
0000000

Could you also add the following test cases for edge cases?

# Test for invalid inputs.
statement error pgcode 22P02 could not parse string as bit array: "n" is not a valid binary digit
SELECT varbit_or('not binary', '111')

statement error pgcode 22P02 could not parse string as bit array: "n" is not a valid binary digit
SELECT varbit_or('111', 'not binary')

statement error pgcode 22P02 could not parse string as bit array: "n" is not a valid binary digit
SELECT varbit_and('not binary', '111')

statement error pgcode 22P02 could not parse string as bit array: "n" is not a valid binary digit
SELECT varbit_and('111', 'not binary')

# Accept hex as well as binary inputs.
query T
SELECT varbit_or('xfe8c', '1111')
----
1111111010001111

query T
SELECT varbit_and('1111', 'xfe8c')
----
0000000000001100

# Test on a large (>64 bit) input.
query T
SELECT varbit_or('xffffffffffffffffff', '010')
----
111111111111111111111111111111111111111111111111111111111111111111111111

pkg/sql/sem/builtins/builtins_test.go line 786 at r1 (raw file):

}

func TestVarbitOrAnd(t *testing.T) {

Thank you for the unit test! This is great.

@mgartner
Copy link
Collaborator

mgartner commented Aug 2, 2023

I'm a tad concerned that the name of these new builtins, which are not aggregate functions, are similar to names of aggregate functions, bool_and, bool_or, bit_and, bit_or, and bit_xor. Will that be unnecessarily confusing for users? Are there any sensible alternative names?

@rharding6373
Copy link
Collaborator

Agree that the names could be improved. Would bitmask_and and bitmask_or be any better? It's a similar pattern, but since bitmask isn't a type maybe that would be a bit more clear.

Copy link
Contributor Author

@cty123 cty123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rharding6373)


-- commits line 4 at r1:

Previously, rharding6373 (Rachael Harding) wrote…

Could you add the following text to the bottom of your commit message? Thanks!

Epic: None
Informs: #107821 

Release Notes(sql): Adds built-in functions `varbit_or_unsigned` and `varbit_and_unsigned` for variable-length input bitwise OR and AND operations, respectively.

Should this be

... Adds built-in functions `varbit_or` and `varbit_and` for...

Because in functions.md the 2 functions I added are varbit_or and varbit_and. Or are you suggesting I should rename them to varbit_or_unsigned and varbit_and_unsigned? It would be a bit weird if this is case, because Postgresql and CRDB both have bit_or and bit_and functions rather than bit_or_unsigned or bit_and_unsigned.

Or are they just written this way in the commit message to indicate they are only for unsigned varbit?


pkg/sql/sem/builtins/builtins.go line 8200 at r1 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Could you add to the description here and for all other functions that a and b are considered unsigned?

I am fine with this, but just a little bit weird. Why would varbit have signs? I mean, from my perspective, they are just an array of 0s and 1s, and it's really up to the user to translate them, with 2s complement, 1s complement or unsigned integer. At least on Postgresql documentation, I didn't see any emphasis on the operands being unsigned,

https://www.postgresql.org/docs/9.4/functions-bitstring.html
https://www.postgresql.org/docs/9.1/functions-aggregate.html

@cty123
Copy link
Contributor Author

cty123 commented Aug 2, 2023

I am not 100% sure about the naming, but meanwhile I don't have better idea than varbit_or and varbit_and. bitmask_and and bitmask_or sound okay to me

@cty123 cty123 force-pushed the cty123/add-varbit-op branch from 7028ed1 to b28b638 Compare August 2, 2023 20:24
@blathers-crl
Copy link

blathers-crl bot commented Aug 2, 2023

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cty123
Copy link
Contributor Author

cty123 commented Aug 2, 2023

  • Renamed varbit_or->bitmask_or, varbit_and->bitmask_and
  • Added bitmask_xor with more testings(I see no reason no to add xor function)
  • Fixed the grammar

Copy link
Contributor Author

@cty123 cty123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rharding6373)


pkg/sql/sem/builtins/builtins.go line 11064 at r1 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

It's worth mentioning here that the padding is unsigned, i.e., only pads 0s.

Done.


pkg/sql/logictest/testdata/logic_test/builtin_function line 4045 at r1 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

This should be 1010111, I think.

Done.


pkg/sql/logictest/testdata/logic_test/builtin_function line 4091 at r1 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Could you also add the following test cases for edge cases?

# Test for invalid inputs.
statement error pgcode 22P02 could not parse string as bit array: "n" is not a valid binary digit
SELECT varbit_or('not binary', '111')

statement error pgcode 22P02 could not parse string as bit array: "n" is not a valid binary digit
SELECT varbit_or('111', 'not binary')

statement error pgcode 22P02 could not parse string as bit array: "n" is not a valid binary digit
SELECT varbit_and('not binary', '111')

statement error pgcode 22P02 could not parse string as bit array: "n" is not a valid binary digit
SELECT varbit_and('111', 'not binary')

# Accept hex as well as binary inputs.
query T
SELECT varbit_or('xfe8c', '1111')
----
1111111010001111

query T
SELECT varbit_and('1111', 'xfe8c')
----
0000000000001100

# Test on a large (>64 bit) input.
query T
SELECT varbit_or('xffffffffffffffffff', '010')
----
111111111111111111111111111111111111111111111111111111111111111111111111

Done.

@blathers-crl
Copy link

blathers-crl bot commented Aug 2, 2023

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

3 similar comments
@blathers-crl
Copy link

blathers-crl bot commented Aug 2, 2023

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl
Copy link

blathers-crl bot commented Aug 10, 2023

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl
Copy link

blathers-crl bot commented Aug 10, 2023

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cty123 cty123 requested a review from rharding6373 August 10, 2023 21:21
@blathers-crl
Copy link

blathers-crl bot commented Aug 12, 2023

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

2 similar comments
@blathers-crl
Copy link

blathers-crl bot commented Aug 12, 2023

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl
Copy link

blathers-crl bot commented Aug 12, 2023

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cty123 cty123 force-pushed the cty123/add-varbit-op branch from 229156e to 31ff17c Compare August 12, 2023 19:33
@blathers-crl
Copy link

blathers-crl bot commented Aug 12, 2023

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@rharding6373
Copy link
Collaborator

Thanks for your continued effort on these builtins! I just got back from vacation and will review your most recent changes this week.

As raised in cockroachdb#107821, the existing bitwise operations in CRDB follow the
same standard as Postgresql that require the two operands of varbit type
to have the same length. The restriction makes it easier for the
implementation but harder for users. As of today, we need to apply
casting and left padding to make sure the operands have the same length.
Instead, it might be useful to have CRDB built-in functions that can
handle this. Here I have implemented 3 basic bitwise operation functions
that are tailored for varbit typed data of arbitrary length.

Epic: None
Informs: cockroachdb#107821

Release note (sql change): Adds built-in functions `bitmask_or`,
`bitmask_and` and `bitmask_xor` for variable-length input bitwise OR,
AND, and XOR operations, respectively.
Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Thanks for the changes! I made minor changes to the commit message to reflect the new builtin names and our format requirements. Everything else looks good to me.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cty123)


-- commits line 4 at r1:

Previously, cty123 (cty) wrote…

Should this be

... Adds built-in functions `varbit_or` and `varbit_and` for...

Because in functions.md the 2 functions I added are varbit_or and varbit_and. Or are you suggesting I should rename them to varbit_or_unsigned and varbit_and_unsigned? It would be a bit weird if this is case, because Postgresql and CRDB both have bit_or and bit_and functions rather than bit_or_unsigned or bit_and_unsigned.

Or are they just written this way in the commit message to indicate they are only for unsigned varbit?

You're right, I forgot to edit this comment before publishing my initial review. I updated the commit message with the new names.


pkg/sql/sem/builtins/builtins.go line 8200 at r1 (raw file):

Previously, cty123 (cty) wrote…

I am fine with this, but just a little bit weird. Why would varbit have signs? I mean, from my perspective, they are just an array of 0s and 1s, and it's really up to the user to translate them, with 2s complement, 1s complement or unsigned integer. At least on Postgresql documentation, I didn't see any emphasis on the operands being unsigned,

https://www.postgresql.org/docs/9.4/functions-bitstring.html
https://www.postgresql.org/docs/9.1/functions-aggregate.html

Thanks for this change. For context, we had some internal discussion about whether we should support a signed version in the future that pads with '1' if the most significant bit of the shorter input is '1' instead of '0'. The rationale being that even though varbits aren't a signed type themselves, you have to decide how to interpret them once you have to extend the length of one of them in order to do logical bitwise operations on them. Both the postgres builtins and the bit string operators require inputs to be the same length which forces the user to interpret them.

Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cty123)

@rharding6373
Copy link
Collaborator

bors r=rharding6373

@craig
Copy link
Contributor

craig bot commented Aug 17, 2023

Build failed (retrying...):

@craig craig bot merged commit 02065f5 into cockroachdb:master Aug 17, 2023
@craig
Copy link
Contributor

craig bot commented Aug 17, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-community Originated from the community
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants