Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(mssql): add hashbytes and test for binary output hash fns #8107

Merged
merged 4 commits into from
Jan 29, 2024

Conversation

gforsyth
Copy link
Member

@gforsyth gforsyth commented Jan 26, 2024

Description of changes

This adds support for ops.HashBytes to mssql and also adds a test for that functionality so it's easier to port when we merge in the epic split branch.

I've also added a new op, HashHexDigest which returns the hexdigest of various cryptographic hashing functions since I imagine this is what many users are probably after. This newer op (and corresponding hexdigest method) can also support many more backends, as most of them default to returning the string hex digest and not the raw binary output.

I tried to be very accurate in the notimpl and notyet portions of both tests and I think I've done that.

For now, only exposing DuckDB, Pyspark, and MSSQL so we don't add a huge extra burden for the epic split but also address the user request in #8082

And I guess now we can commence debate over the method name? 🐎

Issues closed

Resolves #8082

@gforsyth gforsyth force-pushed the hashbytes branch 2 times, most recently from 86e29d2 to 4efcdbf Compare January 26, 2024 17:47
Copy link
Member

@jcrist jcrist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. Just a few docstring requests.

And I guess now we can commence debate over the method name? 🐎

I like the name hexdigest for the new operation, seems clear and matches the equivalent python stdlib name.

ibis/expr/types/strings.py Outdated Show resolved Hide resolved
ibis/expr/types/strings.py Outdated Show resolved Hide resolved
@cpcloud
Copy link
Member

cpcloud commented Jan 26, 2024

Should we discuss deprecating whichever other method overlaps with the new hexdigest? I can't remember whether it's hash or hashbytes or something else.

gforsyth and others added 2 commits January 26, 2024 16:24
feat(mssql): add hexdigest

Apply suggestions from code review

Co-authored-by: Jim Crist-Harif <[email protected]>
@gforsyth
Copy link
Member Author

Should we discuss deprecating whichever other method overlaps with the new hexdigest? I can't remember whether it's hash or hashbytes or something else.

So it would be hashbytes, although currently there is a difference in that hashbytes returns the binary hash and there are at least a few backends that support both of those options.

@cpcloud cpcloud added this to the 8.0 milestone Jan 29, 2024
@cpcloud cpcloud merged commit 91f60cd into ibis-project:main Jan 29, 2024
79 checks passed
@gforsyth gforsyth deleted the hashbytes branch January 29, 2024 18:12
gforsyth added a commit to gforsyth/ibis that referenced this pull request Feb 1, 2024
…project#8107)

This adds support for `ops.HashBytes` to `mssql` and also adds a test
for that functionality so it's easier to port when we merge in the epic
split branch.

I've also added a new op, `HashHexDigest` which returns the hexdigest of
various cryptographic hashing functions since I imagine this is what
many users are _probably_ after. This newer op (and corresponding
`hexdigest` method) can also support many more backends, as most of them
default to returning the string hex digest and not the raw binary
output.

I tried to be very accurate in the `notimpl` and `notyet` portions of
both tests and I think I've done that.

For now, only exposing DuckDB, Pyspark, and MSSQL so we don't add a huge
extra burden for the epic split but also address the user request in

And I guess now we can commence debate over the method name? 🐎

Resolves ibis-project#8082

---------

Co-authored-by: Jim Crist-Harif <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: add support for HashBytes operation on multiple backends
4 participants