-
Notifications
You must be signed in to change notification settings - Fork 605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(mssql): add lpad and rpad ops #10060
Conversation
Here was my initial implementation before it became what it is now:
But it was giving me this result:
Where I was expecting this result:
|
ac32fab
to
7c128db
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working through this!
@IndexSeek The pad functions are not consistent in their truncation behavior right now, so this PR should handle that at least for MS SQL. The desired behavior is the Python behavior, which is to never truncate if the padded string would be shorter than existing string. For example: >>> s = "abc, 123"
>>> s.ljust(4, '-')
'abc, 123' |
Added @gforsyth for review since he's been doing a bunch of work trying to vanquish (or least contain) the dark underbelly of accursed database strings. |
That makes total sense! I have an idea (that involves a case expression) of how we can make this work, and I'll try to tackle it this evening. Thanks for the initial review @cpcloud and the quick fix on the default value!
Thanks for looping @gforsyth in! Containing the strings is no easy feat! When I looked through the tests, I thought the following section would make a great candidate for addition to the new "Cursed Knowledge" page: ibis/ibis/backends/tests/test_string.py Lines 1143 to 1152 in 3533827
|
Oh, yeah, that page is just getting started 😂 |
It will grow without bound |
I think I'll add a few more (technically redundant) test tables for padding and length (don't let this block you here @IndexSeek) so we can more explicitly map out "levels" of string support into things like:
|
I have adjusted the functions now to support Python-style padding. I was able to reuse much of the previous logic with a slight change, essentially:
Since this behavior differs from the other backends, I added "mssql" to the notyet section for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, @IndexSeek ! As noted, I have some new string data that's a little more simple-ASCII for validating the simpler lpad
and rpad
cases and your implementation here matches Python's padding behavior, modulo issues with the unclear definition of what "length" is.
## Description of changes Simplifying the previous implementation from `sge.Case(ifs...)` per #10060 (comment).
Thank you for the review! "length" is definitely tricky here. SQL Server/T-SQL has the DATALENGTH function, returning the number of bytes, where as it's LEN function returns the number of characters. But it seems that some backends treat I am excited to review the new string data - it will be great to cover more cases! |
Co-authored-by: Phillip Cloud <[email protected]>
## Description of changes Simplifying the previous implementation from `sge.Case(ifs...)` per ibis-project#10060 (comment).
Description of changes
I noticed that MSSQL was the only backend missing lpad and rpad, so I wanted to add support for these.
The LPAD operation was a bit tricky. I'm wondering if there might be a more straightforward way, but after several attempts, this seemed to handle all of the cases well.