-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ir): add StringSlice
#8832
Conversation
4e1cb4b
to
c4c02bf
Compare
│ ric │ | ||
└───────────────────────┘ | ||
┏━━━━━━━━━━━━━━━━━━━━━━┓ | ||
┃ StringSlice(food, 3) ┃ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this repr isn't great: both [:3] and [3:] result in StringSlice(food, 3)
. But that is separate problem.
], | ||
) | ||
def test_string(backend, alltypes, df, result_func, expected_func): | ||
expr = result_func(alltypes).name("tmp") | ||
def test_substring(backend, alltypes, df, result_func, expected_func): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the diff for these test cases is really messy, sorry. What I did:
- move some test cases to separate test_substring test
- remove xfails/brokens marks since I fixed/implemented things in polars and mssql
- removed no test cases
- added the
positive-index
test case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, I was going to ask why you deleted test cases 😂
Fixes ibis-project#8796 This also improves the compilation for Substring: - the bigquery, clickhouse, and datafusion implementations can just get deleted, they do the same thing as the base SqlGlotCompiler - The mssql compiler didn't actually compile Substring(arg, start, None) properly, it created the simple `substr(arg, len)` SQL function, but in mssql the length is always required. So fix that unrelated bug. - Optimizes the SQL generated by visit_Substring: Before, the generated SQL was of the form `if start >= 1 then substr(arg, start, length) else substr(arg, start+len(arg), length)`. Now, it is like `substring(arg, if start >= 1 then start else start+len(arg), length)`. This reduces the number of appearances of arg from 3 to 2. In general we want to shrink the args of the ifelses to as small as possible. - Add in the check for negative literal lengths to the SqlGlotCompiler
stop = self.length() | ||
|
||
return self.substr(start, stop - start) | ||
if start is not None and not isinstance(start, ir.Expr) and start < 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test case for this? Might be good to add one if there isn't. Same for the other seemingly uncovered lines below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is not currently a test case. Agreed should add one. In this PR or a followup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be done in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, question about uncovered lines when raising errors at the API level.
I'll regenerate the snowflake snapshots. |
return ops.Substring(_.arg, start=0, length=_.end) | ||
if ( | ||
isinstance(_.start, ops.Literal) | ||
and isinstance(_.start.value, int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rewrite rule is good as is, just noting that this could be spelled as a pattern as well: p.StringSlice(p.Literal(int), p.Literal(int)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in there would be a different rule for each of the patterns? (Literal, None), (None, literal), (literal, literal)? Maybe a full example of how this could be spelled? If there are different rules then would I need to import three different rewrites in all the compilers?
I'd love to learn for next time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ideal in this current scenario, just wanted to highlight you that nested patterns are also supported.
There are various ways to spell this, here is a more complicated one showing what can be done with the rewrite system:
from ibis.expr.rewrites import p, d
p.StringSlice(end=None) >> d.Substring(_.arg, start=_.start) |
p.StringSlice(start=None) >> d.Substring(_.arg, start=0, length=_.end) |
p.StringSlice(p.Literal(int), p.Literal(int)) >> d.Substring(_.arg, start=_.start, length=_.end.value - _.start.value) |
p.StringSlice >> d.Substring(_.arg, start=_.start, length=d.Subtract(_.end, _.start)
Thanks @NickCrews! |
Fixes #8796