feat: scalar regex match physical expr #12270
base: main
Thank you for this PR @zhuliquan. Have you run any benchmarks that show this approach is noticeably faster than the existing one? It makes sense that it would be faster, since it does not re-compile the regular expression for each batch, but I think it would help to quantify the difference.
Yeah, add benchmarks.
Which issue does this PR close?
Closes #11146.
Rationale for this change
This PR is the successor of PR #11455.
`BinaryExpr` compiles the literal regex pattern every time it evaluates a `RecordBatch`, and compiling a regex pattern can itself be expensive. In this approach, the literal regex pattern is compiled once and cached so it can be reused during execution. This avoids paying the compilation cost on every evaluation and speeds up execution.
What changes are included in this PR?
- Add `ScalarRegexMatchExpr` to handle regexp match with a literal regex pattern.
- Add `PhysicalScalarRegexMatchExprNode` in proto to handle `ScalarRegexMatchExpr`, and add an arm for it in the functions `parse_physical_expr` and `serialize_physical_expr`.
- In the `BinaryExpr` arm of `create_physical_expr`, create a `ScalarRegexMatchExpr` instead of a `BinaryExpr` when the RHS is a string literal expression and `op` is `RegexMatch | RegexIMatch | RegexNotMatch | RegexNotIMatch`.
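A minimal, std-only Rust sketch of the two ideas above: compiling a literal pattern once and reusing it across batches, and choosing the scalar expression only for a regex operator with a string-literal RHS. This is not the PR's code. `ScalarRegexMatchSketch`, `try_scalar_regex`, `CompiledPattern` and the simplified `Operator`/`Expr` enums are hypothetical; only the four regex operator names come from the PR. `CompiledPattern` is an assumed stand-in that does unanchored substring matching instead of real regex matching, and a "batch" is just a slice of strings rather than an Arrow `RecordBatch`.

```rust
use std::cell::Cell;

/// Hypothetical operator enum: the four regex variants are named in the PR,
/// `Eq` is included only as a non-regex example.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Operator {
    Eq,
    RegexMatch,     // `~`
    RegexIMatch,    // `~*`
    RegexNotMatch,  // `!~`
    RegexNotIMatch, // `!~*`
}

/// Hypothetical, simplified right-hand side of a binary expression.
#[allow(dead_code)]
enum Expr {
    StringLiteral(String),
    Column(String),
}

/// Stand-in for a compiled regex: an unanchored substring matcher,
/// NOT a real regex engine.
struct CompiledPattern {
    needle: String,
    case_insensitive: bool,
}

impl CompiledPattern {
    /// "Compiles" the pattern and bumps `counter` so the number of compilations is observable.
    fn compile(pattern: &str, case_insensitive: bool, counter: &Cell<usize>) -> Self {
        counter.set(counter.get() + 1);
        let needle = if case_insensitive { pattern.to_lowercase() } else { pattern.to_string() };
        CompiledPattern { needle, case_insensitive }
    }

    fn is_match(&self, value: &str) -> bool {
        if self.case_insensitive {
            value.to_lowercase().contains(&self.needle)
        } else {
            value.contains(&self.needle)
        }
    }
}

/// Hypothetical scalar-pattern match expression: the pattern is compiled once
/// when the expression is built and reused for every batch.
struct ScalarRegexMatchSketch {
    negated: bool,
    compiled: CompiledPattern,
}

impl ScalarRegexMatchSketch {
    /// Evaluates one "batch" (a slice of strings) without recompiling the pattern.
    fn evaluate(&self, batch: &[&str]) -> Vec<bool> {
        batch.iter().map(|value| self.compiled.is_match(value) != self.negated).collect()
    }
}

/// Mirrors the planning decision: only a regex operator with a string-literal RHS
/// gets the cached-pattern expression; everything else returns `None`.
fn try_scalar_regex(op: Operator, rhs: &Expr, counter: &Cell<usize>) -> Option<ScalarRegexMatchSketch> {
    let (negated, case_insensitive) = match op {
        Operator::RegexMatch => (false, false),
        Operator::RegexIMatch => (false, true),
        Operator::RegexNotMatch => (true, false),
        Operator::RegexNotIMatch => (true, true),
        _ => return None, // not a regex operator: keep the generic path
    };
    match rhs {
        Expr::StringLiteral(pattern) => Some(ScalarRegexMatchSketch {
            negated,
            compiled: CompiledPattern::compile(pattern, case_insensitive, counter),
        }),
        _ => None, // non-literal pattern may differ per row, so nothing to cache
    }
}

fn main() {
    let compilations = Cell::new(0);
    let batches = vec![vec!["DataFusion", "arrow"], vec!["datafusion-cli", "parquet"]];
    let expr = try_scalar_regex(
        Operator::RegexIMatch,
        &Expr::StringLiteral("fusion".to_string()),
        &compilations,
    )
    .expect("string literal RHS with a regex operator");
    for batch in &batches {
        // prints [true, false] for each of the two batches
        println!("{:?}", expr.evaluate(batch));
    }
    // Two batches were evaluated, but the pattern was compiled only once.
    println!("compilations: {}", compilations.get());
    assert!(try_scalar_regex(Operator::Eq, &Expr::StringLiteral("x".into()), &compilations).is_none());
}
```

The compile counter makes the saving visible: evaluating N batches costs one compilation instead of N, which is what the benchmarks requested in review would need to quantify with a real regex engine.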
Are these changes tested?
Yes, see the test module in `scalar_regex_match.rs`.
Are there any user-facing changes?