-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EQL: Add Substring function with Python semantics #53688
Conversation
Initial function added to act as a template. Does not reuse substring from SQL due to the difference in semantics and the accepted arguments. Currently it is missing any integration test as, the usage of scripting, requires an actual integration test against a proper cluster (and likely its own QA project).
Pinging @elastic/es-search (:Search/EQL) |
...src/main/java/org/elasticsearch/xpack/eql/expression/function/scalar/string/StringUtils.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/eql/expression/function/scalar/string/StringUtils.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/eql/expression/function/scalar/string/StringUtils.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice step forward. Left few comments.
while (end < 0) { | ||
end += length; | ||
} | ||
int validEndIndex = length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need validEndIndex
, do you?
while (start < 0) { | ||
start += length; | ||
} | ||
while (end < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this while
be replaced with this simple math?
if (end < 0) {
end = end - end/length * length + (end % length != 0 ? length : 0);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No so simple :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:-) instead of subtracting length each time and then check if the value is still negative, the formula above subtracts the entire amount once by dividing the end
value by length
(to see how many lengths "fit" in "end") and taking the whole part only (not also the decimals), multiplying by length and adding a possible remainder (defined as end % length
).
private final Expression source, start, end; | ||
|
||
public Substring(Source source, Expression src, Expression start, Expression end) { | ||
super(source, end != null ? Arrays.asList(src, start, end) : Arrays.asList(src, new Literal(source, 0, DataTypes.INTEGER), start)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is no end
, shouldn't end
be treated as the end of the string itself, meaning source's length?
public Substring(Source source, Expression src, Expression start, Expression end) { | ||
super(source, end != null ? Arrays.asList(src, start, end) : Arrays.asList(src, new Literal(source, 0, DataTypes.INTEGER), start)); | ||
this.source = src; | ||
this.start = end == null ? new Literal(source, 0, DataTypes.INTEGER) : start; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here you mean to use start == null
, no?
super(source, end != null ? Arrays.asList(src, start, end) : Arrays.asList(src, new Literal(source, 0, DataTypes.INTEGER), start)); | ||
this.source = src; | ||
this.start = end == null ? new Literal(source, 0, DataTypes.INTEGER) : start; | ||
this.end = end == null ? start : end; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If end == null
shouldn't end
be the length of the string itself?
if (start == null || end == null) { | ||
return source; | ||
} | ||
if (!(start instanceof Number)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
== false
instead of !
?
if (!(end instanceof Number)) { | ||
throw new EqlIllegalArgumentException("A number is required; received [{}]", end); | ||
} | ||
if (((Number) end).intValue() < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought end
can be negative?
|
||
InternalEqlScriptUtils() {} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra empty line.
@@ -143,7 +141,7 @@ public void testFunctionVerificationUnknown() { | |||
error("file where opcode=0 and indexOf(file_name, 'plore') == 2")); | |||
assertEquals("1:15: Unknown function [add]", | |||
error("process where add(serial_event_id, 0) == 1")); | |||
assertEquals("1:15: Unknown function [subtract]", | |||
assertEquals("1:15: Unknown function [subtract], did you mean [substring]?", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Magic
|
||
import static org.elasticsearch.xpack.eql.expression.function.scalar.string.StringUtils.substringSlice; | ||
|
||
public class StringUtilsTests extends ESTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No tests with null
values?
I've updated the algorithm since my understanding of sequence based on the docs was not correct; so I experimented with the Python REPL and came up with the above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left also some comments mostly minor.
import java.util.Map; | ||
|
||
public abstract class InternalQlScriptUtils { | ||
public class InternalQlScriptUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be also final
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No because it is extended by the other Script
classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, sorry.
@@ -35,7 +35,7 @@ protected Settings nodeSettings(int nodeOrdinal) { | |||
|
|||
@Override | |||
protected Collection<Class<? extends Plugin>> nodePlugins() { | |||
return Collections.singletonList(LocalStateEqlXPackPlugin.class); | |||
return Arrays.asList(LocalStateEqlXPackPlugin.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover.
if (end < 0) { | ||
end = 0; | ||
} | ||
if (end > length) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check couldn't be included inside the
if (end < 0)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No because once checks the lower limit, whether end
is still negative (after potentially adding length
) while the other checks the upper limit.
return start; | ||
} | ||
|
||
public Pipe length() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here also, shouldn't be end?
assertEquals(str.substring(5, 9), substringSlice(str, -5, -1)); | ||
} | ||
|
||
public void testSubstringSliceNegativeOverLength() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about both negative and both higher than length?
Also a test for both positive and both higher than end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what the test does - -15
and -11
are both negative and higher than length.
Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry my bad about the negative.
assertEquals(str.substring(0, 10 - 1), substringSlice(str, -start, -1)); | ||
} | ||
|
||
public void testEndHigherThanLenght() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public void testEndHigherThanLenght() { | |
public void testEndHigherThanLength() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. As a side note (and a separate PR), I think we need to add NodeSubclassTests to EQL, similar to QL's and SQL's corresponding classes.
} | ||
|
||
public void testNullValue() { | ||
assertNull(substringSlice(null, 0, 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe random ints (negative too) for completeness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are covered in testSubstringRandomSliceNegative
Improve separation of scripting between EQL and SQL by delegating common methods to QL. The context detection is determined based on the package to avoid having repetitive class hierarchies. Relates elastic#53688
Improve separation of scripting between EQL and SQL by delegating common methods to QL. The context detection is determined based on the package to avoid having repetitive class hierarchies. The Painless whitelists have been improved so that the declaring class is used instead of the inherited one. Relates #53688
Improve separation of scripting between EQL and SQL by delegating common methods to QL. The context detection is determined based on the package to avoid having repetitive class hierarchies. The Painless whitelists have been improved so that the declaring class is used instead of the inherited one. Relates #53688 (cherry picked from commit 6d46033) EQL: Add Substring function with Python semantics (#53688) Does not reuse substring from SQL due to the difference in semantics and the accepted arguments. Currently it is missing full integration tests as, due to the usage of scripting, requires an actual integration test against a proper cluster (and likely its own QA project). (cherry picked from commit f58680b)
Initial function added to act as a template.
Does not reuse substring from SQL due to the difference in semantics and
the accepted arguments.
Currently it is missing any integration test as, the usage of scripting,
which might require an actual integration test against a proper cluster (and
likely its own QA project).