Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EQL: Add Substring function with Python semantics #53688

Merged
merged 8 commits into from
Mar 19, 2020

Conversation

costin
Copy link
Member

@costin costin commented Mar 17, 2020

Initial function added to act as a template.
Does not reuse substring from SQL due to the difference in semantics and
the accepted arguments.
Currently it is missing any integration test as, the usage of scripting,
which might require an actual integration test against a proper cluster (and
likely its own QA project).

Initial function added to act as a template.
Does not reuse substring from SQL due to the difference in semantics and
the accepted arguments.
Currently it is missing any integration test as, the usage of scripting,
requires an actual integration test against a proper cluster (and likely
its own QA project).
@costin costin added the :Analytics/EQL EQL querying label Mar 17, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/EQL)

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice step forward. Left few comments.

while (end < 0) {
end += length;
}
int validEndIndex = length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need validEndIndex, do you?

while (start < 0) {
start += length;
}
while (end < 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this while be replaced with this simple math?

if (end < 0) {
   end = end - end/length * length + (end % length != 0 ? length : 0);
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No so simple :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:-) instead of subtracting length each time and then check if the value is still negative, the formula above subtracts the entire amount once by dividing the end value by length (to see how many lengths "fit" in "end") and taking the whole part only (not also the decimals), multiplying by length and adding a possible remainder (defined as end % length).

private final Expression source, start, end;

public Substring(Source source, Expression src, Expression start, Expression end) {
super(source, end != null ? Arrays.asList(src, start, end) : Arrays.asList(src, new Literal(source, 0, DataTypes.INTEGER), start));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no end, shouldn't end be treated as the end of the string itself, meaning source's length?

public Substring(Source source, Expression src, Expression start, Expression end) {
super(source, end != null ? Arrays.asList(src, start, end) : Arrays.asList(src, new Literal(source, 0, DataTypes.INTEGER), start));
this.source = src;
this.start = end == null ? new Literal(source, 0, DataTypes.INTEGER) : start;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here you mean to use start == null, no?

super(source, end != null ? Arrays.asList(src, start, end) : Arrays.asList(src, new Literal(source, 0, DataTypes.INTEGER), start));
this.source = src;
this.start = end == null ? new Literal(source, 0, DataTypes.INTEGER) : start;
this.end = end == null ? start : end;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If end == null shouldn't end be the length of the string itself?

if (start == null || end == null) {
return source;
}
if (!(start instanceof Number)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

== false instead of !?

if (!(end instanceof Number)) {
throw new EqlIllegalArgumentException("A number is required; received [{}]", end);
}
if (((Number) end).intValue() < 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought end can be negative?


InternalEqlScriptUtils() {}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra empty line.

@@ -143,7 +141,7 @@ public void testFunctionVerificationUnknown() {
error("file where opcode=0 and indexOf(file_name, 'plore') == 2"));
assertEquals("1:15: Unknown function [add]",
error("process where add(serial_event_id, 0) == 1"));
assertEquals("1:15: Unknown function [subtract]",
assertEquals("1:15: Unknown function [subtract], did you mean [substring]?",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic


import static org.elasticsearch.xpack.eql.expression.function.scalar.string.StringUtils.substringSlice;

public class StringUtilsTests extends ESTestCase {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No tests with null values?

@costin
Copy link
Member Author

costin commented Mar 18, 2020

I've updated the algorithm since my understanding of sequence based on the docs was not correct; so I experimented with the Python REPL and came up with the above.

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left also some comments mostly minor.

import java.util.Map;

public abstract class InternalQlScriptUtils {
public class InternalQlScriptUtils {
Copy link
Contributor

@matriv matriv Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be also final ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No because it is extended by the other Script classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, sorry.

@@ -35,7 +35,7 @@ protected Settings nodeSettings(int nodeOrdinal) {

@Override
protected Collection<Class<? extends Plugin>> nodePlugins() {
return Collections.singletonList(LocalStateEqlXPackPlugin.class);
return Arrays.asList(LocalStateEqlXPackPlugin.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover.

if (end < 0) {
end = 0;
}
if (end > length) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check couldn't be included inside the

if (end < 0)

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No because once checks the lower limit, whether end is still negative (after potentially adding length) while the other checks the upper limit.

return start;
}

public Pipe length() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here also, shouldn't be end?

assertEquals(str.substring(5, 9), substringSlice(str, -5, -1));
}

public void testSubstringSliceNegativeOverLength() {
Copy link
Contributor

@matriv matriv Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about both negative and both higher than length?
Also a test for both positive and both higher than end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what the test does - -15 and -11 are both negative and higher than length.
Am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry my bad about the negative.

assertEquals(str.substring(0, 10 - 1), substringSlice(str, -start, -1));
}

public void testEndHigherThanLenght() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void testEndHigherThanLenght() {
public void testEndHigherThanLength() {

@astefan astefan self-requested a review March 19, 2020 12:39
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. As a side note (and a separate PR), I think we need to add NodeSubclassTests to EQL, similar to QL's and SQL's corresponding classes.

}

public void testNullValue() {
assertNull(substringSlice(null, 0, 0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe random ints (negative too) for completeness?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are covered in testSubstringRandomSliceNegative

@costin costin merged commit f58680b into elastic:master Mar 19, 2020
@costin costin deleted the eql/functions-nointeg branch March 19, 2020 14:58
costin added a commit to costin/elasticsearch that referenced this pull request Mar 22, 2020
Improve separation of scripting between EQL and SQL by delegating common
methods to QL. The context detection is determined based on the package
to avoid having repetitive class hierarchies.

Relates elastic#53688
costin added a commit that referenced this pull request Mar 24, 2020
Improve separation of scripting between EQL and SQL by delegating common
methods to QL. The context detection is determined based on the package
to avoid having repetitive class hierarchies.
The Painless whitelists have been improved so that the declaring class is used 
instead of the inherited one.

Relates #53688
costin added a commit that referenced this pull request Mar 24, 2020
Improve separation of scripting between EQL and SQL by delegating common
methods to QL. The context detection is determined based on the package
to avoid having repetitive class hierarchies.
The Painless whitelists have been improved so that the declaring class
is used instead of the inherited one.

Relates #53688

(cherry picked from commit 6d46033)

EQL: Add Substring function with Python semantics (#53688)

Does not reuse substring from SQL due to the difference in semantics and
the accepted arguments.
Currently it is missing full integration tests as, due to the usage of
scripting, requires an actual integration test against a proper cluster
(and likely its own QA project).

(cherry picked from commit f58680b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/EQL EQL querying
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants