-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT: operators in data-prepper expression #1065
FEAT: operators in data-prepper expression #1065
Conversation
Signed-off-by: Chen <[email protected]>
Codecov Report
@@ Coverage Diff @@
## main #1065 +/- ##
============================================
- Coverage 91.32% 91.32% -0.01%
- Complexity 683 692 +9
============================================
Files 84 85 +1
Lines 1994 2017 +23
Branches 168 170 +2
============================================
+ Hits 1821 1842 +21
- Misses 132 133 +1
- Partials 41 42 +1
Continue to review full report at Codecov.
|
|
||
import static com.google.common.base.Preconditions.checkArgument; | ||
|
||
public class LowerThanOperator implements Operator<Boolean> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think LessThan
or LessThanOrEqual
are a little better names for these
checkArgument(args.length == 2, "Operands length needs to be 2."); | ||
if ((args[0] instanceof Integer) && (args[1] instanceof Float)) { | ||
return ((Integer) args[0]) > ((Float) args[1]); | ||
} else if ((args[0] instanceof Float) && (args[1] instanceof Integer)) { | ||
return ((Float) args[0]) > ((Integer) args[1]); | ||
} else if ((args[0] instanceof Integer) && (args[1] instanceof Integer)) { | ||
return ((Integer) args[0]) > ((Integer) args[1]); | ||
} else if ((args[0] instanceof Float) && (args[1] instanceof Float)) { | ||
return ((Float) args[0]) > ((Float) args[1]); | ||
} | ||
throw new IllegalArgumentException(args[0] + " or " + args[1] + " should be either Float or Integer"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern is repeated a lot throughout the operators. Is there a way to break this out into a method with something like the following declaration so it can be reused for all of them? It may end up being ugly anyways given that you can't inject the operator super easily.
private evalNumericalOperator(Object... args, String Operator)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have also thought upon refactoring it. But it seems to me repeating this snippet of very similar logic cannot be avoided unless operator can be injected to indicate explicit method call that takes in args of generic Number & Comparable type. e.g. Integer.compareTo(Integer b) will not work since it only compares Integer to Integer
|
||
import static com.google.common.base.Preconditions.checkArgument; | ||
|
||
public class EqualOperator implements Operator<Boolean> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we want to handle when the user tries to use this operator to compare floats? Is there a worry that the precision will consider two essentially equal floats not equal? Should we consider floats equal if they are within a certain threshold like .0001
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This equal operator is to cover generic object equal, which means it completely delegates to Object.equals method implementation. The case you are mentioning is valid but is conceptually different as it is an approximate equal only applies to float number type. IMO, user is not recommended to base equality off two float numbers. But if we do want to support that use case, we should define a different operator in the antlr grammar that also takes in the delta as input. Then we will have sth like ApproxOperator.eval(Object...args) that takes in 3 args.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is ok to not cover that use case at this time.
Signed-off-by: Chen <[email protected]>
Signed-off-by: Chen <[email protected]>
if ((args[0] instanceof Integer) && (args[1] instanceof Float)) { | ||
return ((Integer) args[0]) > ((Float) args[1]); | ||
} else if ((args[0] instanceof Float) && (args[1] instanceof Integer)) { | ||
return ((Float) args[0]) > ((Integer) args[1]); | ||
} else if ((args[0] instanceof Integer) && (args[1] instanceof Integer)) { | ||
return ((Integer) args[0]) > ((Integer) args[1]); | ||
} else if ((args[0] instanceof Float) && (args[1] instanceof Float)) { | ||
return ((Float) args[0]) > ((Float) args[1]); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider converting argument into BigDecimal and simplifying the code for maintainability. The BigDecimal class gives complete control over rounding behavior and is an exact way of representing numbers. The disadvantage of BigDecimal is that it's slower. Not sure if it makes much difference in comparison operation.
Below applies to all number comparison operation.
public Boolean eval(final Object... args) {
checkArgument(args.length == 2, "Operands length needs to be 2.");
return getBigDecimal(val1).compareTo(getBigDecimal(val1)) > 0;
}
// this can be moved to some common class
public static BigDecimal getBigDecimal( final Object value ) {
BigDecimal ret = null;
if( value != null ) {
if( value instanceof BigDecimal ) {
ret = (BigDecimal) value;
} else if( value instanceof Integer ) {
ret = new BigDecimal( (Integer) value );
} else if( value instanceof Float ) {
ret = new BigDecimal( (Float) value );
} else {
throw new IllegalArgumentException/ClassCastException("Not possible to convert ["+value+"] from class "+value.getClass()+" into a BigDecimal.");
}
}
return ret;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dinujoh Thanks for the suggestion! Although it refactors similar lines of code into single method for better readability, I am a little worried about the overhead of converting number into BigDecimal (getBigDecimal): https://stackoverflow.com/questions/1378044/how-using-bigdecimal-would-affect-application-performance. Also on precision aspect, can we reserve this change until higher precision is indeed desired in use cases? @sbayer55 thoughts are welcome
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbayer55 can comment if precision is required or we can reserve it for later. If we want to support double then we would require precision.
Consider the following code which is much more readable and avoids if-else.
private static final Map<String, BiFunction<Object, Object, Boolean>> comparisonFunction = new HashMap<>() {
{
put(Integer.class.getName() + "_" + Integer.class.getName(), (a,b) -> (Integer)a > (Integer)b);
put(Integer.class.getName() + "_" + Float.class.getName(), (a,b) -> (Integer)a > (Float)b);
put(Float.class.getName() + "_" + Integer.class.getName(), (a,b) -> (Float)a > (Integer)b);
put(Float.class.getName() + "_" + Float.class.getName(), (a,b) -> (Float)a > (Float)b);
}
};
public Boolean eval(final Object... args) {
checkArgument(args.length == 2, "Operands length needs to be 2.");
final BiFunction<Object, Object, Boolean> operation = comparisonFunction.get(constructKey(args[0], args[1]));
if(operation == null) {
throw new IllegalArgumentException(args[0] + " or " + args[1] + " should be either Float or Integer");
}
return operation.apply(args[0], args[1]);
}
// can be moved to common class
private String constructKey(final Object val1, final Object val2) {
return val1.getClass().getName() + "_" + val2.getClass().getName();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dinujoh Thanks. This indeed looks more compact! I will make the change to see what others think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dinujoh adding a tolerance to precision operators is out of scope for 1.3. I created issue (#1109)[https://github.com//issues/1109] to track enhancing floating point operation support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dinujoh I've heard BigDecimal is significantly slower compared to int/float operations. Do you know if there is any truth to that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. While the refactoring of the evaluation method for Float
and Integer
would be nice, it doesn't seem to be trivial enough to make it worth it.
Signed-off-by: Chen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working this. How will Data Prepper load these Operator objects?
* @param args operands | ||
* @return T | ||
*/ | ||
T eval(final Object... args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the full word - evaluate
.
@Override | ||
public Boolean eval(Object... args) { | ||
checkArgument(args.length == 2, "Operands length needs to be 2."); | ||
if ((args[0] instanceof Integer) && (args[1] instanceof Float)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to use the Comparable
interface to simplify this. I don't believe it will work for comparing Integers and Floats. But, you can convert the Integer to a Float if one is a Float and the other isn't.
Also, you can improve on args[0]
and args[1]
:
Object leftValue = args[0];
Object rightValue = args[1];
// overly simplified code
return leftValue > rightValue;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe it will work for comparing Integers and Floats. But, you can convert the Integer to a Float if one is a Float and the other isn't.
@dlvenable Not sure if I follow this part. Do you mean sth as
Float.valueOf(leftValue) > Float.valueOf(rightValue)
or
Float.valueOf(leftValue).compareTo(Float.valueOf(rightValue)) == 1
? My only concern is that when comparing Integer equal (e.g. >=
) it is bringing in Float value that might introduce delta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I adopted the proposed approach from Dinu below. Let me know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One takeaway from @dlvenable comment is if either number is a float then both numbers can be type typecast as a float. Java will cast the int as a float internally in all cases where a float is part of an operation with an int.
|
||
import static com.google.common.base.Preconditions.checkArgument; | ||
|
||
public class AndOperator implements Operator<Boolean> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these classes should be package protected.
|
||
package org.opensearch.dataprepper.expression; | ||
|
||
public interface Operator<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this class be packaged protected?
This has been discussed offline. |
Signed-off-by: Chen <[email protected]>
Signed-off-by: Chen <[email protected]>
Signed-off-by: Chen <[email protected]>
|
||
@Override | ||
public Boolean evaluate(Object... args) { | ||
checkArgument(args.length == 2, "Operands length needs to be 2."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a higher level question for @sbayer55. What the user experience like with this error message actionable for our users? How will I, as a user, know which Operands length needs to be adjusted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not have the full context. But my guess is that L31 should always given the expression parser and correct parseTreeEvaluator evaluation logic that is supposed to be consistent with the ANTLR grammar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple of safety checks before the operator is called.
- The Grammar should enforce that operands are always provided the correct number of arguments
- The Operator provides a required number of arguments that will be pulled from the arg stack before evaluate is called.
The exception message could be more specific, saying something like "Greater than operator requires two arguments". The evaluator class will then catch that exception and can provide more context on where in the statement the error occurred. I don't think it would make sense to expose that level of information to the operator instance.
Issue 1003 has a sequence flow diagram to give additional context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbayer55 Thanks. IMO, to provide more specific exception message with operator type encoded, we might need to rethink on the design of NumericCompareOperator or NegateOperator that have been proposed. @cmanning09 thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a second look, we may inject operator type string e.g. ">=" into NumericCompareOperator
but have not yet found a way for NegateOperator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered a negate operator that internally holds a reference to another operator whose results are negated?
class GreaterThanOrEqualOperator implements Operator<Boolean> { | ||
private static final Map<String, BiFunction<Object, Object, Boolean>> GREATER_THAN_OR_EQUAL_ON = | ||
new HashMap<String, BiFunction<Object, Object, Boolean>>() {{ | ||
put(Integer.class.getName() + "_" + Integer.class.getName(), (a, b) -> (Integer) a >= (Integer) b); | ||
put(Integer.class.getName() + "_" + Float.class.getName(), (a, b) -> (Integer) a >= (Float) b); | ||
put(Float.class.getName() + "_" + Integer.class.getName(), (a, b) -> (Float) a >= (Integer) b); | ||
put(Float.class.getName() + "_" + Float.class.getName(), (a, b) -> (Float) a >= (Float) b);}}; | ||
|
||
@Override | ||
public Integer getSymbol() { | ||
return DataPrepperExpressionParser.GTE; | ||
} | ||
|
||
@Override | ||
public Boolean evaluate(Object... args) { | ||
checkArgument(args.length == 2, "Operands length needs to be 2."); | ||
final Object leftValue = args[0]; | ||
final Object rightValue = args[1]; | ||
final BiFunction<Object, Object, Boolean> operation = GREATER_THAN_OR_EQUAL_ON.get( | ||
BinaryOperandsKeyFactory.typesKey(leftValue, rightValue)); | ||
if (operation == null) { | ||
throw new IllegalArgumentException(leftValue + " or " + rightValue + " should be either Float or Integer"); | ||
} | ||
return operation.apply(leftValue, rightValue); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like @dinujoh 's suggestion and I think we can continue in this direction to reduce the duplicate code as well. Now that the operation logic has been reduced to a map, we can pull out all the boilerplate code and inject the unique map for each operator through a constructor. The typesKey
method doesn't have to be a static utility function either and can reside in the class.
class NumericComparisonOperator implements Operator<Boolean> {
private final Integer symbol;
private final Map<String, BiFunction<Object, Object, Boolean>> operationMap;
public GenericComparisonOperator(final Integer symbol, final Map<String, BiFunction<Object, Object, Boolean>> operationMap) {
this.symbol = symbol;
this.operationMap = operationMap;
}
public getSymbol() {
return this.symbol;
}
@Override
public Boolean evaluate(final Object... args) {
checkArgument(args.length == 2, "Operands length needs to be 2.");
final Object leftValue = args[0];
final Object rightValue = args[1];
final BiFunction<Object, Object, Boolean> operation = operationMap.get(
BinaryOperandsKeyFactory.typesKey(leftValue, rightValue));
if (operation == null) {
throw new IllegalArgumentException(leftValue + " or " + rightValue + " should be either Float or Integer");
}
return operation.apply(leftValue, rightValue);
}
private String typesKey(final Object leftValue, final Object rightValue) {
return leftValue.getClass().getName() + "_" + rightValue.getClass().getName();
}
}
Then we can build these instances as part of our DI.
private static final Map<String, BiFunction<Object, Object, Boolean>> GREATER_THAN_OR_EQUAL_ON =
new HashMap<String, BiFunction<Object, Object, Boolean>>() {{
put(Integer.class.getName() + "_" + Integer.class.getName(), (a, b) -> (Integer) a >= (Integer) b);
put(Integer.class.getName() + "_" + Float.class.getName(), (a, b) -> (Integer) a >= (Float) b);
put(Float.class.getName() + "_" + Integer.class.getName(), (a, b) -> (Float) a >= (Integer) b);
put(Float.class.getName() + "_" + Float.class.getName(), (a, b) -> (Float) a >= (Float) b);}};
public Operator<Boolean> greaterThanOrEqualToBuilder() {
return new NumericComparisonOperator(DataPrepperExpressionParser.GTE, GREATER_THAN_OR_EQUAL_ON);
}
class Operator<Boolean> lessThanOrEqualToBuilder() {
// LESS_THAN map...
}
// etc.
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, we group all NumericComparisonOperator instantiation into a factory class?
public Boolean evaluate(Object... args) { | ||
checkArgument(args.length == 2, "Operands length needs to be 2."); | ||
checkArgument(args[0] instanceof String, "Left operand needs to be String."); | ||
checkArgument(args[1] instanceof String, "Right Operand needs to be String."); | ||
try { | ||
return !((String) args[0]).matches((String) args[1]); | ||
} catch (final PatternSyntaxException e) { | ||
throw new IllegalArgumentException(e); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we consider simply negating the operations to reduce the code and checks? If our regex (or others ) implementation has to change we now have to update multiple classes. Leveraging existing implementations would simplify this and reduce code duplication again.
class GenericNotOperator implements Operator<Boolean> {
private final Integer symbol;
private final Operator<Boolean> operationToNegate;
public GenericNotOperator(final Integer symbol, Operator<Boolean> operationToNegate) {
this.symbol = symbol;
this.operationToNegate = operationToNegate;
}
@Override
public Integer getSymbol() {
return this.symbol;
}
@Override
public Boolean evaluate(final Object... args) {
return !this.operation.evaluate(args);
}
}
Ideally we can inject these through DI.
public Operator<Boolean> notMatchRegexPatternBuilder() {
return new GenericNotOperator(DataPrepperExpressionParser.NOT_MATCH_REGEX_PATTERN, new RegexEqualOperator());
}
public Operator<Boolean> notInBuilder() {
return new GenericNotOperator(DataPrepperExpressionParser.NOT_IN_PATTERN, new NotInOperator());
}
} | ||
|
||
@Override | ||
public Boolean evaluate(Object... args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make the args final
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chenqi0805 is might be helpful to enable this setting in inellij
Signed-off-by: Chen <[email protected]>
Signed-off-by: Chen <[email protected]>
…ession-evaluator Signed-off-by: Chen <[email protected]>
Signed-off-by: Chen <[email protected]>
Signed-off-by: Chen <[email protected]>
Signed-off-by: Chen <[email protected]>
Signed-off-by: Chen <[email protected]>
public class OperatorFactory { | ||
public final BiPredicate<Object, Object> regexEquals = (x, y) -> ((String) x).matches((String) y); | ||
public final BiPredicate<Object, Object> equals = (x, y) -> { | ||
if ((x == null) || (y == null)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can be replaced by (x, y) -> Objects.equals (x, y)
Signed-off-by: Chen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one minor comment that can be addressed in a follow up PR.
interface Operator<T> { | ||
Integer getSymbol(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All interfaces should be documented. Please add java docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still unsure how we will use the OperatorFactory
and thus I'm not quite sure this is the best interface. But, I'm fine to approve this PR.
|
||
@Bean | ||
public NumericCompareOperator greaterThanOperator() { | ||
final Map<Class<? extends Number>, Map<Class<? extends Number>, BiFunction<Object, Object, Boolean>>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend having made a GreaterThanOperator
which inherits from NumericCompareOperator
. It would provide these functions in the constructor. That would keep this class straightforward and simple to test.
class GreaterThanOperator {
private GreaterThanOperator(Map<Class<? extends Number>, Map<Class<? extends Number>, BiFunction<Object, Object, Boolean>>> operandsToOperationMap) {
super(DataPrepperExpressionParser.GT, operandsToOperationMap);
}
public static GreaterThanOperator create {
/* All the code currently in greaterThanOperator() */
return new GreaterThanOperator(operandsToOperationMap);
}
}
This is all package protected code, so I'm fine with what is here though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the pattern you suggest is closer to the following:
@Named
class GreaterThanOperator extends NumericCompareOperator{
@Inject // @Inject probably not necessary
public GreaterThanOperator() {
Map<Class<? extends Number>, Map<Class<? extends Number>, BiFunction<Object, Object, Boolean>>> operandsToOperationMap = ...
super(DataPrepperExpressionParser.GT, operandsToOperationMap);
}
}
TBH, I do not know from DI point view which pattern is favored, and there maybe other reasonings. We kind of already swang back and forth here. Since it is currently not blocking, I will go ahead and merge this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You actually cannot do what you have above because super
must be the first line in the constructor. You could add a static method which creates the Map
and have super(DataPrepperExpressionParser.GT, createOperationsMap());
. Or take the approach I had.
So this will probably be more manifest in next PR on ParseTreeEvaluator and ParseTreeEvaluatorListener. Just a headsup here:
The method annotated by @bean in OperatorFactory will be collected into the List<Operator<?>> by spring framework. |
Signed-off-by: Chen [email protected]
Description
This PR
Issues Resolved
Contributes to #1003
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.