Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Implement CASE... WHEN... THEN... ELSE... END #41349

Merged
merged 12 commits into from
Apr 22, 2019

Conversation

matriv
Copy link
Contributor

@matriv matriv commented Apr 18, 2019

Implement the ANSI SQL CASE expression which provides the if/else
functionality common to most programming languages.

The CASE expression can have multiple WHEN branches and becomes a
powerful tool for SQL queries as it can be used in SELECT, WHERE,
GROUP BY, HAVING and ORDER BY clauses.

Closes: #36200

Implement the ANSI SQL CASE expression which provides the if/else
functionality common to most programming languages.

The CASE expression can have multiple WHEN branches and becomes a
powerful tool for SQL queries as it can be used in SELECT, WHERE,
GROUP BY, HAVING and ORDER BY clauses.

Closes: elastic#36200
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@@ -25,6 +25,7 @@ STDDEV_POP |AGGREGATE
SUM_OF_SQUARES |AGGREGATE
VAR_POP |AGGREGATE
HISTOGRAM |GROUPING
CASE |CONDITIONAL
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you agree with listing CASE as a conditional function?
Case is quite special and doesn't follow at all the traditional functional format.

private final Expression defaultElse;

@SuppressWarnings("unchecked")
public Case(Source source, List<Expression> expressions) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "weird" ctor is here as public instead of a more useful (for the ExpressionBuilder) one which would like:
public Case(Source source, List<IfConditional> conditionals, Expression defaultElse) because of FunctionRegistry which gets confused if there are 2 public ctors, and obviously the one with IfConditional cannot be used in the FunctionRegistry.

super(source, expressions);
this.conditions = (List<IfConditional>) (List<?>) expressions.subList(0, expressions.size() - 1);
this.defaultElse = expressions.get(expressions.size() - 1);
setDataType();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matriv matriv requested review from costin and astefan April 19, 2019 09:34
@matriv matriv removed the WIP label Apr 19, 2019
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I left some comments.


*Input*:

Multiple but at least one WHEN *condition* THEN *result* clause and optional ELSE *defaultResult* clause.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence sounds weird. Maybe get rid of "Multiple" and rephrase?
At least one _WHEN *condition* THEN *result*_ clause is required and the expression can optionally have an _ELSE *default_result*_ clause. Every *condition* must be boolean expression.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rephrase with something like "One or multiple WHEN ... are used and the ..."

Multiple but at least one WHEN *condition* THEN *result* clause and optional ELSE *defaultResult* clause.
Every *condition* should be boolean expression.

*Output*: one of the *result* expressions if the corresponding WHEN *condition* evaluates to `true`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*Output*: one of the *result* expressions if the corresponding _WHEN *condition*_ evaluates to trueor the *default_result* if all _WHEN *condition*_ clauses evaluate tofalse. If the optional _ELSE *default_result*_ clause is missing and all _WHEN *condition*_ clauses evaluate to falsethennull is returned.

Case c = (Case) e;

// Remove or foldable conditions that fold to FALSE
// Stop at the 1st foldable condition taht folds to TRUE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: that.


@Override
protected Case mutate(Case instance) {
return randomCase();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this mutate logic will always have a changed, non-equal to original, instance of CASE. I think the mutation should have as many IfConditional instances as constructor values that can change (2 in this case with randomStringLiteral()). And from Case point of view, it will also need to mutate the randomIntLiteral() parameter as well. Something like this: https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/sql/src/test/java/org/elasticsearch/xpack/sql/expression/function/scalar/string/ConcatFunctionPipeTests.java#L96

}

@Override
public void testTransform() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this one empty? Same for testReplaceChildren.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this Class as a quick fix to the failing NodeSubclassTests (because of the special IfConditional as an expression) and forgot to properly implement everything there.

Fixed.


@Override
protected TypeResolution resolveType() {
DataType expectedResultDataType = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not initializing expectedResultDataType directly with defaultElse().dataType()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we want to match the results of the THEN <result> clauses to the 1st non-null result of them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that happen anyway in the following loop (that checks the conditions)?

}

for (IfConditional conditional : conditions) {
dataType = DataTypeConversion.commonType(dataType, conditional.dataType());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case conditions.isEmpty() == false the first commonType check is with itself, isn't it? Maybe worth doing a for with an index that starts either at 0 or 1 depending on the previous isEmpty() check?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I would just start with the defaultElse() type and then iterate over the conditions - this avoid repetition.

SELECT emp_no, CASE WHEN emp_no - 10000 < 10 THEN 'First 10' ELSE 'Second 10' END as "case" FROM test_emp WHERE emp_no >= 10005
ORDER BY emp_no LIMIT 10;

emp_no | case
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following this PR - #41355 - can you change one of these tests to not have an alias?

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. This feature will make a lot of folks happy.


*Input*:

Multiple but at least one WHEN *condition* THEN *result* clause and optional ELSE *defaultResult* clause.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rephrase with something like "One or multiple WHEN ... are used and the ..."

@Override
public boolean foldable() {
return defaultElse.foldable() && (conditions.isEmpty() ||
(conditions.size() == 1 && conditions.get(0).condition().foldable() && conditions.get(0).result().foldable()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this method is correct - why is the defaultElse checked first - if a condition is foldable and true, defaultElse is not relevant.
I think it should be:

return (conditions.isEmpty() && defaultElse.foldable()) || (conditions.size() > 0 && conditions.get(0).condition().foldable());

return defaultElse.fold();
}
if (conditions.get(0).condition().fold() == Boolean.TRUE) {
return conditions.get(0).result().fold();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The defaultElse.fold() repetition should be avoided:

if (conditions.size() > 0 && conditions.get(0).condition().fold() == Boolean.TRUE) {
   ...
}

return defaultElse.fold();

}

for (IfConditional conditional : conditions) {
dataType = DataTypeConversion.commonType(dataType, conditional.dataType());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I would just start with the defaultElse() type and then iterate over the conditions - this avoid repetition.

import java.util.Objects;

/**
* Helper function (cannot be directly from a query) to model a {@code WHEN <condition> ELSE <result>}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering this does not extend Function I wouldn't call it a Helper function - instead a conditional expression mapping WHEN ELSE backing CASE and potentially IIF

this(source, Arrays.asList(condition, result));
}

private IfConditional(Source source, List<Expression> children) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this constructor needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason after all, it's a leftover from a different design. changing it.

private final Expression result;

public IfConditional(Source source, Expression condition, Expression result) {
this(source, Arrays.asList(condition, result));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this(source, asList(condition,result));
this.condition = condition;
this.result = result;

if (newChildren.size() < 2) {
throw new IllegalArgumentException("expected at least [2] children but received [" + newChildren.size() + "]");
}
return new IfConditional(source(), newChildren);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IfConditional(source(), newChildren.get(0), newChildren(1));

@matriv matriv requested review from astefan and costin April 22, 2019 11:44
@matriv
Copy link
Contributor Author

matriv commented Apr 22, 2019

@costin @astefan Thank you for the review.
Please check again, hope didn't miss anything.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

*Input*:

One or multiple _WHEN *condition* THEN *result_* clauses are used and the expression can optionally have
an _ELSE *default_result_* clause. Every *condition* should be boolean expression.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be a boolean expression

@matriv matriv merged commit 8b25774 into elastic:master Apr 22, 2019
@matriv matriv deleted the impl-36200 branch April 22, 2019 16:26
matriv added a commit that referenced this pull request Apr 22, 2019
Implement the ANSI SQL CASE expression which provides the if/else
functionality common to most programming languages.

The CASE expression can have multiple WHEN branches and becomes a
powerful tool for SQL queries as it can be used in SELECT, WHERE,
GROUP BY, HAVING and ORDER BY clauses.

Closes: #36200
(cherry picked from commit 8b25774)
matriv added a commit to matriv/elasticsearch that referenced this pull request May 3, 2019
Add a TIP on how to use CASE to achieve custom bucketing
with GROUP BY.

Follows: elastic#41349
matriv added a commit that referenced this pull request May 6, 2019
* SQL: [Docs] Add example for custom bucketing with CASE

Add a TIP on how to use CASE to achieve custom bucketing
with GROUP BY.

Follows: #41349

* address comments

* address comment
matriv added a commit that referenced this pull request May 6, 2019
Add a TIP on how to use CASE to achieve custom bucketing
with GROUP BY.

Follows: #41349

(cherry picked from commit eb5f5d4)
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
Implement the ANSI SQL CASE expression which provides the if/else
functionality common to most programming languages.

The CASE expression can have multiple WHEN branches and becomes a
powerful tool for SQL queries as it can be used in SELECT, WHERE,
GROUP BY, HAVING and ORDER BY clauses.

Closes: elastic#36200
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
* SQL: [Docs] Add example for custom bucketing with CASE

Add a TIP on how to use CASE to achieve custom bucketing
with GROUP BY.

Follows: elastic#41349

* address comments

* address comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL: support CASE (WHEN) expression
5 participants