[CI] LogicalPlanOptimizerTests testSimplifyComparisionArithmetics_floatDivision failing #108524

Closed
not-napoleon opened this issue May 10, 2024 · 7 comments · Fixed by #111376
Assignees
Labels
:Analytics/ES|QL AKA ESQL low-risk An open issue or test failure that is a low risk to future releases Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >test-failure Triaged test failures from CI

Comments

@not-napoleon
Member

The SimplifyComparisonsArithmetics optimizer rule fails to fully optimize the expression 2 / float < 4. On the first pass it generates the expression float * 4.0 > 2, but it doesn't then optimize that further to float > 0.5. I think it detects a floating point multiplication in the second expression, which is considered an unsafe optimization.
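
As a rough illustration of why moving a floating point multiplication across a comparison can be treated as unsafe, here is a minimal Java sketch. The class and method names are hypothetical and nothing here is taken from the Elasticsearch code; it only shows that f * c > k and f > k / c can disagree in double arithmetic (here because of underflow), which is one plausible motivation for such a check.

```java
public class FloatRewriteDemo {
    // Shape of the predicate after the first pass: f * c > k
    public static boolean beforeRewrite(double f, double c, double k) {
        return f * c > k;
    }

    // Algebraically "simplified" shape: f > k / c
    // (equivalent in exact arithmetic only when c > 0)
    public static boolean afterRewrite(double f, double c, double k) {
        return f > k / c;
    }

    public static void main(String[] args) {
        // For ordinary values the two shapes agree, e.g. the values from this issue.
        System.out.println(beforeRewrite(1.0, 4.0, 2.0)); // true
        System.out.println(afterRewrite(1.0, 4.0, 2.0));  // true

        // Edge case: the smallest positive double times 0.5 underflows to 0.0,
        // so the original predicate is false while the rewritten one is true.
        double tiny = Double.MIN_VALUE;
        System.out.println(beforeRewrite(tiny, 0.5, 0.0)); // false
        System.out.println(afterRewrite(tiny, 0.5, 0.0));  // true
    }
}
```

The edge case is deliberately extreme. The point is only that the two forms are not interchangeable for every double input, so a conservative optimizer may decline the rewrite.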

Build scan:
https://gradle-enterprise.elastic.co/s/zn7kv6lwmdrow/tests/:x-pack:plugin:esql:test/org.elasticsearch.xpack.esql.optimizer.LogicalPlanOptimizerTests/testSimplifyComparisionArithmetics_floatDivision

Reproduction line:

./gradlew ':x-pack:plugin:esql:test' --tests "org.elasticsearch.xpack.esql.optimizer.LogicalPlanOptimizerTests.testSimplifyComparisionArithmetics_floatDivision" -Dtests.seed=A8AF70E4F3429963 -Dtests.locale=zh-TW -Dtests.timezone=Asia/Chungking -Druntime.java=21

Applicable branches:
main

Reproduces locally?:
Yes

Failure history:
Failure dashboard for org.elasticsearch.xpack.esql.optimizer.LogicalPlanOptimizerTests#testSimplifyComparisionArithmetics_floatDivision

Failure excerpt:

java.lang.AssertionError: Expected left side of comparison to be a field attribute but found [4]

  at __randomizedtesting.SeedInfo.seed([A8AF70E4F3429963:57AA5A3E32A7E354]:0)
  at org.junit.Assert.fail(Assert.java:89)
  at org.junit.Assert.assertTrue(Assert.java:42)
  at org.elasticsearch.xpack.esql.optimizer.LogicalPlanOptimizerTests.doTestSimplifyComparisonArithmetics(LogicalPlanOptimizerTests.java:4421)
  at org.elasticsearch.xpack.esql.optimizer.LogicalPlanOptimizerTests.testSimplifyComparisionArithmetics_floatDivision(LogicalPlanOptimizerTests.java:4487)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)

@not-napoleon not-napoleon added :Analytics/ES|QL AKA ESQL >test-failure Triaged test failures from CI labels May 10, 2024
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) needs:risk Requires assignment of a risk label (low, medium, blocker) labels May 10, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@fang-xing-esql fang-xing-esql added low-risk An open issue or test failure that is a low risk to future releases and removed needs:risk Requires assignment of a risk label (low, medium, blocker) labels May 22, 2024
@alex-spies alex-spies assigned alex-spies and unassigned alex-spies Jul 1, 2024
@fang-xing-esql
Member

fang-xing-esql commented Jul 26, 2024

This test originally comes from OptimizerRunTests in SQL. I compared the traces side by side between SQL and ES|QL, and the failure is indeed related to how fixed point numbers are handled.

Here is the trace from ES|QL. After the first iteration of SimplifyComparisonsArithmetics, the predicate is transformed into 2[INTEGER] < 4.0[DOUBLE] * float{f}#9. In the second iteration, the rule considers it unsafe to transform this into float{f}#9 > 0.5, because the right-hand side is a double, not a fixed point number.

[TRACE][o.e.x.e.o.LogicalPlanOptimizer] [testSimplifyComparisionArithmetics_floatDivision] Rule rules.SimplifyComparisonsArithmetics applied
Limit[1000[INTEGER]]                                                         = Limit[1000[INTEGER]]
\_Filter[2[INTEGER] / float{f}#9 < 4[INTEGER]]                               ! \_Filter[2[INTEGER] < 4.0[DOUBLE] * float{f}#9]
  \_EsRelation[types][!alias_integer, boolean{f}#4, byte{f}#5, constant_k..] =   \_EsRelation[types][!alias_integer, boolean{f}#4, byte{f}#5, constant_k..]

In SQL, however, the first iteration of SimplifyComparisonsArithmetics transforms the predicate into one with a different data type, 2[INTEGER] < 4.0[INTEGER] * test.float{f}#18, which makes the second iteration consider the rewrite safe. 4.0[INTEGER] looks weird and confusing; could this be a bug in SQL?

[TRACE][o.e.x.s.o.Optimizer      ] [testSimplifyComparisonArithmeticCommutativeVsNonCommutativeOps] Rule optimizer.OptimizerRules$SimplifyComparisonsArithmetics applied with changes
Project[[test.some.string{f}#6]]                                            = Project[[test.some.string{f}#6]]
\_Filter[2[INTEGER] / test.float{f}#18 < 4[INTEGER]]                        ! \_Filter[2[INTEGER] < 4.0[INTEGER] * test.float{f}#18]
  \_EsRelation[test][date{f}#4, some{f}#5, some.string{f}#6, some.string..] =   \_EsRelation[test][date{f}#4, some{f}#5, some.string{f}#6, some.string..]
...
[TRACE][o.e.x.s.o.Optimizer      ] [testSimplifyComparisonArithmeticCommutativeVsNonCommutativeOps] Rule optimizer.OptimizerRules$SimplifyComparisonsArithmetics applied with changes
Project[[test.some.string{f}#6]]                                            = Project[[test.some.string{f}#6]]
\_Filter[test.float{f}#18 * 4.0[INTEGER] > 2[INTEGER]]                      ! \_Filter[test.float{f}#18 > 0.5[INTEGER]]
  \_EsRelation[test][date{f}#4, some{f}#5, some.string{f}#6, some.string..] =   \_EsRelation[test][date{f}#4, some{f}#5, some.string{f}#6, some.string..]

If the isUnsafe checks are valid, this is working as expected.

@not-napoleon
Member Author

That 4.0[INTEGER] from the SQL output looks very familiar. I suspect fixing #108388 exposed this behavior. That class cast exception was caused by exactly that type of thing, having a double value with an integer data type. I wonder if SQL is doing the wrong thing here.

@fang-xing-esql
Member

fang-xing-esql commented Jul 26, 2024

> That 4.0[INTEGER] from the SQL output looks very familiar. I suspect fixing #108388 exposed this behavior. That class cast exception was caused by exactly that type of thing, having a double value with an integer data type. I wonder if SQL is doing the wrong thing here.

Here is the difference between ES|QL's SimplifyComparisonsArithmetics and QL/SQL's. ES|QL specifies DOUBLE when creating the new Literal, giving 4.0[DOUBLE] in this case. QL/SQL reuses the original Literal's type and creates 4.0[INTEGER], which IMO is wrong.

ES|QL

    final Expression apply() {
        // force float point folding for FlP field
        Literal bcl = operation.dataType().isRationalNumber()
            ? new Literal(bcLiteral.source(), ((Number) bcLiteral.value()).doubleValue(), DataType.DOUBLE)
            : bcLiteral;

QL

    final Expression apply() {
        // force float point folding for FlP field
        Literal bcl = operation.dataType().isRational()
            ? Literal.of(bcLiteral, ((Number) bcLiteral.value()).doubleValue())
            : bcLiteral;
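
To make the difference between the two snippets concrete, here is a small self-contained Java sketch. The `Literal` record and `DataType` enum below are simplified, hypothetical stand-ins, not the Elasticsearch classes. The `of` factory is assumed to copy the type from its first argument, which is the behavior described in this thread.

```java
public class LiteralTypeDemo {
    public enum DataType { INTEGER, DOUBLE }

    // Hypothetical stand-in for a typed literal; not the real Elasticsearch class.
    public record Literal(Object value, DataType dataType) {
        // Assumed behavior: reuse the source literal's type for the new value.
        public static Literal of(Literal source, Object newValue) {
            return new Literal(newValue, source.dataType());
        }

        @Override
        public String toString() {
            return value + "[" + dataType + "]";
        }
    }

    // QL-style: widen the value to double but keep the original type label.
    public static Literal qlStyle(Literal bcLiteral) {
        return Literal.of(bcLiteral, ((Number) bcLiteral.value()).doubleValue());
    }

    // ES|QL-style: widen the value and set the type to DOUBLE explicitly.
    public static Literal esqlStyle(Literal bcLiteral) {
        return new Literal(((Number) bcLiteral.value()).doubleValue(), DataType.DOUBLE);
    }

    public static void main(String[] args) {
        Literal four = new Literal(4, DataType.INTEGER);
        System.out.println(qlStyle(four));   // 4.0[INTEGER]
        System.out.println(esqlStyle(four)); // 4.0[DOUBLE]
    }
}
```

Under these assumptions the QL-style path produces a double value labeled INTEGER, matching the 4.0[INTEGER] seen in the SQL trace, while the ES|QL-style path keeps the value and the label consistent.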

@not-napoleon
Member Author

Ah, yeah, I remember that change. Using Literal.of there definitely seemed wrong, for exactly that reason. I'm inclined to agree with you that this behavior is correct and SQL is doing the wrong thing, and we should consider fixing this in SQL's version of the optimization. @bpintea or @astefan what do you think?

@bpintea
Contributor

bpintea commented Jul 26, 2024

Yes, the QL code seems incorrect, as it doesn't actually enforce the type: the operation's type doesn't have to be the same as bcLiteral's, and Literal.of() invoked without a type takes the first argument's type.
I wonder what masked this issue and why it didn't surface earlier.

@not-napoleon
Member Author

I only found it in ES|QL because it was throwing a class cast exception while trying to reconcile the blocks. I suspect that since we don't have typed blocks in SQL, we never saw the type mismatch.
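
A minimal Java sketch of why a mislabeled value might go unnoticed in one engine and fail in another. The two reader methods are hypothetical and only loosely analogous to typed and untyped consumers; they are not the ES|QL block code.

```java
public class TypeMismatchDemo {
    // Hypothetical typed consumer: trusts the declared INTEGER type and casts.
    public static int readAsInt(Object value) {
        return (Integer) value; // throws ClassCastException when value is a Double
    }

    // Hypothetical untyped consumer: treats any number generically,
    // so a wrong type label never causes a failure.
    public static double readAsNumber(Object value) {
        return ((Number) value).doubleValue();
    }

    // Reports whether the typed read fails for the given value.
    public static boolean failsTypedRead(Object value) {
        try {
            readAsInt(value);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object mislabeled = 4.0; // a double value whose declared type claims INTEGER
        System.out.println(readAsNumber(mislabeled));   // 4.0
        System.out.println(failsTypedRead(mislabeled)); // true
    }
}
```

The untyped path works with the mislabeled value, while the typed path fails as soon as it relies on the label.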

5 participants