Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] ES|QL TelemetryIT.testMetrics failure #116003

Closed
thecoop opened this issue Oct 31, 2024 · 9 comments
Closed

[CI] ES|QL TelemetryIT.testMetrics failure #116003

thecoop opened this issue Oct 31, 2024 · 9 comments
Labels
:Analytics/ES|QL AKA ESQL low-risk An open issue or test failure that is a low risk to future releases Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >test-failure Triaged test failures from CI

Comments

@thecoop
Copy link
Member

thecoop commented Oct 31, 2024

CI Link

https://gradle-enterprise.elastic.co/s/xrzv7jdcyi4ky/console-log/raw

Repro line

./gradlew ":x-pack:plugin:esql:internalClusterTest" --tests "org.elasticsearch.xpack.esql.action.TelemetryIT.testMetrics"

Does it reproduce?

Yes

Applicable branches

main

Failure history

No response

Failure excerpt

1> [2024-10-31T08:08:55,817][INFO ][o.e.x.e.a.TelemetryIT    ] [testMetrics] [TelemetryIT#testMetrics {TestCase=Test[query=FROM idx
  1> | EVAL ip = to_ip(host), x = to_string(host), y = to_string(host)
  1> | INLINESTATS max(id)
  1> , expectedCommands={INLINESTATS=1, STATS=1, FROM=1, EVAL=1}, expectedFunctions={TO_STRING=2, MAX=1, TO_IP=1}, success=true]}]: cleaned up after test
  1> [2024-10-31T08:08:55,818][INFO ][o.e.x.e.a.TelemetryIT    ] [testMetrics] [TestCase=Test[query=FROM idx
  1> | EVAL ip = to_ip(host), x = to_string(host), y = to_string(host)
  1> | INLINESTATS max(id)
  1> , expectedCommands={INLINESTATS=1, STATS=1, FROM=1, EVAL=1}, expectedFunctions={TO_STRING=2, MAX=1, TO_IP=1}, success=true]] after test
  2> REPRODUCE WITH: ./gradlew ":x-pack:plugin:esql:internalClusterTest" --tests "org.elasticsearch.xpack.esql.action.TelemetryIT.testMetrics {TestCase=Test[query=FROM idx
  2> | EVAL ip = to_ip(host), x = to_string(host), y = to_string(host)
  2> | INLINESTATS max(id)
  2> , expectedCommands={INLINESTATS=1, STATS=1, FROM=1, EVAL=1}, expectedFunctions={TO_STRING=2, MAX=1, TO_IP=1}, success=true]}" -Dtests.seed=2C24155C2F138C18 -Dtests.locale=es-PH -Dtests.timezone=America/Miquelon -Druntime.java=23
  2> REPRODUCE WITH: ./gradlew ":x-pack:plugin:esql:internalClusterTest" --tests "org.elasticsearch.xpack.esql.action.TelemetryIT.testMetrics {TestCase=Test[query=FROM idx
  2> | EVAL ip = to_ip(host), x = to_string(host), y = to_string(host)
  2> | INLINESTATS max(id)
  2> , expectedCommands={INLINESTATS=1, STATS=1, FROM=1, EVAL=1}, expectedFunctions={TO_STRING=2, MAX=1, TO_IP=1}, success=true]}" -Dtests.seed=2C24155C2F138C18 -Dtests.locale=es-PH -Dtests.timezone=America/Miquelon -Druntime.java=23
  2> REPRODUCE WITH: ./gradlew ":x-pack:plugin:esql:internalClusterTest" --tests "org.elasticsearch.xpack.esql.action.TelemetryIT.testMetrics {TestCase=Test[query=FROM idx
  2> | EVAL ip = to_ip(host), x = to_string(host), y = to_string(host)
  2> | INLINESTATS max(id)
  2> , expectedCommands={INLINESTATS=1, STATS=1, FROM=1, EVAL=1}, expectedFunctions={TO_STRING=2, MAX=1, TO_IP=1}, success=true]}" -Dtests.seed=2C24155C2F138C18 -Dtests.locale=es-PH -Dtests.timezone=America/Miquelon -Druntime.java=23
  2> java.lang.AssertionError: 
    Expected: is <true>
         but: was <false>
        at __randomizedtesting.SeedInfo.seed([2C24155C2F138C18]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
        at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2543)
        at org.elasticsearch.xpack.esql.action.TelemetryIT.assertAllUsages(TelemetryIT.java:217)
        at org.elasticsearch.xpack.esql.action.TelemetryIT.lambda$testQuery$0(TelemetryIT.java:188)

    java.lang.AssertionError: 
    Expected: is <true>
         but: was <false>
        at __randomizedtesting.SeedInfo.seed([2C24155C2F138C18]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
        at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2543)
        at org.elasticsearch.xpack.esql.action.TelemetryIT.assertAllUsages(TelemetryIT.java:217)
        at org.elasticsearch.xpack.esql.action.TelemetryIT.lambda$testQuery$0(TelemetryIT.java:188)

    java.lang.AssertionError: 
    Expected: is <true>
         but: was <false>
        at __randomizedtesting.SeedInfo.seed([2C24155C2F138C18]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
        at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2543)
        at org.elasticsearch.xpack.esql.action.TelemetryIT.assertAllUsages(TelemetryIT.java:217)
        at org.elasticsearch.xpack.esql.action.TelemetryIT.lambda$testQuery$0(TelemetryIT.java:188)
@thecoop thecoop added :Analytics/ES|QL AKA ESQL >test-failure Triaged test failures from CI needs:triage Requires assignment of a team area label labels Oct 31, 2024
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) needs:risk Requires assignment of a risk label (low, medium, blocker) and removed needs:triage Requires assignment of a team area label labels Oct 31, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@luigidellaquila luigidellaquila added the medium-risk An open issue or test failure that is a medium risk to future releases label Oct 31, 2024
@luigidellaquila luigidellaquila self-assigned this Oct 31, 2024
@luigidellaquila luigidellaquila removed the needs:risk Requires assignment of a risk label (low, medium, blocker) label Oct 31, 2024
@luigidellaquila
Copy link
Contributor

I think it's related to #115813

The underlying error is a query planning problem:

java.lang.IllegalStateException: can't find input for [max(id){r}#464]

the query is

FROM idx
| EVAL ip = to_ip(host), x = to_string(host), y = to_string(host)
| INLINESTATS max(id)

The failure is quite deterministic, I'll disable that specific query from the telemetry tests to avoid too much noise.

@costin maybe you have a clue on the root cause, this test never failed before AFAIK

luigidellaquila added a commit to luigidellaquila/elasticsearch that referenced this issue Oct 31, 2024
@luigidellaquila
Copy link
Contributor

Related to #110923

@costin
Copy link
Member

costin commented Oct 31, 2024

Thanks! Needs more investigation since as you said, this test passed so far...

@costin
Copy link
Member

costin commented Oct 31, 2024

Backport to 8.x at #116045

elasticsearchmachine pushed a commit that referenced this issue Oct 31, 2024
* ESQL: Refactor Join inside the planner (#115813)

First PR that introduces a Join as a first class citizen in the planner.
Previously the Join was modeled as a unary node, embedding the right
side as a local relationship inside the node but not exposed as a child.
This caused a lot the associated methods (like references, output and
inputSet) to misbehave and the physical plan rules to pick incorrect
information, such as trying to extract the local relationship fields
from the underlying source - the fix was to the local relationship
fields as ReferenceAttribute (which of course had its own set of
issues). Essentially Join was acting both as a source and as a streaming
operator.

This PR looks to partially address this by:
- refactoring Join into a proper binary node with left and right
 branches which are used for its references and input/outputSet.
- refactoring InlineStats to prefer composition and move the Aggregate
 on the join right branch. This reuses the Aggregate resolution out of
 the box; in the process remove the Stats interface.
- update some of the planner rules that only worked with Unary nodes.
- refactor Mapper into (coordinator) Mapper and LocalMapper.
- remove Phased interface by moving its functionality inside the planner
(no need to unpack the phased classes, the join already indicates the
two branches needed).
- massage the Phased execution inside EsqlSession
- improve FieldExtractor to handle binary nodes
- fix incorrect references in Lookup
- generalize ProjectAwayColumns rule

Relates #112266

Not all inline and lookup tests are passing:
- 2 lookup fields are failing due to name clashes (qualifiers should
 fix this)
- 7 or so inline failures with a similar issue

I've disabled the tests for now to have them around once we complete
 adding the functionality.

(cherry picked from commit 4ee98e8)

* ES|QL: Mute test for #116003 (#116005)

(cherry picked from commit 681f509)

---------

Co-authored-by: Luigi Dell'Aquila <[email protected]>
@luigidellaquila luigidellaquila removed their assignment Nov 4, 2024
jfreden pushed a commit to jfreden/elasticsearch that referenced this issue Nov 4, 2024
@alex-spies
Copy link
Contributor

This still fails (when unmuted) with current main. We currently do not work on INLINESTATS and the current work on JOINs will likely put INLINESTATS in an even worse state before we come back to it and finish it.

That is, until we pick up work on INLINESTATS again, this has to remain muted and this test failure cannot be worked on.

@alex-spies alex-spies added low-risk An open issue or test failure that is a low risk to future releases and removed medium-risk An open issue or test failure that is a medium risk to future releases labels Nov 19, 2024
@nik9000
Copy link
Member

nik9000 commented Nov 20, 2024

I think we can drop the usage of INLINESTATS in the test, right?

@alex-spies
Copy link
Contributor

That's already done in Luigi's mute.

Since the CI failure itself is resolved and we currently don't work on INLINESTATS, maybe it's better to add this failed query to the INLINESTATS follow-up meta issue than having a separate issue for it that'll stay open for a while.

@alex-spies
Copy link
Contributor

Moved the corresponding work on INLINESTATS into its meta issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL low-risk An open issue or test failure that is a low risk to future releases Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >test-failure Triaged test failures from CI
Projects
None yet
Development

No branches or pull requests

6 participants