Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] TransformUsageIT.testUsage fails with incorrect stats call count #52931

Closed
dakrone opened this issue Feb 27, 2020 · 6 comments · Fixed by #52946, #62086 or #65742
Closed

[CI] TransformUsageIT.testUsage fails with incorrect stats call count #52931

dakrone opened this issue Feb 27, 2020 · 6 comments · Fixed by #52946, #62086 or #65742
Assignees
Labels
:ml/Transform Transform >test-failure Triaged test failures from CI

Comments

@dakrone
Copy link
Member

dakrone commented Feb 27, 2020

In https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob+fast+part2/3953/consoleFull / https://gradle-enterprise.elastic.co/s/nobrqofhrecye/tests/vkcr26w2ldob6-j74o4svwlg7nm

The TransformUsageIT.testUsage test failed with:

java.lang.AssertionError: 
Incorrect stat trigger_count expected:<3.0> but was:<2.0>
at __randomizedtesting.SeedInfo.seed([5671C4D5CB7154:BCF1ED1069B7A9DD]:0)
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:553)
at org.elasticsearch.xpack.transform.integration.TransformUsageIT.lambda$testUsage$2(TransformUsageIT.java:105)
at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:881)
...

I was unable to reproduce this on master with:

./gradlew ':x-pack:plugin:transform:qa:single-node-tests:integTestRunner' --tests "org.elasticsearch.xpack.transform.integration.TransformUsageIT.testUsage" -Dtests.seed=5671C4D5CB7154 -Dtests.security.manager=true -Dtests.locale=th-TH -Dtests.timezone=Africa/Bissau -Dcompiler.java=13
@dakrone dakrone added >test-failure Triaged test failures from CI :ml/Transform Transform labels Feb 27, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml/Transform)

@hendrikmuhs
Copy link

hendrikmuhs commented Feb 28, 2020

Thanks for the report. I took a look and it seems to be a bad test. trigger_count is a counter for how many times the transform has been triggered, this is not fully under control by the test but runs in a separate thread (scheduler thread). The scheduler is configured to fire every second, which is probably what happened. I will take out the test for trigger_count or change it to test "greater than or equal".

@hendrikmuhs hendrikmuhs self-assigned this Feb 28, 2020
hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this issue Feb 28, 2020
hendrikmuhs pushed a commit that referenced this issue Mar 3, 2020
hendrikmuhs pushed a commit that referenced this issue Mar 3, 2020
@davidkyle
Copy link
Member

Another failure, reopening this ticket

Incorrect stat trigger_count
Expected: a value equal to or greater than <3.0>
     but: <2.0> was less than <3.0>

at __randomizedtesting.SeedInfo.seed([78BA13B22EB61635:C41D8F6692CACEBC]:0)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
at org.junit.Assert.assertThat(Assert.java:956)
at org.elasticsearch.xpack.transform.integration.TransformUsageIT.lambda$testUsage$2(TransformUsageIT.java:110)
at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:887)
at org.elasticsearch.xpack.transform.integration.TransformUsageIT.testUsage(TransformUsageIT.java:95)

https://gradle-enterprise.elastic.co/s/zxok2tigiaaks/tests/vkcr26w2ldob6-j74o4svwlg7nm
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob+fast+part2/4257/console

@davidkyle davidkyle reopened this Mar 16, 2020
@mayya-sharipova
Copy link
Contributor

Another failure on master intake: https://gradle-enterprise.elastic.co/s/zxok2tigiaaks

Doesn't reproduce for me:
./gradlew ':x-pack:plugin:transform:qa:single-node-tests:integTestRunner' --tests "org.elasticsearch.xpack.transform.integration.TransformUsageIT.testUsage" -Dtests.seed=78BA13B22EB61635 -Dtests.security.manager=true -Dtests.locale=es-CO -Dtests.timezone=Pacific/Tongatapu -Dcompiler.java=13

  | 2> java.lang.AssertionError: Incorrect stat trigger_count |  
  | Expected: a value equal to or greater than <3.0> |  
  | but: <2.0> was less than <3.0> |  
  | at __randomizedtesting.SeedInfo.seed([78BA13B22EB61635:C41D8F6692CACEBC]:0) |  
  | at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18) |  
  | at org.junit.Assert.assertThat(Assert.java:956) |  
  | at org.elasticsearch.xpack.transform.integration.TransformUsageIT.lambda$testUsage$2(TransformUsageIT.java:110)

hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this issue Jul 16, 2020
hendrikmuhs pushed a commit that referenced this issue Jul 17, 2020
hendrikmuhs pushed a commit that referenced this issue Sep 28, 2020
refactor how state is persisted, call doSaveState only from the indexer thread, except there is none.

fixes #60781
fixes #52931
fixes #51629
fixes #52035
hendrikmuhs pushed a commit that referenced this issue Sep 28, 2020
refactor how state is persisted, call doSaveState only from the indexer thread, except there is none.

fixes #60781
fixes #52931
fixes #51629
fixes #52035
@jaymode
Copy link
Member

jaymode commented Nov 19, 2020

Looks like this still fails in CI. I just had the following intake build fail:

Build scan: https://gradle-enterprise.elastic.co/s/2st3zfjq7zm26

org.elasticsearch.xpack.transform.integration.TransformUsageIT > testUsage FAILED
    java.lang.AssertionError: Incorrect stat trigger_count
    Expected: a value equal to or greater than <3.0>
         but: <2.0> was less than <3.0>
        at __randomizedtesting.SeedInfo.seed([A9DF1BDCA7D30DA:B63A6D697601E853]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.elasticsearch.xpack.transform.integration.TransformUsageIT.lambda$testUsage$2(TransformUsageIT.java:112)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:950)
        at org.elasticsearch.xpack.transform.integration.TransformUsageIT.testUsage(TransformUsageIT.java:102)

Reproduce line:

./gradlew ':x-pack:plugin:transform:qa:single-node-tests:javaRestTest' --tests "org.elasticsearch.xpack.transform.integration.TransformUsageIT.testUsage" -Dtests.seed=A9DF1BDCA7D30DA -Dtests.security.manager=true -Dtests.locale=id -Dtests.timezone=Europe/Tirane -Druntime.java=11

Did not reproduce locally.

@jaymode jaymode reopened this Nov 19, 2020
hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this issue Nov 20, 2020
hendrikmuhs pushed a commit that referenced this issue Nov 20, 2020
log the full usage response on failure (individual counts are logged already)

relates #52931
hendrikmuhs pushed a commit that referenced this issue Nov 20, 2020
log the full usage response on failure (individual counts are logged already)

relates #52931
@hendrikmuhs
Copy link

The last failure gives us a hint, thanks to the extended logging:

usage counters:

pages_processed=4, 
index_time_in_ms=309, 
search_failures=0, 
index_failures=0, 
search_total=4, 
processing_total=4, 
documents_indexed=54, 
trigger_count=2, 
documents_processed=2000, 
search_time_in_ms=413, 
index_total=2, 
exponential_avg_checkpoint_duration_ms=1134.0, 
exponential_avg_documents_processed=2000.0, 
processing_time_in_ms=28, 
exponential_avg_documents_indexed=54.0}

test_usage/_stats:

pages_processed=2, 
index_time_in_ms=221, 
search_failures=0, 
index_failures=0, 
search_total=2, 
processing_total=2, 
documents_indexed=27, 
trigger_count=1, 
documents_processed=1000, 
search_time_in_ms=227, 
index_total=1, 
exponential_avg_checkpoint_duration_ms=746.0, 
exponential_avg_documents_processed=1000.0, 
processing_time_in_ms=27, 
exponential_avg_documents_indexed=27.0

test_usage_continuous/_stats

pages_processed=2, 
index_time_in_ms=88, 
search_failures=0, 
index_failures=0, 
search_total=2, 
processing_total=2, 
documents_indexed=27, 
trigger_count=2, 
documents_processed=1000, 
search_time_in_ms=186, 
index_total=1, 
exponential_avg_checkpoint_duration_ms=388.0, 
exponential_avg_documents_processed=1000.0, 
processing_time_in_ms=1, 
exponential_avg_documents_indexed=27.0

Usage returns a sum for all fields, this seems to work, except for trigger_count. The sum is 3 (2 + 1), but usage returns 2 (1+1, both must have been triggered at least once)

The difference between the _stats calls and the _usage calls is the way they are queried. _stats might be taken from in-memory information if the transform is running and from the index if not. _usage uses an aggregation query.

The mismatch seems to originate from a mismatch between the in memory and indexed information. A trigger by design never saves state, but transform only saves state if it is necessary.

It seems to me we should ignore trigger_count when comparing _stats with _usage. However as a minimal fix we could check gte 2, as both transform must have been triggered at lease once.

hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this issue Dec 2, 2020
hendrikmuhs pushed a commit that referenced this issue Dec 2, 2020
relax test for trigger count, it might have been incremented but not persisted

fixes #52931
hendrikmuhs pushed a commit that referenced this issue Dec 2, 2020
relax test for trigger count, it might have been incremented but not persisted

fixes #52931
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment