Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] TransformInternalIndexIT.testUpdateDeletesOldTransformConfig #69057

Closed
andreidan opened this issue Feb 16, 2021 · 8 comments · Fixed by #69186 or #69617
Closed

[CI] TransformInternalIndexIT.testUpdateDeletesOldTransformConfig #69057

andreidan opened this issue Feb 16, 2021 · 8 comments · Fixed by #69186 or #69617
Assignees
Labels
:ml/Transform Transform Team:ML Meta label for the ML team >test-failure Triaged test failures from CI

Comments

@andreidan
Copy link
Contributor

andreidan commented Feb 16, 2021

Build scan: https://gradle-enterprise.elastic.co/s/qvjg4l7qfqdxq

Repro line:

./gradlew ':x-pack:plugin:transform:internalClusterTest' --tests "org.elasticsearch.xpack.transform.integration.TransformInternalIndexIT.testUpdateDeletesOldTransformConfig" \
  -Dtests.seed=180F5806D3F6A363 \
  -Dtests.security.manager=true \
  -Dtests.locale=hi \
  -Dtests.timezone=Europe/Volgograd \
  -Druntime.java=11 \
  -Dtests.fips.enabled=true

Reproduces locally?: no

Applicable branches: 7.12

Failure history:

Failure excerpt:

java.lang.AssertionError: test leaves indices that were not deleted: .transform-notifications-000002
Expected: []
     but: was [".transform-notifications-000002"]
	at __randomizedtesting.SeedInfo.seed([180F5806D3F6A363:75F6F23A53C81E91]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:956)
	at org.elasticsearch.test.ESSingleNodeTestCase.tearDown(ESSingleNodeTestCase.java:134)
@andreidan andreidan added >test-failure Triaged test failures from CI :ml/Transform Transform labels Feb 16, 2021
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Feb 16, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@hendrikmuhs
Copy link

This seems to be a weakness in ESSingleNodeTestCase

When teardown is run, an index is initializing, so the delete call does not include it:

This message shows the shard initialization:

16:28:09   1> [2021-02-16T19:28:08,465][INFO ][o.e.c.r.a.AllocationService] [node_s_0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.transform-notifications-000002][0]]]).

but this message is after the test is already deleting indices:

16:28:09   1> [2021-02-16T19:28:08,238][INFO ][o.e.c.m.MetadataDeleteIndexService] [node_s_0] [.transform-internal-001/yG-HhVHuSp-OdgPQskwkIg] deleting index
16:28:09   1> [2021-02-16T19:28:08,239][INFO ][o.e.c.m.MetadataDeleteIndexService] [node_s_0] [transform-index-deletes-old/tIb4lwATTmijfRYNxL73kg] deleting index
16:28:09   1> [2021-02-16T19:28:08,239][INFO ][o.e.c.m.MetadataDeleteIndexService] [node_s_0] [.transform-internal-006/BNu6CXqJSbeFm3nK3uch1w] deleting index

ESRestTestCase has a method called ensureNoInitializingShards for that reason, it is called before running the teardown. I think we should add the same to ESSingleNodeTestCase.

@hendrikmuhs hendrikmuhs self-assigned this Feb 16, 2021
hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this issue Feb 19, 2021
@dimitris-athanasiou
Copy link
Contributor

@matriv
Copy link
Contributor

matriv commented Feb 23, 2021

Another failure (7.12 branch): https://gradle-enterprise.elastic.co/s/sthnt7j7vleza

@matriv
Copy link
Contributor

matriv commented Feb 23, 2021

hendrikmuhs pushed a commit that referenced this issue Feb 23, 2021
…#69186)

ensure shards aren't initializing at test teardown, so indexes that are initializing are not missed
for deletion.

fixes #69057
hendrikmuhs pushed a commit that referenced this issue Feb 24, 2021
…#69186)

ensure shards aren't initializing at test teardown, so indexes that are initializing are not missed
for deletion.

fixes #69057
hendrikmuhs pushed a commit that referenced this issue Feb 24, 2021
…#69186)

ensure shards aren't initializing at test teardown, so indexes that are initializing are not missed
for deletion.

fixes #69057
hendrikmuhs pushed a commit that referenced this issue Feb 24, 2021
…#69186)

ensure shards aren't initializing at test teardown, so indexes that are initializing are not missed
for deletion.

fixes #69057
@henningandersen
Copy link
Contributor

Looks like this still fails with same error in:

https://gradle-enterprise.elastic.co/s/pysgbxwlswsqq

java.lang.AssertionError: test leaves indices that were not deleted: .transform-notifications-000002

based on commit: 1430e52, which includes #69419.

@droberts195
Copy link
Contributor

droberts195 commented Feb 25, 2021

I have an idea for what this is. In the REST tests we wait for pending tasks before deleting indices:

One of the reasons we had to add that was because of the way an async side effect of deleting an ML job is to write a notification to the notifications index, and that could end up creating the index after it had been deleted by the end-of-test cleanup.

I expect the same is true of updating a transform. If so what's happening is that .transform-notifications-000002 is getting deleted by the end-of-test cleanup, but then sometimes gets recreated in between that and the check of whether the test left indices behind.

So when I moved testUpdateDeletesOldTransformConfig from being a REST test to being an internal cluster test (due to the system indices changes) I needed to copy the "wait for pending tasks" bit to the internal cluster test cleanup, but didn't. However, since this single test seems to be the only one that actually needs this functionality, it's probably better to add it as a bespoke @After method for this particular test file, so that it doesn't slow down other tests. I will try that.

@droberts195
Copy link
Contributor

I expect the same is true of updating a transform.

Yes, it's in:

droberts195 added a commit that referenced this issue Mar 1, 2021
Fixes an internal cluster test that was migrated from being
a REST test by the system indices project.

The REST test framework waits for pending tasks between tests
whereas the internal cluster test framework does not.  The
particular test that was migrated requires this functionality,
as updating a transform triggers an async side effect: writing
an audit message.

This PR does not add the wait for pending tasks step to all
internal cluster tests.  Clearly it is not required for most,
and could slow down tests that don't require it.  The new
method can be called from elsewhere or moved up the class
hierarchy in the future if it is needed elsewhere in the
future.

Closes #69057
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:ml/Transform Transform Team:ML Meta label for the ML team >test-failure Triaged test failures from CI
Projects
None yet
7 participants