SearchableSnapshotsLicenseIntegTests.testShardAllocationOnInvalidLicense fails when recreating its license #72329

Closed
nik9000 opened this issue Apr 27, 2021 · 5 comments · Fixed by #72528 or #77757
Assignees
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs feedback_needed Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI

Comments

@nik9000
Member

nik9000 commented Apr 27, 2021

Build scan:
https://gradle-enterprise.elastic.co/s/2bpdnr5jqux6y/tests

Repro line:

./gradlew ':x-pack:plugin:searchable-snapshots:internalClusterTest' --tests "org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsLicenseIntegTests.testShardAllocationOnInvalidLicense" -Dtests.seed=5470D1E16498B727 -Dtests.locale=be -Dtests.timezone=Canada/Pacific -Druntime.java=11

Reproduces locally?: no

Applicable branches: master

Failure history:
https://build-stats.elastic.co/goto/d7017643bd9139fb127d1f231d6d81d2


Failure excerpt:

java.lang.AssertionError: expected:<UPGRADED_TO_TRIAL> but was:<TRIAL_ALREADY_ACTIVATED>
...
at __randomizedtesting.SeedInfo.seed([5470D1E16498B727:D8600A79DC686337]:0)
@nik9000 nik9000 added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels Apr 27, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Apr 27, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@tlrx tlrx self-assigned this Apr 28, 2021
tlrx added a commit that referenced this issue May 3, 2021
…License (#72528)

This test fails sometimes on CI (see #72329) when recreating the 
license. It's not clear to me why that happens but I suspect batched 
cluster state updates, so this pull request adds some waiting points 
in the test.

Closes #72329
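
For readers unfamiliar with the idea of a "waiting point", here is a minimal standalone sketch of one: a helper that polls a condition until it holds or a timeout expires. It is an illustration only, not the code from #72528. The class `WaitSketch` and the method `waitUntil` are hypothetical names; the Elasticsearch test framework ships its own polling utility (`assertBusy`), which a real test would more likely use.

```java
import java.util.function.BooleanSupplier;

public class WaitSketch {

    // Hypothetical helper: polls `condition` with a small backoff until it is
    // true or `timeoutMillis` has elapsed. Returns whether the condition held.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long sleepMillis = 1;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
            // Back off, but keep polling reasonably often.
            sleepMillis = Math.min(sleepMillis * 2, 50);
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "the cluster state update has been applied".
        boolean applied = waitUntil(() -> System.currentTimeMillis() - start >= 20, 1_000);
        System.out.println("condition reached: " + applied);
    }
}
```

A wait like this only helps when the test is observing state too early. It cannot help when another actor legitimately changes the state first.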
tlrx added a commit that referenced this issue May 3, 2021
…License (#72528) (#72611)

Backport of #72528
@droberts195
Contributor

It looks like the same error just occurred in https://gradle-enterprise.elastic.co/s/h7xi5tf25tkjm

@tlrx
Member

tlrx commented Jun 28, 2021

I merged #74621 to add some debug logging traces; hopefully this will help me understand why the test fails.

@tlrx
Member

tlrx commented Aug 4, 2021

We're waiting for this CI failure to happen again so that we can investigate further now that more logging has been added.

If anyone sees this CI failure, please add the build scan here 🙏🏻

@tlrx
Member

tlrx commented Sep 15, 2021

Failed again today in https://gradle-enterprise.elastic.co/s/o5xbwrtlotewu

tlrx added a commit to tlrx/elasticsearch that referenced this issue Sep 15, 2021
…lidLicense

This test sometimes fails because it expects the last PostStartTrialRequest
to always "upgrade" the current license, which it has just nullified, to a trial
license. However, the test races with the LicenseService, which detects
that no license exists in the cluster state (because the test set it to null)
and self-generates a trial license for the cluster as well. When the self-generation
is processed before the PostStartTrialRequest, the latter returns a
TRIAL_ALREADY_ACTIVATED response.

Since the purpose of this test is to verify that the searchable snapshot shards
fail when the license changes and come back when the trial license is activated
again, I think we can just adjust the test to accommodate both types of
responses.

Closes elastic#72329
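
The race described in the commit message above can be modelled in a few lines. The following is a minimal standalone sketch, not the actual test or LicenseService code. The two status names come from the failure excerpt; `TrialRaceSketch`, `LicenseState`, `run`, `isAcceptable`, and the `OTHER_FAILURE` placeholder are hypothetical names introduced only for illustration. The two orderings are run deterministically rather than with real threads, so the result of each is reproducible.

```java
import java.util.EnumSet;
import java.util.concurrent.atomic.AtomicReference;

public class TrialRaceSketch {

    // UPGRADED_TO_TRIAL and TRIAL_ALREADY_ACTIVATED appear in the failure excerpt.
    // OTHER_FAILURE is a hypothetical placeholder for any status the test should reject.
    enum Status { UPGRADED_TO_TRIAL, TRIAL_ALREADY_ACTIVATED, OTHER_FAILURE }

    // Hypothetical stand-in for the license in cluster state; null means "no license".
    static final class LicenseState {
        private final AtomicReference<String> license = new AtomicReference<>(null);

        // Models the service noticing the missing license and self-generating a trial.
        void selfGenerateTrialIfMissing() {
            license.compareAndSet(null, "trial");
        }

        // Models the explicit start-trial request issued by the test.
        Status startTrial() {
            return license.compareAndSet(null, "trial")
                ? Status.UPGRADED_TO_TRIAL
                : Status.TRIAL_ALREADY_ACTIVATED;
        }

        String current() {
            return license.get();
        }
    }

    // Runs one ordering of the two competing operations and returns the request's status.
    static Status run(boolean selfGenerationFirst) {
        LicenseState state = new LicenseState();
        if (selfGenerationFirst) {
            state.selfGenerateTrialIfMissing();
        }
        Status status = state.startTrial();
        // Self-generation arriving late is a no-op because a license already exists.
        state.selfGenerateTrialIfMissing();
        if (!"trial".equals(state.current())) {
            throw new AssertionError("a trial license should be active in both orderings");
        }
        return status;
    }

    // The tolerant check: either outcome means a trial license is now active.
    static boolean isAcceptable(Status status) {
        return EnumSet.of(Status.UPGRADED_TO_TRIAL, Status.TRIAL_ALREADY_ACTIVATED).contains(status);
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // request wins the race
        System.out.println(run(true));  // self-generation wins the race
    }
}
```

In both orderings the end state is the same (a trial license is active), which is why accepting either status does not weaken what the test is meant to verify.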
tlrx added a commit that referenced this issue Sep 15, 2021
…lidLicense (#77757)

Closes #72329
elasticsearchmachine pushed a commit that referenced this issue Sep 15, 2021
…lidLicense (#77757) (#77761)

Closes #72329
elasticsearchmachine pushed a commit that referenced this issue Sep 15, 2021
…lidLicense (#77757) (#77760)

Closes #72329
elasticsearchmachine pushed a commit that referenced this issue Sep 15, 2021
…lidLicense (#77757) (#77759)

Closes #72329
4 participants