Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid race condition in ILMHistorySotre #53039

Merged
merged 4 commits into from
Mar 4, 2020

Conversation

probakowski
Copy link
Contributor

This change modifies ILMHistoryStore to always apply correct settings and mappings,
even if template is deleted and not yet recreated. This ensures that ILM history index
is correctly managed by ILM and also fixes flaky history tests that were prone to
triggering this race.

This commit also refactors and simplifies ILM history tests.

Closes #50353 and #52853

This change modifies ILMHistoryStore to always apply correct settings and mappings,
even if template is deleted and not yet recreated. This ensures that ILM history index
is correctly managed by ILM and also fixes flaky history tests that were prone to
triggenring this race.

This commit also refactors and simplifies ILM history tests.

Closes elastic#50353 and elastic#52853
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

@probakowski
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/bwc

@probakowski
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Contributor

@andreidan andreidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this @probakowski! LGTM

Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good to be overall, but I think these tests will still end up failing due to the race condition between index creation via _bulk requests and manually being checked in the history store.

I think in order to fully fix them we need to add the settings to make the history store as non-async as possible when indexing, so we can prevent the alias/concrete-index issues that were causing failures in the past.

We can definitely merge this (thanks for doing this!), but I think we may still end up with some failures down the road.

@probakowski
Copy link
Contributor Author

Thanks @andreidan and @dakrone, I'll merge it as soon as build passes.
I think right now window for race is pretty small. Only possibility I see is that something deletes both history index and template between check in ILMHistoryStore#ensureHistoryIndex and actual indexing AND template is not recreated in time. Although not impossible it would be pretty hard to hit.
For what it's worth I've run 3000 back-to-back tests without single failure.

@probakowski
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/bwc and run elasticsearch-ci/default-distro

@probakowski
Copy link
Contributor Author

@elasticmachine update branch

@probakowski probakowski merged commit 2a9c1f8 into elastic:master Mar 4, 2020
@probakowski probakowski deleted the fix-ilmhistorystore branch March 4, 2020 07:37
probakowski added a commit to probakowski/elasticsearch that referenced this pull request Mar 4, 2020
* Avoid race condition in ILMHistorySotre

This change modifies ILMHistoryStore to always apply correct settings and mappings,
even if template is deleted and not yet recreated. This ensures that ILM history index
is correctly managed by ILM and also fixes flaky history tests that were prone to
triggenring this race.

This commit also refactors and simplifies ILM history tests.

Closes elastic#50353 and elastic#52853

* Review comment

Co-authored-by: Elastic Machine <[email protected]>
probakowski added a commit that referenced this pull request Mar 9, 2020
* Avoid race condition in ILMHistorySotre (#53039)

* Avoid race condition in ILMHistorySotre

This change modifies ILMHistoryStore to always apply correct settings and mappings,
even if template is deleted and not yet recreated. This ensures that ILM history index
is correctly managed by ILM and also fixes flaky history tests that were prone to
triggenring this race.

This commit also refactors and simplifies ILM history tests.

Closes #50353 and #52853

* Review comment

Co-authored-by: Elastic Machine <[email protected]>

* fixed tests

* backport #53306

Co-authored-by: Elastic Machine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] TimeSeriesLifecycleActionsIT testHistoryIsWrittenWithFailure
5 participants