[CI] DownsampleActionIT testRollupNonTSIndex failing #103981
Comments
and add more logging for when test fails next time. Relates to elastic#103981
Failed again at https://gradle-enterprise.elastic.co/s/4hzqteph2qrpo, reinstating the mute.
increase ILM logging to TRACE. Relates to elastic#103981
No reported failures in the last ~2 weeks, which is when #106563 was merged. I think that change stabilized this test as well.
Pinging @elastic/es-storage-engine (Team:StorageEngine)
There was a single failure over the past week, out of more than 2k runs. Re-enabling the test for coverage and keeping an eye out for further failures.
No more failures over the past week. Closing it for now.
Seems like a race condition:
Waiting for longer should allow for things to work.
look_ahead_time is 1m, need to wait for longer than that. Related to #103981
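The "wait for longer than `look_ahead_time`" idea above can be sketched as a plain polling helper. This is only an illustration, not the code from the actual fix: `WaitSketch`, `waitUntil`, and the 30s safety margin are hypothetical names and values chosen for the example, and only the 1m figure comes from the comment above.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the Elasticsearch test framework.
final class WaitSketch {

    // Value quoted in the thread: look_ahead_time is 1m.
    static final Duration LOOK_AHEAD_TIME = Duration.ofMinutes(1);

    // Assumption for this sketch: wait look_ahead_time plus an arbitrary 30s margin.
    static final Duration TIMEOUT = LOOK_AHEAD_TIME.plusSeconds(30);

    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition that becomes true after 200ms.
        boolean ok = waitUntil(
            () -> System.currentTimeMillis() - start >= 200,
            TIMEOUT,
            Duration.ofMillis(50)
        );
        System.out.println("condition met: " + ok);
    }
}
```

The point of the sketch is only that the timeout has to exceed the window in which the awaited change can legitimately still be pending. A timeout shorter than `look_ahead_time` would make the wait fail intermittently.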
Failed on 8.14: https://gradle-enterprise.elastic.co/s/ns3igizthf3eq. @kkrik-es, was the fix backported?
Yeah, I saw the failure. I didn't backport the fix; I will do so if I don't see another failure in main this week.
No more failures, closing (and backporting the fix) for now.
This PR fixes two test failures, #103981 & #105437, and refactors the code a bit to make things more explicit.

**What was the issue**

These tests were creating an index with a policy before that policy was created. This could cause a problem if ILM ran after the index was created but before the policy was created.

When ILM runs before the policy is added, the following happens:

- The index encounters an error and the ILM state records the current step as `null`, which makes sense since there is no policy to retrieve a step from.
- A `null` step does not qualify to be executed periodically, which also makes sense because probably nothing changed, so chances are the index will remain in this state.
- The test keeps waiting for something to happen, but nothing does, because no cluster state updates arrive as they would in a "real" cluster.
- Only when the test teardown starts does the index get updated with the ILM policy, and by then it is too late.

The logging confirms this scenario:

```
----> The index gets created referring a policy that does not exist yet, ILM runs at least twice before the policy is there
[2024-06-12T20:14:28,857][....] [index-sanohmhwxl] creating index, ......
[2024-06-12T20:14:28,870][....] [index-sanohmhwxl] retrieved current step key: null
[2024-06-12T20:14:28,871][....] unable to retrieve policy [policy-tohpA] for index [index-sanohmhwxl], recording this in step_info for this index
java.lang.IllegalArgumentException: policy [policy-tohpA] does not exist
-----> Only now the policy is added
[2024-06-12T20:14:29,024][....] adding index lifecycle policy [policy-tohpA]
-----> ILM is running periodically but because the current step is null it ignores it
[2024-06-12T20:15:23,791][....] job triggered: ilm, 1718223323790, 1718223323790
[2024-06-12T20:15:23,791][....] retrieved current step key: null
[2024-06-12T20:15:23,791][....] maybe running periodic step (InitializePolicyContextStep) with current step {"phase":"new","action":"init","name":"init"}
```

This can also be reproduced locally by adding a 5s thread sleep before adding the policy.

**The fix**

Assigning a non-existent policy to an index is not a supported path. For this reason, we refactored the test to reflect a more realistic scenario.

- We added the policy as an argument to `private void createIndex(String index, String alias, String policy, boolean isTimeSeries)`. This makes it clear that a policy can be attached when the index is created.
- We now create the policy before creating the index. Adding the policy later does not appear to be crucial for the test, so simplifying it sounded like a good idea.
- We simplified `testRollupIndexInTheHotPhaseWithoutRollover`, which ensures that a downsampling action cannot be added in the hot phase without rollover. An index is not necessary for this test, so simplifying it makes the purpose of the test clearer.

Fixes: #103981
Fixes: #105437
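The ordering problem described in this PR can be illustrated with a small toy model. This is a deliberately simplified sketch and not Elasticsearch's ILM implementation: `IlmRaceSketch`, its methods, and the step names `"init"` and `"advanced"` are all hypothetical. It only encodes the two behaviours stated above, namely that an index created against a missing policy ends up with a `null` current step, and that a `null` step is skipped by periodic runs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the race; every name here is made up for illustration.
final class IlmRaceSketch {

    private final Set<String> policies = new HashSet<>();
    // index name -> current step; null means no step could be resolved
    private final Map<String, String> currentStep = new HashMap<>();

    void putPolicy(String policy) {
        policies.add(policy);
    }

    // Models ILM reacting to index creation: the step resolves only if the policy exists.
    void createIndex(String index, String policy) {
        currentStep.put(index, policies.contains(policy) ? "init" : null);
    }

    // Models a periodic ILM run: indices with a null step are not eligible and stay stuck.
    void periodicRun() {
        for (Map.Entry<String, String> entry : currentStep.entrySet()) {
            if (entry.getValue() == null) {
                continue;
            }
            entry.setValue("advanced");
        }
    }

    String stepOf(String index) {
        return currentStep.get(index);
    }

    public static void main(String[] args) {
        // Old test order: index first, policy afterwards.
        IlmRaceSketch racy = new IlmRaceSketch();
        racy.createIndex("index-a", "policy-a");
        racy.putPolicy("policy-a");
        racy.periodicRun();
        System.out.println("index first:  " + racy.stepOf("index-a")); // prints "index first:  null"

        // Fixed test order: policy first, then the index.
        IlmRaceSketch fixed = new IlmRaceSketch();
        fixed.putPolicy("policy-a");
        fixed.createIndex("index-a", "policy-a");
        fixed.periodicRun();
        System.out.println("policy first: " + fixed.stepOf("index-a")); // prints "policy first: advanced"
    }
}
```

In the model, adding the policy after the index never un-sticks the index, which mirrors why the test kept waiting until teardown. Creating the policy first removes the window entirely instead of relying on timing.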
Build scan:
https://gradle-enterprise.elastic.co/s/ruzhsbyskkuk6/tests/:x-pack:plugin:ilm:qa:multi-node:javaRestTest/org.elasticsearch.xpack.ilm.actions.DownsampleActionIT/testRollupNonTSIndex
Reproduction line:
Applicable branches:
main
Reproduces locally?:
Didn't try
Failure history:
Failure dashboard for `org.elasticsearch.xpack.ilm.actions.DownsampleActionIT#testRollupNonTSIndex`
Failure excerpt: