[CI] DocsClientYamlTestSuiteIT test {yaml=reference/watcher/example-watches/example-watch-clusterstatus/line_137} failing - (#115809) #117354
Conversation
Pinging @elastic/es-data-management (Team:Data Management)
Don't forget to remove the entries from
```java
@@ -1131,6 +1135,19 @@ protected static void wipeAllIndices(boolean preserveSecurityIndices) throws IOE
        }
        final Request deleteRequest = new Request("DELETE", Strings.collectionToCommaDelimitedString(indexPatterns));
        deleteRequest.addParameter("expand_wildcards", "open,closed,hidden");

        // If system index warning, ignore but log
        deleteRequest.setOptions(RequestOptions.DEFAULT.toBuilder().setWarningsHandler(warnings -> {
```
I know we talked about this yesterday in the team meeting, but I don't remember if we talked about excluding the `triggered-watches` data stream from this DELETE request. Is that not an option?
I don't remember discussing that, but I was a bit hesitant to do it, as I think there's logic that reads that index, and leftovers between tests may affect the execution / retrieval of watches.
Perhaps @masseyke can weigh in here on whether he thinks it's safe to keep between tests?
Did a bit more digging and yes, `.triggered-watches` hanging around between tests does affect watch execution (see Lines 122 to 130 in 8efd08b):
```java
/**
 * Checks if any of the loaded watches has been put into the triggered watches index for immediate execution
 *
 * Note: This is executing a blocking call over the network, thus a potential source of problems
 *
 * @param watches      The list of watches that will be loaded here
 * @param clusterState The current cluster state
 * @return A list of triggered watches that have been started to execute somewhere else but not finished
 */
```
@nielsbauman Are you happy with that conclusion?
🤦🏻♂️ Yep totally forgot...
Thanks for investigating, @lukewhiting! This LGTM, but I think it makes sense for @masseyke to also have a quick look anyway, since this concerns watcher.
I think it seems reasonable. It's worth pinging @smalyshev since we're kind of expanding the fix he just did in #117301.
```java
// and: https://github.com/elastic/elasticsearch/issues/115809
deleteRequest.setOptions(RequestOptions.DEFAULT.toBuilder().setWarningsHandler(warnings -> {
    for (String warning : warnings) {
        if (warning.startsWith("this request accesses system indices:")) {
```
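The excerpt above is cut off mid-lambda. For context, here is a rough sketch of how a handler along these lines could look in full — an illustration, not the PR's exact code; the logging call and the "fail on anything else" structure are assumptions, while `WarningsHandler` itself does return `true` when the warnings should fail the request:

```java
import java.util.List;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RequestOptions;

// Illustration only: ignore-but-log system index access warnings on the
// cleanup DELETE, and still fail the request for any other warning.
class CleanupWarningsSketch {
    private static final Logger logger = LogManager.getLogger(CleanupWarningsSketch.class);

    static void configure(Request deleteRequest) {
        deleteRequest.setOptions(RequestOptions.DEFAULT.toBuilder().setWarningsHandler(warnings -> {
            for (String warning : warnings) {
                if (warning.startsWith("this request accesses system indices:")) {
                    // Expected side effect of wiping everything, including system indices: log and move on.
                    logger.warn("Ignoring system index access warning during test cleanup: {}", warning);
                } else {
                    return true; // any other warning should still fail the request
                }
            }
            return false; // only expected system index warnings were seen
        }));
    }
}
```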
I wonder if shutting down warnings about all system indices is going a bit too far here. Maybe having a list of allowed ones would be better? Also, maybe there's a way to clean that up before we get here? For async, unfortunately, there isn't, since it can be created by multiple APIs which don't have cleanup functions right now, but I'm not sure what the status is with watches.
Thanks for taking a look @smalyshev! To follow up on your points:

> I wonder if shutting down warnings about all system indices is going a bit too far here

I did wonder that too, but I was also struggling to think of when failing the test due to a warning on cleanup would be helpful information... The request succeeded and the index is deleted; the warning is a known side effect of what we are doing here, so this just feels like noise...

> Maybe having a list of allowed ones would be better?

That's possible but difficult in our case. `.triggered-watches` is a data stream (I think? Or at least ILM managed), so the actual index name in the warning will change over time. We could do some pattern matching on the warning text, but that starts to feel quite fragile...

> Maybe there's a way to clean that up before we get here? For async unfortunately there isn't

Alas, this is one of those async cases... Watches can run in the background between tests and cause this index to appear and need cleaning up at any time.

I'm happy to add some more complex pattern matching to this, but I would be interested to know the problem it would be solving (not saying there isn't one; more likely I'm just too new here to spot it yet 😅).
Reply by @smalyshev via DM:
> Two notes there:
> 1. I'd like to keep the warnings handler code separate if possible; code with multiple levels of lambdas nested into each other is kinda hard to read, imho.
> 2. Even if .triggered-watches is a data stream, we still could scan the list and use startsWith to check for specific warnings, no? If it's too hard, though, then I think it's OK to ignore all system indices there, probably.
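For illustration, a minimal sketch of what that allow-list idea could look like, assuming the warning text keeps its current prefix and that the index name fragments below are the right ones to match (both are assumptions, and the `contains` matching is exactly the string-parsing fragility discussed in the next reply):

```java
import java.util.List;

// Hypothetical allow-list variant: only swallow system index warnings that
// mention one of a known set of watcher indices; anything else still fails.
class AllowListWarningsSketch {
    // Assumed names; the actual backing index names can change over time.
    private static final List<String> ALLOWED_INDEX_FRAGMENTS = List.of(".triggered-watches", ".watcher-history");

    static boolean warningsShouldFailRequest(List<String> warnings) {
        for (String warning : warnings) {
            boolean expected = warning.startsWith("this request accesses system indices:")
                && ALLOWED_INDEX_FRAGMENTS.stream().anyMatch(warning::contains);
            if (expected == false) {
                return true; // unexpected warning -> fail the request
            }
        }
        return false;
    }
}
```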
I have moved the warning handler to its own method to stop the nested lambdas.

The problem with checking which index threw the warning is that the index name sits partway through the warning message, so there would need to be code to extract it and then do the startsWith check.

Given the fragility of such string-parsing code (especially when we have no guarantees on the future consistency of that warning message) and the probability of other indices cropping up in the future that need ignoring, I feel it's best to just deal with this once and be done with it.

That said, if anyone can come up with a downside to muting all warnings during this cleanup, I'm happy to do a u-turn and get that ignore-list function in place.
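A rough sketch of the "own method" shape described above (the class and method names are made up for illustration; the PR's actual identifiers may differ):

```java
import java.util.List;

import org.elasticsearch.client.Request;
import org.elasticsearch.client.RequestOptions;

// Illustration: the warning logic lives in a named method, so the call site
// avoids nesting a multi-line lambda inside the options builder.
class ExtractedWarningsHandlerSketch {

    static void configureCleanupRequest(Request deleteRequest) {
        deleteRequest.setOptions(
            RequestOptions.DEFAULT.toBuilder()
                .setWarningsHandler(ExtractedWarningsHandlerSketch::ignoreSystemIndexAccessWarnings)
        );
    }

    // Returns true when the warnings should fail the request.
    private static boolean ignoreSystemIndexAccessWarnings(List<String> warnings) {
        for (String warning : warnings) {
            if (warning.startsWith("this request accesses system indices:") == false) {
                return true;
            }
        }
        return false;
    }
}
```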
💔 Backport failed
You can use sqren/backport to manually backport by running
…atches/example-watch-clusterstatus/line_137} failing - (elastic#115809) (elastic#117354)
* Ignore system index access errors in YAML test index cleanup method
* Remove test mute
* Swap the logic back as it was right the first time
* Resolve conflict with latest merge
* Move warning handler into it's own method to reduce nesting

(cherry picked from commit cda2fe6)
…atches/example-watch-clusterstatus/line_137} failing - (#115809) (#117354) (#117972)
* Ignore system index access errors in YAML test index cleanup method
* Remove test mute
* Swap the logic back as it was right the first time
* Resolve conflict with latest merge
* Move warning handler into it's own method to reduce nesting

(cherry picked from commit cda2fe6)
Sometimes a watch can run during a YAML test, creating some history indices. During test cleanup these get deleted, but the delete request can emit a warning because the indices it removes are system indices.
This patch introduces code to swallow (but log) system index access warnings during YAML test cluster cleanup.
Fixes #115809
Fixes #116884