-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Rollup] Disallow index patterns that match rollup indices #30491
[Rollup] Disallow index patterns that match rollup indices #30491
Conversation
We should not allow the user to configure index patterns that also match rollup indices (either the index specified in the config, or previously created jobs). Matching against rollup indices will almost certainly lead to bad behavior, because the expected fields in the user index won't exist in the rollup (due to renaming). It is also unlikely a user will want to match against a rollup index. For example, it is quite natural for a user to specify `metricbeat-*` as the index pattern, and then store the rollups in `metricbeat-rolled`. This will start throwing errors as soon as the rollup index is created because the indexer will try to search it.
Pinging @elastic/es-search-aggs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments, the index pattern should not match the rollup index name, this is an important validation. However I wonder if the second validation is really needed/possible. What happens if you define a rollup index that match an index pattern of another job ?
} | ||
if (Regex.isSimpleMatchPattern(indexPattern)) { | ||
if (Regex.simpleMatch(indexPattern, rollupIndex)) { | ||
throw new IllegalArgumentException("Index pattern would match rollup index name which is not allowed."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch !
/** | ||
* Ensure that the index pattern doesn't match any pre-existing rollup indices. We don't need to check | ||
* if the pattern would match it's own rollup index because the request already validated that. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can be less strict and only forbid mixing rollup and non-rollup indices in the same pattern ?
It should be fine to create the job if the pattern matches rollup indices only. Although I wonder if the problem of mixing indices (rollup and non-rollup) is not caught by the field validation. As you mentioned they have different field names so the validation of field names should fail all the time ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about the field name validation, although i'm still a bit worried about unintended consequences of matching against a rollup index. Barring the "rollup a rollup" usecase, it doesn't seem wise to let users match against an existing rollup if we can help it?
But maybe it's not worth the extra complexity? If it matches another rollup index, that index will exist so we can validate against the fields and it should throw an exception. It also doesn't protect against future rollup indices from matching the defined pattern, so we'd be relying on field level validation there anyway.
I think I'm convinced. Will make the changes, should simplify the PR a bunch :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, it turns out to be a bit more complicated. The field-level validation is broken, due to how field caps behaves.
Field caps only tells you the caps for the fields that match, in the resolved indices. But it doesn't tell you about the indices that were resolved but didn't match any field. So we won't know the index pattern matched some random index without the correct fields.
That's unrelated to this change to this PR, but needs to be fixed for us to simplify. Going to see what it takes, we may have to manually resolve the indices anyway to check. Will keep you updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. We can also add the matching indices in field caps's response. Could be per field : the list of index that doesn't have a definition for the field or global: the list of indices that match the pattern and then you can infer which index is missing from the field's response. I think it's a valid feature to add even if we don't use it in rollups.
@jimczi I pushed a commit to prune this back to just validating that the pattern doesn't self-match. I'll follow up with the field caps limitation and see what we can do there, then circle back to Rollup and add a few tests to make sure the field-name validation works as expected after we tweak field caps. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @polyfractal
Jenkins, run gradle build tests |
We should not allow the user to configure index patterns that also match the index which stores the rollup index. For example, it is quite natural for a user to specify `metricbeat-*` as the index pattern, and then store the rollups in `metricbeat-rolled`. This will start throwing errors as soon as the rollup index is created because the indexer will try to search it. Note: this does not prevent the user from matching against existing rollup indices. That should be prevented by the field-level validation during job creation.
* elastic/6.x: [Rollup] Disallow index patterns that match the rollup index (#30491) Revert "Fixing MixedClusterClientYamlTestSuiteIT" Fixing MixedClusterClientYamlTestSuiteIT Add get mappings support to high-level rest client (#30889) [DOCS] Fixes security example (#31082) Allow terms query in _rollup_search (#30973)
* ccr: [DOCS] Creates rest-api folder in docs [Rollup] Disallow index patterns that match the rollup index (elastic#30491) Replace exact numDocs by soft-del count in SegmentInfo (elastic#31086) Upgrade to Lucene-7.4.0-snapshot-0a7c3f462f (elastic#31073) Add cors support to NioHttpServerTransport (elastic#30827) [DOCS] Fixes security example (elastic#31082) Allow terms query in _rollup_search (elastic#30973) Removing erroneous repeat Adapt bwc versions after backporting elastic#30983 to 6.4 [Tests] Muting RatedRequestsTests#testXContentParsingIsNotLenient TEST: Retry synced-flush if ongoing ops on primary (elastic#30978) Fix docs build. Only auto-update license signature if all nodes ready (elastic#30859) Add BlobContainer.writeBlobAtomic() (elastic#30902) Add a doc value format to binary fields. (elastic#30860)
* ccr: [DOCS] Creates rest-api folder in docs [Rollup] Disallow index patterns that match the rollup index (elastic#30491) Replace exact numDocs by soft-del count in SegmentInfo (elastic#31086) Upgrade to Lucene-7.4.0-snapshot-0a7c3f462f (elastic#31073) Add cors support to NioHttpServerTransport (elastic#30827) [DOCS] Fixes security example (elastic#31082) Allow terms query in _rollup_search (elastic#30973) Removing erroneous repeat Adapt bwc versions after backporting elastic#30983 to 6.4 [Tests] Muting RatedRequestsTests#testXContentParsingIsNotLenient TEST: Retry synced-flush if ongoing ops on primary (elastic#30978) Fix docs build. Only auto-update license signature if all nodes ready (elastic#30859) Add BlobContainer.writeBlobAtomic() (elastic#30902) Add a doc value format to binary fields. (elastic#30860)
We should not allow the user to configure index patterns that also match rollup indices (either the index specified in the config, or previously created jobs).
Matching against rollup indices will almost certainly lead to bad behavior, because the expected fields in the user index won't exist in the rollup (due to renaming). It is also unlikely a user will want to match against a rollup index.
For example, it is quite natural for a user to specify
metricbeat-*
as the index pattern, and then store the rollups inmetricbeat-rolled
. This will start throwing errors as soon as the rollup index is created because the indexer will try to search it.While we're at it, we should forbid the index pattern being equal to the rollup index too.
/cc @cdahlqvist