Add ability to filter query based on data tier #68135
Comments
Pinging @elastic/es-search (Team:Search)
++ to this concept. If I'm understanding it right, having this would make it easy to use DLS to create roles that can only query specific data tiers? For example, regular users could query hot/warm, while power users could also query cold/frozen.
Yes, that could definitely be a use for this as well!
The easiest way to make it work would be to define a
Given the way the tier checking code works would this make sense to have a metadata field called I can see arguments both ways for
I started a PR for this but am struggling - I'm not sure how the field mapper API lets me get hold of the current DiscoveryNode to read the node role settings, so that I can call a tester method on DataTier similar to this one. The alternative is perhaps for the field mapper to get hold of the index settings and test
You only need the node settings to call the static function you linked. You can access them through
I see only a subset of settings using that - it is only the
That's because the default values are not materialized in the
Is the thinking to allow filtering on the properties of the node an index lives on, or on the properties of the index (e.g. ILM Phase, or whether a given index is partial or full searchable snapshot), or other?
In my opinion, these should be different features. Mappings are about the data so the
Search node-routing preference or querying index-allocation preference?
That question was about how we test node roles and whether we offer a focused subset of them (tier = hot/cold/warm/frozen) or whether we allow testing any node role (anyKey = anyValue). The other question I raised and Steve mentioned is about testing hot/warm/cold etc against the index-allocation preference. Filtering by node role seems to make more sense to me as that determines the real performance characteristics while the index-allocation preference is a perhaps-unfulfilled aspiration and the index may be stuck on the wrong node.
I was thinking of making the
My point is not really about which use-case is more important, it's more that I would only use a metadata field to expose something that is intrinsic to the data, not an allocation detail. I have no objections to enabling filtering by node role, but I don't think we should do it using mappings or the query DSL: preference feels like a better fit to me for this use-case.
I agree - my assumption was this was just being done in the query syntax because it was somehow easier for Kibana to express.
I agree that filtering by node role should be done through the preference but that seems like another feature as Adrien noticed. The
I would suggest that maybe we consider using the
I updated the PR to consult the
I'd keep it
My thinking was that the tier setting for an index might actually be wrong - while it has a preference to be allocated to a particular tier of node, it may be stuck somewhere else. Generally things gravitate from hot to colder tiers, so an index whose move is delayed is likely to be in a warmer tier than expected, and that's probably not a problem for searches that will tend to prefer the warmer end of things (at least for autocomplete).
I think
I found some unintended consequences from treating this as a queryable index field as opposed to a node-routing preference:
While the ability to query the tier as a field may be a convenience to some (preferable to node routing), it may be worth considering the number of field blacklists that need to be maintained where the new field is an inconvenience.
…es on the roles defined (explicitly or implicitly) for a node. Closes elastic#68135
New _tier metadata field that supports term, terms, exists and wildcard queries on the first data tier preference stated for an index. Closes #68135
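A rough sketch of the matching semantics this commit message describes, not the actual field mapper code. It assumes the tier preference is a comma-separated string such as "data_warm,data_hot" (the shape of the index.routing.allocation.include._tier_preference index setting); the function names are hypothetical, and Python's fnmatchcase only approximates Elasticsearch wildcard syntax.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Optional


def first_tier_preference(tier_preference: Optional[str]) -> Optional[str]:
    """Return the first entry of a comma-separated tier preference, or None."""
    if not tier_preference:
        return None
    first = tier_preference.split(",")[0].strip()
    return first or None


def matches_term(tier_preference: Optional[str], value: str) -> bool:
    # term: exact match against the first stated preference only
    return first_tier_preference(tier_preference) == value


def matches_terms(tier_preference: Optional[str], values: Iterable[str]) -> bool:
    # terms: the first stated preference equals any of the given values
    return first_tier_preference(tier_preference) in set(values)


def matches_exists(tier_preference: Optional[str]) -> bool:
    # exists: the index states at least one tier preference
    return first_tier_preference(tier_preference) is not None


def matches_wildcard(tier_preference: Optional[str], pattern: str) -> bool:
    # wildcard: approximated here with shell-style matching (* and ?)
    tier = first_tier_preference(tier_preference)
    return tier is not None and fnmatchcase(tier, pattern)
```

Note that under these assumptions an index with the preference "data_warm,data_hot" matches data_warm but not data_hot, because only the first entry is consulted.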
…rd queries on the first data tier preference stated for an index. Backport of 3aee4c1 Closes elastic#68135
Some use cases call for querying data within a certain tier (or set of tiers), for example, querying only data in the "hot" tier when a data stream or alias is managed by ILM. (See #47881, where users have asked for ILM to support aliases so that queries can target a specific lifecycle of data.)
It could be nice to have a general purpose query that could be used for regular searching (as well as aggregations) that allowed specifying a "tier" of data to query. This would allow a query like:
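A minimal sketch of such a request body, written against the _tier metadata field that the linked change later added rather than whatever syntax was first proposed. The message field, the search text, and the helper name are illustrative assumptions.

```python
import json


def tier_filtered_query(text_query: dict, tiers: list) -> dict:
    """Wrap a regular query so it only matches documents in the given tiers."""
    return {
        "query": {
            "bool": {
                "must": [text_query],
                # Non-scoring filter on the index-level tier value.
                "filter": [{"terms": {"_tier": tiers}}],
            }
        }
    }


# Hypothetical field and search text; only the "_tier" clause is the point here.
body = tier_filtered_query({"match": {"message": "error"}}, ["data_hot"])
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a search request against a data stream or alias, and the same filter clause would apply equally to aggregations.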
This is especially nice when users start using searchable snapshots for their data, as it would allow bypassing indices in other tiers (such as "cold" and "frozen") without requiring any sort of download of data.
One question that may come up is "why not just use a time range filter for getting the most recent data?". This is useful when only consuming a single set of data (such as a single data stream), but if we had a first-class query for data tier searching, multiple data streams and aliases could be queried that have differing "hot" tier definitions without requiring the user to both be aware of the timing for the tier and separate the filter range based on specific index patterns. For example: searching three data streams that have data in the hot phase for 7, 14, and 21 days respectively, using "tier: hot" is much simpler than specifying three different range filters tied to three different data stream index names. This also helps some of the use cases in #47881 while being accessible to both data streams and aliases.
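To make the comparison concrete, here is a sketch of both filters for the 7/14/21-day case. The data stream names, the @timestamp field, and the use of a wildcard on _index to tie each range to a stream are all assumptions for illustration; the single-clause version uses the _tier field from the linked change.

```python
def per_stream_range_filter(hot_days_by_stream: dict) -> dict:
    """One range clause per data stream, each with its own hot-phase window."""
    should = []
    for stream, days in hot_days_by_stream.items():
        should.append({
            "bool": {
                "filter": [
                    # Hypothetical way of scoping the range to one stream.
                    {"wildcard": {"_index": f"*{stream}*"}},
                    {"range": {"@timestamp": {"gte": f"now-{days}d"}}},
                ]
            }
        })
    return {"bool": {"should": should, "minimum_should_match": 1}}


def tier_filter(tier: str) -> dict:
    """The single clause that replaces all of the above."""
    return {"term": {"_tier": tier}}


# Hypothetical stream names with their hot-phase durations in days.
by_range = per_stream_range_filter({"logs-a": 7, "logs-b": 14, "logs-c": 21})
by_tier = tier_filter("data_hot")
```

The range version has to be updated whenever a stream's ILM policy changes its hot-phase duration, whereas the tier version does not encode any timing at all.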
If this is of interest, we could perform the filtering prior to any query execution: the tier is accessible through the index metadata, so the query could be rewritten to exclude indices that aren't in the specified tier.
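The pre-filtering idea can be sketched as a pass over index metadata before anything is searched. This is only an illustration of the concept under simplified assumptions: the metadata is modelled as a plain mapping from index name to tier, and the index names and function name are hypothetical.

```python
def indices_to_search(tier_by_index: dict, wanted_tiers: set) -> list:
    """Keep only the indices whose tier is requested; the rest are never queried."""
    return sorted(
        name for name, tier in tier_by_index.items() if tier in wanted_tiers
    )


# Hypothetical index metadata: index name -> tier recorded for that index.
metadata = {
    "logs-000003": "data_hot",
    "logs-000002": "data_warm",
    "logs-000001": "data_frozen",
}

selected = indices_to_search(metadata, {"data_hot", "data_warm"})
```

Because the decision uses metadata alone, an index in a cold or frozen tier is skipped without any of its data being touched.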