[RAC] RFC: Index naming and hierarchy #98912
Comments
@spong @dgieselaar @tsg @jasonrhodes @kobelb @XavierM @yctercero @dhurley14 please review 🙂 |
Thanks @banderror for putting this up! Couple of questions:
|
With regards to |
++, I like something like It might be good for all registries to have a short and sweet name without
Something like
I'm not sure on this one, the data streams don't include the version so I'm thinking to start without first. We can add it later if we really need it. |
Thank you for the comments, this is very helpful 👍
Kibana version in the name
I just followed the existing implementations as well, so not sure. Off the top of my head I'd imagine this version number could be helpful for document migrations. On the other hand, migrations could be built on top of a different number specifically used for tracking changes in the schema. For example, migration system for Other than that, I don't have any ideas regarding use cases for Kibana version in the index name. I would support your suggestions and remove it for now 👍
Multiple indices for alerts, rule execution events etc
I think yes, but I'm also open for any objections. Why I'd say separate indices are a better option:
Naming in general
I like the suggested data stream naming scheme! So if I got it right, this is what we're gonna have:
Examples: My questions regarding this naming:
Agree 👍 I will add this check to the implementation.
Stack rules
I'm not really aware of any requirements for stack rules, maybe I've missed that part. To clarify, stack rules are the rules that can be created from the Stack Management UI (
Do we have any requirements/plans for stack rules? In terms of RAC, stack rules == rules which will be created directly from the unified alerting app? |
Oh yes, and I will of course update the RFC, just want us to agree on most of the details. |
Trying to understand the difference between the "alerts" and "execlog" and current event log indices. I assume "alerts" is intended to hold data regarding the alert being run - for index threshold, that would include the threshold being tested against, the value calculated from the es aggs call to compare to the threshold, etc. And so the "execlog" indices would be like the current event log, which just capture the execution times/duration, status, etc. Which I think means the event log itself becomes unneeded, eventually. But I'd like to understand the field differences between the event log and execlog. Because I'm wondering if we can live with the current event log for now, especially given the following:
|
re: kibana version in index name We did this for the event log, because it solved a problem for us, and we noticed other Kibana apps doing this - something in o11y, but not sure exactly what. The problem is: "migrating" indices when Kibana is upgraded. Obviously (I hope), we weren't planning on doing " Adding a version to the name makes this problem go away! We always create a new index template, alias, and initial index when Kibana is updated. And then we end up using Obviously, this doesn't handle every case. If the structure between Kibana versions changes "too much", we could be in a position where we wouldn't be able to validly query old data, and similar sorts of problems. But this felt like the best thing to do, back when we wrote this. If we want to explore not using the version in the index name, then I think we need to have a really good story for what happens when the mappings change when Kibana is updated. Haven't thought too much about this (since it's not a problem for the event log). |
@dgieselaar Is this question driven by the desire to view/manage alerts from stack rules and alerts from within each solution? If so, is it just limited to stack rules? Is there the desire to view/manage alerts from o11y rules inside security and security alerts from o11y? At one point, I saw the suggestion to determine the index to write to based on the consumer of the alert, not the producer. Is that something that is up for consideration? |
@ymao1 ++, I think using the consumer makes more sense. This way create alerts directly in the indices where we need them, and we don't have to query across solutions. |
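To make the producer-vs-consumer choice concrete, here is a minimal sketch. The `RuleInfo` shape, the `targetAlertsIndex` helper, and the example values are all hypothetical and not taken from the alerting framework; the index name follows the naming proposed in the issue description.

```typescript
// Hypothetical shapes for illustration; not the alerting framework's types.
interface RuleInfo {
  producer: string; // solution that registered the rule type
  consumer: string; // solution the rule was created for (assumption)
}

type Basis = 'producer' | 'consumer';

// Pick the alerts index from either side of the relationship.
function targetAlertsIndex(rule: RuleInfo, basis: Basis, space: string): string {
  const owner = basis === 'consumer' ? rule.consumer : rule.producer;
  return `.alerts-${owner}.alerts-${space}`;
}

// A stack rule type used from within a solution (example values are assumptions).
const rule: RuleInfo = { producer: 'stack', consumer: 'security' };
console.log(targetAlertsIndex(rule, 'producer', 'default'));
console.log(targetAlertsIndex(rule, 'consumer', 'default'));
```

With the consumer as the basis, the alert lands in the index the solution already reads from, so no cross-solution query is needed.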
@pmuellr For eventlog, do you store the full version including the patch level (e.g. 7.11.2)? This is what Beats used to do before the new indexing strategy. It does help on upgrades, but it can mean creating a lot of indices in case of frequent upgrades. The question is if we really need it, or is it enough to trigger an ILM rotation when we do an upgrade that changes the mapping. This might get complicated if we have multiple Kibana instances and they aren't upgraded all at once (is that permitted?). FWIW, when the index naming strategy was discussed, I pressed on the need of including the version: https://github.com/elastic/observability-dev/issues/283#issuecomment-527372212 My only reason for going without it first is that this is what the new indexing strategy does as well, and it can be added later. |
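As a rough sketch of the "rotate only when the mapping changes" alternative, assuming a numeric template version is tracked in code and compared against the installed one. `planUpgrade` and `UpgradeAction` are hypothetical names, not an Elasticsearch or Kibana API.

```typescript
// Hypothetical types for illustration; not an Elasticsearch or Kibana API.
type UpgradeAction = 'bootstrap' | 'update_template_and_rollover' | 'none';

// Decide what to do at startup from the version in code and the installed one.
function planUpgrade(versionInCode: number, installedVersion: number | undefined): UpgradeAction {
  // Nothing installed yet: create the template, alias and initial index.
  if (installedVersion === undefined) return 'bootstrap';
  // Installed mappings are older: update the template and roll the index over.
  if (installedVersion < versionInCode) return 'update_template_and_rollover';
  // Same or newer (e.g. another, newer instance already upgraded): leave it alone.
  return 'none';
}

console.log(planUpgrade(3, undefined));
console.log(planUpgrade(3, 2));
console.log(planUpgrade(3, 4));
```

The last branch is what keeps an older instance from downgrading the template when several instances start around the same time.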
That's good to hear, as this question is one of the main blockers for migrating Stack Rules to Alerts-as-Data. |
Regarding consumer vs producer, could you explain what that actually means? What would be examples of producers and consumers in terms of RAC? From the current code of detection engine, I can see that:
So seems like for our rules producer and consumer are always the same thing - our app. Is that assumed to change in some way? Would stack rules be able to generate alerts for solutions? Does the naming already discussed here ( |
@spong @dgieselaar @banderror :) I will incorporate all the feedback from this RFC to #98353 |
@spong regarding version in the index name and Tudor's comment #98912 (comment) I'd say maybe we should stick to the same approach as we already have in the |
@banderror The alerting framework maintains the idea of producer and consumer, where the producer is the solution creating the rule type ( If the index schema is based on the producer where security produced rules are written to |
Oh I see that now, thank you @ymao1 for the clear explanation.
Gotcha. Maybe it means that a rule type (a stack rule type in this case), instead of indexing alerts directly, will need to use some kind of an indexing "strategy" injected into it, which would know how to properly index the alert into the destination alerts-as-data index (and would respect the document schema and mappings). Otherwise we would need to have the same mappings in all alerts-as-data indices, and I'm not sure whether that would be feasible. Do you think this naming might work |
Regarding Kibana version in the name and migrations/rollovers. This is how it's implemented in Security for Basically, we don't have Kibana version in the index name, but instead we maintain index template version in the code, and we have two mechanisms for two cases which both use this version:
We could use the same or similar approach for RAC indices. Or maybe there can be cons to that like
|
re:
We could alleviate this concern by scheduling a task, as that is guaranteed to be picked up by a single Kibana instance (I think Gidi or Patrick suggested that). |
++ on this - which is why I think we should always include all technical fields in the shared component template. That should hopefully be a few dozen only. But that could allow users to point any rule to any index, and all the other stuff would be metadata, which may or may not need a runtime field to be queryable. |
That was me, but even that's not super obvious, we'd need to understand the exact flow you want to support. BTW rolling upgrades are not supported, so it's less about instances being upgraded separately, and more about more than one instance being booted at the same time. |
I updated the proposal in the description based on all your feedback. Thank you! |
Fwiw, consumer is currently not Observability for rule types, but APM/Uptime etc. My suggestion would be to not tightly couple this to the alerting framework's interpretation of consumer. Generally, I feel we should avoid technically depending on the index name, and treat it as a scoping mechanism for allowing administrators to more easily grant access to subsets of data. Preferably we use a query when we query alerts instead of reconstructing the entire index alias. Not sure if that is being suggested here, but wanted to call that out. |
@dgieselaar, is the thought then that solutions would need to explicitly allow-list/enable which stack rules they support and then we'd combine those component templates with the solution-specific component templates so the solution indices have all the necessary fields to support stack rules? Or would solutions just include _all stack rule component templates by default so there's no ambiguity between which solutions support which stack rules? @banderror -- updated RFC LGTM! 👍 May want to have a section with regards to storing |
@dgieselaar - Wouldn't diverging here make it far harder though? RBAC is already a complicated mechanism, if we start diverging on this (using something other than FeatureID in consumers/producers) we're adding another moving part to this mechanism. I'm not necessarily objecting here, but flagging that this would come at a cost to maintainability/reliability, and we should step with caution. cc @ymao1 |
Cool, I understand how users can control their queries to scope which alerts are returned in both of those cases. I'm less clear on the case @MikePaquette mentioned as one of the deal breakers above:
A visualization built on the alerts index (or an embedded Lens viz in a solution UI) isn't going to be auto-scoped to alerts in the current space. I just want to confirm that we don't expect that having the space ID in the index names is going to solve this either, as far as I understand. |
If the user has wide access to the alerting indices, and they use Discover or Visualise, there is a manual step in both cases: either selecting the right index to query (easier), or use the correct filter (a bit harder, but manageable). This filter needs to be added any time the user does something with alerts as data (ML jobs on top, rules on top of rules). So the overhead is adding up a bit, but I'm not sure how big of a problem it will be in practice. For embedded Lens, we can easily pass the correct index glob to Lens, so it works out of the box. I think we should also be able to pass the filter to Lens, so except for a bit more work in our code, I think this is the same. So far, this is adding the filters for convenience, but it's not secure in any way (the user can still access all alerts via Discover). The more common use case, I think, is that the Administrator will define ES roles so that the Analyst can only see the alerts from a particular space. This is where the difference is, IMO: If we separate the alerts by spaces in different indices, it's very easy for the Administrator to configure the roles correctly. If we don't separate the alerts, it's no longer possible to do this at all in Basic or Gold license levels. It is possible in Platinum+ via DLS, but the configuration is also more complex. I'm worried in particular about the following scenario:
If we would separate the alerts into indices by space, the admin would be able to define the right permissions in step 2. I agree that on a technical level we'd prefer to not have the space id in the index name, but I want to make sure that everyone understands the consequences here. |
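To illustrate the difference between the two ways an administrator could scope an analyst to one space, here is a sketch of the two role bodies. The objects follow the general shape of an Elasticsearch role's `indices` entries (`names`, `privileges`, optional `query` for DLS); the helper names, the index patterns, and the `kibana.space_ids` field are assumptions for illustration only.

```typescript
// Shape modeled on the `indices` section of an Elasticsearch role body.
interface IndexPrivilege {
  names: string[];
  privileges: string[];
  query?: Record<string, unknown>; // document-level security filter (Platinum+)
}

interface RoleBody {
  indices: IndexPrivilege[];
}

// Space id in the index name: a plain index pattern is enough.
function roleByIndexName(space: string): RoleBody {
  return { indices: [{ names: [`.alerts-*-${space}`], privileges: ['read'] }] };
}

// No space id in the index name: access is narrowed with a DLS query instead.
// The field name `kibana.space_ids` is hypothetical.
function roleByDls(space: string): RoleBody {
  return {
    indices: [
      {
        names: ['.alerts-*'],
        privileges: ['read'],
        query: { term: { 'kibana.space_ids': space } },
      },
    ],
  };
}

console.log(JSON.stringify(roleByIndexName('default'), null, 2));
console.log(JSON.stringify(roleByDls('default'), null, 2));
```

The first role needs nothing beyond index-level privileges; the second depends on document-level security being available.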
This is the part that I'm not fully in agreement with. I see the index name addition as an extra convenience for users that access data directly. It can and should be totally ignored by the Kibana handlers code ( by querying The shared alerts feature, when we add it, will only rely on the field in the documents, not on the index name. What space id that we put in the index name can be arbitrarily one of the spaces, or |
With the proposal I made previously, it wouldn't be the only way to separate alerts. They could choose to use the "Alerting namespace" feature to separate alerts into different indices. This comes with the risk of having so many "Alerting namespaces" that they have too many small indices and have an over-sharding issue. If that's the case, larger Alerting indices with DLS comes with an advantage.
At a certain level of scale, our users end up needing a Platinum license. I don't think that's a bad thing. DLS was created to address this specific use-case where we want users to have access to a subset of documents in an index.
If users rely on the index-name, then it would break their Dashboards and Visualizations as soon as we allow alerts to be shared in multiple spaces. We will put ourselves in a corner here where we need a breaking change to allow alerts to be shared in multiple spaces. I would like for us to have the flexibility of allowing alerts to be shared in multiple spaces within the 8.x timeframe, as we have many users who would like this. |
Thanks for the answers @kobelb.
This does address my main concern, so I'd be generally happy with it, but it sounds like we're pushing the problem of maintaining the indexing strategy to our users instead of having it defined by us.
Ok, I can see the risk here, but if we allow users to specify their own arbitrary namespace, they will potentially hit the same problem, except that we won't be able to help with mitigation because we can't know what strategy they adopted. Also, we do have space-ids in the GA-ed security solution already. So we can say that we're trading a breaking change in 8.x that we might need, for a breaking change in 7.x now. |
It's true... And I think that's the major flaw with my proposal. However, given the fact that we can't predefine an indexing strategy that will work for all users because it's really Alerting usage dependent, I think it's the best option we have.
When you say "the same problem", are you referring to users switching their namespacing strategy and then having to go update all of their Dashboards and Visualizations to use the new namespacing strategy? If so, I do agree that it'd be a frustrating user experience, and I think we should recommend that users only create visualizations/dashboards against
We have every intention of allowing alerts to be shared in multiple spaces in the 8.x timeframe. We have the opportunity of preparing for this future in the 8.0 release. Otherwise, we're going to have to wait until 9.x to allow alerts to be shared in multiple spaces. |
It's a common pattern to process alerts in pipelines to do the following:
Another pattern is to process alerts in data pipelines to mark a rule as flapping when it causes too many alert state changes per time interval. Marking a rule as "flapping" causes a change in the notification strategy (e.g. only notify once every 30 mins). For all the use cases described above, we anticipate post-processing alerts in pipelines that are very similar to the pipelines used on our "primal signals": logs, metrics, traces, or security events. Does that make sense, @kobelb? |
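A minimal sketch of the flapping check described above, assuming alert state changes are available as timestamps. `isFlapping`, `notifyIntervalMs`, and the thresholds are hypothetical; only the "once every 30 mins" example comes from the comment.

```typescript
// Count alert state changes inside a sliding window ending at `nowMs`.
function isFlapping(
  changeTimestampsMs: number[],
  nowMs: number,
  windowMs: number,
  maxChanges: number
): boolean {
  const recent = changeTimestampsMs.filter((t) => t > nowMs - windowMs && t <= nowMs);
  return recent.length > maxChanges;
}

const THIRTY_MINUTES_MS = 30 * 60 * 1000;

// A flapping rule notifies at most once every 30 minutes; others keep their interval.
function notifyIntervalMs(flapping: boolean, normalIntervalMs: number): number {
  return flapping ? Math.max(normalIntervalMs, THIRTY_MINUTES_MS) : normalIntervalMs;
}

// Four state changes within the last five minutes, threshold of three.
const changes = [0, 60_000, 120_000, 180_000];
const flapping = isFlapping(changes, 200_000, 300_000, 3);
console.log(flapping, notifyIntervalMs(flapping, 60_000));
```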
Pinging @elastic/security-solution (Team: SecuritySolution) |
Pinging @elastic/kibana-alerting-services (Team:Alerting Services) |
I did a small test of the two approaches that we're discussing here - index name with These are the indices and simplistic objects I used for testing (Kibana Dev Tools syntax):
I also created two Kibana users: the 1st one has access to User 1 having access to User 2 having access to I added two index patterns to be able to query those indices in visualizations: Finally, I created two visualizations (in the form of tables) - a table per index pattern, and combined them into a dashboard. This is what this dashboard looks like if you log in as a superuser (you have access to all of the above indices): If you log in as User 1, you will see only documents from Same for User 2: Notice that the visualization (table) itself does not specify a space id, it uses an index pattern which doesn't include space id - in both approaches that we are comparing: |
@banderror nice! So both of these cases appear to do what we want them to do, it's just that one requires Platinum-only document-level security, and the other is achievable just with index name matching permissions, is that accurate? One thing I have trouble keeping in mind is how these relate to the idea of Kibana RBAC, i.e. customers who only use feature controls to set up a role. So if they give a user access to a feature called "alerts" (or whatever it would be called), would this kind of space-awareness exist for them due to the fact that the internal user will only read documents from the current space somehow? I'm not clear on how this works, but it seemed to be part of earlier conversations, and I want to be super clear about how those things relate to each other. Thanks! |
Correct. When users access alerts using custom UIs and HTTP APIs in Kibana that are integrated with the Kibana RBAC model, the user will only be able to see the subset of the alerts that they have access to: |
@jasonrhodes Yep, I think so. More and more I feel like it would be great to stick to @kobelb 's suggestion (no space id in the name + document-level security). It would simplify index management - no bootstrapping indices would be needed per space on the fly, e.g. in rule executors or route handlers or any other "lazy bootstrapping" logic. I might be missing/forgetting something, but the only blocker (maybe not) is being able to isolate alerts (at least security detection alerts) from different spaces in custom visualizations and dashboards for users with Basic license. Basically, like @MikePaquette said in #98912 (comment), spaces should work out of the box for Basic-licensed users, without any "surprises". This makes me think:
I think this could be a win-win solution if this is possible. Also, if we stick to DLS for space isolation, this probably would be a foundational thing for all the future custom indices in Kibana, e.g. based on the next implementation of saved objects etc. |
Spaces will work out of the box for a majority of basic-licensed users. The only exception is when users need to use the "escape hatch" to query the hidden
We can open up a licensing issue to discuss moving DLS to the basic license. However, DLS is currently a Platinum level feature, and moving it to Basic is a rather large jump.
Unfortunately, no. When users are querying the Elasticsearch documents directly, Kibana's RBAC model doesn't apply. |
For a little more context on why I'm jumping in late on this issue: while @banderror is on vacation I'm going to be attempting to push forward the consolidated rule registry data client implementation. As part of that effort, I'm working on enhancing the index template creation, versioning, and bootstrapping strategy. Whether or not space IDs can potentially be included in concrete index names has significant implications on when/how the indices are bootstrapped. I agree with @jasonrhodes that a lot of this feels like trying to fit a too-small sheet to a bed. We have competing requirements that, to my knowledge, cannot be satisfied in the current version of Kibana with a Basic license.
After reading through all the comments here, I've reached the opposite conclusion from @banderror - I think optionally including the space ID in the index name comes with significant advantages and few downsides. To summarize what I'm proposing:
Advantages
Disadvantages
I agree with @MikePaquette here - taking features away from existing users in a new release would be an awful user experience. This hard requirement for the Security Solution means we must have either space ID or a custom namespace in the index name until we have some way in Kibana of controlling access to data indices by space (e.g. solutions might register a data index as "space-controlled" and Kibana then adds a "space filter" to every request that hits that index). I think using custom namespaces to implement this capability would add additional complexity and cognitive load to users who just want to replicate their existing environment since they would have to manually create and maintain a custom namespace for each Kibana space. It's also not clear to me what the desired user workflows around multi-space alerts in security would be, or if there's a clear need for multi-space security alerts (where a single underlying document is shared). There are cases where we would want an alert to be created in multiple spaces so the rule would be multi-space in a sense, but until an analyst checks the alert we can't know whether the alert state (open/closed etc) should be shared across spaces. To support this, we would likely want to create copies of the alert in each space and build out a feature "close alert in all spaces" that analysts could choose in the UI as an alternative to "close alert (implicitly only in this space)". In this scenario, alert documents are still single-space. (brief slack thread on the topic) |
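The "space filter" idea mentioned above could look roughly like the following sketch, which wraps an arbitrary query in a `bool` query with a `term` filter. `withSpaceFilter` is a hypothetical helper and `kibana.space_ids` is an assumed field name; nothing here is an existing Kibana feature.

```typescript
type QueryDsl = Record<string, unknown>;

// Wrap the caller's query so only documents from the given space can match.
// The default field name is an assumption for illustration.
function withSpaceFilter(
  query: QueryDsl,
  spaceId: string,
  spaceField: string = 'kibana.space_ids'
): QueryDsl {
  return {
    bool: {
      must: [query],
      filter: [{ term: { [spaceField]: spaceId } }],
    },
  };
}

console.log(JSON.stringify(withSpaceFilter({ match_all: {} }, 'default')));
```

Because the filter is applied server-side, it would only help for requests that go through Kibana; direct Elasticsearch access would still rely on index permissions or DLS.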
I've stated this before, but I don't think we should be making poor architectural decisions because of license levels. Including the space-id in the index-name has critical disadvantages that we should not overlook:
|
Removing features for existing customers is a non-starter IMO, and an architecture that requires this would entirely block the ability for the security solution to use the RAC infrastructure. Requiring existing customers to pay for a higher tier license to continue receiving the same feature set constitutes "removing features" in my mind. Without space IDs in the index name, the security solution would not be able to migrate its full feature set to RAC infrastructure until Kibana provides "data with saved objects constraints".
This is why the concrete indices would be created as-needed. If users do create alerts in 1000 different spaces and they end up with indices for all of them, it's reasonable to think that they are separating the alerts intentionally and would want to be able to have strong separation between the spaces without jumping through extra hoops. If only a few of their spaces are actually used for alerts, then only a few indices are created. This is a tried-and-true strategy that the security solution has used to work around the lack of "data with saved objects constraints" in Kibana.
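A sketch of what "created as-needed" could mean in code, assuming an injected `createIndex` function that performs the actual bootstrapping. `LazyIndexBootstrapper` is a hypothetical class, not the rule registry implementation.

```typescript
// Hypothetical helper: bootstrap each concrete index at most once per process.
class LazyIndexBootstrapper {
  private readonly pending = new Map<string, Promise<void>>();
  private readonly createIndex: (name: string) => Promise<void>;

  constructor(createIndex: (name: string) => Promise<void>) {
    this.createIndex = createIndex;
  }

  // Returns the same promise for repeated calls with the same index name.
  ensure(name: string): Promise<void> {
    let promise = this.pending.get(name);
    if (promise === undefined) {
      promise = this.createIndex(name).catch((err: unknown) => {
        // Forget failed attempts so a later write can retry.
        this.pending.delete(name);
        throw err;
      });
      this.pending.set(name, promise);
    }
    return promise;
  }
}

const created: string[] = [];
const bootstrapper = new LazyIndexBootstrapper(async (name) => {
  created.push(name); // stand-in for the real index creation call
});

// Only spaces that actually produce alerts get an index.
void bootstrapper.ensure('.alerts-security.alerts-default');
void bootstrapper.ensure('.alerts-security.alerts-default');
void bootstrapper.ensure('.alerts-security.alerts-team_a');
console.log(created);
```

This only deduplicates within one Kibana process; several instances booting at once would still need the create call itself to be idempotent.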
I don't see how this blocks multi space alerting. The alerts that live in these indices with space IDs are and will always be single space alerts. When multi space alerting becomes available, we can create a new index without the space ID in the name and write multi-space alerts to that index instead. We can write single space alerts there as well, and tell users that dashboards that were pointing to Having "legacy" single space alerts in an index that has the space ID in the name should also be less confusing to users than either manually managing namespaces to match up with Kibana spaces or adding document level security. In general I don't think we should be making immediate sacrifices to user experience in the hopes of being forwards-compatible with multi-space alerting, given that multi space alerting hasn't been fully designed yet. |
We remove features all the time (generally after a reasonable period with deprecation notices). The security solution can migrate a very large amount of its features to the RAC infrastructure even before Kibana provides "data with saved object constraints". The only features that I'm aware of that are blocked are using applications like Discover, Visualize, Lens with the alert-data using Kibana's RBAC model.
Just because it works with the scale of security detections right now, doesn't mean it will continue to work. Additionally, we currently have some alerts for stack monitoring that are being created in every space.
This reinforces the argument I was making previously. Multi-space alerts are incompatible with alerts indices with space IDs in their names. We have a potential workaround where we write single space alerts to alert indices with a space ID in their name, but we have to stop doing that as soon as these alerts exist in multiple spaces. Whenever we would switch which index these alerts are being written to, we would break any features that were reading from the original single space alert indexes.
What immediate sacrifices are we making? My primary goal is to prevent us from generalizing a solution that the detection engine has implemented that puts us in a corner and prevents us from implementing multi-space alerting. We have a fairly good idea of how we want this to work, it's not some long-term, multiple years out project. We have users that want multi-space alerting as soon as possible. |
It's important to note that in the proposal the space ID in the index name is optional.
Customers that use spaces to segment alerts and ensure that analysts in certain roles can only see the alerts they are supposed to see will suddenly lose the ability to create the proper permissions at the Basic level. Currently this can be achieved using index level permissions.
We wouldn't need to switch the index that features are reading from - we can build our features to ignore the space ID in the index name and simply read from Users will have to update the index pattern they use for custom visualizations and dashboards, but that's an acceptable breakage with hidden indices. Is there a specific operation that is not possible or logically does not make sense if the index name contains the space ID? The only one that comes to mind would be adding an additional space to a specific alert document's
This is why, in my proposal, including the space ID in the index name is an optional feature, determined by the rule writers within each solution. The security solution can continue to use the space ID, and solutions that don't need it can use a single index.
The key feature that's blocked is the ability to prevent users in certain roles from being able to access alerts that they should not access at the Basic tier. |
@marshallmain if we don't include the space-id in the index-name, the only situations where we won't have space-based isolation and authorization is when end-users access the indices directly. Are we in agreement on this? As far as I'm aware, this is isolated to a very few edge-case situations: Discover, Visualize, Lens, etc. I believe this will also be the case with "alerts on alerts", at least for the short-term. Are there other significant use cases that I'm missing? |
This feels like the important comment to just make sure we've fully addressed. Security users with a Basic license currently use Kibana spaces to segment their alerts. From a solution perspective, it sounds like we're saying that will still be possible without using Space ID in the name, is that right? The particular magic that makes that work never seems to stick in my memory, but we should avoid a false dichotomy if this is possible in both solutions.
I think I understand what you're saying here, but just to be clear, Security can migrate "a very large amount of its features" but the ones that they can't migrate will be dropped, they can't continue to support those features in the old model while also migrating the others. In other words, once they migrate to the new indices, any feature that's not migratable will be removed from the UI. So the question is: what are the features that will be dropped? It sounds like it would not be anything in the Security UI, but rather, custom visualizations, dashboards, etc. that end users built that relied on Space ID being in the index name in order to properly segment their queries, is that correct? If we can clarify that officially, then I think we can pull @MikePaquette back in to see whether that's still a deal-breaker. |
This conversation has moved into our architecture design document. I'll summarize the complete results of that discussion here ASAP and close this ticket. |
Related to:
#93729
#95903
#98353
elastic/elasticsearch#72181
Summary
There are more and more questions and concerns being raised regarding rule monitoring implementation for RAC. I'm working on #98353 which implements "event log" abstraction within rule registry that will be used for writing and reading both alerts and rule execution logs.
This RFC proposes some naming and structure for RAC indices, hierarchy for rule registries, and lists a few open questions and concerns.
Proposal
Index aliases naming convention would be similar to Elastic data stream naming scheme:
{prefix}-{consumer}.{additional.log.name}-{kibana space}
where:
{prefix} will be .alerts by default. Users will be able to override it in Kibana config and set it to any other value, e.g. .alerts-xyz or .whatever (for compatibility with legacy Kibana multitenancy).
{consumer} will indicate the rule/alert consumer in terms of Alerting framework (more info). We will probably have security, observability and stack as consumers.
{additional.log.name} will be used to specify concrete indices for alerts, execution events, metrics and whatnot. We will probably have only alerts and events as concrete logs. It might be good to have short and sweet names without - or . in them (perhaps _ is ok).
{kibana space} will indicate space id, for example default. In case of space agnostic alerts, the no.space placeholder can be used.
Examples of concrete index names
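As a quick illustration of the naming convention, here is a minimal sketch of a function that assembles an alias name from its parts. `AliasParts` and `buildAliasName` are hypothetical names, not part of the Kibana codebase; the defaults (`.alerts`, `no.space`) are the ones described in this RFC.

```typescript
// Hypothetical helper for illustration; not part of the Kibana codebase.
interface AliasParts {
  prefix?: string; // defaults to ".alerts"; overridable in Kibana config per the RFC
  consumer: string; // e.g. "security", "observability", "stack"
  logName: string; // e.g. "alerts" or "events"
  space?: string; // Kibana space id; omit for space-agnostic alerts
}

// Placeholder from the RFC for space-agnostic alerts.
const NO_SPACE = 'no.space';

function buildAliasName(parts: AliasParts): string {
  const prefix = parts.prefix ?? '.alerts';
  const space = parts.space ?? NO_SPACE;
  // Assumption: enforce the RFC's preference for log names without "-".
  if (parts.logName.includes('-')) {
    throw new Error(`log name must not contain "-": ${parts.logName}`);
  }
  return `${prefix}-${parts.consumer}.${parts.logName}-${space}`;
}

console.log(buildAliasName({ consumer: 'security', logName: 'alerts', space: 'default' }));
console.log(buildAliasName({ consumer: 'stack', logName: 'alerts' }));
```

The two calls print `.alerts-security.alerts-default` and `.alerts-stack.alerts-no.space`.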
For clarity, this section contains the concrete index names that are created by the Security and Observability solutions.
Security:
(1) The Alert documents that support human workflow and are updatable.
(2) Rule-specific execution events and metrics created by Security rules to enhance our observability of alerting.
Technically it will be possible to derive child logs from these alerts and events, e.g. .alerts-security.alerts.ml-{kibana space}. Although we don't think that we need this in Security at this point.
Observability:
(1) The Alert documents that are updatable. Same exact semantics as for Security (1)
(2) Supporting documents (evaluation) for the Alerts + execution logs to be used for the Observability of Alerting
(3) Example of space agnostic alert index. This can be used by the space-agnostic Rule types like the Stack Monitoring might need. The no.space "space" is not a valid Kibana space name, so this pattern can be used as a placeholder.
Diagram
Here's a structural diagram showing some rule execution dependencies in the context of RAC and how the proposed indices fit the whole picture:
diagram source