# [Security Solution] PoC of Prebuilt Detection Rules package with historical versions #137420
Pinging @elastic/security-detections-response (Team:Detections and Resp)

Pinging @elastic/security-solution (Team: SecuritySolution)
I have created a Fleet package containing a large number of detection rules to verify how that approach scales. Here are some limitations that I've encountered along the way.

**Max files per package**

If we try to build a package with many files using
This limitation is not that significant for our use case, as the current number of detection rules in Security Solution with all their historical versions created in the past 2.5 years is ~4200. So we have a substantial buffer before reaching the 65k rules limit.

**Saved objects import limit**

Further, if we try to install a package that contains more than 10000 saved objects, the installation fails with the following error:
The max number of objects to import is controlled by the
We can try to further tweak Elasticsearch settings, but it doesn't seem practical for our use case, as the solution won't be universal. So installing more than 10000 saved objects is not a good option, and we need to consider alternatives.

**Alternatives**

The current package installation method implies that we import package assets as saved objects and persist them locally. But on the solution side, we need only a fraction of the assets to be available. E.g., if a given detection rule has 10 versions, we don't need to read all of them; we only need two versions to build a diff (more on diffs in #137446). So we could probably read package assets directly without installing them. An EPR API already allows us to read the list of package assets:

```json
{
  "name": "security_detection_engine",
  "title": "Prebuilt Security Detection Rules",
  "version": "8.1.1",
  "assets": [
    "/package/security_detection_engine/8.1.1/NOTICE.txt",
    "/package/security_detection_engine/8.1.1/changelog.yml",
    "/package/security_detection_engine/8.1.1/manifest.yml",
    "/package/security_detection_engine/8.1.1/docs/README.md",
    "/package/security_detection_engine/8.1.1/kibana/security_rule/000047bb-b27a-47ec-8b62-ef1a5d2c9e19.json",
    "/package/security_detection_engine/8.1.1/kibana/security_rule/00140285-b827-4aee-aa09-8113f58a08f3.json",
    "/package/security_detection_engine/8.1.1/kibana/security_rule/0022d47d-39c7-4f69-a232-4fe9dc7a3acd.json",
    "/package/security_detection_engine/8.1.1/kibana/security_rule/0136b315-b566-482f-866c-1d8e2477ba16.json",
    "/package/security_detection_engine/8.1.1/kibana/security_rule/015cca13-8832-49ac-a01b-a396114809f6.json"
  ]
}
```

Given the list of package assets, we could encode rule versions in file names and read only the rules we know were updated. That way, we could skip saved objects installation altogether.

@elastic/fleet could you please take a look at that approach? Do you have any concerns or limitations associated with it? It could potentially increase the number of requests Kibana makes to
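To illustrate the idea, here is a minimal sketch of filtering versioned rule assets out of the EPR asset list without installing anything. It assumes a `[ruleId]:[ruleVersion].json` naming scheme for rule files; all helper names are made up for this sketch, not actual Fleet or Security Solution APIs.

```typescript
// Hypothetical sketch: given the EPR asset list, keep only the rule assets
// whose encoded version is newer than what is currently in use.
interface RuleAssetRef {
  ruleId: string;
  ruleVersion: number;
  path: string;
}

// Matches paths like .../kibana/security_rule/<ruleId>:<ruleVersion>.json
const RULE_ASSET_RE = /\/kibana\/security_rule\/([^:/]+):(\d+)\.json$/;

function parseRuleAsset(path: string): RuleAssetRef | undefined {
  const match = RULE_ASSET_RE.exec(path);
  if (!match) return undefined; // not a versioned rule asset (README, changelog, ...)
  return { ruleId: match[1], ruleVersion: Number(match[2]), path };
}

// Pick only assets newer than the installed version of each rule.
function pickUpdatedAssets(
  assetPaths: string[],
  installedVersions: Map<string, number>
): RuleAssetRef[] {
  return assetPaths
    .map(parseRuleAsset)
    .filter((a): a is RuleAssetRef => a !== undefined)
    .filter((a) => a.ruleVersion > (installedVersions.get(a.ruleId) ?? 0));
}
```

With this kind of filtering, Kibana would only fetch the individual asset files it actually needs rather than importing the whole package.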
The 10k limit is because our installation saved object tracks all the installed assets as part of a nested document here.

An alternative approach could be to look at a more efficient way of recording these rules on the installation object that does not take 1 nested doc per rule but still allows us to find the rules when we come to uninstall the package.

The alternative proposal of not installing them at all would probably need a change to the package spec to add a way to indicate to Fleet not to install the assets; currently we attempt to install everything in the kibana directory. Though really, if these are never going to be installed in Kibana, maybe they would be a new kind of asset altogether and would not live in the kibana folder?
@hop-dev Yeah, that makes sense to me. Would that be a significant change on the Fleet side if we were to introduce the new asset type? And what would be the best way to approach that change? Do we need to create a ticket/proposal to start the discussion? Meanwhile, I'll continue this PoC to see if we could work efficiently (performance-wise, etc.) with package assets without installing them on the solution side.
This error points to something that is likely a problematic mapping type in your mappings for security rules. We have other saved object types that have scaled to 100k+ objects without any problems, so I don't think this is a fundamental problem with saved objects. I'd take a good look at your mapping types and see if there's something that could be tweaked there.

That said, I also like @hop-dev's suggestion of putting this into some other opaque saved object type to contain the history. You probably don't need all of the same fields to be mapped on these historical rules. Adding new SO types to the package-spec and Fleet's installation logic is pretty low-effort.
We are actually planning to remove this direct file access API very soon and already have some beta/testing versions of the registry available where this API is removed. I will DM you the internal email thread about this change if you'd like to add any feedback. Packages should be completely self-contained, and the package registry is not currently intended to be used in this manner. I'd like to explore other options if viable before we consider adding this API back.
Expanded the description with the "Why we need that change" section for folks who need more context. You can find more context here, but there are too many unrelated details.
@joshdover, @banderror, and I have synced on the current package limitations and discussed different approaches we could use to overcome them.
@mtojek I'd like to get your input on this one. In thinking about how to support this type of use case (looking up old versions of a package asset in order to support a 3-way merge for user customizations), it does seem like the asset download API would be quite advantageous over an ever-growing package that contains all of the historical rules. Let me know if we should chat in more detail about this.
@joshdover Thanks for the ping! I'm on the fence, to be honest, as the intention for the "arbitrary files API" is to expose only static resources like docs or images, i.e. artifacts used to render the UI. All package configuration is retrieved from package revisions. On the other hand, we could expand the set of extractable assets and extract ones from already published packages. It shouldn't be a big deal for us. As I've written in the email thread, I wouldn't mark this as a blocker for the Package Storage v2 migration, as we can iterate on this.

I'm sure that you had a deep discussion around this topic, and I wouldn't like to propose "yet another" idea, especially since I'm not working closely with security rules. Looking at the thread, it seems that rules should be handled similarly to the Git model: a user can modify any file and merge/overwrite with remote changes. Let's consider the following case: if the user's integration is 100 revs behind the latest package revision, Kibana will have to call the Package Storage 100 times (assuming one file per revision), which will take time. I'm not sure if option 2. isn't the best choice in terms of predictability and stable implementation. 100 MB doesn't sound like a big issue considering it's only for the purpose of rules conflict resolution.

Option 4.:
I think that a package should work the same independently of its source. Making a package depend on its historical versions may bring problems:
Also, the use of the asset download API brings security concerns. We are trying to introduce signed packages, but there is no way in this API to ensure that the asset is signed, or comes without any unexpected modification from a signed package. This is especially relevant for a security solution: an attacker that could alter the access to this API could send fake historical assets that produce diffs that turn the detection rules into no-ops. I think, though, that there are alternatives to including all historical rules in the package or anywhere else:
@mtojek, @jsoriano Thanks for taking the time to look into the issue. I'll try to answer your questions, but it would probably be more productive to have a separate meeting to ensure we're on the same page.
We'll need to fetch only two versions to build a diff: the latest available version and the original "forked" rule version. I.e., we'll call the Package Storage only two times. Because we store the entire rule object as a historical version, any versions in-between the two we are comparing are unnecessary for the diff algorithm.
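To make the two-version argument concrete, here is a rough sketch of a field-level three-way diff, where `base` is the original "forked" rule version the user started from, `latest` is the newest package version, and `current` is the user's possibly customized rule. This is not the actual Security Solution diff algorithm; names and shapes are hypothetical.

```typescript
// Hypothetical field-level three-way diff: no intermediate versions are
// needed, only the base and the latest versions plus the user's current rule.
type RuleFields = Record<string, unknown>;

interface FieldDiff {
  field: string;
  base: unknown;
  current: unknown;
  latest: unknown;
  conflict: boolean; // both sides changed the field, and differently
}

function threeWayDiff(base: RuleFields, current: RuleFields, latest: RuleFields): FieldDiff[] {
  const fields = new Set([...Object.keys(base), ...Object.keys(current), ...Object.keys(latest)]);
  const diffs: FieldDiff[] = [];
  for (const field of fields) {
    // Structural comparison via serialization; fine for plain JSON values.
    const b = JSON.stringify(base[field]);
    const c = JSON.stringify(current[field]);
    const l = JSON.stringify(latest[field]);
    if (c === l) continue; // current already matches the latest version
    const userChanged = c !== b;
    const packageChanged = l !== b;
    diffs.push({
      field,
      base: base[field],
      current: current[field],
      latest: latest[field],
      conflict: userChanged && packageChanged,
    });
  }
  return diffs;
}
```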
Yeah, that option could work out well. We will probably build an experiment to measure the performance, decide whether it is okay for us, and compare it with other available options.
Could you please elaborate on your proposal? I'm not sure I understand it. As for air-gapped systems, we have a filesystem-based distribution method for detection rules. That means we bundle all rules with Kibana and use them as the default fallback if rules from Fleet are unavailable. That said, we want to reconsider the rule distribution method in the future and migrate to a Fleet package that gets installed at build time.
The package itself would not depend on its historical versions. What we are planning to do is to start bundling all released rule versions together as package assets. We'll also add a graceful degradation mechanism on the solution side, so the rule upgrade process will work even if some assets are missing or clients continue to use outdated packages.
Yeah, thanks for highlighting that. We'll have to weigh whether it poses a security risk for us. Ultimately, a human operator makes the final decision on whether rules need to be updated or not. If the detection rules package were compromised, it would be clearly visible in the UI, so the risk may not be that high.
For the diff algorithm, yes, two versions would be enough. But we also plan to add the ability to roll back to any historical rule version, so we still have to be able to access all rule versions somehow.
I'm not sure what problem that would solve: the total number of objects that we store seemingly wouldn't change. Also, we considered storing diffs in the early stages of technical discussions. Still, we decided not to follow that path, as it offers no clear advantages. Moreover, the reconstruction algorithm itself is much harder in terms of implementation and also computationally more expensive.
@mtojek @jsoriano thank you for your feedback! I agree with @xcrzx; let's schedule a meeting so we can answer your questions and resolve any confusion and concerns around this PoC. I also feel like maybe we haven't done the best job of explaining our needs and goals in the first place. I updated the ticket description and added more information to the documents mentioned in it. I hope it can be helpful for getting more context before the meeting. @joshdover Dmitrii will schedule something on Monday; please join as well if you have time!
Yes, the 10k limit is imposed by Elasticsearch on

The 10k objects import limit is a safeguard for protecting the Kibana server's memory, because all imported objects are loaded into memory. At the time, we couldn't agree on the best way to protect the memory, so we also added an additional

Having said that, I don't think adding 10k (or 1M for your worst-case scenario of "10000 rules, 100 historical versions each") saved objects feels like the right data structure for what we're trying to achieve. I don't fully understand all the details, but it seems to me like we should be able to have a dedicated "rule history" saved object type that could contain e.g. 100 revisions of that rule in a format that's more compact than a saved object per rule revision.
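A compact "rule history" type along these lines could be sketched as follows. The shape and helper names are hypothetical; the real saved object type and its mappings would look different.

```typescript
// Hypothetical shape for a dedicated "rule history" saved object type:
// one object per rule, with all revisions embedded in a single payload
// instead of one saved object per revision.
interface RuleRevision {
  version: number;
  // Full rule payload. The idea is to store it unmapped (enabled: false in
  // the SO mappings), so revisions don't count against nested/field limits.
  rule: Record<string, unknown>;
}

interface RuleHistorySO {
  ruleId: string;
  latestVersion: number;
  revisions: RuleRevision[];
}

function getRevision(history: RuleHistorySO, version: number): RuleRevision | undefined {
  return history.revisions.find((r) => r.version === version);
}

function appendRevision(history: RuleHistorySO, revision: RuleRevision): RuleHistorySO {
  return {
    ...history,
    latestVersion: Math.max(history.latestVersion, revision.version),
    revisions: [...history.revisions, revision],
  };
}
```

The trade-off discussed below is that with an unmapped payload, individual revisions can no longer be queried or aggregated server-side; they have to be filtered in memory after fetching the whole history object.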
Hey @rudolf, thanks for joining the discussion!
I think it would be easier to remove the mapping for the
Thanks, that setting could be very useful for our use case. That said, I think we should measure the impact of importing large sets of saved objects on memory consumption. If it turns out to be too high, we will probably need to import assets in chunks to protect Kibana from potential OOMs.
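A chunked import could be sketched like this; `importObjects` is a hypothetical stand-in for the real import call, not an actual Kibana API.

```typescript
// Split a large set of objects into fixed-size chunks so each import request
// stays within the server's import limit.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Chunks are imported sequentially, so only one chunk's worth of objects is
// in flight at a time, bounding memory use.
async function importInChunks<T>(
  objects: T[],
  importObjects: (batch: T[]) => Promise<void>,
  chunkSize = 1000
): Promise<void> {
  for (const batch of chunk(objects, chunkSize)) {
    await importObjects(batch);
  }
}
```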
We considered different options, including storing all historical rule versions in a single saved object. All of them have their pros and cons. For example, the business logic we will implement will require reading individual rule versions, like the ones matching a specific range according to semver. Storing all versions in separate saved objects would allow us to read them efficiently in one database request. Otherwise, we would need to implement more complex logic, like reading rule versions and filtering them in memory. Also, we'd be constrained in the types of requests we can execute against the "compressed" structure; likely, we wouldn't be able to aggregate data efficiently if we needed to in the future. That's why we are trying to consider alternatives first.

@rudolf What do you consider a significant number of saved objects that could affect Kibana's performance? And what are the implications of having, let's say, 20000-30000 saved objects?
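The in-memory filtering mentioned above could look roughly like this, assuming plain `major.minor.patch` version strings with no prerelease tags (helper names are hypothetical):

```typescript
// Parse a "major.minor.patch" version string into its numeric components.
function parseVersion(v: string): number[] {
  return v.split('.').map(Number);
}

function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Select the historical rule versions inside an inclusive [from, to] range,
// as upgrade logic would need to do after fetching all versions.
function versionsInRange(versions: string[], from: string, to: string): string[] {
  return versions
    .filter((v) => compareVersions(v, from) >= 0 && compareVersions(v, to) <= 0)
    .sort(compareVersions);
}
```

This illustrates the cost being discussed: with a single "compressed" object, this filtering happens in Kibana's memory, whereas with one saved object per version it could be expressed as a single Elasticsearch query.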
Yes, it makes sense to explore the tradeoff here between how much of the processing happens in Elasticsearch vs Kibana.
We have a handful of clusters with > 500k saved objects without any complaints about performance. But the size of the saved objects is a bigger problem than the amount; there's one report where an 11GB

So I suspect that, similar to import/export, we would have to benchmark this to try to come up with some kind of upper limit.
…ects (#148141)

**Resolves:** #147695, #148174
**Related to:** #145851, #137420

## Summary

This PR improves the stability of the Fleet package installation process with many saved objects.

1. Changed mappings of the `installed_kibana` and `package_assets` fields from `nested` to `object` with `enabled: false`. Values of those fields were retrieved from `_source`, and no queries or aggregations were performed against them. So the mappings were unused, while during the installation of packages containing more than 10,000 saved objects, an error was thrown due to the nested field limitations:

```
Error installing security_detection_engine 8.4.1: The number of nested documents has exceeded the allowed limit of [10000]. This limit can be set by changing the [index.mapping.nested_objects.limit] index level setting.
```

2. Improved the deletion of previous package assets by switching from sending multiple `savedObjectsClient.delete` requests in parallel to a single `savedObjectsClient.bulkDelete` request. Multiple parallel requests were causing the Elasticsearch cluster to stop responding for some time; see [this ticket](#147695) for more info.

**Before**

![Screenshot 2022-12-28 at 11 09 35](https://user-images.githubusercontent.com/1938181/209816219-ade6dd0a-0d56-4acc-929e-b88571f0fe81.png)

**After**

![Screenshot 2022-12-28 at 13 56 44](https://user-images.githubusercontent.com/1938181/209816209-16c69922-4ae2-4589-9aa4-5a28050037f4.png)
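The deletion change described in point 2 can be sketched with a minimal stand-in client interface (the real Kibana `SavedObjectsClient` API differs in details):

```typescript
interface ObjectRef { type: string; id: string; }

// Minimal stand-in for the saved objects client; only the two calls
// relevant to this change are modeled.
interface MinimalSoClient {
  delete(type: string, id: string): Promise<void>;
  bulkDelete(objects: ObjectRef[]): Promise<void>;
}

// Before: many parallel requests, which could overload the cluster.
async function deleteAssetsParallel(client: MinimalSoClient, refs: ObjectRef[]): Promise<void> {
  await Promise.all(refs.map((r) => client.delete(r.type, r.id)));
}

// After: a single bulk request.
async function deleteAssetsBulk(client: MinimalSoClient, refs: ObjectRef[]): Promise<void> {
  if (refs.length > 0) {
    await client.bulkDelete(refs);
  }
}
```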
Closing this issue as completed. See this PR for more info on PoC results: #145851
Epic: https://github.com/elastic/security-team/issues/1974 (internal)
Background info:
**Summary**
To allow users to customize prebuilt detection rules, we need to find a way to distribute the rules with all their historical versions. And as our main rule distribution method is via a Fleet package, we need to ensure that it can handle an increased number of saved objects in the package.
`security_rule/[ruleId]:[ruleVersion].json`
**Why we need that change**
Currently, we allow a limited set of modifications to prebuilt detection rules. Users can modify only rule exceptions and actions. Other rule fields, like description, query, tags, etc., are not modifiable, which creates unavoidable inconvenience for our users. We call this constraint "rule immutability". We want to remove the immutability constraint of prebuilt rules, allow users to make any necessary adjustments, and still receive rule updates.
More on rule customization in the Architecture Design Document. We recommend reading the following relevant sections:
After that, more context can be found in:
**Todo**

Verify that:

- `ruleId`.
- `ruleId`.

Reach out to stakeholders for feedback (consider doing it via opening an RFC):

- `security-rule` Saved Objects