[Investigation] Smart Limits for Detection Rules #4150

Open
Mikaayenson opened this issue Oct 11, 2024 · 9 comments
Assignees
Labels
enhancement New feature or request Team: TRADE

Comments

@Mikaayenson
Contributor

Repository Feature

Core Repo - (rule management, validation, testing, lib, cicd, etc.)

Problem Description

With #3842 we limited the number of released versions per rule to 2. We've been asked by D&R and our PM @approksiu to investigate the LOE to implement smart limits (elastic/kibana#187645). There are hurdles to implementing this feature for both publishing and ingesting rules. This issue will help us understand any technical limitations on our end.

This issue is purely an investigation to provide more insight into which limitations/options we can control from this side.

Desired Solution

Details on options / hurdles to implement smart limits. Examples:

  • Limit based on latest over a period of time?
  • Limit based on rule changes?
  • etc.

Considered Alternatives

We may not need to implement complicated smart limits at all. Based on some initial testing, @xcrzx may have a solution that raises the supported number of published rule versions to a limit that we theoretically would not reach (considering the qualitative balance between the number of rules and maintaining and publishing high-quality rules).

This would be another option if they are successful.

Additional Context

  • [Security Solution] Smart limits for the package with prebuilt rules kibana#187645
  • Since we rely on EPR for historical context, we will need to traverse history and either ship the baseline (initial) version of each rule at least once or somehow maintain a list of those baselines, so that the upcoming rule customization feature knows how best to diff and can provide a better out-of-the-box experience for users. We need to consider that some rules may have been released at a stack version (with a rule version) that we no longer release to. We also need to consider other things like rule forks and rule releases against different stack versions.
@Mikaayenson
Contributor Author

Mikaayenson commented Oct 24, 2024

Update Oct 24

@xcrzx was able to test loading rules with a limit of about 20k rules. We may be able to remove the limit altogether starting in 8.17. We still need to wait on the serverless testing to confirm.

Long term, we all agree it would be better to have a distribution mechanism that uses git (where kibana pulls directly from the detection-rules repo branches).

@xcrzx

xcrzx commented Nov 5, 2024

Hey @Mikaayenson, I finished testing the package limits on Serverless last week, and the results are in this PR. Long story short, there were no issues installing a 15k-rule package on a Serverless project. Memory consumption was stable and well within the 1GB limit, averaging around 40-45%, which is a great result. I couldn’t test a 20k-rule package due to some other test environment limitations, but I don’t anticipate any significant differences.

We can set the total assets-per-package limit pretty high, like the tested 15k. We’ll probably never reach it, but it’s safer to keep the limit in place in case we get close before migrating from Fleet to a Git-based rule distribution system. Do you know how many saved objects would be in the package if we include current rules with all their historical versions?

@shashank-elastic
Contributor

Hey @xcrzx, today we have roughly 1250 rules in each rule package for versions such as 8.12 through 8.15 (ballpark numbers; actual counts can vary), refer.

Before we limited historical rules to the latest 2 versions, we had roughly 3700 rules in a package for older releases, refer.

Without the latest rules added to the EPR historical data, it's difficult to arrive at the actual number of rules that would be in the packages once we stop limiting versions. Given that the tested limit was roughly 15k, I don't see the rule assets reaching this number immediately, even if we build today without limiting historical versions (though the count may grow over time as we add more versions to the package).

@xcrzx

xcrzx commented Nov 11, 2024

Thanks for the info, @shashank-elastic!

The stream-based package installation has been merged, so we can plan the limit increase. It’s set for release in Kibana 8.17, meaning we can raise the package limit starting with that version. For earlier Kibana versions, we should still limit the package to the two most recent versions per rule to prevent OOM issues.
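As a rough illustration of the "two most recent versions per rule" limit for older Kibana versions, here is a minimal sketch. It assumes each asset can be reduced to a `(rule_id, version)` pair; the function name `keep_latest_versions` is hypothetical and is not part of the detection-rules tooling.

```python
from collections import defaultdict


def keep_latest_versions(assets: list[tuple[str, int]], keep: int = 2) -> list[tuple[str, int]]:
    """Return at most `keep` highest-numbered versions for each rule ID."""
    by_rule: dict[str, list[int]] = defaultdict(list)
    for rule_id, version in assets:
        by_rule[rule_id].append(version)
    # Rules keep their first-seen order; versions are listed newest first.
    return [
        (rule_id, version)
        for rule_id, versions in by_rule.items()
        for version in sorted(versions, reverse=True)[:keep]
    ]
```

With `keep=2`, a rule that has versions 1, 2, and 3 would ship only versions 3 and 2.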

Let’s also discuss what we are planning to do if a package reaches the 15k limit. I’ve been considering an algorithm with these steps:

  1. Select all current rule versions and add them to the package.
  2. Gather all historical rule versions and sort them from newest to oldest.
  3. Add the newest historical versions until the package size reaches 15k.

This would allow us to discard only the oldest rule versions, the ones users are least likely to need, providing an optimal rule upgrade experience.
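The three steps above could be sketched roughly as follows. This is only an illustration under assumptions: the `RuleVersion` type and its `released` field are hypothetical, and it presumes a release date is available for every rule version, which may not hold in practice.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RuleVersion:
    rule_id: str
    version: int
    released: date  # hypothetical field; assumes a per-version date can be obtained
    is_current: bool


def build_package(versions: list[RuleVersion], limit: int = 15_000) -> list[RuleVersion]:
    """Keep every current version, then fill the remaining slots with the newest historical ones."""
    # Step 1: all current rule versions always go into the package.
    current = [v for v in versions if v.is_current]
    # Step 2: sort historical versions from newest to oldest.
    historical = sorted(
        (v for v in versions if not v.is_current),
        key=lambda v: v.released,
        reverse=True,
    )
    # Step 3: add the newest historical versions until the limit is reached.
    remaining = max(limit - len(current), 0)
    return current + historical[:remaining]


sample = [
    RuleVersion("a", 3, date(2024, 10, 1), True),
    RuleVersion("a", 2, date(2024, 6, 1), False),
    RuleVersion("a", 1, date(2023, 1, 1), False),
    RuleVersion("b", 1, date(2024, 9, 1), True),
]
package = build_package(sample, limit=3)
```

In this sample, only the oldest historical version (rule `a`, version 1) is dropped. If dates turn out to be unavailable, a different sort key would be needed.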

However, I recall there were challenges with retrieving the creation date of rule versions, which could complicate sorting from newest to oldest. Is that still the case?

I’d also welcome any thoughts on this approach or alternative implementation ideas. Let me know if you see a different way forward.

cc @Mikaayenson @banderror

@banderror

banderror commented Nov 23, 2024

@Mikaayenson @approksiu @xcrzx /cc @shashank-elastic

Short recap of our last zoom call and the timing:

  • Mika will allocate resources and prioritize creating a package with all historical rule versions. We acknowledged that there are not many engineers who can work on this.
  • We need the package ASAP to be able to start testing prebuilt rule customization and upgrade workflows properly.
  • Not having historical versions in the package is a release blocker for the prebuilt rule customization epic. Currently, it causes high-impact bugs in the upgrade functionality that we're about to finish developing: [Security Solution] Prebuilt rule customization is lost on upgrade when Base version is missing kibana#201500.
  • TRADE team will try to prepare some package for testing as soon as Monday, December 2nd. It needs to be pushed to EPR as a prerelease package.
  • Our goal is to finish all the work on the package by Monday, December 16th.

@Mikaayenson
Contributor Author

Mikaayenson commented Nov 25, 2024

@shashank-elastic we may be able to do something like this to download all the rules.

import json
from pathlib import Path

from semver import Version

from detection_rules.integrations import SecurityDetectionEngine, get_integration_manifests

# Constants
MIN_VERSION = Version.parse("8.14.1")
MAX_VERSION = "8.17.0"
OUTPUT_FOLDER = "all_sde_folders"

# Fetch manifests and parse their package versions, newest first
manifests = get_integration_manifests("security_detection_engine", False, MAX_VERSION)
package_versions = sorted((Version.parse(m["version"]) for m in manifests), reverse=True)

# Keep versions greater than or equal to the minimum version
filtered_versions = [v for v in package_versions if v >= MIN_VERSION]

# Print filtered versions
print("Filtered Versions:", filtered_versions)

# Ensure output folder exists
output_path = Path(OUTPUT_FOLDER)
output_path.mkdir(parents=True, exist_ok=True)

# Load assets for each package version and save them to disk.
# Assets that share a name across package versions are overwritten by later iterations.
sde = SecurityDetectionEngine()

for version in filtered_versions:
    assets = sde.load_integration_assets(version)
    for asset, data in assets.items():
        asset_path = output_path / f"{asset}.json"
        with open(asset_path, "w", encoding="utf-8") as f:
            json.dump(data, f)

@Mikaayenson
Contributor Author

Mikaayenson commented Nov 25, 2024

all_sde_folders.zip

@shashank-elastic Here are all the rules and some sample output.

54902e45-3467-49a4-8abc-529f2c8cfb80_211.json ad959eeb-2b7b-4722-ba08-a45f6622f005_4.json   ff6cf8b9-b76c-4cc1-ac1b-4935164d1029.json
54a81f68-5f2a-421e-8eed-f888278bb712.json     adb961e0-cb74-42a0-af9e-29fc41f88f5f.json     ff6cf8b9-b76c-4cc1-ac1b-4935164d1029_1.json
54a81f68-5f2a-421e-8eed-f888278bb712_108.json adb961e0-cb74-42a0-af9e-29fc41f88f5f_105.json ff6cf8b9-b76c-4cc1-ac1b-4935164d1029_201.json
54a81f68-5f2a-421e-8eed-f888278bb712_109.json adb961e0-cb74-42a0-af9e-29fc41f88f5f_106.json ff9b571e-61d6-4f6c-9561-eb4cca3bafe1.json
54a81f68-5f2a-421e-8eed-f888278bb712_2.json   adb961e0-cb74-42a0-af9e-29fc41f88f5f_107.json ff9b571e-61d6-4f6c-9561-eb4cca3bafe1_103.json
54a81f68-5f2a-421e-8eed-f888278bb712_210.json adb961e0-cb74-42a0-af9e-29fc41f88f5f_108.json ff9b571e-61d6-4f6c-9561-eb4cca3bafe1_104.json
54a81f68-5f2a-421e-8eed-f888278bb712_3.json   adb961e0-cb74-42a0-af9e-29fc41f88f5f_109.json ff9bc8b9-f03b-4283-be58-ee0a16f5a11b.json
54a81f68-5f2a-421e-8eed-f888278bb712_4.json   adb961e0-cb74-42a0-af9e-29fc41f88f5f_110.json ff9bc8b9-f03b-4283-be58-ee0a16f5a11b_1.json
54a81f68-5f2a-421e-8eed-f888278bb712_5.json   adbfa3ee-777e-4747-b6b0-7bd645f30880.json     ff9bc8b9-f03b-4283-be58-ee0a16f5a11b_2.json
54a81f68-5f2a-421e-8eed-f888278bb712_6.json   adbfa3ee-777e-4747-b6b0-7bd645f30880_1.json   ff9bc8b9-f03b-4283-be58-ee0a16f5a11b_3.json
54a81f68-5f2a-421e-8eed-f888278bb712_7.json   adbfa3ee-777e-4747-b6b0-7bd645f30880_2.json   ff9bc8b9-f03b-4283-be58-ee0a16f5a11b_4.json
54a81f68-5f2a-421e-8eed-f888278bb712_8.json   adbfa3ee-777e-4747-b6b0-7bd645f30880_3.json   ff9bc8b9-f03b-4283-be58-ee0a16f5a11b_5.json
54c3d186-0461-4dc3-9b33-2dc5c7473936.json     adbfa3ee-777e-4747-b6b0-7bd645f30880_4.json
54c3d186-0461-4dc3-9b33-2dc5c7473936_103.json adbfa3ee-777e-4747-b6b0-7bd645f30880_5.json
(detection-rules-build) ➜  all_sde_folders git:(main) ✗ ls -la |wc -l
    8320
(detection-rules-build) ➜  all_sde_folders git:(main) ✗ 
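To get a per-rule breakdown rather than a raw line count, something like the following sketch could help. It assumes the naming pattern visible in the listing above (`<rule_id>.json` or `<rule_id>_<version>.json`); the function name `count_assets_per_rule` is hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable


def count_assets_per_rule(file_names: Iterable[str]) -> Counter:
    """Count asset files per rule ID, taking the ID as the file stem before any underscore."""
    return Counter(Path(name).stem.split("_")[0] for name in file_names)


# A few names taken from the listing above.
names = [
    "54a81f68-5f2a-421e-8eed-f888278bb712.json",
    "54a81f68-5f2a-421e-8eed-f888278bb712_108.json",
    "54a81f68-5f2a-421e-8eed-f888278bb712_109.json",
    "ff9b571e-61d6-4f6c-9561-eb4cca3bafe1_103.json",
]
counts = count_assets_per_rule(names)
unique_rules = len(counts)
total_assets = sum(counts.values())
```

Against the real folder, the names could come from `(p.name for p in Path("all_sde_folders").glob("*.json"))`, and `counts.most_common(10)` would show the rules with the most stored versions.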

You can probably use this small script that will check the version lock file and filter out rules that aren't in the current lock.

import shutil
from pathlib import Path

from detection_rules.version_lock import load_versions, DeprecatedRulesFile

# Paths
source_folder = Path("all_sde_folders")
destination_folder = Path("filtered_all_assets")

# Ensure the destination folder exists
destination_folder.mkdir(parents=True, exist_ok=True)
version_lock = load_versions()
deprecated_version_lock = DeprecatedRulesFile.load_from_file().to_dict()

# Get all filenames from the source folder
files = list(source_folder.glob("*.json"))

# Extract UUIDs from filenames and copy matching files
for file in files:
    # Extract UUID from the filename (before the first underscore or `.json`)
    uuid = file.stem.split("_")[0]

    # Check if the UUID exists in the version lock file
    if uuid in version_lock:
        if uuid in deprecated_version_lock:
            print(f"skipped deprecated: {uuid}")
        else:
            # Copy the file to the destination folder
            shutil.copy(file, destination_folder / file.name)

print(f"Filtered files have been copied to {destination_folder}.")

Then we can add these as a beta package to epr.

@banderror @xcrzx Just an FYI, so that we do NOT break our release workflow, the package will be 8.16.2-beta.1.

@shashank-elastic
Contributor

The filtered assets have been created: filtered_all_assets.zip

We will now create a beta package PR for 8.16.2-beta.1. We will not create an 8.17 beta package, as this would really mess up our regular release cadence.

Once the package is created, we will post an update on this issue.

@shashank-elastic
Contributor

The test package created with historical rules is available in the EPR - https://epr.elastic.co/package/security_detection_engine/8.16.2-beta.1/ with the Kibana version constraint of ^8.17.0
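For readers unfamiliar with caret constraints, here is a minimal sketch of how `^8.17.0` is commonly interpreted: same major version, and at least the stated version. It is an illustration, not the logic Kibana or EPR actually run; it ignores prerelease and build suffixes and assumes a major version of 1 or higher. Both function names are hypothetical.

```python
def parse_core(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH', dropping any prerelease or build suffix."""
    core = version.split("-", 1)[0].split("+", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def satisfies_caret(kibana_version: str, constraint: str) -> bool:
    """Check a version against a caret constraint such as '^8.17.0' (major >= 1 assumed)."""
    if not constraint.startswith("^"):
        raise ValueError(f"not a caret constraint: {constraint}")
    floor = parse_core(constraint[1:])
    candidate = parse_core(kibana_version)
    # Same major version, and not older than the floor.
    return candidate[0] == floor[0] and candidate >= floor
```

Under this reading, Kibana 8.17.0 and 8.18.1 would accept the beta package, while 8.16.2 and 9.0.0 would not.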

cc @Mikaayenson


4 participants