Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution] max_signals validation follow up fixes #182643

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -200,7 +200,7 @@ export type AlertsIndexNamespace = z.infer<typeof AlertsIndexNamespace>;
export const AlertsIndexNamespace = z.string();

export type MaxSignals = z.infer<typeof MaxSignals>;
export const MaxSignals = z.number().int().min(1);
export const MaxSignals = z.number().int().min(1).max(1000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need to be coordinated with a detection-rules release as some default to 10k as well, right?

Or do the pre-built rules flow through a different part of the API?

I'm sure you're tracking this, so don't mind me....just saw these errors with my rules this morning so that's how If found myself here 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great question, I didn't know we had detection rules with that high of a max_signals count. That could potentially cause problems when trying to duplicate rules.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @spong for this catch 🙏

Additionally, the way it's implemented is a breaking change in the API. If the value is higher than 1000, attempts to create or update a rule will fail with 400, which wasn't the case before. There shouldn't be a max limit at the schema level, we should calculate the final value of max_signals to use in rule executors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great question, I didn't know we had detection rules with that high of a max_signals count. That could potentially cause problems when trying to duplicate rules.

Yep, I think rule installation and upgrade will be broken as well for those prebuilt rules that have this value higher than 1000.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there are 18 rules in the detection-rules repo that have max_signals set to 10k.

Do I get it right that even if we don't merge this PR, in the default Kibana configuration where xpack.alerting.rules.run.alerts.max == 1000 all these rules, including the Endpoint Security one, will be executing with warnings?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would be the expected action for rule executions with max_signals > 1000 now so those 18 rules would run with warnings out of the box

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Not a strong opinion, but I think this UX might not be the most optimal. If the expectation from the TRaDE team is that those 18 rules should promote as many external alerts as possible (hense the 10k setting), I would suggest to:

  1. Either override the system's max limit with the rule's max_signals, for these rules. This could be done via an additional rule parameter max_signals_force: boolean and an additional checkbox in the UI. (not sure that it's a good idea).
  2. Or not generate a Warning status if max_signals > xpack.alerting.rules.run.alerts.max AND it's a prebuilt rule. (looks like a good tradeoff to me).
  3. Or maybe set the max_signals value for those 18 rules to 1000, if the TRaDE team would be ok with that. (might be not an option if the goal is to promote as many external alerts as possible and let users increase the xpack.alerting.rules.run.alerts.max setting for that).

Maybe there's something else we could do. But I think we should try to make sure rules work out of the box with default settings. The Endpoint Security rule is the most basic one, it is installed and enabled automatically with Elastic Defend, and it would be weird to see it not succeeding, unless the user changes some system setting in the Kibana config, for all Kibana instances in their cluster.

@jpdjere @dplumlee Could we please sync on this in the simplified protections channel with the TRADE team, Kseniia and Alex and see what they have to say?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think options 2 or 3 would be my preferences off the top of my head. Something to consider is that even if we don't display a warning, the rule would still be hard capped by the xpack.alerting.rules.run.alerts.max config value out of the box and, should the rule ever reach that value (however unlikely), would throw an error from the alerting side of things letting us know we hit it. With that in mind, maybe option 3 would be best, but we'd need to check with TRaDE team for the intent behind that 10k setting.

Copy link
Contributor

@banderror banderror May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't look like option 3 is viable based on Justin's response:

As an example if an environment had say 100k hosts and an upstream integration, such as elastic endpoint happens to generate a batch of alerts over that time, we want to allow them all to match
As you can see, the larger the deployment, the greater the likelihood of hitting that capacity.

So I guess my vote goes to the 2nd option + documenting this in multiple places + maybe even showing a warning on the Rule Details page (not a warning rule status, but a static callout or something like that with a link to the public docs). I added a topic to discuss this to the Simplified Protection's agenda.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the argument for changing this to max of 1000? In addition to Garrett's comment, I'm not entirely sure we can't still set xpack.alerting.rules.run.alerts.max to something greater than 1000, I think we just "don't support it" as in we can't guarantee things don't break on your cluster. Is there something else that would limit it outside of that config value? I get the argument that it's weird to give users the ability to put insanely big numbers if we just use the minimum of 2 values but we could do that same kind of wacky stuff with other form components too that wouldn't actually end up having an impact on the rule. And we also have the same workflow if the config value was set to 10 and the user entered a value like 50 - we would just use the value they didn't enter.

The initial reason we uncapped the limit was to make it easier to still create/edit rules with the default or existing values even after the alerting config value had changed. Now users can change unrelated fields in existing rules without having to go in and also modify max_signals in order to get the rule update to pass API validation. I also understand that setting it that high makes it a very rare edge case so I don't care too much about it (outside of it potentially causing issues with prebuilt rules), but just wanted to explain the genesis of why it was originally set that way!


export type ThreatSubtechnique = z.infer<typeof ThreatSubtechnique>;
export const ThreatSubtechnique = z.object({
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -483,6 +483,19 @@ describe('rules schema', () => {
);
});

test('max_signals cannot be greater than 1000', () => {
const payload: RuleCreateProps = {
...getCreateRulesSchemaMock(),
max_signals: 1001,
};

const result = RuleCreateProps.safeParse(payload);
expectParseError(result);
expect(stringifyZodError(result.error)).toEqual(
'max_signals: Number must be less than or equal to 1000'
);
});

test('max_signals can be 1', () => {
const payload: RuleCreateProps = {
...getCreateRulesSchemaMock(),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -129,7 +129,11 @@ export const appErrorToErrorStack = (error: AppError): Error => {
: '';
const stringifiedError = getStringifiedStack(error);
const adaptedError = new Error(
`${String(error.body.message).trim() !== '' ? error.body.message : error.message} ${statusCode}`
postprocessErrorString(
`${
String(error.body.message).trim() !== '' ? error.body.message : error.message
} ${statusCode}`
)
);
// Note although all the Typescript typings say that error.name is a string and exists, we still can encounter an undefined so we
// do an extra guard here and default to empty string if it is undefined
Expand Down Expand Up @@ -239,3 +243,8 @@ export const isEmptyObjectWhenStringified = (item: unknown): boolean => {
return false;
}
};

function postprocessErrorString(str: string): string {
// Remove the `[request body]` prefix added by Zod for request validation errors
return str.replace(/\[request body\]:/g, '');
}
Comment on lines +247 to +250
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to remove the [request body] part of the string? How can we distinguish errors in the body from errors in the query part of the request?