-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RAC] Rule registry plugin #95903
[RAC] Rule registry plugin #95903
Conversation
Without looking at the code yet, here's a review of the top-level comment:
Does an ILM policy imply this is immutable data, else we'll need to figure out how mutated documents work with ILM. For example, presumably you can't mutate a document which was ILM'd to a frozen index.
Is that
Interesting - a new take on programmatic mappings? Presumably we generate ES mappings from this? I like! But it will presumably have to grow to accept all the ES mapping-specific stuff we need, perhaps with an extra property hold them them, or something.
That could be expensive; we could provide a capability to allow the specific rules of interest in the search call as well. Or if there's enough filtering available in search, perhaps we wouldn't need that. I guess the degenerate case would be searching for a single rule; hopefully we can optimize that path, I'm guessing that sort of search would end up being used.
No update, so ... mot mutable, so far! I like!
As part of this not-yet-scheduled PR #93704 , we'd be recording the start date of the a new alert instance, in the alert's persisted instance state, so we'll get that for "free". Though not clear if it's in a useful place for RAC purposes. In any case, we're aligned that we should start tracking the "start" date for what we call instanceIds. The current thought is to persist that, so we'd add Likewise, we should start tracking something like a UUID for these series of alerts, but we'll wait to see what RAC settles on, and use that. We could use just the instanceId + date, which is presumably unique, and could actually be useful to SEE (as a human), but UUID does feel better.
So, the question is, if for some reason the rule registry wants to use one of these fields we're already using, because they have application-level data already available in ECS format, how do we deal with that?
I think all the fields up to here are already ECS fields, so if we want to be ECS-friendly, the |
import { createLifecycleRuleTypeFactory } from '../../../../rule_registry/server'; | ||
import { APMRuleRegistry } from '../../plugin'; | ||
|
||
export const createAPMLifecyleRuleType = createLifecycleRuleTypeFactory<APMRuleRegistry>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
export const createAPMLifecyleRuleType = createLifecycleRuleTypeFactory<APMRuleRegistry>(); | |
export const createAPMLifecycleRuleType = createLifecycleRuleTypeFactory<APMRuleRegistry>(); |
|
||
registerErrorCountAlertType({ | ||
alerting, | ||
config$: mockedConfig$, | ||
...dependencies, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to spread here if you take out the curly braces.
|
||
registerErrorCountAlertType({ | ||
alerting, | ||
config$: mockedConfig$, | ||
...dependencies, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to spread here if you take out the curly braces.
config$: Observable<APMConfig>; | ||
} | ||
import { RegisterRuleDependencies } from './register_apm_alerts'; | ||
import { createAPMLifecyleRuleType } from './create_apm_lifecycle_rule_type'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import { createAPMLifecyleRuleType } from './create_apm_lifecycle_rule_type'; | |
import { createAPMLifecycleRuleType } from './create_apm_lifecycle_rule_type'; |
alertParams.aggregationType === '95th' ? 95 : 99, | ||
], | ||
aggs: { | ||
metric: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kibana/x-pack/plugins/apm/server/lib/helpers/latency_aggregation_type/index.ts
Lines 10 to 28 in d678ffb
export function getLatencyAggregation( | |
latencyAggregationType: LatencyAggregationType, | |
field: string | |
) { | |
return { | |
latency: { | |
...(latencyAggregationType === LatencyAggregationType.avg | |
? { avg: { field } } | |
: { | |
percentiles: { | |
field, | |
percents: [ | |
latencyAggregationType === LatencyAggregationType.p95 ? 95 : 99, | |
], | |
}, | |
}), | |
}, | |
}; | |
} |
latency
instead of metric
.
'apm_oss.errorIndices': 'apm-*', | ||
'apm_oss.transactionIndices': 'apm-*', | ||
/* eslint-enable @typescript-eslint/naming-convention */ | ||
}) as Observable<APMConfig>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do of<APMConfig>(
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not, because the definition is incomplete. I can try though.
@@ -161,6 +145,28 @@ export class APMPlugin implements Plugin<APMPluginSetup> { | |||
config: await mergedConfig$.pipe(take(1)).toPromise(), | |||
}); | |||
|
|||
const apmRuleRegistry = plugins.observability.registry.create({ | |||
namespace: 'apm', | |||
fieldMap: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does service.name
go here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oic it's in the observability registry. The inheritance is nice but if you're just looking at the code you would have to walk the tree to grok the fieldmap, at least in your head. Hopefully good types will mitigate that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The types are magic 👽
const api = await annotationsApiPromise; | ||
return api?.getScopedAnnotationsClient(...args); | ||
}, | ||
registry: plugins.ruleRegistry.create({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
registry
isn't really clear. registry of what? Would ruleRegistry
be ok or does that make it confusing because of the plugin with the same name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good point, will change!
writeFile( | ||
outputFieldMapFilename, | ||
` | ||
export const ecsFieldMap = ${JSON.stringify(fields, null, 2)} as const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do it this way instead of writing out a .json and importing that directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exact types!
core: CoreSetup, | ||
plugins: { alerting: AlertingPluginSetupContract } | ||
): RuleRegistryPluginSetupContract { | ||
const globalConfig = this.initContext.config.legacy.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a non legacy config we can use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can look into that. Stole this from somewhere else, I think the event log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like most all usages throughout plugin initialization use initContext.config.legacy.get()
, even kibana security. @kobelb are you aware of a non-legacy way or retrieving, or any other tangential implications here with regards to https://github.com/elastic/cloud/issues/77641?
|
||
const ruleUuids: string[] = []; | ||
|
||
for await (const response of pitFinder.find()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!
thanks for the feedback @pmuellr, good stuff.
I wouldn't say it guarantees immutable data. I have not fully figured out how updating would look. But I assume even from a frozen index we can read + index a new document. And, maybe we can assume we don't allow the user to directly set the ILM policy, but e.g. only allow them to configure the retention period.
Good one, I'll change
Yep - right now it just picks array/required from the object, and passes everything else to the field mapping. I hope we can stick with this for a while, we can always add another precompile step later if needed.
Fair point. I think for now it might be good enough, and we can optimize later, your suggestions would be a good start.
I'm not sure what you mean. Do you mean that fields from the alert they want to index conflict with the fields that are used in the root rule registry?
Agree, I will prefix everything not in ECS - suggestions welcome :) |
...fields, | ||
'kibana.rac.alert.id': id, | ||
}; | ||
return alertInstanceFactory(id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious to see how the alerting notion of an "alert instance id" works out here. I know we've been talking about an "alert id", which would identify a "span" (not sure of the current terminology) of a number of consecutive alerts being triggered. We don't have such a thing in alerting today, but would be straight-forward to create UUIDs for these.
I think it may work out well to keep this as an rule-specific thing, so that for rules that care about these "spans", they'd need to make sure they provide unique id
values when appropriate, but for rules that don't, they can reuse id
values as is often done in rules today (for example, an id
could identify a term used in grouping for an agg, which would not be unique over time).
Assuming we eventually add an automagic UUID to identify a span of consecutive alerts, the rule registry layer could make use of that directly.
}; | ||
|
||
if (scopedRuleRegistryClient && trackedAlertStatesOfRecovered.length) { | ||
const { events } = await scopedRuleRegistryClient.search({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if rather than searching for this data, we could cache it in either the rule state or rule instance state, both of which are persisted in the rule SO. Problem with storing it there is bulking up those SOs, which could cause general perf issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've considered it, but indeed, the amount of data that would have to be stored in the rule state would be significantly more than what currently is needed. Esp with 100+ alerts being fired for an execution. Plus, it's caching, and we all know cache invalidation sucks 😄
💚 Build Succeeded
Metrics [docs]Async chunks
History
To update your PR or re-run it, just comment with: |
Co-authored-by: Kibana Machine <[email protected]>
Co-authored-by: Kibana Machine <[email protected]> Co-authored-by: Kibana Machine <[email protected]>
…to-metrics-tab * 'master' of github.com:elastic/kibana: (44 commits) [Exploratory View]Additional metrics for kpi over time (elastic#96532) [Fleet] UI changes on hosted policy detail view (elastic#96337) Stacked line charts incorrectly shows one term as 100% (elastic#96203) [Fleet] Create enrollment API keys as current user (elastic#96464) [Lens] Make table and metric show on top Chart switcher (elastic#96601) skip flaky suite (elastic#96691) [Lens] Hide "Show more errors" once expanded (elastic#96605) [Discover] Unskip histogram hiding test (elastic#95759) skip flyout test, add linked issue elastic#96708 skip copy_to_space_flyout_internal.test.tsx elastic#96708 fix config validation (elastic#96502) Document telemetry fields for stack security features (elastic#96638) [Partial Results] Move inspector adapter integration into search source (elastic#96241) [RAC] Rule registry plugin (elastic#95903) [APM] Run precommit tasks sequentially (elastic#96551) [Maps] fix Kibana does not recognize a valid geo_shape index when attempting to create a Tracking Containment alert (elastic#96633) [Security Solution] [Cases] Small UI bugfixes (elastic#96511) [Actions UI] Changed PagerDuty action form UI to fill payload fields according to the API docs for Resolve and Acknowledge events. (elastic#96363) App Search: Result Component Updates (elastic#96184) [Alerting] Preconfigured alert history index connector (elastic#94909) ...
…Registry (#96015) ## Summary This PR starts the migration of the Security Solution rules to use the rule-registry introduced in #95903. This is a pathfinding effort in porting over the existing Security Solution rules, and may include some temporary reference rules for testing out different paradigms as we move the rules over. See #95735 for details Enable via the following feature flags in your `kibana.dev.yml`: ``` # Security Solution Rules on Rule Registry xpack.ruleRegistry.index: '.kibana-[USERNAME]-alerts' # Only necessary to scope from other devs testing, if not specified defaults to `.alerts-security-solution` xpack.securitySolution.enableExperimental: ['ruleRegistryEnabled'] ``` > Note: if setting a custom `xpack.ruleRegistry.index`, for the time being you must also update the [DEFAULT_ALERTS_INDEX](https://github.com/elastic/kibana/blob/9e213fb7a5a0337591a50a0567924ebe950b9791/x-pack/plugins/security_solution/common/constants.ts#L28) in order for the UI to display alerts within the alerts table. --- Three reference rule types have been added (`query`, `eql`, `threshold`), along with scripts for creating them located in: ``` x-pack/plugins/security_solution/server/lib/detection_engine/reference_rules/scripts/ ``` Main Detection page TGrid queries have been short-circuited to query `.alerts-security-solution*` for displaying alerts from the new alerts as data indices. To test, checkout, enable the above feature flag(s), and run one of the scripts from the above directory, e.g. `./create_reference_rule_query.sh` (ensure your ENV vars as set! :) Alerts as data within the main Detection Page 🎉 <p align="center"> <img width="500" src="https://user-images.githubusercontent.com/2946766/119911768-39cfba00-bf17-11eb-8996-63c0b813fdcc.png" /> </p> cc @madirey @dgieselaar @pmuellr @yctercero @dhurley14 @marshallmain
…Registry (elastic#96015) ## Summary This PR starts the migration of the Security Solution rules to use the rule-registry introduced in elastic#95903. This is a pathfinding effort in porting over the existing Security Solution rules, and may include some temporary reference rules for testing out different paradigms as we move the rules over. See elastic#95735 for details Enable via the following feature flags in your `kibana.dev.yml`: ``` # Security Solution Rules on Rule Registry xpack.ruleRegistry.index: '.kibana-[USERNAME]-alerts' # Only necessary to scope from other devs testing, if not specified defaults to `.alerts-security-solution` xpack.securitySolution.enableExperimental: ['ruleRegistryEnabled'] ``` > Note: if setting a custom `xpack.ruleRegistry.index`, for the time being you must also update the [DEFAULT_ALERTS_INDEX](https://github.com/elastic/kibana/blob/9e213fb7a5a0337591a50a0567924ebe950b9791/x-pack/plugins/security_solution/common/constants.ts#L28) in order for the UI to display alerts within the alerts table. --- Three reference rule types have been added (`query`, `eql`, `threshold`), along with scripts for creating them located in: ``` x-pack/plugins/security_solution/server/lib/detection_engine/reference_rules/scripts/ ``` Main Detection page TGrid queries have been short-circuited to query `.alerts-security-solution*` for displaying alerts from the new alerts as data indices. To test, checkout, enable the above feature flag(s), and run one of the scripts from the above directory, e.g. `./create_reference_rule_query.sh` (ensure your ENV vars as set! :) Alerts as data within the main Detection Page 🎉 <p align="center"> <img width="500" src="https://user-images.githubusercontent.com/2946766/119911768-39cfba00-bf17-11eb-8996-63c0b813fdcc.png" /> </p> cc @madirey @dgieselaar @pmuellr @yctercero @dhurley14 @marshallmain # Conflicts: # x-pack/plugins/security_solution/server/plugin.ts
…Registry (#96015) (#100940) ## Summary This PR starts the migration of the Security Solution rules to use the rule-registry introduced in #95903. This is a pathfinding effort in porting over the existing Security Solution rules, and may include some temporary reference rules for testing out different paradigms as we move the rules over. See #95735 for details Enable via the following feature flags in your `kibana.dev.yml`: ``` # Security Solution Rules on Rule Registry xpack.ruleRegistry.index: '.kibana-[USERNAME]-alerts' # Only necessary to scope from other devs testing, if not specified defaults to `.alerts-security-solution` xpack.securitySolution.enableExperimental: ['ruleRegistryEnabled'] ``` > Note: if setting a custom `xpack.ruleRegistry.index`, for the time being you must also update the [DEFAULT_ALERTS_INDEX](https://github.com/elastic/kibana/blob/9e213fb7a5a0337591a50a0567924ebe950b9791/x-pack/plugins/security_solution/common/constants.ts#L28) in order for the UI to display alerts within the alerts table. --- Three reference rule types have been added (`query`, `eql`, `threshold`), along with scripts for creating them located in: ``` x-pack/plugins/security_solution/server/lib/detection_engine/reference_rules/scripts/ ``` Main Detection page TGrid queries have been short-circuited to query `.alerts-security-solution*` for displaying alerts from the new alerts as data indices. To test, checkout, enable the above feature flag(s), and run one of the scripts from the above directory, e.g. `./create_reference_rule_query.sh` (ensure your ENV vars as set! :) Alerts as data within the main Detection Page 🎉 <p align="center"> <img width="500" src="https://user-images.githubusercontent.com/2946766/119911768-39cfba00-bf17-11eb-8996-63c0b813fdcc.png" /> </p> cc @madirey @dgieselaar @pmuellr @yctercero @dhurley14 @marshallmain # Conflicts: # x-pack/plugins/security_solution/server/plugin.ts
@@ -22,11 +22,13 @@ const environmentLabels: Record<string, string> = { | |||
}; | |||
|
|||
export const ENVIRONMENT_ALL = { | |||
esFieldValue: undefined, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dgieselaar This (esFieldValue
) is being passed to the DOM elements causing the following error:
export function parseEnvironmentUrlParam(environment: string) { | ||
if (environment === ENVIRONMENT_ALL_VALUE) { | ||
return ENVIRONMENT_ALL; | ||
} | ||
|
||
if (environment === ENVIRONMENT_NOT_DEFINED_VALUE) { | ||
return ENVIRONMENT_NOT_DEFINED; | ||
} | ||
|
||
return { | ||
esFieldValue: environment, | ||
value: environment, | ||
text: environment, | ||
}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Afaict this should be:
export function getEsFieldValue(environment: string) {
if (environment === ENVIRONMENT_ALL_VALUE) {
return ENVIRONMENT_ALL;
}
if (environment === ENVIRONMENT_NOT_DEFINED_VALUE) {
return ENVIRONMENT_NOT_DEFINED;
}
return environment
}
Then you can call it the specific places where needed (currently just for alerts)
const environmentParsed = parseEnvironmentUrlParam( | ||
alertParams.environment | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be:
const environment = getEsFieldValue(alertParams.environment)
The rule registry plugin aims to make it easy for rule type producers to have their rules produce the data that they need to build rich experiences on top of a unified experience, without the risk of mapping conflicts.
A rule registry creates a template, an ILM policy, and an alias. The template mappings can be configured. It also injects a client scoped to these indices.
It also supports inheritance, which means that producers can create a registry specific to their solution or rule type, and specify additional mappings to be used.
The rule registry plugin creates a root rule registry, with the mappings defined needed to create a unified experience. Rule type producers can use the plugin to access the root rule registry, and create their own registry that branches off of the root rule registry. The rule registry client sees data from its own registry, and all registries that branches off of it. It does not see data from its parents.
Note that writing is disabled by default. It can be enabled by setting
xpack.ruleRegistry.writeEnabled: true
in kibana.yml.Creating a rule registry
To create a rule registry, producers should add the
ruleRegistry
plugin to their dependencies. They can then use theruleRegistry.create
method to create a child registry, with the additional mappings that should be used by specifyingfieldMap
:fieldMap
is a key-value map of field names and mapping options:ECS mappings are generated via a script in the rule registry plugin directory. These mappings are available in x-pack/plugins/rule_registry/server/generated/ecs_field_map.ts.
To pick many fields, you can use
pickWithPatterns
, which supports wildcards with full type support.If a registry is created, it will initialise as soon as the core services needed become available. It will create a (versioned) template, alias, and ILM policy, but only if these do not exist yet.
Rule registry client
The rule registry client can either be injected in the executor, or created in the scope of a request. It exposes a
search
method and abulkIndex
method. Whensearch
is called, it first gets all the rules the current user has access to, and adds these ids to the search request that it executes. This means that the user can only see data from rules they have access to.Both
search
andbulkIndex
are fully typed, in the sense that they reflect the mappings defined for the registry.Schema
The following fields are available in the root rule registry:
@timestamp
: the ISO timestamp of the alert event. For the lifecycle rule type helper, it is always the value ofstartedAt
that is injected by the Kibana alerting framework.event.kind
: signal (for the changeable alert document), state (for the state changes of the alert, e.g. when it opens, recovers, or changes in severity), or metric (individual evaluations that might be related to an alert).event.action
: the reason for the event. This might beopen
,close
,active
, orevaluate
.tags
: tags attached to the alert. Right now they are copied over from the rule.rule.id
: the identifier of the rule type, e.g.apm.transaction_duration
rule.uuid
: the saved objects id of the rule.rule.name
: the name of the rule (as specified by the user).rule.category
: the name of the rule type (as defined by the rule type producer)kibana.rac.producer
: the producer of the rule type. Usually a Kibana plugin. e.g.,APM
kibana.rac.alert.id
: the id of the alert, that is unique within the context of the rule execution it was created in. E.g., for a rule that monitors latency for all services in all environments, this might beopbeans-java:production
.kibana.rac.alert.uuid
: the unique identifier for the alert during its lifespan. If an alert recovers (or closes), this identifier is re-generated when it is opened again.kibana.rac.alert.status
: the status of the alert. Can beopen
orclosed
.kibana.rac.alert.start
: the ISO timestamp of the time at which the alert started.kibana.rac.alert.end
: the ISO timestamp of the time at which the alert recovered.kibana.rac.alert.duration.us
: the duration of the alert, in microseconds. This is always the difference between either the current time, or the time when the alert recovered.kibana.rac.alert.severity.level
: the severity of the alert, as a keyword (e.g. critical).kibana.rac.alert.severity.value
: the severity of the alert, as a numerical value, which allows sorting.This list is not final - just a start. Field names might change or moved to a scoped registry. If we implement log and sequence based rule types the list of fields will grow. If a rule type needs additional fields, the recommendation would be to have the field in its own registry first (or in its producer’s registry), and if usage is more broadly adopted, it can be moved to the root registry.
Reference implementation: lifecycle rule type
A lifecycle rule type helper is available in x-pack/plugins/rule_registry/server/rule_registry/rule_type_helpers/create_lifecycle_rule_type_factory. This function wraps the rule type executor, to track fired alerts. If an alert starts firing, it gets assigned a uuid and a start date. On successive violations, the uuid is persisted. If an alert no longer fires, it recovers, and its state is dropped. These alerts are indexed as alert state documents (so append-only, but without individual measurements).
TBD: