[Spike] Investigate separate Index lifecycle policies for each datastream #453

paulb-elastic · 2022-02-09T15:35:09Z

All monitors added in Monitor Management use data streams to write results back to ES. There is a separate data stream for each monitor type (ICMP, HTTP or TCP), with browser monitors further split by the type of data stored (network, screenshot, etc.).

In addition, the namespace defined when setting up the monitor (which is default unless changed) is appended to the name of the data stream.

This can be visualised, for example, with this set of monitors:

All these have been left on the default namespace of default except for the Test Browser in my_namespace monitor, which has been given a namespace of my_namespace:

In Index Management, we can see all the data streams that we use for all of these monitors (all begin with synthetics-...):

As you can see, there is one for each type of monitor, within each namespace, and browser monitors are further split into ...browser..., ...browser.network... and ...browser.screenshot....
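The naming scheme described above can be sketched in a few lines. This is only an illustration: the helper name and the lower-case dataset identifiers are assumptions, and the pattern is inferred from names such as synthetics-browser.network-my_namespace.

```python
# Hypothetical helper: builds a data stream name from its parts.
# Pattern inferred from the names shown in Index Management:
#   synthetics-<dataset>-<namespace>
def data_stream_name(dataset: str, namespace: str = "default") -> str:
    return f"synthetics-{dataset}-{namespace}"


# Datasets named in this issue (lower-case spelling is an assumption).
DATASETS = ["http", "icmp", "tcp", "browser", "browser.network", "browser.screenshot"]

if __name__ == "__main__":
    # A data stream only exists once a monitor writes to it, so this lists
    # the names that *could* appear for a namespace, not what is in ES.
    for dataset in DATASETS:
        print(data_stream_name(dataset, "my_namespace"))
```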

However, all of these separate data streams use the same synthetics Index Lifecycle Policy:

As a result, every type of monitor, and each category of browser results, is subject to the same retention period:

This means users cannot configure retention periods granularly by type of monitor or type of data.

For example, a typical use case may be to keep browser result data for 13 months (to allow year-on-year comparison), network data for 3 months, and screenshots for 1 month. This allows the user to balance how much storage they are consuming for these results, based on the value of that data being available.
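To make the example concrete, here is a rough sketch of what three separate ILM policy bodies could look like. The policy names, the rollover conditions and the 31-days-per-month rounding are all assumptions for illustration (ILM `min_age` has no month unit, so months are converted to days).

```python
import json

# Assumption: round each month up to 31 days so data is never deleted early.
DAYS_PER_MONTH = 31

# Retention periods from the use case above, in months.
RETENTION_MONTHS = {"browser": 13, "browser.network": 3, "browser.screenshot": 1}


def retention_policy(months: int) -> dict:
    """Build an ILM policy body that deletes data after `months` months."""
    return {
        "policy": {
            "phases": {
                # Illustrative rollover conditions, not the shipped defaults.
                "hot": {
                    "actions": {
                        "rollover": {"max_age": "30d", "max_primary_shard_size": "50gb"}
                    }
                },
                # min_age in later phases is measured from rollover.
                "delete": {
                    "min_age": f"{months * DAYS_PER_MONTH}d",
                    "actions": {"delete": {}},
                },
            }
        }
    }


# Hypothetical policy names, one per dataset.
policies = {f"synthetics-{ds}": retention_policy(m) for ds, m in RETENTION_MONTHS.items()}

if __name__ == "__main__":
    print(json.dumps(policies, indent=2))
```

Each body could be sent with `PUT _ilm/policy/<name>`; because `min_age` counts from rollover, actual retention is somewhat longer than the configured value.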

Spike Expectations

This spike is to investigate whether we can automatically configure a separate Index Lifecycle Policy for each of the data streams. It’s reasonable to imagine a 1:1 setup between each data stream and its own Index Lifecycle Policy, even if they all begin with the same default configuration. This then allows users to further configure these based on their needs and to control the amount of storage being consumed.

One consideration is that the data stream does not exist until a monitor is created in a given namespace. So, in the above example, there is no data stream called synthetics-browser.network-my_namespace until a browser monitor is created in Monitor Management and saved in the my_namespace namespace. The first result will begin writing to the new synthetics-browser.network-my_namespace data stream, which will be subject to the existing synthetics Index Lifecycle Policy.

This spike needs to look into how we would be able to create these Index Lifecycle Policies on demand, and if there are any other implications of this.

You could imagine users making use of the namespace setting to further configure different data streams (and, by extension, the Index Lifecycle Policies) for monitors that should have different retention periods based on their business value, or a namespace (and associated less valuable monitor results) used to move data through warm/cold/frozen/delete phases quicker.


dominiqueclarke · 2022-02-09T16:46:31Z

Looks like Fleet may be keen to allow configuring ILM policies per data stream, based on this comment. https://github.com/elastic/kibana/blob/main/x-pack/plugins/fleet/server/services/epm/packages/_install_package.ts#L124
I also seem to remember having this conversation a while back.

@jen-huang Has moving forward with ILM policies per data stream been discussed at all on the Fleet side?

jen-huang · 2022-02-14T23:36:14Z

Hi @dominiqueclarke, it is possible* today to set different policies for different datasets (i.e. browser.network, http) by making use of the component templates that Fleet already installs. It is not possible to set them at the namespace level (i.e. http-jenNamespace). We have a proposal of how to achieve namespace-level policies, which involves creating even more component templates, but that effort is currently deferred.

This issue has more details of what we have today that enables dataset-level customization vs what we would need to achieve namespace-level customization: elastic/kibana#121118

*as soon as elastic/kibana#121184 is fixed
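As a rough sketch of the dataset-level customization described in this comment: a user would point a Fleet-installed custom component template at their own ILM policy. The `<type>-<dataset>@custom` template name and the policy name below are assumptions for illustration, not something confirmed in this thread.

```python
import json


def custom_template_name(dataset: str, type_: str = "synthetics") -> str:
    # Assumed naming pattern for the dataset-level custom component template.
    return f"{type_}-{dataset}@custom"


def custom_component_template(ilm_policy_name: str) -> dict:
    """Component template body that sets index.lifecycle.name."""
    return {
        "template": {
            "settings": {"index": {"lifecycle": {"name": ilm_policy_name}}}
        }
    }


if __name__ == "__main__":
    # Hypothetical policy name; the policy itself must already exist.
    name = custom_template_name("browser.network")
    body = custom_component_template("synthetics-browser.network-3-months")
    # Would be sent as: PUT _component_template/<name>
    print(name)
    print(json.dumps(body, indent=2))
```

The new setting only applies to backing indices created after the change, so an existing data stream would need to roll over before the policy takes effect.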

dominiqueclarke · 2022-02-24T21:14:36Z

As Jen mentioned, it is possible today to create different ILM policies tied to specific datasets within the integration package spec. Unfortunately, we are blocked on namespace-level customizations as mentioned above.

Draft POC: elastic/integrations#2744

This draft creates separate ILM policies for each data set: browser, browser.screenshot, browser.network, HTTP, ICMP, and TCP. We can move forward with defining a default policy for each data set once the requirements for that policy are defined by @drewpost. Our users could then customize these default assets if desired. elastic/observability-docs#1578

Sample data stream with segmented ILM policy

drewpost · 2022-02-26T13:46:12Z

@dominiqueclarke - We already have the retention periods defined per data type in the requirements; however, we didn't go into the depth of hot/cold storage tiers, as this was an option that the implementation gave us. Is that storage tier definition all you need (alongside the retention periods) to define OOTB settings?

dominiqueclarke · 2022-03-09T17:00:44Z

@drewpost Sorry for the delay. That is correct.

dominiqueclarke · 2022-03-14T14:16:28Z

Findings of the spike

cc: @drewpost @paulb-elastic @andrewvc

Segmenting by data set

Segmenting by data set is possible today in the Integration Package spec. Defaults for each data set can be specified, resulting in the creation of new ILM policies for each data set and component templates for each data set pointing to the specified ILM policies.

Segmenting by namespace

Segmenting by namespace is currently in the investigation and definition phase for Fleet, with work expected to begin in a future release. Once implemented, Fleet will generate an additional component template <type>-<dataset>-<namespace>@custom, to allow user-defined customization per namespace. This feature will build upon the existing feature set allowing for segmenting by data set. More information: elastic/kibana#121118
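For illustration, the proposed `<type>-<dataset>-<namespace>@custom` name can be expanded for a concrete data stream. The helper below is hypothetical and only formats the name given in the proposal; it says nothing about how Fleet will actually create or order these templates.

```python
def namespace_custom_template(type_: str, dataset: str, namespace: str) -> str:
    """Format the proposed namespace-level custom component template name."""
    return f"{type_}-{dataset}-{namespace}@custom"


if __name__ == "__main__":
    # Example from this issue: browser network data in my_namespace.
    print(namespace_custom_template("synthetics", "browser.network", "my_namespace"))
```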

Moving forward in 8.2.0

Defaults per data set can be specified in the Elastic Synthetics Integration package as early as 8.2.0. Establishing defaults per data set will not conflict with the enhancements coming down the line, as that work will build upon the existing component template hierarchy used to generate index templates. @drewpost to provide the desired defaults for each data set (HTTP, ICMP, TCP, browser, browser.network, and browser.screenshot). @paulb-elastic to decide when to prioritize this work and whether we can move forward with including defaults as early as 8.2.0.

Moving forward with segmenting by namespace

Synthetics will require the ability to generate namespace-specific component templates and index templates on the fly. Uptime's UI Monitor Management and the Synthetics Service leverage the Fleet-based data stream architecture but save monitors as saved objects instead of Fleet integration policies. Because monitors are not stored as Fleet integration policies, Fleet will not be notified by default when a user creates a new monitor with a non-default namespace.

To allow Uptime to use the namespace segmentation feature, Fleet should expose a method on its plugin contract to generate component and index templates for a given package and namespace. The use case for Synthetics is defined here: elastic/kibana#121118 (comment)

Once exposed, Uptime will need to ensure that proper component and index templates are installed when a new monitor is saved. If the namespace of the monitor is anything but default, Synthetics will invoke the Fleet service to generate the corresponding component and index templates.
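The save-time flow described above could look roughly like the sketch below. Everything here is hypothetical: the Fleet method does not exist yet, so it is modelled as a plain callable taking a package name and a namespace, and the package name `synthetics` is an assumption.

```python
from typing import Callable

# Stand-in for the not-yet-existing Fleet plugin contract method:
# (package_name, namespace) -> None
InstallTemplates = Callable[[str, str], None]


def on_monitor_saved(
    namespace: str,
    install_templates: InstallTemplates,
    package: str = "synthetics",  # assumed package name
) -> bool:
    """Ask Fleet for namespace-specific templates when a monitor is saved.

    Returns True if the Fleet service was invoked.
    """
    if namespace == "default":
        # The default namespace is covered by the templates Fleet
        # installs with the package, so nothing extra is requested.
        return False
    install_templates(package, namespace)
    return True


if __name__ == "__main__":
    calls = []
    record = lambda pkg, ns: calls.append((pkg, ns))
    on_monitor_saved("default", record)
    on_monitor_saved("my_namespace", record)
    print(calls)
```

A real implementation would presumably also need to be idempotent, since many monitors can be saved into the same namespace.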

andrewvc · 2022-03-23T16:33:17Z

@dominiqueclarke thanks for digging up all those answers. It seems to me that we can create a new issue to encapsulate our ultimate plans to create a lifecycle policy for namespaces, and between that issue and #462 we can close this one out.

Does that sound right?

dominiqueclarke · 2022-03-23T17:19:50Z

@andrewvc Yep, @paulb-elastic actually already created an issue off the back of this spike #462

paulb-elastic · 2022-03-28T15:03:16Z

Thank you @dominiqueclarke for finding out how to proceed. Closing this as discussed ^^.

paulb-elastic mentioned this issue Feb 9, 2022

Document how to retain data ILM policies elastic/synthetics#286

Closed

dominiqueclarke self-assigned this Feb 23, 2022

mostlyjason mentioned this issue Mar 14, 2022

[Fleet] Add namespace-specific index and component templates elastic/kibana#121118

Closed

5 tasks

This was referenced Mar 15, 2022

[Request] Document how to retain data with fleet integration and ILM policies elastic/observability-docs#907

Closed

Separate Index lifecycle policies for each dataset #462

Closed

paulb-elastic closed this as completed Mar 28, 2022
