Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] Play around with index_mode in Fleet #147684

Closed
wants to merge 1 commit into from

Conversation

ruflin
Copy link
Member

@ruflin ruflin commented Dec 16, 2022

The draft PR has several purpose

  • Better understand how the setup of data stream settings works on my end
  • Leave comments to further discuss the expected end behaviour
  • Learn more about the Fleet code base on my end

This PR is not expected to be merged but to have conversations.

The draft PR has several purpose

* Better understand how the setup of data stream settings works on my end
* Leave comments to further discuss the expected end behaviour
* Learn more about the Fleet code base on my end

This PR is not expected to be merged but to have conversations.
@ruflin
Copy link
Member Author

ruflin commented Dec 16, 2022

@nchaulet @kpollich Would be great to have some conversations on this PR. Have a look at the comments.

One thing I'm curious about is how do you test with real packages during development. Running your own registry on the side or do you use the package upload?

@kibana-ci
Copy link
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #41 / APM API tests basic no data Service group counts with alerts "before all" hook for "returns the correct number of alerts"

Metrics [docs]

Unknown metric groups

ESLint disabled in files

id before after diff
osquery 1 2 +1

ESLint disabled line counts

id before after diff
enterpriseSearch 19 21 +2
fleet 61 67 +6
osquery 109 115 +6
securitySolution 440 446 +6
total +20

Total ESLint disabled count

id before after diff
enterpriseSearch 20 22 +2
fleet 70 76 +6
osquery 110 117 +7
securitySolution 517 523 +6
total +21

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ruflin

@ruflin
Copy link
Member Author

ruflin commented Dec 16, 2022

The testing description in #140035 I think mostly answers my question around the manual testing of packages for these kind of scenarios. Seems the registry is the answer.

@kpollich
Copy link
Member

The testing description in #140035 I think mostly answers my question around the manual testing of packages for these kind of scenarios. Seems the registry is the answer.

Yes this is accurate. Fleet developers typically point at either the live epr.elastic.co registry in development, or use the one provided by the elastic-package cli's stack up command for custom package use cases. We can also run the registry from source if needed.

We also rely heavily on our test package fixtures + FTR tests here: https://github.com/elastic/kibana/tree/main/x-pack/test/fleet_api_integration/apis/fixtures/test_packages

Copy link
Member

@kpollich kpollich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ruflin - Took a pass at answering some of the questions in your comments. @nchaulet @hop-dev @juliaElastic you will probably have some additional insight to the specifics here as well, so feel free to chime in anywhere I may have missed something. Thanks all!

@@ -65,6 +67,7 @@ export const ExperimentDatastreamSettings: React.FunctionComponent<Props> = ({
typeof syntheticSourceExperimentalValue !== 'undefined'
? syntheticSourceExperimentalValue
: isSyntheticSourceEnabledByDefault,
// I initially was looking for this but couldn't find it as I was looking for time_series. Should we align on naming?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good point about alignment here. I went with tsdb because it seemed that's how the Elasticsearch teams were talking about the feature in their issues/PR's. Generally, though, I'm in favor of avoiding acronyms/abbreviations in code wherever possible so I'd be +1 for moving to time_series in code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be fully honest, I find time_series pretty confusing as all data streams are time series. But unfortunately it is what the Elasticsearch team has selected so at least in the code base, I would align it. What we show as "text" in the UI can be a different story.

Comment on lines +99 to +103
// This sets the features on change, will it update data stream itself?
// Is this were rollover should happen?
// Ideally there would be a "data management" part in Fleet (or outside) where on a change
// the policy API could just make a call with the requested change and would get back if it was applied or
// not, not having to think about templates, data streams etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the frontend state tracking of the opt-in state for each experimental indexing feature. The package_policy API handles actually updating templates here:

// Handle component template/mappings updates for experimental features, e.g. synthetic source
await handleExperimentalDatastreamFeatureOptIn({ soClient, esClient, packagePolicy });

export async function handleExperimentalDatastreamFeatureOptIn({
soClient,
esClient,
packagePolicy,
}: {
soClient: SavedObjectsClientContract;
esClient: ElasticsearchClient;
packagePolicy: PackagePolicy | NewPackagePolicy;
}) {
if (!packagePolicy.package?.experimental_data_stream_features) {
return;
}
// If we're performing an update, we want to check if we actually need to perform
// an update to the component templates for the package. So we fetch the saved object
// for the package policy here to compare later.
let installation;
if (packagePolicy.package) {
installation = await getInstallation({
savedObjectsClient: soClient,
pkgName: packagePolicy.package.name,
});
}
for (const featureMapEntry of packagePolicy.package.experimental_data_stream_features) {
const existingOptIn = installation?.experimental_data_stream_features?.find(
(optIn) => optIn.data_stream === featureMapEntry.data_stream
);
const isSyntheticSourceOptInChanged =
existingOptIn?.features.synthetic_source !== featureMapEntry.features.synthetic_source;
const isTSDBOptInChanged = existingOptIn?.features.tsdb !== featureMapEntry.features.tsdb;
if (!isSyntheticSourceOptInChanged && !isTSDBOptInChanged) continue;
const componentTemplateName = `${featureMapEntry.data_stream}@package`;
const componentTemplateRes = await esClient.cluster.getComponentTemplate({
name: componentTemplateName,
});
const componentTemplate = componentTemplateRes.component_templates[0].component_template;
if (isSyntheticSourceOptInChanged) {
const body = {
template: {
...componentTemplate.template,
mappings: {
...componentTemplate.template.mappings,
_source: {
mode: featureMapEntry.features.synthetic_source ? 'synthetic' : 'stored',
},
},
},
};
await esClient.cluster.putComponentTemplate({
name: componentTemplateName,
// @ts-expect-error - TODO: Remove when ES client typings include support for synthetic source
body,
});
}
if (isTSDBOptInChanged && featureMapEntry.features.tsdb) {
const mappingsProperties = componentTemplate.template?.mappings?.properties ?? {};
// All mapped fields of type keyword and time_series_dimension enabled will be included in the generated routing path
// Temporarily generating routing_path here until fixed in elasticsearch https://github.com/elastic/elasticsearch/issues/91592
const routingPath = builRoutingPath(mappingsProperties);
if (routingPath.length === 0) continue;
const indexTemplateRes = await esClient.indices.getIndexTemplate({
name: featureMapEntry.data_stream,
});
const indexTemplate = indexTemplateRes.index_templates[0].index_template;
const indexTemplateBody = {
...indexTemplate,
template: {
...(indexTemplate.template ?? {}),
settings: {
...(indexTemplate.template?.settings ?? {}),
index: {
mode: 'time_series',
routing_path: routingPath,
},
},
},
};
await esClient.indices.putIndexTemplate({
name: featureMapEntry.data_stream,
body: indexTemplateBody,
});
}
}
// Update the installation object to persist the experimental feature map
await updateDatastreamExperimentalFeatures(
soClient,
packagePolicy.package.name,
packagePolicy.package.experimental_data_stream_features
);
// Delete the experimental features map from the package policy so it doesn't get persisted
delete packagePolicy.package.experimental_data_stream_features;
}

Is this where rollover should happen

Automatic rollover should happen in the above server-side code. I don't think it'd be too challenging, but there might be caveats of which I am unaware. Issue at #143448

Ideally there would be a "data management" part in Fleet (or outside) where on a change the policy API could just make a call with the requested change and would get back if it was applied or not, not having to think about templates, data streams etc.

I'm in agreement here. We've talked about something like a /api/fleet/customize API to handle cases like this around pushing custom configuration objects. @joshdover has a great pass at designing this kind of API at #121118, namely the Customize API section feels relevant here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this is already all started to be tracked. The reason I'm keen on #143448 to happen is that I think at the moment when a user "toggles" any of the features, the user assumes it is working now but in reality the user has to trigger a rollover first. We already do rollovers on upgrades when needed so I would assume the functionality + error handling mostly already exists.

/api/fleet/customize

I might go further and not put it into the fleet plugin. From my perspective it goes so far, that this is a data management API that should be exposed by Kibana "core" and is only used by Fleet. Need to read up on the full issue and comment there.

Comment on lines +137 to +140
// It seems this logic does check for the data stream setting. What if template and data stream are not in sync (no rollover yet)
// Is there a way in Elasticsearch to say: For data stream X, what template will apply? The reason I'm asking
// is that there is a chance that users made some modification in the settings or that for a namespace an additional
// template was set in place. So if we only look at the data stream, the info might be wrong / outdated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this logic does check for the data stream setting. What if template and data stream are not in sync (no rollover yet)

This is correct. Our current implementation does not consider whether the currently "visible" value of a given toggle has actually been applied (rolled over) to the given data stream. I think this okay-ish for now, since the features are presented quite heavily as experimental, and I expect them to be removed in favor of package spec support as the source of truth in the near future.

If we had automated rollover #143448, I think this issue would go away, as well. So, if that's low hanging fruit as I think it is, maybe that's the quick solve here.

Is there a way in Elasticsearch to say: For data stream X, what template will apply?

The reason I'm asking is that there is a chance that users made some modification in the settings or that for a namespace an additional template was set in place. So if we only look at the data stream, the info might be wrong / outdated.

This I do not know. Poking around the index template and data stream docs for Elasticserach didn't turn up much, but there might be some internal API to expose this. Or maybe there's a clever search operation we could use here. Would be interesting to look into it further, but not something I'm immediately aware of.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know in case the Fleet team is interested in having such an API, I might be able to contribute it to Elasticsearch.

@@ -143,6 +147,7 @@ export const PackagePolicyInputPanel: React.FunctionComponent<{
const match = packagePolicy.package!.experimental_data_stream_features.find(
(feat) => feat.data_stream === dsName
);
// If the user updates the TSDB setting, will this make its way to the template? Could not find where.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep happens here:

const indexTemplateBody = {
...indexTemplate,
template: {
...(indexTemplate.template ?? {}),
settings: {
...(indexTemplate.template?.settings ?? {}),
index: {
mode: 'time_series',
routing_path: routingPath,
},
},
},
};
await esClient.indices.putIndexTemplate({
name: featureMapEntry.data_stream,
body: indexTemplateBody,
});
}

@@ -538,6 +538,7 @@ export function parseDataStreamElasticsearchEntry(
parsedElasticsearchEntry.privileges = expandedElasticsearch.privileges;
}

// we parse the settings from the data stream, what abou tthe values from the index template?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm not following this one. Could you provide any more info?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To decide what to show to the user, we parse the value of the current data stream setting. But what if through a package upgrade the value of the template has been adjusted but the data stream still has the old value. This partially goes back to the above discussion around triggering a rollover when this happens which would resolve this mostly.

The second missing part also what we touched on above: There is no way currently to tell which template applies to a data stream. We know it today because of the convention but if we would have templates per namespace too, it gets trickier. An ES API for it would help here.

The more I think about all these scenarios the more I think all this handling should only be temporary in Fleet itself but be done in Elasticsearch itself. Tell Elasticsearch to change a setting for a data stream and it will do all the updates and rollovers needed ...

Comment on lines +273 to +276
// TODO: Is this the place that index_mode should be added as it belongs into template.settings.index.mode
// On package upgrade, what happens if the previous package version had index_mode: time_series and now it is
// switched back? This would work for the template but not when the data stream is rolled over
// Ideally TSDB can be disabled again but is not possible today -> validation on the package side needed?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good point that we could set index.mode as normal here, but the next rollover would fail. Not a very good experience around upgrading package versions in that case. We could probably catch the case where a package tried to disable tsdb after enabling it in a previous version here and throw an error, but it'd result in a package version that cannot be upgraded to. So, it makes the most sense to add some package spec validation here blocking a release that includes a change from index_mode: time_series -> default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked to the Elasticsearch team about the issue that tsdb currently can't be disabled. elastic/elasticsearch#92544 was opened to make it possible to turn it off again. If this lands, this problem should go away.

@@ -281,6 +285,7 @@ export function buildComponentTemplates(params: {
(dynampingTemplate) => Object.keys(dynampingTemplate)[0]
);

// In case of index_mode: time_series this should be set to true by default except it is overwritten
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me - we should try to map out the relationship between synthetic source + time series indexing in code to make this clear. Rather have some declarative structure that describes that relationship rather than a bunch of chained conditionals.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++. There should be a single place to look up and change all this logic. Similar to a state machine.

Comment on lines +185 to +187
// These are tests for source_mode set in the package. Couldn't find any integration tests around the policy part?
// Is this all covered by unit tests? I'm especially curious how an upgrade around this feature is tested?
// And how do you test it manually with uploading a package when you run from source -> run your own registry?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are tests for source_mode set in the package. Couldn't find any integration tests around the policy part?

I don't think we have integration tests for the toggling functionality/API interactions here, so those would probably be good to add.

We do have unit tests around the policy generation in

There's UI driven tests around the toggling that assert on whether we're generating the expected API payloads, defaulting to the expected values based on the existing data streams, etc here as well: https://github.com/elastic/kibana/blob/86527753a2db307138ea69e0ba04551717a91844/x-pack/plugins/fleet/public/applications/fleet/sections/agent_policy/create_package_policy_page/components/steps/components/experimental_datastream_settings.test.tsx

And how do you test it manually with uploading a package when you run from source -> run your own registry?

We create a new version of package that includes time series features, copy that new version over to a data directory, then point the local registry at that data directory. Then, the test package version is available as normal in Kibana, Fleet API, etc for testing upgrades. We don't use the upload package feature in this case. The registry service takes a config that includes locations of package manifests on disk, so we can't point it at a few directories that it'll use to source packages.

Comment on lines +16 to +17
source_mode: "synthetic"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have some assertions around updates in

describe('updates all assets when updating a package to a different version', async () => {
, but I don't think there are specific integration tests around upgrades where source_mode or index_mode are toggled. Definitely worth adding some, especially since the package spec values will be the sole source of truth in the future and the UI toggles will be removed.

Copy link
Member Author

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kpollich Appreciate the time you spent to respond to all the comments in detail and share more background info. It's good to see the team is already aware of many of the challenges.

One take away for me is: Fleet has to do a lot of data management / changes to data streams, templates and from my perspective more of this should happen in Elasticsearch directly. Please reach out if you have some ideas on what you would like to see in Elasticsearch directly.

Comment on lines +99 to +103
// This sets the features on change, will it update data stream itself?
// Is this were rollover should happen?
// Ideally there would be a "data management" part in Fleet (or outside) where on a change
// the policy API could just make a call with the requested change and would get back if it was applied or
// not, not having to think about templates, data streams etc.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this is already all started to be tracked. The reason I'm keen on #143448 to happen is that I think at the moment when a user "toggles" any of the features, the user assumes it is working now but in reality the user has to trigger a rollover first. We already do rollovers on upgrades when needed so I would assume the functionality + error handling mostly already exists.

/api/fleet/customize

I might go further and not put it into the fleet plugin. From my perspective it goes so far, that this is a data management API that should be exposed by Kibana "core" and is only used by Fleet. Need to read up on the full issue and comment there.

Comment on lines +137 to +140
// It seems this logic does check for the data stream setting. What if template and data stream are not in sync (no rollover yet)
// Is there a way in Elasticsearch to say: For data stream X, what template will apply? The reason I'm asking
// is that there is a chance that users made some modification in the settings or that for a namespace an additional
// template was set in place. So if we only look at the data stream, the info might be wrong / outdated.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know in case the Fleet team is interested in having such an API, I might be able to contribute it to Elasticsearch.

@@ -538,6 +538,7 @@ export function parseDataStreamElasticsearchEntry(
parsedElasticsearchEntry.privileges = expandedElasticsearch.privileges;
}

// we parse the settings from the data stream, what abou tthe values from the index template?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To decide what to show to the user, we parse the value of the current data stream setting. But what if through a package upgrade the value of the template has been adjusted but the data stream still has the old value. This partially goes back to the above discussion around triggering a rollover when this happens which would resolve this mostly.

The second missing part also what we touched on above: There is no way currently to tell which template applies to a data stream. We know it today because of the convention but if we would have templates per namespace too, it gets trickier. An ES API for it would help here.

The more I think about all these scenarios the more I think all this handling should only be temporary in Fleet itself but be done in Elasticsearch itself. Tell Elasticsearch to change a setting for a data stream and it will do all the updates and rollovers needed ...

Comment on lines +273 to +276
// TODO: Is this the place that index_mode should be added as it belongs into template.settings.index.mode
// On package upgrade, what happens if the previous package version had index_mode: time_series and now it is
// switched back? This would work for the template but not when the data stream is rolled over
// Ideally TSDB can be disabled again but is not possible today -> validation on the package side needed?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked to the Elasticsearch team about the issue that tsdb currently can't be disabled. elastic/elasticsearch#92544 was opened to make it possible to turn it off again. If this lands, this problem should go away.

@@ -281,6 +285,7 @@ export function buildComponentTemplates(params: {
(dynampingTemplate) => Object.keys(dynampingTemplate)[0]
);

// In case of index_mode: time_series this should be set to true by default except it is overwritten
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++. There should be a single place to look up and change all this logic. Similar to a state machine.

@ruflin ruflin closed this Dec 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants